About KDE laboratory

Figure: Location of Tsukuba city on the main Japanese island

The KDE laboratory belongs to the University of Tsukuba. Tsukuba city lies at the foot of Mount Tsukuba in Ibaraki Prefecture, on the Kantō plain of Honshū, the main Japanese island.
The city is about 50 km northeast of Tōkyō, approximately 45 minutes away on the dedicated Tsukuba Express train line. Tsukuba is a planned science city built in 1962 to relieve Tōkyō's overpopulation and to create the largest research center in Japan. The city hosts 60 national research institutes and more than 240 private research facilities, including major institutes such as the Japan Aerospace Exploration Agency (JAXA) and the High Energy Accelerator Research Organization (KEK). According to [T-INFO], 19,000 researchers (40% of Japanese researchers) work there. According to [W-TSUKUBA], the Japanese government spent close to 50% of its public research budget over several decades to create this science city.

University of Tsukuba

Thanks to its location in Tsukuba Science City, the University of Tsukuba, established in 1973, has benefited from an advanced research environment and become one of the major universities in Japan. In 2010, it was ranked 11th among universities in Japan, 20th among approximately 200 universities in Asia and 174th among more than 600 universities in the world (according to the [QSTOP] ranking).

In May 2010, 16,828 students were registered at the University of Tsukuba (source: [T-UNIV]). This number includes 1,697 international students, of whom the largest group is Chinese citizens (763 students). Since Tsukuba was designed as an international science city, its university hosts many foreign students. Global30 is a government project that aims at "establishing core universities for internationalization". Only 13 Japanese universities, including Tsukuba, participate in this project.
With 2.58 km², the campus of the University of Tsukuba is the largest single campus in Japan. Despite its location in the city center, the campus features many green areas, sports fields, lakes and small woods. The university provides approximately 4,000 dormitory rooms. The institution also includes a university hospital with 800 beds.
According to the facilities description in [T-CCS1], the University of Tsukuba holds a central position in the Tsukuba research computer network (20 Gbit/s) and in the Japanese university network (10 Gbit/s).

The university research system consists of 26 research institutes. The Department of Computer Science belongs to the Graduate School of Systems and Information Engineering (SIE) and is composed of 35 laboratories divided into five research groups. It has the largest number of professors of any computer science department in Japan (61 faculty members).

KDE laboratory

The KDE acronym stands for Kitagawa Data Engineering. This computer science research laboratory belongs to the "Software system & computer architecture" research group and focuses on the management of massive data (e.g. very large databases). It has existed since 1993 (a date deduced from the publications list found in [T-KDE]).
The KDE laboratory has 44 members (as of September 2010). This number includes 3 professors, 2 doctoral students, 28 master's students, 5 undergraduate (bachelor's) students and 4 other students (for example, short-term interns).
Research is carried out in three main domains related to data engineering. The laboratory is also involved in the creation of a meteorological database for the Global Environment and Biological Sciences division. In the following subsections, each field is illustrated by a recent and relevant research paper published by the KDE laboratory.

Infrastructure for Information Integration

The laboratory investigates infrastructures, systems and applications for integrating heterogeneous databases and data sources. The research focuses especially on integrating stream data (such as GPS positions and video streams from street cameras) into conventional relational databases.

StreamSpinner (see [STRSPIN]) is a project created at KDE lab. It is a data stream management system whose architecture combines a stream processing engine with a DBMS. The system can process both continuous queries and traditional one-shot queries. It is based on an extensible framework and can also cope with streaming video or audio. One example of an extension is the analysis of video frames acquired from several surveillance cameras.
Research is also conducted on distributed stream processing, since some workloads (streaming video frames, for example) require heavy computation. Distributed stream processing engines (DSPEs) rely on the cooperation of several stream processing engines (SPEs), so the failure of a single node can bring down the whole system. The paper below suggests an adaptive strategy to cope with such unpredictable events.

A-SAS: An Adaptive High-Availability Scheme for Distributed Stream Processing Systems
Hiroaki Shiokawa, Hideyuki Kawashima and Hiroyuki Kitagawa
Proceedings of the Third International Workshop on Sensor Network Technologies for Information Explosion Era (SeNTIE 2010), May 2010
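
The difference between continuous queries and one-shot queries mentioned above can be illustrated with a minimal Python sketch. It is only a toy under stated assumptions: the class ToyStreamEngine, its methods and the speed readings are hypothetical names invented for this illustration and are not part of StreamSpinner.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

# Hypothetical names throughout; this is NOT StreamSpinner's API.
Reading = Dict[str, float]


class ToyStreamEngine:
    """Keeps recent stream tuples in a window and the full history in a table."""

    def __init__(self, window_size: int) -> None:
        self.window: Deque[Reading] = deque(maxlen=window_size)
        self.table: List[Reading] = []  # stands in for the DBMS
        self.continuous: List[Callable[[List[Reading]], None]] = []

    def register_continuous(self, query: Callable[[List[Reading]], None]) -> None:
        # A continuous query is registered once and re-evaluated on every arrival.
        self.continuous.append(query)

    def push(self, reading: Reading) -> None:
        # A new tuple enters the window, is stored, and triggers the queries.
        self.window.append(reading)
        self.table.append(reading)
        for query in self.continuous:
            query(list(self.window))

    def one_shot(self, predicate: Callable[[Reading], bool]) -> List[Reading]:
        # A one-shot query runs exactly once over the data stored so far.
        return [r for r in self.table if predicate(r)]


alerts: List[float] = []


def average_speed_alert(window: List[Reading]) -> None:
    # Record the window average whenever it exceeds 60.
    avg = sum(r["speed"] for r in window) / len(window)
    if avg > 60.0:
        alerts.append(avg)


engine = ToyStreamEngine(window_size=2)
engine.register_continuous(average_speed_alert)
for speed in (50.0, 60.0, 80.0):
    engine.push({"speed": speed})

fast = engine.one_shot(lambda r: r["speed"] >= 60.0)
```

The continuous query fires only when the last two readings average above 60, whereas the one-shot query scans everything stored so far when it is called.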

The laboratory also investigates database infrastructure for time-series data obtained from sensors. The main difficulty for real-world monitoring databases is data insertion, which has to be extremely fast. The DBMS must also provide data analysis functions and support for continuous queries.
KRAFT is a sensing database infrastructure created at KDE lab. for that purpose.

Encrypted databases are another field studied at KDE lab. This research especially addresses the web-facing DBMSs of businesses, governments or even individuals, whose numerous entry points can put the database at risk. Privacy protection has become a great challenge in a society of instant communication. In an encrypted database, the data storage format itself is encrypted. This prevents data from being read even if someone gains access to the storage medium (through a stolen hard drive or remote access to the server file system).
The paper below studies database security based on cryptographic techniques. It proposes a mixed cryptography database (MCDB), a framework that aims to encrypt databases over untrusted networks while keeping queries efficient.

MV-OPES: Multivalued-Order Preserving Encryption Scheme: A Novel Scheme for Encrypting Integer Value to Many Different Values
Hasan Kadhem, Toshiyuki Amagasa and Hiroyuki Kitagawa
IEICE Transactions on Information and Systems, September 2010
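
The idea named in the paper title, an order-preserving scheme that maps one integer to many different values, can be illustrated with a deliberately simplified toy. The sketch below is not the MV-OPES scheme from the paper: it uses no secret key and offers no real security. BUCKET_WIDTH, toy_encrypt, toy_decrypt and cipher_range are hypothetical names chosen for this illustration only.

```python
import random
from typing import List, Tuple

BUCKET_WIDTH = 1000  # hypothetical parameter: ciphertexts available per plaintext


def toy_encrypt(value: int, rng: random.Random) -> int:
    """Map a non-negative integer to a random point inside its own bucket."""
    if value < 0:
        raise ValueError("this toy only handles non-negative integers")
    return value * BUCKET_WIDTH + rng.randrange(BUCKET_WIDTH)


def toy_decrypt(cipher: int) -> int:
    """Recover the plaintext by finding the bucket the ciphertext falls in."""
    return cipher // BUCKET_WIDTH


def cipher_range(low: int, high: int) -> Tuple[int, int]:
    """Translate a plaintext range [low, high] into a ciphertext range."""
    return low * BUCKET_WIDTH, high * BUCKET_WIDTH + BUCKET_WIDTH - 1


rng = random.Random(42)
salaries: List[int] = [30, 45, 45, 70]
encrypted = [toy_encrypt(s, rng) for s in salaries]

# A range query "salary between 40 and 50" evaluated on ciphertexts only.
lo, hi = cipher_range(40, 50)
matches = [c for c in encrypted if lo <= c <= hi]
```

Because buckets never overlap, a smaller plaintext always yields a smaller ciphertext, so range predicates can be evaluated without decrypting. Equal plaintexts may receive different ciphertexts, which is the "multivalued" aspect.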

XML and Web Programming

XML is a widely used language for machine-readable data representation. It has been a recommendation of the World Wide Web Consortium (W3C) since 1998, intended to improve interoperability in network environments. As a result, the amount of data generated in this language is huge and still increasing. Several issues in XML data management are studied at KDE lab.

An XML Functional Dependency (XFD) is similar to a functional dependency (FD) in a relational database. An FD is a constraint between two sets of attributes in a relation of a database. FDs are important in the definition of normal forms for relational databases: a functional dependency between a set of attributes and another, dependent attribute can reveal redundancy in the database content.
XFDs play the same role for XML data and underpin XML Normal Forms (XNF). However, because XML is flexible and hierarchical, defining XFDs is harder than in the relational case. The paper below discusses a scheme for efficient XFD detection based on an OLAP-inspired algorithm.

Fast Detection of Functional Dependencies in XML Data
Hang Shi, Toshiyuki Amagasa and Hiroyuki Kitagawa
The 7th International XML Database Symposium (XSym2010), September 2010
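
To make the notion of functional dependency concrete, here is a minimal sketch for the relational case only. It is a naive check, not the OLAP-inspired algorithm of the paper, and it ignores the hierarchical aspects that make XFDs harder. The function name holds and the sample book records are assumptions made for this illustration.

```python
from typing import Dict, Hashable, Iterable, Sequence, Tuple

Row = Dict[str, Hashable]


def holds(rows: Iterable[Row], lhs: Sequence[str], rhs: str) -> bool:
    """Return True if the attributes in lhs functionally determine rhs."""
    seen: Dict[Tuple[Hashable, ...], Hashable] = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # Two rows that agree on lhs but differ on rhs violate the dependency.
        if key in seen and seen[key] != row[rhs]:
            return False
        seen.setdefault(key, row[rhs])
    return True


# Hypothetical sample data: the second row repeats the title of the first,
# which is the kind of redundancy a functional dependency reveals.
books = [
    {"isbn": "111", "title": "XML Basics", "publisher": "A"},
    {"isbn": "111", "title": "XML Basics", "publisher": "A"},
    {"isbn": "222", "title": "Data Streams", "publisher": "A"},
]

isbn_determines_title = holds(books, ["isbn"], "title")
publisher_determines_title = holds(books, ["publisher"], "title")
```

Here isbn determines title, while publisher does not, since publisher "A" appears with two different titles.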

XPath is a simple XML query language that serves as the basis for more complex SQL-like query languages (such as XQuery). Querying large XML data (several gigabytes) can be a problem despite the creation of new query processing algorithms (such as TwigStack) optimized for XML and its hierarchical, semi-structured data model.
XML query algorithms are expected to be further optimized through data-parallel execution, since parallel XML query processing can take advantage of new multi-core architectures. The challenge of efficient partitioning has been studied at KDE lab. The paper below presents a partitioning model that allows several instances of the same query algorithm to be executed in parallel on different parts of the XML document.

Executing Parallel TwigStack Algorithm on a Multi-core System
Imam Machdi, Toshiyuki Amagasa and Hiroyuki Kitagawa
Proceedings of the 11th International Conference on Information Integration and Web-based Applications and Services (iiWAS2009), December 2009
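
The general idea of running the same query on different parts of a document can be sketched as follows. This is a simplified illustration, not the partitioning model or the TwigStack algorithm from the paper: it naively treats each child of the root as one partition and uses the limited path syntax of Python's standard xml.etree.ElementTree module. The sample document and the names run_query and parallel_query are assumptions. Threads are used for brevity; in CPython a real multi-core speed-up would probably require processes instead.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Hypothetical sample document.
DOC = """<library>
  <shelf><book><title>A</title></book><book><title>B</title></book></shelf>
  <shelf><book><title>C</title></book></shelf>
  <shelf><magazine><title>D</title></magazine></shelf>
</library>"""

QUERY = ".//book/title"  # the same path expression is run on every partition


def run_query(partition: ET.Element) -> List[str]:
    """Evaluate the query on one partition and return the matching titles."""
    return [t.text or "" for t in partition.findall(QUERY)]


def parallel_query(root: ET.Element, workers: int = 2) -> List[str]:
    # Naive partitioning: one subtree per child of the root element.
    partitions = list(root)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in partition order, i.e. document order.
        results = list(pool.map(run_query, partitions))
    return [title for part in results for title in part]


titles = parallel_query(ET.fromstring(DOC))
```

This naive split only works because no match spans two partitions. Handling matches that cross partition boundaries, and balancing the load between partitions, is what makes efficient partitioning a research problem.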

Data Mining and Knowledge Discovery

Many data mining and knowledge discovery techniques have been studied at KDE lab: outlier detection, association and ratio rule mining, information extraction from documents, time-series document clustering, topic detection, and mobility histogram construction for mobile objects.

The paper below proposes a way to measure the freshness of a web page. Freshness is not an easy criterion to evaluate because it depends on the page's content. Defining freshness as "whether or not a page has been recently bookmarked" is not enough, because the lifetime of freshness varies: news pages stay fresh for a short time, while manuals and reference pages stay fresh much longer. The method uses social bookmarks (especially the spread of bookmark timestamps) to extract up-to-date pages from the huge amount of content available on the internet.

A Ranking Method for Web Search Using Social Bookmarks
Tsubasa Takahashi and Hiroyuki Kitagawa
Proceedings of the International Conference on Database Systems for Advanced Applications (DASFAA 2009), April 2009

New needs for topic detection have emerged recently. Video-sharing services host content that is difficult to categorize automatically (without any human intervention). The paper below introduces a system that extracts topics from a set of videos by making use of time data and the diversity of authors.

Topic-Based Awareness Computing Model for Video-Sharing Service
Mariko Kamie, Takako Hashimoto and Hiroyuki Kitagawa
Second International Symposium on Aware Computing (ISAC 2010), November 2010

Scientific Databases

KDE lab. created and manages the JRA-25 Archive (more information in [JRA25]). This meteorological database is designed to store a long-term analysis of global weather data provided by the Japan Meteorological Agency (JMA). The database currently contains 25 years of data (about 700 GB as of August 2007). Web services have been implemented on top of this database to provide reanalysis maps over the internet (e.g. for Google Earth).

Other research on scientific databases involves satellite DEM (Digital Elevation Model) images. The paper below provides a method for matching changes between two DEM images of the same area with information found in Web content. For example, if a new shopping center is built, news reports will mention it on the internet and the elevation model will change at the same time. By matching those two events, the location of the new building can be linked to the article that gives further information about it. A prototype was evaluated on Tsukuba city and produced good results for real buildings.

Providing Constructed Buildings Information by ASTER Satellite DEM Images and Web Contents
Takashi Takagi, Hideyuki Kawashima, Toshiyuki Amagasa and Hiroyuki Kitagawa
Proceedings of Data Intensive eScience Workshop (DIEW 2010) (DASFAA2010 Workshop), April 2010
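
The first half of the approach described above, detecting where the elevation model has changed between two acquisitions, can be sketched as a simple grid difference. This is an illustrative assumption about how such a step might look, not the method of the paper: the function changed_cells, the 10 m threshold and the tiny sample grids are all hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]  # elevation in meters, one value per DEM cell


def changed_cells(before: Grid, after: Grid,
                  threshold_m: float) -> List[Tuple[int, int, float]]:
    """Return (row, col, rise) for cells whose elevation rose by threshold_m or more."""
    changes: List[Tuple[int, int, float]] = []
    for r, (row_before, row_after) in enumerate(zip(before, after)):
        for c, (h_before, h_after) in enumerate(zip(row_before, row_after)):
            rise = h_after - h_before
            if rise >= threshold_m:
                changes.append((r, c, rise))
    return changes


# Hypothetical 2 x 3 grids of the same area at two dates.
dem_before: Grid = [[25.0, 25.0, 26.0],
                    [25.0, 25.0, 25.0]]
dem_after: Grid = [[25.0, 40.0, 26.0],
                   [25.0, 41.0, 25.5]]

candidates = changed_cells(dem_before, dem_after, threshold_m=10.0)
```

Each candidate cell would then have to be converted to geographic coordinates and matched against locations mentioned in web articles, a step this sketch does not attempt.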

Collaboration with CCS as Computational Intelligence group

The Center for Computational Sciences (CCS) is a framework for cooperative research in computational science that involves several laboratories in different fields.
According to [T-CCS2], "Computational science has shifted the paradigm of scientific research to include simulation as a fundamental method of science, along with experiment and theory". CCS was created in 2004 as an inter-university research facility. Its main mission is to carry out large-scale simulations and data analyses in the following domains: fundamental science (particle physics and astrophysics), material science, and life and environmental science. CCS brings together 32 professors and associate or assistant professors from different graduate schools.

The main assets of CCS are its high-performance computing systems and high-speed network infrastructure. The center has been designing massively parallel computer clusters for decades (since 1977). The PACS-CS system, in operation since 2006, comprises 2,560 nodes connected by 20,480 Gigabit Ethernet cables. With 10.35 Tflops, this system was ranked 34th (as of June 2006) in the well-known [TOP500] list of the most powerful known computer systems in the world. The system's network, named Hypercrossbar, is an original 3-dimensional interconnection network among the computation nodes.

KDE lab. tries to make its research achievements as practical as possible through cooperation with other groups. The Computational Media group and the Global Environmental Science group also belong to CCS. The design of the meteorological database is one result of this cooperation.

Last update: September 2010