Research topic and schedule

Vincent Jordan (KDE lab.)

project logo

Research topic

Before arriving in Japan, my research topic was defined only as follows: XML query processing using GPGPU. In order to introduce this topic clearly, terms and acronyms will be defined first, then the core idea will be explained.

Definitions

eXtensible Markup Language (XML) is a human- and machine-readable, tree-based markup language widely used in computing for the transmission and storage of data. This simplification of the SGML language [W-XML] was designed to ease and spread the use of an interoperable language over the Internet. That aim has been reached: XML is now used by a huge number of applications, and most programming languages feature an XML parser. Strictly speaking, XML is a meta-markup language, used for creating other markup languages; XML tags describe the contents of a document.
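As an illustration (the document and element names below are made up), a small XML tree can be parsed with Python's standard library:

```python
import xml.etree.ElementTree as ET

# A small, made-up XML document: the tags describe the contents,
# and the nesting forms a tree readable by both humans and machines.
doc = """
<library>
  <book lang="en"><title>Dune</title></book>
  <book lang="fr"><title>Germinal</title></book>
</library>
"""

root = ET.fromstring(doc)
titles = [book.find("title").text for book in root.findall("book")]
print(titles)  # ['Dune', 'Germinal']
```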

The XML Path language (XPath) was created to address parts of XML documents. XPath is required by both the XPointer and XSL Transformations standards. As a consequence of broad XML usage, the need to locate specific data in XML documents grew and revealed a trade-off: keep the data in the convenient XML format with slow querying, or convert it into a more efficient database format with faster querying?
The bigger the amount of XML data, the more time-consuming the conversion option becomes. My research is about finding a way to speed up XPath queries over big XML documents.
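As a small illustration of what an XPath query looks like (the catalog below is invented; Python's ElementTree supports only a subset of XPath):

```python
import xml.etree.ElementTree as ET

# A made-up document to query.
root = ET.fromstring("""
<catalog>
  <book year="1965"><title>Dune</title></book>
  <book year="1885"><title>Germinal</title></book>
</catalog>
""")

# Location path: the <title> of every <book> child of the root.
all_titles = [t.text for t in root.findall("./book/title")]

# Predicate on an attribute: only the book published in 1885.
old_titles = [t.text for t in root.findall("./book[@year='1885']/title")]

print(all_titles)  # ['Dune', 'Germinal']
print(old_titles)  # ['Germinal']
```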

A Graphics Processing Unit (GPU) is a processor architecture designed from the beginning for efficient, massively parallel execution. As graphics rendering grew more complex, these chips gained a more general-purpose design. The acronym GPGPU refers to these new GPU architectures and stands for General-Purpose GPU.
The strategy of Professor Amagasa (who suggested this idea) is to execute on the GPU several instances of the same query processing algorithm, each addressing a different data partition. This is data parallelism (as opposed to task parallelism, where different algorithms run concurrently). Although this hardware is called "general purpose", several limitations have to be overcome.
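A minimal sketch of this data-parallel strategy in plain Python (the data, partition size, and predicate are invented for illustration; on a real GPU each partition would be handled by its own group of threads):

```python
def count_matches(partition, predicate):
    """The 'query processing algorithm': identical for every partition."""
    return sum(1 for item in partition if predicate(item))

data = list(range(100))
# Split the data into four partitions of 25 elements each.
partitions = [data[i:i + 25] for i in range(0, len(data), 25)]

# Data parallelism: the SAME algorithm runs on DIFFERENT partitions.
partial = [count_matches(p, lambda x: x % 3 == 0) for p in partitions]
total = sum(partial)
print(total)  # 34 multiples of 3 in 0..99
```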

"Stream processing is a computer programming paradigm" [W-STRPROC]. This paradigm was invented to create programs that use a limited form of parallel execution. A stream is a set of data and the stream processing consists in applying one or more hardware instructions to each element of the stream. That kind of parallel execution is limited since there is no control on the way this process is done (synchronization, communication, ...). The series of operations is the same for each data therefore it cannot use any conditional branching which depends on the data value.

Stream S0: 4 18 12 5
Operations:
	m = 10
	if S0[i] > m
		S1[i] += 2
	else
		S1[i] = 0
Stream S1: 0 20 14 0
This is NOT stream processing: the branch depends on the data value S0[i].

Stream S0: 4 18 12 5
Operations:
	m = 10
	if m > 5
		S1[i] += 2
	else
		S1[i] = 0
Stream S1: 6 20 14 7
This is stream processing: the branch does not depend on any data value.
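The two listings above can be sketched in plain Python (assuming, as the outputs suggest, that S1 starts as a copy of S0):

```python
def not_stream_processing(s0, m=10):
    """Branch depends on the data value S0[i]: NOT stream processing."""
    s1 = list(s0)
    for i in range(len(s0)):
        if s0[i] > m:      # data-dependent branch
            s1[i] += 2
        else:
            s1[i] = 0
    return s1

def stream_processing(s0, m=10):
    """Branch depends only on the constant m, so the very same
    operation is applied to every element: stream processing."""
    op = (lambda x: x + 2) if m > 5 else (lambda x: 0)
    return [op(x) for x in s0]  # one uniform operation per element

S0 = [4, 18, 12, 5]
print(not_stream_processing(S0))  # [0, 20, 14, 0]
print(stream_processing(S0))      # [6, 20, 14, 7]
```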

Vector processing is a computer programming paradigm similar to stream processing. The major difference between the vector and stream paradigms is the memory access pattern: a vector processing operator involves a memory read, the execution of an instruction over several data elements, and finally a memory write. Accessing memory for each single instruction leads to high bandwidth use.
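A simplified sketch of this difference in plain Python (list comprehensions stand in for hardware instructions; the values are invented):

```python
a = [1.0, 2.0, 3.0, 4.0]

# Vector style: one full read/write pass over memory PER instruction,
# so the intermediate result tmp is written to and read from memory.
tmp = [x * 2.0 for x in a]            # instruction 1: read a, write tmp
out_vector = [x + 1.0 for x in tmp]   # instruction 2: read tmp, write out

# Stream style: the whole operation chain is applied to each element,
# one read and one write per element; no intermediate memory traffic.
out_stream = [x * 2.0 + 1.0 for x in a]

print(out_vector == out_stream)  # True: same result, different traffic
```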

Idea

The practical target of this research is to evaluate the opportunity of using the GPU as an XML query coprocessor for programs dealing with large amounts of XML data. Relative to their parallel processing power, GPUs are cheaper than most chips dedicated to parallel execution (such as DSPs).
Making good use of GPU hardware is a great challenge, because it is still very much oriented toward graphics processing.

Research on parallel XML query processing had already been done in the same laboratory by Imam Machdi. My main work was to create a GPU algorithm based on the results of his Ph.D. thesis. This task is challenging since current XML processing algorithms do not fit the stream processing paradigm recommended for GPGPU computation.
Two solutions could solve this issue:
  1. overcome GPU limitations in order to do more than stream processing
  2. create a new algorithm that is "stream processing compliant"
During this internship, the first solution was studied. This choice was made because of the tight schedule: creating a whole new algorithm would have been risky and might have led to "no result found" at the end of my internship.

What is the practical purpose of this research?
My long-term plan is to build a hybrid web server which makes use of both the CPU and the GPU. This solution might be a cheaper alternative to the multiprocessor configurations found nowadays. The main advantage of the hybrid solution would be its scalability, since current motherboards can host from zero up to four discrete graphics cards (known as 4-way SLI for NVIDIA, 4-way CrossFire for AMD). At a low request rate, the CPU would handle the whole process, and GPU devices would be added progressively as the rate increases.

The GPU webserver
The GPU-CPU hybrid webserver

Schedule

The planned schedule (shown in orange) was divided into one task per month. This expected schedule was flexible and no deadlines were set. It was not provided by my professor; I created it myself in order to have a goal to reach.

Internship schedule at KDE laboratory
Schedule of my 6 months internship at KDE laboratory

After a week of adaptation to my new life in Japan, I began by reading several research papers about XML query processing. Professor Amagasa suggested these three papers, which are among the most relevant on XML query processing, after evaluating my knowledge of XML; they were intended to fill the gaps in my knowledge of that field. After this first step, I was able to understand more easily the section of the Ph.D. dissertation that I was expected to use in my implementation. I then worked on documentation and implementation in parallel, since implementing helps to understand the documentation.

First, a simple implementation was done: the same algorithm as the one used by Imam Machdi was implemented from the original research paper, in order to deeply understand the execution requirements. Following the discovery of missing required features in the targeted NVIDIA toolkit, a data structure library was designed and implemented for this framework. Meanwhile, the original and GPU versions of the algorithm were upgraded to make use of this new library, the aim being to compare the new solution with the previous one. Unfortunately, I did not manage to fix some execution errors in the GPU version, despite the several strategies I tried.

Comparing the planned and actual schedules, it is clear that I was on schedule until the end of June, but from July to August I was not able to do what was planned (benchmarks and a research paper). Instead, I tried to fix my implementation of the parallel TwigStack algorithm on the GPU. For example, some parts of the project were ported to Windows in order to use another debugger (only released in June 2010).

Japanese lesson

Because of its large community of foreign students, the international student center of Tsukuba University has substantial staff and facilities. The center offers a complete set of Japanese language courses for international students of all levels. According to [T-INTERSC], there are nine lecture levels (from J100 to J900) and six kanji levels (from K200 to K700). Japanese courses are not mandatory for short-term exchange research students like me, and they can only be attended with the academic adviser's permission. Professor Kitagawa approved my request for a Japanese course even though it was not useful for my short internship.
At the UTBM, I attended the "Japanese for real beginners" course (LJ00), but lessons were given using rōmaji transliteration. The Japanese writing system is composed of three different scripts: Chinese-based ideographs (kanji) and two syllabaries (hiragana and katakana). Since the Japanese level tests at Tsukuba University are not transliterated into rōmaji, I was not even able to use my little initial knowledge of Japanese and got a grade close to zero. I was therefore placed in J100; fortunately, this turned out to be the right course, since, despite its name, J100 is too fast for real beginners in my opinion.

Japanese lessons schedule
Japanese lessons schedule

J100 is divided into six lessons. Classes take place every morning (8:40-10:00) from Monday to Friday and are free of charge. Students only have to purchase three books: the J100 drill booklet, the "SFJ" book, and the "Waku waku" book. The first contains the structure drills, model conversations, and conversation drills of the six lessons shown in the schedule above. The second contains grammar notes that go beyond the scope of J100. The third contains listening exercises.

Drill book
Drill book
Situational Functional Japanese book
Situational Functional Japanese book
Waku waku book
Waku waku book
Last update: September 2010