ChemBase: a computer information system with chemical data for Chinese medicine

Tsz-Pun Chan, Foo-Tim Chau*, Kwok-Leung Cheung, Chun-Hung Suen
(Union Laboratory of Asymmetric Synthesis and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, China)

Received Apr. 16, 1999; supported by the Research Committee of the Hong Kong Polytechnic University (Grant Nos. A020 and G-V422).

Abstract
A multimedia software package, ChemBase, has been developed and implemented for collecting and retrieving information on Chinese medicine. ChemBase consists of two subprograms, CMBASE and CHIMED. The database CMBASE contains general information on Chinese medicine as well as chemical analysis data, which are not available in most software of this kind. New data can be added and existing records revised. CHIMED is a user interface for data processing that makes ChemBase easy to use. In addition, a World Wide Web site has been set up on the Internet for the public to access information about traditional Chinese medicine (TCM).
Keywords Chinese medicines, Database, Multimedia

1. INTRODUCTION
With advances in microelectronics, multimedia databases have become increasingly popular in many applications. Both the private and public sectors develop their own databases to record different kinds of information. Several gigabytes of data can be archived on a hard disk, replacing thousands of data files. Multimedia databases have recently been developed in chemistry, for example, for the periodic table [1], the molecular properties of inorganic molecules [2] and Chemical Abstracts [3]. However, their number is still small compared with those in the commercial sector. Many chemical databases, for instance the Chemical Registry System (CAS) [3], are accessed over the Internet. To use them, one has to register with the site, and the fee charged depends on the login time. Transfer rates between sites are improving but are still slow for graphic data. These databases are also difficult to expand because many of them are either on-line or distributed on CD-ROM [4], and users are not allowed to add, update or delete data as they like.
In this paper, a software package, ChemBase, is reported. It provides a multimedia database, CMBASE, and a user-friendly interface, CHIMED, for data processing. In this study, the data stored in CMBASE relate to general and chemical information on traditional Chinese medicine (TCM), which is not available in most software packages of this kind. TCM has a long and well-documented therapeutic history, and about 3,500 TCMs are in common use today [5]. We hope this work will promote the development of multimedia databases in chemistry and the application of chemical information to TCM. Details of the methodology and software are described in the following sections.

2. DESCRIPTION OF CHEMBASE
ChemBase is a multimedia database software package with a user-friendly interface. Chinese medicine (CM) was used as an example in this investigation to demonstrate how it works. At this stage, ChemBase contains forty-five poisonous or commonly used CMs, listed in Table 1: twenty-four plants, nine animals and twelve minerals. Thirty-one of them were classified as toxic or potent herbs by the Preparatory Committee on Chinese Medicine in Hong Kong. Text, graphic and audio data were used to set up the database. As most references and information are presented in Chinese, the database and the user interface were developed in both Chinese and English. ChemBase displays the data of each CM and provides several search functions to meet different needs: a CM can be found by its name, alias, class or nature, or even by its X-ray spectrum or TLC chromatogram. Users can also add comments to a CM of interest. Structurally, ChemBase consists of two subprograms: CMBASE, a database, and CHIMED, a user interface for processing the data in CMBASE.

Fig. 1 Structure of CMBASE.

Table 1. Data files of the Chinese medicines under study

Type      Name               Serial_No.  Picture file(s)   Spectrum file(s)
Plants    Chuanwu            1           00001(a,b,c,d)    00001a
          Daji               2           00002(a,b)        00002a
          Shandougen         3           00003(a,b,c)      00003a
          Tianma             4           00004(a,b,c)      00004a
          Tiannanxing        5           00005(a,b,c)      00005a
          Shuibanxia         6           00006(a,b,c)      00006a
          Banxia             7           00007(a,b,c,d)    00007a
          Ganshui            8           00008(a,b,c)      00008a
          Guijiu             9           00009(a,b)        00009a
          Renshen            10          00010(a,b,c)      00010a
          Caowu              11          00011(a,b)        00011a
          Fuzi               12          00012(a,b)        00012a
          Baifuzi            13          00013(a,b,c,d)    00013a
          Maqianzi           14          00014(a,b,c,d)    00014a
          Baodou             15          00015(a,b,c)      00015a
          Langdu             16          00016(a,b)        00016a
          Guangdonglangdu    17          00017(a,b)        00017(a,b)
          Tenghuang          18          00018a            00018a
          Tianxianzi         19          00019(a,b)        00019a
          Qianjinzi          20          00020a            00020a
          Naoyanghua         21          00021(a,b)        00021a
          Xueshangyizhihao   22          00022a            00022a
          Yangjinhua         23          00023(a,b)        00023a
          Kulianpi           24          00024(a,b)        00024a
Animals   Tubiechong         25          00025a            00025a
          Shuizhi            26          00026a            00026a
          Quanxie            27          00027a            00027a
          Mangchong          28          00028a            00028a
          Hongniangzi        29          00029a            00029a
          Banmao             30          00030a            00030a
          Wugong             31          00031a            00031a
          Chansu             32          00032a            00032a
          Qingniangzi        33          00033a            00033a
Minerals  Mangxiao           34          00034a            00034a
          Zhusha             35          00035a            00035a
          Liuhuang           36          00036a            00036a
          Xionghuang         37          00037a            00037a
          Qingfen            38          00038a            00038a
          Danfan             39          00039a            00039a
          Peishi             40          00040a            00040(a,b)
          Peishuang          41          00041a            00041(a,b)
          Cihuang            42          00042a            00042a
          Shuiyin            43          00043a            00043(a,b)
          Hongfen            44          00044(a,b)        00044a
          Baijiangdan        45          00045a            00045a

Picture and spectrum files are stored under the directories "Picture" and "Spectrum", respectively. Letters in parentheses denote several files; for example, 00001(a,b,c,d) stands for 00001a, 00001b, 00001c and 00001d.

The database CMBASE was developed with Access Ver. 7.0 under Microsoft Windows 95 [6]. Access was chosen because it is relatively low-cost and communicates easily with other programs for IBM-compatible PCs, such as Visual Basic. Chemistry database packages such as ISIS/Base and CS/find were not adopted, mainly because they are expensive and less widely used than Microsoft software. The structure of CMBASE is depicted in Figure 1. It contains text, graphic and audio data. Four tables, Herb1, Herb2, Eherb1 and Eherb2, hold the text information in their respective fields: Herb1 and Herb2 are in Chinese, and the corresponding tables Eherb1 and Eherb2 are in English. Each item of CM data, whether general or chemical, is archived in one of the four tables (see Figure 1).
The common field “Serial_No.”, the record number of each CM, is used in all these tables to link the different types of data, so that all the text data in Herb1, Eherb1, Herb2 and Eherb2 are related to one another. Graphic data are linked mainly through the graphic filenames and “Serial_No.”: each picture or spectrum was saved in BMP, JPG or TIF format under the directory “Picture” or “Spectrum” (see Table 1).
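The record-linking scheme can be illustrated with a small sketch. It assumes SQLite (Python's built-in sqlite3 module) in place of Access 7.0. Only the four table names and the shared key come from the text; the other column names and the way fields are split between tables are hypothetical, and the key is spelled Serial_No without the trailing dot to avoid quoted identifiers.

```python
import sqlite3

# Sketch only: SQLite stands in for Access 7.0. Table names and the shared
# key follow the text; the other columns and their split across tables are
# hypothetical, and the key is written Serial_No (no trailing dot).
SCHEMA = {
    "Herb1": "Serial_No INTEGER PRIMARY KEY, Name TEXT, Class TEXT",   # Chinese
    "Herb2": "Serial_No INTEGER PRIMARY KEY, Toxicity TEXT",           # Chinese
    "Eherb1": "Serial_No INTEGER PRIMARY KEY, Name TEXT, Class TEXT",  # English
    "Eherb2": "Serial_No INTEGER PRIMARY KEY, Toxicity TEXT",          # English
}


def build_cmbase() -> sqlite3.Connection:
    """Create the four text tables and store record 1 (Chuanwu, Table 1)."""
    conn = sqlite3.connect(":memory:")
    for table, columns in SCHEMA.items():
        conn.execute(f"CREATE TABLE {table} ({columns})")
    conn.execute("INSERT INTO Herb1 VALUES (1, '川乌', '植物')")
    conn.execute("INSERT INTO Eherb1 VALUES (1, 'Chuanwu', 'Plants')")
    # Placeholder text, not real toxicity data.
    conn.execute("INSERT INTO Herb2 VALUES (1, 'placeholder')")
    conn.execute("INSERT INTO Eherb2 VALUES (1, 'placeholder')")
    return conn


def lookup(conn: sqlite3.Connection, serial_no: int):
    """Gather one CM's text data from several tables via the shared key."""
    return conn.execute(
        "SELECT e1.Serial_No, h1.Name, e1.Name, e1.Class, e2.Toxicity "
        "FROM Eherb1 AS e1 "
        "JOIN Herb1 AS h1 ON h1.Serial_No = e1.Serial_No "
        "JOIN Eherb2 AS e2 ON e2.Serial_No = e1.Serial_No "
        "WHERE e1.Serial_No = ?",
        (serial_no,),
    ).fetchone()
```

Here `lookup(build_cmbase(), 1)` returns `(1, '川乌', 'Chuanwu', 'Plants', 'placeholder')`, and an unknown record number returns `None`. Because every table shares the same key, a record's Chinese and English text can be assembled with plain joins rather than by duplicating data.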

Each filename consists of five digits followed by a one-letter code; the digits give the “Serial_No.” padded with leading zeros, so for the present records only the last two digits vary. For example, the file “00001a” under the directory “Picture” (Table 1) contains the first picture, indicated by the letter “a”, of the CM that is the first record in CMBASE, i.e. the one whose “Serial_No.” is one. As shown in Table 1, four pictures are available for the first CM, “Chuanwu”; the second, third and fourth are stored as “00001b”, “00001c” and “00001d”, respectively. The graphic data of CMBASE comprise pictures of each medicine's appearance before and after processing [7], thin-layer chromatographic (TLC) patterns [7], and X-ray diffraction [8] and differential scanning calorimetry (DSC) spectra [8]. The pictures and related graphics were acquired with an HP ScanJet II [9] at 100 dpi. Audio files are named in the same way as the graphic files. They contain the pronunciation of each medicine's name in Chinese and English, recorded digitally through a microphone with a SoundBlaster-compatible sound card [10] and saved in Wave format.
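The naming rule can be expressed as a few small helpers. The function names are hypothetical, and the sketch assumes at most 26 items per record, one per lowercase letter; `expand_table_entry` also unpacks the shorthand used in Table 1, where letters in parentheses denote several files.

```python
import re

# Hypothetical helpers for the naming rule: the Serial_No. padded to five
# digits, followed by one lowercase letter per picture, spectrum or sound file.
_ENTRY = re.compile(r"(\d{5})(?:\(([a-z](?:,[a-z])*)\)|([a-z]))")
_NAME = re.compile(r"(\d{5})([a-z])")


def media_filename(serial_no: int, index: int) -> str:
    """Return the stem for the index-th item (1 -> 'a') of a record."""
    if not 1 <= serial_no <= 99999:
        raise ValueError("Serial_No. must fit in five digits")
    if not 1 <= index <= 26:
        raise ValueError("only 26 one-letter codes are available")
    return f"{serial_no:05d}{chr(ord('a') + index - 1)}"


def parse_filename(name: str) -> tuple[int, int]:
    """Recover (Serial_No., item index) from e.g. '00001c' or '00001c.bmp'."""
    stem = name.split(".", 1)[0]  # drop a BMP/JPG/TIF/WAV extension, if any
    match = _NAME.fullmatch(stem)
    if match is None:
        raise ValueError(f"not a ChemBase-style filename: {name!r}")
    return int(match.group(1)), ord(match.group(2)) - ord("a") + 1


def expand_table_entry(entry: str) -> list[str]:
    """Expand Table 1 shorthand such as '00001(a,b,c,d)' into file stems."""
    match = _ENTRY.fullmatch(entry)
    if match is None:
        raise ValueError(f"unrecognised Table 1 entry: {entry!r}")
    stem, group, single = match.groups()
    letters = group.split(",") if group else [single]
    return [stem + letter for letter in letters]
```

The two directions are consistent: `parse_filename(media_filename(17, 2))` gives `(17, 2)`, and `expand_table_entry("00001(a,b,c,d)")` lists the four Chuanwu pictures.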
The user interface CHIMED was devised to display the information archived in CMBASE. The screen output of the main menu is shown in Figure 2.

Fig. 2 The main menu provided by CHIMED.

With this user interface, the basic text information and the appearance picture of a CM can be displayed. The main menu gives fast keyboard access to most CHIMED functions. The user may also select the kind of information required, for example, the toxicity of a particular CM, and view the previous or next record with the arrow buttons. Most icons are labeled with text explaining their functions to minimize the chance of misunderstanding.

For searching, a CM can be located through seven items: name, alias, classification, symptom, nature, picture or spectrum. This should meet the needs of most users and professionals. For instance, to search by name, one can type the name of the medicine or select it from a list; CHIMED then searches the “name” field of CMBASE and shows the information of the matching medicine. If a class is chosen, the “class” field is searched and all CMs of that class are listed. For a search by nature, a list of medicinal natures, such as warming the body, alleviating pain and increasing circulation, is shown; after the desired items are selected, the related medicines are displayed together with their information. Searches via appearance pictures, X-ray spectra or chromatograms are carried out mainly in the browsing mode of CHIMED: several graphics are displayed on the screen, the user selects the desired picture or spectrum, and the information about the corresponding CM is then shown. Besides browsing, one may add new records to CMBASE using the data format adopted by ChemBase.

CHIMED was developed in Visual Basic™ (VB) 6.0 [11], and all the data in CMBASE are presented through it. Its search facility was implemented with SQL (Structured Query Language) [12], a non-procedural, English-like query language based on relational calculus. ChemBase runs under Microsoft Windows 95 on a P5 or higher IBM-compatible PC.
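A field search of the kind described above can be sketched with parameterized SQL. This is not the VB 6.0 code of CHIMED: it assumes SQLite and a hypothetical English table with columns Name, Alias, Class and Nature, populated only with names and classes taken from Table 1. Because column names cannot be passed as SQL parameters, the searchable fields are restricted to a fixed whitelist, while the user's input is always bound as a parameter.

```python
import sqlite3

# Sketch of CHIMED-style field searches (not the original VB 6.0 code).
# SQLite stands in for Access; the Eherb1 columns are hypothetical, and
# only the names and classes in the sample rows come from Table 1.
SEARCHABLE = {"name": "Name", "alias": "Alias", "class": "Class", "nature": "Nature"}


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Eherb1 (Serial_No INTEGER PRIMARY KEY, Name TEXT, "
        "Alias TEXT, Class TEXT, Nature TEXT)"
    )
    rows = [(1, "Chuanwu", "Plants"), (10, "Renshen", "Plants"),
            (28, "Mangchong", "Animals"), (35, "Zhusha", "Minerals")]
    # Alias and Nature are left NULL because the source gives no values.
    conn.executemany(
        "INSERT INTO Eherb1 (Serial_No, Name, Class) VALUES (?, ?, ?)", rows
    )
    return conn


def search(conn: sqlite3.Connection, field: str, value: str) -> list[tuple[int, str]]:
    """Return (Serial_No, Name) of every record whose field equals value."""
    column = SEARCHABLE.get(field)
    if column is None:
        raise ValueError(f"unsupported search field: {field!r}")
    # The column name comes from the whitelist; the user's input is bound
    # as a parameter, never pasted into the SQL text.
    sql = f"SELECT Serial_No, Name FROM Eherb1 WHERE {column} = ? ORDER BY Serial_No"
    return conn.execute(sql, (value,)).fetchall()
```

For example, `search(build_demo_db(), "class", "Plants")` returns `[(1, 'Chuanwu'), (10, 'Renshen')]`, mirroring the class search that lists all CMs of the chosen class.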

3. RESULTS AND DISCUSSION
A multimedia software package, ChemBase, was developed and implemented in this work to store medical and chemical information on Chinese medicine. Appearance pictures and chemical information in the form of TLC and DSC patterns and IR spectra are included as graphic data, and text and audio data can also be stored. ChemBase has several advantages over existing databases of traditional Chinese medicines in medicinal chemistry. Firstly, a CM can be searched for not only by its name or formula but also by its properties or its functions against a particular disease, and several search methods are provided to satisfy the needs of different users. Secondly, it is more flexible: users can add or revise records in CMBASE by following the prescribed formats, for example, for picture dimensions. Remarks can also be added to any CM, a useful facility made possible because CMBASE is an expandable database. Although only forty-five CMs are currently included in CMBASE, it can be expanded to hundreds or thousands of CMs by storing the additional data on the hard disk. This work only introduces the functions that ChemBase provides, with Chinese medicine as an example application. We hope it will increase interest in Chinese medicine, as more and more people find CM an effective and valuable health care alternative. Hence, besides developing ChemBase, our group has established a homepage for searching information about traditional Chinese medicine on the Internet through various media (URL: http://fg702-6.abct.polyu.edu.hk/tcm.html). The site can reduce the time spent searching for the right material and helps users obtain the desired information through a user-friendly interface.
The present version of ChemBase has limitations. For example, although the database is expandable, the number of fields and the field names are fixed: data can only be entered under the given field names, and users cannot add a new field or rename an existing one in CMBASE. Future versions of ChemBase should provide a more flexible environment for the user.
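The fixed-field limitation can be made concrete with a sketch of record entry that checks a new record against the columns already defined. It assumes SQLite, whose `PRAGMA table_info` lists a table's columns; the helper names and the sample record values are hypothetical.

```python
import sqlite3

# Sketch of record entry under a fixed set of fields (SQLite in place of
# Access). Helper names and sample values are hypothetical; the table name
# is assumed to come from the program itself, not from user input.


def existing_fields(conn: sqlite3.Connection, table: str) -> list[str]:
    # PRAGMA table_info yields one row per column; item 1 is the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def add_record(conn: sqlite3.Connection, table: str, record: dict) -> None:
    """Insert a record, rejecting any field the table does not already have."""
    if not record:
        raise ValueError("empty record")
    unknown = set(record) - set(existing_fields(conn, table))
    if unknown:
        raise ValueError(f"fields not defined in {table}: {sorted(unknown)}")
    columns = ", ".join(record)
    placeholders = ", ".join("?" for _ in record)
    conn.execute(
        f"INSERT INTO {table} ({columns}) VALUES ({placeholders})",
        tuple(record.values()),
    )
    conn.commit()
```

A record using only the defined fields is stored, while one containing an extra field such as a hypothetical `Origin` is rejected. Supporting new fields would require a schema change (for example, SQL's `ALTER TABLE ... ADD COLUMN`), which the present version does not expose to the user.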

4. CONCLUSION
A multimedia software package, ChemBase, has been developed for collecting, searching and reorganizing existing information on Chinese medicine. Besides TCM, ChemBase can be applied to other chemical or related disciplines simply by changing the filename of CMBASE listed in Figure 1. New text, graphic and audio data may be entered through CHIMED by following the formats required for data archiving, such as picture dimensions and colour depth. In addition, ChemBase allows information on Chinese medicine to be entered in Chinese or English. Therefore, it is suitable for different disciplines.

5. ACKNOWLEDGMENTS
Thanks are due to Professor Peishan Xie of the Guangzhou Institute for Drug Control, Guangzhou, for providing the picture shown in Figure 2.