Shao Xueguang, Yu Zhengliang
(Department of Chemistry, University of Science and Technology of China, Hefei, Anhui, 230026, China)
Abstract Based on wavelet compression and the immune algorithm (IA), a novel algorithm for fast resolution of two-dimensional multicomponent overlapping chromatograms is proposed. Because the wavelet transform (WT) is linear, the overlapping chromatogram (antigen) can be compressed by WT before it is input into the immune network, and the standard chromatogram of each component (antibodies) is compressed by the same scheme. To speed up the computation further, each two-dimensional data matrix is arranged into a one-dimensional vector before it is compressed. After the compressed information of each component has been extracted by the IA, the chromatogram is reconstructed by the inverse WT and rearranged back into matrix form. The result is almost the same as that obtained from the IA alone, but the calculation is much faster. Satisfactory quantitative results are also obtained.
 INTRODUCTION
With the development of modern chemical instrumentation, multicomponent two-dimensional data matrices can be easily obtained. To resolve multicomponent overlapping matrices, several methods have been proposed, such as chemical factor analysis (CFA),^{[1,2]} the wavelet transform (WT)^{[3-5]} and the immune algorithm (IA).^{[6-8]} Our previous works^{[6-10]} showed that the IA is an efficient tool for the resolution of overlapping multicomponent analytical signals. A multicomponent overlapping chromatogram can be easily resolved by an IA, and the calculation is faster than the conventional least-squares method.^{[7]} However, when the data set is large and there are parameters that need to be optimized, the computation takes too long to be feasible in practice, e.g., in the GAIA^{[10]} method, because the IA procedure is invoked repeatedly. One useful way to speed up the calculation is to compress the raw experimental data. Many efficient tools exist for analytical data compression, such as the binary coding method, the Adams and Black algorithm, the Fourier transform, chemical factor analysis, and the wavelet transform.^{[11-14]}
In this paper, the matrices of both the standard chromatograms (antibodies) and the multicomponent overlapping chromatogram (antigen) are first converted into one-dimensional vectors and compressed by the wavelet transform, and the resolution is then performed with an IA. The conversion and compression were found to improve the calculation speed, so the approach can serve as a fast preprocessing tool for the GAIA method.
 ALGORITHM
The principle and application of the IA have been reported in our previous works.^{[6-10]} The essence of an IA is that, taking the signal of the multicomponent mixture as the antigen and the signals of the standard samples as antibodies, the information of each component in the multicomponent overlapping signal can be extracted by a process of recognition, iterative elimination, etc. The calculation process of an IA can be simply described by the following formulae:
(1)
(2)
(3)
where T is the weight of the input layer, k is the iteration number, V is the overlapping chromatographic signal (antigen), V_{0i} is the standard chromatographic signal of the ith component with known concentration (antibodies), c_{i} is the relative concentration of the ith component, and V_{F} is the feedback vector or matrix denoting the eliminated antigen. When dc^{(k)} approaches zero, V_{F} holds the information of each component in the overlapping chromatographic signal. In many cases, because of variation caused by experimental reproducibility etc., parameters of V_{0i} such as the position and the shape of the peaks may need to be optimized. The optimization is a time-consuming procedure when the number of data points in V and V_{0i} is large. Therefore, an efficient way to compress V and V_{0i} is necessary for speeding up the algorithm.
The wavelet transform has been proven to be an efficient technique for analytical data compression.^{[15,16]} Its dual localization in both the frequency and time domains, its linearity, and the existence of a fast algorithm make the WT an ideal candidate for preprocessing data for the IA. In this paper, the multiresolution signal decomposition (MRSD) algorithm^{[17,18]} is used.
Based on the IA and WT compression, a fast algorithm for the resolution of multicomponent overlapping chromatograms is proposed. The flowchart is shown in Figure 1 and comprises the following steps:
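The linearity mentioned above is what allows the antigen and the antibodies to be compressed separately: the transform of a weighted sum of standards equals the weighted sum of their transforms. The following is a minimal sketch of that property, assuming a plain Haar transform written in pure Python as a stand-in for the MRSD algorithm; the function `haar_forward`, the toy signals and the weights `c1` and `c2` are illustrative and not taken from the paper.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(x, levels):
    """Multilevel orthonormal Haar transform, laid out as [a_J, d_J, ..., d_1]."""
    approx, details = list(x), []
    for _ in range(levels):
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        approx = [(p + q) / SQRT2 for p, q in pairs]
        details = [(p - q) / SQRT2 for p, q in pairs] + details
    return approx + details

# Toy standard signals (antibodies) and relative concentrations: made-up numbers.
v01 = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0]
v02 = [0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0]
c1, c2 = 0.8, 1.5

# Overlapping signal (antigen) built as a linear combination of the standards.
mixture = [c1 * a + c2 * b for a, b in zip(v01, v02)]

# Transform of the mixture versus the same combination of transformed standards.
w_mix = haar_forward(mixture, 3)
w_sum = [c1 * a + c2 * b
         for a, b in zip(haar_forward(v01, 3), haar_forward(v02, 3))]

max_diff = max(abs(p - q) for p, q in zip(w_mix, w_sum))
print("largest difference between the two coefficient vectors:", max_diff)
```

Because the two coefficient vectors agree up to rounding, the relative concentrations that describe the mixture in the signal domain also describe it in the wavelet domain, which is the premise for resolving the compressed coefficients directly.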
Fig.1 The flowchart of the proposed algorithm
(1) Input the overlapping 2D chromatographic data matrix as antigen.
(2) Estimate the possible components from the original chromatogram.
(3) Input the standard 2D chromatographic data matrices as antibodies.
(4) Arrange all 2D data matrices of the antigen and antibodies into one-dimensional vectors along the chromatographic orientation.
(5) Apply the wavelet compression to the antigen vector, i.e., perform the WT on the antigen vector, then suppress those coefficients whose values are less than a threshold. The threshold is set by a predefined compression ratio, which is chosen by trial and inspection of the reconstruction error. The retained coefficients are taken as the antigen for further calculation.
(6) Compress the antibody vectors by performing the WT with the same parameters as in the previous step and retaining the coefficients at the same positions as in the compressed antigen.
(7) Extract the compressed information of each component from the compressed antigen by the IA described above, where the compressed antigen is taken as V and the compressed antibodies are taken as V_{0i}.
(8) Reconstruct the extracted information of each component by the inverse WT to obtain the full extracted chromatographic data in vector form.
(9) Finally, rearrange the extracted data vector back into matrix form.
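The data handling in steps (4) to (6), (8) and (9) can be sketched as follows. This is only an illustration under stated assumptions: a pure-Python Haar transform stands in for the Symmlet-based MRSD used in the paper, the matrices are flattened row by row (the paper does not specify the exact ordering), the threshold is expressed as a number of retained coefficients, and the IA of step (7) is omitted. All function names, the toy matrices, `LEVELS` and `N_KEEP` are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(x, levels):
    """Multilevel orthonormal Haar transform, laid out as [a_J, d_J, ..., d_1]."""
    approx, details = list(x), []
    for _ in range(levels):
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        approx = [(p + q) / SQRT2 for p, q in pairs]
        details = [(p - q) / SQRT2 for p, q in pairs] + details
    return approx + details

def haar_inverse(w, levels):
    """Inverse of haar_forward for the same number of levels."""
    n = len(w) >> levels
    approx, pos = list(w[:n]), n
    for _ in range(levels):
        detail = w[pos:pos + len(approx)]
        pos += len(approx)
        approx = [v for a, d in zip(approx, detail)
                  for v in ((a + d) / SQRT2, (a - d) / SQRT2)]
    return approx

def flatten(matrix):
    """Step 4: concatenate the rows into one vector (row order is an assumption)."""
    return [v for row in matrix for v in row]

def unflatten(vector, n_cols):
    """Step 9: cut the vector back into rows of n_cols points."""
    return [vector[i:i + n_cols] for i in range(0, len(vector), n_cols)]

def largest_positions(coeffs, n_keep):
    """Step 5: positions of the n_keep coefficients with the largest magnitude."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    return sorted(order[:n_keep])

def expand(kept, positions, n_total):
    """Put retained coefficients back at their positions, zeros elsewhere."""
    full = [0.0] * n_total
    for p, v in zip(positions, kept):
        full[p] = v
    return full

# Toy 2 x 4 matrices standing in for the antigen and one antibody.
antigen = [[0.0, 2.0, 5.0, 2.0], [0.0, 1.0, 3.0, 1.0]]
antibody = [[0.0, 2.0, 4.0, 1.0], [0.0, 1.0, 2.0, 0.5]]
LEVELS, N_KEEP = 3, 4

w_antigen = haar_forward(flatten(antigen), LEVELS)
positions = largest_positions(w_antigen, N_KEEP)       # step 5
antigen_c = [w_antigen[p] for p in positions]          # compressed antigen

w_antibody = haar_forward(flatten(antibody), LEVELS)
antibody_c = [w_antibody[p] for p in positions]        # step 6: same positions

# Steps 8 and 9, applied here to the compressed antigen itself for illustration.
approx_matrix = unflatten(
    haar_inverse(expand(antigen_c, positions, len(w_antigen)), LEVELS), 4)
print(approx_matrix)
```

Selecting the positions once from the antigen and reusing them for every antibody keeps the compressed vectors aligned element by element, so the IA can treat them exactly like uncompressed signals.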
After these calculations, the chromatogram of each component is resolved from the multicomponent overlapping data matrix. From the theory of the IA, c_{i} is the concentration of each component relative to the concentration of the standard sample, i.e., if the concentration of the standard sample is known, the concentration of each component in the mixture can be calculated from c_{i}. Therefore, this method gives both the resolution and the quantitative determination simultaneously.
 EXPERIMENTAL
The experimental data sets were measured on an HPLC system comprising a Spectrasystem FL2000 (Spectra-Physics, USA) with the Spectra Focus multiwavelength UV detector (Spectra-Physics) and a Spectrasystem workstation. The column was packed with 10 μm ODS silica (250 mm × 5 mm, Shimadzu). The mobile phase was 0.25 mol/L (pH ~3.5) lactic acid (A.R.) with 0.01 mol/L dodecyl sulfonic acid sodium. The post-column color developing reagent was 1.0×10^{-4} mol/L arsenazo III (Fluka Chemie AG). The flow rate of the mobile phase was 1.0 mL/min. The flow rate of the color developing reagent was 1.0 mL/min. The column temperature was 20 °C. The detection wavelength ranged from 580 nm to 720 nm. The sampling interval was 0.005 min.
Table 1 Composition of the samples (unit: mg/ml)
Sample No.  Er  Tm  Yb 
1  0.2000  0.1999  0.2001 
2  0  0.1499  0.2001 
3  0  0.1999  0.2001 
Table 1 shows the composition of the three samples, which are mixtures of Er, Tm and Yb. Figure 2 shows the two-dimensional chromatogram of mixture sample 1 (antigen), and Figure 3 (a), (b) and (c) show the experimentally obtained standard chromatograms of Er, Tm and Yb (antibodies), respectively.
Fig.2 The experimental multicomponent overlapping chromatogram (antigen) of the sample No.1
Fig.3 The standard chromatograms of single component (antibodies)
(a) Er (b) Tm (c) Yb
 RESULTS AND DISCUSSION
4.1 Selection of the wavelet basis and the decomposition level
Figure 4 shows the coefficients obtained by the wavelet transform of the chromatogram in Figure 2 (arranged in vector form) at level 7 using the Symmlet5 (L=10) wavelet basis. The information of the chromatographic signal is concentrated in only a few of the coefficients, so removing the smaller coefficients has little effect on the total information. To find the optimal wavelet basis and decomposition level, the reconstruction errors obtained with Haar, Daubechies (L=4-20), Coiflets (L=4-20) and Symmlets (L=6-30) at different decomposition levels, where L is the filter length, were investigated with the data sets in Figures 2 and 3. Table 2 summarizes some of the results for a compression ratio of 1/58. The reconstruction error is calculated by
(4)
where X is the original signal, X_{R} is the signal reconstructed from the retained coefficients, and n is the size of the original data set.
Fig.4 The wavelet coefficients obtained by WT of the chromatogram in Figure 2
Table 2 shows that the reconstruction error varies with decomposition level in almost the same way for all four data sets. For every wavelet basis the minimal reconstruction errors generally appear at decomposition level 6 or 7. Comparing the reconstruction errors of the different wavelet bases, Symmlet 4, 5 and 6 give the smaller errors. Therefore, Symmlet5 at decomposition level 7 is adopted in the following studies.
Table 2 Reconstruction errors for different wavelet bases at different decomposition levels
Wavelet basis  Data  Level 4  Level 5  Level 6  Level 7  Level 8  
Haar  Mix.  309.601  194.953  177.447  171.839  168.520 
Er  94.252  78.993  74.375  73.626  74.169  
Tm  69.741  58.817  55.637  55.588  56.266  
Yb  69.070  58.046  53.986  52.874  51.045  
Db4  Mix.  276.538  114.605  92.823  88.633  95.276 
Er  68.043  43.651  37.795  37.023  40.095  
Tm  51.618  34.362  31.237  32.520  33.644  
Yb  53.556  37.539  32.522  30.989  32.992  
Db8  Mix.  277.929  109.555  89.389  88.817  99.135 
Er  68.060  42.529  38.665  40.783  46.580  
Tm  51.735  34.138  32.182  35.101  38.922  
Yb  53.529  37.544  33.170  33.769  37.752  
Sym3  Mix.  276.373  120.943  98.082  92.855  94.883 
Er  68.186  44.532  37.727  38.435  39.737  
Tm  51.816  35.000  31.922  32.380  34.307  
Yb  53.856  38.130  33.074  31.258  32.033  
Sym4  Mix.  275.467  115.130  92.764  85.817  87.348 
Er  67.950  43.506  37.304  37.294  40.324  
Tm  51.568  34.594  30.649  31.270  32.425  
Yb  53.539  37.624  32.082  30.208  30.807  
Sym5  Mix.  279.138  112.817  89.699  84.277  84.930 
Er  68.870  44.143  38.136  38.141  39.722  
Tm  52.365  35.039  31.369  31.726  31.908  
Yb  54.374  38.296  32.159  29.435  29.962  
Sym6  Mix.  275.515  110.207  87.794  82.778  86.383 
Er  67.938  42.396  36.546  37.034  39.834  
Tm  51.502  33.856  30.895  31.911  33.564  
Yb  53.468  37.125  32.402  30.549  31.978  
Sym7  Mix.  279.568  110.628  87.745  84.261  87.822 
Er  68.977  43.888  37.681  38.089  40.615  
Tm  52.404  34.883  31.168  31.642  34.493  
Yb  54.392  38.508  32.354  30.697  31.047  
Coif2  Mix.  276.228  113.440  90.970  84.528  83.760 
Er  68.185  43.589  37.174  37.834  39.003  
Tm  51.813  34.415  30.914  31.251  31.713  
Yb  53.780  37.451  32.044  29.921  30.360  
Coif4  Mix.  277.145  107.641  86.738  83.806  87.135 
Er  68.182  42.591  37.072  37.435  41.578  
Tm  51.813  33.722  31.162  33.030  35.006  
Yb  53.752  36.947  32.133  31.387  33.073 
Fig.5 The retained wavelet coefficients after compression and the extracted results (wavelet basis: Symmlet 5, decomposition level: 7)
4.2 Resolved result by the proposed algorithm
To resolve the overlapping chromatogram by the proposed algorithm, both the antigen (the multicomponent overlapping chromatogram) and the antibodies (the standard chromatograms of each component) were compressed with the Symmlet5 wavelet basis at decomposition level 7. The number of data points is reduced from 52200 to 930. The solid line in Figure 5 shows the compressed result of the overlapping chromatogram in Figure 2. The dotted lines show the result resolved by the IA. For clarity, three different regions are enlarged in Figure 6, in which (a), (b) and (c) correspond to data points in the ranges 140~190, 550~600 and 660~690, respectively. The IA gives a very good resolution of the compressed wavelet coefficients.
Fig.6 The enlargement of Figure 5
(a) 140~190 data point (b) 550~600 data point (c) 660~690 data point
Fig.7 The reconstructed 2D chromatograms
(a) Er (b) Tm (c) Yb
The chromatograms reconstructed from the resolved coefficients in Figure 5 are shown in Figure 7, where (a), (b) and (c) correspond to the reconstructed chromatograms of Er, Tm and Yb, respectively. Compared with the standard chromatogram of each component in Figure 3, the overlapping chromatogram is well resolved and the chromatogram of each component is well recovered. The residual is shown in Figure 8. Its intensity is very small compared with that of the overlapping chromatogram or the reconstructed chromatograms, which indicates that almost all the information contained in the overlapping chromatogram was extracted. The small remaining error is mainly caused by the irreproducibility of the experiment.
Fig.8 The reconstructed residual information
4.3 Comparison of the proposed algorithm with immune algorithm and WTIA
In our previous work, a WTIA using a two-dimensional wavelet compression algorithm was proposed to improve the calculation speed. To investigate the efficiency of the proposed algorithm, its consumed time and residual after resolution are compared with those of the IA and the WTIA, where the residual is the sum over every data point of the residual matrix, as in Figure 8. The results are listed in Table 3. The proposed algorithm is 2.48 times as fast as the IA and 2.16 times as fast as the WTIA. Its residual is also smaller than those of the IA and the WTIA.
Table 3 Comparison of conventional IA, WTIA and the proposed algorithm*
Run No.  Time (s), conv. IA  Time (s), WTIA  Time (s), proposed algorithm  Residual (×10^{2}), conv. IA  Residual (×10^{2}), WTIA  Residual (×10^{2}), proposed algorithm  
1  18.56  16.37  7.47  2.1011  2.2326  2.0946 
2  18.84  16.21  7.52  
3  18.84  16.36  7.58  
4  18.89  16.37  7.63  
5  18.51  16.37  7.58  
Aver.  18.72  16.34  7.56 
* Programs were run on a Pentium(R) 233 MHz machine with 64 MB of memory.
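The speed ratios quoted in the text follow directly from the run times in Table 3. The short check below recomputes them from the five runs of each method; the dictionary layout and variable names are illustrative only.

```python
# Run times in seconds, copied from Table 3 (five runs per method).
times = {
    "conventional IA": [18.56, 18.84, 18.84, 18.89, 18.51],
    "WTIA": [16.37, 16.21, 16.36, 16.37, 16.37],
    "proposed": [7.47, 7.52, 7.58, 7.63, 7.58],
}

# Average time per method.
mean = {name: sum(runs) / len(runs) for name, runs in times.items()}

# Ratio of average times: how many times as fast the proposed algorithm is.
speedup_vs_ia = mean["conventional IA"] / mean["proposed"]
speedup_vs_wtia = mean["WTIA"] / mean["proposed"]

for name, value in mean.items():
    print(f"{name}: mean time {value:.3f} s")
print(f"proposed vs conventional IA: {speedup_vs_ia:.2f}x")
print(f"proposed vs WTIA: {speedup_vs_wtia:.2f}x")
```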
Table 4 Quantitative results by the proposed algorithm
Sample  Component  Added conc. (mg/ml)  Calculated conc. (mg/ml)  Recovery (%)  
1  Er  0.2000  0.1977  98.85  
1  Tm  0.1999  0.1914  95.75  
1  Yb  0.2001  0.2070  103.45  
2  Er  0  0  0  
2  Tm  0.1499  0.1498  99.93  
2  Yb  0.2001  0.2046  102.25  
3  Er  0  0  0  
3  Tm  0.1999  0.1966  98.35  
3  Yb  0.2001  0.2041  102.00  
4.4 Quantitative determination using the proposed algorithm
To investigate the ability of the proposed algorithm for quantitative determination, the three samples listed in Table 1 were analyzed, and the results are listed in Table 4. All the recoveries of the components present are within 100 ± 5%, with a minimum of 95.75% and a maximum of 103.45%. The results are satisfactory.
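The recoveries in Table 4 are the calculated concentration divided by the added concentration, expressed as a percentage. The sketch below recomputes them from the tabulated concentrations; the tuple layout and the choice to skip components with zero added concentration (where the ratio is undefined) are assumptions of this example.

```python
# (sample, component, added conc., calculated conc.) in mg/ml, from Table 4.
table4 = [
    (1, "Er", 0.2000, 0.1977),
    (1, "Tm", 0.1999, 0.1914),
    (1, "Yb", 0.2001, 0.2070),
    (2, "Er", 0.0, 0.0),
    (2, "Tm", 0.1499, 0.1498),
    (2, "Yb", 0.2001, 0.2046),
    (3, "Er", 0.0, 0.0),
    (3, "Tm", 0.1999, 0.1966),
    (3, "Yb", 0.2001, 0.2041),
]

def recovery(added, calculated):
    """Recovery in percent; None when nothing was added (ratio undefined)."""
    if added == 0:
        return None
    return 100.0 * calculated / added

# Keep only the components that are actually present in the sample.
recoveries = [r for r in (recovery(a, c) for _, _, a, c in table4) if r is not None]

for sample, component, added, calculated in table4:
    r = recovery(added, calculated)
    shown = "n/a" if r is None else f"{r:.2f}"
    print(f"sample {sample} {component}: recovery {shown} %")
print(f"range: {min(recoveries):.2f} % to {max(recoveries):.2f} %")
```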
 CONCLUSION
Based on wavelet compression and the immune algorithm, a fast algorithm for the resolution of 2D multicomponent overlapping chromatograms is proposed. Its application to the resolution and quantitative determination of multicomponent 2D overlapping chromatograms shows that the method is fast in calculation and accurate in quantitative determination. Therefore, the proposed algorithm may be an effective alternative method for the resolution of multicomponent 2D overlapping chromatograms.