|
1.IntroductionDisease diagnosis through optical methods is an area of considerable research interest.1 Optical tools are sensitive and are hence potentially capable of discriminating different stages of disease progression.1–7 However, tissue being a complex medium, with several fluorophores, scatterers, and absorption domains, makes it difficult for proper diagnosis through optical means.1 Hence, identifying reliable markers for accurately depicting the tissue condition through noninvasive optical methods has received significant attention.1 For this purpose, recent approaches have focused on extracting intrinsic fluorescence2 and tissue multifractality, characterizing the morphological changes by multifractal detrended fluctuation analysis (MFDFA).3 Other approaches make use of principal component analysis4 for identification of underlying spectral correlation and other image processing tools like wavelets5 for pin pointing parameters that faithfully capture the disease progression. Clinical application of this approach, not only depends on these biomarkers but also crucially depends on the validation of the diagnosis outcome through a suitable diagnostic algorithm, which can accurately classify the measured spectra from an unknown tissue, using the stored database of spectra of tissues of known histopathologic classification. This will supplement and augment the histopathological approach, the current industry gold standard. Over the last few decades, a variety of diagnostic algorithms have been developed for optical diagnosis of cancer.7 Classification schemes like artificial neural network6 and support vector machine (SVM)7 have been found promising in binary classification, e.g., normal versus cancer. However, in a clinical situation, it is often required to classify the tissue site as normal and different grades of cancer. Hence, there is strong interest on statistical classifiers to extract the information content of the entire spectral data in order to get the best diagnostic features and enhance accuracy in classification of tissues into corresponding histopathologic categories.8 Total principal component regression developed by Tan et al.8 has classified various cancers, based on gene expression profiles and provided the optimized results compared to other methods for multiclass classifications. SVM has been deployed for multiclass cancer classification with classwise optimized gene.9 Its usefulness for optical diagnosis is yet to be explored. In recent times, hidden Markov models (HMMs) are being widely used in biological sequence analysis as a robust method.10 An HMM has been deployed to analyze hyperspectral images and a new HMM-based spectral measure has been referred to as the HMM information divergence in order to characterize the spectral properties.11 Here, we demonstrate the efficacy of MFDFA-HMM integrated framework for optical diagnosis of cancer. More specifically, it is found that HMM on the multifractal light scattering properties of the tissues shows remarkable efficiency in differentiating normal and different stages of precancer. It is particularly effective when applied on global fractal parameters like generalized Hurst exponent and the corresponding singularity spectrum width/strength of multifractality, characterizing the global morphological conditions of the tissues for multiclass cancer classification, as compared to the MFDFA-SVM integrated framework under same application. 2.Methods and Materials2.1.Sample PreparationBiopsied cervical precancer tissue slices (the histopathologically characterized grade I, grade II, and grade III precancer tissue) and normal tissues were collected from Ganesh Shankar Vidyarthi Memorial (GSVM) Medical College, Kanpur, India (age of patients between 35 and 60 years; , with , , ; and six biopsies from the normal counterparts, ). The standardized histological preparation of the excised tissues involving fixation, dehydration, embedding in wax, sectioning under a rotary microtome with thickness , and lateral dimension , is followed by performing subsequent dewaxing. The consent for the use of all the intact tissue (human cervix with cancer and normal) samples in our study was obtained from the Ethical Committee, GSVM Medical College and Hospital, Kanpur, India. The sample preparation methods follow approved guidelines in our study. 2.2.Experimental SystemThe spatial distribution of tissue refractive index (RI) was recorded by a differential interference contrast (DIC) microscope (Olympus IX-81, United States). At a magnification of , these DIC images were recorded by a CCD camera (ORCA-ERG, Hamamatsu, dimension ). The elastic scattering spectra from the multiple sites of the biopsied tissue sections were recorded by the angle resolved spectral light scattering measurements (Fig. 1). In brief, light emitted from a Xe-lamp (HPX-2000, Ocean Optics, United States) was collimated by a combination of lenses and illuminated the tissue sample at the center of a goniometric arrangement (spot size -diameter). The collimated scattered light from the sample was focused into a collecting fiber probe coupled to a spectrometer (USB4000FL, Ocean Optics) for wavelength resolved signal detection. The recordings of spectra were performed (360 to 800 nm) with a spectral resolution of 2.05 nm, where the angular range was kept at 10 deg to 150 deg with an interval of 10 deg. For the inverse multifractal study, the spectra were recorded at backscattering angle (Fig. 3). 2.3.Background AnalysisIn the first step, the elastic scattering spectra data were processed through Fourier domain preprocessing via the Born approximation, followed by the multifractal analysis. For analyzing the dataset, 6 normal, 14 grade-I, 6 grade-II, and 9 grade-III samples were taken. 2.3.1.Light scattering-based inverse analysis in Born approximationAt first, we extract the fractal parameters from the spatial variation of RI, as manifested through corresponding light scattering data.12 We have followed the inverse analysis strategy12 for quantification of multifractal signature from the scattered light signal through MFDFA, before applying an HMM on them. The normalized RI fluctuations of a weakly fluctuating scattering medium are given by , where and are the average RI of the medium and location within the volume, respectively. The fluctuation part is responsible for phase distortion and scattering. It is known that the elastic scattering field for scalar excitation can be related to in first-order Born approximation via Fourier transform.12 For continuous random media, such as tissues, the expression for scattered intensity is given by where is the wavelength; ; is the scattering angle; is the spatial inhomogeneity distribution of index; and is the strength of fluctuation index, where the amplitude of fluctuating index is . The scattering vector has the modulus , where is related to the spatial frequency via . For a medium exhibiting self-similarity in index fluctuations, the information on multifractality can be extracted from the scattering signal as follows:The parameter exhibits the index inhomogeneity distribution with spatial scale , where is the distance between any two points in the medium. It represents the randomness of the medium in statistical sense and embodies the essential multifractal features of index fluctuations in complex systems, such as tissues. This is subsequently analyzed using MFDFA13 to yield the multifractal tissue optical properties. 2.3.2.Multifractal detrended fluctuation analysisMFDA is an effective tool for quantitative estimation of the generalized Hurst exponents. In MFDFA, one first generates a profile function (spatial series of length , ) from the one-dimensional spatial index fluctuations. Here, length of the series . Subsequently, the profile is divided into nonoverlapping segments of equal segment length . The segment length varies from 16 to 128. The local trend of the series is determined for each segment by least square polynomial fitting and then subtracted from the segmented profiles to yield detrended fluctuations. The resulting variance of the detrended fluctuation is determined for each segment as follows: The moment () dependent fluctuation function can be obtained by taking the average over all the segments: where may vary from negative to positive values with fraction or integer values. Since length of the series is often not an integer multiple of variable segment length , in order to take into account all the data points, the procedure was performed twice on the series, starting from either end of the series.13 The scaling behavior can be obtained by analyzing the variations of versus for each value , assuming the general scaling function as follows:The relation between the generalized Hurst exponent and the multifractal scaling exponent can be demonstrated as follows: The Hurst exponent () is defined as and the values , , and correspond to long range correlation, uncorrelated random fluctuations, and anticorrelated behavior, respectively.11 The Hurst exponent and the scaling exponent , along with the singularity spectrum completely, characterize any nonstationary multifractal fluctuation series. Here, is related to via a Legendre transformation: where is the singularity strength and the width () [considered at ] of is a quantitative measure of multifractality.132.4.Data AnalysisThe obtained two global fractal parameters, generalized Hurst exponent and the corresponding singularity spectrum width, were subjected to (a) SVM-based multiclass classification and (b) HMM-based multiclass classification. For analyzing the dataset, 6 normal, 14 grade-I, 6 grade-II, and 9 grade-III samples were taken. 2.4.1.Support vector machineSVMs are powerful statistical classifiers under the supervised learning scheme. The central idea behind SVM operation is to separate classes with a surface that maximizes margin between them by avoiding overfitting to form an optimal separating hyperplane (OSH). Hence, by following structural risk minimization (SRM) of statistical learning makes prediction on a function as , where is the kernel function defined on a basis function, is the corresponding model weights, and is the bias weight. The training data points lie far away from the OSH, does not participate in the specification and hence receives zero weight. Data point that lies close to decision boundary receives nonzero weights. These training data points are “support vectors.”14,15 If we remove these points, it will change the boundary location. Unlike relevance vector machine, there are restrictions while choosing of kernels in SVM. An appropriate selection of kernel function is an important aspect as it defines the accuracy level of SVM-based operation while determining training data classification. The kernel function will produce optimum results in classification as long as it obeys the Mercer’s theorem.14,15 Figure 2 displays the simplified workflow of SVM-based multiclass classification on extracted multifractal parameters from tissue samples. 2.4.2.Hidden Markov modelHMM16 is a statistical Markov model with hidden states and is also the simplest dynamic Bayesian network. An HMM is closely related with mixture models, which are statistically independent. An HMM can be efficiently employed for a time series data, where actual parameters are unknown and only observational information are known. From this series of observations, the probabilities of parameters giving such observations and the transmission probabilities can be found by the Baum–Welch algorithm.17 The basic principle of an HMM can be described as follows. For states as input to the model, the ’th state can be predicted by the traditional Markov model, where given the present input, the future is independent of the past: If ’th observation can be made on the basis of ’th observation, it is a first-order assumption, which generally is used for Bayesian modeling. Using the Markov assumption, we also can write it as follows: According to the Bayes’ formula: In a more general way, the above Eq. (8) can be rewritten as follows: The measure of probability can be achieved by likelihood parameter , which is proportional to the probability: In the case of an HMM, the model is prepared with the training data and a sequence of states is . Here, is defined as the prior probability, is the transition probability, and is the emission probability. The probability of a state sequence obtained from an HMM with parameters can be expressed as a product of the transition probabilities: For an observational sequence , a (hidden) state sequence can be determined from an HMM with parameter . Hence, the likelihood of along the path takes the form: It can be expressed as the product of the emission probabilities computed along the considered path. With the likelihood of an observed sequence and the parameter defined by an HMM, the probability can be expanded as follows: Using the Baum–Welch algorithm, the hidden parameters in an HMM are found. This algorithm utilizes the expectation maximization (EM) algorithm for finding the maximum likelihood estimation of the parameters of an HMM, given a set of observed feature vectors. As we mentioned earlier, hidden Markov chain can be represented as . Here, the stochastic transition matrix , where is the discrete variable. The emission probability , where is the observational sequence. The EM algorithm defines a local maximum for . After defining the initial condition, the Baum–Welch algorithm follows the forward and backward procedure to find the proper estimation of the predicted results. In the case of forward procedure, let the probability of viewing at state in time is . Under recursion procedure, and . Similarly, let the probability of viewing at state in time is . Under recursion procedure, and . In the final step, according to the Bayes’ theorem, the probability of the observed sequence and the parameters in state at time given as . The probability of being in state and at times and , respectively, given the observed sequence and parameters : Hence, can be updated at expected frequency spent in state at time as . The expected number of transitions from state to state compared to the expected total number of transitions away from state is . If is the expected number of times, the output observations have been equal to while in state over the expected total number of times in state , then . Here, the indicator function exists only for . These steps are iterated until a desired level of convergence is achieved. In this paper, an HMM is applied on the known multifractal fractal parameters, which leads to a significant improvement on the prediction accuracy. For experimentation, we first train the multifractal parameters in an HMM for each of the categories. Prior probabilities are first selected as a random function. and are modeled as Gaussian densities, with mean 0 and variance 1. Subsequently, a representative data is trained on the model iteratively to fit and modify the model using EM algorithm. The model is optimized using Lagrange multipliers. We use forward and backward algorithms to compute a set of sufficient statistics for our EM step tractably. Once the model is sufficiently trained for a given sequence of data we calculate the likelihood of sequence for each category, i.e., as , when the sum of the joint likelihoods of the sequence over all possible state sequences , allowed by the model for each category. The maximum likelihood gives the prediction for the sequence data. The proposed HMM-based model on light scattering derived from multifractal tissue optical properties has been demonstrated in Fig. 3. The HMM-based data analysis steps for normal and different cancerous grades have been shown here in detail for the ease of understanding. 3.Results and DiscussionThe DIC images of different pathological grades have been presented in Fig. 4 for comparisons. The histopathologically characterized tissue samples were provided by the pathologists of GSVM Medical College and Hospital, Kanpur, where cancer stages were defined by the pathologists. The results of the inverse analysis on the light scattering spectra recorded (using the spectral light scattering measurement system in Fig. 1) from a grade-1 dysplastic cervical tissue (corresponding to Fig. 1) are displayed in Fig. 5. The large variation of the slopes of log versus [Fig. 5(c)] is the evidence of strong multifractality. The derived spectrum and the singularity spectrum [Figs. 5(d) and 5(e)] demonstrate the strong multifractality in the spatial variations of tissue refractive indices. The parameter , obtained by Fourier preprocessing, contains submicron level spatial index fluctuations information [as evident from Fig. 5(b)]. The observed small-scale fluctuations of [Fig. 5(d)] and the resulting width of [Fig. 5(e)] possibly originate from the microarchitecture of the fibrous network of connective tissue. To summarize Fourier preprocessing of light scattering in Born approximation and its subsequent analysis through MFDFA documents, there are small spectral variations as signature of subtle or hidden changes in the refractive indices spatial distribution via multifractal parameters. After MFDFA analysis, the observed trends are , , , , and , , , for normal and different grades of cancer, respectively. Here, it can be observed that there exists an overlapping of multifractal parameters like Hurst exponent and singularity spectrum width among normal and different grade of precancerous tissues. Hence, the supervised classifiers like SVM and HMM have been applied here for multiclass classification purpose. Our initial dataset consists of 35 samples. We have used Monte Carlo cross-validation, where we randomly split the dataset into training and testing dataset. This process has been repeated 100 times. The size of training and testing dataset varies for each split; we just ensured that a minimum two number of samples have been included in both training and testing dataset for each split. For each such split, the model has been fit to the training data, and predictive accuracy has been assessed using just the validation data (testing data). Then, the results of the testing data have been averaged over the splits. The advantage of this method is that the proportion of the training/validation split is not dependent on the number of iterations, and the predictive accuracy is more or less independent of the samples used for training dataset. We additionally used nine unknown samples taken at different time than the dataset and the prediction done by our model has been compared by manual verification. Results of the unknown samples also have been averaged and presented in the results. The above process has been repeated for SVM and HMM. The data in Table 1 depicts the mean and variance () for each MFDA parameter of each grade for the entire dataset. The same dataset has been used for model generation in SVM and HMM. Table 1Summary of the multifractal tissue optical properties derived from light scattering spectra.
There are nine unknown samples, which are considered as test samples for SVM and HMM classification purpose. SVM creates an optimum manifold barrier with radial basis function kernel between healthy and different grades of cancer depending on MFDFA parameters. Figure 6 displays the prediction analysis carried out by training data in SVM. From Fig. 6, it is clearly visible that SVM works very well in binary classification, i.e., between normal and grade III with 98.5% and 100% accuracy, respectively. While trying out SVM classification for multiclass classification, the MFDFA-SVM integrated framework performs poorly by degrading the overall performance of the system with 57.14% and 55% accuracy, respectively. This error in prediction has occurred due to wrongly predicting normal tissues as grade I tissues (1.5%), grade I as normal (14.29%), and grade II tissues (28.57%) as well as grade II tissues as normal (10%) and grade I tissues (35%). An HMM creates abstract Markov models for the classification purpose on multifractal parameters. Figure 7 displays the prediction analysis carried out by training data in an HMM. As can be seen from the graph, the parameters clearly show distinction between normal and different grades of precancer with singularity spectrum width () and Hurst exponent . The normal and grade III tissues were correctly predicted. However, there is error while predicting grade I and II tissues (correct prediction rates are 85.45% and 78.57%, respectively). This error in prediction is also limited to wrongly predicting grade I tissues as grade II tissues (14.54%) as well as grade II tissues as grade I tissues (21.43%). The results demonstrate that binary classification between normal and cancerous tissues (grade III) is very good both in SVM and HMM. Meanwhile, in multiclass classification cases, when precancerous grades (grade I, grade II) are to be classified along with normal and precancerous tissues (grade III), abstract parameters achieved using an HMM that performs better than the SVM. The presence of noise in the obtained signal damages the SVM performance as SVM clearly classifies based on the kernel formed after considering all the multifractal parameters. While in the case of HMM, the Markov model finds abstract parameters by controlling the actual multifractal parameters and produces a prediction based on the derived abstract parameters. As a consequence, an HMM avoids the noise added to the signal and able to produce better multiclass classification results than SVM. In our current manuscript, we proceeded with Fourier domain preprocessing as our main focus was on global behavior analysis of time series through MFDFA-HMM and MFDFA-SVM integrated model classifications in early-stage cancer detection. It is known to us that wavelet domain analysis is more appreciable than Fourier domain (short-time Fourier transform) analysis in the case of local behavior analysis of time series.18 In wavelet domain preprocessing, the performance of the SVM model can perform better than an HMM model as SVM has better generalization due to the principle of SRM than an HMM for system abnormality detection.19 A comparative study between MFDFA-SVM and MFDFA-HMM integrated model for local behavior analysis of time series through wavelet domain preprocessing in early-stage cancer diagnosis will be a part of our future study. 4.ConclusionsWe have explored an integrated framework of light scattering-derived multifractal tissue optical properties (generalized Hurst exponent and width of singularity spectrum) along with a robust HMM for multiclass classification of different precancerous grades of human uterine cervix. The results clearly demonstrate that the use of HMM on the multifractal properties leads to significantly improved classification as compared to MFDFA-SVM based integrated model for multiclass classification. These MFDFA-HMM based classification results show considerable promise by exploring multifractal tissue optical properties as a biomarker for precancer detection. We are currently expanding our investigations toward in-vivo deployment of this integrated approach for precancer detection using tissue light scattering spectra. In general, the use of this MFDFA-HMM integrated model on elastic scattering spectroscopic data may lead to a diagnostic modality for the detection of other types of cancer. AcknowledgmentsThe authors thank Dr. Asha Agarwal, GSVM Medical College and Hospital, Kanpur, for providing the histopathologically characterized tissue samples. ReferencesP. N. Prasad, Introduction to Biophotonics, Wiley-Interscience, Hoboken, New Jersey
(2003). Google Scholar
K. Pandey et al.,
“Fluorescence spectroscopy: a new approach in cervical cancer,”
J. Obstet. Gynaecol. India, 62
(4), 432
–436
(2012). http://dx.doi.org/10.1007/s13224-012-0298-6 JOBYA4 Google Scholar
N. Das et al.,
“Probing multifractality in tissue refractive index: prospects for precancer detection,”
Opt. Lett., 38 211
–213
(2013). http://dx.doi.org/10.1364/OL.38.000211 OPLEDP 0146-9592 Google Scholar
S. Devi, P. K. Panigrahi and A. Pradhan,
“Detecting cervical cancer progression through extracted intrinsic fluorescence and principal component analysis,”
J. Biomed. Opt., 19 127003
(2014). http://dx.doi.org/10.1117/1.JBO.19.12.127003 JBOPFO 1083-3668 Google Scholar
A. H. Gharekhan et al.,
“Distinguishing cancer and normal breast tissue autofluorescence using continuous wavelet transform,”
IEEE J. Sel. Top. Quantum Electron., 16 893
–899
(2010). http://dx.doi.org/10.1109/JSTQE.2009.2033018 IJSQEN 1077-260X Google Scholar
G. A. Rovithakis et al.,
“Artificial neural networks for discriminating pathologic from normal peripheral vascular tissue,”
IEEE Trans. Biomed. Eng., 48
(10), 1088
–1097
(2001). http://dx.doi.org/10.1109/10.951511 Google Scholar
S. K. Majumder, N. Ghosh and P. K. Gupta,
“Support vector machine for optical diagnosis of cancer,”
J. Biomed. Opt., 10
(2), 024034
(2005). http://dx.doi.org/10.1117/1.1897396 JBOPFO 1083-3668 Google Scholar
Y. Tan et al.,
“Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data,”
Nucleic Acids Res., 33
(1), 56
–65
(2005). http://dx.doi.org/10.1093/nar/gki144 Google Scholar
A. Anand and P. N. Suganthan,
“Multiclass cancer classification by support vector machines with class-wise optimized genes and probability estimates,”
J. Theor. Biol., 259
(3), 533
–540
(2009). http://dx.doi.org/10.1016/j.jtbi.2009.04.013 JTBIAP 0022-5193 Google Scholar
B. Yoon,
“Hidden Markov models and their applications in biological sequence analysis,”
Curr. Genomics, 10 402
–415
(2009). http://dx.doi.org/10.2174/138920209789177575 CGUEA8 1389-2029 Google Scholar
Q. Du and C. I. Chang,
“Hidden Markov model approach to spectral analysis for hyperspectral imagery,”
Opt. Eng., 40
(10), 2277
–2284
(2001). http://dx.doi.org/10.1117/1.1404430 Google Scholar
N. Das et al.,
“Tissue multifractality and Born approximation in analysis of light scattering: a novel approach for precancers detection,”
Sci. Rep., 4 6129
(2014). http://dx.doi.org/10.1038/srep06129 SRCEC3 2045-2322 Google Scholar
J. W. Kantelhardt et al.,
“Multifractal detrended fluctuation analysis of nonstationary time series,”
Phys. A, 316 87
–114
(2002). http://dx.doi.org/10.1016/S0378-4371(02)01383-3 Google Scholar
V. N. Vapnik, Statistical Learning Theory, Wiley, New York
(1998). Google Scholar
C. J. C. Burges,
“A tutorial on support vector machines for pattern recognition,”
Data Min. Knowl. Discovery, 2
(2), 121
–167
(1998). http://dx.doi.org/10.1023/A:1009715923555 Google Scholar
L. E. Baum and T. Perie,
“Statistical inference for probabilistic functions of finite state Markov chains,”
Ann. Math. Stat., 37
(6), 1554
–1563
(1966). http://dx.doi.org/10.1214/aoms/1177699147 AASTAD 0003-4851 Google Scholar
R. Hogg, J. McKean and A. Craig, Introduction to Mathematical Statistics, 7th ed.Pearson Prentice Hall, Boston
(2005). Google Scholar
F. Auger and P. Flandrin,
“Improving the readability of time–frequency and time-scale representations by the reassignment method,”
IEEE Trans. Signal Process., 43
(5), 1068
–1089
(1995). http://dx.doi.org/10.1109/78.382394 ITPRED 1053-587X Google Scholar
Q. Miao, H. Huang and X. Fang,
“A comparison study of support vector machines and hidden Markov models in machinery condition monitoring,”
J. Mech. Sci. Technol., 21 607
–615
(2007). http://dx.doi.org/10.1007/BF03026965 1738-494X Google Scholar
|