We report a multi-class classification model, built with a random forest (RF) classifier and the synthetic minority oversampling technique (SMOTE), that distinguishes normal, pre-cancer, and cancer samples using extracted intrinsic fluorescence (IF) data. Because noise often suppresses important features in the fluorescence signal, denoising is an essential pre-processing step. Before any further analysis, the proposed algorithm therefore denoises the IF data with a wavelet-based technique that uses the “coif3” mother wavelet. SMOTE is then used to generate a balanced dataset. The best classification was obtained for the denoised, balanced dataset, with accuracy, sensitivity, and specificity above 90% for both the normal/pre-cancer and pre-cancer/cancer groups. Further, the receiver operating characteristic (ROC) curves show a clear distinction among the three grades, with an area under the curve (AUC) of 0.96 for normal and pre-cancer samples and 1.00 for cancer samples. The Python script prepared for this study is available on GitHub and Signal Science Lab.
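The class-balancing step can be illustrated with a minimal NumPy sketch of SMOTE. This is not the authors' script (which is published on GitHub), and it does not cover the wavelet denoising or the RF classifier. The function names `smote` and `balance_with_smote`, the neighbour count `k=5`, and the random toy data are illustrative assumptions. Each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its `k` nearest same-class neighbours, and every smaller class is oversampled up to the size of the largest class.

```python
import numpy as np


def smote(X_min, n_synthetic, k=5, rng=None):
    """Hypothetical helper: create n_synthetic points by interpolating between
    minority-class samples and their k nearest minority-class neighbours."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    if n < 2:
        raise ValueError("SMOTE needs at least two samples in a class")
    k = min(k, n - 1)  # cannot use more neighbours than other samples exist
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is never its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    # Pick a base sample and one of its k neighbours for each new point.
    base = rng.integers(0, n, size=n_synthetic)
    nb = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))  # uniform interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])


def balance_with_smote(X, y, k=5, rng=None):
    """Hypothetical helper: oversample every class up to the largest class size."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for c, cnt in zip(classes, counts):
        if cnt < target:
            X_parts.append(smote(X[y == c], target - cnt, k=k, rng=rng))
            y_parts.append(np.full(target - cnt, c, dtype=y.dtype))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Toy example: 4 features per sample, imbalanced labels
# (0 = normal, 1 = pre-cancer, 2 = cancer are assumed label codes).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = np.array([0] * 20 + [1] * 7 + [2] * 3)
X_bal, y_bal = balance_with_smote(X, y, k=5, rng=1)
print(np.bincount(y_bal))  # [20 20 20]
```

In practice, the imbalanced-learn package provides a maintained implementation (`imblearn.over_sampling.SMOTE` with its `fit_resample` method). Oversampling is normally applied to the training split only, so that synthetic points derived from test samples do not inflate the reported accuracy, sensitivity, or specificity.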