Presentation
5 October 2023
Batch effect detection and removal for human liver RNA-Seq with an unsupervised learning approach
Abstract
In bioinformatics, batch effect detection is a challenging task that has most often been approached with clustering methods. In this study, we propose an approach for identifying and visualizing batch effects with unsupervised analysis methods. We selected the most significant gene subsets of 500, 1500, and 2500 genes, ranked by standard deviation (SD), out of 35,238 genes in a human liver RNA-Seq dataset. The skmeans and kmeans methods were then applied to the selected gene subsets. Next, principal component analysis (PCA) was used to embed the data into a 10-dimensional subspace. Finally, Uniform Manifold Approximation and Projection (UMAP) was applied to cluster and visualize the outputs. The experimental results show that features extracted from the 1500-gene subset give the most robust representation and the best clustering and visualization. These findings are useful not only for batch effect detection and removal but also for labeling new samples to train supervised machine learning methods.
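The steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "most significant" means the genes with the largest SD across samples, it runs on synthetic data rather than the human liver dataset, and the function names (`top_variable_genes`, `pca_embed`, `spherical_kmeans`) are hypothetical. The spherical k-means loop is a simplified stand-in for skmeans, and the final UMAP step is omitted so that the example depends only on NumPy.

```python
import numpy as np


def top_variable_genes(expr, k):
    """Return column indices of the k genes with the largest SD across samples."""
    sd = expr.std(axis=0)
    return np.argsort(sd)[::-1][:k]


def pca_embed(x, n_components=10):
    """Project samples onto the leading principal components using an SVD."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def spherical_kmeans(x, n_clusters, n_iter=50, seed=0):
    """Cluster rows by cosine similarity (a basic spherical k-means loop)."""
    rng = np.random.default_rng(seed)
    norms = np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)
    unit = x / norms
    # Initialise the centers from randomly chosen samples.
    centers = unit[rng.choice(len(unit), n_clusters, replace=False)]
    for _ in range(n_iter):
        # Assign each sample to the center with the highest cosine similarity.
        labels = np.argmax(unit @ centers.T, axis=1)
        for c in range(n_clusters):
            members = unit[labels == c]
            if len(members):
                total = members.sum(axis=0)
                centers[c] = total / max(np.linalg.norm(total), 1e-12)
    return labels


# Synthetic stand-in for an expression matrix: 40 samples x 3000 genes,
# with an artificial batch shift added to the first 200 genes of batch 1.
rng = np.random.default_rng(0)
expr = rng.uniform(2.0, 10.0, size=3000) + rng.normal(0.0, 1.0, size=(40, 3000))
batch = np.repeat([0, 1], 20)
expr[batch == 1, :200] += 4.0

genes = top_variable_genes(expr, 500)            # SD-based gene selection
labels = spherical_kmeans(expr[:, genes], 2)     # clustering on the gene subset
embedding = pca_embed(expr[:, genes], 10)        # 10-dimensional PCA embedding

# Cluster labels are arbitrary, so agreement is taken up to label swapping.
match = np.mean(labels == batch)
print("embedding shape:", embedding.shape)
print("agreement with simulated batches:", max(match, 1.0 - match))
```

In a fuller pipeline, the 10-dimensional embedding would typically be passed to a UMAP implementation for the two-dimensional visualization described above, and the gene-subset size (500, 1500, or 2500) would be varied to compare the resulting clusterings.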
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Shamima Nasrin, Md. Zahangir Alom, and Tarek M. Taha "Batch effect detection and removal for human liver RNA-Seq with an unsupervised learning approach", Proc. SPIE 12675, Applications of Machine Learning 2023, 126750N (5 October 2023); https://doi.org/10.1117/12.2677962
KEYWORDS
Machine learning
Biological samples
Databases
Liver
Principal component analysis
Visualization
Bioinformatics