Poster + Paper
13 March 2024
Predicting gene families from human DNA sequences using machine learning: a logistic regression approach
N. T. Tsebesebe, K. Mpofu, S. Ndlovu, S. Sivarasu, P. Mthunzi-Kufa
Conference Poster
Abstract
Machine learning is a powerful technique for analysing large-scale data and learning patterns, and it can deliver high accuracy with short processing times. In this work, a machine learning algorithm (multinomial logistic regression) is used to predict gene families from human DNA sequences. A total of 4380 sequences were converted into overlapping k-mers of length 6, producing 232 414 k-mers. The data set was split 80/20 into training and test sets, and the multinomial logistic regression model achieved 93.9% accuracy in predicting 6 gene families within 0.24 seconds. The model had a precision of 94.8%, a sensitivity of 93.9%, and an F1-score of 94%. The model developed in this study offers medical professionals an alternative approach for gaining insight into the genetic information carried within DNA segments. Accurate and efficient machine learning predictions of gene families can aid in understanding genetic characteristics and contribute to advances in personalised medicine, diagnostics and genetic research.
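The conversion of a sequence into overlapping k-mers of length 6, mentioned in the abstract, can be illustrated with a minimal sketch. This is not the authors' code: the function name `kmers`, the stride of one base, the upper-casing step, and the example fragment are all assumptions made for illustration.

```python
def kmers(sequence, k=6):
    """Return every overlapping substring of length k, in order.

    Assumption: a stride of one base, so a sequence of length n
    yields n - k + 1 k-mers (none if the sequence is shorter than k).
    """
    sequence = sequence.upper()  # assumption: treat input case-insensitively
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Hypothetical 9-base fragment, used only to show the windowing.
example = "ATGCATGCA"
print(kmers(example))
```

Under these assumptions, each sequence becomes a list of short "words" that can then be counted and passed to a classifier such as multinomial logistic regression. The abstract does not say how the k-mers were vectorised, so that step is left out here.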
© (2024) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
N. T. Tsebesebe, K. Mpofu, S. Ndlovu, S. Sivarasu, and P. Mthunzi-Kufa "Predicting gene families from human DNA sequences using machine learning: a logistic regression approach", Proc. SPIE 12857, Computational Optical Imaging and Artificial Intelligence in Biomedical Sciences, 128570J (13 March 2024); https://doi.org/10.1117/12.3002539
KEYWORDS
Machine learning
Proteins
Data modeling
Deep learning
Genetics
Ion channels
Performance modeling
