Document Type: Research Paper

Authors

1 Department of Industrial Engineering, Alborz Campus, University of Tehran, Tehran, Iran.

2 Department of Industrial Engineering, College of Engineering, University of Tehran, Tehran, Iran.

Abstract

Feature selection is the process of choosing the most effective features from a large number of features in a dataset. However, selecting the subset that yields the best classification performance is challenging. This study constructed and validated multiple metaheuristic algorithms to optimize Machine Learning (ML) models for diagnosing Alzheimer's disease. The aim is to classify Cognitively Normal (CN), Mild Cognitive Impairment (MCI), and Alzheimer's subjects by selecting the best features, which include FreeSurfer features extracted from Magnetic Resonance Imaging (MRI) images together with clinical data. We first applied well-known ML algorithms for classification and then used multiple metaheuristic methods for feature selection, optimizing the classification objective function. Because the data are imbalanced, we used the macro-averaged F1 score as the objective function. Our procedure not only removes irrelevant features but also improves classification performance. Results showed that metaheuristic algorithms could improve the performance of ML methods in diagnosing Alzheimer's disease by 20%. We found that classification performance can be significantly enhanced by using appropriate metaheuristic algorithms, which can help find the best features for medical classification problems, especially Alzheimer's disease.
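For illustration only, the sketch below shows the wrapper-style idea the abstract describes: a search over binary feature masks that maximizes the cross-validated macro-averaged F1 score of a classifier. The synthetic data, the random-forest classifier, and the simple bit-flip hill climbing are stand-in assumptions, not the paper's exact data or metaheuristics (e.g., GA or whale optimization); scikit-learn is assumed to be available.

# Minimal sketch: wrapper feature selection maximizing macro-averaged F1.
# Data, classifier, and search strategy are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the MRI/clinical feature table (3 imbalanced classes).
X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                           n_classes=3, weights=[0.5, 0.3, 0.2], random_state=0)

def macro_f1(mask):
    """Cross-validated macro-F1 of a classifier trained on the selected features."""
    if not mask.any():
        return 0.0
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=3, scoring="f1_macro").mean()

# Bit-flip hill climbing over binary feature masks (a stand-in for the
# metaheuristic search; each bit marks a feature as kept or dropped).
mask = rng.random(X.shape[1]) < 0.5
best = macro_f1(mask)
for _ in range(50):
    candidate = mask.copy()
    candidate[rng.integers(X.shape[1])] ^= True  # flip one feature in or out
    score = macro_f1(candidate)
    if score >= best:
        mask, best = candidate, score

print(f"selected {mask.sum()} of {X.shape[1]} features, macro-F1 = {best:.3f}")

In this setup, swapping in a population-based metaheuristic only changes how candidate masks are generated; the macro-F1 objective and the wrapped classifier stay the same.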
