Medical and pharmaceutical applications
Farzaneh Salami; Ali Bozorgi-Amiri; Reza Tavakkoli-Moghaddam
Abstract
Feature selection is the process of picking the most effective features from the large number of features in a dataset. However, choosing the subset that gives the best classification performance is challenging. This study constructed and validated multiple metaheuristic algorithms to optimize Machine Learning (ML) models for diagnosing Alzheimer’s. It aims to classify subjects as Cognitively Normal (CN), Mild Cognitive Impairment (MCI), or Alzheimer’s by selecting the best features. The features include Freesurfer features extracted from Magnetic Resonance Imaging (MRI) images and clinical data. We first used well-known ML algorithms for classification, and then used multiple metaheuristic methods for feature selection and for optimizing the objective function of the classification. Because the data are imbalanced, we took the macro-average F1 score as the objective function. Our procedure not only removes irrelevant features but also increases classification performance. Results showed that metaheuristic algorithms could improve the performance of ML methods in diagnosing Alzheimer’s by 20%. We found that classification performance can be significantly enhanced by using appropriate metaheuristic algorithms. Metaheuristic algorithms can help find the best features for medical classification problems, especially Alzheimer’s diagnosis.
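To illustrate why a macro-average F1 score suits imbalanced classes, the following is a minimal sketch in plain Python. It is not the authors' code; the toy labels and the function names `macro_f1` and `accuracy` are assumptions made for illustration only.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly contributes an F1 of 0.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical, heavily imbalanced toy labels (not data from the study).
y_true = ["CN"] * 8 + ["MCI", "AD"]
y_pred = ["CN"] * 10  # a classifier that always predicts the majority class

print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred))
```

On this toy input the always-majority classifier still reaches a high accuracy, while its macro F1 is low because the two minority classes each score zero. That gap is the reason a macro-averaged objective penalizes feature subsets that only serve the majority class.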
Data mining
Sadegh Eskandari
Abstract
Multi-label learning is an emerging research direction that deals with data in which an instance may belong to multiple class labels simultaneously. As many multi-label datasets have a very large feature space with hundreds of irrelevant and redundant features, multi-label feature selection is a fundamental pre-processing tool for selecting a subset of the most representative and discriminative features. This paper introduces a Python-based open-source library that provides state-of-the-art information-theoretic, filter-based multi-label feature selection algorithms. The library, called PyIT-MLFS, is designed to facilitate the development of new algorithms. It is the first comprehensive open-source library for implementing multi-label feature selection algorithms. Moreover, it provides a high-level interface that enables end users to test and compare the already implemented algorithms. PyIT-MLFS is available from https://github.com/Sadegh28/PyIT-MLFS.
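As a rough illustration of what an information-theoretic filter does, the sketch below ranks discrete features by their summed mutual information with each label. It does not use or reproduce the PyIT-MLFS interface; the function names and the toy data are hypothetical, and the score considers relevance only, whereas published methods typically also penalize redundancy between features.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def rank_features(X, Y):
    """Rank feature indices by summed mutual information with every label.

    X: list of rows of discrete feature values.
    Y: list of rows of binary label indicators (one column per label).
    Returns (feature indices sorted best-first, raw scores per feature).
    """
    n_features, n_labels = len(X[0]), len(Y[0])
    scores = []
    for j in range(n_features):
        feature = [row[j] for row in X]
        scores.append(
            sum(
                mutual_information(feature, [row[k] for row in Y])
                for k in range(n_labels)
            )
        )
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Hypothetical toy data: feature 0 copies label 0, feature 1 is constant,
# feature 2 is weakly related to both labels.
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]
X = [[0, 0, 0], [0, 0, 0], [1, 0, 0], [1, 0, 1]]

order, scores = rank_features(X, Y)
print(order, scores)
```

A filter of this kind never trains a classifier, which is what makes it cheap compared with wrapper approaches.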
H. Abbasimehr; S. Alizadeh
Volume 2, Issue 4, December 2013, Pages 1-14
Abstract
Customer churn has become a critical problem for all companies, in particular for those operating in service-based industries such as telecommunications. Data mining techniques have been used to construct churn prediction models. Past research on churn prediction has mainly focused on the accuracy of the constructed models. However, in addition to accuracy, comprehensibility should be considered when evaluating a churn prediction model. A comprehensible model can reveal the main reasons for customer churn, so managers can use this information to make effective decisions about marketing actions. In this paper, we demonstrate the application of a genetic algorithm (GA) method for building an accurate and comprehensible churn prediction model. The proposed GA-based method uses a wrapper-based feature selection approach to choose the best feature subset. The key advantage of this method is that it takes comprehensibility (measured as the number of rules extracted from a C4.5 decision tree) into account when evaluating the performance of a candidate model. The GA-based method is compared with two filter feature selection methods, Chi-squared-based and Correlation-based feature selection, using two telecommunication churn datasets. The experimental results indicate that the GA-based method performs better than the two filter methods in terms of both accuracy and comprehensibility.
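The idea of a GA wrapper whose fitness rewards accuracy and penalizes rule count can be sketched as follows. This is not the authors' implementation: the abstract does not say how the two measures are combined, so the weighted difference in `fitness`, the `rule_weight` value, the GA settings, and the stand-in evaluator `toy_evaluate` (used here in place of training a C4.5 tree) are all assumptions for illustration.

```python
import random


def toy_evaluate(mask):
    """Hypothetical stand-in for training a decision tree on the selected
    features; returns (accuracy, number_of_rules). In this toy setting only
    features 0 and 2 are informative, and every selected feature adds rules."""
    accuracy = 0.5 + 0.2 * (mask[0] + mask[2])
    n_rules = 1 + 3 * sum(mask)
    return accuracy, n_rules


def fitness(mask, evaluate, rule_weight=0.01):
    """Assumed fitness: accuracy minus a penalty per extracted rule."""
    accuracy, n_rules = evaluate(mask)
    return accuracy - rule_weight * n_rules


def ga_select(n_features, evaluate, pop_size=20, generations=30,
              mutation_rate=0.1, seed=0):
    """Search over binary feature masks with a simple generational GA."""
    rng = random.Random(seed)

    def score(mask):
        return fitness(mask, evaluate)

    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        # Elitism: carry the two best masks over unchanged.
        next_gen = [population[0][:], population[1][:]]
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randrange(1, n_features)
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < mutation_rate else b
                     for b in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=score)


best = ga_select(6, toy_evaluate)
print(best, fitness(best, toy_evaluate))
```

In a real wrapper, `toy_evaluate` would be replaced by a function that trains a tree on the masked features, measures held-out accuracy, and counts the extracted rules; the rest of the loop would stay the same.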