Medical and pharmaceutical applications
Farzaneh Salami; Ali Bozorgi-Amiri; Reza Tavakkoli-Moghaddam
Abstract
Feature selection is the process of picking the most effective features from a large number of features in a dataset. However, choosing the subset that yields the best classification performance is challenging. This study constructed and validated multiple metaheuristic algorithms to optimize Machine Learning (ML) models for diagnosing Alzheimer’s disease. The aim is to classify subjects as Cognitively Normal (CN), Mild Cognitive Impairment (MCI), or Alzheimer’s disease by selecting the best features. The features comprise FreeSurfer features extracted from Magnetic Resonance Imaging (MRI) images together with clinical data. We applied well-known ML algorithms for classification and then used multiple metaheuristic methods to select features and optimize the classification objective function. Because the data are imbalanced, we used the macro-averaged F1 score as the objective function. Our procedure not only removes irrelevant features but also increases classification performance. Results showed that metaheuristic algorithms could improve the performance of ML methods in diagnosing Alzheimer’s disease by 20%, and that classification performance can be significantly enhanced by choosing an appropriate metaheuristic. Metaheuristic algorithms can thus help find the best features for medical classification problems, especially Alzheimer’s diagnosis.
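The wrapper loop this abstract describes (search over feature subsets, score each subset by macro-averaged F1) can be sketched with a deliberately simple metaheuristic. The macro-F1 definition below is standard, but the bit-flip local search is only an illustrative stand-in for the paper's metaheuristics, and `fitness` is a placeholder for a real train-and-evaluate step:

```python
import random

def macro_f1(y_true, y_pred):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores, so a
    rare class (e.g. AD) counts as much as a frequent one (e.g. CN)."""
    f1s = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

def bit_flip_search(n_features, fitness, iters=200, seed=0):
    """Greedy single-bit-flip local search over feature masks: flip one
    feature in or out, keep the flip if the objective does not get worse."""
    rng = random.Random(seed)
    mask = [rng.random() < 0.5 for _ in range(n_features)]
    best = fitness(mask)
    for _ in range(iters):
        j = rng.randrange(n_features)
        mask[j] = not mask[j]
        score = fitness(mask)
        if score >= best:
            best = score           # keep the flip
        else:
            mask[j] = not mask[j]  # revert it
    return mask, best
```

In the paper's setting, `fitness(mask)` would train a classifier on the masked FreeSurfer/clinical features and return its cross-validated macro-F1 rather than the toy objective used here.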
Data mining
Sadegh Eskandari
Abstract
Multi-label learning is an emerging research direction that deals with data in which an instance may belong to multiple class labels simultaneously. As many multi-label datasets contain very large feature spaces with hundreds of irrelevant and redundant features, multi-label feature selection is a fundamental pre-processing tool for selecting a subset of the most representative and discriminative features. This paper introduces a Python-based open-source library that provides state-of-the-art information-theoretic filter-based multi-label feature selection algorithms. The library, called PyIT-MLFS, is designed to facilitate the development of new algorithms and is the first comprehensive open-source library implementing multi-label feature selection algorithms of this kind. Moreover, it provides a high-level interface that enables end-users to test and compare the already implemented algorithms. PyIT-MLFS is available from https://github.com/Sadegh28/PyIT-MLFS.
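Information-theoretic filters of the kind such a library implements all build on discrete mutual information. The relevance-only ranking below is a generic sketch of the idea, not PyIT-MLFS's actual API (both function names are invented for illustration):

```python
from math import log2
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def rank_features(X, Y):
    """Rank the feature columns of X by their summed MI with every label
    column of Y.  This is a relevance-only filter; published multi-label
    criteria also penalize redundancy among the selected features."""
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        scores.append(sum(mutual_information(col, [row[k] for row in Y])
                          for k in range(len(Y[0]))))
    return sorted(range(len(X[0])), key=lambda j: -scores[j])
```

A feature identical to a label gets the label's full entropy as its score, while a constant feature scores zero, so the ranking pushes discriminative features to the front.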
Data mining
Shookoofa Mostofi; Sohrab Kordrostami; Amir Hossein Refahi; Marzieh Faridi Masooleh; Soheil Shokri
Abstract
Existing systems for diagnosing heart disease are time-consuming, expensive, and prone to error. In this regard, a diagnostic algorithm has been proposed for the causes of heart disease based on frequent-pattern mining with the B-mine algorithm, optimized by association rules. Initially, feature selection is applied to the disease dataset to obtain a set of training features. Association rules are then used to classify the training and test sets, after which the factors affecting heart disease are analyzed. Numerical experiments on real, standard datasets of cardiac patients show that the average accuracy of the proposed method is approximately 98%. The method was tested on the Cleveland database, whose heart disease dataset includes 76 features, 14 of which are related to heart disease. This paper also uses four common classifiers, including a decision tree, to build comparison models. The dataset studied in this article contains 270 records and 14 features. The prediction accuracies of the support vector machine, k-nearest-neighbor, decision tree, and naive Bayes classifiers are 81.11%, 66.67%, 59.72%, and 19.85%, respectively, which are relatively satisfactory results.
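The rule-mining step this abstract leans on reduces to computing support and confidence over patient transactions. The single-antecedent miner below is a toy sketch of that computation, not the B-mine algorithm itself, and the item names in the usage example are made up:

```python
from itertools import permutations

def mine_rules(transactions, min_support=0.3, min_conf=0.7):
    """Mine single-item association rules {a} -> {b} whose support and
    confidence clear the given thresholds.  Each transaction is a set of
    items (e.g. symptoms/attributes present in one patient record)."""
    n = len(transactions)
    items = {i for t in transactions for i in t}

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for a, b in permutations(items, 2):
        sup = support({a, b})                  # how often a and b co-occur
        if sup >= min_support:
            conf = sup / support({a})          # P(b | a)
            if conf >= min_conf:
                rules.append((a, b, round(sup, 3), round(conf, 3)))
    return sorted(rules)
```

On a handful of hypothetical records, the miner surfaces rules such as `chest_pain -> disease` along with their support and confidence, which is the raw material the paper's classifier and factor analysis would work from.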
H. Abbasimehr; S. Alizadeh
Volume 2, Issue 4, December 2013, Pages 1-14
Abstract
Customer churn has become a critical problem for all companies, in particular those operating in service-based industries such as the telecommunication industry. Data mining techniques have been used to construct churn prediction models. Past research in the churn prediction context has mainly focused on the accuracy of the constructed churn models. However, in addition to accuracy, comprehensibility should be considered when evaluating a churn prediction model. A comprehensible model can reveal the main reasons for customer churn, so managers can use such information to make effective decisions about marketing actions. In this paper, we demonstrate the application of a genetic algorithm (GA) for building an accurate and comprehensible churn prediction model. The proposed GA-based method uses a wrapper-based feature selection approach for choosing the best feature subset. Its key advantage is taking the comprehensibility measure (measured as the number of rules extracted from a C4.5 decision tree) into account when evaluating the performance of a candidate model. The GA-based method is compared to two filter feature selection methods, Chi-squared-based and correlation-based feature selection, using two telecommunication churn datasets. The experimental results indicate that the GA-based method outperforms the two filter methods in terms of both accuracy and comprehensibility.
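The wrapper this abstract describes evolves feature bitmasks under a fitness that mixes accuracy with rule-count comprehensibility. The tiny GA below is a hedged sketch of that loop: the population size, selection scheme, and rates are illustrative, and the accuracy and C4.5 rule-count terms are left to the caller's `fitness`:

```python
import random

def ga_select_features(n_features, fitness, pop_size=20, gens=30, seed=1):
    """Elitist generational GA over feature bitmasks with one-point
    crossover and bit-flip mutation; returns the fittest mask found."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(gens):
        ranked = sorted(pop, key=fitness, reverse=True)
        pop = [row[:] for row in ranked[:2]]         # elitism: keep best two
        while len(pop) < pop_size:
            p1, p2 = rng.sample(ranked[:10], 2)      # truncation selection
            cut = rng.randrange(1, n_features)
            child = p1[:cut] + p2[cut:]              # one-point crossover
            if rng.random() < 0.2:
                child[rng.randrange(n_features)] ^= True  # bit-flip mutation
            pop.append(child)
    return max(pop, key=fitness)
```

In the paper's spirit, `fitness(mask)` would be something like `accuracy(mask) - lam * n_rules(mask)`, so a subset that keeps the C4.5 tree small is preferred over an equally accurate but rule-heavy one.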
A. Mesforoush; M.J. Tarokh
Volume 2, Issue 1, February 2013, Pages 30-44
Abstract
It is important to segment a company's most profitable customers, and much CRM research has been performed to calculate customer profitability and develop a comprehensive model of it. With the aid of data mining tools, this paper performs customer segmentation based on a variant of the RFM (Recency, Frequency, Monetary) model. Customers are clustered using K-means, and the Customer Lifetime Value (CLV) of each segment is then calculated. This approach is essential for an SME to provide personalized service to each customer and to achieve customer satisfaction.
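The pipeline this abstract outlines (RFM vectors, K-means clusters, CLV per customer) can be sketched in a few lines. Lloyd's algorithm below is the textbook K-means, and the CLV weights are pure assumptions, not the paper's calibrated model:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm over (recency, frequency, monetary) vectors:
    assign each point to its nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        centroids = [tuple(sum(d) / len(cl) for d in zip(*cl)) if cl
                     else centroids[j] for j, cl in enumerate(clusters)]
    return centroids, clusters

def clv(recency, frequency, monetary, w=(0.2, 0.3, 0.5)):
    """Weighted-RFM proxy for customer lifetime value; recency is inverted
    so recently active customers score higher.  The weights are illustrative."""
    return w[0] / (1 + recency) + w[1] * frequency + w[2] * monetary
```

Clustering normalized RFM vectors first, then scoring each cluster's centroid with `clv`, gives the segment-level profitability ranking the abstract refers to.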