Data mining
Sadegh Eskandari
Abstract
Multi-label learning is an emerging research direction that deals with data in which an instance may belong to multiple class labels simultaneously. Because many multi-label datasets have a very large feature space with hundreds of irrelevant and redundant features, multi-label feature selection is a fundamental pre-processing tool for selecting a subset of the most representative and discriminative features. This paper introduces a Python-based open-source library that provides state-of-the-art information-theoretic, filter-based multi-label feature selection algorithms. The library, called PyIT-MLFS, is designed to facilitate the development of new algorithms. It is the first comprehensive open-source library for implementing multi-label feature selection algorithms. Moreover, it provides a high-level interface that enables end users to test and compare the algorithms already implemented. PyIT-MLFS is available from https://github.com/Sadegh28/PyIT-MLFS.
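To make the idea of an information-theoretic filter concrete, the following is a minimal sketch, not the PyIT-MLFS API. It assumes discrete features and binary labels, and it scores each feature by the sum of its empirical mutual information with every label column. The function names `mutual_information` and `rank_features` and the toy data are hypothetical, chosen only for illustration; published algorithms typically add redundancy terms that this sketch omits.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(X, Y):
    """Rank feature indices by summed MI with all label columns (highest first).

    This relevance-only criterion is an assumption made for illustration.
    """
    n_features = len(X[0])
    n_labels = len(Y[0])
    scores = []
    for j in range(n_features):
        feature = [row[j] for row in X]
        score = sum(
            mutual_information(feature, [row[k] for row in Y])
            for k in range(n_labels)
        )
        scores.append(score)
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy data (hypothetical): feature 0 copies label 0, feature 1 is constant,
# feature 2 copies label 1.
X = [[0, 0, 0], [0, 0, 1], [1, 0, 0], [1, 0, 1]]
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]
order, scores = rank_features(X, Y)
```

In this toy setting the constant feature carries no information about either label, so it ranks last; a filter method would discard it before any classifier is trained.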
H. Abbasimehr; S. Alizadeh
Volume 2, Issue 4, December 2013, Pages 1-14
Abstract
Customer churn has become a critical problem for all companies, in particular for those operating in service-based industries such as telecommunications. Data mining techniques have been used to construct churn prediction models. Past research on churn prediction has mainly focused on the accuracy of the constructed models. However, in addition to accuracy, comprehensibility should be considered when evaluating a churn prediction model. A comprehensible model can reveal the main reasons for customer churn, so managers can use this information to make effective decisions about marketing actions. In this paper, we demonstrate the application of a genetic algorithm (GA) method for building an accurate and comprehensible churn prediction model. The proposed GA-based method uses a wrapper-based feature selection approach to choose the best feature subset. Its key advantage is that it takes a comprehensibility measure (the number of rules extracted from a C4.5 decision tree) into account when evaluating the performance of a candidate model. The GA-based method is compared with two filter feature selection methods, Chi-squared-based and Correlation-based feature selection, using two telecommunication churn datasets. The experimental results indicate that the GA-based method performs better than the two filter methods in terms of both accuracy and comprehensibility.
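The wrapper idea described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: the fitness form (accuracy minus `alpha` times the rule count), the weight `alpha`, the GA operators (tournament selection, one-point crossover, bit-flip mutation, elitism), and all function names are hypothetical. In a real wrapper, `evaluate` would train a C4.5 tree on the selected features and return its accuracy and number of extracted rules; here `toy_evaluate` is a stand-in so the sketch runs without external libraries.

```python
import random


def fitness(mask, evaluate, alpha=0.01):
    """Accuracy minus a penalty per extracted rule (alpha is an assumed weight)."""
    if not any(mask):
        return float("-inf")  # an empty feature subset is not a valid candidate
    accuracy, n_rules = evaluate(mask)
    return accuracy - alpha * n_rules


def ga_select(n_features, evaluate, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Search over binary feature masks with a simple generational GA."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=lambda m: fitness(m, evaluate))

    def pick():
        # Binary tournament: the fitter of two random individuals wins
        a, b = rng.sample(pop, 2)
        return a if fitness(a, evaluate) >= fitness(b, evaluate) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=lambda m: fitness(m, evaluate))
    return best


def toy_evaluate(mask):
    """Stand-in for 'train a tree, return (accuracy, number of rules)'.

    Assumption for illustration: only features 0 and 1 are informative,
    and every selected feature adds one rule to the model.
    """
    accuracy = 0.5 + 0.2 * mask[0] + 0.2 * mask[1]
    n_rules = sum(mask)
    return accuracy, n_rules


best_mask = ga_select(6, toy_evaluate)
best_score = fitness(best_mask, toy_evaluate)
```

Because the rule count enters the fitness directly, two candidate subsets with equal accuracy are ranked by how compact the resulting rule set is, which is the trade-off the abstract highlights.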