Data mining
Sadegh Eskandari
Abstract
Multi-label learning is an emerging research direction that deals with data in which an instance may belong to multiple class labels simultaneously. Because many multi-label data sets have very large feature spaces containing hundreds of irrelevant and redundant features, multi-label feature selection is a fundamental pre-processing tool for selecting a subset of the most representative and discriminative features. This paper introduces a Python-based open-source library that provides state-of-the-art information-theoretic, filter-based multi-label feature selection algorithms. The library, called PyIT-MLFS, is designed to facilitate the development of new algorithms. It is the first comprehensive open-source library for implementing multi-label feature selection algorithms. Moreover, it provides a high-level interface that enables end users to test and compare the algorithms already implemented. PyIT-MLFS is available from https://github.com/Sadegh28/PyIT-MLFS.
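To illustrate what an information-theoretic filter does in general, the following minimal sketch scores each discrete feature by its summed empirical mutual information with every label and ranks the features accordingly. It is not the PyIT-MLFS API: the function names `mutual_information` and `rank_features`, the summed-relevance criterion, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_features(features, labels):
    """Score each feature by its summed MI with all labels, highest first.

    This simple relevance-only criterion is an illustrative assumption;
    published methods usually also penalise redundancy between features.
    """
    scores = {
        name: sum(mutual_information(col, lab) for lab in labels.values())
        for name, col in features.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: two binary labels and two binary features.
labels = {
    "label_a": [0, 0, 1, 1, 0, 1, 0, 1],
    "label_b": [1, 1, 0, 0, 1, 0, 1, 0],
}
features = {
    "f_noise": [0, 1, 0, 1, 0, 1, 1, 0],  # statistically independent of both labels
    "f_copy": [0, 0, 1, 1, 0, 1, 0, 1],   # identical to label_a
}

ranking = rank_features(features, labels)
print(ranking)
```

In this toy case the feature that copies a label is ranked first and the independent feature receives a score of zero, which is the behaviour a relevance-based filter is expected to show.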
Data mining
Abdolmajid Imani; Meysam Abbasi; Farahnaz Ahang; Hassan Ghaffari; Mohamad Mehdi
Abstract
Accurately identifying, attracting, and retaining customers, particularly loyal ones, through Customer Relationship Management (CRM), with the goal of optimal resource allocation and higher profit, is not merely a competitive advantage but a necessity for the survival of companies in virtual space. One of the challenges companies face here is how to identify customers' traits and separate customers into distinct segments. Customer Lifetime Value (CLV) is now the main criterion for dividing customers into homogeneous segments. The main goal of this research is to identify key, or strategic, customers using the RFM model. The Recency, Frequency, and Monetary (RFM) values were first determined from the registered transactions of one store in Iran (Refah Chain Store) over a period of about seven months, from 23 September 2017 to 20 April 2018 (71161 transactions were used as final inputs), and the weight of each variable was then determined with the fuzzy Analytic Hierarchy Process (AHP). In the next stage, customers were clustered with the K-means and TwoStep algorithms; according to the Silhouette index, K-means was the better algorithm in this study. Based on the results, customers were segmented into three groups and the CLV was calculated. To identify the key or strategic customer segment, the clustering process was repeated and the priorities of all clusters were determined. Among the results of the data analysis, Segment 3 comprised 3425 members (11.5% of all the company's customers); these were the most loyal customers, identified as the golden customer segment, and all of their variables were higher than the average of the full data set. This research identified the valuable customers for the store and gives it the opportunity to select target customers and invest in them.
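The RFM step described above can be sketched as follows: compute recency, frequency, and monetary value per customer from a transaction log, min-max normalise each variable, and combine them with a weight vector. This is only a sketch under stated assumptions; the transactions are invented, the helper names `rfm_table` and `weighted_scores` are hypothetical, and the weights (0.2, 0.3, 0.5) are placeholders, not the fuzzy AHP weights obtained in the study.

```python
from datetime import date


def rfm_table(transactions, reference_date):
    """Return {customer: (recency_in_days, frequency, monetary_total)}."""
    table = {}
    for customer, day, amount in transactions:
        last, freq, money = table.get(customer, (day, 0, 0.0))
        table[customer] = (max(last, day), freq + 1, money + amount)
    return {
        c: ((reference_date - last).days, freq, money)
        for c, (last, freq, money) in table.items()
    }


def weighted_scores(rfm, weights):
    """Min-max normalise R, F, M and combine them into one score per customer.

    Recency is inverted so that a more recent purchase gives a higher score.
    """
    customers = list(rfm)
    recency, frequency, monetary = zip(*(rfm[c] for c in customers))

    def normalise(column, invert=False):
        lo, hi = min(column), max(column)
        span = (hi - lo) or 1  # avoid division by zero for constant columns
        return [((hi - v) if invert else (v - lo)) / span for v in column]

    r = normalise(recency, invert=True)
    f = normalise(frequency)
    m = normalise(monetary)
    w_r, w_f, w_m = weights
    return {c: w_r * r[i] + w_f * f[i] + w_m * m[i] for i, c in enumerate(customers)}


# Hypothetical transactions: (customer id, purchase date, amount).
transactions = [
    ("A", date(2018, 4, 18), 50.0),
    ("A", date(2018, 3, 1), 30.0),
    ("A", date(2018, 1, 10), 20.0),
    ("B", date(2017, 10, 5), 15.0),
    ("C", date(2018, 2, 20), 40.0),
    ("C", date(2018, 4, 1), 10.0),
]

rfm = rfm_table(transactions, reference_date=date(2018, 4, 20))
scores = weighted_scores(rfm, weights=(0.2, 0.3, 0.5))  # placeholder weights
best_customer = max(scores, key=scores.get)
print(rfm, scores, best_customer)
```

The weighted scores, or the normalised R, F, and M values themselves, would then be the input to a clustering algorithm such as K-means, with the number of clusters chosen by an index such as the Silhouette.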
Data mining
Shookoofa Mostofi; Sohrab Kordrostami; Amir Hossein Refahi; Marzieh Faridi Masooleh; Soheil Shokri
Abstract
Existing systems for diagnosing heart disease are time-consuming, expensive, and prone to error. To address this, a diagnostic algorithm for the causes of heart disease is proposed, based on frequent patterns found with the B-mine algorithm and optimized by association rules. First, feature selection is performed on a disease data set to obtain a set of training features. Association rules are then used to classify the training and test sets, and the factors affecting heart disease are analyzed. Numerical results from experiments on real, standard data sets of cardiac patients show that the average accuracy of the proposed method is approximately 98%. The method was tested on the Cleveland database, which contains 76 features, 14 of which are related to heart disease. This paper also uses four common classifiers, such as the decision tree, to build the model. The data set studied in this article contains 270 records and 14 features. The prediction accuracy of the support vector machine, k-nearest neighbor, decision tree, and naive Bayes classifiers is 81.11%, 66.67%, 59.72%, and 19.85%, respectively, which are relatively satisfactory results.
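For readers unfamiliar with association rules, the sketch below computes the two standard measures, support and confidence, for a rule of the form "antecedent implies consequent" over a set of records. It does not reproduce the B-mine algorithm or the paper's method; the item names and records are invented for illustration, and the helper names `support` and `confidence` are hypothetical.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= record for record in records) / len(records)


def confidence(records, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    base = support(records, antecedent)
    if base == 0:
        return 0.0  # the rule never fires, so report zero confidence
    return support(records, set(antecedent) | set(consequent)) / base


# Hypothetical patient records, each a set of observed items.
records = [
    {"chest_pain", "high_bp", "disease"},
    {"chest_pain", "disease"},
    {"high_bp"},
    {"chest_pain", "high_bp", "disease"},
    {"chest_pain"},
]

rule_support = support(records, {"chest_pain", "disease"})
rule_confidence = confidence(records, {"chest_pain"}, {"disease"})
print(rule_support, rule_confidence)
```

A rule-based classifier would typically keep only rules whose support and confidence exceed chosen thresholds and whose consequent is the class label.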