Gini-based Classification and its Applications in Data Mining
Binshan Lin
Louisiana State University Shreveport
Dr. Binshan Lin is the BellSouth Professor at Louisiana State University in Shreveport. He received his Ph.D. from Louisiana State University in 1988. He is a nine-time recipient of the Outstanding Faculty Award at LSUS. Professor Lin received the Computer Educator of the Year award from the International Association for Computer Information Systems (IACIS) in 2005, the Ben Bauman Award for Excellence from IACIS in 2003, the IACIS Directors’ Award in 2012, the Distinguished Service Award from the Southwest Decision Sciences Institute (SWDSI) in 2007, the Outstanding Educator Award from SWDSI in 2004, and the Emerald Literati Club Award for Excellence in 2003. Dr. Lin has published over 270 articles in refereed journals. He currently serves as Editor-in-Chief of Expert Systems with Applications. Professor Lin served as President of SWDSI (2004-2005) and Program Chair of the IACIS Pacific 2005 Conference. He also served as a vice president (2007-2009; 2010-2012) of the Decision Sciences Institute (DSI).
Abstract
Gini-based classification is a rank-based classification method that takes into account both the variate values and their ranks. The methodology relies only on first-order moment assumptions and is therefore valid for a wider range of statistical distributions. This paper begins by reviewing the formulations of Gini-based classification and surveys their main properties, efficiency, and selection biases. Gini-based classification under independent censoring and under covariate-dependent censoring is reviewed as well. Empirical evidence from the literature for variable selection bias in Gini-based classification is presented. We then discuss statistical explanations for variable selection bias in different settings by identifying its main sources: estimation bias, variance effects, and multiple-comparisons effects. The Gini-based classification method can be modified to overcome these bias problems by normalizing the Gini indexes with information about the splitting status of all attributes.
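As an illustration of the variable selection bias mentioned above, the following minimal Python sketch uses the standard Gini impurity from decision-tree split selection, which is an assumption here and not necessarily the formulation used in the paper. The function names `gini_impurity` and `gini_gain` and the toy data are hypothetical. The sketch shows that an uninformative identifier-like attribute with many distinct values attains the largest raw Gini gain.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(values, labels):
    """Impurity decrease from a multiway split on a categorical attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Child impurities are weighted by the fraction of samples in each child.
    weighted = sum(len(g) / n * gini_impurity(g) for g in groups.values())
    return gini_impurity(labels) - weighted


# Toy data (hypothetical): eight samples, two balanced classes.
labels = ["a", "a", "b", "b", "a", "b", "a", "b"]
binary_attr = [0, 1, 0, 1, 0, 1, 0, 1]  # two-valued, weakly informative
id_attr = list(range(8))                # unique per sample, no real signal

gain_binary = gini_gain(binary_attr, labels)
gain_id = gini_gain(id_attr, labels)
print(gain_binary, gain_id)
```

On this toy data the identifier attribute yields single-sample, perfectly pure children, so its raw gain equals the full parent impurity and exceeds the gain of the binary attribute. This is the kind of effect that a normalization of the Gini index is meant to counteract.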
Related literature reviews on Gini-based classification and detection are compared. Practical applications of Gini-based classification are discussed in terms of big data analytics and artificial intelligence for medical diagnosis. Gini-based classification can be extended to categorical and ordinal predictor variables and to other split selection criteria in data mining.
The paper outlines several future research opportunities for Gini-based classification. Several challenges remain in modeling classification, clustering, and detection with Gini-based methods, many of which require effort from multiple disciplines. Our paper is interdisciplinary and contributes to both the Gini literature and the literature on statistical inference for performance measures in data mining.
Authors
- Binshan Lin (Louisiana State University Shreveport)
Topic Area
Topics: Analytics, Business Intelligence, Data Mining, & Statistics
Session
AS2 » Multi-Server Queueing Systems/Data Mining/Graph Presentation (16:30 - Thursday, 23rd February, Wraggborough)
Paper
SEDSI_2017_Gini.pdf