Multiclass Imbalanced Learning in Ensembles through Selective Sampling

Author/Creator

Author/Creator ORCID

Date

2015-01-01

Department

Information Systems

Program

Information Systems

Citation of Original Publication

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.

Abstract

Imbalanced learning is the problem of learning from datasets when the class proportions are highly imbalanced. Imbalanced datasets are increasingly seen in many domains and pose a challenge to traditional classification techniques. Learning from imbalanced multiclass data (three or more classes) creates additional complexities. Studies suggest that ensemble learners can be trained to emphasize different segments of data pertaining to different classes and thereby produce more accurate results than regular imbalance learning techniques. Thus, we propose a new approach to building ensembles of classifiers for multiclass imbalanced datasets, called Multiclass Imbalance Learning in Ensembles through Selective Sampling (MILES). Each member of MILES is trained with the data selectively sampled from the bands around cluster centroids in a way that diversity is aggressively encouraged within the ensemble. Resampling techniques are utilized to balance the distribution of the data that comes from each cluster. We performed several experiments applying our approach to different real-word datasets demonstrating improved performance for recognizing minority class examples and balancing the G-mean and Mean Area Under the Curve (MAUC) across classes. We further applied MILES to classify prolonged emergency department (ED) stays with consistently higher performance as compared to existing ensemble methods.