An Efficient Method for Discretizing Continuous Attributes

Date

2010-04

Citation of Original Publication

Engle, Kelley M.; Gangopadhyay, Aryya; An Efficient Method for Discretizing Continuous Attributes; International Journal of Data Warehousing and Mining (IJDWM) 6(2), 1-21, April 2010; https://doi.org/10.4018/jdwm.2010040101

Rights

This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Abstract

In this paper the authors present a novel method for finding optimal split points for the discretization of continuous attributes. Such a method can be used in many data mining techniques for large databases. The method consists of two major steps. In the first step, the search space is pruned using a bisecting region method that partitions the search space and returns the point with the highest information gain found during its search. The second step is a hill climbing algorithm that starts from the point returned by the first step and greedily searches for an optimal point. The method was tested using fifteen attributes from two data sets. The results show that the method drastically reduces the number of searches while identifying optimal or near-optimal split points. On average, there was a 98% reduction in the number of information gain calculations with only a 4% reduction in information gain.
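
The abstract gives only an outline of the two steps, so the following is a minimal sketch of one plausible reading, not the authors' algorithm. It assumes binary splits of the form `value <= split`, candidate split points at midpoints between consecutive distinct values, and a bisecting step that keeps whichever half of the candidate range has the better-scoring midpoint. All function names (`entropy`, `info_gain`, `candidate_splits`, `bisecting_search`, `hill_climb`, `find_split`) are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(values, labels, split):
    """Information gain of the binary split `value <= split`."""
    left = [y for x, y in zip(values, labels) if x <= split]
    right = [y for x, y in zip(values, labels) if x > split]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder


def candidate_splits(values):
    """Midpoints between consecutive distinct sorted values (an assumption)."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def bisecting_search(n, gain):
    """Step 1 (assumed form): halve the candidate index range repeatedly,
    keeping the half whose midpoint scores higher, and remember the best
    index evaluated along the way."""
    lo, hi = 0, n - 1
    best = (lo + hi) // 2
    while lo < hi:
        mid = (lo + hi) // 2
        left_mid = (lo + mid) // 2
        right_mid = (mid + 1 + hi) // 2
        if gain(left_mid) >= gain(right_mid):
            hi, cand = mid, left_mid
        else:
            lo, cand = mid + 1, right_mid
        if gain(cand) > gain(best):
            best = cand
    return best


def hill_climb(start, n, gain):
    """Step 2: greedily move to the better adjacent candidate until
    neither neighbor improves the information gain."""
    current = start
    while True:
        neighbors = [i for i in (current - 1, current + 1) if 0 <= i < n]
        best = max(neighbors, key=gain, default=current)
        if gain(best) <= gain(current):
            return current
        current = best


def find_split(values, labels):
    """Return (split point, information gain, number of gain evaluations)."""
    cands = candidate_splits(values)
    if not cands:
        return None, 0.0, 0
    cache = {}  # memoize so each candidate is scored at most once

    def gain(i):
        if i not in cache:
            cache[i] = info_gain(values, labels, cands[i])
        return cache[i]

    start = bisecting_search(len(cands), gain)
    best = hill_climb(start, len(cands), gain)
    return cands[best], gain(best), len(cache)
```

On a toy attribute with values 1 through 10, where the first five records belong to one class and the last five to another, `find_split` returns the split point 5.5 with an information gain of 1.0 while scoring only some of the nine candidates. Like any hill climbing search, this sketch can stop at a local maximum, which is consistent with the abstract's description of the results as optimal or near-optimal.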