Benchmarking Discretisation Level of Continuous Attributes: Theoretical and Experimental Approaches
| dc.contributor.author | Chen, Wanghu | |
| dc.contributor.author | Wang, Chao | |
| dc.contributor.author | Li, Jing | |
| dc.contributor.author | Yang, Bo | |
| dc.contributor.author | Liu, Yang | |
| dc.contributor.author | Wang, Jianwu | |
| dc.date.accessioned | 2024-02-13T17:41:59Z | |
| dc.date.available | 2024-02-13T17:41:59Z | |
| dc.date.issued | 2020-02-24 | |
| dc.description | 2019 IEEE International Conference on Big Data 9-12 Dec. 2019 | |
| dc.description.abstract | The discretisation of an attribute refers to partitioning its continuous numerical values into intervals, each of which is associated with a categorical label. The number of distinct categorical labels is called the target discretisation level of the continuous attribute. For data mining algorithms that can only work on discrete data, discretisation is necessary. At the same time, discretisation can also make the original data more concise and interpretable. However, it is challenging to balance the target discretisation level against the information loss incurred during the discretisation process. In this paper, we propose, for the first time, to use the entropy of a continuous attribute as a benchmark to determine its target discretisation level. An entropy-based naive unsupervised discretisation approach is also proposed; it shows clear advantages in both data reduction and accuracy, evaluated by running classifiers on datasets whose continuous attributes are discretised with the proposed approach. Our experiments on 28 datasets and 9 popular classifiers show that the accuracy of a discretisation approach is substantially affected when the target discretisation level of each continuous attribute is lower than the entropy benchmark. Meanwhile, increasing the target discretisation level beyond the benchmark does not always improve the accuracy of the discretiser. These findings can provide valuable guidance for exploring or optimising approaches to the discretisation of continuous attributes. | |
| dc.description.sponsorship | This work is partially supported by grants OAC–1942714 and OAC–2118285 from the National Science Foundation. | |
| dc.description.uri | https://ieeexplore.ieee.org/document/9006513 | |
| dc.format.extent | 4 pages | |
| dc.genre | conference papers and proceedings | |
| dc.genre | preprints | |
| dc.identifier | doi:10.13016/m283xw-am0v | |
| dc.identifier.citation | W. Chen, C. Wang, J. Li, B. Yang, Y. Liu and J. Wang, "Benchmarking Discretisation Level of Continuous Attributes: Theoretical and Experimental Approaches," 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 2019, pp. 3623-3631, doi: 10.1109/BigData47090.2019.9006513. | |
| dc.identifier.uri | https://doi.org/10.1109/BigData47090.2019.9006513 | |
| dc.identifier.uri | http://hdl.handle.net/11603/31609 | |
| dc.language.iso | en | |
| dc.publisher | IEEE | |
| dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
| dc.relation.ispartof | UMBC Faculty Collection | |
| dc.relation.ispartof | UMBC Center for Accelerated Real Time Analysis | |
| dc.relation.ispartof | UMBC Computer Science and Electrical Engineering Department | |
| dc.relation.ispartof | UMBC Data Science | |
| dc.relation.ispartof | UMBC Joint Center for Earth Systems Technology (JCET) | |
| dc.relation.ispartof | UMBC Center for Real-time Distributed Sensing and Autonomy | |
| dc.relation.ispartof | UMBC Information Systems Department Collection | |
| dc.rights | © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | |
| dc.subject | UMBC Big Data Analytics Lab | |
| dc.title | Benchmarking Discretisation Level of Continuous Attributes: Theoretical and Experimental Approaches | |
| dc.type | Text | |
| dcterms.creator | https://orcid.org/0000-0002-9933-1170 |
Files
Original bundle
- Name: 2022 Benchmarking Probabilistic Machine Learning Models for Arctic Sea Ice.pdf
- Size: 479.81 KB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 2.56 KB
- Format: Item-specific license agreed upon to submission