Statistical Entropy Measures in C4.5 Trees
The main goal of this article is to present a statistical study of decision tree learning algorithms based on measures of different parametric entropies. Partial empirical evidence is presented to support the conjecture that adjusting the parameter of each entropy measure may bias the classification. Here, receiver operating characteristic (ROC) curve analysis, specifically the area under the ROC curve (AURC), provides the best criterion for evaluating decision trees based on parametric entropies. The authors emphasize that the improvement in AURC depends on the type of each dataset. The results support the hypothesis that parametric algorithms are useful for datasets with numeric or nominal attributes, but not for datasets with mixed attributes; thus, four hybrid approaches are proposed. The hybrid algorithm based on Rényi entropy is suitable for nominal, numeric, and mixed datasets. Moreover, it requires less time, since the number of nodes is reduced while the AURC is maintained or increased, making it preferable for large datasets.
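To make the idea of a parametric splitting criterion concrete, the sketch below shows Rényi entropy of order α used as a replacement for Shannon entropy when scoring a candidate split. This is an illustrative reconstruction under stated assumptions, not the authors' implementation; the function names (`renyi_entropy`, `renyi_gain`), the example labels, and the α values are hypothetical.

```python
import numpy as np

def renyi_entropy(labels, alpha=2.0):
    """Rényi entropy H_alpha = log2(sum p_i^alpha) / (1 - alpha); alpha -> 1 recovers Shannon entropy."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    if np.isclose(alpha, 1.0):
        return -np.sum(p * np.log2(p))  # Shannon limit
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

def renyi_gain(parent_labels, child_label_groups, alpha=2.0):
    """Information gain of a candidate split, scored with Rényi instead of Shannon entropy."""
    n = len(parent_labels)
    weighted = sum(len(c) / n * renyi_entropy(c, alpha) for c in child_label_groups)
    return renyi_entropy(parent_labels, alpha) - weighted

# Example: the same split can score differently as alpha changes,
# which is how the parameter can bias which attribute the tree selects.
parent = np.array([0, 0, 0, 1, 1, 1, 1, 2])
split = [np.array([0, 0, 0, 1]), np.array([1, 1, 1, 2])]
print(renyi_gain(parent, split, alpha=0.5))
print(renyi_gain(parent, split, alpha=2.0))
```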
Year of publication: 2018
Authors: Arellano, Aldo Ramirez; Bory-Reyes, Juan; Hernandez-Simon, Luis Manuel
Published in: International Journal of Data Warehousing and Mining (IJDWM). - IGI Global, ISSN 1548-3932, ZDB-ID 2399996-2. - Vol. 14.2018, 1 (01.01.), p. 1-14
Publisher: IGI Global
Subject: Classification | Data Mining | Decision Trees | Entropy Measures | Information Theory
Similar items by subject
- Hannah, M. Esther, (2019)
- A survey on data mining and knowledge discovery techniques for spatial data / Shishehgar, Majid, (2015)
- Erdamar, Cengiz, (2013)