Variable selection bias in classification trees based on imprecise probabilities
Classification trees based on imprecise probabilities extend classical classification trees. The Gini Index is the default splitting criterion in classical classification trees, whereas classification trees based on imprecise probabilities use an extension of the Shannon entropy as the splitting criterion. However, using these empirical entropy measures as split selection criteria can bias variable selection, so that variables are preferred for features other than their information content. This bias is not eliminated by the imprecise probability approach. The source of variable selection bias for the estimated Shannon entropy is outlined, along with possible corrections. The variable selection performance of the biased and corrected estimators is evaluated in a simulation study. Additional results from research on variable selection bias in classical classification trees are incorporated, suggesting that alternative split selection criteria in classification trees based on imprecise probabilities warrant further investigation.
Keywords: Classification trees; credal classification; variable selection bias; attribute selection error; Shannon entropy; entropy estimation
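The estimation bias mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names are hypothetical, and the Miller-Madow adjustment is used only as one well-known correction, which may differ from the corrections the paper discusses. The sketch computes the exact expectation of the plug-in (empirical) Shannon entropy for a binary variable and shows that it falls below the true entropy at small sample sizes.

```python
from math import comb, log


def plugin_entropy(counts):
    """Plug-in (empirical) Shannon entropy in nats, from category counts."""
    n = sum(counts)
    return -sum(c / n * log(c / n) for c in counts if c > 0)


def miller_madow_entropy(counts):
    """Plug-in entropy plus the Miller-Madow term (m - 1) / (2n),
    where m is the number of categories actually observed.
    Assumption: chosen for illustration, not confirmed by the source."""
    n = sum(counts)
    observed = sum(1 for c in counts if c > 0)
    return plugin_entropy(counts) + (observed - 1) / (2 * n)


def expected_value(estimator, n, p):
    """Exact expectation of an entropy estimator for a binary variable
    with success probability p and sample size n (sum over all outcomes)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) * estimator([k, n - k])
        for k in range(n + 1)
    )


n, p = 10, 0.5
true_entropy = -(p * log(p) + (1 - p) * log(1 - p))
expected_plugin = expected_value(plugin_entropy, n, p)
expected_corrected = expected_value(miller_madow_entropy, n, p)

print(f"true entropy:          {true_entropy:.4f}")
print(f"E[plug-in estimate]:   {expected_plugin:.4f}")
print(f"E[corrected estimate]: {expected_corrected:.4f}")
```

Because the size of the underestimation depends on the sample size and the number of categories, a splitting criterion built on the plug-in estimate can favor variables for reasons unrelated to their information content, which is the kind of selection bias the abstract describes.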
Year of publication: | 2005 |
---|---|
Authors: | Strobl, Carolin |
Publisher: | München : Ludwig-Maximilians-Universität München, Sonderforschungsbereich 386 - Statistische Analyse diskreter Strukturen |
freely available
Series: | Discussion Paper ; 419 |
---|---|
Type of publication: | Book / Working Paper |
Type of publication (narrower categories): | Working Paper |
Language: | English |
Other identifiers: | 10.5282/ubm/epub.1788 [DOI] 485089939 [GVK] hdl:10419/31010 [Handle] |
Persistent link: https://www.econbiz.de/10010266143
Similar items by person
- Maximally selected chi-square statistics and umbrella orderings
  Boulesteix, Anne-Laure, (2006)
- Strobl, Carolin, (2005)
- Unbiased split selection for classification trees based on the Gini Index
  Strobl, Carolin, (2005)