RFCC : Random Forest Consensus Clustering for Regression and Classification
Random forests are invariant and robust estimators that can fit complex interactions between input data of different types and binary, categorical, or continuous outcome variables, including those with multiple dimensions. In addition to these desirable properties, random forests impose a structure on the observations from which researchers and data analysts can infer clusters or groups of interest. These clusters not only provide a structure to the data at hand, they can also be used to elucidate new patterns, define subgroups for further analysis, derive prototypical observations, identify outlier observations, catch mislabeled data, and evaluate the performance of the estimation model in more detail. We present a novel clustering algorithm called Random Forest Consensus Clustering and implement it in the Scikit-Learn / SciPy data science ecosystem. This algorithm differs from prior approaches by making use of the entire tree structure: observations become proximate if they follow similar decision paths across the trees of a random forest. We illustrate why this approach improves the resolution and robustness of clustering and why it is especially suited to hierarchical approaches.
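The path-based proximity described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a Jaccard overlap of decision-path nodes as the proximity measure and average-linkage hierarchical clustering, neither of which the abstract specifies. The toy data, variable names, and parameter values are illustrative only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier

# Toy data: three groups of points (illustrative, not from the paper).
X, y = make_blobs(n_samples=90, centers=3, cluster_std=0.5, random_state=0)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# indicator[i, j] is nonzero if observation i passes through node j,
# where the nodes of all trees are stacked along the columns.
indicator, _ = forest.decision_path(X)
indicator = indicator.astype(np.float64)

# Number of nodes each pair of observations shares across all trees.
shared = (indicator @ indicator.T).toarray()
path_len = shared.diagonal()

# Assumed proximity: Jaccard overlap of the two sets of visited nodes.
union = path_len[:, None] + path_len[None, :] - shared
proximity = shared / union

# Turn proximity into a distance and cluster hierarchically.
distance = 1.0 - proximity
np.fill_diagonal(distance, 0.0)
Z = linkage(squareform(distance, checks=False), method="average")
labels = fcluster(Z, t=3, criterion="maxclust")
```

Because the proximity counts every node on the path rather than only the terminal leaf, two observations that separate late in a tree remain closer than two that separate at the root, which is one plausible reading of "making use of the entire tree structure".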
Year of publication: | [2021] |
---|---|
Authors: | Marquart, Ingo ; Koca Marquart, Ebru |
Publisher: | [S.l.] : SSRN |
Subject: | Regressionsanalyse ; Regression analysis ; Clusteranalyse ; Cluster analysis ; Klassifikation ; Classification ; Forstwirtschaft ; Forestry ; Regionales Cluster ; Regional cluster |
freely available
Similar items by subject
- A real data-driven clustering approach for countries based on happiness score
  Chakraborty, Aditya, (2021)
- Clustered Covariate Regression
  Soale, Abdul-Nasah, (2019)
- Presentation Slides : Nonparametric Regression Using Clusters
  Viole, Fred, (2019)