A clustering procedure based on the comparison between the k nearest neighbors graph and the minimal spanning tree
We present a procedure for the identification of clusters in multivariate data sets, based on the comparison between the k nearest neighbors graph, Gk, and the minimal spanning tree, MST. Our key statistic is the random quantity the smallest k such that Gk contains the MST. Under regularity assumptions, we show that for i.i.d. data from a density on with compact support having one connected component, , where n denotes sample size, a bound that seems to be sharp, according to simulations. This leads to a consistent test for the identification of crisp clusters. We illustrate the use of our procedure with an example.
Year of publication: |
2003
|
---|---|
Authors: | González-Barrios, José María ; Quiroz, Adolfo J. |
Published in: |
Statistics & Probability Letters. - Elsevier, ISSN 0167-7152. - Vol. 62.2003, 1, p. 23-34
|
Publisher: |
Elsevier |
Keywords: | Nearest neighbors graph Minimal spanning tree Clustering |
Saved in:
Saved in favorites
Similar items by person
-
Graph-Theoretic Procedures for Dimension Identification
Brito, María R., (2002)
-
A Statistic for Testing the Null Hypothesis of Elliptical Symmetry
Manzotti, A., (2002)
-
Quiroz, Adolfo J., (1996)
- More ...