Intrinsic dimension identification via graph-theoretic methods
Three graph-theoretical statistics are considered for estimating the intrinsic dimension of a data set. The first is the "reach" statistic, $\bar{r}_{j,k}$, proposed in Brito et al. (2002) [4] for identifying Euclidean dimension. The second, $M_n$, is the sample average of the squared degrees in the minimum spanning tree of the data. The third, $U_{n,k}$, is based on counting, for each pair of sample points $\{X_i, X_j\}$, $i < j \le n$, the common neighbors among their $k$ nearest neighbors. For the first and third statistics, central limit theorems are proved under general assumptions for data lying on an $m$-dimensional $C^1$ submanifold of $\mathbb{R}^d$, and in this setting we establish the consistency of intrinsic dimension identification procedures based on $\bar{r}_{j,k}$ and $U_{n,k}$. For $M_n$, asymptotic results are given when the data lie in an affine subspace of Euclidean space. The proposed graph-theoretical methods are compared, via simulations, with a number of recently proposed nearest-neighbor alternatives.
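The second statistic can be written as $M_n = \frac{1}{n}\sum_{i=1}^{n} \deg_T(X_i)^2$, where $T$ is the Euclidean minimum spanning tree of $X_1, \dots, X_n$ and $\deg_T(X_i)$ is the number of tree edges incident to $X_i$. The sketch below computes only this statistic. It assumes Euclidean distances and at least one sample point, and the function names `mst_degrees` and `mean_squared_mst_degree` are hypothetical. The abstract does not specify how $M_n$ is normalized or turned into a dimension estimate, so that step is not attempted here.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def mst_degrees(points: Sequence[Point]) -> List[int]:
    """Return the degree of each point in the Euclidean minimum spanning tree.

    Uses the dense O(n^2) form of Prim's algorithm, which needs no spatial index.
    """
    n = len(points)
    degrees = [0] * n
    if n < 2:
        return degrees
    in_tree = [False] * n
    best = [math.inf] * n  # length of the cheapest known edge joining each point to the tree
    parent = [-1] * n      # tree endpoint of that cheapest edge (-1 = none yet)
    best[0] = 0.0          # grow the tree from point 0
    for _ in range(n):
        # Add the closest point that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            # Record the MST edge (parent[u], u) in both endpoints' degrees.
            degrees[u] += 1
            degrees[parent[u]] += 1
        # Update the cheapest connection of every remaining point via u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return degrees


def mean_squared_mst_degree(points: Sequence[Point]) -> float:
    """M_n: sample average of squared MST degrees (assumes at least one point)."""
    degrees = mst_degrees(points)
    return sum(d * d for d in degrees) / len(degrees)


if __name__ == "__main__":
    # Four collinear points: the MST is a path with degrees 1, 2, 2, 1.
    line = [(0.0,), (1.0,), (2.0,), (3.0,)]
    print(mean_squared_mst_degree(line))  # 2.5
```

The dense form of Prim's algorithm is used because it is short and exact. For large samples, a tree-based nearest-neighbor structure would be the usual choice. For data with a continuous distribution, distance ties occur with probability zero, so the MST, and hence $M_n$, is almost surely unique.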
Year of publication: 2013
Authors: Brito, M.R.; Quiroz, A.J.; Yukich, J.E.
Published in: Journal of Multivariate Analysis (Elsevier, ISSN 0047-259X), Vol. 116 (2013), C, pp. 263-277
Publisher: Elsevier
Subject: Intrinsic dimension; Graph theoretical methods; Stabilization methods; Dimensionality reduction
Similar items by subject
- Mao, Luke Lunhua (2016)
- The Low Dimensionality of Development. Kraemer, Guido (2020)
- Bottai, Carlo (2022)