A Geometrical Framework for Covariance Matrices of Continuous and Categorical Variables
It is well known that a categorical random variable can be represented geometrically by a simplex. Accordingly, several measures of association between categorical variables have been proposed and discussed in the literature. Moreover, the standard definitions of covariance and correlation coefficient for continuous random variables have been extended to categorical variables. In this article, we present a geometrical framework in which categorical and continuous data are represented by simplices and lines, respectively, in a high-dimensional space. We introduce a function whose direct minimization leads to a single definition of covariance between categorical–categorical, categorical–continuous, and continuous–continuous data. The novelty of this general approach is that a single space and a single distance function can be used to describe both continuous and categorical data. It thus provides a unified geometrical description of the measure of association, in particular between categorical and continuous data. We discuss virtues and limitations of such a geometrical framework and provide examples with possible applications to sociological surveys.
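As a rough illustration of the simplex representation mentioned in the abstract, the Python sketch below encodes a categorical variable as one-hot vectors (the vertices of a standard simplex), embeds a continuous variable as points on a line, and computes ordinary cross-covariance matrices for the three pairings. This is the familiar indicator-vector construction, used here as an assumption; it is not the article's minimization-based definition. All function names and the survey data are hypothetical.

```python
from collections import Counter


def one_hot(values):
    """Encode each category as a vertex of the standard simplex (a one-hot vector)."""
    levels = sorted(set(values))
    position = {level: i for i, level in enumerate(levels)}
    vectors = [
        [1.0 if position[v] == j else 0.0 for j in range(len(levels))]
        for v in values
    ]
    return levels, vectors


def as_line(values):
    """Embed a continuous variable as points on a line (one-component vectors)."""
    return [[float(v)] for v in values]


def cross_covariance(xs, ys):
    """Population cross-covariance matrix between two samples of vectors."""
    n = len(xs)
    mean_x = [sum(col) / n for col in zip(*xs)]
    mean_y = [sum(col) / n for col in zip(*ys)]
    return [
        [
            sum((x[i] - mean_x[i]) * (y[j] - mean_y[j]) for x, y in zip(xs, ys)) / n
            for j in range(len(mean_y))
        ]
        for i in range(len(mean_x))
    ]


def gini_index(values):
    """Gini (Gini-Simpson) index of a categorical sample: 1 - sum of squared shares."""
    n = len(values)
    return 1.0 - sum((count / n) ** 2 for count in Counter(values).values())


# Hypothetical survey responses, invented for illustration only.
occupation = ["manual", "clerical", "manual", "professional"]
income = [20.0, 30.0, 24.0, 50.0]

levels, occupation_vectors = one_hot(occupation)
cat_cat = cross_covariance(occupation_vectors, occupation_vectors)  # 3 x 3 block
cat_cont = cross_covariance(occupation_vectors, as_line(income))    # 3 x 1 block
cont_cont = cross_covariance(as_line(income), as_line(income))      # 1 x 1 block

# Trace of the categorical block; for one-hot vectors it matches the Gini index.
trace = sum(cat_cat[i][i] for i in range(len(levels)))
```

Under this encoding, the trace of the categorical–categorical block equals the Gini index of the sample, which is one reason the index is a natural variance-like quantity for categorical data. The entries of the categorical–continuous block sum to zero, because the components of every one-hot vector add up to one.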
Year of publication: | 2015 |
---|---|
Authors: | Vernizzi, Graziano ; Nakai, Miki |
Published in: | Sociological Methods & Research. - Vol. 44.2015, 1, p. 48-79 |
Subject: | categorical data ; covariance matrix ; social survey data ; Gini index |
Similar items by subject
- Inequality, Poverty, and Growth; Cross-Country Evidence. Iradian, Garbis (2005)
- A note on the Gini measure for discrete distributions. Basmann, R. L. (1999)
- Introduction to the management of social survey data. Krejčí, Jindřich (2014)