Imputing missing genotypes with weighted k nearest neighbors
Motivation: Missing values are a common problem in genetic association studies concerned with single nucleotide polymorphisms (SNPs). Since most statistical methods cannot handle missing values, these values have to be removed prior to the actual analysis. Considering only complete observations, however, often leads to an immense loss of information. Therefore, procedures are needed that can be used to replace such missing values. In this article, we propose a method based on weighted k nearest neighbors that can be employed for imputing such missing genotypes.

Results: In a comparison with other imputation approaches, our procedure, called KNNcatImpute, shows the lowest rates of falsely imputed genotypes when applied to the SNP data from the GENICA study, a study dedicated to the identification of genetic and gene-environment interactions associated with sporadic breast cancer. Moreover, in contrast to other imputation methods that take all variables into account when replacing missing values of a particular variable, KNNcatImpute is not restricted to association studies comprising several tens to a few hundred SNPs, but can also be applied to data from whole-genome studies, as an application to a subset of the HapMap data shows.
Year of publication: | 2008 |
---|---|
Authors: | Schwender, Holger ; Ickstadt, Katja |
Publisher: | Dortmund : Technische Universität Dortmund, Sonderforschungsbereich 475 - Komplexitätsreduktion in Multivariaten Datenstrukturen |
freely available
Series: | Technical Report ; 2008,03 |
---|---|
Type of publication: | Book / Working Paper |
Type of publication (narrower categories): | Working Paper |
Language: | English |
Other identifiers: | 600052389 [GVK] hdl:10419/36594 [Handle] RePEc:zbw:sfb475:200803 [RePEc] |
Persistent link: https://www.econbiz.de/10010300674
Similar items by person
- Identification of SNP interactions using logic regression. Schwender, Holger (2006)
- Detecting high-order interactions of single nucleotide polymorphisms using genetic programming. Nunkesser, Robin (2007)
- Comparison of the empirical Bayes and the significance analysis of microarrays. Schwender, Holger (2003)