Some Statistical Strategies for DAE-seq Data Analysis: Variable Selection and Modeling Dependencies Among Observations
In DAE (DNA after enrichment)-seq experiments, genomic regions related to certain biological processes are enriched/isolated by an assay and are then sequenced on a high-throughput sequencing platform to determine their genomic positions. Statistical analysis of DAE-seq data aims to detect genomic regions with significant aggregations of isolated DNA fragments ("enriched regions") versus all the other regions ("background"). However, many confounding factors may influence DAE-seq signals. In addition, the signals in adjacent genomic regions may exhibit strong correlations, which invalidate the independence assumption employed by many existing methods. To mitigate these issues, we develop a novel autoregressive Hidden Markov model (AR-HMM) to account for covariate effects and violations of the independence assumption. We demonstrate that our AR-HMM leads to improved performance in identifying enriched regions in both simulated and real datasets, especially in epigenetic datasets with broader regions of DAE-seq signal enrichment. We also introduce a variable selection procedure in the context of the HMM/AR-HMM where the observations are not independent and the mean value of each state-specific emission distribution is modeled by some covariates. We study the theoretical properties of this variable selection procedure and demonstrate its efficacy in simulated and real DAE-seq data. In summary, we develop several practical approaches for DAE-seq data analysis that are also applicable to more general problems in statistics. Supplementary materials for this article are available online.
| Year of publication: | 2014 |
|---|---|
| Authors: | Rashid, Naim ; Sun, Wei ; Ibrahim, Joseph G. |
| Published in: | Journal of the American Statistical Association. - Taylor & Francis Journals, ISSN 0162-1459. - Vol. 109.2014, 505, p. 78-94 |
| Publisher: | Taylor & Francis Journals |
Similar items by person
- Co-digestion, pretreatment and digester design for enhanced methanogenesis
  Shah, Fayyaz Ali, (2015)
- Current status, issues and developments in microalgae derived biodiesel production
  Rashid, Naim, (2014)
- Current status, barriers and developments in biohydrogen production by microalgae
  Rashid, Naim, (2013)