A Three Component Latent Class Model for Robust Semiparametric Gene Discovery
We propose a robust model for discovering differentially expressed genes which directly incorporates biological significance, i.e., effect dimension. Using the so-called c-fold rule, we transform the expressions into a nominal observed random variable with three categories: below a fixed lower threshold, above a fixed upper threshold or within the two thresholds. Gene expression data is then transformed into a nominal variable with three levels possibly originated by three different distributions corresponding to under expressed, not differential, and over expressed genes. This leads to a statistical model for a 3-component mixture of trinomial distributions with suitable constraints on the parameter space. In order to obtain the MLE estimates, we show how to implement a constrained EM algorithm with a latent label for the corresponding component of each gene. Different strategies for a statistically significant gene discovery are discussed and compared. We illustrate the method on a little simulation study and a real dataset on multiple sclerosis.
Year of publication: |
2011
|
---|---|
Authors: | Alfo' Marco ; Alessio, Farcomeni ; Luca, Tardella |
Published in: |
Statistical Applications in Genetics and Molecular Biology. - De Gruyter, ISSN 1544-6115. - Vol. 10.2011, 1, p. 1-19
|
Publisher: |
De Gruyter |
Saved in:
Saved in favorites
Similar items by person