A statistical approach to high-throughput screening of predicted orthologs
Orthologs are genes in different species that have diverged from a common ancestral gene after speciation. In contrast, paralogs are genes that have diverged after a gene duplication event. For many comparative analyses, it is of interest to identify orthologs with similar functions. Such orthologs tend to support species divergence (ssd-orthologs) in the sense that they have diverged only due to speciation, to the same relative degree as their species. However, due to incomplete sequencing or gene loss in a species, predicted orthologs can sometimes be paralogs or other non-ssd-orthologs. To increase the specificity of ssd-ortholog prediction, Fulton et al. [Fulton, D., Li, Y., Laird, M., Horsman, B., Roche, F., Brinkman, F., 2006. Improving the specificity of high-throughput ortholog prediction. BMC Bioinformatics 7 (1), 270] developed Ortholuge, a bioinformatics tool that identifies predicted orthologs with atypical genetic divergence. However, when the initial list of putative orthologs contains a non-negligible number of non-ssd-orthologs, the cut-off values that Ortholuge generates for orthology classification are difficult to interpret and can be too high, leading to decreased specificity of ssd-ortholog prediction. Therefore, we propose a complementary statistical approach to determining cut-off values. A benefit of the proposed approach is that it gives the user an estimated conditional probability that a predicted ortholog pair is unusually diverged. This enables the interpretation and selection of cut-off values based on a direct measure of the relative composition of ssd-orthologs versus non-ssd-orthologs. In a simulation comparison of the two approaches, we find that the statistical approach provides more stable cut-off values and improves the specificity of ssd-ortholog prediction for low-quality data sets of predicted orthologs.
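The abstract does not spell out the model, so the following is only a minimal sketch of the general idea suggested by the keywords (mixture distribution, local fdr), not the authors' method and not Ortholuge's procedure. It assumes each predicted ortholog pair has a single hypothetical divergence score, fits a two-component normal mixture by EM, and reports the estimated conditional probability that a pair belongs to the unusually diverged component. The normal components, the simulated scores, and all function names are assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_diverged(x, weights, means, sds):
    """Estimated P(unusually diverged | score x) under the fitted mixture."""
    d0 = weights[0] * normal_pdf(x, means[0], sds[0])
    d1 = weights[1] * normal_pdf(x, means[1], sds[1])
    total = d0 + d1
    return d1 / total if total > 0 else 0.5


def fit_two_component_mixture(scores, n_iter=200):
    """Fit a two-component normal mixture by EM (illustrative only).

    Component 0 is meant to capture typical pairs, component 1 the
    unusually diverged pairs (the component with the larger mean).
    """
    ordered = sorted(scores)
    n = len(ordered)
    # Initialise the means at the 5th and 95th percentiles (an assumption).
    means = [ordered[n // 20], ordered[(19 * n) // 20]]
    overall_mean = sum(scores) / n
    overall_sd = math.sqrt(sum((s - overall_mean) ** 2 for s in scores) / n)
    sds = [overall_sd, overall_sd]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each score.
        resp = [posterior_diverged(s, weights, means, sds) for s in scores]
        # M-step: update weight, mean and sd of each component.
        for k in (0, 1):
            r = [ri if k == 1 else 1.0 - ri for ri in resp]
            rk = max(sum(r), 1e-12)
            weights[k] = rk / n
            means[k] = sum(ri * s for ri, s in zip(r, scores)) / rk
            var = sum(ri * (s - means[k]) ** 2 for ri, s in zip(r, scores)) / rk
            sds[k] = max(math.sqrt(var), 1e-6)
    if means[0] > means[1]:
        # Keep the convention that component 1 has the larger mean.
        weights.reverse()
        means.reverse()
        sds.reverse()
    return weights, means, sds


def cutoff_for_probability(threshold, weights, means, sds, steps=1000):
    """Smallest score between the two means whose posterior reaches threshold."""
    lo, hi = means[0], means[1]
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        if posterior_diverged(x, weights, means, sds) >= threshold:
            return x
    return None


# Simulated scores (hypothetical): 900 typical pairs and 100 diverged pairs.
random.seed(0)
scores = [random.gauss(1.0, 0.1) for _ in range(900)]
scores += [random.gauss(2.0, 0.3) for _ in range(100)]

weights, means, sds = fit_two_component_mixture(scores)
cutoff = cutoff_for_probability(0.5, weights, means, sds)
print("estimated share of diverged pairs:", round(weights[1], 3))
print("score at which P(diverged | score) reaches 0.5:", cutoff)
```

In this sketch the cut-off is read off a probability scale rather than a raw score scale, which is the kind of interpretability the abstract attributes to the statistical approach. A lower threshold flags more pairs as unusually diverged; a higher one flags fewer.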
| Year of publication: | 2011 |
|---|---|
| Authors: | Min, Jeong Eun ; Whiteside, Matthew D. ; Brinkman, Fiona S.L. ; McNeney, Brad ; Graham, Jinko |
| Published in: | Computational Statistics & Data Analysis. - Elsevier, ISSN 0167-9473. - Vol. 55.2011, 1, p. 935-943 |
| Publisher: | Elsevier |
| Keywords: | Orthologs ; Comparative genomics ; Local-fdr ; Mixture distribution |
Similar items by person
- Shin, Ji-Hyung, (2007)
- elrm: Software Implementing Exact-Like Inference for Logistic Regression Models
  Zamar, David, (2007)
- Shin, Ji-Hyung, (2006)