Inverse Problems in High Dimensional Stochastic Systems Under Uncertainty.
Increasingly often, problems in modern medicine, quantitative finance, or social networking involve tens of thousands of variables that interact with each other and jointly evolve over time. The states of these variables may correspond to the phenotype of a particular individual, the price of a security, or the current status of an individual's social networking profile. If these states are hidden from a researcher, they must be inferred from measurements of other variables, knowledge of the interacting network structure, and any dynamics that model the evolution of these states. This dissertation addresses general problems of reasoning under uncertainty in such spatio-temporal models, with an emphasis on applications in predictive health and disease in a loosely monitored population of individuals. The motivation is highly interdisciplinary and draws on tools and concepts from machine learning, statistics, epidemiology, bioinformatics, and physics. We begin by presenting a solution to the problem of recursively sampling the subset of nodes/variables that elicits the largest expected information gain over all sampled and unsampled nodes in a large spatio-temporal complex network. We then present a tractable method for empirically estimating the spatio-temporal graphical model structure corresponding to the "susceptible", "infected", and "recovered" (SIR) model of mathematical epidemiology. Here, we formulate the problem as an L1-penalized likelihood convex program and achieve network detection performance superior to comparable state-of-the-art methods. We also present a logistic regression classifier that is robust to worst-case bounded measurement uncertainty. The proposed method achieves better worst-case detection performance than the standard L1-logistic regression classifier on a human rhinovirus (HRV) gene expression data set.
The final chapter identifies the appropriate basis functions for a classification model when the data are both high-dimensional and temporally sampled, with the ultimate goal of discriminating between multiple states/labels, e.g., phenotypes. We use Gaussian processes and L1-logistic regression to accomplish this task and apply the approach to a human gene expression time-series data set resulting from a challenge study inoculation with human influenza A/H3N2, HRV, and human respiratory syncytial virus (RSV).
Year of publication: | 2010 |
---|---|
Authors: | Harrington Jr., Patrick Lloyd |
Subject: | Inverse Problems | High Dimensional Model Selection | Robust Optimization | Statistics and Numeric Data | Science |
Similar items by subject
- Multiple Imputation for Measurement Error Correction Based on a Calibration Sample.
  Guo, Ying (2010)
- Singhal, Harsh (2009)
- Sparse Estimation of High-Dimensional Covariance Matrices.
  Rothman, Adam J. (2010)