On the implementation and extension of BART
BART (Bayesian Additive Regression Trees) is a nonparametric regression approach based on a random sum-of-regression-trees model. In this thesis, we explore several key issues pertaining to its implementation and extension. First, we study how the performance of BART depends on the signal-to-noise ratio in the data. To do so, we construct a statistic analogous to the standard root mean squared prediction error and observe how it evolves as BART is run with a varying number of trees; we demonstrate that a plot of this statistic against the number of trees can be a useful diagnostic for assessing BART performance on a particular data set. Next, we study the "optimal" number of trees to use in BART. We note the difficulties inherent in choosing the number of trees manually and propose some possible alterations to the BART procedure that would allow the number of trees to change automatically. We then examine some of the factors that affect the coverage of the posterior intervals produced by BART, and identify types of points that suffer extremely low frequentist coverage. We compare the interval coverage results obtained using the Friedman function to those for a linear function with similar summary statistics. Finally, we sketch the basic framework required to extend BART to multivariate response data. We motivate this by first proposing the changes required to the priors, likelihoods, and posteriors for the multivariate extension of Bayesian CART under a few different sets of conditions. From there, it is fairly straightforward to adjust these calculations for the Bayesian "sum-of-trees" model with a multivariate response.
Type of publication: Other
Dissertations available from ProQuest