Comparison of Tree-Based Methods for Prognostic Stratification of Survival Data
Tree-based methods can be used to generate rules for the prognostic classification of patients, expressed as logical combinations of covariate values. Several splitting algorithms have been proposed for generating trees from survival data. However, choosing an appropriate algorithm is difficult and may also depend on clinical considerations. By means of a prognostic study of patients with gallbladder stones and of a simulation study, we compare the following splitting algorithms: logrank statistic adjusted for measurement scale with (AP) and without (AU) pruning, exponential log-likelihood loss (EP), Kaplan-Meier distance of survival curves (KP), unadjusted logrank statistic (LP), martingale residuals (MP), and node impurity (ZP). With the exception of the AU algorithm, which uses a stopping rule driven by Bonferroni-adjusted p values, trees are pruned using the split-complexity measure, and optimally sized trees are selected by cross-validation. The integrated Brier score is used to evaluate the predictive models. According to the results of our simulation study and of the clinical example, we conclude that the AU, AP, EP, and LP algorithms may yield superior predictive accuracy. The choice among these four algorithms may be based on the required parsimony and on medical considerations.
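
To illustrate the kind of splitting criterion compared here, the following is a minimal sketch of a split search based on the unadjusted two-sample logrank statistic, the criterion underlying the LP algorithm. It is not the implementation used in the study: the function names `logrank_statistic` and `best_split` are hypothetical, candidate cut points are assumed to be midpoints between adjacent distinct covariate values, and neither the adjustment for measurement scale (AP, AU) nor pruning and cross-validation is included.

```python
def logrank_statistic(times, events, in_left):
    """Two-sample logrank chi-square statistic (1 degree of freedom).

    times   : observed follow-up times
    events  : 1 if the event was observed, 0 if censored
    in_left : True if the subject falls into the left child node
    """
    observed_minus_expected = 0.0
    variance = 0.0
    # Loop over the distinct times at which at least one event occurred.
    for t in sorted({u for u, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)  # at risk, both groups
        n_left = sum(1 for u, g in zip(times, in_left) if u >= t and g)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d_left = sum(
            1 for u, e, g in zip(times, events, in_left) if u == t and e and g
        )
        # Observed minus expected events in the left group at time t.
        observed_minus_expected += d_left - d * n_left / n
        # Hypergeometric variance; undefined when only one subject is at risk.
        if n > 1:
            variance += d * (n_left / n) * (1 - n_left / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0


def best_split(times, events, covariate):
    """Return (cut point, statistic) maximizing the unadjusted logrank statistic.

    Assumption of this sketch: candidate cut points are the midpoints
    between adjacent distinct covariate values.
    """
    values = sorted(set(covariate))
    best_cut, best_stat = None, 0.0
    for lower, upper in zip(values, values[1:]):
        cut = (lower + upper) / 2
        in_left = [x <= cut for x in covariate]
        stat = logrank_statistic(times, events, in_left)
        if stat > best_stat:
            best_cut, best_stat = cut, stat
    return best_cut, best_stat
```

Because the maximum is taken over many candidate cut points, the selected statistic tends to favor covariates with many distinct values, which is the motivation for the adjusted variants considered above.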