Balanced Gradient Boosting from Imbalanced Data for Clinical Outcome Prediction
Clinical outcome prediction tasks such as disease diagnosis and prognosis often assume that the classes, e.g., disease and control, are equally distributed. In practice, however, biological and clinical data frequently have a highly skewed class distribution. Because standard supervised learning algorithms aim to maximize overall prediction accuracy, a model trained on such imbalanced data tends to be strongly biased toward the majority class. The class distribution should therefore be incorporated appropriately when learning from imbalanced data. To address this practically important problem, we propose balanced gradient boosting (BalaBoost), which reformulates gradient boosting to use an equal class distribution instead of the empirical class distribution, so that it avoids overfitting to the majority class and remains sensitive to the minority class. We applied BalaBoost to three tasks with highly skewed class distributions: cancer tissue diagnosis based on miRNA expression data, premature death prediction for diabetes patients based on biochemical and clinical variables, and tumor grade prediction for renal cell carcinoma based on tumor marker expression. Experimental results showed that BalaBoost outperformed representative supervised learning algorithms, namely gradient boosting, Random Forests, and Support Vector Machines. We conclude that BalaBoost is promising for clinical outcome prediction from imbalanced data.
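The idea of replacing the empirical class distribution with an equal one can be illustrated with a small sketch. This is not the BalaBoost formulation from the paper, which this record does not reproduce; it is a generic class-balanced weighting of the logistic-loss gradient, assuming binary labels in {0, 1}. The function names `balanced_weights` and `weighted_pseudo_residuals` are hypothetical and chosen only for illustration.

```python
import math


def balanced_weights(labels):
    """Per-sample weights under which every class carries the same total weight.

    Assumption (not from the paper): weight = n / (k * n_class), where n is the
    number of samples and k the number of classes.
    """
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]


def weighted_pseudo_residuals(labels, scores, weights):
    """Negative gradient of the weighted logistic loss with respect to the scores.

    A boosting round would fit its next weak learner to these values.
    """
    return [
        w * (y - 1.0 / (1.0 + math.exp(-f)))
        for y, f, w in zip(labels, scores, weights)
    ]


# Toy data: nine majority-class samples (0) and one minority-class sample (1).
labels = [0] * 9 + [1]
weights = balanced_weights(labels)

# Start from a score of 0 for every sample, i.e. predicted probability 0.5.
residuals = weighted_pseudo_residuals(labels, [0.0] * len(labels), weights)
```

With these weights the single minority sample pulls as hard as the nine majority samples combined, so the residuals sum to zero at a predicted probability of 0.5 rather than at the empirical minority rate of 0.1.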
Year of publication: | 2009 |
---|---|
Authors: | Teramoto, Reiji |
Published in: | Statistical Applications in Genetics and Molecular Biology. - Berkeley Electronic Press. - Vol. 8.2009, 1, p. 20-20 |
Publisher: | Berkeley Electronic Press |
Subject: | clinical outcome | diagnosis | cancer | diabetes | renal cell carcinoma | ensemble learning | boosting | cost-sensitive learning | imbalanced data |
Online Resource
Similar items by subject
- Data science for insurance fraud detection : a review
  Banulescu-Radu, Denisa, (2025)
- Lobo, Armindo, (2024)
- Predicting the length of stay in hospital emergency rooms in Rhode Island
  Lamere, Alicia T., (2021)