Wayne State University first-year freshmen retention Fall 2006 to Fall 2007: Using logistic regression to predict retention probabilities
Retention is a major consideration for post-secondary institutions, for reasons that encompass both economic factors and measures of institutional quality. It is important to be able to assess as early as possible in a student's post-secondary experience whether he or she is apt to persist. This study has two goals: to determine how best to create a predictive model, and to determine which variables affect the likelihood of a student's return. When creating a logistic regression model, it is important to consider the underlying data and whether the model is skewed toward a particular outcome. Two logistic regressions were run, one using the total population and one using a more evenly distributed sample, and the results were compared. The model based on the more evenly distributed sample is shown to be superior to the model based on the entire population. Various factors in retaining a student are then considered: the number of student credit hours elected, the student's high school GPA, and the student's comprehensive ACT score all play an important role in predicting a student's return. A student's race/ethnicity, gender, and receipt of need-based aid are also shown to play a role in retention prediction.
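
The two ideas in this abstract, drawing a more evenly distributed sample and turning a fitted logistic model into a retention probability, can be illustrated with a minimal sketch. It is not the study's procedure: the abstract does not say how the sample was drawn, so random undersampling of the larger outcome group is assumed here as one common option. The names `undersample_majority`, `retention_probability`, `retained`, and `hs_gpa` are hypothetical, and the coefficient values are placeholders, not results from the study.

```python
import math
import random


def undersample_majority(rows, label_key="retained", seed=0):
    """Randomly drop rows from the larger outcome group so both groups
    end up the same size (an assumed balancing scheme, not the study's)."""
    rng = random.Random(seed)
    returned = [r for r in rows if r[label_key] == 1]
    not_returned = [r for r in rows if r[label_key] == 0]
    n = min(len(returned), len(not_returned))
    balanced = rng.sample(returned, n) + rng.sample(not_returned, n)
    rng.shuffle(balanced)
    return balanced


def retention_probability(intercept, coefficients, student):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum of b_k * x_k)))."""
    z = intercept + sum(b * student[name] for name, b in coefficients.items())
    return 1.0 / (1.0 + math.exp(-z))


# Toy records for illustration only: 8 students who returned, 2 who did not.
toy_rows = [{"retained": 1, "hs_gpa": 3.2} for _ in range(8)]
toy_rows += [{"retained": 0, "hs_gpa": 2.4} for _ in range(2)]
balanced_rows = undersample_majority(toy_rows)

# Placeholder coefficients; a real model would estimate these from data.
p = retention_probability(-1.0, {"hs_gpa": 0.5}, {"hs_gpa": 3.0})
```

Undersampling discards records from the larger group, so it trades sample size for balance. Reweighting observations is an alternative that keeps every record.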