Modelling Student Dropout using AdaBoost and Survival Analysis
Abstract
The average graduation rate of UPD COE freshmen admitted between 2009 and 2013 is 66.89%. This rate is quite low compared with other schools, which makes it important to investigate student dropout as well. Existing studies have used a variety of models to predict student dropout, drawing on both pre-enrollment data and per-semester student performance data. Among these models, the AdaBoost and Cox models consistently performed well. In this study, an AdaBoost model and a time-varying Cox model were used to predict whether a student drops out, predict when a student will drop out, and analyze the features that lead to dropout. Hazard ratios from the Cox model indicate whether each feature increases or decreases the risk of dropout. Both pre-enrollment and post-enrollment data were used in the analysis. A higher number of semesters of absence without leave increases the risk of dropout, while high school GWA and acceptance into the student's first- or second-choice degree program decrease it. These features were found to be significant factors in dropout risk for both 4-year and 5-year programs. Of the two models, the AdaBoost model performed better at predicting both student dropout and drop time. The results of the models can be used to help identify at-risk students as early as possible and to guide them according to their specific needs.
Keywords— adaboost, machine learning, student dropout, student retention, survival analysis