Predicting students’ dropout and academic success

Priya Shahari
Dec 2, 2023


Hey there! Ever wondered why some students leave university early or why others do really well in their studies? Let’s dive in and figure out what might be behind these things! 📚✨

First off, think about your friends at university. Some might always get top grades, while others might struggle a bit or even decide to leave school before finishing. What makes these differences happen?

Well, there are lots of things that could play a part. Things like how much someone likes school, how they’re doing in their classes, if they face challenges outside of university, or even how supported they feel can all affect what happens with their education.

Imagine we’re like detectives, trying to find clues. We’ll look at different things — like how often students attend class, their grades, if they have enough support at home, or if they face any problems that make university hard for them.

The same pattern shows up in the workplace: just as students may struggle without access to resources or opportunities to grow, employees often look for new jobs when they see limited career development or feel stagnant in their roles (employee attrition).

By exploring these things, we hope to better understand what helps students stay and succeed at school or university, and what might make it tough for some. It’s a bit like putting puzzle pieces together to see the bigger picture!

Data Preparation:

This dataset comes from a higher education institution and covers a range of variables about undergraduate students, including demographics, socioeconomic factors, and academic performance. We use it to investigate how these factors affect student dropout and academic success.

  • Marital status: The marital status of the student. (Categorical)
  • Application mode: The method of application used by the student. (Categorical)
  • Application order: The order in which the student applied. (Numerical)
  • Course: The course taken by the student. (Categorical)
  • Daytime/evening attendance: Whether the student attends classes during the day or in the evening. (Categorical)
  • Previous qualification: The qualification obtained by the student before enrolling in higher education. (Categorical)
  • Nationality: The nationality of the student. (Categorical)
  • Mother’s qualification: The qualification of the student’s mother. (Categorical)
  • Father’s qualification: The qualification of the student’s father. (Categorical)
  • Mother’s occupation: The occupation of the student’s mother. (Categorical)
  • Father’s occupation: The occupation of the student’s father. (Categorical)
  • Displaced: Whether the student is a displaced person. (Categorical)
  • Educational special needs: Whether the student has any special educational needs (Categorical)
  • Debtor: Whether the student is a debtor. (Categorical)
  • Tuition fees up to date: Whether the student’s tuition fees are up to date. (Categorical)
  • Gender: The gender of the student. (Categorical)
  • Scholarship holder: Whether the student is a scholarship holder (Categorical)
  • Age at enrollment: The age of the student at the time of enrollment. (Numerical)
  • International: Whether the student is an international student. (Categorical)
  • Curricular units 1st sem (credited): The number of curricular units credited by the student in the first semester. (Numerical)
  • Curricular units 1st sem (enrolled): The number of curricular units the student enrolled in during the first semester. (Numerical)
  • Curricular units 1st sem (evaluations): The number of evaluations the student took for curricular units in the first semester. (Numerical)
  • Curricular units 1st sem (approved): The number of curricular units the student passed in the first semester. (Numerical)
  • Target: The variable we are going to predict, i.e., whether the student dropped out or graduated.

Converting the target values to integers: Label encoding in PredictEasy turns each unique value of ‘Target’ into an integer, which gives us numerical data for the analysis.

Data preparation
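For readers who want to see what label encoding does outside the add-on, here is a minimal sketch in plain Python. It is not PredictEasy's own code: the `label_encode` helper, the alphabetical ordering of the codes, and the toy rows are all assumptions made for illustration.

```python
def label_encode(values):
    """Map each distinct label to an integer code (alphabetical order, an assumption)."""
    classes = sorted(set(values))
    mapping = {label: code for code, label in enumerate(classes)}
    return [mapping[v] for v in values], mapping


# Toy 'Target' column: made-up rows, not taken from the dataset.
target = ["Graduate", "Dropout", "Graduate", "Dropout", "Graduate"]

encoded, mapping = label_encode(target)
print(mapping)
print(encoded)
```

Keeping the mapping around matters: it is the only way to translate a predicted integer back into a readable label later.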

PredictEasy Analysis:

A classification model was built with PredictEasy, a Google Sheets add-on. To learn more about how to use the tool, please refer to my previous blog posts.

We start by selecting every feature as X and the target variable as Y. Once that is done, we get the summary:

The predictive model achieved an accuracy of 0.84, indicating that it correctly classified 84% of the instances.

Accuracy
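Accuracy is simply the share of predictions that match the true labels. The sketch below shows the arithmetic on a handful of made-up predictions. The `accuracy` helper and the sample labels are hypothetical and are not the model's real output.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up encoded labels (1 = Graduate, 0 = Dropout), for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 1, 1, 0]

# 8 of the 10 predictions match, so this prints 0.8.
print(accuracy(y_true, y_pred))
```

An accuracy of 0.84 means the same thing at a larger scale: 84 out of every 100 students were placed in the right class.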

The confusion matrix summarizes the performance of the classification model and shows which kinds of errors it makes when predicting:

Confusion Matrix
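To make the idea concrete, here is a small sketch of how a confusion matrix is counted: rows are the true classes, columns are the predicted classes. The `confusion_matrix` helper and the labels are made up for illustration and do not reproduce the matrix from the tool.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels, in the order of `labels`."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Made-up labels, for illustration only.
actual = ["Dropout", "Dropout", "Graduate", "Graduate", "Graduate", "Dropout"]
predicted = ["Dropout", "Graduate", "Graduate", "Graduate", "Dropout", "Dropout"]

matrix = confusion_matrix(actual, predicted, ["Dropout", "Graduate"])
for row in matrix:
    print(row)
```

The diagonal holds the correct predictions. Everything off the diagonal is an error, and its position tells you which class was mistaken for which.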

The correlation plot below shows how strongly each pair of variables moves together. As we can see, very few of the variables are correlated with each other, which means that a change in one variable tells us little about the others. Since we want to predict Target (Y), it helps when the features (X) in this plot are only weakly correlated with each other, because strongly correlated features carry redundant information.

Correlation plot
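Each cell of a correlation plot like this is typically a Pearson correlation coefficient between two columns. That PredictEasy uses Pearson is an assumption here. The sketch below computes the coefficient by hand for two made-up columns. The `pearson` helper and the numbers are illustrative only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values: units enrolled vs. units approved in the first semester.
enrolled = [5, 6, 6, 7, 8]
approved = [4, 5, 6, 6, 8]

r = pearson(enrolled, approved)
print(round(r, 2))
```

Values close to 1 or -1 mean the two columns rise and fall together (or in opposite directions), while values near 0 mean there is little linear relationship.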

The Dropout Story:

Feature Rank

From our analysis, we get several interesting facts:

  1. Application mode matters, and age at enrollment is linked to a higher dropout rate than any other factor.
  2. Scholarship holders tend to drop out less often than non-scholarship holders.
  3. Gender, marital status, and parents’ occupations do not appear to matter much for dropout.
  4. Students whose tuition fees are not up to date are about twice as likely to drop out as students whose fees are up to date.
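Comparisons like these come down to computing a dropout rate per group. Here is a minimal sketch of that calculation, using scholarship status as the grouping feature. The `dropout_rate_by` helper is hypothetical and the rows are made up, so the resulting rates are not the dataset's real numbers.

```python
from collections import defaultdict


def dropout_rate_by(rows, feature):
    """Share of students with Target == 'Dropout' within each value of `feature`."""
    totals = defaultdict(int)
    dropouts = defaultdict(int)
    for row in rows:
        key = row[feature]
        totals[key] += 1
        if row["Target"] == "Dropout":
            dropouts[key] += 1
    return {key: dropouts[key] / totals[key] for key in totals}


# Made-up rows (1 = scholarship holder, 0 = not), for illustration only.
rows = [
    {"Scholarship holder": 1, "Target": "Graduate"},
    {"Scholarship holder": 1, "Target": "Graduate"},
    {"Scholarship holder": 1, "Target": "Dropout"},
    {"Scholarship holder": 1, "Target": "Graduate"},
    {"Scholarship holder": 0, "Target": "Dropout"},
    {"Scholarship holder": 0, "Target": "Graduate"},
    {"Scholarship holder": 0, "Target": "Dropout"},
    {"Scholarship holder": 0, "Target": "Graduate"},
]

rates = dropout_rate_by(rows, "Scholarship holder")
print(rates)
```

Swapping in another column name, such as "Tuition fees up to date", gives the same kind of comparison for that feature.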

Potential Ideas:

  • Focus on keeping students’ tuition fees up to date. Measures that remind and support students to pay their fees on time could be beneficial.
  • Providing additional support or resources to students who are not scholarship recipients may help improve their performance and increase their chances of graduating.
  • Monitoring second-semester grades and addressing any issues early could also contribute to better predictions.
