Data Science Techniques to Predict Student Grades, Step by Step, Using Machine Learning Algorithms

Raghu Bayya
Published in Analytics Vidhya
4 min read · Dec 18, 2019

We also look at other factors that influence student grades.

Model architecture design, inspired by Rangoli design patterns.

This article is broken up into 3 parts.

  1. In the first part, we look at how the number of hours of absence in a course influences a student's final grade, and we prepare the data for a machine learning model.
  2. In the second part, we build a decision tree to predict the likely outcome for a student's final grade, and we plot the relation between grades and consultations, and in another plot the relation between grades and absence.
  3. In the third part, we build a neural network with Keras and TensorFlow to predict student grades from the number of hours of absence in the course and the number of consultations, both of which influence student performance.

Packages

This image, from the student grade project's Jupyter notebook, shows all the packages used in the analysis and prediction.
Jupyter Notebook

Topics covered in this article

  1. Exploratory data analysis
  2. Data preparation
  3. Data Preprocessing
  4. Decision Tree classification
  5. Normalizing Dataset
  6. Building the model
  7. Model evaluation
  8. Performance measure
  9. Save the model to file

We load the data with the pandas package, as follows.

source: Jupyter Notebook
source: Jupyter Notebook
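As a rough sketch of this loading step: the real file name and full column list are not shown in the text, so the example below reads a small made-up CSV from memory. The DataFrame name `std` and the column names `Finalgrades`, `ConsultationsD1`, `ConsultationsD2`, `Grades` and `Hrabsence` follow the names used in this article; the rows themselves are invented for illustration.

```python
import io

import pandas as pd

# Made-up stand-in for the article's data file (the real file is not shown).
csv_text = """Hrabsence,ConsultationsD1,ConsultationsD2,Grades,Finalgrades
2,3,2,85,pass
10,1,0,55,redo
25,0,0,30,retake
4,2,3,78,pass
"""

# pd.read_csv accepts a file path or any file-like object.
std = pd.read_csv(io.StringIO(csv_text))

print(std.head())   # first rows of the dataset
print(std.dtypes)   # column types; Finalgrades is a text column
```

With a real file, the `io.StringIO(...)` argument would simply be replaced by the path to the CSV.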

Most machine learning algorithms expect numeric input, but in our dataset the final grades are text values. To transform this categorical text data into a numeric, machine-readable format, we apply the LabelEncoder class from the sklearn package.

source: Jupyter Notebook

The call label_encoder.fit_transform(std['Finalgrades']) transforms the text values into numeric values: 0 = "pass", 1 = "redo", 2 = "retake".

source : Jupyter Notebook
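A minimal sketch of the encoding step on toy data. The rows and the new column name `Finalgrades_code` are assumptions made for illustration; the article's notebook may store the result differently.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data; only the Finalgrades column matters for this step.
std = pd.DataFrame({"Finalgrades": ["pass", "redo", "retake", "pass"]})

label_encoder = LabelEncoder()
# fit_transform learns the distinct labels and returns their integer codes.
# "Finalgrades_code" is a hypothetical column name.
std["Finalgrades_code"] = label_encoder.fit_transform(std["Finalgrades"])

# classes_ holds the labels in sorted order; the position is the code.
mapping = {str(label): code for code, label in enumerate(label_encoder.classes_)}
print(mapping)  # {'pass': 0, 'redo': 1, 'retake': 2}
```

The codes follow alphabetical order of the labels, which is why "pass" becomes 0, "redo" 1 and "retake" 2.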

Similar to the label encoder from sklearn, pandas can turn categorical values into a series of 0's and 1's, which makes them much easier to quantify and compare. The pandas get_dummies function translates the values inside a column into separate columns, one per category, which makes them more meaningful.

source : Jupyter Notebook
source : Jupyter Notebook
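A short sketch of the dummy-variable step with `pd.get_dummies`, again on invented rows. Passing `dtype=int` is a choice made here so the output shows 0's and 1's rather than booleans.

```python
import pandas as pd

# Toy data standing in for the article's dataset.
std = pd.DataFrame({"Finalgrades": ["pass", "redo", "retake", "pass"]})

# One new column per distinct grade; each row has a 1 in exactly one column.
dummies = pd.get_dummies(std["Finalgrades"], dtype=int)
print(dummies)
```

Unlike label encoding, this does not impose an order on the categories, which is often preferable when the values are used as input features.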

Using pandas, we concatenate the original dataframe with the dummy data. The result adds columns of 0's and 1's corresponding to each student's Finalgrades value.

source : Jupyter Notebook
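The concatenation might look like the sketch below. The name `newset` for the combined dataframe is taken from the text further on; whether the notebook assigns it at this exact step is an assumption, and the rows are made up.

```python
import pandas as pd

# Toy data standing in for the article's dataset.
std = pd.DataFrame({
    "Hrabsence": [2, 10, 25, 4],
    "Finalgrades": ["pass", "redo", "retake", "pass"],
})
dummies = pd.get_dummies(std["Finalgrades"], dtype=int)

# axis=1 places the dummy columns side by side with the original columns,
# aligning rows on the shared index.
newset = pd.concat([std, dummies], axis=1)
print(newset)
```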

The dataset contains the columns ConsultationsD1 and ConsultationsD2, which are the number of times a student consulted or visited the professor to discuss their project. As has long been said, the better we understand the problem, the better the solution we get. In the next article I will show how the amount of consulting with the professor influences student grades.

source : Jupyter Notebook

We create a new column, Consultations, in the dataset newset to combine the values of ConsultationsD1 and ConsultationsD2 into a single column, and then delete the two original columns using pandas.
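A minimal sketch of this step, assuming "combine" means adding the two counts together (the text does not say how the values are combined). The rows are invented.

```python
import pandas as pd

# Toy data standing in for the article's dataset.
newset = pd.DataFrame({
    "ConsultationsD1": [3, 1, 0, 2],
    "ConsultationsD2": [2, 0, 0, 3],
    "Finalgrades": ["pass", "redo", "retake", "pass"],
})

# Assumption: the combined value is the total number of consultations.
newset["Consultations"] = newset["ConsultationsD1"] + newset["ConsultationsD2"]

# Drop the two source columns now that their information is merged.
newset = newset.drop(columns=["ConsultationsD1", "ConsultationsD2"])
print(newset)
```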

Exploratory data analysis: we plot the relation between Grades and Consultations using a seaborn regression plot.

We also plot the relation between Grades, Consultations and Hrabsence using seaborn.

source: Jupyter Notebook
Source: Jupyter Notebook
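The two regression plots could be sketched as follows. The data is made up, and treating `Grades` as a numeric column is an assumption; the layout (two panels side by side) is also a choice made here rather than the notebook's.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy data; Grades is assumed to be a numeric score.
newset = pd.DataFrame({
    "Consultations": [5, 1, 0, 5, 3, 2],
    "Hrabsence": [2, 10, 25, 4, 8, 15],
    "Grades": [85, 55, 30, 78, 66, 50],
})

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# regplot draws a scatter plot plus a fitted linear regression line.
sns.regplot(x="Consultations", y="Grades", data=newset, ax=axes[0])
axes[0].set_title("Grades vs. Consultations")

sns.regplot(x="Hrabsence", y="Grades", data=newset, ax=axes[1])
axes[1].set_title("Grades vs. Hrabsence")

fig.tight_layout()
```

In a Jupyter notebook the figure renders inline; in a script, `plt.show()` or `fig.savefig(...)` would be needed to see it.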

Next, we look at the total count of students who pass, redo and retake.

source: Jupyter Notebook
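One simple way to get these counts is `value_counts`, sketched below on invented rows. The notebook may use a different call, for example a count plot.

```python
import pandas as pd

# Toy data standing in for the article's dataset.
std = pd.DataFrame({
    "Finalgrades": ["pass", "redo", "retake", "pass", "redo", "pass"],
})

# Number of students in each final-grade category, largest first.
counts = std["Finalgrades"].value_counts()
print(counts)
```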

Next, we compare student Grades with Finalgrades.

source: Jupyter Notebook
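The text does not say how the comparison is made. One possible approach, assuming `Grades` is a numeric score, is to summarise it within each `Finalgrades` category, as in this sketch on made-up data.

```python
import pandas as pd

# Toy data; Grades is assumed to be numeric.
newset = pd.DataFrame({
    "Grades": [85, 55, 30, 78, 60, 25],
    "Finalgrades": ["pass", "redo", "retake", "pass", "redo", "retake"],
})

# Count, mean, minimum and maximum score per final-grade category.
summary = newset.groupby("Finalgrades")["Grades"].agg(["count", "mean", "min", "max"])
print(summary)
```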

Now we check whether the number of hours of absence in the course is correlated with student grades.

We use a crosstab, which aggregates counts across two or more columns that contain categorical values, to get a quick summary.

source : Jupyter Notebook
source: Jupyter Notebook
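A small sketch of the cross-tabulation with `pd.crosstab`. The rows are invented, and hours of absence are kept to a few distinct values so the table stays small.

```python
import pandas as pd

# Toy data standing in for the article's dataset.
newset = pd.DataFrame({
    "Hrabsence": [0, 0, 5, 5, 10, 10, 0, 5, 10],
    "Finalgrades": ["pass", "pass", "pass", "redo", "redo",
                    "retake", "pass", "redo", "retake"],
})

# Rows: hours of absence. Columns: final grade. Cells: number of students.
table = pd.crosstab(newset["Hrabsence"], newset["Finalgrades"])
print(table)
```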

From the bar graph, we can see a notable relation between the number of hours of absence in the course and Finalgrades.

source : Jupyter Notebook
Source : Jupyter Notebook
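A bar graph of this kind can be drawn directly from a crosstab, as sketched here. The data is made up, and the stacked layout and axis labels are choices made for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy data standing in for the article's dataset.
newset = pd.DataFrame({
    "Hrabsence": [0, 0, 5, 5, 10, 10, 0, 5, 10],
    "Finalgrades": ["pass", "pass", "pass", "redo", "redo",
                    "retake", "pass", "redo", "retake"],
})
table = pd.crosstab(newset["Hrabsence"], newset["Finalgrades"])

# One bar per absence value, split by final grade.
ax = table.plot(kind="bar", stacked=True, figsize=(6, 4))
ax.set_xlabel("Hours of absence (Hrabsence)")
ax.set_ylabel("Number of students")
ax.figure.tight_layout()
```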

Chi-Square Test

source : Jupyter Notebook
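A chi-square test of independence on the crosstab can be sketched with `scipy.stats.chi2_contingency`. Using SciPy here is an assumption, since the notebook's code is not shown, and the toy data is far too small for the resulting p-value to mean anything.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy data standing in for the article's dataset.
newset = pd.DataFrame({
    "Hrabsence": [0, 0, 5, 5, 10, 10, 0, 5, 10],
    "Finalgrades": ["pass", "pass", "pass", "redo", "redo",
                    "retake", "pass", "redo", "retake"],
})
table = pd.crosstab(newset["Hrabsence"], newset["Finalgrades"])

# Null hypothesis: hours of absence and final grade are independent.
chi2, p_value, dof, expected = chi2_contingency(table)

print("chi-square statistic:", chi2)
print("p-value:", p_value)
print("degrees of freedom:", dof)  # (rows - 1) * (columns - 1)
```

A small p-value (commonly below 0.05) would suggest that absence and final grade are not independent.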

Part 1: GitHub / Jupyter Notebook source

In the next article, part 2, we will discuss using a decision tree to find the possible outcome of a grade.

About Author : Raghu Bayya, Data Scientist ML/Deep Learning.

Expert in Big Data
