Data Science Techniques to Predict Students Grade, step by step using machine learning algorithms.
Also other factors that influence the student grades.
This Article is broken up into 3 parts.
- In the first part, we understand the factors how no of hours of absence in course will influence the final grade of students and prepare data for machine learning model.
- In second part we build decision tree to predict the chances of event outcome for student final grades and plot relation between grades and consultations and with another plot relation between grades and Absence.
- In third part we build neural network to predict the student grades using no of hours of absence in course and Consultations which influence the student performances. Using keras and Tenserflow.
Packages
In this article topics covered
- Exploratory data analysis.
- Data preparation
- Data Preprocessing
- Decision Tree classification
- Normalizing Dataset
- Building the model
- Model evaluation
- Performance measure
- Save the model to file
Loading data, by using simple pandas package as follows.
Machine learning algorithm can perform better on numeric values but in our dataset Final grades are text values. To transform categorical text data into numeric machine readable format from sklearn package we apply Label Encoder() method.
label_encoder.fit_transform(std[‘Finalgrades’]) method transforms text value into numeric values as 0 = “pass” , 1=”redo”, 2=”retake”.
In similar to label encoder from sklearn, using pandas package which can turn to categorical values into series of 0’s & 1’s. which make lot easier to quantify and compare. This can translate values inside the columns into columns to make more meaningful. using pandas dummy package.
Using pandas package concat dataframe with dummy data, the resultant is combination of 0’s & 1’s columns values with respective to student finalgrades.
In the dataset it contain ConsultationsD1 & ConsultationsD2 is nothing but number of time students consult or visited professor to discuss about they project. It is also explained in past, the more we understand the problem better we get the solution. In the next article i will show how the degree on consulting or visit with professor will influence students grade.
Creating new column in dataset newset as Consultations, to combine the value of other consultationd1 and consultationd2 into single column. And deleting remaining columns using pandas package.
Exploratory Data analysis : To plot relation between Grades and Consultations. using seaborn package regression plot.
Plot Relation between Grades, Consultations and Hrabsence , using seaborn package.
To see total count of student who pass, redo and retake.
Next, to see and compare student Grades with Finalgrades
And now, to see if no of hr of absence in course is correlated to student grades.
Using Crosstab which aggregate matrices among two or more columns in dataset which contain categorical values, to get quick summery.
From the bar graph, we can correlate the significant relation between No of hr of absence in course with Finalgrades.
Chi-Square Test
Part 1 : Github/Jupyternotebook source
In next Article part 2, we will discuses using decision tree to find possible outcome of grade.
About Author : Raghu Bayya, Data Scientist ML/Deep Learning.
Expert in Big Data