A Fresh Look At The Graduate Admissions Dataset

An in-depth analysis of Graduate Admission data from an Indian perspective.

Shaan Shah
Analytics Vidhya
May 13, 2020


Picture credit: Bryan Alexander via Flickr

General Analysis and Information

The Graduate Admissions dataset describes the chance of admission for Indian students. It contains the following columns, the last of which (Chance of Admit) is the value being predicted from the others:

1. GRE Score

2. TOEFL Score

3. University Rating (out of 5)

4. SOP (Statement of Purpose) and LOR (Letter of Recommendation) Strength (out of 5)

5. Undergraduate CGPA, also called CPI (out of 10)

6. Research Experience (0 for no experience, 1 for some experience)

7. Chance of Admit (between 0 and 1)

The model used for plots such as the PDPs and the feature importance plot is a Random Forest, which predicted ‘Chance of Admit’ on a validation set with a score of nearly 96%. Some general characteristics of the data are as follows:

The feature importance plot shows how strongly each feature influences the model's prediction of the target, ‘Chance of Admit’ in our case. It turns out that the chance of admission depends most on the candidate's CGPA, followed by the standardized test scores (this plot will be referenced again for later inferences). Also check out the dendrogram given below. A dendrogram is a graphical way of showing how strongly the features are correlated with one another. Reading from the right side, the earlier two features are joined, the more closely related they are.
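The article does not say how its importance scores were computed; scikit-learn's Random Forest, for example, exposes impurity-based importances as `feature_importances_`. As a swapped-in, model-agnostic alternative, permutation importance shuffles one feature at a time and measures how much the model's score drops. The minimal sketch below assumes R² as the score (the article only reports a percentage) and uses a hypothetical linear `toy_model` with made-up rows in place of the trained Random Forest and the real data.

```python
import random

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def permutation_importance(predict, rows, y, n_features, n_repeats=10, seed=0):
    """Average drop in R^2 when one feature column is shuffled across rows."""
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(r) for r in rows])
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # breaks the link between feature j and the target
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - r2_score(y, [predict(r) for r in shuffled]))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical stand-in for the trained Random Forest: (CGPA, GRE, Research) -> chance.
def toy_model(row):
    cgpa, gre, research = row
    return 0.12 * (cgpa - 6.8) + 0.002 * (gre - 290) + 0.02 * research + 0.35

# Made-up rows, NOT the real dataset; targets come straight from the toy model.
data_rng = random.Random(42)
rows = [(data_rng.uniform(6.8, 9.9), data_rng.uniform(290, 340), data_rng.randint(0, 1))
        for _ in range(200)]
y = [toy_model(r) for r in rows]
imp = permutation_importance(toy_model, rows, y, n_features=3)
for name, score in zip(["CGPA", "GRE", "Research"], imp):
    print(f"{name}: {score:.4f}")
```

Shuffling CGPA hurts this toy model the most, but only because the toy model was built that way; on the real data the ranking has to come from the trained forest.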

The dendrogram seems to show that the following features are closely related to each other:

  1. SOP Strength and the University Rating
  2. The CGPA, GRE Score and TOEFL Score
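A dendrogram like this one is commonly built by hierarchical clustering on a rank-correlation matrix, though the article does not state its exact method. The pure-Python sketch below assumes Spearman correlation, a distance of 1 - |rho| and average linkage; in practice `scipy.stats.spearmanr` together with `scipy.cluster.hierarchy.linkage` and `dendrogram` does the same job. The feature values are hypothetical.

```python
import math
from itertools import combinations

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def average_linkage_merges(columns):
    """Agglomerative clustering with distance 1 - |rho| and average linkage.
    Returns the merges in order as (cluster_a, cluster_b, distance)."""
    dist = {}
    for a, b in combinations(columns, 2):
        dist[a, b] = dist[b, a] = 1 - abs(spearman(columns[a], columns[b]))
    clusters = [(name,) for name in columns]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Average linkage: mean pairwise distance between the two clusters.
            d = sum(dist[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))
            if best is None or d < best[2]:
                best = (c1, c2, d)
        clusters.remove(best[0])
        clusters.remove(best[1])
        clusters.append(best[0] + best[1])
        merges.append(best)
    return merges

# Hypothetical feature values, NOT taken from the dataset.
columns = {
    "GRE": [300, 310, 320, 330, 340, 315],
    "TOEFL": [100, 104, 108, 112, 118, 105],
    "CGPA": [7.5, 8.0, 8.6, 9.1, 9.6, 8.2],
    "Research": [0, 1, 0, 0, 1, 1],
}
merges = average_linkage_merges(columns)
for a, b, d in merges:
    print(f"{a} + {b} at distance {d:.3f}")
```

Here GRE, TOEFL and CGPA merge at distance zero because their made-up values rise together in the same order, while Research joins last; a dendrogram draws exactly this merge sequence.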

Influence of GRE and TOEFL Scores on Chance of Admit

The data can be interpreted using the following graphs:

The scatter plot seems to imply that ‘Chance of Admit’ increases as the standardized test scores increase. For more clarity, look at the following graphs:

These plots clearly suggest that ‘Chance of Admit’ increases with the standardized test scores. To check that the increase comes from the scores themselves, rather than from closely related features such as CGPA, a PDP (Partial Dependence Plot) was made using a Random Forest. A PDP uses a trained model to estimate how the prediction changes as one variable is varied while every other variable keeps its observed value, averaging the result over the whole dataset. The graphs came out as follows, and they support the conclusion that the chance of selection increases with the standardized test scores:

Both of these are PDPs. The bright yellow line in the middle shows the average relationship between chance of admission and the standardized test scores, with the effect of the other factors averaged out.
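The averaging a PDP performs can be sketched in a few lines of Python. Each row gives one curve (an individual conditional expectation, or ICE, curve) by sweeping a single feature over a grid while keeping that row's other values; the PDP is the average of those curves. In plots of this kind the thin lines are typically the ICE curves and the thick highlighted line is their average. `toy_model` and the rows are hypothetical stand-ins for the trained Random Forest and the dataset.

```python
def ice_curves(predict, rows, feature, grid):
    """One curve per row: the prediction as `feature` sweeps over `grid`,
    with that row's other feature values held fixed."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature] = value  # overwrite only the feature being analysed
            curve.append(predict(modified))
        curves.append(curve)
    return curves

def partial_dependence(predict, rows, feature, grid):
    """The PDP: the average of all ICE curves at each grid value."""
    curves = ice_curves(predict, rows, feature, grid)
    return [sum(curve[i] for curve in curves) / len(curves) for i in range(len(grid))]

# Hypothetical stand-in for the trained Random Forest: (GRE, TOEFL, CGPA) -> chance.
def toy_model(row):
    gre, toefl, cgpa = row
    return 0.002 * (gre - 290) + 0.003 * (toefl - 92) + 0.1 * (cgpa - 6.8) + 0.3

# Hypothetical applicant rows, NOT taken from the dataset.
rows = [(310, 104, 8.2), (325, 110, 9.0), (300, 98, 7.4)]
gre_grid = [300, 320, 340]
pdp = partial_dependence(toy_model, rows, feature=0, grid=gre_grid)
print([round(p, 3) for p in pdp])
```

Because this toy model is linear, every ICE curve is parallel and the PDP simply rises by 0.04 per 20 GRE points; a Random Forest produces step-shaped, non-parallel curves, which is why the averaging matters.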

Effect of CGPA on Probability of Admission

As seen in the feature importance plot (in the General Analysis and Information section), CGPA has by far the strongest impact on the chance of admission, so it is clearly an important factor. Check out the following graphs for a better idea:

The image on the right is a PDP. The bright yellow line in the middle shows the average relationship between chance of admission and the candidate's CGPA, with the effect of the other factors averaged out.

The second graph is a PDP, which means that it plots only the changes in the chance of admission that the model attributes to changes in CGPA (it is a little more complicated than that, of course). Both the PDP and the graph beside it clearly show that the chance of admission increases with CGPA.

Effect of University Ratings on Chance of Admission

Check out the three graphs given below:

The image on the left plots the mean as the dark line and the confidence interval as the shaded band. The band marks the range in which the true mean is likely to lie; it is not the range containing most of the individual data points.

The two graphs clearly show that the chance of admission increases with university rating. Check out the third graph given below:

This is a PDP. The bright yellow line in the middle shows the average relationship between chance of admission and university rating, with the effect of the other factors averaged out.

This result is surprising, since higher rated universities would be expected to be harder to get into. One possible explanation is that, because applications are expensive, only students who seriously believe they could be chosen apply to the highly rated universities, which pushes the average chance of admission up as university rating increases.
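The mean-and-band plot described earlier in this section can be reproduced numerically. Plotting libraries such as seaborn shade a bootstrapped 95% confidence interval of the mean by default; the article does not say which library was used, so treat that as an assumption. The sketch below computes the per-rating mean and a percentile bootstrap interval in plain Python; the ratings and chances are made up.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def bootstrap_ci(xs, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of xs."""
    rng = random.Random(seed)
    # Resample with replacement, record each resample's mean, then sort.
    boot_means = sorted(mean(rng.choices(xs, k=len(xs))) for _ in range(n_boot))
    lo = boot_means[int((1 - level) / 2 * n_boot)]
    hi = boot_means[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi

def summarize_by_group(groups, values):
    """Map each group label (e.g. university rating) to (mean, (ci_low, ci_high))."""
    by_group = {}
    for g, v in zip(groups, values):
        by_group.setdefault(g, []).append(v)
    return {g: (mean(vs), bootstrap_ci(vs)) for g, vs in sorted(by_group.items())}

# Hypothetical toy data, NOT taken from the dataset.
ratings = [1, 1, 1, 2, 2, 2, 3, 3, 3]
chances = [0.55, 0.60, 0.50, 0.65, 0.70, 0.60, 0.75, 0.80, 0.70]
summary = summarize_by_group(ratings, chances)
for rating, (m, (lo, hi)) in summary.items():
    print(f"rating {rating}: mean {m:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

With more applicants per rating the interval narrows even if the individual chances stay just as spread out, which is why the band says little about where most data points lie.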

Spectrum of Students Across Various Universities

In this section we analyze the spectrum of students applying to universities of each rating, in terms of the students' standardized test scores and CGPA. First, check out the pie chart, which shows how student applications are distributed across universities of each rating:

Percentage of student applications across universities (by rating)

The chart shows that applications are spread nearly evenly across universities, except for the highest rated and the lowest rated ones. Now let us look at the CGPA, GRE Scores and TOEFL Scores of the applicants to each group of universities (by rating):

CGPA of Applicants vs University Ratings

The solid line plots the mean CGPA for each rating and the shaded portion shows the area where most of the data points lie.

TOEFL Scores of Applicants vs University Ratings

The solid line plots the mean TOEFL Score for each rating and the shaded portion shows the area where most of the data points lie.

GRE Scores of Applicants vs University Ratings

The solid line plots the mean GRE Score for each rating and the shaded portion shows the area where most of the data points lie.

All three plots show that as the university rating increases, so do the standardized test scores and CGPA of the applicants. In other words, applicants with stronger test scores and CGPAs tend to apply to the better rated universities, while applicants with weaker scores tend not to apply to the higher rated ones.
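Both the pie chart of applications per rating and the per-rating means in the three plots above come down to simple group counts and averages. The sketch below uses plain Python on hypothetical applicant records with assumed keys `rating`, `CGPA`, `GRE` and `TOEFL`; with pandas the same numbers would come from `value_counts(normalize=True)` and `groupby(...).mean()`.

```python
from collections import Counter, defaultdict

def application_shares(ratings):
    """Percentage of applications per university rating (what the pie chart shows)."""
    counts = Counter(ratings)
    total = len(ratings)
    return {r: 100 * c / total for r, c in sorted(counts.items())}

def mean_by_rating(records, key):
    """Mean of `key` (e.g. 'CGPA', 'GRE', 'TOEFL') among applicants to each rating."""
    sums, counts = defaultdict(float), Counter()
    for rec in records:
        sums[rec["rating"]] += rec[key]
        counts[rec["rating"]] += 1
    return {r: sums[r] / counts[r] for r in sorted(counts)}

def is_non_decreasing(means):
    """True if the per-rating means never drop as the rating goes up."""
    values = [means[r] for r in sorted(means)]
    return all(a <= b for a, b in zip(values, values[1:]))

# Hypothetical applicant records, NOT taken from the dataset.
records = [
    {"rating": 1, "CGPA": 7.2, "GRE": 300, "TOEFL": 98},
    {"rating": 2, "CGPA": 7.9, "GRE": 308, "TOEFL": 102},
    {"rating": 2, "CGPA": 8.1, "GRE": 312, "TOEFL": 104},
    {"rating": 3, "CGPA": 8.5, "GRE": 316, "TOEFL": 107},
    {"rating": 3, "CGPA": 8.4, "GRE": 318, "TOEFL": 108},
    {"rating": 4, "CGPA": 9.0, "GRE": 324, "TOEFL": 111},
    {"rating": 4, "CGPA": 8.9, "GRE": 326, "TOEFL": 112},
    {"rating": 5, "CGPA": 9.5, "GRE": 334, "TOEFL": 116},
]
shares = application_shares([rec["rating"] for rec in records])
print(shares)
for key in ("CGPA", "GRE", "TOEFL"):
    means = mean_by_rating(records, key)
    print(key, means, "rising with rating:", is_non_decreasing(means))
```

A rising sequence of per-rating means is the numeric counterpart of the upward-sloping solid lines; it describes who applies where, not what causes admission.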

Effect of Other Features on Probability of Admission

This plot seems to show that applicants with research experience tend to have a higher chance of admission.
This PDP seems to show that applicants with a stronger LOR tend to have a higher chance of admission. The bright yellow line in the middle represents the average relationship between chance of admission and LOR strength.


Citation: Mohan S Acharya, Asfia Armaan, Aneeta S Antony: A Comparison of Regression Models for Prediction of Graduate Admissions, IEEE International Conference on Computational Intelligence in Data Science 2019

