A Fresh Look At The Graduate Admissions Dataset
An in-depth analysis of Graduate Admissions data from an Indian perspective.
General Analysis and Information
The Graduate Admissions dataset describes the probability of admission for Indian students in terms of the following fields, the last of which (Chance of Admit) is the value being predicted:
1. GRE Score
2. TOEFL Score
3. University Rating (out of 5)
4. SOP (Statement of Purpose) and LOR (Letter of Recommendation) Strength (out of 5)
5. Undergraduate CGPA (also called CPI) (out of 10)
6. Research Experience (0 for no experience, 1 for having experience)
7. Chance of Admit (out of 1)
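The field list above maps naturally onto a typed record. The sketch below is one way to hold a single applicant row with basic range checks; the Python field names are made up for illustration and are not the column headers of the dataset file. The bounds come from the scales listed above (treating 1 as the lowest University Rating is an assumption) plus the standard GRE (260 to 340) and TOEFL iBT (0 to 120) score ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Applicant:
    """One applicant row; field names are hypothetical, not the CSV headers."""
    gre: int                 # GRE Score
    toefl: int               # TOEFL Score
    university_rating: int   # out of 5 (minimum of 1 assumed)
    sop: float               # Statement of Purpose strength, out of 5
    lor: float               # Letter of Recommendation strength, out of 5
    cgpa: float              # undergraduate CGPA, out of 10
    research: int            # 0 = no research experience, 1 = has experience
    chance_of_admit: float   # target value, out of 1

    def __post_init__(self) -> None:
        # Reject values outside the scales described in the field list
        # and outside the official GRE / TOEFL iBT score ranges.
        checks = {
            "gre": 260 <= self.gre <= 340,
            "toefl": 0 <= self.toefl <= 120,
            "university_rating": 1 <= self.university_rating <= 5,
            "sop": 0 <= self.sop <= 5,
            "lor": 0 <= self.lor <= 5,
            "cgpa": 0 <= self.cgpa <= 10,
            "research": self.research in (0, 1),
            "chance_of_admit": 0 <= self.chance_of_admit <= 1,
        }
        bad = [name for name, ok in checks.items() if not ok]
        if bad:
            raise ValueError(f"out-of-range fields: {bad}")


if __name__ == "__main__":
    # A made-up applicant, not a row from the dataset.
    print(Applicant(320, 110, 4, 4.0, 4.5, 9.0, 1, 0.85))
```

Freezing the dataclass keeps rows immutable once validated, so a bad value can only enter at construction time.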
The model used for the PDP and feature importance plots is a Random Forest, which predicted the ‘Chance of Admit’ on a validation set with nearly 96% accuracy. Some general characteristics of the data are described below.
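The post does not say how its feature importance values were computed. One common, model-agnostic method is permutation importance: shuffle one feature's column and measure how much the model's error grows. The stdlib sketch below assumes the trained model is wrapped as a `predict(row)` function (a hypothetical interface) and uses a made-up toy model; with scikit-learn, `RandomForestRegressor.feature_importances_` (impurity-based) and `sklearn.inspection.permutation_importance` are the usual alternatives.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict: Callable[[Row], float], rows: List[Row],
                           y: Sequence[float], features: Sequence[str],
                           seed: int = 0) -> Dict[str, float]:
    """Rise in MSE when each feature's column is shuffled; larger = more important."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(r) for r in rows])
    importance = {}
    for feature in features:
        column = [r[feature] for r in rows]
        rng.shuffle(column)  # break the link between this feature and the target
        shuffled = [{**r, feature: v} for r, v in zip(rows, column)]
        importance[feature] = mse(y, [predict(r) for r in shuffled]) - baseline
    return importance


if __name__ == "__main__":
    # Hypothetical stand-in for a trained model: it only looks at CGPA (made-up weights).
    def toy_predict(row: Row) -> float:
        return 0.12 * row["CGPA"] - 0.3

    toy_rows = [{"CGPA": c, "Research": r}
                for c, r in [(7.5, 0), (8.2, 1), (8.9, 0), (9.4, 1), (9.8, 1)]]
    toy_y = [toy_predict(r) for r in toy_rows]
    print(permutation_importance(toy_predict, toy_rows, toy_y, ["CGPA", "Research"]))
```

A feature the model ignores (Research in the toy example) gets an importance of exactly zero, because shuffling it cannot change any prediction.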
The feature importance plot shows how strongly each feature influences the model's prediction of the target, the ‘Chance of Admit’ in our case. It turns out that the probability of admission depends most on the CGPA of the candidate, followed by the standardized test scores (this plot is referenced later for several inferences). Also check out the dendrogram given below. A dendrogram is a graphical way of showing how closely related (correlated) the various features are: the earlier two features get joined (reading from the right side), the more closely related they are.
The dendrogram suggests that the following features are closely related to each other:
- SOP Strength and the University Rating
- The CGPA, GRE Score and TOEFL Score
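The post does not show how its dendrogram was built. A common recipe, assumed here, is to compute Spearman rank correlations between feature columns, turn them into distances (1 - rho), and merge features bottom-up with average linkage; pairs merged first are the branches that join earliest in the plot. A stdlib sketch with made-up columns (in practice `scipy.cluster.hierarchy.linkage` and `scipy.cluster.hierarchy.dendrogram` do the clustering and drawing):

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the ranks (constant columns not handled)."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def average_linkage(columns: Dict[str, Sequence[float]]) -> List[Tuple[str, str, float]]:
    """Agglomerative clustering on distance 1 - rho with average linkage.

    Returns the merges in order; features merged earlier join sooner in the dendrogram.
    """
    names = list(columns)
    dist = {frozenset(p): 1 - spearman(columns[p[0]], columns[p[1]])
            for p in combinations(names, 2)}
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Average of all pairwise distances between the two clusters.
            d = sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))
            if best is None or d < best[0]:
                best = (d, c1, c2)
        d, c1, c2 = best
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append(("+".join(c1), "+".join(c2), d))
    return merges


if __name__ == "__main__":
    # Made-up columns for illustration only.
    toy = {
        "GRE": [300, 310, 320, 330, 335],
        "TOEFL": [100, 104, 108, 115, 118],
        "Research": [1, 0, 0, 1, 0],
    }
    for left, right, d in average_linkage(toy):
        print(f"{left} + {right} at distance {d:.2f}")
```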
Influence of GRE and TOEFL Scores on Chance of Admit
The data can be interpreted using the following graphs:
The scatter plot suggests that the ‘Chance of Admit’ increases as the standardized test scores increase. For more clarity, see the following graphs:
The plot clearly shows that the ‘Chance of Admit’ increases with the standardized test scores. To check that this increase comes from the scores themselves, a PDP (Partial Dependence Plot) was made using a Random Forest. A PDP uses a trained machine learning model to compute predictions while only the variable being analysed changes and the other features keep their observed values, then averages those predictions. The resulting PDP graphs indicate that the probability of selection increases with the standardized test scores.
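The PDP idea described above fits in a few lines. This is a minimal sketch, assuming the fitted model is wrapped as a `predict(row)` function (a hypothetical interface) and using a made-up linear model in place of the Random Forest:

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def partial_dependence(predict: Callable[[Row], float], rows: List[Row],
                       feature: str, grid: Sequence[float]) -> List[float]:
    """One-feature partial dependence: mean prediction with `feature` fixed to each grid value.

    Every other feature keeps the value observed in each row, so the curve
    tracks how the model's output moves with `feature` alone.
    """
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


if __name__ == "__main__":
    # Hypothetical linear stand-in for the Random Forest; weights are made up.
    def toy_model(row: Row) -> float:
        return 0.002 * row["GRE"] + 0.005 * row["TOEFL"] - 0.3

    toy_rows = [{"GRE": 310.0, "TOEFL": 100.0}, {"GRE": 325.0, "TOEFL": 112.0}]
    print(partial_dependence(toy_model, toy_rows, "GRE", [300.0, 320.0, 340.0]))
```

With scikit-learn, `sklearn.inspection.partial_dependence` and `PartialDependenceDisplay.from_estimator` compute and plot this kind of curve for a fitted estimator.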
Effect of CGPA on Probability of Admission
As seen in the feature importance graph (in the General Analysis and Information section), CGPA has by far the strongest impact on the probability of admission, so it is clearly an important factor. The following graphs give a closer look:
The second graph is a PDP, which means it plots only the changes in the chance of admission that the model attributes to changes in CGPA (it is a little more complicated than that, since the other features are averaged over their observed values). Both the PDP and the graph beside it clearly show that the probability of admission increases with CGPA.
Effect of University Ratings on Chance of Admission
Check out the three graphs given below:
The first two graphs clearly show that the chance of admission increases with university rating. Now check out the third graph given below:
One possible reason for this surprising result (one might expect higher-rated universities to be harder to get into) is that, because applications are expensive, only students who seriously believe they could be selected apply to the highest-rated universities, which raises the average chance of admission as the university rating increases.
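The comparison behind this argument is the average Chance of Admit within each University Rating. These averages describe the students who chose to apply, not how hard each university is to get into, which is why self-selection can push them upward. A stdlib group-by sketch, with made-up (rating, chance) pairs standing in for real rows:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def mean_by_group(pairs: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Average value per group key, e.g. mean Chance of Admit per University Rating."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in sorted(totals)}


if __name__ == "__main__":
    # Made-up (University Rating, Chance of Admit) pairs, not real data.
    sample = [(1, 0.55), (1, 0.61), (3, 0.72), (5, 0.90), (5, 0.94)]
    for rating, mean_chance in mean_by_group(sample).items():
        print(rating, round(mean_chance, 3))
```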
Spectrum of Students across Various Universities
In this section we analyze the spectrum of students applying to universities, by university rating and by the applicants' standardized test scores and CGPA. First, check out the pie chart showing how student applications are distributed across universities of each rating:
The chart shows that applications are distributed nearly uniformly, except for the highest-rated and lowest-rated universities. Next, we look at the CGPA, GRE Scores and TOEFL Scores of the applicants for each university rating:
CGPA of Applicants vs University Ratings
TOEFL Scores of Applicants vs University Ratings
GRE Scores of Applicants vs University Ratings
All three plots show that as the university rating increases, so do the standardized test scores and the CGPA of the applicants. This suggests that applicants with stronger test scores and CGPA tend to apply directly to higher-rated universities, while applicants with weaker scores and CGPA tend not to apply to them.
Effect of Other Features on Probability of Admission
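The per-rating comparisons in this section can also be summarized numerically: group applicants by University Rating, then report each group's share of all applications (the pie chart) and the quartiles of a score such as CGPA, GRE or TOEFL. A stdlib sketch; the row keys such as "University Rating" and "CGPA" are assumed names and the rows are made up:

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence


def summarize_by_rating(rows: Sequence[Dict[str, float]],
                        score: str) -> Dict[int, Dict[str, float]]:
    """Per University Rating: share of all applications (%) and quartiles of `score`."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for row in rows:
        groups[int(row["University Rating"])].append(row[score])
    summary = {}
    for rating in sorted(groups):
        values = groups[rating]
        median = statistics.median(values)
        if len(values) > 1:
            # "inclusive" interpolates within the observed range (like NumPy's default percentile).
            q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        else:
            q1 = q3 = median  # a single value has no spread
        summary[rating] = {
            "share_pct": 100.0 * len(values) / len(rows),
            "q1": q1,
            "median": median,
            "q3": q3,
        }
    return summary


if __name__ == "__main__":
    # Made-up applicant rows, not taken from the dataset.
    toy_rows = [
        {"University Rating": 1, "CGPA": 7.2}, {"University Rating": 1, "CGPA": 7.8},
        {"University Rating": 3, "CGPA": 8.4}, {"University Rating": 3, "CGPA": 8.6},
        {"University Rating": 5, "CGPA": 9.3}, {"University Rating": 5, "CGPA": 9.6},
    ]
    for rating, stats in summarize_by_rating(toy_rows, "CGPA").items():
        print(rating, stats)
```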
THANK YOU FOR READING THIS BLOG!
Citation: Mohan S Acharya, Asfia Armaan, Aneeta S Antony: A Comparison of Regression Models for Prediction of Graduate Admissions, IEEE International Conference on Computational Intelligence in Data Science 2019