Image source : https://www.techcircle.in/

Do freshly minted college graduates get the salary package that they deserve?

Rachita Pateria
Nov 3 · 8 min read

Are you the HR of a company that is recruiting freshly minted graduates into your workforce? Do you ever wonder, while conducting a recruitment drive on a college campus, whether some candidates are stronger in knowledge or skills than others? If your company has defined a basic pay structure for these candidates at onboarding, do you ever think that not all candidates deserve the same salary structure?

I, for one, definitely had these questions and decided to dive into this study at once! My focus was to understand which factors can or do affect not just the salary package but also one another (in more technical language, the different predictors).

To conduct this study, I started off with the first step any aspiring Data Scientist takes, that of Exploratory Data Analysis. For those who are coming across this term for the first time, EDA is a method to play around with data and twist & turn it so that it eventually coughs up some (hidden) information.

So, I picked up a dataset from Kaggle which consists of data for freshers at an India-based company. It contains school, college, and graduation details of employees. Before recruiting the employees, the company conducts some tests (English, Technical Knowledge, Domain Knowledge, Aptitude, and Soft Skill) for each candidate. Based on all these factors the company assigns a salary (in INR).

You can access the dataset here.

A sneak peek into the dataset

As we discussed earlier, the company conducted a few tests, namely:

  1. English Test
  2. Aptitude Test
  3. Domain Test
  4. Soft Skill Test
  5. Technical test

It further gave the (yet to be recruited) employees the option to opt out of the technical test, i.e., it wasn't mandatory to sit for this test. All the others were mandatory.

As a result, the technical score is missing for the employees who opted out of this test. Had these values been missing at random, we would have carried out a missing value imputation based on certain criteria. But here there is a very valid reason why so many (namely 860) missing values exist. To avoid any confusion, we shall extract these 860 rows and put them in a separate data table. We shall return to this soon!

For the remaining 19,000-odd rows we will now begin our exploration.
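The split described above can be sketched in pandas as follows. This is only an illustration on a toy table: the column names (`employee_id`, `technical_score`, `salary`) are assumptions and not the actual Kaggle column names.

```python
import pandas as pd

# Toy stand-in for the Kaggle data; column names are hypothetical.
df = pd.DataFrame({
    "employee_id": [1, 2, 3, 4, 5],
    "technical_score": [410.0, None, 520.0, None, 365.0],
    "salary": [520000, 310000, 610000, 300000, 480000],
})

# A missing technical score marks a candidate who opted out of the test.
opted_out_mask = df["technical_score"].isna()

# Keep the two groups in separate tables instead of imputing the gap.
opted_out = df[opted_out_mask].copy()
sat_test = df[~opted_out_mask].copy()

print(len(opted_out), "opted out;", len(sat_test), "sat the test")
```

Keeping the opted-out rows in their own table, rather than filling in a score, avoids inventing results for a test that was never taken.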

Employee Graduation

Let us look at the distribution of graduation degrees completed by the employees.

Frequency / Pie Chart Distribution of employees who completed different graduations

We notice that almost 95% of the employees have completed B.E or B.Tech. Since we do not have much information about the company, we can reasonably guess that it is a leading IT company, as such companies are usually interested in hiring engineering or computer application graduates. Now let us see which branch these employees chose during graduation.

Percentage of employees belonging to different branches

In 2018, a study was conducted to find out which engineering branches have the highest recruitment rates*. CS, Mechanical, and EXTC were some of the top branches according to that study. Almost 80% of the employees in this dataset also belong to these top branches. The interesting point here is that even employees from varied branches like Mechatronics and Chemical Engineering have opted for an IT company.
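Percentage shares like the ones quoted for degrees and branches can be computed with `value_counts(normalize=True)`. The sketch below uses a toy table, and `degree` and `branch` are assumed column names, not necessarily those of the original dataset.

```python
import pandas as pd

# Toy stand-in; "degree" and "branch" are hypothetical column names.
df = pd.DataFrame({
    "degree": ["B.Tech/B.E.", "B.Tech/B.E.", "MCA", "B.Tech/B.E."],
    "branch": ["CS", "Mechanical", "Computer Applications", "CS"],
})

# Share of employees per degree, as a percentage of all rows.
degree_share = df["degree"].value_counts(normalize=True) * 100

# Same calculation for the branch studied during graduation.
branch_share = df["branch"].value_counts(normalize=True) * 100

print(degree_share.round(1))
print(branch_share.round(1))
```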

Grouping the data

In the original dataset from Kaggle we had multiple columns for aptitude test score, technical test score, soft skill score, etc. Since many of them were on the same scale, it made sense to merge some columns to make our study easier.

Grouping of soft skills

What are the big 5 personality traits ? | www.verywellmind.com

The above 5 personality traits were merged into a single attribute representing overall soft skill.

Personality trait scores usually lie between -10 and +10, with the average person scoring close to zero. We can observe this in the graph below for Soft Skill.
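The article does not state how the five traits were combined. One simple option, shown here purely as an assumption, is the row-wise mean of the five trait columns, which keeps the merged score on the same scale as its inputs. The column names and values are illustrative.

```python
import pandas as pd

# The Big Five traits; the column names are hypothetical.
TRAITS = ["conscientiousness", "agreeableness", "extraversion",
          "neuroticism", "openness"]

# Toy scores for three employees.
df = pd.DataFrame({
    "conscientiousness": [0.5, -1.0, 1.5],
    "agreeableness": [1.0, 0.0, -0.5],
    "extraversion": [-0.5, 0.5, 1.0],
    "neuroticism": [0.0, 1.0, -1.0],
    "openness": [1.5, -0.5, 0.5],
})

# Assumed merge rule: average the five traits row by row.
df["soft_skill"] = df[TRAITS].mean(axis=1)

print(df["soft_skill"])
```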

An interesting plot to notice is the Domain score of employees. The huge spread tells us that there is a big variation in technical domain knowledge among employees. Why do you think this exists? It may be that a student is good academically but has poor industry and domain knowledge, and thus fails to perform well in this test.

Next, let us see how balanced the gender diversity is in this company.

Gender Diversity

Frequency of Male & Female employees
Median Salary of Male & Female employees

Although there is a vast difference between the numbers of male and female employees, it is interesting to note that the median salaries of both genders are almost equal.

There can be two reasons for this vast gender gap.

  1. Either the company was unable to collect equal data, in which case conclusions drawn from so few records become unreliable.
  2. Or the company is biased towards male workers. When does such a situation arise? Say, for example, the company has most of its clients offshore. Here the company may prefer male employees to work weeknights to overlap with different time zones.

According to a recent study*, the gender pay gap in the Indian IT industry is around 19%. We don't see such a large difference in our dataset, probably because the gender pay gap tends to widen with work experience. In the initial years, the salaries of freshers are approximately equal.

From the above we can understand the spread of salary between genders across different graduations. Notice that even though the number of outliers is high, the median salary across all six plots is more or less the same, i.e. around 5.2 LPA.

Had this data contained an approximately equal number of female employees, we could have stated that there is not much of a median gender pay gap among fresh graduates.
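The head count and median salary comparison in this section can be reproduced with a single `groupby`. The sketch below uses made-up salaries and assumed column names (`gender`, `salary`). The median is used rather than the mean because it is less sensitive to the outliers mentioned above.

```python
import pandas as pd

# Toy stand-in; values and column names are hypothetical.
df = pd.DataFrame({
    "gender": ["M", "M", "M", "M", "F", "F"],
    "salary": [500000, 520000, 540000, 900000, 510000, 530000],
})

# Head count and median salary per gender in one table.
by_gender = df.groupby("gender")["salary"].agg(["count", "median"])

print(by_gender)
```

In this toy table the single high salary of 900000 barely moves the male median, which is why the two medians stay close even though the group sizes differ.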

City Tier

Salary Distribution plot for Tier-1 & Tier-2 cities

Can you see any difference between the two plots? Probably not. You might want to look more closely, but the difference is extremely small.

Initially, one of our hypotheses was that employees who belong to tier-2 cities might be offered a lower salary. This plot rejects our hypothesis, and how brutally so! So it seems safe to assume that the city in which you completed your initial education has no effect on salary.

Correlation

From the above heat map, we can notice that the English Score and the Aptitude Score are highly correlated. Why do you think this happened? Maybe because every aptitude test consists of topics like Logical Reasoning, Quants, and Verbal Ability. If an employee does well in the English test, he or she is likely to perform well in the Aptitude test too.

We can also notice that Soft Skill test results are only weakly correlated with the other test scores. An employee who is good in Technical or Domain knowledge does not necessarily have the most appropriate soft skills such as leadership, teamwork, extroversion, openness, etc.
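The numbers behind such a heat map come from a pairwise correlation matrix. Here is a minimal sketch using `DataFrame.corr()`, which computes Pearson correlation by default. The scores and column names are invented for illustration and chosen so that English and Aptitude move together while Soft Skill does not.

```python
import pandas as pd

# Toy scores; values and column names are hypothetical.
df = pd.DataFrame({
    "english": [400, 450, 500, 550, 600],
    "aptitude": [410, 440, 520, 540, 610],
    "soft_skill": [0.2, -0.3, 0.1, -0.2, 0.1],
})

# Pairwise Pearson correlation between all score columns.
corr = df.corr()

print(corr.round(2))
```

The resulting square table can be passed to any heat map plotting function; values near 1 or -1 indicate a strong linear relationship and values near 0 a weak one.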

One left aside…

Here we can notice that:

The average salary of employees who sat for the technical test is much higher than that of those who opted out.

There is also a huge gap between the maximum salaries of the two groups.

Similar trends can be observed for a different city tier.
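The comparison between the two groups, overall and per city tier, can be sketched as follows. The data is a toy example, and `technical_score`, `city_tier` and `salary` are assumed column names.

```python
import pandas as pd

# Toy stand-in; values and column names are hypothetical.
df = pd.DataFrame({
    "technical_score": [410.0, None, 520.0, None, 365.0, 480.0],
    "city_tier": [1, 1, 2, 2, 1, 2],
    "salary": [520000, 310000, 610000, 300000, 480000, 900000],
})

# Label each row by whether the optional technical test was taken.
df["technical_test"] = df["technical_score"].notna().map(
    {True: "sat", False: "opted_out"}
)

# Average and maximum salary for each group.
summary = df.groupby("technical_test")["salary"].agg(["mean", "max"])

# The same average, broken down by city tier.
by_tier = df.groupby(["city_tier", "technical_test"])["salary"].mean()

print(summary)
print(by_tier)
```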

Conclusion :

This gives us a totally new insight into the Indian education system! Maybe, instead of introducing board exams for more classes (read: TN govt. to introduce boards for std. 5 & std. 8), we need to start focusing on time-based aptitude skills, technical knowledge, and domain knowledge after all!


Why EDA ?


Future works :

  1. Economists and analysts are trying to identify a positive relationship between conscientiousness (one of the soft skills) and wages. They also observe that, contrary to previous findings, women and men have similar returns to personality traits*. Though these may not have a large business impact, both are fascinating topics to dive into in a future study.
  2. Another skill-based study could be to identify an ideal work environment.
Image source : https://imgur.com/gallery/6ttvZ

Citations :

Shiksha.com: http://shiksha.com/b-tech/articles/highest-paying-engineering-branches-know-top-recruiters-salaries-offered

5 personality traits: https://www.verywellmind.com/the-big-five-personality-dimensions-2795422

Gender pay gap: https://www.livemint.com/industry/human-resource/gender-pay-gap-still-high-women-in-india-earn-19-less-than-men-report-1551948081615.html

Introduction of more board exams: http://www.newindianexpress.com/states/tamil-nadu/2019/sep/13/tamil-nadu-introduces-board-exams-for-classes-5-and-8-2033171.html

Future works: https://www.researchgate.net/publication/330137392_Wage_premia_for_skills_the_complementarity_of_cognitive_and_non-cognitive_skills

Project Partner : Juilee Talele

Written by Rachita Pateria, Data Science Enthusiast from Praxis Business School, Bangalore, India
