DATA SCIENCE JOB SALARIES: SQL ANALYSIS

Zainab Akinwande
11 min readAug 24, 2022

--

In this Project, I downloaded the dataset from Kaggle titled Data Science Job salaries. I decided to work on this dataset because:

  • I love Data Science and I want to build a career in this field.
  • I want to work on a dataset I actually enjoy and also get to see what the salary for this field is like.
Photo taken from Kaggle

Let’s get started!

STEP 1: Import the data into Excel

The first thing I did was to import the dataset into excel. This is to ensure that I know what the data is about, examine it and do some data cleaning. This dataset contained 11 columns and 608 rows. It includes the work_year column, experience level column, employment type, job title, salary, salary in usd etc.

I used the filter function to better understand the data set. There is a column called salary and another called salary_in_usd. The later is basically the conversion of the former to USD (US Dollars) which I believe is for the sake of uniformity and also to aid analysis.

  • I removed the first column which contained row number as I believe it is unnecessary for this analysis.
  • I used the Remove duplicate function to check for duplicate rows. 42 duplicate rows were found and deleted. That is 566 rows left.
  • I also checked for Blanks but there were no blanks

NB: I thought that was all I could do in excel as the data was pretty clean or so I thought! I would have made a big mistake if I didn’t take my time to go through the dataset again. It’s important to use the filter and sort function properly to examine the dataset thoroughly.

  • I noticed that for the job title, there was a subtle repetition that could have affected the result of the analysis. Machine learning Engineer and ML Engineer were included in the dataset. This is basically the same job title written differently.
  • I used the find and replace function to find ML Engineer in the dataset and replace with Machine Learning Engineer.

All done and good. I saved the file as an excel workbook type in preparation for the next stage

STEP 2: Importing the data to SQL

I used Microsoft SQL Server Management Studio for this analysis. For the most part of this analysis, I will be working with the salary_in_usd column for the sake of uniformity.

In the course of the analysis, I used the following SQL functions: SELECT, FROM, WHERE, CASE STATEMENT, WINDOW CLAUSE, CTE, GROUP BY, ORDER BY, MAX, MIN, AVERAGE, ROUND, etc.

These are the questions I answered:

  1. How many job title was included in the dataset?
  2. Which experience level has the highest salary?
  3. Does the employment type affect salary?
  4. What is the average salary per job title in USD?
  5. What is the highest salary per job title in USD?
  6. What is the highest paying entry-level data science job?
  7. What is the highest paying mid-level data science job?
  8. What is the highest paying senior-level data science job?
  9. What is the highest paying Executive level data science job?
  10. What is the average salary per experience level?
  11. What is the lowest paying entry-level data science job?
  12. What is the lowest paying Mid-level Data Science Job?
  13. What is the lowest paying Senior level data science job?
  14. What is the lowest paying Executive level data science job?
  15. Does company size affect salary?
  16. Does company location affect Salary?

Before I dived into the analysis itself, I had to ensure that my dataset was imported properly from excel. I used the SELECT and asterisk function to confirm it.

Sneak peek of the dataset
  1. How many job title was included in the dataset?

I used the COUNT function together with DISTINCT so as to get unique job_titles and to avoid duplicates in the course of the count.

Answer: 49 job titles

2. Which experience level has the highest salary?

I used the CASE Statement to separate the experience level into different categories of Mid-level, senior-level, entry-level, and executive-level. I used the Count function to count the number of jobs that belonged to each of the category. I used the Max function to find the highest salary and the Round function to round up the numbers to two decimal places so as to make it readable. I grouped by the Case statements and ordered by highest_salary in descending order to get the highest salary.

Answer: Executive level at USD 600,000

3. Does the employment type affect salary?

I used the CASE Statement to separate the employment type into different categories: full-time, part-time, freelance, and contract. I used the Count function to count the number of jobs that belong to each category. I used the Average function to find the average salary per category and the Round function to round up the numbers to two decimal places so as to make it readable. I grouped by the Case statements and ordered by average_salary in descending order to get the result of the arranged in descending order.

Answer: Yes, it does. Contract staff get paid more on the average

4. What is the average salary per job title in USD?

I used the Average function to get the average salary per job title in USD. I used the Round function to round up the numbers to two decimal places so as to make it readable. I grouped by job title and ordered by the average salary in descending order.

Sneak peek of Average Salary per job title

5. What is the highest salary per job title in USD?

I used the Top 5 function to get the top 5 job title. I used the Max function to get the highest salary in USD. I used the round function to round up the numbers to two decimal place so as to make it readable. I grouped by job_title and ordered by the highest salary in descending order.

Answer: Principal Data Engineer at $600,000

6. What is the highest paying entry-level data science job?

I used the Top 5 function to get the top 5 job title. I used the Max function to get the highest salary in USD. I used the where function to filter the result and only return max salary for entry level. I grouped by job_title and ordered by the highest salary in descending order.

Answer: Machine Learning Engineer at $250,000 (That’s a substantial entry-level salary)

OR I also tried to use the CTE and Window clause to answer this question and I got the same answer. But I will be using the simpler method going forward

I Selected job title, experience level, salary in USD. I used the Rank function to get the rank of the dataset. I used the window function, partitioned by job title and experience level. I ordered by salary in descending order. I then filtered using the where clause to return result for entry-level only. I then selected job rank, job title, experience level, salary in USD. I used the where function to limit the result to only rank 1. Finally, I ordered by salary in USD in descending order.

7. What is the highest paying mid-level data science job?

I used the Top 5 function to get the top 5 job title. I used the Max function to get the highest salary in USD. I used where function to filter the result and only return max salary for mid-level. I grouped by job_title and ordered by the highest salary in descending order.

Answer: Financial Data Analyst and Research Scientist at $450,000

8. What is the highest paying senior-level data science job?

I used the Top 5 function to get the top 5 job title. I used the Max function to get the highest salary in USD. I used where function to filter the result and only return max salary for senior-level. I grouped by job_title and ordered by the highest salary in descending order.

ANSWER: Data Scientist at $412,000 (Isn’t it surprising that some Mid-level roles get paid more than Senior-level roles?)

9. What is the highest paying Executive level data science job?

I used the Top 5 function to get the top 5 job title. I used the Max function to get the highest salary in USD. I used where function to filter the result and only return max salary for executive-level. I grouped by job_title and ordered by the highest salary in descending order.

ANSWER: Principal Data Engineer at $600,000 (Huge!)

10. What is the average salary per experience level?

I used the CASE Statement to separate the experience level into different categories: mid-level, senior-level, entry-level, and executive-level. I used the Count function to count the number of jobs that belonged to each category. I used the Average function to find the average salary per category and the Round function to round up the numbers to two decimal places so as to make it readable. I grouped by the Case statements and ordered by average_salary in descending order to get the result arranged in descending order.

Answer:

11. What is the lowest paying entry-level data science job?

I used the Min function to get the lowest salary in USD. I used the where function to filter the result and only return the lowest salary for entry-level. I grouped by job_title and ordered by the lowest salary.

Answer: Data Scientist at $4,000 (This is ridiculously low. An outlier maybe!)

12. What is the lowest paying Mid-level Data Science Job?

I used the Min function to get the lowest salary in USD. I used the where function to filter the result and only return the lowest salary for mid-level. I grouped by job_title and ordered by the lowest salary.

Answer: Data Scientist at $2859 (For a Mid-level role? Even lower than an entry-level salary. I need to speak to HR asap!)

13. What is the lowest paying Senior level data science job?

I used the Min function to get the lowest salary in USD. I used the where function to filter the result and only return the lowest salary for senior-level. I grouped by job_title and ordered by the lowest salary.

Answer: Computer Vision Engineer at $18,907

14. What is the lowest paying Executive level data science job?

I used the Min function to get the lowest salary in USD. I used the where function to filter the result and only return the lowest salary for the executive level. I grouped by job_title and ordered by the lowest salary.

Answer: Data Science Consultant at $69,741

15. Does company size affect salary?

I used the CASE Statement to separate the company into different categories of a large, medium, and small companies. I used the Min and Max function to find the lowest and highest salary per company category and the Round function to round up the numbers to two decimal places so as to make it readable. I grouped them by the Case statements.

NB: I used the Min and Max function here instead of Average because Averages are susceptible to outliers. Using average will not give us the right insight into how much companies pay.

Answer: Yes, it does. Large companies tend to pay more

16. Does company location affect Salary?

I used the CASE Statement to separate the company location into Companies in the US and Companies not in the US. I used the Min and Max function to find the lowest and highest salary per location category and the Round function to round up the numbers to two decimal places so as to make it readable. I grouped them by the Case statements.

Answer: Yes, it does. Companies in the US tend to pay more.

INSIGHTS/CONCLUSION: From the above analysis, it can be concluded that companies in the US tend to pay more than their counterpart in other countries. Do you want to earn big money? Go for a job in a large company in the US, preferably a job with a high pay eg. Machine Learning Engineer. *wink*

You can check out the complete code on my Github

I hope you enjoyed reading this as much as I enjoyed working on the datasets. I will appreciate your feedback. See you on my next project. Bye!.

--

--