Salaries for Data Science professionals explained with Machine Learning
SHAP values of employee residence, experience level, company location, and more
Published Jan 5, 2023
In this article, I analyse a dataset with detailed information about 600 salaries in the Data Science domain (worldwide) for the years 2020–2022, taken from the ai-jobs.net website. The dataset is publicly available on Kaggle, and full details of the analysis can be found in this public Kaggle notebook.
Step 1 — data preprocessing
Here, data preprocessing consists of the following steps:
- converting the label (yearly gross salaries) to kUSD/year;
- excluding the highest 1% and the lowest 1% of salaries;
- grouping rare categories in the categorical columns (employee_residence, job_title, and experience_level) so that each column has no more than 20 distinct categories and each category has at least 10 data samples;
- finally, dropping unused columns.
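The steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the code from the notebook: the column names salary_in_usd, salary and salary_currency, the new label name salary_kusd, the "other" placeholder and all three helper functions are assumptions made for the example. Only employee_residence, job_title and experience_level are named in the article. Note that in this sketch the "other" bucket itself may end up with fewer than 10 samples.

```python
import pandas as pd


def trim_salary_tails(df, label="salary_kusd", lower=0.01, upper=0.99):
    """Keep rows whose label lies between the lower and upper quantiles."""
    low, high = df[label].quantile([lower, upper])
    return df[(df[label] >= low) & (df[label] <= high)]


def group_rare_categories(series, max_categories=20, min_samples=10, other="other"):
    """Replace infrequent categories with a single placeholder value."""
    counts = series.value_counts()  # sorted from most to least frequent
    # Keep categories with enough samples, reserving one slot for the placeholder.
    frequent = counts[counts >= min_samples].index[: max_categories - 1]
    return series.where(series.isin(frequent), other)


def preprocess(df):
    """Hypothetical end-to-end version of the four preprocessing steps."""
    df = df.copy()
    # 1. Convert the label to kUSD/year (assumes a salary_in_usd column).
    df["salary_kusd"] = df["salary_in_usd"] / 1000.0
    # 2. Exclude the highest 1% and the lowest 1% of salaries.
    df = trim_salary_tails(df)
    # 3. Group rare categories in the selected categorical columns.
    for col in ["employee_residence", "job_title", "experience_level"]:
        df[col] = group_rare_categories(df[col])
    # 4. Drop columns that are no longer needed (assumed names).
    return df.drop(columns=["salary", "salary_currency", "salary_in_usd"], errors="ignore")
```

Grouping rare values into one placeholder keeps the number of encoded categories small, which makes the later per-category explanations easier to read than a long tail of categories with only a handful of samples each.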
Step 2 — setting a Machine Learning model to predict the yearly gross salaries
The data prepared in the previous step are randomly split into training and test samples…