Exploring the Latest Salary Trends in Data Science and AI: An In-Depth Analysis
SHAP values for country, experience, job title, year, and more
4 min read · Jul 10, 2023
In this article, I use the latest public dataset from the ai-jobs.net website, which (as of November 2023) contains 4,858 gross annual salaries for 2022–2023 of Data domain professionals, including Data Scientists, Data Engineers, Data Analysts, Data Managers, and many more. The dataset is also publicly available on Kaggle. Full details of the analysis can be found in this public Kaggle notebook.
Step 1 — data preprocessing
Here, data preprocessing consists of the following steps:
- converting the label (yearly gross salaries) to kUSD/year;
- combining Experience and Expertise Level columns, as well as Employee Residence and Company Location countries;
- encoding rare categories in the employee_residence, job_title, and experience_level columns, so that each column keeps no more than 50 distinct categories and each category has at least 15 data samples;
- finally, dropping unused columns.
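The steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the code from the Kaggle notebook: the helper names `group_rare` and `preprocess`, the `salary_in_usd` and `salary_currency` column names, the "Other" label, and the toy data are all assumptions. The column-combining step is left out because the text does not say how the columns are merged.

```python
import pandas as pd

def group_rare(series, min_count=15, max_categories=50, other="Other"):
    """Replace infrequent categories with one catch-all label (hypothetical helper)."""
    counts = series.value_counts()  # sorted by frequency, descending
    # Keep at most `max_categories` categories that have enough samples.
    frequent = counts[counts >= min_count].index[:max_categories]
    return series.where(series.isin(frequent), other)

def preprocess(df):
    out = df.copy()
    # Label: yearly gross salary converted to kUSD/year.
    out["salary_kusd"] = out["salary_in_usd"] / 1000
    # Group rare categories in the three columns named in the text.
    for col in ["employee_residence", "job_title", "experience_level"]:
        out[col] = group_rare(out[col])
    # Which columns count as "unused" is an assumption here.
    return out.drop(columns=["salary_in_usd", "salary_currency"])

# Toy data, invented purely for illustration.
toy = pd.DataFrame({
    "salary_in_usd": [120000] * 20 + [100000] * 16 + [90000] * 4,
    "salary_currency": ["USD"] * 40,
    "job_title": ["Data Scientist"] * 20 + ["Data Engineer"] * 16 + ["Data Manager"] * 4,
    "employee_residence": ["US"] * 36 + ["DE"] * 4,
    "experience_level": ["SE"] * 25 + ["MI"] * 15,
})
clean = preprocess(toy)
print(clean["job_title"].value_counts())
```

In this toy frame, "Data Manager" and "DE" each appear only 4 times, so both fall below the 15-sample threshold and are mapped to "Other".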
Note that, unlike the previous analysis,