The Enigma of Data Analyst Salaries: Unveiling Industry Inequities Through Data

ChunYu Ko
The whispers of a data analyst
5 min read · Dec 30, 2023

Following our previous article on the Taipei data science job market, we delve deeper into the observed phenomenon that Data Analysts have a lower median minimum monthly salary compared to other roles.

To investigate the validity of this trend, we considered factors beyond job content, industry, and role requirements, focusing on how job titles influence salaries. We excluded job postings without disclosed salary ranges, although this may have introduced selection bias.

One explanation is the abundant talent pool in the Data Analyst field, coupled with relatively comprehensive learning resources and a less steep learning curve. With more experienced professionals available to mentor newcomers, employers are more willing to hire inexperienced candidates, which lowers the overall salary average.

Alternatively, senior Data Analyst positions might be less inclined to disclose their salaries, skewing our observation towards lower compensation. In such cases, the perceived underestimation in salaries might be a result of insufficient transparency, not the market reality.

For accurate analysis, we need to compare roles with identical responsibilities but different titles. However, even within the same title, responsibilities can vary significantly across industries, companies, and experience levels, complicating the research.

Analyzing Salary Discrepancies through Pivot Analysis of Experience Requirements

In this type of analysis, pivot analyses are often used to stratify data.

For instance, we might separate Data Analyst (DA) roles by experience requirements. If DA roles offer lower salaries at both low and high experience requirements, we might conclude that DA salaries remain lower than non-DA salaries even after accounting for experience.

However, if lower salaries are only seen in entry-level DA roles, it suggests that the overall DA average is dragged down because DA roles are more open to inexperienced applicants.
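The stratified comparison described above can be sketched in a few lines. This is a minimal illustration only: the postings, the two experience bands, and the salary figures (in NT$K) are made-up assumptions, not the data used in this article.

```python
from collections import defaultdict
from statistics import median

# Hypothetical postings: (title group, experience band, minimum monthly salary in NT$K).
postings = [
    ("DA", "entry", 38), ("DA", "entry", 40), ("DA", "entry", 42),
    ("DA", "senior", 70), ("DA", "senior", 74),
    ("non-DA", "entry", 45), ("non-DA", "entry", 47),
    ("non-DA", "senior", 70), ("non-DA", "senior", 75), ("non-DA", "senior", 80),
]

def median_by_stratum(rows):
    """Median minimum salary for each (experience band, title group) cell."""
    groups = defaultdict(list)
    for title, band, salary in rows:
        groups[(band, title)].append(salary)
    return {key: median(values) for key, values in groups.items()}

table = median_by_stratum(postings)

# DA minus non-DA median within each experience band.
gaps = {
    band: table[(band, "DA")] - table[(band, "non-DA")]
    for band in ("entry", "senior")
}
print(gaps)
```

If the gap is negative in every band, experience alone does not explain the difference; if it appears only in the entry band, the mix of experience levels is the more likely cause.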

Regression Adjustment to Exclude the Impact of Job Requirements and Responsibilities

Another standard method, regression adjustment, accounts for the influence of other factors without requiring many separate stratified tables or charts, so it reaches an answer quickly.

We used univariate and multivariate regression models to correct for factors potentially affecting the observed impact of job titles on salaries:

  • Job Requirements (JR), including industry type, management needs, minimum education level, and minimum work experience.
  • Six Types of Job Responsibilities (TR): Reporting & Visualization, Database & SQL, Data Projects & ETL, Computer Vision & Deep Learning, AI/ML, AD/GA.
  • 26 Specific Job Duties within the 6 Types of Responsibilities (ST).

After running six different models, we found:

  • Ignoring other factors such as JR, TR, and ST, Model 1 showed that DA’s minimum salary was NT$8.4K lower.
  • Adjusting for JR, TR, or ST one at a time (Models 2 to 4), DA’s minimum salary was lower by NT$8.4K, NT$6.6K, and NT$6.2K, respectively.
  • Adjusting for JR combined with TR, or JR combined with ST (Models 5 and 6), DA’s minimum salary was lower by NT$5.3K and NT$4.8K, respectively.
  • Additionally, Model 6 fit better than the other models, judging by its R² value and the confidence interval of its minimum salary estimate.

This suggests that, without considering other salary-influencing factors, the job title alone appears to lower the minimum salary by NT$8.4K.

However, based on Model 6, the actual negative impact might be only NT$4.8K.
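To show why the estimated gap shrinks once covariates enter the model, here is a toy sketch of regression adjustment. Everything in it is an assumption made for illustration: the eight postings, the single covariate (years of required experience), and the salary rule, which is built so that the true DA effect is exactly -5 (NT$K). The small `ols` helper is a plain normal-equations solver written for this example, not the software used for Models 1 to 6.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Hypothetical postings: (is_da, required years of experience).
# DA postings are deliberately skewed towards low experience.
rows = [(1, 0), (1, 0), (1, 1), (1, 3), (0, 1), (0, 3), (0, 5), (0, 5)]

# Made-up salary rule in NT$K: base 50, +3 per year of experience, -5 for a DA title.
salary = [50 + 3 * exp - 5 * da for da, exp in rows]

# Title only (in the spirit of Model 1) versus title plus one covariate.
unadjusted = ols([[1, da] for da, _ in rows], salary)
adjusted = ols([[1, da, exp] for da, exp in rows], salary)

da_gap_unadjusted = unadjusted[1]  # also absorbs the experience difference
da_gap_adjusted = adjusted[1]      # recovers the built-in title effect
print(round(da_gap_unadjusted, 2), round(da_gap_adjusted, 2))
```

In this toy data the title-only model attributes the whole difference in group means to the title, while adding the covariate separates out the part that is due to experience.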

Employing Vectorized Job Descriptions to Simulate an A/B Testing Environment

For fair comparisons, the ideal method is A/B testing. Since conducting experiments in the job market isn’t feasible, we create a pseudo-experimental condition, akin to a Natural Experiment.

The approach is straightforward: ensure that, aside from the factor under comparison, all other factors are identical between the two groups (DA and non-DA positions), thus approximating an A/B test.

Hence, we compared the features of DA and non-DA roles in a systematic, standardized way, keeping only those non-DA roles that closely resemble DA roles.

This process, known as Propensity Score Matching (PSM), aims to compare samples that differ only in job titles.

We estimated propensity scores with a GLM and used Nearest Neighbor Matching to find highly similar DA and non-DA positions. Additionally, we set:

  • A 1:1 ratio of selected positions between DA and non-DA.
  • A caliper of 0.1 times the standard deviation of propensity scores, avoiding vastly different samples.
  • To match diverse Job Descriptions, we used four models from sentence transformers: all-mpnet-base-v2, multi-qa-mpnet-base-dot-v1, all-distilroberta-v1, and all-MiniLM-L12-v2, for vectorizing job descriptions (VJD).
  • We reduced VJD to 10 dimensions using the UMAP method.
  • After PSM, we still incorporated Covariates in regression models to adjust for subtle differences not fully eliminated by PSM.
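The matching step can be sketched as follows. This is a simplified illustration under several assumptions: the propensity scores are taken as already estimated, the posting IDs and scores are invented, the matching is greedy and without replacement, and the caliper is applied to the raw scores using the standard deviation pooled over both groups. The actual analysis may differ on each of these points.

```python
from statistics import stdev

def match_with_caliper(treated, control, caliper_sd=0.1):
    """Greedy 1:1 nearest-neighbor matching on propensity scores.

    treated, control: dicts mapping a posting ID to its propensity score.
    A pair is kept only if the score gap is within caliper_sd times the
    standard deviation of all scores.
    """
    all_scores = list(treated.values()) + list(control.values())
    caliper = caliper_sd * stdev(all_scores)
    available = dict(control)  # controls not yet used (no replacement)
    pairs = []
    # Process treated units from the highest score downwards.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1], reverse=True):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs

# Hypothetical propensity scores (probability that a posting is a DA posting).
da_scores = {"da1": 0.80, "da2": 0.55, "da3": 0.30}
non_da_scores = {"n1": 0.79, "n2": 0.56, "n3": 0.10, "n4": 0.05}

pairs = match_with_caliper(da_scores, non_da_scores)
print(pairs)
```

Here `da3` has no non-DA posting within the caliper, so it is dropped. This is the mechanism by which a strict caliper improves similarity at the cost of sample size.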

To verify PSM’s performance, we used the Distance metric, where a smaller Distance indicates a better match, that is, smaller differences between the DA and non-DA groups after PSM.

  • Among models without VJD (Models 7 to 9), the model using JR for PSM gave the best match, with DA salaries NT$5.2K lower.
  • Among models using only VJD (Models 10 to 13), the model with all-distilroberta-v1 gave the best match, with DA salaries NT$8.1K lower.
  • Model 18, which used JR + ST + VJD, performed best overall, though with fewer samples, and showed DA salaries NT$4.7K lower.

We further confirmed the balance of features between DA and non-DA positions in Model 18 post-PSM. Ideally, differences should be minimal, so a smaller Absolute Standardized Mean Difference (ASMD) is better.

Before PSM, significant differences existed in industry, minimum education level, minimum work experience, VJD, and ST (high ASMD). Post-adjustment in Model 18, most feature differences fell below 0.1.
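For reference, a balance check of this kind can be sketched as below. The formula is an assumption, since the article does not state the exact definition it used: this version divides the absolute difference in group means by the square root of the average of the two sample variances, which is one common convention. The experience values are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance

def asmd(group_a, group_b):
    """Absolute standardized mean difference for one numeric feature.

    Assumed definition: |mean_a - mean_b| / sqrt((var_a + var_b) / 2),
    using sample variances.
    """
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    return abs(mean(group_a) - mean(group_b)) / pooled_sd

# Hypothetical minimum years of experience, before matching.
da_before = [0, 0, 1, 3]
non_da_before = [1, 3, 5, 5]

# Hypothetical values for the matched subsets, after matching.
da_after = [1.0, 2.0, 3.0, 4.0]
non_da_after = [1.0, 2.0, 3.0, 4.2]

asmd_before = asmd(da_before, non_da_before)
asmd_after = asmd(da_after, non_da_after)
print(round(asmd_before, 3), round(asmd_after, 3))
```

A feature counts as balanced here when its value after matching falls below the 0.1 threshold mentioned above.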

This demonstrates that by controlling for job features and content, we can fairly assess salary differences between DA and non-DA roles.

Conclusion

Whether or not we control for other job-related variables, DA’s minimum salary is consistently lower:

  • Direct comparison of all DA and non-DA roles with disclosed salaries shows that DA’s minimum salary is NT$8.4K lower.
  • Using PSM to control for JR + ST + VJD and other influencing factors, the gap in DA’s minimum salary narrows to NT$4.7K.

However, this study has limitations:

  1. The upper end of DA salary ranges is unknown and might be wide (e.g., $30K to $300K per month), so undisclosed salaries could create an illusion of lower DA pay.
  2. In Model 18, many otherwise usable samples were discarded during matching, which may limit how far these results generalize.
  3. Data collected solely from Taipei is not representative, as other cities like Taoyuan, Hsinchu, Taichung, and Kaohsiung, known for their optoelectronics and manufacturing industries, were not included.

In conclusion, as a data enthusiast passionate about exploring the universe through data, I believe there’s no inherent difference in value between roles in the field of data science.

The journey in this domain is continuously evolving, and I invite everyone to join in this progression.


Work is data, and hobby is also data, but I yearn for my roommate's two cats, lazily lounging at the doorway.