What I Wish I Knew Before Becoming A Data Scientist

Lu Zhenna
Women in Technology
8 min read · Feb 15, 2024

Summary

This article highlights some gaps between data science education and the skills that practicing data scientists need. Technical skills such as version control, cloud technology, and DevOps are often not taught in school. Beyond these hard skills, it also discusses the soft skills, like stakeholder management, that make a data scientist more adaptable.

Target Audience

  • aspiring and junior data scientists

Outline

The hard skills

  1. Learn version control and containerization tools, like Git and Docker
  2. Learn cloud computing

The soft skills

  3. The job description does not represent your actual job
  4. The quality of upstream data is out of your hands
  5. Stakeholder management is important
  6. Managing up
  7. When to change jobs?

Photo by Paolo Chiabrando on Unsplash

In a typical university data science curriculum, you will be taught mathematics, statistics, data structures & algorithms, software engineering, databases, and machine learning. To pass the technical assessments for job interviews, you will practice a lot of Python and SQL coding questions.

After passing the exams and interviews, you may rarely use much of this knowledge directly. Meanwhile, some soft skills that are not written in job descriptions can make or break your career. To make your career adventures smooth sailing, below are some tips from me, or at least what I wish someone had told me earlier.

Let’s start with technical skills, which are presumably easier for us to learn. Data scientists in smaller companies need a broader tech stack. Unless you are confident you can stay in a big company until you retire, it’s highly recommended to learn some DevOps and cloud computing.

1. Learn version control and containerization tools, like Git and Docker

We all work in a team. Multiple team members might write code for the same project. To facilitate collaboration, you need to learn Git. It allows you to create different branches for concurrent development and keeps track of the edit history.

Source: Git Branches: List, Create, Switch to, Merge, Push, & Delete

Depending on the job scope, DevOps may not be necessary. However, if you ever need to deploy ETL pipelines or productionize a trained machine learning model, you can’t avoid Docker. To make the configuration easier, you can try Docker Compose.

2. Learn cloud computing

Usually, companies that have a demand for data scientists already have some cloud infrastructure from one or more cloud service providers. The top cloud platforms all offer free trials. You might even build a portfolio project on the cloud before your interviews.

It’s impossible to learn them all before starting your job, so choose the one that is most popular among the employers that might hire you. How do you find out which cloud platform your dream company uses if the job ad does not say? Go to LinkedIn and search for data scientists who work there. They might describe the tech stack in the profile or work experience section.

Amazon Maintains Lead in the Cloud Market. Image source: Statista.

If you want a less steep learning curve, start with AWS. It has the largest user base, hence plenty of free online resources to get you started. Once you are comfortable with the cloud infrastructure in AWS, picking up Google Cloud and Azure becomes much easier.

Apart from technical skills, I want to share what I have learned from my own and my peers’ past experiences about how to deal with people. Hard skills, which everyone invests the most time in, are rarely what challenges data scientists the most.

3. The job description does not represent your actual job

Some say data scientist is the sexiest job. I would say only the job description is sexy. Employers use it to hook as many applicants as possible. Most good employers will adjust your expectations during the interview. However, if they desperately want someone onboard, you might find out later that you have been “catfished”.

Data scientists do not build machine learning models every day. Depending on the team size, you have to do a significant amount of “data plumbing”. Smaller companies may even expect you to work as a data engineer. Bigger companies have strict data governance regulations that can drive you crazy before you even obtain data for analysis. Your job is not as intellectually challenging as the LeetCode questions.

I strongly encourage you to reach out to data scientists on that team to understand what they actually do.

4. The quality of upstream data is out of your hands

Thanks to the data infrastructure engineers and data engineers in or outside of your team, you can pull data from somewhere that is relatively accessible and tell a story to your business stakeholders.

However, if the story you tell does not help the stakeholders achieve their objectives, e.g., generating revenue, they will think you did not do a great job, and it’s hard to win back their trust afterwards. It doesn’t matter if you have tried your best to make sense of data that was corrupted beyond repair upstream.

If you take longer than expected, the stakeholders will think you do not value their projects. Nobody cares how tedious it is to find all the data sources and the respective person in charge to grant you access.

Don’t blame the backend engineers and data engineers. We are all at the mercy of the insurmountable technical debt and data governance bureaucracy.

You can’t directly assess the upstream data quality before joining the team, but you can ask probing questions during the interview to gauge the level of technical debt and bureaucracy. Additionally, you might reach out to former members of the data team on LinkedIn before accepting the job offer.

5. Stakeholder management is important

The business stakeholders are your clients. They tell you their problems. You come up with a solution, e.g., dashboards or machine learning models, to solve their problems.

However, it’s not as simple as that.

First, never expect the stakeholder to give you a concrete problem to solve. Don’t take their request too literally. Be clear about what they want and evaluate if what they request can get them what they want. For example, the marketing executive may tell you they want a machine learning model to decide who to target for the upcoming marketing campaign. Knowing that they just want to spend less to retain or acquire the customers who spend more, you can help calculate customer lifetime value and customer acquisition cost instead of training a model.

Second, the stakeholders may not trust what you deliver even though you are the expert. In the beginning, you have to “sell” your solution to the stakeholders who are often not technically trained. It’s not all about impression management either. In the end, the outcome is measurable. No matter how much effort you have invested to train a model, if they do not benefit from it, the stakeholders will question your ability.

Last, speak the stakeholders’ “language”. Your business stakeholders talk about revenue, user growth, conversion rate, etc. You should translate the technical jargon into their “language”. If time permits, hang out and build rapport with them.
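To make the marketing example above a bit more concrete, here is a minimal sketch of comparing customer segments by lifetime value and acquisition cost instead of training a model. The formulas are one common simplified definition (not the only one), and the segment names and every figure are hypothetical, made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A hypothetical customer segment; all figures are illustrative only."""
    name: str
    avg_order_value: float    # average revenue per order
    orders_per_year: float    # purchase frequency
    expected_years: float     # expected customer lifespan in years
    gross_margin: float       # fraction of revenue kept as profit (0 to 1)
    marketing_spend: float    # spend attributed to acquiring this segment
    customers_acquired: int   # new customers won with that spend


def customer_lifetime_value(s: Segment) -> float:
    # Simplified CLV: margin earned per customer over their expected lifespan.
    return s.avg_order_value * s.orders_per_year * s.expected_years * s.gross_margin


def customer_acquisition_cost(s: Segment) -> float:
    # Simplified CAC: spend divided by the number of customers it brought in.
    return s.marketing_spend / s.customers_acquired


def rank_segments(segments: list[Segment]) -> list[Segment]:
    # Highest CLV-to-CAC ratio first: more value per unit of spend.
    return sorted(
        segments,
        key=lambda s: customer_lifetime_value(s) / customer_acquisition_cost(s),
        reverse=True,
    )


# Hypothetical inputs, not real data.
students = Segment("students", 20.0, 6, 2, 0.5, 5_000.0, 100)
professionals = Segment("professionals", 80.0, 4, 3, 0.5, 12_000.0, 100)

if __name__ == "__main__":
    for seg in rank_segments([students, professionals]):
        clv = customer_lifetime_value(seg)
        cac = customer_acquisition_cost(seg)
        print(f"{seg.name}: CLV={clv:.0f}, CAC={cac:.0f}, ratio={clv / cac:.1f}")
```

A table like this speaks the stakeholders’ language directly: it tells them where each marketing dollar is likely to return the most, which is often what they wanted from the “model” in the first place.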

6. Managing up

There is no single recipe for upward management. The ultimate goal is to build mutual trust with your manager: your manager believes that you are motivated to contribute to the team, and you believe that your manager is actively helping you achieve your career goals. Do not distort reality to create these perceptions, and do not let your own perception of reality become skewed.

Different managers have different management styles. It’s daunting to assess your manager’s personality and circumstances correctly.

If your manager is still struggling to prove his/her value, do not outshine your manager. Otherwise, the more you do, the more threatened your manager may feel. If he/she recognizes your skills and works you to the bone, it might not be a bad thing. Don’t complain. If he/she likes taking credit for your work, take advantage of his/her weakness and ask for more impactful projects. If you can help the manager get a raise, you have a good chance of getting promoted too!

Report your progress regularly and get timely feedback. Make sure you are always on the same page with the manager.

7. When to change jobs?

Usually, recruiters will invite you for job interviews through LinkedIn. If you receive none, work on your LinkedIn profile and showcase your skills there.

Interview for jobs once in a while and learn which of your newly acquired skills or project experiences attract employers’ attention. Ask your manager to assign you the projects that can attract more employers!

For junior data scientists, look at the most up-to-date tech stack for your dream company and think about what you can learn in your current company. Are there gaps? You can either pick up the skills in your free time or join a new company that allows you to learn these new skills. For example, if your dream company wants data scientists who are experienced in distributed machine learning, but your current company doesn’t require it, look for a new position that gives you exposure to distributed computing. Otherwise, it’s very difficult (and expensive) to learn distributed training on your own.

Job hopping is often frowned upon. However, if you can master the in-demand technical skills faster by changing jobs, you are a strategic job hopper. In the meantime, you will be rewarded financially too. Without a promotion, staying in the same company probably gets you a raise of about 5% annually, whereas switching to another company can get you a 20% to 30% raise in the annual package.
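To see how the difference compounds, here is a rough sketch that projects both paths over a few years. The 5% and 25% rates come from the ranges mentioned above; the starting salary, the two-year hopping cadence, and the assumption that a hop replaces that year’s regular raise are all hypothetical choices made for illustration.

```python
def project_salary(start, years, annual_raise=0.05, hop_raise=0.25, hop_every=None):
    """Return the salary at the end of each year, starting with year 0.

    hop_every=None means staying put; hop_every=2 means switching jobs
    every second year (an assumed cadence, not a recommendation).
    """
    salary = start
    path = [salary]
    for year in range(1, years + 1):
        if hop_every and year % hop_every == 0:
            # Assumption: the hop raise replaces the regular raise that year.
            salary *= 1 + hop_raise
        else:
            salary *= 1 + annual_raise
        path.append(salary)
    return path


# Hypothetical starting package of 100,000 over four years.
stay = project_salary(100_000, years=4)
hop = project_salary(100_000, years=4, hop_every=2)

if __name__ == "__main__":
    print(f"Staying: {stay[-1]:,.0f}")
    print(f"Hopping every 2 years: {hop[-1]:,.0f}")
```

The exact numbers matter less than the shape of the gap, which widens every time a larger raise is applied to a larger base.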

Before leaving a company, remember to keep the offer letter/contract with specifications of your compensation structure, the first & last pay slips, the employer’s official acknowledgment of your resignation with the date, and records of your performance bonuses. You need these for background checks.

Senior data scientists may have more to lose if they change jobs mindlessly. Your future employers no longer evaluate you merely on technical skills; deep domain knowledge sets you apart from juniors who lack the vision to see the real problems before applying technical tools to solve them. It’s probably not wise to change industries when you have established yourself as a subject-matter expert in a field. Depending on your career goals, plan every move strategically.

Opinions expressed are solely my own and do not express the views or opinions of any organization or company.



Data Scientist | Data Engineer | Cognitive Psychology and Neuroscience PhD | 🤝 Connect with me on https://www.linkedin.com/in/zhenna-lu/