What Does Success Look Like For A Data Scientist?

Andrew Dabydeen
Aug 1, 2022


The short answer? There really isn’t one — but that’s the true beauty of the field.

What Does A Data Scientist Do?

Introduction

There’s no way around the fact that Data Science has been a hot topic for the past decade, but what even is Data Science, and more specifically, what is a Data Scientist? Putting aside adjacent roles that work with data in similar ways (Data Engineers, Analytics Engineers, Machine Learning Engineers, Data Analysts, etc.), the role of a Data Scientist hasn’t been clearly defined (much like the roles listed above, some might argue), and time hasn’t helped much with this clarity. Personally, I think it has made things even worse.

Within the title of Data Scientist, you could be focused on Product Analytics, Marketing Analytics, Core Machine Learning, General Strategy & Business Operations, and the list goes on. On top of that, the tools and methods you’ll be using can vary drastically depending on the role, so what should you focus on? Will you be writing SQL queries all day? Should you know Confidence Intervals like the back of your hand? Are you going to do everything in Tableau or Looker? It really depends on where you go and the type of team you join, as the role can mean many things from company to company.

For Product Analytics, a Data Scientist should be familiar with A/B testing, understand which Key Performance Indicator (KPI) a new product feature is trying to move the needle on, and be perfectly okay with not touching an ML model for months at a time. A Data Scientist focused on Biz Ops might spend their time working out which metric to optimize for their churn model, or how to forecast upcoming demand from inputs that deal with seasonality trends and external factors that can’t be captured by internal data. While both roles require a core skill set, the day-to-day varies a lot. And while it doesn’t really matter for your first role or two within Data Science, you should definitely think about which vertical you want to spend your time in and become a Subject Matter Expert (SME) in.
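To make the A/B testing part a little more concrete, here is a minimal sketch of one common way to compare two conversion rates, a two-proportion z-test. The function name and the conversion counts are made up for illustration, and a real experiment would also need a sample size decided up front and some care around peeking at results early.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_b - rate_a, z, p_value


# Hypothetical experiment: control converts 200 of 2000, variant 250 of 2000.
lift, z, p_value = two_proportion_z_test(200, 2000, 250, 2000)
print(f"absolute lift={lift:.3f}, z={z:.2f}, p-value={p_value:.4f}")
```

The p-value here only says how surprising the observed gap would be if the feature had no effect. Whether the lift is worth shipping is still a judgment call about the KPI itself.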

I want to spend the rest of this article talking about four pillars that every Data Scientist should at least have their toes dipped in to stay relevant in this ever-changing field: Data Engineering, Business Intelligence, Inference & ML, and Selling Your Work. What I’m trying to get at is this: be full-stack and have knowledge of every part of the modern data stack, even if there’s an expert on your team who can optimize that piece and take it even further. Now, I’m by no means an expert in the field, but I spend a lot of time entertaining these ideas on a daily basis. I hope the next few minutes of my stream of consciousness can help settle some of the thoughts that might be going through your mind, especially for people who are new to the area and might not know what to focus on or even where to begin. My advice at the end of the day is just advice, so to be frank, try to figure out if it’s relevant and how you’d like to apply it.

Data & Analytics Engineering

A Basic ETL Flow

I would be remiss if I didn’t mention one of the most important pillars in a Data Scientist’s position, Data Engineering. Yes, you can easily find yourself in a situation where you won’t have to touch a data pipeline in your tenure (especially at more mature organizations), but regardless, you should be familiar with the ideas that make up this vertical, as the skillset is becoming more popular today. It also helps if you want to make a career switch later on. Knowing how your data is sourced, how frequently it is refreshed, and what problems could arise is crucial for work further downstream in the modern data stack. I don’t know how many times I have found myself in scenarios where I randomly saw duplicates in my data, fields missing or showing up as null, or the entire pipeline breaking because a table upstream was dropped. All of this led me to go back and fix the source of truth to make things a little easier later on. If you have knowledge here, you can save yourself a lot of time and effort and, more importantly, make yourself that much more independent and trustworthy.
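As a rough illustration of the kind of check that catches duplicates and nulls before they reach a report, here is a small sketch in plain Python. The `profile_rows` helper, the field names, and the sample rows are all hypothetical. In practice you would more likely express the same idea as SQL assertions or as tests in whatever pipeline tool your team uses.

```python
from collections import Counter


def profile_rows(rows, key_fields, required_fields):
    """Count duplicate keys and missing required values in a batch of rows."""
    key_counts = Counter(
        tuple(row.get(field) for field in key_fields) for row in rows
    )
    # Any key seen more than once is a duplicate.
    duplicate_keys = {key: n for key, n in key_counts.items() if n > 1}
    # A field counts as missing when it is absent or explicitly None.
    null_counts = {
        field: sum(1 for row in rows if row.get(field) is None)
        for field in required_fields
    }
    return {
        "row_count": len(rows),
        "duplicate_keys": duplicate_keys,
        "null_counts": null_counts,
    }


# Hypothetical batch pulled from an upstream table.
orders = [
    {"order_id": 1, "customer_id": "a", "amount": 20.0},
    {"order_id": 2, "customer_id": "b", "amount": None},
    {"order_id": 2, "customer_id": "b", "amount": None},
    {"order_id": 3, "customer_id": None, "amount": 15.5},
]

report = profile_rows(
    orders, key_fields=["order_id"], required_fields=["customer_id", "amount"]
)
print(report)
```

Running something like this on every refresh, and failing loudly when the counts are not zero, is usually cheaper than discovering the problem in a dashboard a week later.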

Whether you find yourself creating a pipeline from scratch or editing someone else’s, you become more familiar with the data once you start working with the underpinnings of how it’s built. This is very different from just analyzing it or making reports, as you get to understand what actually goes into making that dataset.

There are a lot of tools like dbt, Airflow, Airbyte, etc. that are popular nowadays at younger companies. More mature organizations might have a different data stack with proprietary tools, but the idea is the same and it mostly comes down to SQL, Python, or Spark (a plus to know). Knowing the basics of these goes a long way, especially if you’re looking for your first position. Data manipulation and wrangling are definitely going to appear again further down the data stack, so you’re covering multiple areas with this preparation. To be honest, who knows if these tools will still be relevant in 10 years’ time? It’s the process of learning to use them and being part of the evolution of data that counts the most. Knowing how to learn a new tool or skill is priceless and will make you that much better of a data connoisseur.

Business Intelligence

Reports and Dashboards Get The Key Ideas Across

After you have your core data pipelines built out and maybe a few important KPIs identified, you probably want a straightforward way to visualize them, observe trends, and possibly give the end users (your stakeholders) a playground where they can work with the data and draw some insights on their own. Business Intelligence (BI) allows this, and though it might sometimes take up a large portion of your day, most executives and leaders rely on these dashboards and reports to understand how well the business is doing. BI might not be as code-heavy as other parts of the data stack, but it gives you something the other parts might not: more face time with the business leads and an understanding of what they’re focusing on, which will give you a leg up on uncovering additional insights.

Now I get it, you might not want to spend most of your time building out dashboards, adding trend lines to charts, or changing which KPIs are displayed first in a dashboard, but BI is vital to a Data Scientist and can help get your main ideas across. Knowing when to put your BI hat on versus doing a deep-dive analysis or presentation can save you a lot of time. There are many popular tools like Tableau, Looker, Power BI, etc., and all of them excel in particular areas while lacking in others. Cohort analyses, forecasts, executive dashboards, etc. are all critical to the business, and you as a Data Scientist can have a say here. Heck, sometimes throwing some data into Google Sheets and making some charts might be faster than creating a dashboard in a BI tool, but it’s experience that teaches you when to do what.

One of the hardest things to combat within this vertical is being too reactive and on the hook when things go wrong. No matter what, you’ll probably find yourself in a position where a report is broken or a value is not populating a dashboard correctly. It’s something to keep in mind when you’re sharing your work and setting the bar with your stakeholders: don’t put yourself in a position where every time something goes awry, you’re on the hook to fix it. Work with your end users to help them understand how they can help themselves. On one hand, you’ll end up with more data-literate users in the company, and on the other, you save yourself a lot of time.

Inference and Machine Learning

Support Vector Machine (Machine Learning Model)

Now we get to the piece that every Data Scientist might think most of their time is going to be focused on (this might be the case for some positions, especially at older organizations where the field is more mature), but again, like the other verticals mentioned, it’s going to depend entirely on your role and position. The Inference and ML piece of your job is what sets it apart from a lot of the other verticals required of you. The skillset you can gain here is vast and will really depend on what you want to focus on. Do you want to work with Product Managers and understand the impact of new features they launch (Product Analytics)? Do you want to work with Finance and help them forecast the number of subscribers they’ll see in the upcoming quarter (Forecasting)? Do you want to be able to predict who’s going to buy a product based on their behavior, understand their propensity, and work out how to engage them with marketing tactics (Marketing Analytics)? It’s up to you to decide!

If you’re applying to your first few roles, try to understand where the position falls among the kinds of work mentioned above. This will give you a better idea of what to prepare for and what you might be asked in the role or even during the interview process. I can go on and on and say prepare to answer questions like “what does a p-value mean in layman’s terms” or “how do you combat overfitting/underfitting in your ML model”, but you won’t know what to prepare for unless you understand the nature of the role. This goes for interviewing and for performing on the job.

One thing I can say here is understand your basics in statistics and computer science. Do you need to be an ML expert? Probably not, especially with the rise of Machine Learning Engineers who will productionalize your insights and Research Scientists who focus more on which model will perform the best. You should have a firm grounding in probability, data structures (Leetcode Easy to Medium for anyone wondering and about to do the grind), and basic ML models. That means not just knowing how to implement them but really understanding them, from what a beta coefficient means in Linear Regression and how it differs from a beta coefficient in Logistic Regression, to why tree-based methods might be better than Support Vector Machines in certain situations.
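Since the beta coefficient question comes up a lot, here is a small sketch of the difference, using made-up intercepts and a made-up coefficient of 0.5 rather than a fitted model. In Linear Regression a one-unit increase in x shifts the prediction by beta everywhere. In Logistic Regression the same increase multiplies the odds by exp(beta), so the change in probability depends on where you start.

```python
import math


def linear_prediction(intercept, beta, x):
    """Prediction from a one-feature linear regression."""
    return intercept + beta * x


def logistic_probability(intercept, beta, x):
    """Predicted probability from a one-feature logistic regression."""
    return 1 / (1 + math.exp(-(intercept + beta * x)))


def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)


beta = 0.5  # hypothetical coefficient, used for both models

# Linear: moving x from 2 to 3 changes the prediction by exactly beta.
linear_shift = linear_prediction(1.0, beta, 3) - linear_prediction(1.0, beta, 2)

# Logistic: the same one-unit step multiplies the odds by exp(beta)...
p_low_before = logistic_probability(-2.0, beta, 0)
p_low_after = logistic_probability(-2.0, beta, 1)
odds_ratio = odds(p_low_after) / odds(p_low_before)

# ...but the change in probability is larger near p = 0.5 than in the tail.
shift_in_tail = p_low_after - p_low_before
shift_near_middle = (
    logistic_probability(-2.0, beta, 5) - logistic_probability(-2.0, beta, 4)
)

print(f"linear shift per unit: {linear_shift}")
print(f"odds ratio per unit: {odds_ratio:.3f} vs exp(beta): {math.exp(beta):.3f}")
print(f"probability shift in tail: {shift_in_tail:.3f}")
print(f"probability shift near middle: {shift_near_middle:.3f}")
```

This is why a logistic coefficient is usually explained as an odds ratio and not as a fixed change in probability.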

I’m being very cautious with my advice here because of the ambiguity that surrounds this part of the role. No one is truly going to understand what you might do in a position, which is why you should talk to the hiring manager and review the job description. The role can mean many things, and building expertise in forecasting versus A/B testing can be time-consuming, so pick your battles and don’t spread yourself too thin. Get a basic idea of most Data Scientist roles and the work that goes into them, and from there, gain the expertise and battle scars that will set you apart. You might find yourself in a role that does a little bit of everything (which is the case if you are one of the first few Data Scientists at a company), but try to be a domain expert in one part of the process. This will help you out later on.

Selling Your Work and Yourself

Change Perspectives With Data

The last part of your job, and in my opinion the most important part, comes down to communication skills. It doesn’t matter how smart you are or how fancy the dimensionality reduction algorithms you throw at something are: if you don’t know how to explain your work and sell it, are you truly successful? The position of a Data Scientist combines being technical with convincing those around you using the data you’re working with. I’ll keep this short and sweet. The biggest piece of advice I can give here is to explain your work as if those around you have no background in data: Explain Like I’m Five (ELI5).

Your work is as good as the people who are influenced by it. There’s satisfaction in knowing that you changed someone’s perspective using data and that you’re helping the company become more optimized in a certain area. This takes practice, and there are many things that can help here, like a good manager, a strong mentor, and most importantly, an audience that is willing to consume your insights. Being in a data-driven environment goes a long way and will help you succeed. More importantly, you can be one of the keys that makes your organization more data driven.

There are really no tools or hard skills that you need to know, as this is more of a soft skillset that comes with time and practice. In general, a very small number of people are naturally good at this (I know because I’m horrible at it), but it can come to everyone with time and patience. This doesn’t just apply to data but to all aspects of life. Your job as a Data Scientist is not to make things complicated but to make them more digestible and easy to understand.

Conclusion

So we covered a lot here, including some important skills a Data Scientist should have to be successful at their job, some of which might require more time and dedication than others. The field is overall very ambiguous, but if that’s the environment you thrive in, it’s rewarding and fun. I like thinking about Data Science as being a detective with multiple tools and resources in front of you. It’s up to you to decide what to use and how you’re going to use it. There are many paths to take, but experience will make this easier with time. It’s the journey that makes you unique.

My last piece of advice that I want everyone to think about is this: if you lose the ability to influence people with data in your current position, think about what it is that you’re truly doing. Yes, being a Data Scientist pays well and comes with benefits, but it’s the ability to influence and turn heads that is the most rewarding part. Once you lose that ability (which can depend on many factors, like an organization not being data driven, leadership fishing for data to back their solutions, etc.), then what is your purpose in that role?

I hope you enjoyed reading my stream of consciousness. I know that a lot of my recommendations are based on my past experiences in the field and the people I’ve worked with, but I hope I challenged at least one idea or thought you might have. Thanks!


Andrew Dabydeen

Data Scientist | Brown University M.S. 19' | Cornell University B.S. 16'