What’s the difference between a Data Scientist, a Data Analyst, a Data Engineer, and a Machine Learning Engineer?

We will talk about the differences between a Data Scientist, a Data Analyst, and a Data Engineer

Djibril
6 min read · Nov 1, 2022

Many people get confused about which job they want because it’s not always clear what kind of work they’ll do. To make things even more confusing, many companies have different definitions of what a data scientist is. The only way to know exactly what you’re applying for is to look at the actual job description, so that you know what you’ll be doing day to day. So I’m going to make things a bit clearer today by explaining the differences using the hierarchy of needs illustration by Monica Rogati.

Let’s talk about what we actually use data for

The Data Science hierarchy of needs, created by Monica Rogati

What this hierarchy of needs illustration is saying is that if you can’t even collect data properly, then there’s no point in working on AI or deep learning. AI won’t magically solve everything, and your business probably doesn’t need AI to improve itself; there is plenty of lower-hanging fruit. Once you’re able to collect data for your business, you then have to store it. For example, software engineers might already have logging in place that records raw events as they happen.

Not bad: it’s data. But to be able to do anything with it, you have to move it and store it. That can be in relational databases or in CSV files; it doesn’t matter. The fact is, you’re going to have to write data pipelines to move data from one place to another, and once you have a ton of data, this becomes a highly complex distributed systems problem.
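To make the idea of a data pipeline concrete, here is a minimal sketch in Python. It assumes a made-up log format (`timestamp user=<id> event=<name>`), and the function names `parse_line` and `logs_to_csv` are hypothetical. It only illustrates the "move and store" step on a tiny scale; a real pipeline would read from actual log storage and write to a database or a distributed file system.

```python
import csv
import io

# Hypothetical raw application log lines (the format is an assumption).
RAW_LOGS = """\
2022-11-01T10:00:00 user=42 event=like_click
2022-11-01T10:00:05 user=7 event=page_view
malformed line without fields
2022-11-01T10:01:12 user=42 event=page_view
"""


def parse_line(line):
    """Turn one log line into a dict, or return None if it does not match."""
    parts = line.split()
    if len(parts) != 3:
        return None
    timestamp, user_part, event_part = parts
    if not user_part.startswith("user=") or not event_part.startswith("event="):
        return None
    return {
        "timestamp": timestamp,
        "user_id": int(user_part[len("user="):]),
        "event": event_part[len("event="):],
    }


def logs_to_csv(raw_logs):
    """Move parsed log records into CSV text (a stand-in for real storage)."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["timestamp", "user_id", "event"],
        lineterminator="\n",
    )
    writer.writeheader()
    for line in raw_logs.splitlines():
        record = parse_line(line)
        if record is not None:  # skip lines that do not match the format
            writer.writerow(record)
    return out.getvalue()


csv_text = logs_to_csv(RAW_LOGS)
print(csv_text)
```

Lines that don’t match the expected format are skipped here for simplicity. In practice you would probably want to count or store them somewhere, since they often point to bugs in the logging code.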

The people who work on this should be really good at distributed systems, so they’re usually called either software engineers or data engineers. The code you write is never perfect, so you’re bound to get some weird results. This is where Dataiku can come in.

Dataiku can help you explore your data sets and create nodes to clean them up. For example, we’ve seen bugs where users on our app were spending 25 hours per day on the app, which is impossible. So data engineers will continuously work on transforming and cleaning up the data so that it’s actually usable and queryable.
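The "25 hours per day" bug is the kind of thing a simple validation step can catch. Here is a rough sketch in Python, assuming hypothetical usage records of the form `(user_id, date, hours_on_app)`. The `clean_usage` function and the sample values are made up for illustration.

```python
# Hypothetical daily usage records: (user_id, date, hours_on_app).
usage = [
    (1, "2022-10-30", 1.5),
    (2, "2022-10-30", 25.0),  # impossible: more than 24 hours in a day
    (3, "2022-10-30", -0.5),  # impossible: negative time
    (1, "2022-10-30", 1.5),   # exact duplicate of the first record
    (4, "2022-10-30", 3.0),
]


def clean_usage(records, max_hours=24.0):
    """Drop exact duplicates and split records into clean and rejected."""
    seen = set()
    clean, rejected = [], []
    for record in records:
        if record in seen:
            continue  # de-duplicate: keep only the first occurrence
        seen.add(record)
        hours = record[2]
        if 0.0 <= hours <= max_hours:
            clean.append(record)
        else:
            rejected.append(record)  # keep bad rows around for debugging
    return clean, rejected


clean, rejected = clean_usage(usage)
print("clean:", clean)
print("rejected:", rejected)
```

Keeping the rejected rows instead of silently deleting them is a design choice: they are usually the fastest way to find the bug that produced them.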

If your business uses Dataiku, you’d be using its data preparation features, where you can visually connect to your data sources, join them, aggregate them, de-duplicate them, and clean them. Now, hopefully, if you did all the previous steps properly, anyone in your company can query that data. That’s why SQL is so useful: it’s an easy language, and it’s more or less the standard language for querying data from databases. Big companies like Facebook have their own internal tools to query and visualize the data.

Now data analysts, business analysts, PMs (product managers), and software engineers can all query that data easily and answer questions like “How many users have used my feature in the last week?”, which is very important to know. You can already have a lot of impact on your business just by querying that data, answering these questions, and making product decisions based on the answers. In most companies, that’s all you need; you don’t need to go further than that. We can do so much with this data now, especially if it is clean and actually useful.
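As an illustration, the question "how many users have used my feature in the last week" could look like the following, using Python’s built-in `sqlite3` module with an in-memory database. The table name `feature_events`, its columns, and the sample rows are all assumptions made up for this sketch.

```python
import sqlite3

# In-memory database with a hypothetical events table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feature_events (user_id INTEGER, feature TEXT, event_date TEXT)"
)
conn.executemany(
    "INSERT INTO feature_events VALUES (?, ?, ?)",
    [
        (1, "like_button", "2022-10-31"),
        (1, "like_button", "2022-10-30"),   # same user, counted once
        (2, "like_button", "2022-10-28"),
        (3, "like_button", "2022-10-01"),   # older than one week
        (4, "share_button", "2022-10-31"),  # different feature
    ],
)


def weekly_users(conn, feature, today):
    """Count distinct users of a feature in the 7 days ending on `today`."""
    query = """
        SELECT COUNT(DISTINCT user_id)
        FROM feature_events
        WHERE feature = ?
          AND event_date > date(?, '-7 days')
          AND event_date <= ?
    """
    (count,) = conn.execute(query, (feature, today, today)).fetchone()
    return count


print(weekly_users(conn, "like_button", "2022-11-01"))
```

The date is passed in explicitly rather than taken from the clock, so the same query gives the same answer every time you run it on the sample data.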

Because we built such a great backbone, we can now build an A/B testing framework on top of it. This framework is an important tool for businesses to know exactly which features to build and what incremental change each one makes to the product. For example, if I have a like button and I want to change its color to blue, and I’m curious to see whether people will click it more, I can now find out with A/B testing. You can also run simple linear regressions to predict user behavior and maybe build features around those predictions.
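One common way to read the result of a test like the blue like button is a two-proportion z-test on click-through rates. The sketch below is one possible approach, not the method any particular A/B testing framework uses, and the click and view counts are invented for the example.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Compare two click-through rates with a pooled two-proportion z-test.

    Returns the z statistic (positive if B has the higher rate) and the
    two-sided p-value under the normal approximation.
    """
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: A = current button color, B = blue button.
z, p_value = two_proportion_z_test(
    clicks_a=200, views_a=5000, clicks_b=260, views_b=5000
)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference in click rates is unlikely to be pure chance. The normal approximation assumes reasonably large sample sizes, so this is not a good fit for tests with only a handful of users.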

Dataiku has it all integrated, so you can build machine learning models pretty easily and choose what features you want for your model with a few clicks.

Now, if you want to do deep learning or AI, you need that cleaned data. If we go back a few steps, we see that it’s imperative to have properly selected and labeled the training data. We also have to make sure we identified the features properly. If simple ML algorithms like linear regression truly don’t cut it, then you can think about AI and deep learning to improve your product.
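Before reaching for deep learning, it helps to see how far a simple baseline gets you. Here is a minimal sketch of ordinary least-squares linear regression with one feature, written in plain Python. The data (weeks since signup versus sessions per week) is invented, and `fit_line` and `r_squared` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: weeks since signup vs. sessions per week.
weeks = [1, 2, 3, 4, 5]
sessions = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(weeks, sessions)
score = r_squared(weeks, sessions, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} r2={score:.3f}")
```

If a baseline like this already explains most of the variation, a deep learning model may add complexity without adding much value.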

Okay, now let’s look at this from a bird’s-eye view

Where do data scientists, data engineers, and data analysts fit in? Data engineers would commonly be working on the lower layers of the hierarchy.

So explore, transform, move, store, collect,…

Software engineers mostly do the collect part, since it’s usually implemented on the front end and a little bit on the back end, because that’s where you collect the user data.

Data analysts most commonly work at the aggregate level, where they have the very important job of interpreting the data and aggregating it in a way that lets you make business decisions based on the results. A very good analyst will be able to come up with a strategy and a direction for a company, a product, or a feature, depending on how big your company is. They’re technical, but they also have product intuition and amazing communication skills, because they have to communicate that insight to the rest of the company.

Many companies call data analysts “data scientists” nowadays. In general, data scientists are paid more because the role usually requires a more technical background. However, many companies use their data scientists to do data analyst work because it is that vital for the company, so they get their smartest data people to work on it. Data scientists can also work on building ML algorithms, all the way up to AI and deep learning models.

Most of the time nowadays, though, the people doing that work are called Research Scientists, and they’re supported by ML engineers who build out the systems they need.

Complex projects usually require PhD candidates because they have specialized knowledge. In some companies, the roles are blurred.

So the next time you see a job posting, read the description and see where it sits in the data hierarchy of needs. That will help you determine whether the position is right for you.
