What are the skills of a Data Scientist?

Sheel Saket
Artificial Coder
4 min read · Jul 13, 2020

Since I interact a lot with the Data Science community, I get many messages from young professionals who want to start a career in data science. Data Science is a relatively new buzzword and has attracted aspirants from across the world because of its high pay and growing demand in the job market.

The questions that budding Data Scientists ask me most often are:

  1. How do I become a Data Scientist?
  2. I have learned to code in Python. Is that enough?
  3. Do I need to study Math?
  4. How is a Data Scientist different from a Machine Learning Engineer?

And the list goes on…

Basically, they lack insight into the day-to-day life of a Data Scientist. So in this post I am going to share what I feel are the most important skills a Data Science aspirant needs to have or develop to become really good in this field:

Skills:

Business

This is a very important skill if you want to be a Data Scientist. A Data Scientist isn't just a developer; he/she is an innovator who understands the business and comes up with great ideas to solve its problems. This is where a Data Scientist differs from a Machine Learning Engineer, whose main responsibility is to develop and deploy models. A Data Scientist doesn't necessarily solve a problem by training cool models: he/she uses data to identify a problem, backs it up with data, and then comes up with an innovative and effective solution that helps the business generate value.

Statistics / Data Crunching

This is, again, one of the most important skills needed to become a Data Scientist. A Data Scientist is a mathematician at the core. No matter what type of data you use, it's all about recognizing patterns and estimating the probability of outcomes in future scenarios. One cannot be good at Data Science without being good at Statistics and Data Crunching. You will have to play with numbers and understand your data set. You will also have to learn the common Machine Learning algorithms and understand their basics, such as their assumptions and logic.

Let's take a test: do you remember all the assumptions of linear regression?

If yes, then good job.

If not, you need to revisit the theory. You cannot train a model properly without knowing the logic it is based on! This seemingly small gap is a very important reason why people fail to perform in Data Science.

SQL/OOP

Here comes the coding! As a Data Scientist, you will be pulling huge amounts of data that will most likely be sitting in a SQL database. You need to be really good at pulling exactly the data you want and transforming it, and on top of that, you need to do it efficiently. If you run an unoptimized query that takes 10 hours to get you the data, you are not doing a good job. So start brushing up on your SQL skills.
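Here is a minimal sketch of the difference between pulling everything and filtering inside the database. It uses Python's built-in `sqlite3` module with an in-memory table of made-up `sales` rows. The table, columns and index name are hypothetical. The general idea carries over to other SQL databases: push filters and aggregations into the query, and index the columns you filter on.

```python
import random
import sqlite3


def build_demo_db(n_rows=10_000):
    """Create an in-memory table of hypothetical sales rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    rng = random.Random(42)
    regions = ["north", "south", "east", "west"]
    conn.executemany(
        "INSERT INTO sales (region, amount) VALUES (?, ?)",
        ((rng.choice(regions), round(rng.uniform(1, 100), 2)) for _ in range(n_rows)),
    )
    # An index on the filter column lets the engine avoid a full table scan.
    conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
    return conn


def total_in_python(conn, region):
    # Inefficient: ship every row to the client, then filter and sum there.
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    return sum(amount for r, amount in rows if r == region)


def total_in_sql(conn, region):
    # Efficient: filter and aggregate in the database; only one number comes back.
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ?", (region,)
    ).fetchone()
    return total


def query_plan(conn, region):
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT SUM(amount) FROM sales WHERE region = ?",
        (region,),
    ).fetchall()
    return [row[-1] for row in plan]  # last column holds the readable detail


conn = build_demo_db()
print(total_in_python(conn, "north"), total_in_sql(conn, "north"))
print(query_plan(conn, "north"))
```

`EXPLAIN QUERY PLAN` is specific to SQLite. Other databases expose similar information through their own `EXPLAIN` commands, and reading those plans is a good habit before running a query that could take hours.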

OOP stands for Object-Oriented Programming, and the most common object-oriented language in Data Science these days is Python! You will have to understand what OOP is and how you can use objects and methods to define tasks. OOP is easy to learn and apply, so if you know how to define classes and functions in Python, you are off to a great start!
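As an illustration of defining classes and methods, here is a small hypothetical `ColumnCleaner` class. It does not come from any library. The class keeps a column's values together with the cleaning steps applied to them. Median imputation and clipping are only example steps.

```python
import statistics
from dataclasses import dataclass, field


@dataclass
class ColumnCleaner:
    """Hypothetical helper that bundles a column's data with operations on it."""
    name: str
    values: list = field(default_factory=list)

    def fill_missing(self):
        """Replace None entries with the median of the observed values."""
        observed = [v for v in self.values if v is not None]
        median = statistics.median(observed)
        self.values = [median if v is None else v for v in self.values]
        return self  # returning self allows method chaining

    def clip(self, low, high):
        """Cap values to the [low, high] range to tame outliers."""
        self.values = [min(max(v, low), high) for v in self.values]
        return self

    def summary(self):
        return {"column": self.name, "mean": statistics.fmean(self.values), "n": len(self.values)}


ages = ColumnCleaner("age", [23, None, 31, 250, 27])
print(ages.fill_missing().clip(0, 100).summary())
# prints {'column': 'age', 'mean': 42.0, 'n': 5}
```

Putting the data and its methods in one object keeps each cleaning step named, reusable and easy to chain.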

Algorithms and Data Structures

I have seen people with really good statistical knowledge but poor programming skills. They know what needs to be done and how to get it done, but they are very inefficient coders. They write scripts that are 100% accurate but very slow. Their code can take five times longer to run because they never worked on their algorithms: they don't pay attention to the time and space complexity of their code, and this is where knowing and applying the right algorithms makes things faster. Knowledge of data structures will also help you use the full power of a programming language.

If your script has multiple nested for loops, you probably need to work on writing more efficient code.
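Here is a small sketch of the nested-loop problem: finding the IDs that appear in two lists. The nested version compares every pair. The set-based version builds a hash set once and then does membership lookups in average constant time. This turns quadratic work into roughly linear work. The function names and the synthetic ID lists are assumptions for illustration, and the printed timings will vary by machine.

```python
import timeit


def common_ids_nested(a, b):
    # O(len(a) * len(b)): each element of a is compared with elements of b.
    result = []
    for x in a:
        for y in b:
            if x == y:
                result.append(x)
                break
    return result


def common_ids_set(a, b):
    # O(len(a) + len(b)) on average: build a hash set once, then O(1) lookups.
    b_set = set(b)
    return [x for x in a if x in b_set]


a = list(range(0, 4000, 2))  # synthetic "customer IDs"
b = list(range(0, 4000, 3))
assert common_ids_nested(a, b) == common_ids_set(a, b)  # same answer either way
print("nested:", timeit.timeit(lambda: common_ids_nested(a, b), number=3))
print("set:   ", timeit.timeit(lambda: common_ids_set(a, b), number=3))
```

Both functions return the same IDs in the same order, so the only change is the data structure. That choice is often the cheapest speed-up available.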

Visualization

And finally, you need to be a storyteller. Whatever you investigate using data, you need to be able to share it with your stakeholders through the power of visualization. Flowcharts, pie charts, bar graphs and word clouds are how you communicate in the world of Data Science. Python has excellent libraries for building beautiful dashboards that you can deploy to help users understand your product!


Data Scientist. NLP expert. Follow me on Twitter @ArtificialCoder