Data science is not a buzzword anymore. Even though we hear that 40% of European AI startups don’t use any algorithms at all, and the popular meme “If it’s in PowerPoint, it’s AI; if it’s in Python, it’s machine learning” casts doubt on the technology, the numbers from numerous official reports tell a different story: more patents, more startups, more applications, and more revenue thanks to AI (or whatever you like to call it).
Like any fast-growing and maturing field, data science needs standardization, and not only in tools and infrastructure (where we already see it in deployment tools, frameworks, and even hardware). Developers’ skills also need to be defined in such a way that it’s clear how to differentiate an entry-level data scientist from a seasoned one and from a real expert and, moreover, how to tell whether these skills match the role the company needs now and in the future.
In this article, I have summarized my own 5+ years of experience in the field, as well as the opinions of much more experienced partners and colleagues, on what a data scientist is supposed to know and, even more importantly, in which directions a data scientist can grow. It is addressed to managers who already lead a data science team or are about to hire one, and to data science practitioners who want to better understand their strong and weak points and decide where to grow. It might also be useful for HR managers who want to understand better whom they hire (and fire) and why.
Who are the Data Scientists?
Let’s first review what a data scientist does in the daily job. What I present below is a very rough approximation that doesn’t capture some important details, but these are the roles that data scientists play in different companies:
Generalist, Consultant, Analyst, Algorithmist, Researcher, Engineer
It’s relatively easy to understand who you are: just remember what you are being paid for. Are you valuable as a “jack of all trades” who can do anything with the given data, whatever it is, from sales forecasting and customer segmentation to building demos with speech recognition if needed? Then you’re a great Generalist: you can be responsible for data science in non-technology companies, drive transformations, and lead your own team if you can bring value to the company. Are you proud of specializing in a technical area like NLP or computer vision, or in a business field like retail or MedTech, and of constantly developing and improving ready-made solutions? Most probably you’re a productive Consultant, and with deep specialization you might launch new products for your company or on your own. If you are valued as the person who analyzes the stated problem and the given data and makes decisions about the requirements, resources, and risks of a project, Analyst is your calling, and business development might be your next career step.
Do you spend most of your time developing and training new models, making them more accurate and faster? Do you perhaps even have a GitHub account with open-source implementations of state-of-the-art models from the latest ICLR? Congratulations, most probably you’re an Algorithmist with a straight path to the CTO position. Or maybe you don’t care much about the accuracy of the model, but you do care about novelty? Are you trying to solve basic problems in new ways, experimenting with unsupervised and reinforcement learning, and do you have a couple of publications on arXiv? Then I hope it’s clear that Researcher is the role you like, and you’re probably already working in a research-based startup, a university, or some FANG (Facebook-Amazon-Netflix-Google) lab. Developing new and accurate algorithms is cool, but someone has to turn them into a product and make sure that the whole data analysis process, from data collection and labeling to the actual inference, will always work and scale, for both the developers and the clients. If that’s what you’re responsible for, you’re an Engineer, and you have many ways to go: I know developers who became product managers, CTOs, and business development people as well!
From my personal, biased point of view, I prefer to see the Analyst and Engineer roles as somewhat outside the “data science” scope. An Analyst is a person who normally holds a business analyst position in a company; because processes today are data-driven, an Analyst has to have some understanding of data science and its tools, but that alone doesn’t make someone a data scientist. The same holds for data engineers, who do important work on ETL pipelines, scaling, deployment, and the infrastructure of the solutions, but this work is more related to software engineering than to data science as a discipline. That’s why, for the sake of integrity, let’s continue with Generalists, Consultants, Researchers, and Algorithmists.
Data Science competency matrix
After long consideration and discussion (probably a bit biased toward the technical side), the following 10 knowledge groups were selected: logic, computing, mathematics (I mean algebra, calculus, and related stuff), probability theory and statistics, optimization, predictive modeling, machine learning algorithms, deep learning, research and frontiers, and, finally, communication and presentation skills. You can find the summarized version in the table below (check the link for the Google Sheet version).
I can imagine that it looks like “way too much” to you. Are you really supposed to know all the advanced mathematics and deep computer science fundamentals, have experience with all machine learning algorithms, and, alongside that, be good at logic, keep up with the latest trends, make good presentations, and flawlessly communicate your results to senior management? You don’t actually need them all, it’s true. But if you’re looking for a seasoned Consultant who has worked with predictive models for 10 years, you might expect that person not just to plug and play Python libraries, but to have a profound understanding of the work from both the business and the mathematics point of view. If you are an Algorithmist who has spent several years making great algorithms, you are supposed to know the different frameworks, what their features are, and how to regularize models well enough that they perform well in production. Even if you’re an Analyst and you think you can go as far as you want on communication skills, you still need a good knowledge of logic, statistics, and at least some basics of machine learning algorithms to know what is possible to use and what is not. And depending on your personal or company preferences, you might add good presentation skills or, maybe, knowledge of mathematics.
Skill balance wheels
To visualize these roles, it’s convenient to use so-called “life balance wheels”, but in our case they will represent the balance of skills for each role. For example, a Generalist has to know everything at a more or less good level; that’s what makes a Generalist :) A Consultant leans heavily toward modeling, machine learning, research, and communication, with some computing skills to prepare demos. A Consultant doesn’t need much theoretical knowledge, because Researchers or Algorithmists can be asked instead. The Analyst concentrates on communication, logic, statistics, optimization, and predictive modeling. Designing algorithms or programming is normally not the Analyst’s job.
Algorithmists often develop new models from scratch, so they need good programming skills, mathematics knowledge, and experience with machine learning and deep learning. They may, however, sacrifice logic, statistics, and communication skills, because they’re not going to present their results often. Researchers need profound mathematics, statistics, and optimization skills, along with keeping up with the latest research and having good communication abilities. Lastly, Engineers may sacrifice a lot, except for logic, programming, and mathematics skills, which they need to ensure that the models defined by the Consultant, researched by the Researcher, and developed by the Algorithmist will work fine in the product.
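As a rough illustration of the balance wheels described above, each role can be written down as a vector of scores over the ten knowledge groups, and a self-assessment can be compared against those vectors. This is only a sketch: the 0 to 5 scores, the `ROLE_PROFILES` table, and the `closest_role` helper are my own assumptions made up for this example, loosely following the descriptions above, and they are not the values from the competency matrix.

```python
# The ten knowledge groups named in the article.
SKILLS = [
    "logic", "computing", "mathematics", "statistics", "optimization",
    "predictive modeling", "ml algorithms", "deep learning",
    "research", "communication",
]

# Hypothetical emphasis scores (0 = not needed, 5 = essential), one per skill,
# in the same order as SKILLS. Illustrative only, not the author's data.
ROLE_PROFILES = {
    "Generalist":   [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
    "Consultant":   [2, 2, 1, 2, 2, 5, 4, 3, 4, 5],
    "Analyst":      [5, 1, 2, 5, 4, 4, 2, 1, 1, 5],
    "Algorithmist": [2, 5, 5, 2, 3, 3, 5, 5, 3, 2],
    "Researcher":   [3, 2, 5, 5, 5, 2, 3, 3, 5, 4],
    "Engineer":     [5, 5, 4, 1, 1, 1, 2, 2, 1, 2],
}


def closest_role(scores):
    """Return the role whose profile has the smallest squared distance to `scores`."""
    if len(scores) != len(SKILLS):
        raise ValueError(f"expected {len(SKILLS)} scores, got {len(scores)}")

    def distance(profile):
        return sum((a - b) ** 2 for a, b in zip(scores, profile))

    return min(ROLE_PROFILES, key=lambda role: distance(ROLE_PROFILES[role]))


if __name__ == "__main__":
    # A made-up self-assessment: strong in programming, math, and deep learning.
    my_scores = [2, 5, 4, 2, 3, 3, 5, 5, 2, 1]
    print(closest_role(my_scores))
```

Squared distance is just the simplest choice here; any similarity measure would do, and a real assessment should weight the groups according to what a given company actually needs.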
To check knowledge in all of the fields described above, I have prepared three separate tests at junior, middle, and senior levels. If you or your candidate are just starting out, try the first one; if you feel more seasoned, go with the other two. Don’t be discouraged by not knowing something. Remember that if you’re a great Researcher, you might not know some implementation details, but most probably you feel confident with profound mathematics. On the other hand, if you’re more of an Analyst or a Consultant, you may fail on optimization or mathematics questions, but you will do well on communication, algorithms, and “keeping up with the SOTA” questions. I hope the tests will help you uncover your strong points, so you can better understand your future career steps, and see the weaknesses that are probably holding you back from your next big accomplishments. Don’t hesitate to message me to correct questions that you find uncomfortable or wrong, or for any other help. Good luck!
It’s not easy to find the “right” data scientist, or to become one, because there is no such thing. Depending on the company type and on the personal talents and skills a professional has acquired, there can be many different matches! If you’re a manager, try to define what kind of data scientist you need based on your projects; I hope that after this article you have a better picture of who might be useful for you. If you’re a professional, don’t hesitate to take the tests, define your strong and weak points, and apply to the companies where your skills will be truly appreciated. In any case, don’t hesitate to contact me if you have any questions, suggestions, or criticism. Stay tuned!