Data Scientists, Trainings, Job Description, Purple Squirrel and Unicorn Problem

Jaganadh Gopinadhan
Sep 2, 2020

Preface

Last week, a LinkedIn note on Mathematics and Data Science by my friend and colleague Srivastan Srivsan opened an excellent discussion. The debate itself is nothing new; the tech domain has seen many, from vim vs. emacs onward. The debate about math and Data Science has moved into new areas every year since 2013. Above all, the industry notion (or confusion) of the Unicorn Data Scientist remains a catalyst for the debate, and HR is in search of the ‘Purple Squirrel.’ Why are we debating? That is an interesting question to ask ourselves.

Problem of Definition

A definition can suffer from three types of problems: defect (too narrow), over-application (too broad), and impossibility (mismatch). This classification is borrowed from Indian philosophy. The debate between Mathematics Specialist and Data Scientist is all about definition. The term ‘Data Scientist’ appears in job descriptions for many different job roles. Still, the title is always Data Scientist, and we search for a person who can do everything.

KDD2020 had an exciting session on Training Data Scientists of the Future. Eminent personalities in the area, such as Thomas Davenport, Usama, and Keith, led the discussion. One of Davenport’s suggestions was:

“They should circulate a draft list of job types, ask for commentary, and then finalize the list. Then ask those who practice each job what the necessary skills are. Again, send out a draft list, ask for comments, and finalize the skill list too.”

I would say Davenport was spot on. There are thousands of recruitment job descriptions (JDs) out on the internet. Most of them are trying to find the purple squirrel or the unicorn in Data Science and Machine Learning. The lack of uniformity in JDs within the industry, and even within the same organization, is a significant gap in Data Science, Machine Learning, and AI recruitment. What we need is a rule of thumb for writing a JD based on what we are trying to achieve. Let’s discuss this in detail later.

Changing Industry Patterns

Well, what is the relation between Mathematics and the JD? The role of the Data Scientist has evolved over time. It is almost ten years since the term Data Science started appearing in JDs. From 2010 to date, many technologies have evolved, died, and been resurrected: from sklearn and ‘R’ to Mahout to Spark to H2O and TensorFlow and an ocean of frameworks. There was a time (pre-2010) when NLTK was the only Natural Language Processing framework in Python (yes, we also had MontyLingua, RIP). Perl was a Swiss Army knife for many NLP tasks to start with. Above all, the theoretical advances, including Deep Learning and Reinforcement Learning, are commendable. In the early 2000s, when we used to go for Computer Science faculty development programs, we would mention the theoretical aspects of RL. Now students in the same colleges show RL demos with OpenAI Gym! That is how much technology and learning have changed.

What a data scientist does in an enterprise has changed a lot too. The nature of use cases, the volume of data, awareness of the need, and the ROI of Data Science problems have all increased. Project objectives in the enterprise are very focused. AI/ML and Data Science adoption is reaching maturity in most companies, beyond merely adjusting to the hype cycle.

The missing piece in this game is the categorization of job roles and expectations. The job of a Machine Learning Scientist is different from that of a Data Scientist, and both are different from that of a Machine Learning Engineer. Hence one-size-fits-all JDs are no longer relevant. The question is: who is a Machine Learning Scientist, who is a Data Scientist, and who is a Machine Learning Engineer? (There are more titles to add.)

Who is Who?

A Machine Learning Scientist is one who designs new algorithms (maybe based on existing algorithms) to solve a specific problem or a set of problems in general. What is expected is the ability to formulate a hypothesis, work to prove it with a rigorous scientific method, and implement it (the expectation may go beyond this). Sometimes this persona will be responsible for implementing the theory and bringing out a new framework or system. To understand what this looks like, imagine that you are going to work for the core TensorFlow, PyTorch, or Watson team. The job is not to build an API mashup from existing libraries. In such a position, knowledge of programming, Mathematics, and Machine Learning is critical. Some companies call this role Algorithm Developer (AI/ML/DL…). When hiring for such a position, the HR concept of the Purple Squirrel may be relevant. Training, skills, and background are essential. Most of the time, experience need not be a blocker for the right candidate in such roles.

The Data Scientist’s role in the enterprise is to solve a given problem with existing algorithms. The typical range of responsibility runs from business understanding to handing over the model to production (AIOps). Everybody searches for unicorns in this space because end-to-end Data Science is the expectation. Successful enterprises focus on hiring people who can build models and be creative with the data. For all practical purposes, such a person should be paired with a Data Engineer and a functional specialist. Such a structure adds the burden of an additional role, a Project Manager. Still, it has long-term benefits. We will discuss the strategy bit later; let’s get into writing the JD for a Data Scientist.

The rule of thumb is not to expect a Unicorn ;-). If you are looking for short-term staff, think about what your team would like to achieve in the foreseeable horizon: the problem statements in hand, the type of data, what is expected from the data scientist, the current and expected technology stacks, and the level of experience desired. All of these points will help you draft a clear JD. Such a JD makes the life of a recruiting agent much simpler. For long-term staff (probably a full-time hire), one should do some groundwork.

For a long-term hire, there are many things to consider. First of all, ask what the organization’s AI vision is for the next three years. If you don’t see one for the organization, try to create one for the team or business unit first, then review and finalize it. (Sometimes it is better to hire a consultant to assess and recommend a strategy.) When hiring for individual teams, we may expect the candidate to know the domain and to have experience or knowledge in Data Science. Determine the domain knowledge requirements; if you have functional experts in the team, you may be able to relax this. (There are many domains where specific experience is hard to gain without working in the industry.) What problems do we need to solve in the next one, two, or three years? What kind of algorithms may help to solve them? Are you interested in riding the wave of new algorithms and frameworks? Answers to these questions will help you narrow down the algorithm-level expectations. Now it is time for technology frameworks; here we decide on R, Python, Spark, etc.

Last but not least, the skills to explore the data are essential. One can generalize or specify technologies in this part. It is better not to expect a candidate to be hands-on in both Natural Language Processing and Time Series at the same time. What we are looking for here is a clear and measurable JD with respect to skill, knowledge, and experience.

In a Data Scientist’s role, getting from problem to solution is what matters. A person who understands statistics and linear algebra and is trained in Machine Learning and Data Science should be a good fit. The ability to approach data systematically and drive the desired business outcome is the primary goal in this role. Bringing out a new algorithm is often not an objective at all. Here the mathematician part gets diluted.

One can argue: shouldn’t we still refer to them as Data Miners or Analytics Professionals? That is an excellent question with debatable answers.

AI/ML and Data Science platforms, APIs, and AutoML are hot topics and trends in the industry. The trend has contributed a new set of roles to the AI/ML/Data Science journey: Machine Learning Engineer, AI (….) Developer, AIOps Engineer, etc. I will discuss these roles in detail in a separate note. From the hiring manager’s standpoint, the question is whom you are hiring. Depending on the answer, you may or may not need a strong mathematician. For an aspirant, knowing which role you are fit for is essential to selecting a learning path. It is time for all AI/ML/Data Science course owners to publish the skills each course attempts to develop. The skills developed by a course and the skills expected by a prospective role or employer should be comparable.
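If course owners and employers both published skill lists, comparing them could be as simple as a set comparison. Here is a minimal sketch in Python; the function name `skill_gap` and the sample skill lists are purely hypothetical and only illustrate the idea.

```python
def skill_gap(course_skills, role_skills):
    """Compare the skills a course claims to develop with those a role expects."""
    # Normalize names so that "Python" and "python " count as the same skill.
    course = {s.strip().lower() for s in course_skills}
    role = {s.strip().lower() for s in role_skills}
    covered = role & course
    return {
        "covered": sorted(covered),        # expected by the role and taught by the course
        "missing": sorted(role - course),  # expected by the role but not taught
        "extra": sorted(course - role),    # taught but not asked for by this role
        "coverage": len(covered) / len(role) if role else 0.0,
    }


# Hypothetical lists, not taken from any real course or JD.
course = ["Statistics", "Linear Algebra", "Python", "Model Building"]
role = ["statistics", "python", "model building",
        "data exploration", "business understanding"]

report = skill_gap(course, role)
print(report["missing"])   # skills the aspirant still has to pick up elsewhere
print(report["coverage"])  # fraction of the role's expectations the course covers
```

An aspirant could run the same comparison against several roles (Data Scientist, Machine Learning Engineer, and so on) to see which learning path a course actually supports.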

To be continued…

#ai #aiml #datascience #machinelearning

Jaganadh Gopinadhan

Artificial Intelligence and Analytics Leader | Sr. Manager Projects at Cognizant