Everyone is a Data Scientist

Published in

Support Automation Magazine

3 min read · Jun 5, 2018

Five years ago, I started my journey as a data scientist with Dr. Andrew Ng’s Machine Learning course on Coursera. As I dug deeper, I came to see that people had been using this “data” for a long time: to improve themselves, to improve a company, and so on. So why are we riding a Data / AI wave? Is it something new? Or is it just another buzzword that will pass?

I have come to realize how important data is. Although it might not seem like it, even Steve Jobs said that “Information is power”. Companies like Apple, Google and Microsoft have been embracing data for a few decades now. Although this has come at the expense of people’s privacy (a problem that is likely to be solved in the next few years), we humans have found ways to profit from it.

“Information is power” - Steve Jobs

Who is a Data Scientist? What does this person have to do?

Some basic requirements of a data scientist:

Business application: Although this may not seem important to a developer, I feel this is probably the step that saves the most time. The business goals can be split across three fronts: financial (CFO), customer relations and technical (CTO). The financial front helps you understand how valuable the idea or problem is, customer relations saves time by separating relevant data from irrelevant data, and the technical head lets you gauge the resources available to solve the problem.

Divide the problem into three fronts: financial (CFO), customer relations and technical (CTO).

The data problem: Once a company understands the business problem, it faces the data problem. Even the data giant, Google, doesn’t have enough data to finish its products. It is always collecting more data to grow its knowledge base. Every stage of data science needs more data: to start, to optimize and to enhance an algorithm. Data pipelining is also important, and its design depends on who the data is intended for: business, programmers or data scientists.

Even the data giant, Google, doesn’t have enough data.

Mathematical / machine learning: Knowing which algorithm to use is a skill that is not taught but learnt through much hardship. Being well-versed in the algorithms can save a lot of time during implementation. While roughly 80% of problems can be solved with regression or classification algorithms, even this step is difficult given the complexity. Evaluation techniques (like a confusion matrix for classification algorithms) are very important. Having some error in an algorithm, contrary to stereotypical belief, is good: it means you have room for improvement and have not over-fit. Finding this sweet spot is terribly important for the lifetime of the ML model.
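To make the confusion matrix mentioned above a little more concrete, here is a minimal sketch in plain Python. The labels in `y_true` and `y_pred` are made-up illustration data, and the helper names `confusion_matrix` and `precision_recall` are my own for this example, not from any particular library.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return {key: counts[key] for key in ("tp", "fp", "fn", "tn")}


def precision_recall(cm):
    """Derive precision and recall from the four counts."""
    predicted_pos = cm["tp"] + cm["fp"]
    actual_pos = cm["tp"] + cm["fn"]
    precision = cm["tp"] / predicted_pos if predicted_pos else 0.0
    recall = cm["tp"] / actual_pos if actual_pos else 0.0
    return precision, recall


# Made-up labels, purely for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)                    # {'tp': 3, 'fp': 1, 'fn': 1, 'tn': 3}
print(precision_recall(cm))  # (0.75, 0.75)
```

The model here is wrong on two of eight examples, and the matrix shows which kind of mistake each one is. That breakdown is usually more useful than a single accuracy number when deciding what to improve next.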

Having an error in an algorithm, contrary to stereotypical belief, is good.

Implementation: While implementation might seem the easiest of the lot, a good team is required to bring the code up to production level. What should a prediction equation be like? Ideally, the equations should be able to predict a day’s worth in half an hour. In Python, Falcon (a framework for API calls) can reach about 102,500 predictions per second (CPython 2.7.14). The ability to seamlessly create dashboards is also a skill. There are a few kinds of them, one for each level of presentation: it could be making a case for your idea, or presenting an algorithm to the sales team.
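The "day’s worth in half an hour" rule of thumb is easy to check with a little arithmetic. The sketch below takes the Falcon figure quoted above at face value; the daily volume of 10 million predictions is a hypothetical number chosen only for illustration, and the function names are my own.

```python
HALF_HOUR_SECONDS = 30 * 60

# Throughput figure quoted in the article; treated here as a given.
FALCON_RATE_PER_SECOND = 102_500


def required_rate(daily_predictions, window_seconds=HALF_HOUR_SECONDS):
    """Predictions per second needed to clear a day's volume in the window."""
    return daily_predictions / window_seconds


def fits_in_window(daily_predictions, rate_per_second,
                   window_seconds=HALF_HOUR_SECONDS):
    """True if the given throughput covers the day's volume within the window."""
    return required_rate(daily_predictions, window_seconds) <= rate_per_second


# How many predictions fit into half an hour at the quoted rate.
capacity = FALCON_RATE_PER_SECOND * HALF_HOUR_SECONDS
print(capacity)  # 184500000

# Hypothetical daily volume, for illustration only.
daily_volume = 10_000_000
print(round(required_rate(daily_volume), 2))                 # 5555.56
print(fits_in_window(daily_volume, FALCON_RATE_PER_SECOND))  # True
```

In practice the model itself, not the web framework, is likely to be the bottleneck, so the quoted rate is best read as an upper bound.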

Everyone is a Data Scientist in their own way.

Strive to be the Unicorn people are looking for. Although these requirements might seem really difficult, everyone is a Data Scientist in their own way. We just need to find the one part we are most comfortable with. (P.S. I am still trying to figure this part out myself.)


Written by Sachet Misra