Why I’m not (yet) into data science

People are jumping on the ‘data’ bandwagon. People who were previously developers and don’t know anything about statistics suddenly have a data science title on their résumé. It’s hip and cool to have that title nowadays. When there’s something cool in the tech space, I usually follow suit, but not this time. Whenever people ask me if I do ‘Machine Learning’ or ‘Big Data’, the short answer is ‘no’. The long answer is ‘yes, no, not really’. These are my reasons:

Fundamentals: Data Science and Computer Science are closely related, but their fundamentals are different. Computer Science deals with how to generate and process data, while Data Science is about how to make sense of the data generated. Data Science is closely tied to statistics. One cannot do Data Science without a statistical modelling approach. A software engineer doesn’t really need statistical knowledge to do their job. Statistical modelling, for me, is not fun. Domain modelling, however, is exciting. I already have strong fundamentals in Computer Science, but in Data Science my fundamentals are almost none, or meager at best. You can identify a Data Science poser by asking them to do some simple data modelling. If they can’t do it, don’t hire them as one.

Goal: The goal of Data Science is to deduce a better direction or solution based on the facts, while a Software Engineer creates said solution. I love writing code, building infrastructure, and throwing data around. I don’t find capturing data and deducing a solution fascinating. I find writing one a lot more fun.

Day to Day Job: A Data Scientist needs to do ETL, extracting from data sources, and in a lot of cases that data is dirty and needs to be cleansed. Being a code janitor, cleaning up code and refactoring, appeals to me a lot, but janitor-ing data does not.

I’ll happily set up Kafka and Hadoop and feed data into them. I may have an opinion or two about how to set up the data pipeline. Heck, I might be very happy to implement the analysis in the software itself. However, for now, I’m not really interested in making sense of the generated data. For that, I might just hire people who are excited to do so.

