With all the excitement around recent advances in Machine Learning / AI, we’re seeing more and more focus on these techniques. But let’s just get one thing straight. Your job is not machine learning. To be excruciatingly clear, here are a few concrete examples of people whose job is NOT machine learning.
- You are a data scientist.
- You are a data engineer.
- You are a data analyst.
- You are a machine learning engineer.
- You are a machine learning infrastructure engineer.
- Your job title is “literally all I do is machine learning all day every day”.
Nope, your job is still not machine learning.
Machine learning is a technique. Potentially a powerful one that has seen many advances in recent years. But your job is not to constrain yourself to a specific technique. Your job is to have impact by solving business problems. And when we think about impact, everything is an ROI computation.
How ML Got Here
This emphasis on impact over technique could really be said about many fields, so what’s special about machine learning? A few things.
First, academia is designed to push the boundaries of our knowledge. As such, novel techniques ARE often the end goal. No one is going to publish your paper applying linear regression to a standard dataset, even if its performance is perfectly fine for the application. Many of us come from such an academic background, primed to think in terms of cleverness and novelty rather than impact and practicality.
Second, the investment required for effective machine learning is huge, enormous even. Sure, you can train a model on your laptop, but what then? A whole slew of engineering is necessary to take any model from development to production. From real-time feature computation to performance monitoring and more, there’s a lot of investment necessary. Even world-class, machine-learning-focused organizations are still working on getting this right. Unless you know your return is going to be similarly huge, a machine learning approach is at best risky and at worst can cost you and your organization an insane amount of time and money.
Finally, Machine Learning is hot right now. Pretty much every recruiter I have talked to has suggested that all their Data Scientists do is machine learning, all day every day. Never have I seen this actually be the case. Instead, I’ve seen several companies have to realign their Data Science teams around the fact that their job is not machine learning, all day every day. In some cases, machine learning may be such a hot topic that you and your team are given extensive leeway if you’re working on machine learning solutions. But that leeway will not last. Everyone has to show their return sooner or later.
Building Towards ML Solutions
So you think machine learning is the right tool for the job AND that the job is worth it? You probably STILL shouldn’t use it. At least not at first, unless you really do have all the infrastructure you need at your fingertips, and you probably don’t. You may be able to show quickly that working on this problem is valuable with quick and dirty methods that require a much smaller investment of your time and infrastructure. No, you won’t land a paper at NeurIPS, but if simple heuristics, crowdsourced labeling on Mechanical Turk, A/B testing, etc. show that effort on this problem is worth it, then maybe, just maybe, we should invest the time in the “full” machine learning solution.
Maybe you can see that full solution. You KNOW it will work. You’ve seen it before. Shouldn’t you just go for it? Nope, don’t do it! It’s tempting. I know! But in complex systems and organizations, things can change in ways that are unpredictable and uncontrollable. When that happens, you’ll be in a much better place if you have a couple of small wins and some good directions to pursue when you come back to this problem, rather than a couple of half-baked models that will one day do all the things.
Instead, by increasing the complexity of our solutions gradually, and only as each step proves it is worth the effort, we can still build extremely innovative solutions. But building an extremely innovative solution to a non-problem is a recipe for some uncomfortable realignment with your management.
When ML is the Answer
Let’s recap in a positive light with the conditions where I WOULD recommend starting off with ML solutions.
- Simple solutions have been tried; the low-hanging fruit is gone
- The problem space has shown ML to be fruitful on similar problems (e.g. search, ads, pricing, fraud)
- You have the infrastructure you need to train, serve and monitor your model in production
In general, if you’re excited about ML, I don’t mean to discourage you. You probably should be excited about ML. There’s some seriously amazing progress happening out there. But let’s get practical first, and incrementally prove our impact as Data Scientists before reaching for the big, expensive hammer.