Flight checks for any (big) ML project
So you are about to start a new machine learning project…
As a data scientist, machine learning engineer or sponsor of the initiative, there is nothing like boarding a new Andrew Ng Airline flight, taking a seat, relaxing and enjoying the flight. After a quiet journey propelled by the new magical electricity of artificial intelligence and led by autopilot algorithms, we will arrive at our destination “Optimized-metrics-land”. We just have to enjoy those competitive advantage cocktails we have earned.
Well, really … being down to earth, we know that if you are in the early stages of a project, in this analogy you will most likely sit in the cockpit along with the pilot. And … frankly, there is no such thing as an autopilot on this plane.
But calm down. Out of the many unfortunate plane crashes of the past, we have put together some lessons so that your flight does not have to end the same way.
Here is a “flight checklist” for your plane to take off — and with a little effort, land too. And, of course, for each instruction you will find a specific article I created to provide further reading (link available on each title).
# 1 — No Machine Learning. KPIs first
The most common and costly mistake in an ML project is starting without having clear KPIs. It seems trivial; it is not.
# 2 — Skills diversity: building the right team
It is important to ensure that the team responsible for carrying out the project has the necessary skill set. The novelty of some of the required profiles can lead to confusion. Balance is paramount.
# 3 — Fast baseline and then iteration (fast too)
Getting a first score quickly has many advantages. The sooner we can base our decisions on concrete measurements rather than on hypothetical reasoning, the better.
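As a minimal sketch of what a "fast baseline" can look like, the snippet below scores a majority-class predictor using only the Python standard library. The labels, features, and function names are hypothetical and chosen purely for illustration; the point is that any later model has a concrete number to beat.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return a predictor that always outputs the most common training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda _features: most_common_label


def accuracy(predict, features, labels):
    """Fraction of examples for which the prediction matches the label."""
    correct = sum(1 for x, y in zip(features, labels) if predict(x) == y)
    return correct / len(labels)


# Hypothetical toy data, for illustration only.
train_labels = ["stay", "stay", "churn", "stay"]
test_features = [{"age": 30}, {"age": 45}, {"age": 22}]
test_labels = ["stay", "churn", "stay"]

baseline = majority_baseline(train_labels)
baseline_accuracy = accuracy(baseline, test_features, test_labels)
print(f"Baseline accuracy to beat: {baseline_accuracy:.2f}")
```

Every later iteration can then be judged by how much it improves on this number, rather than by how sophisticated it sounds.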
# 4 — Be accountable for technical debt
It is fine to take on technical debt as long as we are aware of it and understand that, like any high-interest debt, it is better to pay it off sooner rather than later than to keep refinancing it.
# 5 — Any ML project is a Software Project first
Any machine learning project is first a software project. Everything that applies to any healthy software project applies to an ML project.
# 6 — Stand as close to the user as you can
Time invested is only worthwhile to the extent that it creates and delivers value to the user. To iterate quickly, it is advisable to position yourself as close to the user as possible in the stack, so that you can take advantage of the services that already exist.
# 7 — Are your problems similar to those of the rest of the industry?
It may seem that the problems we have are unique. But it is likely that the same path has already been traveled by others. If our main problems are very different from what the rest of the industry is facing, it may be a bad sign.
Creating spaces so that different models and approaches can compete will benefit the business and generate incentives for the team to improve. Machine learning projects make it easy to implement this.
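One simple way to let approaches compete is to score every candidate on the same holdout set and publish the ranking. The sketch below assumes each model is just a callable and uses made-up data and model names; a real setup would plug in the project's own metric and KPIs.

```python
def rank_models(candidates, features, labels):
    """Score each candidate on the same holdout set, best accuracy first."""
    scores = {}
    for name, predict in candidates.items():
        hits = sum(1 for x, y in zip(features, labels) if predict(x) == y)
        scores[name] = hits / len(labels)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical holdout data and candidate models, for illustration only.
holdout_features = [1, 5, 8, 12]
holdout_labels = [0, 0, 1, 1]
candidates = {
    "always_zero": lambda x: 0,
    "threshold_at_6": lambda x: int(x > 6),
}

leaderboard = rank_models(candidates, holdout_features, holdout_labels)
for name, score in leaderboard:
    print(f"{name}: {score:.2f}")
```

Because all candidates see exactly the same examples, the comparison stays fair no matter who built which model.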
# 9 — Train-serving skew & data dependency problems
No matter how much you trust your systems and processes, discrepancies between training data and serving data are going to occur. Anticipating them and preparing for the worst keeps you one step ahead.
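As an illustration of preparing for the worst, the following sketch compares a sample of training rows with a sample of serving rows and flags numeric features whose mean has drifted. The feature names, the data, and the 0.5 threshold are assumptions made for this example; it is a crude check, not a substitute for proper data validation.

```python
from statistics import mean, pstdev


def skew_report(train_rows, serving_rows, threshold=0.5):
    """Flag numeric features whose serving mean drifts from the training mean.

    Drift is measured in training standard deviations; `threshold` is an
    arbitrary illustrative cutoff, not a recommended value.
    """
    flagged = {}
    for feature in train_rows[0]:
        train_values = [row[feature] for row in train_rows]
        serving_values = [row[feature] for row in serving_rows if feature in row]
        if not serving_values:
            flagged[feature] = "missing at serving time"
            continue
        spread = pstdev(train_values) or 1.0  # avoid dividing by zero
        shift = abs(mean(serving_values) - mean(train_values)) / spread
        if shift > threshold:
            flagged[feature] = f"mean shifted by {shift:.2f} training std devs"
    return flagged


# Hypothetical samples: income was logged in thousands during training
# but arrives in raw units at serving time.
train_rows = [
    {"age": 30, "income": 40.0},
    {"age": 40, "income": 50.0},
    {"age": 50, "income": 60.0},
]
serving_rows = [
    {"age": 38, "income": 40000.0},
    {"age": 42, "income": 52000.0},
]

report = skew_report(train_rows, serving_rows)
print(report)
```

Running a check like this on a schedule, and alerting when it flags a feature, catches unit changes and broken upstream pipelines before they silently degrade predictions.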
# 10 — The key success factor: Knowing the domain in depth
You can improve the model or improve the data reaching it. The former is simple and fast to do in the first iterations. The latter is complex but usually more profitable. Iteration by iteration, the only way to improve the data will be to become more and more proficient in the domain.
I hope you enjoyed this overview and the full articles linked from each title, which cover each subject in depth.
Follow us and fly safe!