Classical tech debt in machine learning

#4 — Be accountable for the Tech Debt

Let’s find out how much technical debt your machine learning projects have with this game.

Nicolas Rodriguez Presta · Published in Mercado Libre Tech · Jul 30, 2021


This is story #4 of the series Flight checks for any (big) machine learning project.

Well, at this point, we already have KPIs, a team and a fast baseline in production.

If we feel that we have achieved all this in a short time and with good results, then we have done everything right. However, if we feel exactly the same but haven’t noticed that we have accumulated a large amount of technical debt, then we simply haven’t kept track of that debt, and that’s too bad.

I have no idea about your project, your company, or the conditions that frame your work, but I am sure that if you’ve managed to produce an ML system in a short time, with great initial drive and speed in the results, then it is partly thanks to taking on technical debt. And that is the best-case scenario, where we have only indulged a little in the classic mistake of “Shortchanging quality assurance to improve development speed”.

Let’s be clear: that’s okay! What you have done is finance part of your time-to-market with technical trade-offs. As we have already seen, a better time-to-market not only reduces uncertainty and improves the design; at a business level, it can mean the difference between winning and losing the competitive game. It allows for quick adjustments when necessary and is the way to learn lessons that would otherwise be very difficult to learn.

It is good to take on technical debt as long as we are aware of its implications and understand that, just like a high-interest loan, it should be paid off as soon as possible. Otherwise, the work and design schemes that allowed you to move quickly in the first iterations may be the very ones that limit your speed and scalability in the middle and mature stages of the project. It is very important for your sponsor to understand this. If not, you run a double risk: accumulating future technical problems on the one hand, and generating unrealistic speed expectations on the other. A recipe for failure.

This applies to any software project, of course. However, in machine learning projects, there are some peculiarities of technical debt that are worth reviewing. For a deeper analysis I recommend this paper, but in the meantime, let’s play a game!

Let’s play a game!

While answering the following questions, keep count of the number of times you respond “Yes” (be honest with yourself!). If you’d rather let a script keep the tally, there’s a small sketch right after the list.

  1. Could you generate and validate your model dataset in less than 1 day?
  2. Are all the producers of the data consumed by your model aware that you are using their data in your project?
  3. If a producer of any data consumed by your model needs to modify the data it generates, would you get a notification?
  4. If a producer of any data consumed by your model needs to modify the data it generates, and you find out about it, could you let them go ahead because it would not create problems in your system?
  5. Could you assert that no part of your system relies on cascading models (models that adjust the error of other models)?
  6. Could you assert that all the features used in your model are necessary?
  7. Could you assert that all the features your model uses still behave the same way (with the same semantics) as when the model learned them?
  8. Does it take you less than 1 day to understand whether an old model could replace a new one?
  9. Are your predictions about the behavior of new models in production always accurate (plus or minus an acceptable delta)?
  10. Have you identified which of the features consumed by your model are affected by the predictions it generated in the past?
  11. Given the model currently in production, do you have easy access to the code used to train it?
  12. Given the last model, do you have easy access to the dataset that was used to train it?
  13. Given the last model, do you have easy access to the code applied to generate the dataset used to train it?
  14. Is the whole code mentioned above versioned, and does it have a release process?
  15. Does your system support A/B model testing?
  16. Can you state that the features your model consumes are all being generated correctly?
  17. Could you re-run your entire last model generation pipeline and reach the same result in less than 1 day?
  18. Do you know all the consumers of your model?
  19. If you had to maintain 10 times as many models as you do now, do you think that, at the most, the maintenance cost would double?
  20. Can the Data Science team that generates the models productize new iterations without depending on other teams?
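Here is a minimal sketch of the game as a Python script, purely for illustration: the abbreviated QUESTIONS list and the count_yes and play helpers are hypothetical names of mine, not tooling from this article.

```python
# Minimal sketch of the tech debt game: ask each question and tally the "Yes" answers.
# QUESTIONS is an abbreviated paraphrase; fill in all 20 questions from the list above.
QUESTIONS = [
    "Could you generate and validate your model dataset in less than 1 day?",
    "Are all the producers of the data consumed by your model aware you are using it?",
    "Could you re-run your last model generation pipeline and get the same result in <1 day?",
    # ... the rest of the 20 questions ...
]


def count_yes(answers):
    """Count how many answers are an honest 'yes' (case-insensitive)."""
    return sum(1 for a in answers if a.strip().lower() in ("y", "yes"))


def play():
    """Prompt for each question and return the total 'Yes' count."""
    answers = [input(f"{i + 1}. {q} [y/n] ") for i, q in enumerate(QUESTIONS)]
    return count_yes(answers)


if __name__ == "__main__":
    print(f"Your 'Yes' count: {play()}")
```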

Self-assessment

Now, with your total “Yes” count, check your score below and find your technical debt diagnosis (there’s a small lookup sketch after the list if you prefer code):

Less than 2 = Danger: You should start treating technical debt as an issue to discuss with your team.

Between 2 and 5 = Warning: It’s never too late to prioritize these technical debt tracks. Is your sponsor in the know?

Between 6 and 10 = Good enough: By speeding up the execution of those forgotten technical debt cards, you’re well on your way.

Between 11 and 15 = Very Good: I dare say you are in the 95th percentile of technical quality among machine learning projects. Help other projects with your learnings.

Between 16 and 19 = Excellent: Your project is an example that deserves to be shared.

20 = Is your LinkedIn profile updated? We are hiring! 😃
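And, just to make the bands unambiguous, here is the same lookup expressed as a tiny Python function (an illustrative sketch with a hypothetical name, diagnosis, nothing shipped with this article):

```python
def diagnosis(yes_count: int) -> str:
    """Map the number of 'Yes' answers (0-20) to the diagnosis bands above."""
    if yes_count < 2:
        return "Danger"
    if yes_count <= 5:
        return "Warning"
    if yes_count <= 10:
        return "Good enough"
    if yes_count <= 15:
        return "Very Good"
    if yes_count <= 19:
        return "Excellent"
    return "Is your LinkedIn profile updated? We are hiring!"


# Example: 13 "Yes" answers land in the "Very Good" band.
print(diagnosis(13))
```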

Finally… What’s your score? How do you feel about it? Let me know in the comments!

More flight checks are yet to come, stay tuned! 😉
