I mean, if you’re going to make a mistake, you might as well have something nice to stare at ;)

Starting at a Startup — 7 Machine Learning Mistakes

Sebastien Dery
9 min read · Oct 21, 2016

#MachineLearning #StartupLife #Coding

#tldr: If you’re tackling a new problem in industry, you’re likely to be asked to produce results without: a dataset (training or testing), an evaluation metric, clear functionality and expectations, enough time, or a QA process. Here are some things to keep in mind and tips to efficiently tackle each of these issues.

A good breadth and depth of knowledge in Machine Learning is necessary for any job in the field. But is it enough? If you’re tackling a new problem in private industry, it’s likely you’ll be asked to produce results without a great deal of the information that is often taken for granted in ML bootcamps or classes. Here’s a short list of things to keep in mind as you take on your new job.

1. Thinking you have a training dataset

So you achieved a 0.13% error rate on the MNIST handwritten digits database? Very impressive, and probably a new record on the benchmark dataset. Good job beating committees of ConvNets all around the world! But how does that translate to that new functionality no one has tackled yet, the one the business department put on the table at last Monday’s meeting?

Odds are a lot of the challenges you’ll be facing at your company are so specific to your domain that hardly any dataset exists at all. Medical imaging? Sure, there’s a lot of interesting data out there, but will it cover the specific subcategory of breast cancer your client is desperate for help with? Or will there be an available corpus for you to build that chatbot answering system targeted at a very niche domain language?

In an age of Big Data it’s easy to assume data must be available somewhere. Here’s a reminder that there are a lot more datasets to be gathered than existing ones — and thank god for that; it’s also why you have a job!

2. Thinking you have a testing dataset

You’ve done your homework, munged data for a week, built a model, and tested it on a few samples of your own choosing, and it seems fine. That’s all great, but let’s get serious and pull some metrics out of it. I mean, you do have a labelled dataset … right?

Similar to #1, you’ll eventually find yourself in a position where labelled data is rare and labelling is tedious. There are all sorts of options lying in front of you; it’s simply a question of having the right mindset. You may want to invest in externalizing the labelling process through some form of crowdsourcing.

You can also try to do it in-house by buying pizza for your team (it has worked for me in the past). Still, this shouldn’t be done without careful consideration:

  • Clear explanation of the task with examples of typical and edge cases. If you think the task is boring for you (someone who allegedly cares about the outcome), put yourself in the shoes of someone who doesn’t. Make it easy for them — it’ll pay off in the end.
  • Repeated labeling can improve label quality and model quality, but not always (see the sketch after this list).
  • Careful selection of which points to label (active learning) is an even smarter approach, assuming you have the time and resources to do so.
  • Labeler quality should also be estimated (compensation, quality control, spammer detection, etc.)
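
If you do end up collecting repeated labels, a minimal sketch of aggregating them by majority vote while tracking annotator agreement might look like the following. The data and names are made up for illustration; low-agreement items are good candidates to send back for review:

```python
from collections import Counter

# Hypothetical repeated labels: each item was shown to three annotators.
raw_labels = {
    "item_1": ["spam", "spam", "ham"],
    "item_2": ["ham", "ham", "ham"],
    "item_3": ["spam", "ham", "ham"],
}

def aggregate(labels):
    """Majority vote plus a simple per-item agreement score."""
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)

for item, labels in raw_labels.items():
    label, agreement = aggregate(labels)
    print(f"{item}: {label} (agreement={agreement:.2f})")
```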

At the end of the day, you’ll likely want to brush up on your statistics to squeeze all the predictive power you can get without overfitting, my friend. You’re in for some fun.
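
And since overfitting on a small labelled set is the classic trap, a quick sanity check is k-fold cross-validation. Here is a minimal scikit-learn sketch, using a built-in toy dataset as a stand-in for your own data:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# 5-fold cross-validated accuracy on a built-in toy dataset.
X, y = load_digits(return_X_y=True)
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```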

3. Thinking you have a metric to evaluate on

Machine learning progress can be measured against the benchmark datasets we have created as a community (think MNIST, ImageNet, SemEval, etc.). Each year we pit our algorithms against each other in an attempt to beat last year’s record. Evaluated with the same strict, objective metrics every year (accuracy, precision, recall, F1-score), we can accurately compare and measure our progress. Here’s a quick recap as a refresher.

(Figure illustrating precision and recall, taken from Wikipedia)
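
For reference, each of these metrics is a one-liner in scikit-learn; the labels below are toy values, purely for illustration:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy ground truth and predictions; in practice these come from your test set.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
```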

But have you ever heard of relevance? Your client-facing team definitely has. It’s that intangible gap between the user’s expectations and the results of your system. Search engines and recommendation systems fall prey to this ruthless, subjective, and ambiguous metric every day. You thought you were doing machine learning, but you’ll soon find yourself in the business of mind-reading.

In a way, relevance is similar to Justice Stewart’s test for obscenity, and you may very well face the same kind of eloquent speech when discussing what exactly your client would like your system to spit out:

“I shall not today attempt further to define the kind of material your system is required to output, and perhaps I could never succeed in intelligibly doing so. But I know relevant when I see it, and the result your model provided is not that sir.” — Your user

What is relevance? You’ll know it when you see it

4. Overconfident estimates

This issue transcends software engineering and has been around for ages. Typical symptoms are:

  1. A problem appears simple…
  2. Pressure to deliver faster from all sides…
  3. And now you’re being put on the spot for an estimate…

It’s easy to say “half a day… at most” when your CEO asks how long it’ll take for that task to be completed, especially after a 30-minute discussion about how the engineering team has been missing deadlines lately; and the team is presenting at a conference next week; and the functionality is required for the demo. No stress.

Unfortunately, when you get back to your desk and start realizing that the quick changes you anticipated actually domino into much more than you originally thought, you’ll be the one doing damage control. And you don’t want to be in that position.

“Quality needs time and not just skill. And smart developers frequently overestimate their capability. Finally they end up taking ugly hacks to finish stuff on a self-committed suicide timeline.” -RDX

Give yourself enough room to fully investigate the opportunity, identify and communicate potential pitfalls, and write some good code.

5. Not planning for Quality Assurance

Model validation is a must in every machine learning task. When you build a model for, say, a topic classification problem, you’ll always want to look at the accuracy of that model (i.e. the number of correct predictions out of all predictions made). You’ve done your homework, chosen your data well, done everything right, evaluated its robustness through cross-validation, and finally your model achieves 95% accuracy. The team is excited and pushes it into production, where it provides quality classification.

Most of the time…

At the end of the day, your training data is provided by humans, and no system has access to all possible training data. So what do you do in those 5% of cases with bad results? How critical are they to your scenario? Should you be focusing on precision? Perhaps, but even then 100% is unlikely. Is there a process for detecting those cases? Internal monitoring? A reporting mechanism for users? Are the bad results flagged by your team and/or users being stored somewhere for further training? Is your algorithm ready for that kind of online, incremental improvement? Or do you have the time and resources to batch-train another model if that’s not the case? What’s the turnaround if a critical failure is found?

Just a few things to think about.
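
One concrete starting point is simply making sure flagged failures end up somewhere you can query later. Here is a minimal sketch, where the file path, function names, and stand-in model are all hypothetical:

```python
import json
import time

FEEDBACK_LOG = "flagged_predictions.jsonl"  # hypothetical location

def predict_with_feedback(predict_fn, features, request_id):
    """Run the model and keep enough context to make a later bug report useful."""
    prediction = predict_fn(features)
    return {"request_id": request_id, "features": features, "prediction": prediction}

def flag_bad_result(result, reporter="user"):
    """Store a flagged prediction for review and, eventually, retraining."""
    record = dict(result, reporter=reporter, flagged_at=time.time())
    with open(FEEDBACK_LOG, "a") as f:
        f.write(json.dumps(record) + "\n")

# Toy usage with a stand-in model.
result = predict_with_feedback(lambda text: "spam" if "win money" in text else "ham",
                               "win money now", request_id="req-42")
flag_bad_result(result)  # e.g. a user reports this classification as wrong
```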

If the output of your model is user-facing, you’ll want to plan for human intervention within your process. One bad result can shatter your users’ trust faster than we’d like to admit to ourselves. There’s already a black-box aura surrounding machine learning, and it gets worse if that black box spits out garbage from time to time. 96% accuracy is better than 95%, but it won’t mean a thing to a user who relied on a misclassification that led to a bad decision on their part.

Errors are inevitable; make the most out of them by planning for QA.

6. Failing to be goal oriented

It’s easy to get enthusiastic about something. (Image taken from O’Reilly)

As a machine learning practitioner (really, you can have whatever fancy title you desire), you bring a unique perspective to the product planning table. Your fine understanding of how to manipulate, model, extract, and predict from data; your realistic perspective on the limitations of both state-of-the-art approaches and data availability; your humbling acceptance of doing your best in the face of increasingly complex questions and requirements; all of this puts you in a prime position not only to share your insights when requested, but also to actively seek out what could be done.

Have a vision for what you can bring to the company.

If you’re anything like me, you can easily get excited by a lot of interesting avenues of research. There is so much interesting stuff to be done out there that it’s sometimes hard to focus on one thing.

Nevertheless, more often than not, you’re also hired to contribute to a specific vision within the company. Spend some time pondering how your ideas can be useful to the company. Can it be profitable? Will it shine an interesting light on the company? Is it useful for recruitment? Does it open a different but potentially long-term market? Does it reduce development time for your team? Whose life does it make easier? How much time will it take to develop? Are there any competitors? Is your idea better? Different? Novel? Does it build internal expertise that will be useful in the near future?

These questions are not meant to bring you to the sad realization that it’s not worth investigating: they’re designed to empower you, because they will most assuredly be coming from other members of your team eventually.

Have a long-term and well-rounded vision for your ideas and projects so that others can easily see your interest and join in.

7. Overlooking communication

There are plenty of good reasons to want to transition to a new technology stack:

  1. Easier to maintain (reducing long-term cost).
  2. Access to new libraries (new services and functionalities).
  3. Faster development (reduced implementation time).
  4. Facilitates scalability (long-term user support).

You’ve thought it through and considered everyone’s perspective. Business, marketing, strategy, product, design, social-media, UI, back-end. It just makes sense to go in that direction.

But nothing happens…

You’ll have plenty of good ideas during your career and plenty of opportunities to be enthusiastic about them. Remember to make that enthusiasm viral within your team. You don’t want to be left alone thinking something is cool. Truth is, it’s less likely to ever happen if that’s the case.

Communication is key to making things happen, whether it’s convincing management of a new direction for the company or switching databases.

Agree? Disagree? Let’s hear it! In the meantime enjoy the ride.

8. BONUS: Thinking you’ll be Deep-Learning day and night

A lot can be done using linear regression. SVMs still find their way into many production pipelines, and people use k-means on a daily basis. While there are good reasons to be excited by the results of DNNs, your day-to-day work is likely to involve a lot of different components, ranging from designing storage solutions, cleaning your data, and building ETL pipelines, to exploratory analysis, discussions with the product team, general meetings, and negotiating deadlines. Depending on the stage of your company and product, you’ll be performing varying levels of it all.
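
To underline the point, a respectable baseline for many problems is a handful of lines of classical scikit-learn; the synthetic data here is purely a sketch:

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in data; the point is how little code a solid baseline needs.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
baseline = LinearSVC().fit(X_train, y_train)
print("SVM baseline accuracy:", accuracy_score(y_test, baseline.predict(X_test)))
```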


Sebastien Dery

Canadian in Silicon Valley, ML @ Apple #Philosophy #StoneSculptor #Divemaster