Is Machine Learning a Solution Looking for a Problem?

Santhosh Venkatesh
Published in Traindata
Sep 24, 2021 · 6 min read

The 2020 Gartner Hype Cycle highlights where Artificial Intelligence and Machine Learning stand today and where they are broadly heading.

Gartner Hype Cycle for Artificial Intelligence 2020 — Traindata Inc

Data labelling and annotation services are at the Peak of Inflated Expectations, meaning we are putting all our eggs into preparing data while paying very little attention to other aspects of machine learning.

The need for intelligent applications and business decision intelligence has also reached the Peak of Inflated Expectations.

In other words, enterprises believe that once big data is fed to applications, those applications will make data-driven decisions that solve internal and external business problems.

Meanwhile, Natural Language Processing (NLP), Computer Vision, and Machine Learning itself are going through the Trough of Disillusionment.

What the hype cycle indicates, among other things, is the unrealistic expectations we have developed towards AI and ML.

These expectations have led us to see machine learning as a magic machine that only needs a few inputs to give us intelligent answers, directions, or money-making products.

Traindata — Why Machine Learning Projects Stall

An MIT Sloan Management Review survey found that:

  • 78% of AI/ML projects stall at some stage before deployment.
  • 81% admit the process of training AI with data is more difficult than they expected.
  • 76% combat this challenge by attempting to label and annotate training data on their own.
  • 63% go so far as to try to build their own labelling and annotation automation technology.

When asked why their projects failed, survey respondents cited the following reasons:

  • Lack of expertise (55%)
  • Unexpected complications (55%)
  • Data problems (36%)
  • Lack of model confidence (29%)
  • Budget (26%), and
  • Not enough people (23%)

Why and where do enterprises go wrong with machine learning?

1 — Not building the right team

No data will suffice if you haven’t got the right set of people leading the machine learning project.

It would be best to have a tight, well-knit team to build your first few machine learning projects.

As machine learning is a complex undertaking, it is nearly impossible to find all the skills in one person or a few people.

A machine learning project involves:

  • Machine learning modelling.
  • Data pipeline development.
  • Back-end/API development.
  • Front-end development.
  • User interface (UI) and user experience (UX).
  • Product management.

Failing to build a team that brings all these skills together is the first mistake that stalls machine learning projects in their tracks.

2 — No sync between business expectations and ML technology

If a highly skilled team becomes the foundation of your machine learning project, that team needs to work closely with two groups to succeed:

1 — Your subject matter experts.

2 — Your end-users.

Essentially, you need someone to act as a product manager.

This product manager must envision:

  • how the machine learning product works,
  • who the end-users are,
  • what workflows need to be followed in the product,
  • and what decisions the end-users will make using your product.

When enterprises focus on building AI products, their attention can sit too close to what they are making, and this can leave a small gap, or a massive chasm, between what is expected of your AI product and what it is actually capable of doing.

3 — Ignoring different perspectives of data interpretation

A model or algorithm is only as good as the data it feeds upon.

Tom Wilde, CEO of business process automation company Indico, says that “The key thing to remember about AI and ML is that it’s best described as a very intelligent parrot. It is very sensitive to the training inputs provided for it to learn the intended task.”

When you hire people to label data, each data annotator can perceive the same data differently.

To mitigate bias and make our foundational ML model strong, we should have multiple people participate in the process of labelling training data.

The disadvantage of sourcing labels from a variety of annotators is that they will disagree, and that disagreement can make your ML model's measured performance look flawed.

But this will force you to reconsider the actual ‘ground truth’ of what your data represents and how it gets interpreted by the ML model.
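For illustration, here is a minimal Python sketch of how a team might reconcile labels from several annotators: a majority vote produces a working ground truth, and items without consensus are flagged for adjudication. The items, labels, and agreement threshold below are all hypothetical.

```python
from collections import Counter

# Hypothetical labels from three annotators for the same five items.
annotations = {
    "item_1": ["positive", "positive", "negative"],
    "item_2": ["negative", "negative", "negative"],
    "item_3": ["positive", "neutral", "positive"],
    "item_4": ["neutral", "neutral", "neutral"],
    "item_5": ["positive", "negative", "neutral"],
}

def resolve_label(labels, min_agreement=2):
    """Majority-vote a ground-truth label; return None when annotators can't agree."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agreement else None

for item, labels in annotations.items():
    resolved = resolve_label(labels)
    status = resolved if resolved else "NO CONSENSUS - needs adjudication"
    print(f"{item}: {labels} -> {status}")
```

Even a simple rule like this surfaces exactly the items where your 'ground truth' is genuinely contested.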

4 — Assuming training data is enough

Enterprises become a little too confident once they get their hands on labelled training data.

But we can't just feed training data to the model once and hope that the model will output accurate results when it encounters new data.

You’ll need to run a highly iterative, scientific process to get it right, and even at that point, you may see high variability in production.

The same holds true of your simulation and validation processes, as well as ongoing performance measurement.

Your ML team will eventually find that the benchmark used for the in-production model needs to be adjusted many times over.

One of the first things modelers typically learn is that defining the right metric is one of the most critical tasks. Naturally, tracking multiple metrics is essential to getting a complete view of model behavior.
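As a sketch of what tracking multiple metrics can look like, the snippet below uses scikit-learn on a toy dataset (an assumption; the original names no tooling) to report several validation metrics side by side instead of relying on a single number.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy stand-in for your labelled training data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
preds = model.predict(X_val)

# Track several metrics: a single number rarely tells the whole story.
metrics = {
    "accuracy": accuracy_score(y_val, preds),
    "precision": precision_score(y_val, preds),
    "recall": recall_score(y_val, preds),
    "f1": f1_score(y_val, preds),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```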

5 — Not implementing MLOps

Machine learning is no different from any other IT project.

If IT projects fail due to a lack of DevOps processes, ML projects fail due to a lack of proper MLOps processes.

We cannot run any part of the ML project in silos and hope for great results.

We have read about many enterprise companies that spend years collecting big data and hiring teams of data scientists, yet fail to get any models into production despite all that investment.

Why is that?

Enterprises typically expect data scientists to throw large data sets at an implementation team and have things just work.

However, it is wrong to expect your data scientists to be DevOps experts.

Machine learning projects need DevOps-like thinking to define, manage, and monitor the project.

This is because…

While many data scientists spend a lot of time learning about machine learning, they may not be as well-versed in DevOps as software engineers, product managers, or designers are.

The reality is this…

If you fail to create automated, repeatable pipelines and tooling to containerise and abstract away the underlying implementation details, your machine learning projects will never reach their desired conclusion.
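To make "repeatable pipeline" concrete, here is a minimal, hypothetical sketch: each stage is a named, logged function, so the same run can be reproduced end to end on every new data drop. The stage names and stand-in logic are illustrative, not a real MLOps stack.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("ml-pipeline")

# Hypothetical stage functions; each takes and returns a context dict so the
# whole pipeline can be re-run, unchanged, on each new batch of data.
def ingest(ctx):
    ctx["raw"] = list(range(10))  # stand-in for pulling from a data store
    return ctx

def validate(ctx):
    assert all(x is not None for x in ctx["raw"]), "null values in raw data"
    return ctx

def train(ctx):
    ctx["model"] = sum(ctx["raw"]) / len(ctx["raw"])  # stand-in "model"
    return ctx

def run_pipeline(stages, ctx=None):
    ctx = ctx or {}
    for stage in stages:
        log.info("running stage: %s", stage.__name__)
        ctx = stage(ctx)
    return ctx

result = run_pipeline([ingest, validate, train])
log.info("trained model artifact: %s", result["model"])
```

In practice, these stages would wrap your real ingestion, validation, and training code, and the whole thing would run inside a container so the environment is as repeatable as the code.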

Andrew Ng (co-founder and former head of Google Brain) notes the running joke that we tend to spend 80% of the time in a machine learning project on sourcing and preparing data.

Yet most of our attention goes to the training and modelling work that makes up the other 20%.

In other words, even once we have got the most out of the training process, the 80% of the effort that is data still leaves plenty of room for improvement.

Yet, when he went through the latest academic papers in the AI community, 99% of them researched modelling and only 1% focused on data.

He clarified that research into better modelling techniques is a great thing, but we should also recognise the importance of data quality and contribute to it.

Andrew promotes the idea of MLOps, which helps ensure consistently high-quality data.

TowardsDataScience’s Jinhang Jiang made this graph to simplify Andrew’s idea.

MLOps — TowardsDataScience

Data labelling need not be a headache.

We are ex-Yahoo!s with over 15 years of experience managing and preparing data for large-scale machine learning projects.

We offer highly secure, fast, and economical data labelling to enterprises building unbiased machine learning solutions in the pharmaceutical, finance, and retail industries.

Talk to us about your data labelling challenges today at karthikv@train-data.com or visit www.traindata.us to learn more.
