Top 5 Machine Learning Myths

Emma Findlow
Published in Wallscope
4 min read · Apr 29, 2020

At Wallscope we use machine learning technologies to solve a number of common business problems. For example, we might develop a model that automates particular tasks or allows large amounts of data to be managed more efficiently.

If you’re in the early stages of discovering more about machine learning, you might encounter a fair amount of confusing or conflicting information. We’ve done a bit of myth busting so you don’t have to! Here are our top 5 machine learning myths from our Machine Learning Engineer and Researcher Angus Addlesee and Co-Founder Ian Allaway.

Myth 1:

I could teach a human this task with a couple of specific examples. Surely I could train a machine learning model with the same examples?

A machine, learning

Angus: ‘Humans and machines learn in very different ways. Most machine learning algorithms find patterns in the data from many examples. This makes them slower at picking up some tasks (like segmenting documents into coherent sections) that humans can do relatively well.

Once trained though, computers can do the task much faster than even a team of humans ever could. With any task that requires a lot of data to be processed, such as fraud detection, machines are much more efficient.’

Myth 2:

More data = better results.

Garbage in… garbage out…

Angus: ‘When a model isn’t performing well, it’s tempting to put it down to a lack of data. But throwing huge amounts of poor data at the problem is rarely a good option: the data has to be high quality, varied and realistic. For example, there is little point in training a model for a live system on crudely created synthetic data. The rules used to generate that data will be learned by the model and will distort the results in production. Working closely with your clients is crucial, both to help structure their data and to build the trust needed to securely share real (from-source) data.’
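A minimal sketch of the trap Angus describes, using a hypothetical fraud-detection setting (the data, rule and threshold model below are all illustrative, not Wallscope's actual approach): synthetic transactions are generated by a crude hand-written rule, a naive model learns that rule perfectly, and then stumbles on messier "real" data where fraud does not follow the rule.

```python
# Hypothetical example: a model trained on rule-generated synthetic data
# learns the generation rule itself, not real-world fraud behaviour.

def best_threshold(rows):
    """Fit a one-feature classifier: predict fraud when amount > t."""
    candidates = sorted({amount for amount, _ in rows})
    def fit_accuracy(t):
        return sum((amount > t) == is_fraud for amount, is_fraud in rows) / len(rows)
    return max(candidates, key=fit_accuracy)

def accuracy(rows, t):
    return sum((amount > t) == is_fraud for amount, is_fraud in rows) / len(rows)

# Synthetic data generated by the crude rule "amount > 100 means fraud".
synthetic = [(20, False), (50, False), (90, False),
             (120, True), (300, True), (800, True)]

# "Real" data: fraud depends on context the rule ignores, so large
# legitimate payments and small fraudulent ones both appear.
real = [(30, False), (45, True), (700, False),
        (110, True), (900, False), (60, True)]

t = best_threshold(synthetic)
print(accuracy(synthetic, t))  # 1.0 - the model has learned the rule exactly
print(accuracy(real, t))       # far worse on the realistic data
```

The model looks perfect in development precisely because it memorised the rule used to generate its training data, which is the failure mode the quote warns about.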

Myth 3:

Machine Learning models will make sense of our messy data.

Time for a clear out…

Ian: ‘We are often asked to review a client’s data to see ‘what’s in it’. We regularly find missing, incomplete or inaccurate data, which can stem from human error, data management issues, inconsistencies in process, and other causes the client may not be aware of. Machine learning models cannot turn incomplete or missing data into valuable insights.

Before feeding data sets into a machine learning application, you must ensure your data is accurate, consistent and complete enough for a model to learn from. Anomaly detection models, for instance, rely on consistency: if everything looks like an anomaly, the whole exercise becomes pointless.

Rather than starting with machine learning, a data management overhaul or a knowledge management exercise may be the best solution for you.’
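The kind of first-pass data review Ian describes can be sketched in a few lines of Python. The records and field names below are hypothetical, and a real review would go much further, but the idea is the same: count what is missing and spot inconsistent values before any modelling starts.

```python
# Hypothetical first-pass data review: count missing fields and surface
# inconsistent spellings of the same value before attempting any ML.

records = [
    {"name": "Acme Ltd", "country": "UK",             "revenue": 1200},
    {"name": "Acme Ltd", "country": "United Kingdom", "revenue": 1200},  # same country, two spellings
    {"name": "Beta plc", "country": None,             "revenue": 950},   # missing country
    {"name": "Gamma SA", "country": "France",         "revenue": None},  # missing revenue
]

def profile(records):
    missing = {}   # field -> count of missing values
    values = {}    # field -> set of distinct non-missing values
    for row in records:
        for field, value in row.items():
            if value is None:
                missing[field] = missing.get(field, 0) + 1
            else:
                values.setdefault(field, set()).add(value)
    return missing, values

missing, values = profile(records)
print(missing)            # one missing country, one missing revenue
print(values["country"])  # three distinct values - 'UK' and 'United Kingdom' are the same country
```

Even this crude profile reveals the two problems in the sample: gaps that no model can fill, and an inconsistency that would quietly split one country into two categories.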

Myth 4:

Using open-source models will always save you money.

Angus: ‘It is exciting to see so many companies, research teams, and individuals release their trained models online. I frequently test these, and in some projects they work brilliantly, saving both time and money. They often need fine-tuning or complete re-training with real data, however, and this is one of the traps. Even if a model is released by a big name like Google, you don’t know the design decisions behind its development. We often hear about data bias issues, but this is a slightly different problem. Instead of biases in the data, I am talking about decisions that the creators made, often subconsciously, while developing the model. To illustrate this, I have played “Cat or Not Cat” with a few audiences (which I saw Cassie Kozyrkov play at World Summit AI 2019) and got the same results. Here it is:

Running through the six images, I ask the audience to shout either “cat” or “not cat”. The first five pictures go smoothly and the audience agree. Then on picture six, the lion, the audience shout a jumbled mix of answers: cat, not cat, big cat, lion, cat-ish, etc. This is a binary classification (there were only two options), yet every person has their own opinion on which bucket the lion belongs in. The right answer of course depends on the use-case, but unwary companies may deploy published models like this one without a second thought, simply because they come from a trusted source.

Decisions like these can be CATastrophic if a model is put directly into production. Models need to be tested rigorously with real data to avoid unexpected costs and issues.’

Myth 5:

AI will make me redundant.

Upgrading is compulsory.

Ian: ‘We hear this a lot, even in client meetings. It is a genuine concern, but it rests on the idea that AI and machine learning models will simply replace humans. While that is undoubtedly true in some circumstances, mostly factory robotics and automated processes, the technology also creates new job opportunities. In most cases, implementations take the form of assistive technology. These can improve associated jobs and skills: removing mundane tasks, freeing people for more creative and fulfilling problem solving and process management, and improving productivity.

We are at the beginning of this technological revolution, and societal change will undoubtedly follow, as it has throughout history. Technology will shape everything from education to working practices, production and healthcare, and will help us find solutions to many of the global problems we face.’
