How to Avoid AI Failure with Smart Data Collection

By Zfort Group
Published in Nerd For Tech · Jun 5, 2021 · 3 min read

Traditional software is built on an algorithm designed by a programmer and is meant to perform a specific function. With Artificial Intelligence software, however, we are dealing with decisions that can be unpredictable.

Pat Baird, Head of Global Software Standards at Philips, said: “For these machine learning systems, the programmer doesn’t tell the software how to solve the problem”.
In AI programming, the result of development is software that is able to find patterns in data. In reality, the programmer often has no idea why the software made a certain decision; they just put together an engine that calculates a ton of stuff.

Any guide to avoiding problems with AI software should start with something like “make sure to get good data before all else”. As Pat Baird says, comparing it with traditional software:

“It’s garbage in, garbage out. But what’s going to happen is you have garbage in your data, and since you don’t know how the software works, it’s going to be a problem.”

So, what are the common problems we face when using AI apps? Baird gave a couple of real-life examples of how poorly collected data can lead to wrong results.

Example one — “Bad data to start with”
This example shows that a wearable device could not recognize the mode of travel: the tracker showed over 20,000 steps after an off-road adventure in a Jeep. “This was because of all the potholes and how much I was thrown around in the Jeep,” Baird said.
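One way to catch this kind of bad input before it reaches a model is a simple plausibility check. The sketch below is only an illustration: the `ActivityWindow` record, the availability of GPS speed, and both thresholds are assumptions made for this example, not something described by Baird or taken from any real tracker.

```python
from dataclasses import dataclass


@dataclass
class ActivityWindow:
    """One time window of tracker data (hypothetical record layout)."""
    steps: int         # steps reported by the tracker in this window
    minutes: float     # length of the window in minutes
    speed_kmh: float   # average GPS speed in this window (assumed to be available)


# Assumed thresholds, chosen for illustration only.
MAX_ON_FOOT_SPEED_KMH = 12.0
MAX_STEPS_PER_MINUTE = 220


def is_plausible_walking(window: ActivityWindow) -> bool:
    """Return True if the window looks like travel on foot."""
    if window.minutes <= 0:
        return False
    cadence = window.steps / window.minutes
    return (window.speed_kmh <= MAX_ON_FOOT_SPEED_KMH
            and cadence <= MAX_STEPS_PER_MINUTE)


def count_trusted_steps(windows) -> int:
    """Sum steps only from windows that pass the plausibility check."""
    return sum(w.steps for w in windows if is_plausible_walking(w))


windows = [
    ActivityWindow(steps=1200, minutes=12, speed_kmh=5.0),    # an ordinary walk
    ActivityWindow(steps=9000, minutes=60, speed_kmh=35.0),   # a bumpy drive
]
print(count_trusted_steps(windows))  # 1200
```

A rule like this does not make the sensor smarter; it only keeps obviously implausible readings out of the data set.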

Example two — “Overfitting”
Here the problem is that, from a huge mass of available data and values, the program picks up patterns for analysis and training that are not the ones we are targeting, which also leads to inaccuracies. A great example is image recognition software whose goal was to tell an Alaskan Husky from a wolf. Baird described it this way: “The data performed well, but it was picking up on background cues, rather than the ones the programmer intended. What actually happened, was that most of the photos that people had of their dog were taken sometime during the summer or fall in their backyard, whereas the photos of the wolves were taken during the winter out in the wild. The software that looked great to detect the difference between the Alaskan Husky and the wolf was actually picking up whether or not there was snow on the ground.”
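The husky and wolf story can be reproduced with a toy model. The sketch below is a made-up illustration, not the software Baird describes: the rows are invented, `trait` stands in for some real visual cue of the animal, and the "model" is a one-feature rule that simply keeps whichever feature best matches the training labels.

```python
def accuracy(rows, feature):
    """Share of rows where the feature value equals the label."""
    correct = sum(1 for row in rows if row[feature] == row["is_wolf"])
    return correct / len(rows)


def fit_one_feature_rule(rows, features):
    """Pick the single feature that best matches the training labels."""
    return max(features, key=lambda f: accuracy(rows, f))


# Invented training set: every wolf photo has snow, no husky photo does.
# The real cue ("trait") is right 90% of the time.
train = (
    [{"snow": 1, "trait": 1, "is_wolf": 1}] * 9
    + [{"snow": 1, "trait": 0, "is_wolf": 1}]
    + [{"snow": 0, "trait": 0, "is_wolf": 0}] * 9
    + [{"snow": 0, "trait": 1, "is_wolf": 0}]
)

# Invented check set where the background is reversed:
# huskies photographed in snow, wolves photographed in summer.
shifted = (
    [{"snow": 0, "trait": 1, "is_wolf": 1}] * 5
    + [{"snow": 1, "trait": 0, "is_wolf": 0}] * 5
)

chosen = fit_one_feature_rule(train, ["snow", "trait"])
print(chosen)                     # snow
print(accuracy(train, chosen))    # 1.0
print(accuracy(shifted, chosen))  # 0.0
```

The rule looks perfect on the data it was built from and fails completely once the background changes, which is why it helps to check a model on data collected under different conditions.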

Example three — “Underfitting”
Another issue appears when too little data has been collected. Such a simple reason, right? As a rule, the software ends up making decisions based on noise, not on something real.
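A bit of arithmetic shows how easily noise can pass for a pattern when the data set is small. The sketch below rests on simplified assumptions made for this example: yes/no labels, and a number of meaningless yes/no features that behave like independent fair coin flips. It computes the chance that at least one of those features lines up perfectly with the labels, either directly or inverted.

```python
def chance_of_spurious_match(n_samples: int, n_noise_features: int) -> float:
    """Probability that at least one pure-noise feature perfectly
    predicts all labels (directly or inverted) purely by chance."""
    # One coin-flip feature matches n labels with probability 0.5 ** n,
    # and matches their inverse with the same probability.
    per_feature = 2 * 0.5 ** n_samples
    return 1 - (1 - per_feature) ** n_noise_features


for n in (5, 10, 50):
    print(n, chance_of_spurious_match(n, n_noise_features=20))
```

Under these assumptions, with 5 samples and 20 noise features the chance is about 0.72, while with 50 samples it is practically zero. More data leaves far less room for coincidences.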

Autonomy level — questions to think about
One more question concerns how much responsibility we are ready to give the technology. What level of autonomy should it have? Is it enough for it to give you the right driving directions, or do we need a self-driving car that actually does the driving for us?
The good news is that most companies that have worked with data collection have already found better ways to do it in terms of quality control. Still, can we eliminate errors when developing an excellent AI system, and how should we deal with them?

So, how do we avoid failures in AI development?

Here are the very basic principles:

  1. Collect relevant and verified data
  2. Collect enough data
  3. Make sure the data is diverse
  4. Carefully consider if the data justifies the power we are giving to the AI
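The first three principles can be partly turned into automated checks that run before training. The sketch below is a minimal illustration under stated assumptions: the function name `check_dataset`, the record layout, and the default thresholds are all hypothetical, and real projects will need checks specific to their data. The fourth principle remains a human judgment call.

```python
from collections import Counter


def check_dataset(rows, label_key, min_rows=1000, min_class_share=0.1):
    """Return a list of human-readable problems found in the data set.

    rows is a list of dicts; label_key names the field holding the label.
    The thresholds are illustrative assumptions, not recommended values.
    """
    if not rows:
        return ["data set is empty"]

    problems = []

    # Principle 1: a record with missing values cannot count as verified.
    incomplete = sum(1 for r in rows if any(v is None for v in r.values()))
    if incomplete:
        problems.append(f"{incomplete} rows have missing values")

    # Principle 2: is there enough data at all?
    if len(rows) < min_rows:
        problems.append(f"only {len(rows)} rows, expected at least {min_rows}")

    # Principle 3: a crude diversity check, flagging rare classes.
    counts = Counter(r[label_key] for r in rows)
    for label, count in counts.items():
        share = count / len(rows)
        if share < min_class_share:
            problems.append(f"class {label!r} makes up only {share:.0%} of the data")

    return problems


rows = (
    [{"image": f"husky_{i}.jpg", "label": "husky"} for i in range(95)]
    + [{"image": f"wolf_{i}.jpg", "label": "wolf"} for i in range(5)]
)
rows[0]["image"] = None  # simulate one broken record

for problem in check_dataset(rows, "label", min_rows=50):
    print(problem)
```

On this invented sample the check reports one row with missing values and flags the wolf class as underrepresented.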

Fortunately, the technologies developed and deployed by leading companies give us an opportunity to learn from the mistakes of others and create “cleaner” technology now.


Created by Zfort Group.

