AI is only as smart as the data that feeds it

Jackson Adams
Apr 30, 2018 · 4 min read

You are what you eat. The same thing applies to artificial intelligence.

Senior Airman Terrence Ruffin strains for an extra rep on a weight machine Jan. 23, 2015, at the fitness center on Eglin Air Force Base, Fla. In November 2014, 21-year-old Ruffin won his International Federation of Bodybuilding and Fitness “pro” card at a competition in Miami, Fla. (U.S. Air Force photo by Samuel King Jr./Released)

No bodybuilder would expect a workout to build muscles without also consuming the necessary nutrients. Not enough food or the wrong kind of food could create an outcome very different from the one intended.

The complex processes that turn food into muscle fibers are much like the processes at work in artificial intelligence. Where one uses enzymes, the other uses algorithms. Both need the right raw materials (food in one case, data in the other) to work.

When the data an AI system relies on are incomplete, incorrect, or out of scope, the output suffers.

“The limits of AI are similar to the problems in normal conversations,” AI expert Swaroop Kallikuri explains. “If you ask me a question, I may not understand you and tell you something incorrect based on what I understood you to say, or even if I do understand you correctly, I could tell you something factually wrong that I believe to be true.”

Problems like these plague error-prone human interactions too, but we are used to that and compensate by allowing for mistakes and verifying critical information.

Will we be as careful with computers?

AI researchers and programmers have not been shy about warning that we should be.

“The difficulty with machine learning systems is you don’t really know what’s going on inside — and the answers they provide are not contextualised, like a human would do,” explained Zoubin Ghahramani, Professor of Information Engineering in Cambridge’s Department of Engineering, in an article published by the University of Cambridge in 2016.

But what about warnings about the underlying data sets?

“Biases and blind spots exist in big data as much as they do in individual perceptions and experiences,” Kate Crawford, of the MIT Center for Civic Media, wrote in a 2013 Foreign Policy article. “Yet there is a problematic belief that bigger data is always better data and that correlation is as good as causation.”

According to a Northeastern University blog, 2.5 exabytes of data are created every day… that’s 250,000 Libraries of Congress, but how many binders? (photo by Samuel Zeller)

The underlying difficulty is that AI, and the machine learning algorithms that make it work, are good at working with data but not at questioning it.
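To see what “not questioning the data” looks like in practice, here is a minimal sketch (the sensor readings and the corrupted value are invented for illustration): a least-squares fit quietly absorbs a bad data point into its answer without ever flagging it.

```python
import numpy as np

# Hypothetical sensor readings: temperature rising ~2 degrees per hour.
hours = np.arange(10, dtype=float)
temps = 2.0 * hours + np.random.default_rng(0).normal(0, 0.3, 10)

# One corrupted reading, e.g. a decimal-point error (16.0 logged as 1.60).
bad_temps = temps.copy()
bad_temps[8] = 1.6

slope_clean = np.polyfit(hours, temps, 1)[0]
slope_bad = np.polyfit(hours, bad_temps, 1)[0]

print(f"slope from clean data:     {slope_clean:.2f}")  # close to 2.0
print(f"slope from corrupted data: {slope_bad:.2f}")    # noticeably off
# The fit runs without complaint either way; nothing in the algorithm
# asks whether reading #8 makes physical sense.
```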

“We really view the whole mathematics of machine learning as sitting inside a framework of understanding uncertainty,” Ghahramani said. “Before you see data — whether you are a baby learning a language or a scientist analysing some data — you start with a lot of uncertainty and then as you have more and more data you have more and more certainty.”
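Ghahramani’s point can be made concrete with the textbook Beta-Bernoulli model (this toy example is mine, not his): the estimate of a coin’s bias starts out maximally uncertain and narrows as flips accumulate.

```python
import numpy as np

rng = np.random.default_rng(1)
true_bias = 0.7                       # unknown to the learner
flips = rng.random(1000) < true_bias  # simulated observations

# Beta(1, 1) prior: total uncertainty before seeing any data.
alpha, beta = 1.0, 1.0
for n in (0, 10, 100, 1000):
    heads = flips[:n].sum()
    a, b = alpha + heads, beta + (n - heads)
    mean = a / (a + b)
    sd = np.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    print(f"after {n:4d} flips: estimate {mean:.2f} +/- {sd:.2f}")
# The +/- term (the posterior standard deviation) shrinks as data
# accumulate: more and more data, more and more certainty.
```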

This problem is magnified by machine learning itself, where data sets are used to generate the very algorithms that AI systems rely on for decision making.

Will Knight’s explanation of this topic in the MIT Technology Review is so good that it bears repeating:

“From the outset, there were two schools of thought regarding how understandable, or explainable, AI ought to be. Many thought it made the most sense to build machines that reasoned according to rules and logic, making their inner workings transparent to anyone who cared to examine some code. Others felt that intelligence would more easily emerge if machines took inspiration from biology, and learned by observing and experiencing. This meant turning computer programming on its head. Instead of a programmer writing the commands to solve a problem, the program generates its own algorithm based on example data and a desired output. The machine-learning techniques that would later evolve into today’s most powerful AI systems followed the latter path: the machine essentially programs itself.”
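A minimal sketch of the second school Knight describes, using scikit-learn’s DecisionTreeClassifier as a stand-in and made-up spam-filter data: rather than hand-writing the rule, we give the program examples and a desired output and let it derive its own decision procedure.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy training data: [number of links, ALL-CAPS words] per email.
X = [[0, 0], [1, 0], [0, 1], [8, 5], [6, 9], [7, 2]]
y = [0, 0, 0, 1, 1, 1]  # 0 = legitimate, 1 = spam

# No one writes the "if links > ...: spam" rule; the learner induces it.
model = DecisionTreeClassifier(random_state=0).fit(X, y)
print(export_text(model, feature_names=["links", "caps_words"]))
print(model.predict([[5, 4]]))  # classify an unseen email
```

A small tree like this one can at least be printed and read; the explainability problem Knight describes begins when the learned model has millions of parameters spread across many layers.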

Knight goes on to explain that, because the processes at work rely on interpreting data through so many layers, AI programs can rarely describe how they came to their results.

“As the technology advances, we might soon cross some threshold beyond which using AI requires a leap of faith,” he says. “Sure, we humans can’t always truly explain our thought processes either — but we find ways to intuitively trust and gauge people.”

With these dangers in mind, Ronald Reagan’s injunction to ‘trust, but verify’ seems an apt way to forge a path forward with both AI and the Big Data that powers it.

This means carefully calibrating these tools to remain secondary to the humans who manage them for the foreseeable future.

Or, as a recent Forbes article quotes Tom Chatfield: “Forget artificial intelligence — in the brave new world of big data, it’s artificial idiocy we should be looking out for.”
