How to Spot an AI Bullshitter

Published in Actionable AI, Apr 18, 2017

Bullshit is unavoidable whenever circumstances require someone to talk without knowing what he is talking about. Thus the production of bullshit is stimulated whenever a person’s obligations or opportunities to speak about some topic are more excessive than his knowledge.
Harry Frankfurt, On Bullshit

Confession: I’ve been an AI bullshitter a few times.

There, I said it. No AI bullshitter should be offended by this post. AI hype always exceeds capability, and those of us selling/fundraising/deploying AI can be nudged into bullshitting for expediency.

Don’t worry, it’s not just us. Every investor who asks “What’s your AI strategy?” and pretends like he cares about your answer … is also a bullshitter.

Sometimes detecting bullshit isn’t easy in esoteric fields like AI. Here are a few tips.

Here is an easy way to spot an AI — or any — bullshitter

Real experts simplify complexity for you. Bullshitters try to confuse you with big words. Asking yourself … “is this person confusing me?” … is the easiest way to spot a bullshitter.

Dr. Roberto Trotta can explain cosmology using a 3rd-grader’s vocabulary. The origin of the universe is more complex than any AI technology.

Many software companies are promoting their “AI capability”. Some of these claims are legitimate, others are pure bullshit.

First ask for a definition of AI

Some companies use AI as a general term for the data cleansing and statistical techniques software engineers have used for decades.

By this definition … you also work at an AI company. Congratulations.

Since AI is such an ambiguous term this isn’t strictly incorrect — just not interesting.

You mentioned an AI solution. Are you doing machine learning?

This simple question should clear up any confusion. Nearly all practical (i.e. not research) AI technology uses machine learning.

Then ask about data

Most AI solutions to business problems use supervised learning, a machine learning technique that trains algorithms on labeled example data.
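To make "training with labeled example data" concrete, here is a minimal sketch in plain Python. It is not any vendor's method: the nearest-centroid approach, the function names and the tiny fraud-style dataset are all made up for illustration. The point is only that the algorithm learns from examples that already carry the right answer.

```python
from collections import defaultdict

def train(examples):
    """Learn one average point (centroid) per label from labeled examples."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [total / counts[label] for total in totals]
        for label, totals in sums.items()
    }

def predict(centroids, features):
    """Return the label whose centroid is closest to the new example."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], features))
    return min(centroids, key=squared_distance)

# Hypothetical labeled data: (amount in dollars, hour of day) -> label.
labeled_examples = [
    ((12.0, 14.0), "ok"),
    ((30.0, 10.0), "ok"),
    ((25.0, 16.0), "ok"),
    ((900.0, 3.0), "fraud"),
    ((750.0, 2.0), "fraud"),
    ((820.0, 4.0), "fraud"),
]

model = train(labeled_examples)
prediction_small = predict(model, (20.0, 13.0))   # resembles the "ok" examples
prediction_large = predict(model, (800.0, 3.0))   # resembles the "fraud" examples
print(prediction_small, prediction_large)
```

Six examples are obviously a toy. Without the labels, though, there would be nothing to learn from at all, which is why the data questions matter so much.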

As a rough rule of thumb, about 5,000 labeled training examples are needed to begin generating acceptable results, and about 10 million to reach human-level performance (source).

Your very first question should be about this training data.

What data are you using to train the algorithms?

Generating labeled data can be very expensive. Most startups don’t have enough and rely on open sources like ImageNet. Companies like Baidu and Google with limitless data have a big advantage over startups.

Why do you think your results will translate to our environment?

Your data is probably unique, complex and messy. Product companies may try to use techniques like transfer learning to adapt their results to your environment. Will it work? Good question.

Ok, you claim X% result. How did you come up with this number?

Anyone can achieve great results by teaching machine learning algorithms to work really well on just one set of data (called overfitting the data). Of course such a system will fail miserably in production.

Anyone with a legitimate solution can explain how she prevents overfitting errors.
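One common safeguard (not the only one) is to score the model on data it never saw during training. The sketch below uses made-up data and made-up function names: a "memorizer" that copies the label of the closest training example, and a simple threshold rule. The noise pattern (flipping labels where x % 7 == 3) is an arbitrary assumption, chosen only to keep the example deterministic.

```python
def make_examples(xs):
    """Label is 1 when x >= 50, with simulated noise: labels flip where x % 7 == 3."""
    examples = []
    for x in xs:
        label = 1 if x >= 50 else 0
        if x % 7 == 3:
            label = 1 - label
        examples.append((x, label))
    return examples

train_set = make_examples(range(0, 100, 2))  # even x: used for training
test_set = make_examples(range(1, 100, 2))   # odd x: held out, never trained on

def memorizer(x):
    """Overfit model: copy the label of the closest training example, noise included."""
    _, nearest_label = min(train_set, key=lambda example: abs(example[0] - x))
    return nearest_label

def simple_rule(x):
    """Simpler model: a single threshold that cannot memorize the noise."""
    return 1 if x >= 50 else 0

def accuracy(model, examples):
    """Fraction of examples where the model's prediction matches the label."""
    correct = sum(1 for x, label in examples if model(x) == label)
    return correct / len(examples)

memorizer_train = accuracy(memorizer, train_set)
memorizer_test = accuracy(memorizer, test_set)
rule_train = accuracy(simple_rule, train_set)
rule_test = accuracy(simple_rule, test_set)

print("memorizer: train", memorizer_train, "held-out", memorizer_test)
print("simple rule: train", rule_train, "held-out", rule_test)
```

The memorizer is perfect on the data it trained on and noticeably worse on the held-out data, while the simple rule scores the same on both. That gap between training and held-out results is the warning sign, so a quoted X% should always come with an answer to "measured on which data?"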

Finally inquire about algorithms

Algorithms get a lot of attention because they can reveal research breakthroughs. Some (like generative adversarial networks) may even lead to unsupervised learning techniques which need less training data (yes, I’m bullshitting … sorry). If so, much of this post will be obsolete.

But for the moment, choosing between traditional machine learning algorithms and the much-hyped deep learning approach depends on the problem.

What machine learning algorithms are you using? Why?

Stripe provides a great justification for training its fraud detection scheme using random forest algorithms. Anyone legitimate will be able to provide a similar explanation.

Have you published any research papers about your results?

Many (though not all) top AI teams continue to publish their results and will be thrilled to share their research with you.

Do you get clear answers to these questions? Are you leaving the conversation without feeling confused?

You’re probably not talking to a bullshitter. The next step is to run a pilot project on your data and test the solution’s viability in your environment.

Subscribe to my updates or just email me at kevindewalt@kevindewalt.com if you want to talk about your business.