If your business talks with companies offering AI-based solutions, you’ll likely have heard a lot of big claims, ranging from “neuromorphic engineering [solutions], that allows machines to see like humans” (Prophesee) to “It is one of the most powerful tools our species has created. It helps doctors fight disease.” (IBM Watson, Super Bowl commercial). As a non-expert, you will often find it difficult to tell which claims are legitimate and which are overhyped. Here are six simple rules of thumb to help you tell the two apart.
1. Don’t automatically trust established brands
Even big names often oversell their AI solutions. IBM’s Watson, for instance, was positioned as the “do it all” AI solution: doing your tax returns, helping doctors sift through millions of unstructured reports, determining optimal cancer treatments and helping insurance companies automate claims processing. Needless to say, Watson rarely met these high expectations, and several high-profile multi-million-dollar experiments have gone astray.
2. Ignore the PR and ask for concrete benchmarks
Set the marketing claims aside and find out exactly what the AI solution you are offered can do and how it performs in comparison to competitors or baselines. Any serious company will have those numbers. The reason is simple: to build an AI system one typically starts with a very well-defined set of inputs and outputs (e.g. given the machine parameters of the past ten days, predict downtime tomorrow) and then iterates through multiple models, starting from the current state of the art (or, if none is available, simple algorithms like logistic regression) and moving to more complex ones. Hence, if someone offers you a “Watson” to do your tax returns, ask them about:
- the expected input,
- the transformation of the input to a machine-readable form (if necessary),
- the type of output,
- the performance metric that the algorithm optimizes,
- the data used to train the algorithm and
- the performance of the trained algorithm in comparison to existing solutions or baselines.
If you get a PR blurb in response to these questions instead of concrete, verifiable and down-to-earth answers, turn away.
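To make the last point concrete, here is a minimal sketch of what a baseline comparison can look like. All numbers are made up for illustration, and `vendor_predictions` is a hypothetical stand-in for whatever output the vendor's system produces on your held-out data.

```python
from collections import Counter

def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def majority_baseline(train_labels, n):
    """Always predict the most common training label (a deliberately naive baseline)."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n

# Hypothetical data: 1 = machine had downtime the next day, 0 = no downtime.
train_labels = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
test_labels = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
vendor_predictions = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]  # made-up vendor output

baseline_predictions = majority_baseline(train_labels, len(test_labels))
baseline_acc = accuracy(baseline_predictions, test_labels)
vendor_acc = accuracy(vendor_predictions, test_labels)

print(f"baseline accuracy: {baseline_acc:.2f}")
print(f"vendor accuracy:   {vendor_acc:.2f}")
```

In this toy example the vendor's 90% accuracy sounds impressive until you see that always predicting "no downtime" already reaches 80%. A headline number only means something next to a baseline, and accuracy may not even be the right metric when failures are rare.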
3. Ask about the limits/failure cases of the solution
All companies tout their successes but rarely speak about the limitations and failure cases. Don’t be fooled: today’s AI systems are seriously limited in scope. As I showed in my last blog post, even the best algorithms perform badly on handwritten digit recognition if the inputs are altered in subtle ways. They will even recognize complete garbage images (like white noise) as legitimate digits or classify a cat as an ostrich as illustrated in the two examples below.
Similarly, Amazon recently announced the end of a multi-year project to automate their hiring process because they couldn’t get their algorithms to be unbiased with respect to gender (simply put, women were mostly rejected). A serious company will know exactly under which conditions their algorithms work, and under which they fail, and they should be able to communicate this clearly and comprehensibly.
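One simple way to surface this kind of failure case is to break a system's decisions down by group instead of looking only at the overall numbers. The sketch below assumes a hypothetical list of `(group, accepted)` decisions. The data is invented for illustration and has nothing to do with Amazon's actual system.

```python
from collections import defaultdict

def selection_rate_by_group(decisions):
    """Return the fraction of accepted candidates for each group."""
    totals = defaultdict(int)
    accepted = defaultdict(int)
    for group, was_accepted in decisions:
        totals[group] += 1
        accepted[group] += int(was_accepted)
    return {group: accepted[group] / totals[group] for group in totals}

# Hypothetical screening decisions produced by an automated hiring tool.
decisions = [
    ("women", False), ("women", False), ("women", True), ("women", False),
    ("men", True), ("men", True), ("men", False), ("men", True),
]

rates = selection_rate_by_group(decisions)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} accepted")
```

A large gap between groups does not by itself prove the algorithm is biased, but it is exactly the kind of result a serious vendor should have looked at already and be able to explain.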
4. Look at publications and research work of team members
The research frontier in AI is moving quickly: papers older than 12–24 months are quite likely to be outdated already. The best sign that the scientists and engineers of the company you are dealing with are on top of their game is that they can demonstrate extensive and recent research expertise through publications at top ML conferences like NIPS (Conference on Neural Information Processing Systems, since renamed NeurIPS), CVPR (Conference on Computer Vision and Pattern Recognition) or ICML (International Conference on Machine Learning). The companies with the best talent (e.g. Google or Facebook) give their researchers the freedom to keep on publishing, so check the dates of the publications.
5. Test the limits of the solution yourself
Many solutions offered by companies only work under narrow conditions and are unlikely to stand the test of time. As an example, say you install a camera at the end of your factory line to automatically assess the quality of each workpiece. The results might look promising in the first week. However, remember that our world is constantly changing: the light levels will change, some dirt will accumulate on the camera lens, material luminance will change, the workpiece might shift a bit within the field of view, and so on. All these changes make little difference to humans, but many AI algorithms are easily confused by them. One of our recent customers had exactly that problem: as long as the workpiece was perfectly centered with respect to the camera, the algorithm they bought from a computer-vision startup worked great. But the machine sometimes shifted the workpiece a little bit off-center, at which point the algorithm classified it as garbage. That effectively rendered the algorithm useless.
How can you evaluate ahead of time whether an algorithm is robust enough for production? One often revealing way is to take the solution the company offers and run it through a series of tests in which you perturb the inputs in various ways and check whether the algorithm still performs well. For example, change the brightness, add speckles of dust or paint, or squeeze or rotate the input. Include any perturbation that you can imagine becoming relevant in the future.
If the solution is robust to all or at least most of these perturbations (the types of which depend strongly on your use case) you can gain some confidence that it might hold up in the real world (e.g. in the rough environment of your factory).
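As a rough illustration, such a perturbation test can be sketched in a few lines. Everything here is a toy assumption: `make_image` produces a tiny synthetic "workpiece", and `toy_classifier` is a hypothetical stand-in for the vendor's model. It deliberately looks only at the center pixel, to mimic the centering problem described above. In practice you would plug in real images and the real prediction call.

```python
def make_image(size=9):
    """Toy 'workpiece' image: a bright 3x3 square centred on a dark background."""
    img = [[0.0] * size for _ in range(size)]
    start = size // 2 - 1
    for r in range(start, start + 3):
        for c in range(start, start + 3):
            img[r][c] = 1.0
    return img

def toy_classifier(img):
    """Hypothetical stand-in for a vendor model: 'ok' if the centre pixel is bright."""
    mid = len(img) // 2
    return "ok" if img[mid][mid] > 0.5 else "defect"

def change_brightness(img, factor):
    """Scale every pixel by `factor`, clipping at 1.0."""
    return [[min(1.0, p * factor) for p in row] for row in img]

def add_dust(img, positions):
    """Black out the given (row, col) pixels to imitate dust on the lens."""
    out = [row[:] for row in img]
    for r, c in positions:
        out[r][c] = 0.0
    return out

def shift_right(img, pixels):
    """Move the image content `pixels` columns to the right, padding with black."""
    return [[0.0] * pixels + row[:len(row) - pixels] for row in img]

# Perturbations you expect in production; the list depends on your use case.
perturbations = {
    "dimmer (x0.8)": lambda img: change_brightness(img, 0.8),
    "much dimmer (x0.4)": lambda img: change_brightness(img, 0.4),
    "dust in corners": lambda img: add_dust(img, [(0, 0), (8, 8)]),
    "shifted 1 px": lambda img: shift_right(img, 1),
    "shifted 2 px": lambda img: shift_right(img, 2),
}

image = make_image()  # a good workpiece, so the correct answer is always "ok"
results = {name: toy_classifier(perturb(image)) == "ok"
           for name, perturb in perturbations.items()}

for name, still_correct in results.items():
    print(f"{name:20s} {'pass' if still_correct else 'FAIL'}")
```

The toy model survives mild dimming, dust in the corners and a one-pixel shift, but fails under strong dimming and a two-pixel shift. A table like this, built from perturbations that matter in your own environment, tells you far more than a demo under ideal conditions.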
6. Get a third-party opinion
Finally, if you are considering substantial investments in AI solutions but don’t have sufficient expertise in house, get a third party into the deal. I recently talked to a data science team from a larger mid-market company (> 5 billion Euros in sales). They were offered a computer vision algorithm from a startup to recognize certain patterns in their images and actually considered buying the solution despite the high price tag (several million Euros). Luckily, they did bring a third party into the negotiations, who quickly pointed out that the solution could be trivially replicated with standard tools. And indeed, within two weeks the company was able to match and then surpass the performance offered by the startup. Similarly, for years AI experts have dismissed the advertised capabilities of IBM Watson as an extreme example of unsubstantiated marketing. Hence, spending money on a third-party expert may save you from a very expensive mistake.