The biggest challenge to AI adoption is expectation. Integrating machine learning with the right set of expectations will lead to a much more successful outcome than being misled about what AI can do for you.
I love machine learning. I’ve been integrating it into businesses for over 3 years and I’ve seen it save companies time and money in many different areas. But things can go south pretty quickly if you think you’re getting one thing and actually getting another.
Let’s get the obvious out of the way: there is a reason why machine learning can’t predict the stock market. There are limits to what the state of the art is capable of. That doesn’t mean there aren’t plenty of good use cases for machine learning, but it does mean that you have to go into the process with your eyes open.
Let’s cover some examples.
State-of-the-art face detection reaches about 99% accuracy. Face detection is an example of machine learning that is quite advanced, but there are a few things you have to keep in mind. First of all, that 99% comes from a validation set. When you see a percentage accuracy quoted for a machine learning algorithm, that number usually comes from a step in the training process where a portion of the labeled data (often around 20%) is split off and used to validate the model. That chunk of data is usually chosen at random, so it is similar to the rest of the training data. Once you apply the trained model in the real world, you may start to show it images that are quite different from the training set. So in practice, the accuracy may fall below 99% on your data set.
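To make the validation step concrete, here is a minimal sketch of a hold-out split and an accuracy measurement. The data, the 20% fraction, and `toy_model` are stand-ins I made up for illustration; a real face detector would be trained and evaluated on images, not numbers.

```python
import random


def split_holdout(samples, holdout_fraction=0.2, seed=0):
    """Shuffle the labeled samples and split off a validation portion."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * holdout_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def accuracy(predict, labeled_samples):
    """Fraction of (input, label) pairs that predict() gets right."""
    correct = sum(1 for x, label in labeled_samples if predict(x) == label)
    return correct / len(labeled_samples)


# Toy stand-ins (assumptions): inputs are numbers, the label is True above a cutoff.
labeled = [(x, x >= 50) for x in range(100)]
train, validation = split_holdout(labeled)


def toy_model(x):
    # Pretend this rule was learned from `train`.
    return x >= 50


# "Real world" data where the true cutoff has shifted to 60.
field_data = [(x, x >= 60) for x in range(100)]

validation_accuracy = accuracy(toy_model, validation)  # scored on held-out data
field_accuracy = accuracy(toy_model, field_data)       # scored on different data
```

The validation score looks perfect because the held-out samples come from the same source as the training samples. The score on `field_data` is lower because that data follows a slightly different rule, which is the gap to plan for.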
Sometimes the model may think something is a face when it’s not (false positive), or miss a face completely (false negative). There are different models out there with different levels of accuracy, and the more accurate ones generally run slower. You have to ask yourself what trade-off between speed and accuracy is acceptable. If you’re detecting thousands of faces, getting 10 to 50 false negatives or positives may be preferable to a more accurate algorithm that takes 10x as long to process, and therefore costs more to run, but only cuts the false positives and negatives in half. A human can quickly correct a small subset of false negatives or positives.
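A rough way to reason about that trade-off is to count the errors and put a price on both compute and human correction. The sketch below does that; every number in it is hypothetical and only mirrors the "10x as long, half the errors" scenario above.

```python
def count_errors(predicted, actual):
    """Count false positives and false negatives for face / no-face labels."""
    false_positives = sum(1 for p, a in zip(predicted, actual) if p and not a)
    false_negatives = sum(1 for p, a in zip(predicted, actual) if a and not p)
    return false_positives, false_negatives


def total_cost(num_images, compute_cost_per_image, num_errors, review_cost_per_error):
    """Compute cost plus the cost of a human correcting each error."""
    return num_images * compute_cost_per_image + num_errors * review_cost_per_error


# Hypothetical numbers, in cents: the slower model costs 10x as much to run
# per image and halves the number of errors a human has to correct.
fast_model = total_cost(10_000, 1, num_errors=50, review_cost_per_error=20)
slow_model = total_cost(10_000, 10, num_errors=25, review_cost_per_error=20)
```

With these made-up prices the faster model is far cheaper overall, because correcting a few dozen errors by hand costs much less than the extra compute. Your own prices may tip the balance the other way, so plug in real figures before deciding.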
So as a business, when you approach this problem, you need clear expectations about the number of false positives and negatives you may get on your data, and how you’re going to handle them. Because you will get them. The truth is, sometimes it just doesn’t detect a face and we don’t really know why. We’re so used to how our human brains recognize patterns that we forget computers do it in a completely different way (and not anywhere near as well).
There are best practices for cleaning up data sets that you can apply before running a model on all of your faces. These usually involve making sure every face is easy to see (not at an angle), not obscured by anything, and not presented in a strange aspect ratio. Also, images with too high a resolution can introduce more noise, which might throw a face detection algorithm off.
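Some of these checks can be automated from image dimensions alone. The following is a minimal sketch, and the thresholds (`max_side`, `max_aspect_ratio`) are assumptions chosen for illustration rather than recommended values. It does not attempt to judge face angle or occlusion, which need a model or a human.

```python
def preflight_issues(width, height, max_side=1600, max_aspect_ratio=2.5):
    """Return a list of reasons an image may need attention before detection."""
    issues = []
    if max(width, height) > max_side:
        issues.append("resolution too high, consider downscaling")
    if max(width, height) / min(width, height) > max_aspect_ratio:
        issues.append("unusual aspect ratio")
    return issues


def downscaled_size(width, height, max_side=1600):
    """Scale dimensions so the longest side fits max_side, keeping the aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, `preflight_issues(4000, 3000)` flags the resolution, and `downscaled_size(4000, 3000)` gives the dimensions to resize to before running detection.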
Optical Character Recognition (OCR)
The current state of the art of OCR on documents is very good. Things like edge detection and computer vision have come a long way, and reading letters on a scanned document (for example) fits really well with those technologies.
Where OCR still struggles a bit is with images that are not scanned documents. This is partly why we have reCAPTCHA as a method for human detection. Humans are amazing at recognizing letters wherever they appear, no matter how obscured, distorted, or colorized. Computers are still catching up.
That isn’t to say that OCR is impossible. In fact, there are lots of good use cases for it, but it is important to set the right expectations. Let’s consider this frame from a football game.
If I use OCR to scrape this freeze frame, I might get the following data:
 PREMIER LEAGUE  TOT 2  M  U  0  36:2  4  SPORTS  NEW 0 BUR 0  HALFTIME  LIVE  NBCSN
Without context, you will not know what is a score, a jersey number, a logo, a time or a sports team’s name. That is, of course, if it catches all of that information at all. Over the course of a video, this will be increasingly muddled by lots of false positives and negatives.
Instead of applying generic OCR to a wide range of problems, think about what the use case is and focus on that. If it is tracking players, then an object tracking system may be better suited. If it’s keeping track of the time or the score, have the model fixed on the single area of the frame where that information appears.
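Here is a minimal sketch of fixing on a single area. It assumes the OCR step returns each piece of text with a pixel position, which many OCR tools do in some form; the `Token` type, the coordinates, and the region are all hypothetical.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    text: str
    x: int  # left edge of the token, in pixels
    y: int  # top edge of the token, in pixels


def tokens_in_region(tokens, left, top, right, bottom):
    """Keep tokens whose top-left corner is inside the region, in reading order."""
    inside = [t for t in tokens if left <= t.x < right and top <= t.y < bottom]
    return sorted(inside, key=lambda t: (t.y, t.x))


SCORE_LINE = re.compile(r"^([A-Z]{3}) (\d+) ([A-Z]{3}) (\d+)$")


def parse_score(tokens):
    """Parse 'AAA n BBB m' into {team: goals}, or return None if it does not match."""
    match = SCORE_LINE.match(" ".join(t.text for t in tokens))
    if match is None:
        return None
    home, home_goals, away, away_goals = match.groups()
    return {home: int(home_goals), away: int(away_goals)}


# Hypothetical token positions for part of the OCR output above.
frame_tokens = [
    Token("PREMIER", 40, 20), Token("LEAGUE", 140, 20),
    Token("36:2", 900, 40), Token("4", 520, 300),
    Token("NEW", 100, 650), Token("0", 150, 650),
    Token("BUR", 200, 650), Token("0", 250, 650),
]
score_region = dict(left=80, top=630, right=300, bottom=680)
score = parse_score(tokens_in_region(frame_tokens, **score_region))
```

Restricting the input to one region means the pattern only has to handle one kind of text. Running `parse_score` on every token in the frame returns `None`, because logos, clocks, and jersey numbers are mixed in with the score.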
Video is tricky. It is really easy to expect all visual machine learning models (face, image, object, OCR, etc.) to perform the same on video as they do on photographs and still images. But that isn’t always the case. Digital video is a world of confusing encoders and wrappers coupled with compression rates and aspect ratios.
When it comes to processing videos, you usually have to use a lower resolution, otherwise you’ll need to spin up a ridiculous amount of resources and wait a very long time for processing to complete. But when you have lower resolution video, or video that is more compressed, you often end up with partial frames. You can read more about GOP structures and how this works here, but the main point is that a video file isn’t necessarily a continuous string of still images. So pulling out frames to run face recognition and image recognition on will give different results than running the same models on a series of still photographs.
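In common codecs, a group of pictures (GOP) starts with an I-frame that is stored as a complete picture, while the P- and B-frames that follow store differences relative to other frames. One option, sketched below, is to sample only the I-frames. The frame-type sequence here is a hypothetical example; in practice you would get it from whatever tool you use to inspect the video file.

```python
def keyframe_indices(frame_types):
    """Indices of I-frames, the frames that are stored as complete pictures."""
    return [i for i, frame_type in enumerate(frame_types) if frame_type == "I"]


def keyframe_timestamps(frame_types, fps):
    """Timestamps in seconds of the I-frames, assuming a constant frame rate."""
    return [i / fps for i in keyframe_indices(frame_types)]


# Hypothetical stream: two 12-frame GOPs, each starting with an I-frame.
stream = list("IBBPBBPBBPBB" * 2)
```

Calling `keyframe_timestamps(stream, fps=24)` gives the points in the video where a complete picture is stored. Sampling only there means far fewer frames to process, at the cost of missing anything that happens between keyframes.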
You can test this yourself by pausing any YouTube video randomly. Notice how the faces people make are sometimes very strange, and can often be blurred or unrepresentative of what they actually look like. Here’s an example of a freeze frame from some footage of Rick Santorum.
Look at some of these faces from a face recognition model’s perspective:
Can you even tell that person on the left is Rick Santorum? If you can’t tell, a computer might not be able to either.
As a human, it is easy to watch the video and identify people, but if you were to pull out each individual frame, you’d start to notice that the quality of the data isn’t always as good as you think.
Does that mean you shouldn’t run machine learning on video? Of course not. Machine learning is a great way to automatically categorize and tag assets. The trick is to know what to expect in terms of results. Passing this video through a model that has been trained on every celebrity on the planet might yield a lot of false positives (incorrectly recognizing somebody), but running it against a model that is trained only on US politicians will yield better results. But even if you’re unable to orchestrate your trained models in that manner, you can still derive value by looking at the data as a whole. If Rick Santorum is the individual in the video, then the chances are the model will correctly identify him most of the time. You can see what percentage of the time he appears, and make some assumptions about the likelihood he is in fact featured in the video and where he might show up. This won’t work for every use case, but if you come prepared to experiment a bit and try different workflows and models (and think about the problem you are solving) then you’ll succeed.
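Looking at the data as a whole can be as simple as counting how often each name comes back across the sampled frames. The sketch below assumes one prediction (a name or `None`) per frame; the sample predictions and the 30% threshold are made up for illustration.

```python
from collections import Counter


def appearance_share(frame_predictions):
    """Share of frames in which each name was recognized (None means no match)."""
    total = len(frame_predictions)
    if total == 0:
        return {}
    counts = Counter(name for name in frame_predictions if name is not None)
    return {name: count / total for name, count in counts.items()}


def likely_featured(frame_predictions, min_share=0.3):
    """Names recognized in at least min_share of the frames."""
    shares = appearance_share(frame_predictions)
    return sorted(name for name, share in shares.items() if share >= min_share)


# Hypothetical per-frame results: mostly correct, one false positive, some misses.
predictions = ["Rick Santorum"] * 6 + ["Person B"] + [None] * 3
featured = likely_featured(predictions)
```

A single stray match like "Person B" falls below the threshold and drops out, while the person who shows up in most frames remains. The right threshold depends on your footage, so treat it as something to tune by experiment.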
How to succeed with machine learning
As I said earlier, machine learning is amazing. It can accomplish extraordinary things in a very short amount of time. One customer of my company was able to accurately identify fake research articles among millions of articles published in journals. Another uses face recognition to authenticate people taking exams, and another uses our nudity detection to flag inappropriate user-submitted content for human review.
We have a customer who uses our content recommendation engine to increase revenue on their e-commerce site by showing customers things they are more likely to buy. This is a great use case because false positives and negatives don’t undo the increase in revenue. In fact, in some cases they help by surfacing new things that customers wouldn’t normally have come across.
Customers are also using machine learning to improve search by processing text with Natural Language Processing, pick the photos of you that you’re more likely to buy, detect diseases in flora and fauna on farms, and more.
Machine learning can bring tremendous value to your business if you know what to expect. And if you aren’t sure, shoot me a note. I’d be happy to help evaluate your use cases.