One-Shot Learning Character Recognition Explained

Eugene Tang
Published in Analytics Vidhya · 5 min read · Aug 16, 2020

This article was originally written February 28, 2017.

Let’s say you want to teach a computer to read handwritten digits. You might give it a bunch of rules to tell it what to do. For example, an oval is most likely a 0. Another approach you might try is “machine learning”: give the computer a bunch of examples of each digit to study so that it can learn its own rules. This latter method has worked surprisingly well. In fact, most banks use this technology to allow ATMs or mobile phones to read the amount on a check without the need for human interaction.

Sample handwritten digits from the MNIST dataset

One limitation of current machine learning techniques, however, is that they require a lot of examples. For example, if you want to teach a computer to recognize cats, you first need to give the computer many pictures of cats so it can learn what a cat looks like. These examples might not always exist or might be very expensive to obtain. What if you could teach a computer to learn a new concept, such as a “cat,” from just one or two examples? This is exactly what three researchers at MIT, Lake, Salakhutdinov, and Tenenbaum, did.

The researchers focused specifically on character recognition. They asked: can we teach a computer to recognize new characters after seeing just one example? The end result was an algorithm that learned what new characters look like just as well as humans do, performing on par with people on both character recognition and character generation tasks.

Character Recognition

How well can you recognize a character that you’ve never seen before? To evaluate their algorithm on this task, the researchers compared its performance to that of humans. They first gathered a set of handwritten characters from various alphabets. They then gave each participant an example of a character the participant had never seen before and asked them to find that character among 20 new characters from the same alphabet. They asked their algorithm to do the same. Surprisingly, the algorithm (3.3% error rate) performed just as well as the people (4.5% average error rate)!

Can you find the character that matches the one in the red box?
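
To make the setup concrete, here is a minimal sketch of how a single 20-way, one-shot classification trial could be scored. The similarity function is a hypothetical stand-in for whatever score a model assigns to a pair of images; it is not the paper’s actual model.

```python
from typing import Callable, Sequence

def one_shot_trial(query, candidates: Sequence, true_index: int,
                   similarity: Callable) -> bool:
    """Pick the candidate most similar to the query image; True if it is the right one."""
    scores = [similarity(query, c) for c in candidates]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return best == true_index

def error_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of trials answered incorrectly, e.g. 0.033 for a 3.3% error rate."""
    return 1.0 - sum(outcomes) / len(outcomes)
```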

Character Generation

Can you generate examples of how other people would write a character? Using the same set of handwritten characters, the researchers gave each participant an example of a character they had never seen before and asked the participant to create a new example of that character. They asked their algorithm to do the same thing. To test how well the algorithm did, they showed a group of computer-generated characters and a group of human-written characters to judges to see if the judges could tell the two apart. The judges could only identify the computer-generated characters 52% of the time, barely better than random chance (50%).

It can be hard to find which examples were written by a machine. In this example, grid 1 on the left and grid 2 on the right were generated by machines.
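
Scoring this “visual Turing test” is simple arithmetic: each judgment either correctly identifies the machine-drawn grid or not, and the overall identification rate is compared against the 50% expected from guessing. A toy sketch with hypothetical judge data (not the paper’s):

```python
from typing import Sequence

def identification_rate(judgments: Sequence[bool]) -> float:
    """Fraction of trials where a judge correctly spotted the machine-drawn grid."""
    return sum(judgments) / len(judgments)

# Hypothetical judge responses: True = judge correctly picked the machine grid.
example = [True, False, True, False, True, False, False, True, True, False]
print(f"Identification rate: {identification_rate(example):.0%} (chance = 50%)")
```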

So how did they do it?

Given how well the algorithm does with just one example, the natural question that arises is, how did they do it?

The core intuition behind the algorithm is that a character can be seen as a series of strokes put together. The researchers taught the algorithm how to decompose an image of a character into a sequence of strokes that may have been used to write it. The algorithm could then use this stroke-based representation as a base from which to generate new examples (e.g., taking into account other ways a stroke might be written) or to check which characters map to the same stroke pattern.
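
As a rough illustration of what a stroke-based representation might look like (the class names and fields below are illustrative guesses, not the paper’s actual data structures):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Stroke:
    """One pen stroke, stored as an ordered list of (x, y) pen positions."""
    points: List[Tuple[float, float]]

@dataclass
class CharacterProgram:
    """A character as a sequence of strokes plus how each stroke relates
    to the ones drawn before it (e.g. starting at the end of the previous stroke)."""
    strokes: List[Stroke]
    relations: List[str]  # e.g. "independent", "attached-to-previous-end"
```

Because the same stroke “program” can be re-rendered with small variations in the points or the stroke order, a single parsed example is enough to generate plausible new copies of the character.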

To teach the computer how to map from characters to strokes, the researchers used a method called Bayesian program learning. They broke the task of going from character to strokes into parts and modeled each part as a probability distribution (how likely is it that there are three strokes given that the character looks like this, etc.). Before running the algorithm, they gave the computer characters from 30 alphabets to teach it what the probability distributions should look like. While the algorithm still needed some data to learn these initial probabilities, instead of needing a thousand examples of a new character, it now needs only one!
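
A toy sketch of this generative intuition: first sample how many strokes a character has, then sample each stroke. The numbers and stroke “primitives” below are invented for illustration; in the actual approach, these distributions are learned from the 30 background alphabets.

```python
import random

# Invented prior over the number of strokes (in the real approach, learned
# from characters drawn from 30 background alphabets).
NUM_STROKE_PRIOR = {1: 0.45, 2: 0.35, 3: 0.15, 4: 0.05}

# A tiny made-up library of stroke primitives standing in for the learned ones.
STROKE_PRIMITIVES = ["vertical line", "horizontal line", "arc", "loop", "hook"]

def sample_character() -> list:
    """Sample a character 'program' as a list of stroke primitives."""
    counts = list(NUM_STROKE_PRIOR.keys())
    weights = list(NUM_STROKE_PRIOR.values())
    n = random.choices(counts, weights=weights, k=1)[0]
    return [random.choice(STROKE_PRIMITIVES) for _ in range(n)]

print(sample_character())  # e.g. ['arc', 'vertical line']
```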

Future steps

Despite the impressive advances, there is still much work to be done. People see more than just strokes when they look at a character; they may also notice features such as parallel lines or symmetry. Furthermore, optional features can cause a lot of difficulty. Consider the character “7”. An algorithm might model it as a one-stroke character the first time it sees it. However, when it later sees a “7” with a dash through it, it may treat that as a different character, since the dash requires a second stroke and it has never seen a “7” written that way. A human, however, might infer that a “7” with a dash is the same as a “7” without one, whether through context or other cues.

Is it a 7?

This algorithm is also very specific toward recognizing characters. It would be interesting to see if we could develop similar “one-shot learning” algorithms in other areas. For example, what if a self-driving car could learn to recognize and obey a new sign after watching another car react to it once?

One key insight from this paper makes me think that this may indeed be possible. The researchers intentionally told the algorithm to think of characters as a series of strokes put together rather than as a grid of 0s and 1s. This representation is closer to how humans think about characters, and using this human-like representation greatly increased how quickly the computer learned. A lot of artificial intelligence techniques have been based on how humans make decisions, but it may prove useful to also study how humans learn and represent information.

While there is still a lot of work to be done, this paper represents a significant step forward for the machine learning world.
