Shape Context descriptor and fast character recognition

Andrey Nikishaev
Machine Learning World
6 min read · Mar 13, 2018

Want to become an expert in Computer Vision & Object Detection?

Subscribe to the New Practical Course

Shape Context is a scale- and rotation-invariant descriptor of a shape (not of an image).

Today I will try to give you an idea of how it works and why it is so cool :)

As always, you can get the full code in my repo: https://github.com/creotiv/computer_vision/tree/master/shape_context

And runnable code in Google Colab:
https://drive.google.com/file/d/16OtL71G0CNk0dSIJU2R4HrLnmj53Xfyi/view?usp=sharing

Problem

Matching shapes can be a much harder task than matching images, for example recognizing handwritten text or fingerprints, because most of the shapes we try to match are heavily distorted. I can bet you will never write two identical letters in your whole life. And looking at this from the point of view of a person-identification algorithm based on handwriting matching, it would be just hell.

Of course, in the age of neural networks and RNNs this can also be solved in ways other than straight mathematics, but you cannot always use heavy, memory-hungry things like NNs. Even today, most hardware like phones, cameras, and audio processors uses good old mathematical algorithms to keep things running. And it is always cool to do some mathematical magic :)

The idea behind Shape Context

The main idea is quite simple: build a histogram of the other points as seen from each point, in a polar coordinate system. So for each point you take the Euclidean distances and angles to all other points, normalize them, and, based on a logspace map, count the number of points that fall into each map region. And that's all.
To make things rotation invariant, I prefer to simply rotate this map by minus the angle between the two most distant points, but it can be done in other ways too.
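As a toy illustration (the two points here are made up), a single pair of points contributes one distance and one angle before binning:

import numpy as np

p, q = np.array([0.0, 0.0]), np.array([1.0, 1.0])
r = np.linalg.norm(q - p)                                   # Euclidean distance
theta = np.arctan2(q[1] - p[1], q[0] - p[0]) % (2 * np.pi)  # angle in [0, 2*pi)
# after normalizing r by the mean pairwise distance, the pair (r, theta)
# falls into one of nbins_r log-spaced radius bins and one of nbins_theta angle bins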

Shape Context computing

So now let's try to understand how to compute it correctly (because debugging math algorithms is not much fun).

1. Set the number of radius bins and angle bins (these are the standard settings):

nbins_r = 5
nbins_theta = 12
r_inner = 0.1250
r_outer = 2.0

2. Calculate the pairwise distances between points and normalize them by their mean. Also find the two points with maximum distance:

import math
import numpy as np
from scipy.spatial.distance import cdist

# points: (N, 2) array of key points sampled from the shape
t_points = len(points)
# pairwise Euclidean distances
r_array = cdist(points, points)
# the two points with maximum distance; the angle between them
# is needed later to make the descriptor rotation invariant
am = r_array.argmax()
max_points = [am // t_points, am % t_points]
# normalize by the mean distance for scale invariance
r_array_n = r_array / r_array.mean()

3. Create a logspace and count the number of occurrences in each logspace interval (the comments in the code show an example of counting bins):
r_bin_edges = np.logspace(np.log10(r_inner), np.log10(r_outer), nbins_r)
r_array_q = np.zeros((t_points, t_points), dtype=int)
# for each pair, count how many bin edges the normalized distance is below
# bin edges = [0.1250, 0.2500, 0.5000, 1.0000, 2.0000]
#   value 1.30 -> counts 0, 0, 0, 0, 1 over the loop -> quantized value 1
#   value 0.43 -> counts 0, 0, 1, 2, 3 over the loop -> quantized value 3
for m in range(nbins_r):
    r_array_q += (r_array_n < r_bin_edges[m])

4. Compute the angles between points, normalize them with our norm angle, and shift negative values by 2*Pi so all angles fall into the interval [0, 2*Pi):

theta_array = cdist(points, points, lambda u, v: math.atan2((v[1] - u[1]), (v[0] - u[0])))
norm_angle = theta_array[max_points[0], max_points[1]]
# subtract the norm angle from every off-diagonal entry
# to make the angle matrix rotation invariant
theta_array = (theta_array - norm_angle * (np.ones((t_points, t_points)) - np.identity(t_points)))
# remove very small values caused by floating-point error
theta_array[np.abs(theta_array) < 1e-7] = 0
# shift negative angles by 2*Pi so all angles lie in [0, 2*Pi)
theta_array_2 = theta_array + 2 * math.pi * (theta_array < 0)

5. Quantize the angles into a histogram, the same as for the distances:

theta_array_q = (1 + np.floor(theta_array_2 / (2 * math.pi / nbins_theta))).astype(int)

6. Put it all together and build the shape context descriptor by counting the number of points in each radius and angle region:

fz = r_array_q > 0  # ignore pairs that fall outside the outer radius
nbins = nbins_theta * nbins_r
descriptor = np.zeros((t_points, nbins))
for i in range(t_points):
    sn = np.zeros((nbins_r, nbins_theta))
    for j in range(t_points):
        if fz[i, j]:
            sn[r_array_q[i, j] - 1, theta_array_q[i, j] - 1] += 1
    descriptor[i] = sn.reshape(nbins)

Computing shape difference and cost matrix

By the paper, the cost matrix should be computed with the Pearson chi-squared test statistic. For a point p_i on the first shape with histogram g_i and a point q_j on the second shape with histogram h_j:

C(i, j) = 1/2 * sum_k ( (g_i(k) - h_j(k))^2 / (g_i(k) + h_j(k)) )

But we can also use the cosine distance, which is much faster.
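Here is a minimal sketch of both cost functions, assuming P and Q are the (n_points x nbins) descriptor matrices of two shapes (the function names are illustrative, not from the repo):

from scipy.spatial.distance import cdist
import numpy as np

def chi2_cost(P, Q, eps=1e-12):
    # C[i, j] = 0.5 * sum_k (P[i, k] - Q[j, k])^2 / (P[i, k] + Q[j, k])
    P = P[:, None, :]  # shape (n, 1, bins)
    Q = Q[None, :, :]  # shape (1, m, bins)
    return 0.5 * ((P - Q) ** 2 / (P + Q + eps)).sum(axis=-1)

def cosine_cost(P, Q):
    # cosine distance between every pair of histogram rows
    return cdist(P, Q, 'cosine')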

To get the point-to-point matching we have to solve the linear sum assignment problem, which is not a fast thing (~O(n³)). For this we will use scipy.

[Image: example of point-to-point matching between two shapes]
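A minimal sketch of the matching step, assuming C is a cost matrix produced by one of the functions above (scipy.optimize.linear_sum_assignment implements the O(n³) solver):

from scipy.optimize import linear_sum_assignment

# row_ind[k] on the first shape is matched to col_ind[k] on the second
row_ind, col_ind = linear_sum_assignment(C)
total_cost = C[row_ind, col_ind].sum()  # overall shape difference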

Character recognition

Now we are going to use what we have learned for a simple task: character recognition based on a vocabulary search.

[Image: base characters and the test image we will try to recognize]
[Image: recognition result]

So as you see, all parts of this task are pretty simple. First we take the image and break it into individual digits with a contour detection algorithm and some basic filtering. Then for each digit we build a 1D shape context descriptor based on 15 points (its size will be 15*60 = 900), and use the cosine distance to find the most similar digit among the vocabulary digits.
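A minimal sketch of the vocabulary lookup, assuming compute_descriptor wraps steps 1-6 above and vocabulary maps each character to its flattened 900-element base descriptor (both names are illustrative, not from the repo):

from scipy.spatial.distance import cosine

def recognize(points, vocabulary):
    # flatten the (15, 60) descriptor into a 900-element vector
    desc = compute_descriptor(points).flatten()  # assumed wrapper around steps 1-6
    # pick the vocabulary character with the smallest cosine distance
    return min(vocabulary, key=lambda ch: cosine(desc, vocabulary[ch]))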

What could go wrong?

  1. One of the biggest problems for all descriptors is choosing the right key points from the image. Here we used a far-from-optimal algorithm for choosing them, and on a more complex example it would fail; a simple contour-based sampler is sketched after this list.
    Updated: added a faster and better algorithm for this task, but it can still be improved. https://github.com/creotiv/computer_vision/commit/e99dfab35be84cd99932d2d3c6d090dc0e401245
  2. Another thing to consider is the character vocabulary. Here, for font characters, we don't need one, but if you want to do handwritten symbol recognition you will, because the same symbol can be written in very different ways, and a single base example will make your recognition fail.
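For reference, here is a minimal point-sampling sketch, assuming OpenCV 4.x and a binary input image; this is only an illustration, not the improved sampler from the commit above:

import cv2
import numpy as np

def sample_points(binary_img, n_points=15):
    # take the largest external contour and sample n_points uniformly along it
    contours, _ = cv2.findContours(binary_img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    cnt = max(contours, key=cv2.contourArea).reshape(-1, 2)
    idx = np.linspace(0, len(cnt) - 1, n_points).astype(int)
    return cnt[idx]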

Papers to read

  1. The main paper about Shape Context: S. Belongie, J. Malik, J. Puzicha, "Shape Matching and Object Recognition Using Shape Contexts", IEEE TPAMI, 2002
  2. Good slides for understanding the mechanics of the algorithm
  3. Don't forget to search arXiv for some fresh stuff :)

Few words…

Today we lost one of the greatest people of mankind, Stephen Hawking. For me it is a big loss; he was one of the few people who pushed me into science and was my guiding star in this area.
And I just want to say: thank you, Stephen, for all that you did.

Support

Become a Patron and support our community to make more interesting articles & tutorials.

Get interesting articles every day: subscribe to the Telegram Channel.

Read my other fresh articles about Computer Vision
