Person Identification

As my first job at Exaintelligence, I was assigned to implement a deep convolutional neural network that identifies each person in a basketball game. This is a step toward robotic coaching that could rate players and assess their gameplay to better tailor lessons.

Dataset

*game folder

Each *game folder contains the dataset for one basketball game. We prepared datasets for 10 games.

LEFT_COURT, MIDDLE, RIGHT_COURT folder

Each basketball court is divided into three parts: LEFT_COURT, MIDDLE, and RIGHT_COURT.

*image folder

Each part of the court has at most 10 images of the game.

00–12 folder

Each folder contains the pictures of one person at the game. 00–09 correspond to players and 10–12 to referees. The numbering is consistent throughout the game.

(ImageNum)_00.png — (ImageNum)_k.png

The picture of each person at the game.
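The layout above can be read with a short script. This is a minimal sketch, not the code from the repo: it assumes the folders nest in the order listed (game folder, court part, image folder, person ID, crops), and the names crops_by_person and role are hypothetical helpers introduced for illustration.

```python
from collections import defaultdict
from pathlib import Path

COURT_PARTS = ("LEFT_COURT", "MIDDLE", "RIGHT_COURT")


def crops_by_person(game_dir):
    """Map each person ID ("00" to "12") to the sorted crop paths of one game."""
    crops = defaultdict(list)
    for part in COURT_PARTS:
        part_dir = Path(game_dir) / part
        if not part_dir.is_dir():
            continue  # this court part was not annotated for the game
        # Assumed nesting: <part>/<image folder>/<person ID>/<crop>.png
        for path in sorted(part_dir.glob("*/*/*.png")):
            crops[path.parent.name].append(path)
    return dict(crops)


def role(person_id):
    """IDs 00-09 are players; IDs 10-12 are the remaining people on court."""
    return "player" if int(person_id) <= 9 else "referee"
```

Because the IDs are consistent within a game, any two crops under the same ID form a "same person" pair, and crops under different IDs form a "different person" pair.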

Right: a sample image. Left: the annotated data (LEFT_COURT).

Because it was hard to collect enough pictures of basketball games, I decided to train the neural network on the CUHK03 dataset.

CUHK03

A person re-identification dataset with 13,164 images of 1,360 pedestrians. It provides both manually cropped pedestrian images and automatically detected ones (produced by a widely used pedestrian detector).
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Ahmed_An_Improved_Deep_2015_CVPR_paper.pdf

Deep Convolutional Neural Network

I selected the deep convolutional neural network model by Ejaz Ahmed at the University of Maryland.

Architecture of the neural network

Input

The neural network takes two images as inputs.

Tied convolution

The first two layers are tied convolution layers. They share the same weights across the two input images so that the resulting features can be compared afterward. They yield 25 feature maps per image.
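The feature-map size can be checked with a little arithmetic. The sketch below assumes the configuration described in the referenced paper, which this post does not restate: 60×160 RGB inputs, two 5×5 convolutions without padding (20 and then 25 filters), each followed by 2×2 max pooling. The function names are hypothetical.

```python
def conv_out(size, kernel=5):
    """Output length of an unpadded convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, window=2):
    """Output length of non-overlapping max pooling."""
    return size // window


def tied_feature_shape(height=160, width=60):
    """Shape (height, width, channels) after the two tied convolution layers."""
    shape = (height, width, 3)
    for filters in (20, 25):  # assumed filter counts of layer 1 and layer 2
        h, w, _ = shape
        shape = (pool_out(conv_out(h)), pool_out(conv_out(w)), filters)
    return shape


print(tied_feature_shape())  # (37, 12, 25)
```

Both input images go through the same layers with the same weights, so each one ends up as 25 feature maps of size 37×12.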

Cross input neighborhood differences

The next layer computes cross-input neighborhood differences. From the 25 feature maps of each image, it learns the relationship between the two images. As the name “neighborhood” suggests, each location in a feature map (37×12) is compared with the 25 (= 5×5) locations around the same position in the corresponding feature map of the other image. Computing the difference against neighboring locations adds robustness to positional differences between corresponding features of the two images. Again, 25 difference maps are generated per image, so 50 neighborhood difference maps are created in total.
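A minimal pure-Python sketch of this operation for a single pair of feature maps follows. It assumes zero padding at the borders, and the function name is hypothetical; a real implementation would run on tensors over all 25 channels at once.

```python
def neighborhood_difference(f, g, size=5):
    """For every location (y, x), return a size x size block holding
    f[y][x] minus each value of g in the neighborhood centred at (y, x)."""
    height, width = len(f), len(f[0])
    r = size // 2

    def g_at(y, x):
        # Assumption: locations outside the map count as zero.
        if 0 <= y < height and 0 <= x < width:
            return g[y][x]
        return 0.0

    return [
        [
            [
                [f[y][x] - g_at(y + dy, x + dx) for dx in range(-r, r + 1)]
                for dy in range(-r, r + 1)
            ]
            for x in range(width)
        ]
        for y in range(height)
    ]
```

The operation is asymmetric, so it is applied twice per channel: neighborhood_difference(f, g) and neighborhood_difference(g, f). With 25 channels this gives the 50 difference maps mentioned above.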

Patch summary features

This layer summarizes the neighborhood difference maps by producing a holistic representation of each 5×5 block.
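As a rough illustration, the summary of one difference map can be written as a weighted sum over each 5×5 block followed by a ReLU. This is a single-channel simplification: in the referenced paper the layer is a convolution with stride 5 whose filters span all 25 channels. The kernel here stands in for learned weights, and the function name is hypothetical.

```python
def patch_summary(diff_map, kernel):
    """Collapse every block of a difference map to one non-negative value.

    diff_map[y][x] is a block (list of rows); kernel has the same block shape.
    """
    rows, cols = len(kernel), len(kernel[0])
    return [
        [
            max(0.0, sum(block[i][j] * kernel[i][j]
                         for i in range(rows) for j in range(cols)))
            for block in row
        ]
        for row in diff_map
    ]
```

A 37×12 map of 5×5 blocks therefore shrinks back to a plain 37×12 map, one value per location.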

To implement this neural network, I referred to:

The weights of the network were trained on the CUHK03 dataset. With the trained neural network, I then tried to determine whether two pictures, A and B, show the same person in a basketball game.
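The final same-or-different decision and its evaluation can be sketched as below. This assumes the network ends in two output scores ordered as (different, same) that are passed through a softmax; the 0.5 threshold and the function names are illustrative choices, not taken from the repo.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a short list of scores."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_same_person(logits, threshold=0.5):
    """logits = (score_different, score_same); the ordering is an assumption."""
    return softmax(logits)[1] >= threshold


def accuracy(predictions, labels):
    """Fraction of image pairs whose prediction matches the ground truth."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)
```

For example, accuracy([True, False, True, True], [True, False, False, True]) gives 0.75, since three of the four pairs are classified correctly.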

Result

I got an accuracy of over 70% on the test set. You can see all the code in my GitHub repo.