Top Marks for Student Kaggler in Bengali.AI | A Winner’s Interview with Linsho Kaku

Kaggle Team
Apr 21, 2020

Kaggler deoxy takes first place and sets the stage for his next competition.

Please join us in congratulating Linsho Kaku (aka deoxy) on his solo first-place win in our Bengali.AI Handwritten Grapheme Classification challenge! Read the winning solution here: 1st Place Solution with Code

Random Ink by Sankarshan Mukhopadhyay @Flickr

Let’s meet Linsho!

Linsho: I am a student in the Rio Yokota Laboratory at the Tokyo Institute of Technology. The main theme of the lab is high-performance computing on advanced architectures, including GPUs. We also deal with deep learning as one of its applications. I’m also an intern at Future Inc., working on an OCR task.

The experience of working on OCR tasks as an intern was a big advantage for me. The ease with which I was able to pre-process data and create models was thanks to that intern experience. I’ve never specialized in Few-Shot Learning, which was a major factor in the scores of the top teams this time around. However, I think my knowledge of a paper that was shared in the lab made a big difference.

Let’s get a bit more technical

In thinking about the approach, I went through the Kaggle discussions that were presented here and here.

In addition, I often consulted ScienceDirect and other resources to learn how to achieve Few-Shot Learning.

No pre-processing, such as cropping or noise reduction, was applied to the images. These steps did not improve recognition accuracy; rather, they tended to remove information the model needed. The disadvantages of cropping to a smaller area than the character actually requires, or of erasing marks that turn out to matter, far outweigh the advantages of giving the model clean input.

The most essential task was not the classification of the three types of components per se, but the creation of a model that can recognize classes it was never given. The classification into three types of components was only a hint for solving this essential task, which is not to say that the manually determined division into components is the appropriate one. Abstracting the structures that can appear in a character is more likely to improve the accuracy of classification on unknown classes.

This time, I used a method that generates font-style character images from handwritten characters. This generative model is based on CycleGAN, a style-transfer model. The series of models from this generative model through the font-image classification model connected to it can be considered a single handwritten-character classification model. Viewed this way, the font image produced by the generative model can be considered a feature of the middle layer of this combined handwriting classification model.

It’s highly likely that each pixel of the font image, which is an intermediate feature, is generated by observing a relatively narrow portion of the handwriting. The font image can therefore be thought of as a more abstract feature of the character’s structure. I think being able to build this system was the biggest factor in my approach.
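To make the pipeline concrete, here is a minimal PyTorch sketch of the idea, not the author’s actual code: a CycleGAN-style generator maps a handwritten grapheme image to a clean font-style image, and a separate classifier operates on that font image; composed together they act as one handwriting classifier whose intermediate feature is the generated font image. All layer sizes, class counts, and names below are illustrative assumptions.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # One residual block, as used inside a CycleGAN-style generator.
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
        )
    def forward(self, x):
        return x + self.block(x)

class HandwritingToFontGenerator(nn.Module):
    # CycleGAN-style generator: handwritten grayscale image -> font-style image.
    def __init__(self, channels=64, n_blocks=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, channels, 7, padding=3),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            *[ResidualBlock(channels) for _ in range(n_blocks)],
            nn.Conv2d(channels, 1, 7, padding=3),
            nn.Tanh(),  # font image with pixel values in [-1, 1]
        )
    def forward(self, x):
        return self.net(x)

class FontImageClassifier(nn.Module):
    # Classifies the generated font image (e.g. into grapheme-root classes).
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, n_classes)
    def forward(self, x):
        return self.head(self.features(x).flatten(1))

class HandwritingClassifier(nn.Module):
    # Generator + font classifier chained together: the generated font image
    # plays the role of the intermediate feature described above.
    def __init__(self, generator, classifier):
        super().__init__()
        self.generator = generator
        self.classifier = classifier
    def forward(self, handwritten):
        font_image = self.generator(handwritten)  # intermediate feature
        logits = self.classifier(font_image)
        return logits, font_image

# Illustrative usage on a dummy 1x128x128 grayscale handwritten image.
model = HandwritingClassifier(HandwritingToFontGenerator(),
                              FontImageClassifier(n_classes=168))
logits, font_image = model(torch.randn(1, 1, 128, 128))

In practice, the generator would be trained CycleGAN-style against rendered font images and the classifier on those font images; the sketch only shows how the two pieces compose into one handwriting classifier.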

I used PyTorch as the deep learning framework and Jupyter Notebook as the IDE.

I used several servers, each with 4 Tesla V100 GPUs.

CycleGAN takes the following time:

Training time: 2.5 days on 4 Tesla V100 GPUs

Prediction time: 40 min × 2 (ensemble of 2 models)

Apart from this, it also took some time to work on the usual classification models and so on.

Words of wisdom

I gained new skills that will allow for a consistent approach to future challenges.

A Grandmaster’s common sense (or what seems most obvious) is not always the best way to win.

Just for fun

I’d like to propose a more practical OCR task, that is, one evaluated end-to-end from handwriting detection to recognition (for example, something aimed at transcribing and digitizing handwritten notes). I feel that, although recognition accuracy is already sufficient, a general-purpose detection method has not yet been established. However, in the field of Object Detection there are quite a variety of methods being considered, and I think there’s a lot of hope for active discussion and development.

Linsho Kaku is a Master Student at Tokyo Institute of Technology, supervised by Rio Yokota. His research interests include deep learning, image processing and optical character recognition.
