Please join us in congratulating Linsho Kaku (aka deoxy) on his solo first-place win in our Bengali.AI Handwritten Grapheme Classification challenge! Read the winning solution here: 1st Place Solution with Code
Let’s meet Linsho!
Linsho, what would you like to share about yourself?
Linsho: I am a student in the Rio Yokota Laboratory at the Tokyo Institute of Technology. The main theme of the lab is high performance computing with advanced architectures including GPUs. We also deal with deep learning as one of its applications. I’m also an intern at Future Inc. working on an OCR task.
Did you have any prior experience or domain knowledge that helped you succeed in this competition?
The experience of working on OCR tasks as an intern was a big advantage for me. The ease with which I was able to pre-process data and create models was thanks to my intern experience. I’ve never specialized in Few-Shot Learning, which was a major factor in the scores of the top teams this time around. However, I think my knowledge of the paper, which was shared in the lab, made a big difference.
Let’s get a bit more technical
Did any past research or previous competitions inform your approach?
In addition, I often consulted Science Direct and other resources to work out how to approach Few-Shot Learning.
Did you do any preprocessing?
No pre-processing, such as cropping or noise reduction, was applied to the images. These steps did not improve recognition accuracy; if anything, they tended to discard information the model needed. The disadvantages of cropping an area smaller than the character requires and of erasing extra characters far outweigh the advantages of giving clean input.
What was your most important insight into the data?
The essential task was not the classification of the three types of components per se, but the creation of a model that could recognize classes that were not given in the training data. The classification into three types of components was only a hint for solving this essential task. This is not to say the division into manually determined components is appropriate. Abstracting the structures that can appear in a character is more likely to improve the accuracy of classification of unknown classes.
This time, I used a method that generates font images of characters from handwritten characters. The generative model is based on a style transformation model called CycleGAN. The series of models from this generative model through to the font image classification model connected to it can be regarded as a single handwritten character classification model. Seen this way, the font image produced by the generative model is a middle-layer feature of that handwriting classification model.
It’s highly likely that each pixel of the font image, as an intermediate feature, is generated by observing a relatively narrow portion of the handwriting. This can be thought of as generating features of a more abstract character structure. I think being able to build this system was the biggest factor in my approach.
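The idea of treating the generated font image as a middle-layer feature can be illustrated with a minimal PyTorch sketch. This is not the winning solution's code: TinyGenerator, TinyFontClassifier and HandwritingClassifier are hypothetical stand-ins, the layer sizes and the 10-class output are arbitrary assumptions, and the input is random placeholder data. It only shows how a handwriting-to-font generator and a font image classifier compose into one handwriting classifier.

```python
import torch
import torch.nn as nn


class TinyGenerator(nn.Module):
    """Hypothetical stand-in for a CycleGAN generator (handwriting -> font image)."""

    def __init__(self):
        super().__init__()
        # Two 3x3 convolutions: each output pixel sees only a 5x5 patch
        # of the input, i.e. a narrow portion of the handwriting.
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(8, 1, kernel_size=3, padding=1),
            nn.Tanh(),  # font-style image with values in [-1, 1]
        )

    def forward(self, x):
        return self.net(x)


class TinyFontClassifier(nn.Module):
    """Hypothetical stand-in for a classifier that takes font images."""

    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pooling
        )
        self.head = nn.Linear(16, num_classes)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))


class HandwritingClassifier(nn.Module):
    """Generator followed by font classifier, viewed as one model."""

    def __init__(self, generator, font_classifier):
        super().__init__()
        self.generator = generator
        self.font_classifier = font_classifier

    def forward(self, handwriting):
        font_image = self.generator(handwriting)   # intermediate feature
        logits = self.font_classifier(font_image)  # final class scores
        return logits, font_image


# num_classes=10 is an arbitrary assumption for the sketch.
model = HandwritingClassifier(TinyGenerator(), TinyFontClassifier(num_classes=10))
batch = torch.randn(4, 1, 32, 32)  # random placeholder, not real graphemes
logits, font_image = model(batch)
print(logits.shape, font_image.shape)
```

Returning the font image alongside the logits makes the intermediate feature easy to inspect, which is one practical benefit of an intermediate representation that is itself an image.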
Which tools did you use?
I used PyTorch as a deep learning framework and Jupyter Notebook as an IDE.
What about your hardware setup?
I used several servers, each with 4 Tesla V100 GPUs.
What was the run time of your winning solution?
CycleGAN takes the following time:
Training time: 2.5 days on 4 Tesla V100 GPUs
Prediction time: 40 min x 2 (ensemble of 2 models)
Apart from this, it took some time to work on the usual class classification models and so on.
Words of wisdom
OK, are you walking away with anything new as a result of this competition?
I gained new skills that will allow for a consistent approach to future challenges.
Any advice for those just getting started in data science?
A Grandmaster’s common sense (or what seems most obvious) is not always the best way to win.
Just for fun
If you could run a Kaggle competition, what problem would you want to pose to other Kagglers?
I’d like to propose a more practical OCR, that is, a task that is evaluated end-to-end from handwriting detection to recognition. (For example, something that aims at transcription and digitization of handwritten notes). I feel a general-purpose method of detection has not yet been established in spite of sufficient recognition accuracy. However, in the field of Object Detection, there are quite a variety of methods being considered, and I think there’s a lot of hope for active discussion and development.
Linsho Kaku is a Master’s student at Tokyo Institute of Technology, supervised by Rio Yokota. His research interests include deep learning, image processing and optical character recognition.