CODEX

Science Shorts #2: Bengali Character Recognition, Perceptible Colour Maps & Python Newsletters

Ashraf Miah
Published in CodeX
5 min read · Feb 28, 2021


Kaggle Grandmasters tackle character recognition for the fifth most popular native language, how to choose the right colour scheme for your plot and useful Python newsletters.

Scope

The Nvidia Machine Learning (ML) Grandmasters took part in the Kaggle competition for character recognition of the world’s fifth most popular native language, Bengali. The team addresses some of the “unwritten” rules of the language in tuning their models. Rather than choosing either the default colour scheme from your favourite plotting library or one based on personal preference, a more scientific basis can be found in the Colorcet library. Finally, I cover two Python-based newsletters that I recently subscribed to.

Introduction

The following three articles were randomly selected from my Pocket list, which I’ve curated over the past 5 years in the field of Data Science; the motivations and background are discussed in a previous post: Data Science Shorts: An Introduction to my Pocket List.

Bengali Character Recognition

Nvidia article on Bengali Kaggle Challenge | Screenshot by Author | Article and Artwork by Nvidia

Summary

Between December 2019 and March 2020, Kaggle ran the Bengali.AI Handwritten Grapheme Classification challenge. The article describes the challenges with the training set data and the subsequent strategies to mitigate its shortcomings. The team’s highest position was fifth, but most members finished in the top 30, which is very impressive. The approach to the learning rate was as critical as the choice of model.

Context

For many people (including me), the majority of contact with Machine Learning begins (and ends) with the excellent scikit-learn, Keras and PyTorch. What this blog shows is some of the “art” involved in fine-tuning existing models for new applications. At the same time, it shows some of the privilege that English speakers enjoy in the level of research conducted into our shared language.

Modern Colour Maps for Plots with Colorcet

Colorcet Library | Screenshot by Author | Article and Artwork by Colorcet

Summary

The two images below compare similar colour maps: hot from matplotlib and its Colorcet alternative, fire, each using a 256-colour gradient:

Hot colormap | Image by Colorcet
Fire colormap | Image by Colorcet

The right-hand side of both images shows the added fidelity of the fire colormap relative to matplotlib’s older hot map. Notice also the differences in the middle of the spectrum, where changes are easier to discern with the fire map. Perceptually uniform maps of this kind have been adopted by various plotting libraries including Matplotlib, which now has extensive documentation on the subject. The article shows a great example using a Datashader plot for the U.S.
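One rough way to put a number on the “fidelity” described above is to look at how evenly lightness changes along a colour map. The sketch below is my own illustration rather than anything taken from the Colorcet article: it converts hex colours to an approximate CIE L* lightness and reports the change between neighbouring colours. The function names and the three-colour demo ramp are hypothetical.

```python
def srgb_to_linear(channel):
    """Undo the sRGB gamma curve for one channel in the range 0..1."""
    if channel <= 0.04045:
        return channel / 12.92
    return ((channel + 0.055) / 1.055) ** 2.4


def lightness(hex_colour):
    """Approximate CIE L* (0 = black, 100 = white) of a '#rrggbb' string."""
    r, g, b = (int(hex_colour[i:i + 2], 16) / 255 for i in (1, 3, 5))
    # Relative luminance from linear RGB (sRGB / Rec. 709 weights).
    y = (0.2126 * srgb_to_linear(r)
         + 0.7152 * srgb_to_linear(g)
         + 0.0722 * srgb_to_linear(b))
    delta = 6 / 29
    f = y ** (1 / 3) if y > delta ** 3 else y / (3 * delta ** 2) + 4 / 29
    return 116 * f - 16


def lightness_steps(palette):
    """Lightness change between each neighbouring pair of colours."""
    values = [lightness(colour) for colour in palette]
    return [after - before for before, after in zip(values, values[1:])]


# Hypothetical three-colour ramp, only to show the call; not taken from either map.
demo = ["#000000", "#ff0000", "#ffffff"]
print([round(step, 1) for step in lightness_steps(demo)])
```

Assuming the colorcet package is installed, `colorcet.fire` is a list of hex strings that can be passed straight to `lightness_steps`. Steps that shrink towards zero at the bright end of a map would correspond to the loss of detail visible on the right of the hot image.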

Context

We’ve all been there when that plot doesn’t quite look right, or we’ve spent hours choosing our categorical colours in a line plot or bar chart. If you’ve experimented with colour customisation, then you know it’s an easy productivity trap to fall into. The reason is that it’s easy to get caught up in the moment and lose track of the original purpose.

As an experienced Data Scientist, you learn quickly to pick an existing colour (or color) map and move on. What Colorcet does is make choosing a sensible (i.e. easy to read and understand) colormap nearly effortless. Rather than manually fine-tuning a colour scheme or choosing a default one that doesn’t quite look right, Colorcet allows you to pick colours that, independent of individual taste, at least work well.

Python Newsletters

Pycoders and Real Python Newsletters | Screenshot by Author | Article and Artwork by Respective Parties

Summary

Recently I’ve come across two newsletters that have introduced me to new concepts in Python, pandas and the related Data Science ecosystem. The first is PyCoders, an excellent resource that covers the wide range of Python applications. The second is of course the Real Python newsletter PyTricks, which focuses on Python language snippets.

Context

I enjoy the code snippets from Real Python:

PyTricks Email from Real Python | Screenshot by Author | Content from Real Python

However, since signing up I’ve had 12 emails from Dan of Real Python, of which 7 were related to PyTricks and the other 5 were about joining Real Python; that’s more than 40% of the emails spent on advertising. Still, it’s good to receive the snippets: they make email less boring and, unlike most newsletters, you want to keep the messages.

PyCoders is, however, more subtle with its sponsored material. Each sponsored link is clearly tagged as such, and about 3 of the 17–20 links are sponsored, so roughly 15–18%. In fairness, both have a role: the PyTricks emails are a real treat and stand alone, whereas the PyCoders email is more in depth.
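The proportions quoted for the two newsletters can be checked with a few lines of Python, using only the counts given in the text. The helper name `share` is mine, purely for illustration.

```python
def share(part, total):
    """Return part/total as a percentage."""
    return 100 * part / total


real_python_ads = share(12 - 7, 12)  # 5 of the 12 emails were promotional
pycoders_low = share(3, 20)          # 3 sponsored links out of 20
pycoders_high = share(3, 17)         # 3 sponsored links out of 17

print(f"Real Python: {real_python_ads:.1f}%")                      # 41.7%
print(f"PyCoders: {pycoders_low:.1f}% to {pycoders_high:.1f}%")    # 15.0% to 17.6%
```

The counts are small, so these percentages are only a rough guide to how much of each newsletter is promotional.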

Conclusion

These are three varied topics from my Pocket list. The first shows the depth of experience required and the empirical nature of tuning existing Machine Learning models for new but similar applications in the field of handwritten character recognition. The second article shows that using a library that has researched the use of effective colour schemes can potentially enhance any visualisation. Given that for any Data Scientist, communication of the results is a critical element of the role, Colorcet should be the default. It should be noted that many tools within the PyViz ecosystem have already adopted these maps. Finally, the two newsletters are useful sources of Python news, each with its own trade-off in sponsored material.

Ashraf Miah

CTO, Data Scientist & Chartered Engineer (MEng CEng EUR ING MRAeS) with over 20 years’ experience in the Aerospace, Rail & Energy industries.