Building a real-life Pokedex

Eric Feldman
Aug 11, 2020 · 6 min read


The week the pubs reopened after the COVID lockdown, I had a beer with a friend. He asked how many posts I had written while stuck at home, and I was ashamed to answer that all I had was a few paragraphs of a post I wasn't sure about.
The moment I got home, I started working on this post.

Pokedex

The Pokédex (Japanese: ポケモン図鑑, illustrated Pokémon encyclopedia) is a digital encyclopedia created by Professor Oak as an invaluable tool to Trainers in the Pokémon world.

Originally, I wanted to create a gift for Yali's birthday: a physical device that, when pointed at a picture of a Pokemon, would identify it and display some information about it. A few years have passed since that birthday and unfortunately, Yali isn't into Pokemon anymore. But it is still a cool project.

A Pokedex's task is to identify a Pokemon, so a classification model would be good enough. But I wanted to try something I had never done before: I wanted to mark the Pokemon in the picture. And not just bound the Pokemon with a box; I wanted to mark the actual pixels, and do it in real time.

Instead of a model that outputs a probability for each Pokemon, this model outputs, for each pixel, which Pokemon that pixel belongs to. As we already know, in supervised learning, in order to predict something we need to have it tagged. Where can we get pictures of Pokemon annotated with which pixel belongs to which Pokemon? We are going to tag them, together! (more on that later). I used frames from the first season of the Pokemon TV series and tagged one episode (it took about an hour).

CVAT annotation tool
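To make the per-pixel idea concrete, here is a tiny sketch of the difference between a classifier's output and a segmentation model's output (the shapes and class names are made up for illustration, not the actual model's):

```python
import numpy as np

# A classifier outputs one score per class for the whole image.
class_probs = np.array([0.05, 0.90, 0.05])  # e.g. [background, Pikachu, Bulbasaur]
print(class_probs.argmax())                 # 1, meaning "this image is Pikachu"

# A segmentation model outputs a score per class for every single pixel.
height, width, num_classes = 224, 224, 3
pixel_probs = np.random.rand(height, width, num_classes)
mask = pixel_probs.argmax(axis=-1)          # shape (224, 224): one class id per pixel
print(mask.shape)
```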

The actual device probably won't have a very good camera, so the images will surely be blurry and not as clean as the frames I used for training. For that I used imgaug, a very cool library of augmentations you can run on your images. I really felt like I was in a toy shop, choosing which augmentations I wanted.
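The augmentation pipeline looked roughly like this (a sketch: the specific augmenters and parameter ranges here are illustrative picks, not the exact ones I ended up with):

```python
import imgaug.augmenters as iaa
from imgaug.augmentables.segmaps import SegmentationMapsOnImage

# Simulate a cheap, shaky camera: blur, lighting changes, JPEG artifacts.
seq = iaa.Sequential([
    iaa.Sometimes(0.5, iaa.GaussianBlur(sigma=(0.0, 2.0))),
    iaa.Sometimes(0.3, iaa.MotionBlur(k=7)),
    iaa.AddToBrightness((-40, 40)),
    iaa.JpegCompression(compression=(40, 90)),
])

def augment(image, mask):
    # Pass the label mask alongside the image so it stays in sync
    # if geometric augmenters (crops, flips, etc.) are added later.
    segmap = SegmentationMapsOnImage(mask, shape=image.shape)
    image_aug, segmap_aug = seq(image=image, segmentation_maps=segmap)
    return image_aug, segmap_aug.get_arr()
```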

For the model, I used keras-segmentation. It has multiple implementations of semantic segmentation architectures. I ended up using a pre-trained MobileNet. The idea of a pre-trained model is that it has already been trained on a big dataset of images, so it already knows how to extract interesting features from a picture (depth, colors, edges and so on). Using only that feature-extraction part as a starting point means the rest of the model needs far less data and training time to reach decent results.
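The training setup with keras-segmentation is only a few lines. This is a sketch: the decoder variant, class count and paths are placeholders (the library offers several MobileNet-based models, and I'm not listing the exact one here):

```python
from keras_segmentation.models.unet import mobilenet_unet

# n_classes = background + the Pokemon classes that were tagged (placeholder number).
model = mobilenet_unet(n_classes=10, input_height=224, input_width=224)

# Frames and per-pixel annotation images exported from CVAT (placeholder paths).
model.train(
    train_images="dataset/images/",
    train_annotations="dataset/annotations/",
    checkpoints_path="checkpoints/pokedex",
    epochs=5,
)

# Quick visual sanity check on a single frame.
model.predict_segmentation(
    inp="dataset/images/frame_0001.png",
    out_fname="prediction.png",
)
```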

Training on the one episode I tagged gave me a good starting point

Actual device

My initial thought was to build the device using a Raspberry Pi, but that meant I needed a display, a battery, a camera, speakers and a few other things I probably didn't think of. Even reading that list of products is exhausting, and then I realized that a simple phone already includes all of those. So I bought an old phone.

Running a model on a phone is a heavy task, especially when running it on multiple frames per second. Thankfully, Google (no link needed here) published TensorFlow Lite (TFLite from now on). TFLite is a lightweight, optimized version of TensorFlow. You can convert a TensorFlow model to a TFLite model, and it will (hopefully) run faster and be lighter.
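The basic conversion is just a few lines (a sketch, assuming the trained model was exported as a SavedModel; the paths are placeholders):

```python
import tensorflow as tf

# Load the exported model and convert it to the TFLite flat-buffer format.
converter = tf.lite.TFLiteConverter.from_saved_model("exported_pokedex_model/")
tflite_model = converter.convert()

with open("pokedex.tflite", "wb") as f:
    f.write(tflite_model)
```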

React Native

After installing about half of the packages on NPM, trying a bunch of camera plugins and TFLite components, crying a lot, and wondering why I was doing this all over again, I decided to try a different approach.

Native Android application

The good people of TFLite published some very good example Android apps. So I took the parts I needed from each app and filled the gaps with my own code.

Basically, I combined three example apps into one. The segmentation demo app doesn't run on a camera stream, the detection demo app uses a deprecated Android API, and the classification demo doesn't draw anything on the preview. Building (copy-pasting, but you get it) the native app was way faster, and way more enjoyable, than using React Native with TFLite.

A TensorFlow model is built from many, many operators, and unfortunately, not all of them are convertible to TFLite yet. My model included operators that TFLite doesn't support yet, so I faced a dilemma: give up on the architecture that worked, or find another way to run those operators. I chose not to give up on them and to use the Flex delegate, which makes the app quite heavy but keeps the successful architecture intact.

One of my model's operators simply isn't part of TFLite's built-in operator set. The Flex delegate is the way to run regular TensorFlow operators inside a TFLite interpreter: the model will be slower and the app will weigh more, but at least I can use the architecture I chose. This is why the app is so heavy.
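Enabling the Flex path is a small change on the converter side (again a sketch with placeholder paths):

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("exported_pokedex_model/")
# Use TFLite builtins where possible, and fall back to full TensorFlow ops (Flex)
# for the operators TFLite doesn't support yet. This is what bloats the app.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
tflite_model = converter.convert()

with open("pokedex_flex.tflite", "wb") as f:
    f.write(tflite_model)
```

On the Android side it also means pulling in the select-TF-ops library (org.tensorflow:tensorflow-lite-select-tf-ops), which is where most of the extra weight comes from.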

Running the model in my Android app outputs zeros: an array of 30K zeros. Clearly something is wrong. The immediate suspect is, well, me. My guess is that I'm running the model wrong in the Android app. The original TensorFlow model runs great in Python, but I can't test the TFLite model since the Flex delegate isn't supported in Python (yet). So I created a dummy model, built only from operations TFLite supports, to test it in Python. The predictions are good, so now let's run it on Android. Woohoo, I'm finally getting numbers that are not 0!
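Testing the dummy, builtins-only model in Python looks roughly like this (a sketch; the model path is a placeholder and the input is just random noise for a sanity check):

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="dummy_model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a random tensor shaped like the model's input.
dummy_input = np.random.rand(*input_details[0]["shape"]).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], dummy_input)
interpreter.invoke()

output = interpreter.get_tensor(output_details[0]["index"])
print(output.min(), output.max())  # not all zeros means the conversion itself is fine
```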

Something is probably wrong with the model I imported or with the Flex delegate. Wait! The outputs in the Android app are integers, while in Python they're floats. Can it be that the previous model was working all along, but because its outputs are floats between 0 and 1, reading them as integers rounds them all down to 0?

Yep. The problem is that I was reading the buffer wrong. Instead of floats, I was reading it as integers.
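In Python terms, the bug boils down to something like this (just an illustration of the truncation, not the actual Android code):

```python
import numpy as np

probs = np.array([0.12, 0.85, 0.03], dtype=np.float32)  # per-class scores between 0 and 1
print(probs.astype(np.int32))                           # [0 0 0]: every value truncates to zero
```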

it's been a while since I used a strongly typed language

Changing the code to use getFloatArray solved the "issue".
It’s not fun to spend so much time on such a stupid mistake, but hey, I learned a bunch of stuff thanks to it.

Since the model predicts each pixel separately, there can be (and probably will be) small areas that the model gets wrong, and they mess up the final prediction. The solution I chose is to clear out the small areas, keeping only the areas whose size is bigger than a given threshold.
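The cleanup step, sketched with OpenCV connected components (the library choice and the threshold value here are mine, for illustration):

```python
import cv2
import numpy as np

def remove_small_areas(mask, min_area=200):
    """Zero out connected regions smaller than min_area pixels, class by class."""
    cleaned = np.zeros_like(mask)
    for class_id in np.unique(mask):
        if class_id == 0:  # 0 = background, nothing to clean
            continue
        binary = (mask == class_id).astype(np.uint8)
        num_labels, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
        for label in range(1, num_labels):  # label 0 is the background of this binary mask
            if stats[label, cv2.CC_STAT_AREA] >= min_area:
                cleaned[labels == label] = class_id
    return cleaned
```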

Into the wild

Now that the app is working, it's time to test! Unfortunately, there are no Pokemon in the city I live in, so I had to go into the wild.

The demo is working nicely! But since I tagged only one episode, the results are not very accurate and it works for only a few Pokemon. This is where you all can help: tag data! I created a small tutorial that explains how to tag (the user and password for the tagging system are in the tutorial).

After removing all Pokemon-related info from the app listing (if Google asks, you're all identifying creatures, not Pokemon), I was finally able to upload the app to Google Play! As I mentioned before, because I use regular TensorFlow operators, the app is very heavy. In the next versions, I'll use an architecture that works on TFLite without Flex.

Now, after becoming a certified Pokemon (neural network) trainer, I can start working on other projects.
