Building the Pokédex in iOS using Core ML

Kresimir Bakovic
Azikus

--

Who’s that Pokemon? I bet you’ve heard this question more than once in your life. Not many people know every Pokemon’s name and all of its evolutions by heart😏.
In our opinion, that’s a good reason to build a detector that tells you everything about the Pokemon you’re scanning with your phone’s camera. The scanning concept comes from the Pokemon cartoon, where the main characters get information about all kinds of Pokemon using a special scanning device called the Pokedex.
Without further ado, let’s see how to transform your device into a nice-looking Pokedex!

Video 1. Pokedex

Let’s split the scanning implementation into a few steps that we need to follow:

  1. Define tools that should be used to make the detector
  2. Find a large data set of Pokemon images
  3. Split the data into three sets (training, validation, testing)
  4. Train your model
  5. Import trained model into the Xcode project
  6. Connect the camera to the classifier
  7. Make a fancy UI that will blow users’ minds🤯

Tools and data

Probably the first question that comes to your mind is: “How are we going to implement detection, and which tools should we use to accomplish that?” The answer is to combine the phone’s camera with machine learning. Let’s first start with a definition of what machine learning actually is.

Machine learning is a type of artificial intelligence (AI) that allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. Machine learning algorithms use historical data as input to predict new output values.

Our task is to create a machine learning model that takes one image from the device camera as input and outputs one Pokemon name. Since the output can be any one of many possible classes, such a model is called a multi-class (multinomial) classifier. There are 905 Pokemon in total (at the time this blog was written), but our model will only be able to detect the first generation (the first 150). To create a model that will do the classifying for us, we need to make it learn.
The learning process is done by feeding the model large amounts of data so it can pick up the patterns in each data chunk. If you want a model with a decent percentage of valid predictions, you need a lot of data. Machine learning is data-hungry by nature: the more data you have, the more precise your model’s predictions will be. For our image classification purposes, we merged a couple of data sets and ended up with a large data set containing approximately 17,000 Pokemon images. That set needs to be split into three groups: training (70%), validation (15%) and testing (15%).

  • Training data set → used for model training
  • Validation data set → used for validating how well the model performs
  • Testing data set → used for testing the model

Data separation was done using the Python script in the image below:

Image 1. Data separation
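The script itself is only visible in the screenshot above, so here is a rough sketch of the same idea. To keep all code in this post in a single language, it is written in Swift rather than Python; the folder layout, paths and exact rounding are assumptions based on the description above.

```swift
import Foundation

// Sketch of the split: `sourceDir` contains one subfolder per Pokemon, and each image is
// copied into training/validation/testing folders using a 70/15/15 ratio. Paths are placeholders.
let fileManager = FileManager.default
let sourceDir = URL(fileURLWithPath: "/path/to/pokemon-images")
let outputDir = URL(fileURLWithPath: "/path/to/dataset")
let splits: [(name: String, ratio: Double)] = [("training", 0.70), ("validation", 0.15), ("testing", 0.15)]

do {
    let classDirs = try fileManager
        .contentsOfDirectory(at: sourceDir, includingPropertiesForKeys: nil)
        .filter { $0.hasDirectoryPath }

    for classDir in classDirs {
        // Shuffle so every split gets a random sample of this Pokemon's images.
        let images = try fileManager
            .contentsOfDirectory(at: classDir, includingPropertiesForKeys: nil)
            .shuffled()

        var start = 0
        for (index, split) in splits.enumerated() {
            // The last split takes every remaining image so rounding never drops a file.
            let count = index == splits.count - 1
                ? images.count - start
                : Int(Double(images.count) * split.ratio)

            let destination = outputDir
                .appendingPathComponent(split.name)
                .appendingPathComponent(classDir.lastPathComponent)
            try fileManager.createDirectory(at: destination, withIntermediateDirectories: true)

            for image in images[start..<start + count] {
                try fileManager.copyItem(at: image, to: destination.appendingPathComponent(image.lastPathComponent))
            }
            start += count
        }
    }
} catch {
    print("Data separation failed: \(error)")
}
```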

The split ratio of the data set can be different from the one we picked, for example 80% for the training set, 10% for the validation set and 10% for the testing set. The optimal split depends on the specific use case, the size of the data set, the model structure, etc. Once you are happy with the trained model, you can train it one final time on 100% of your data set. This step gives the model more data to learn from, but you shouldn’t change any model parameters afterwards; doing so would be cheating, because the model has already seen the validation and test images, so its predictions could no longer be treated as a valid measure of quality.

It’s important to mention that you shouldn’t use images from the validation or test set for training, because then you wouldn’t be able to reliably estimate the quality of your model.

When it comes to machine learning on iOS devices, Apple provides an awesome framework called Core ML. It supports Vision for analyzing images, Natural Language for processing text, Speech for converting audio to text, and Sound Analysis for identifying sounds in audio. In our case, we’ll use its companion tool Create ML to train the built-in multi-class image classifier, and Core ML to run the resulting model on device. In the next chapter, we are going to explain how to do that.

Training the model

For our image classification purposes, we are going to use the built-in image classifier. You can create one by selecting Xcode → Open Developer Tool → Create ML.

Image 2. Template choosing

On the next menu, you can choose from the multiple templates that ship with Create ML. Since the purpose of this blog is to explain image classification, we are going to select the Image Classification template. Hit Next and you will end up on a screen like the one in the image below.

Image 3. Image classifier

In the image above, you can see the visual representation of our machine learning model. In the Data section, there are three different slots, each representing its own data set. In the previous chapter, we already split our large data set into training, validation and testing parts, so now we can simply import them into the respective sections. You can see the actual folders contained inside the test data set in the image below:

Image 4. Folders inside the test set

Each folder contains multiple images of one Pokemon from the first generation.

Image 5. Gastly testing images

In the Parameters section, you can set the number of iterations. An iteration is defined as one pass through the entire training set, so by setting it to 35 iterations (also known as epochs), the model gets to look at each image 35 times. The default value is 25, and the more iterations you set, the longer the training will take. There are more options to play with, such as the augmentation settings (Blur, Crop, Flip, etc.); I suggest you invest some time reading about them if you are interested.
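As a side note, the same training setup can also be expressed in code with Apple’s CreateML framework, for example from a macOS Playground, instead of clicking through the Create ML app. The sketch below mirrors the values discussed here (35 iterations plus a couple of augmentation options); the paths are placeholders and parameter labels may differ slightly between CreateML versions.

```swift
import CreateML
import Foundation

// Placeholder paths – each folder contains one subfolder per Pokemon, as in Image 4.
let trainingDir = URL(fileURLWithPath: "/path/to/dataset/training")
let testingDir = URL(fileURLWithPath: "/path/to/dataset/testing")

// Mirror the values set in the UI: 35 iterations and some augmentation options.
let parameters = MLImageClassifier.ModelParameters(
    maxIterations: 35,
    augmentation: [.crop, .flip]
)

// Train on the labeled folders; validation data can be split off automatically.
let classifier = try MLImageClassifier(
    trainingData: .labeledDirectories(at: trainingDir),
    parameters: parameters
)

// Evaluate on the held-out testing set.
let evaluation = classifier.evaluation(on: .labeledDirectories(at: testingDir))
print("Test classification error: \(evaluation.classificationError)")

// Export the .mlmodel so it can be imported into the Xcode project.
try classifier.write(to: URL(fileURLWithPath: "/path/to/PokemonClassifier.mlmodel"))
```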

After filling in all the sections and setting the desired parameters, you can start training your model by pressing the Train button in the upper left corner of the screen. Feature extraction will take some time, but at the end of the training process you will end up on the screen below:

Image 6. Training and validation accuracy comparison

In the image above, you can see training and validation accuracy plotted against the number of iterations executed. The model clearly learned to recognize different Pokemon: at the end of the training phase we have a training accuracy of nearly 98%, while validation accuracy is around 80%. This can be improved by using a larger data set and by filtering out “broken” images. By “broken”, I mean images that contain multiple objects and are really not good representations of the selected Pokemon. In the image below, you can see some of the images from the training data set that are considered “broken”.

Image 7. Broken Images

If you want better results, feel free to check every image inside your data set and remove the broken ones, but be aware that it can take some time.⏰

Two useful measures of how well your model is performing are precision and recall. Let’s go through an example to explain what these two terms actually mean. Precision is the percentage of all the images the model labeled as “Bulbasaur” that actually were Bulbasaur. Recall, on the other hand, is the percentage of all Bulbasaur images that the model managed to find. For example, if the model labels 100 images as Bulbasaur and 90 of them really are Bulbasaur, precision is 90%; if the data set contains 120 Bulbasaur images in total, recall is 90/120 = 75%. In the next image, you can see the precision and recall percentages for some of the Pokemon from the training set.

Image 8. Precision and recall for some of the Pokemons

At this point, we have trained our model; we can save it somewhere on our machine (by clicking the Get icon in the Output section) and finally start using it on real-world examples inside the application.🤳🏼
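To give an idea of what step 6 – connecting the camera to the classifier – might look like, here is a minimal sketch using the Vision framework. It assumes the exported model was added to the project under the (hypothetical) name PokemonClassifier, which is the class Xcode generates from the .mlmodel file; the camera plumbing itself (capture session, photo output) is left out.

```swift
import UIKit
import CoreML
import Vision

// Minimal sketch: wrap the generated Core ML model in a Vision request and
// classify a single captured image, reporting the top label and its confidence.
final class PokemonScanner {

    private lazy var request: VNCoreMLRequest = {
        // `PokemonClassifier` is the class Xcode generates from the imported .mlmodel.
        let coreMLModel = try! PokemonClassifier(configuration: MLModelConfiguration()).model
        let visionModel = try! VNCoreMLModel(for: coreMLModel)
        return VNCoreMLRequest(model: visionModel)
    }()

    /// Classifies one image (e.g. a photo taken with the camera) and returns
    /// the most likely Pokemon name together with the model's confidence.
    func classify(_ image: UIImage, completion: @escaping (String, Float) -> Void) {
        guard let cgImage = image.cgImage else { return }

        DispatchQueue.global(qos: .userInitiated).async {
            let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
            try? handler.perform([self.request])

            // Vision sorts classification results by confidence, so the first one is the best guess.
            guard let top = (self.request.results as? [VNClassificationObservation])?.first else { return }
            DispatchQueue.main.async {
                completion(top.identifier, top.confidence)
            }
        }
    }
}
```

With something like this in place, the result of classify(_:completion:) can drive the rest of the app: showing the Pokemon’s name, marking it as unlocked, and so on.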

If you try scanning something that isn’t a Pokemon, you can get very strange results. For a scanned banana, the model might say it’s Pikachu with very high certainty, because the image contains a lot of yellow that can be interpreted as that Pokemon. This happens because the model has to put the scanned banana into one of the Pokemon categories it was trained on, and it is considered normal behaviour for all image classification models.

Pokemon scanner mobile app

After data collection and model training, a fancy UI and nice-looking animations are a must-have for every iOS professional. We actually went one step further and built a whole Pokemon classification application called Dex Scanner. With this app, you can capture a photo of any first-generation Pokemon using your phone’s camera and get valuable information like HP, abilities, evolutions, moves, etc. After a successful scan, the Pokemon is unlocked and your real-life Pokedex starts talking🔊. In a couple of seconds, without reading a single word, you have all the basic information about the scanned Pokemon. Isn’t that cool?😎 In the next video, you can see how the scanner works in a real-life example.

Video 2. Pokemon scanning

We can all agree that the ability to scan your favourite Pokemon with one tap on your device is pretty cool. You can try building your own Pokedex, or you can check out ours. Our application is up and running in the App Store, and you can find it at the link below. If you have any questions, feel free to reach out to me in the comments section. Thanks for your time, and feel free to share this blog post with your colleagues and friends. Happy coding and remember, you gotta catch ’em all😄

Krešimir is a valuable member of our iOS team.
At Azikus, we design and develop top-notch mobile and web apps.
We are a bunch of experienced developers and designers who like to share knowledge, always staying up to date with the latest technologies.
To find out more about what we do, feel free to check out our website.
