Capturing Pokemon: Exploring Image Recognition with TensorFlow and Python

Aug 6, 2023

In a world where visual information dominates our daily interactions, image recognition has become a pivotal technology with wide-ranging applications, from identifying objects in photos to enabling self-driving cars. It is a core component of many modern artificial intelligence systems. And where nostalgia meets innovation, what better way to explore image recognition than through the captivating world of Pokemon?

In this article, we'll build a Pokemon image recognition system using Python and TensorFlow. We'll walk through preparing the images, building a fully connected neural network, and training and evaluating the model. Get ready to turn your knowledge into a tool for identifying the type of your favorite Pokemon.

What Is Image Recognition?

As mentioned in the introduction, image recognition allows computers to identify and interpret visual information. It is a crucial component of many modern technologies, such as self-driving cars, medical diagnosis, and multimodal large language models like GPT-4 that can accept images as input.

What Is TensorFlow?

TensorFlow is an open-source machine learning framework developed by Google. It provides tools for creating, training, and deploying machine learning models, and this article uses its high-level Keras API (tf.keras).

Gathering and Preparing the Pokemon Dataset

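The dataset for this article is available on Kaggle. It contains an image of every Pokemon from generations 1 to 7, together with each Pokemon's primary type (the Type1 column used in the code) and secondary type. This article uses only the primary type as the label.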

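We first need to map each image to its Pokemon's type. The image files are named after the Pokemon, so each file name (without its extension) can be looked up in the Name column of pokemon.csv.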
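A minimal, standard-library sketch of this lookup, mirroring the dictionaries the full code at the end of the article builds with pandas. The CSV excerpt and file names are made up for illustration, and build_labels is a hypothetical helper, not part of the original code.

```python
import csv
import io
import os

# Made-up excerpt in the layout the article's code expects (Name, Type1 columns)
SAMPLE_CSV = """Name,Type1
bulbasaur,Grass
charmander,Fire
squirtle,Water
ivysaur,Grass
"""


def build_labels(csv_text, file_names):
    """Map image file names to integer type indices via the Pokemon's name."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Pokemon name -> primary type, e.g. "bulbasaur" -> "Grass"
    name_to_type = {row["Name"]: row["Type1"] for row in rows}
    # Number the distinct types in order of first appearance, like pandas' unique()
    type_to_index = {}
    for row in rows:
        type_to_index.setdefault(row["Type1"], len(type_to_index))
    # The file name without its extension is the lookup key
    indices = [type_to_index[name_to_type[os.path.splitext(f)[0]]] for f in file_names]
    return indices, type_to_index


indices, type_to_index = build_labels(SAMPLE_CSV, ["squirtle.png", "bulbasaur.png", "charmander.jpg"])
print(type_to_index)  # {'Grass': 0, 'Fire': 1, 'Water': 2}
print(indices)        # [2, 0, 1]
```

Using os.path.splitext rather than splitting on the first dot keeps the lookup correct even if a file name contains more than one dot.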

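After labeling each image with its type, we convert it to grayscale, which reduces each image from three color channels to one and therefore lowers the computational cost of the model. The pixel values are also scaled from the range 0 to 255 to the range 0 to 1.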

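Converting an image from color to grayscale discards color information, which is plausibly a useful cue for a Pokemon's type (many Fire types are red or orange, many Water types are blue), but it has several benefits:

Reduced computational complexity: one value per pixel instead of three
Simpler feature extraction
No variation caused by color differences
Noise reduction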
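To put the first benefit in numbers, here is a small calculation, assuming 120 x 120 images (the input_shape used by the model) stored as float32 values, as in the full code:

```python
# Input size of one 120x120 image with and without color channels
HEIGHT, WIDTH = 120, 120

rgb_values = HEIGHT * WIDTH * 3   # red, green and blue channels
gray_values = HEIGHT * WIDTH      # a single intensity channel

# Memory per image once stored as float32 (4 bytes per value)
rgb_bytes = rgb_values * 4
gray_bytes = gray_values * 4

print(rgb_values, gray_values)    # 43200 14400
print(rgb_bytes, gray_bytes)      # 172800 57600
print(rgb_values // gray_values)  # 3
```

Because the first Dense layer has one weight per input value for each of its neurons, its weight count shrinks by the same factor of three.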

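The grayscale value of each pixel is a weighted sum of its red (R), green (G), and blue (B) channels:
Gray = 0.2126 * R + 0.7152 * G + 0.0722 * B

These are the ITU-R BT.709 coefficients. OpenCV's cv2.cvtColor with COLOR_BGR2GRAY uses the BT.601 weights (0.299, 0.587, 0.114) instead, and cv2.IMREAD_GRAYSCALE may rely on the image codec's own conversion, so the values produced by the code can differ slightly from this formula.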
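The formula can be applied to a whole image at once with NumPy. This is a minimal sketch assuming an (H, W, 3) array in RGB channel order; rgb_to_gray is a hypothetical helper, and an image loaded in color with cv2.imread would first need its channels reversed (img[..., ::-1]) because OpenCV stores them in BGR order.

```python
import numpy as np

# Coefficients from the formula above; they sum to 1, so white stays at 255
WEIGHTS = np.array([0.2126, 0.7152, 0.0722])


def rgb_to_gray(rgb):
    """Convert an (H, W, 3) array in RGB channel order to an (H, W) gray array."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Weighted sum over the last (channel) axis, computed for every pixel
    return rgb @ WEIGHTS


# A 2x2 test image: white, pure red / pure green, pure blue
pixels = np.array([[[255, 255, 255], [255, 0, 0]],
                   [[0, 255, 0], [0, 0, 255]]])
gray = rgb_to_gray(pixels)
print(gray.shape)  # (2, 2)
print(gray.round(3))
```

Green contributes most of the brightness (pure green maps to about 182), while pure blue maps to only about 18.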

Model

For the model, I use a fully connected (dense) neural network with a total of 5 layers:

The 1st layer is the input layer: a Flatten layer that turns each 120 x 120 grayscale image into a vector of 14,400 pixel values.
Layers 2 through 4 are three hidden fully connected layers, each with 100 neurons and a ReLU activation function.
The 5th layer is the output layer: 18 fully connected units, one for each Pokemon type. It outputs raw scores (logits) rather than probabilities, which is why the loss is created with from_logits=True.
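The size of this network follows directly from the layer widths. A small sketch of the calculation, which should agree with the totals model.summary() reports for the Keras model below:

```python
# Layer sizes from the description above; Flatten turns 120x120 pixels into 14,400 inputs
layer_sizes = [120 * 120, 100, 100, 100, 18]

layer_params = []
for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    # A Dense layer has one weight per input-output pair plus one bias per output
    layer_params.append(fan_in * fan_out + fan_out)

total = sum(layer_params)
print(layer_params)  # [1440100, 10100, 10100, 1818]
print(f"{total:,}")  # 1,462,118
print(f"{layer_params[0] / total:.1%} of the parameters are in the first layer")
```

About 98.5% of the parameters sit in the first hidden layer, because it connects every one of the 14,400 pixels to each of its 100 neurons.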

The model is then defined and compiled with the Adam optimizer, sparse categorical cross-entropy loss, and accuracy as the metric:

def create_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(120, 120)),
        tf.keras.layers.Dense(100, activation='relu'),
        tf.keras.layers.Dense(100, activation='relu'),
        tf.keras.layers.Dense(100, activation='relu'),
        tf.keras.layers.Dense(18)
    ])
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])
    return model
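The final Dense(18) layer has no activation, so it outputs logits, and from_logits=True tells the loss to apply softmax internally. A minimal NumPy sketch of turning logits into probabilities, using random made-up logits rather than real model outputs, also shows why taking the argmax of the logits gives the same predicted type as taking it after softmax:

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # avoid overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)


# Made-up raw outputs of an 18-unit output layer for two images
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 18))

probs = softmax(logits)
print(np.allclose(probs.sum(axis=1), 1.0))                          # True
print(np.array_equal(probs.argmax(axis=1), logits.argmax(axis=1)))  # True
```

Softmax is strictly increasing in each logit, so it never changes which class scores highest; it only rescales the scores into probabilities that sum to 1.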

Results

Because the sample size is small, I used 5-fold cross-validation to assess performance. The average accuracy across the five folds is 0.1248. With 18 types, a uniform random guess would be correct 1/18 ≈ 0.056 of the time, so the model is right about 2.25 times as often as random chance, an improvement of roughly 125%.
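The comparison above can be reproduced with a few lines of arithmetic. The sketch also computes a majority-class baseline (always predicting the most common type), which is not part of the original evaluation; with imbalanced classes it can be well above 1/18, so it is a useful second reference point. The toy labels are made up; with the real data, pass final_labels.ravel().tolist() from the full code.

```python
from collections import Counter

NUM_TYPES = 18
reported_accuracy = 0.1248        # average 5-fold accuracy reported above

uniform_accuracy = 1 / NUM_TYPES  # expected accuracy of guessing uniformly at random
ratio = reported_accuracy / uniform_accuracy
print(f"Uniform baseline: {uniform_accuracy:.4f}")                     # Uniform baseline: 0.0556
print(f"Model / uniform: {ratio:.2f}x ({ratio - 1:.0%} improvement)")  # Model / uniform: 2.25x (125% improvement)


def majority_baseline(labels):
    """Accuracy of always predicting the most frequent label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Made-up labels, only to show the calculation
toy_labels = ["Water", "Water", "Normal", "Fire", "Grass", "Water", "Bug", "Normal"]
print(majority_baseline(toy_labels))  # 0.375
```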

Please feel free to contact me via email at Nawat.sun@gmail.com for any questions or suggestions for improvement.

Thank you

import os

import cv2
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

#%%
# Define the image directory and list the image files in a fixed order
root_dir = r"C:\Users\User\Downloads\Kaggle\images\images"
File_names = sorted(os.listdir(root_dir))

# Load Pokemon data
pokemon = pd.read_csv(r"C:\Users\User\Downloads\Kaggle\pokemon.csv")

# Create a dictionary mapping Pokemon names to their primary type
data_dict = dict(zip(pokemon["Name"], pokemon["Type1"]))

# Create a dictionary mapping type labels to indices
labels = pokemon["Type1"].unique()
ids = range(len(labels))
labels_idx = dict(zip(labels, ids))

# Create lists to store images and labels
final_images = []
final_labels = []

# Load each image as grayscale and look up its type index.
# The file name without its extension must match the Name column of pokemon.csv.
for file in File_names:
    img = cv2.imread(os.path.join(root_dir, file), cv2.IMREAD_GRAYSCALE)
    if img is None:
        continue  # skip files that OpenCV cannot read
    name = os.path.splitext(file)[0]
    final_images.append(img)
    final_labels.append(labels_idx[data_dict[name]])

# Convert lists into NumPy arrays and scale pixel values from [0, 255] to [0, 1]
final_images = np.array(final_images, dtype=np.float32) / 255.0
final_labels = np.array(final_labels, dtype=np.int8)
assert final_images.shape[1:] == (120, 120), "the model expects 120x120 images"

# No separate train/test split here: the cross-validation loop below
# creates a new train/test split for every fold.

def create_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(120, 120)),
        tf.keras.layers.Dense(100, activation='relu'),
        tf.keras.layers.Dense(100, activation='relu'),
        tf.keras.layers.Dense(100, activation='relu'),
        tf.keras.layers.Dense(18)
    ])
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])
    return model

# Define the number of folds
num_folds = 5

# Initialize lists to store accuracies for each fold
accuracies = []

# Create k-fold cross-validator
kf = KFold(n_splits=num_folds, shuffle=True, random_state=1)

# Iterate through folds
for fold, (train_index, test_index) in enumerate(kf.split(final_images)):
    print(f"Fold {fold + 1}/{num_folds}")
    
    X_train, X_test = final_images[train_index], final_images[test_index]
    y_train, y_test = final_labels[train_index], final_labels[test_index]

    model = create_model()

    # Train the model
    history = model.fit(X_train, y_train, epochs=50, validation_data=(X_test, y_test), verbose=0)

    # The output layer returns logits; softmax does not change which class
    # scores highest, so the predicted type is the argmax of the logits
    predictions = model.predict(X_test, verbose=0)
    predictions_argmax = predictions.argmax(axis=1)

    # Calculate accuracy for this fold
    accuracy = accuracy_score(y_test, predictions_argmax)
    accuracies.append(accuracy)
    print("Accuracy:", accuracy)

# Calculate and print average accuracy across all folds
average_accuracy = np.mean(accuracies)
print("Average Accuracy:", average_accuracy)