Exploring the Distribution of Pokémon Types

Nasim M
INST414: Data Science Techniques
Feb 11, 2023

Pokémon is the highest-grossing media franchise of all time. Like countless other kids who grew up in the late ’90s and early 2000s, I was absolutely engrossed in the world of Pokémon. From the late nights playing Pokémon games on my handheld gaming console to the early mornings watching the Pokémon anime series, I will always treasure the time I spent immersed in that fictional universe. As such, I felt that an exploratory analysis of Pokémon would be a perfect opportunity to practice my data science skills while also learning more about the franchise I put so much time into as a kid.

For this exploratory analysis, I decided to use PokéAPI as the source for all Pokémon information. This API proved to be an abundant resource for data on every Pokémon in the franchise. After looking through the PokéAPI documentation, I decided to explore the distribution of Pokémon types across generations and among legendary/mythical Pokémon. The analysis aims to uncover which types are most and least represented, a perspective that is hard to get without looking at the data. The results could be a useful resource for Pokémon players, who could tailor their party to be resistant to the most commonly represented types in each generation, giving them a strategic advantage.

To begin the analysis, I first needed to gather the data from the API and store it. To do this, I used the “requests” library to access the PokéAPI and extract information on each Pokémon. The specific information I collected for each Pokémon includes its Pokédex number, its name, its types, the generation it was introduced in, and its legendary/mythical status (more on this in the next paragraph). All of this data was then organized and stored in a Pandas dataframe for further exploration.
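
The collection step described above could be sketched roughly as follows. This is a minimal sketch, not the exact code from the analysis: it assumes the data comes from PokéAPI's `pokemon` and `pokemon-species` endpoints, and the helper names (`fetch_json`, `build_record`, `fetch_record`) and the column names are my own choices for illustration.

```python
from typing import Any

API_ROOT = "https://pokeapi.co/api/v2"


def fetch_json(url: str) -> dict[str, Any]:
    """Download one PokéAPI resource and return the decoded JSON."""
    # Imported here so the parsing helper below can be used (and tested)
    # without the third-party "requests" package or a network connection.
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()


def build_record(pokemon: dict[str, Any], species: dict[str, Any]) -> dict[str, Any]:
    """Flatten the two API responses into one row (hypothetical column names)."""
    return {
        "pokedex_number": species["id"],
        "name": species["name"],
        # A Pokémon has one or two types; keep them all in a list.
        "types": [slot["type"]["name"] for slot in pokemon["types"]],
        # Raw label such as "generation-iv"; converted to an integer later.
        "generation": species["generation"]["name"],
        # Treat legendary and mythical Pokémon as one "special" group.
        "legendary/mythical": species["is_legendary"] or species["is_mythical"],
    }


def fetch_record(pokedex_number: int) -> dict[str, Any]:
    """Fetch both resources for one Pokédex number and build its row."""
    pokemon = fetch_json(f"{API_ROOT}/pokemon/{pokedex_number}")
    species = fetch_json(f"{API_ROOT}/pokemon-species/{pokedex_number}")
    return build_record(pokemon, species)
```

A list of these records can be passed straight to `pandas.DataFrame(...)` to get one row per Pokémon. Keeping the parsing in `build_record` separate from the network call makes it easy to check against a saved response.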

Before I could begin analyzing the data, I had to ensure all of it was accurate and in the proper format. It was during this stage that I noticed the Pokémon Mew was not considered legendary, as it was instead considered mythical; as such, I decided to check for both the legendary and mythical status of each Pokémon, since both represent a “special” Pokémon. Additionally, I noticed that the values in the generations column were in the format of “generation-iv”. I felt that integers would look cleaner, so I split each string at the hyphen to collect the Roman numeral. I then used a function to convert the Roman numerals into integers and stored the resulting value in each cell of the generations column. The last thing I noticed was that none of the Pokémon in Generation 9 were classified as legendary or mythical. Since I was not too familiar with this generation, I did some research, found a list of legendary/mythical Gen 9 Pokémon, and added each name to a new list. I then looped through the list and updated the value in the “legendary/mythical” column to True for each Pokémon in the list. After this, the data was fully cleaned and ready for analysis.
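
The two cleaning steps above could look something like this. It is only a sketch: the function names are hypothetical, the records are plain dictionaries rather than dataframe rows so the example stays dependency-free, and the two Gen 9 names are just examples, not the full list used in the analysis.

```python
ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as "iv" to an integer."""
    values = [ROMAN_VALUES[ch] for ch in numeral.lower()]
    total = 0
    for i, value in enumerate(values):
        # A smaller value placed before a larger one is subtracted
        # (iv = 4, ix = 9); otherwise the value is added.
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def generation_to_int(label: str) -> int:
    """Turn a label such as "generation-iv" into the integer 4."""
    # Everything after the first hyphen is the Roman numeral.
    _, _, numeral = label.partition("-")
    return roman_to_int(numeral)


def mark_special(records: list[dict], special_names: list[str]) -> None:
    """Set the legendary/mythical flag to True for the named Pokémon."""
    names = set(special_names)
    for record in records:
        if record["name"] in names:
            record["legendary/mythical"] = True


# Illustrative rows only; the names below are examples of Gen 9 legendaries.
rows = [
    {"name": "koraidon", "generation": "generation-ix", "legendary/mythical": False},
    {"name": "sprigatito", "generation": "generation-ix", "legendary/mythical": False},
]
for row in rows:
    row["generation"] = generation_to_int(row["generation"])
mark_special(rows, ["koraidon", "miraidon"])
```

On a dataframe, the same ideas would map to something like `df["generation"].apply(generation_to_int)` and `df.loc[df["name"].isin(names), "legendary/mythical"] = True`.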

I created a function that generates a bar plot from a given dataframe, then created a dataframe for each generation and called the function on each one. Before I show you the bar plots for each generation, let’s take a look at bar plots showing the type counts for all generations combined and for legendary/mythical Pokémon:
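
A plotting helper along these lines could be sketched as below. This is not the original function: the names `count_types` and `plot_type_counts` are made up for illustration, the input is a list of records rather than a dataframe, and it assumes a dual-type Pokémon is counted once under each of its types.

```python
from collections import Counter


def count_types(records):
    """Count how many Pokémon carry each type.

    Assumption: a dual-type Pokémon adds one to both of its types.
    """
    counts = Counter()
    for record in records:
        counts.update(record["types"])
    return counts


def plot_type_counts(counts, title, ax=None):
    """Draw a bar plot of type counts, most common type first."""
    # Imported lazily so count_types works without matplotlib installed.
    import matplotlib.pyplot as plt

    if ax is None:
        _, ax = plt.subplots()
    ordered = counts.most_common()
    ax.bar([name for name, _ in ordered], [total for _, total in ordered])
    ax.set_title(title)
    ax.set_xlabel("Type")
    ax.set_ylabel("Number of Pokémon")
    # Rotate the type names so all eighteen labels stay readable.
    ax.tick_params(axis="x", labelrotation=90)
    return ax
```

Calling `plot_type_counts(count_types(subset), "Generation 1")` for each generation's subset would reproduce the one-plot-per-generation approach, and passing two axes from `plt.subplots(1, 2)` would give a side-by-side pair.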

As you can see in the left plot, water-type Pokémon are the most represented in the franchise and ice types are the least represented. I can’t say I’m surprised by these results, but I did find it surprising that there are more psychic types than fire types. For the right plot, I’m not surprised at all to see psychic, dragon, and flying in the top 3. Those three types already come across as “mythical or legendary”, so it makes sense. It also makes sense that fairy types are not in the top 5, since that type was only added in Generation 6.

Now let’s see how the representation of the different types changed by generation:

Now, this is very interesting. For the most part, the distribution of Pokémon types has changed from generation to generation. For example, in Gen 1 the top 3 were poison, water, and normal, while in Gen 9 the top 3 were grass, dark, and normal. We can also see that normal and water types maintained high representation, as they were in the top 5 for most generations. Ice types, on the other hand, were among the 5 least represented types in 7 generations, which helps explain why ice is the least represented type in all of Pokémon. One thing worth noting is that the number of Pokémon introduced differs from generation to generation, which may account for some of the changes in distribution.
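
Because generations are not all the same size, raw counts can be hard to compare across them. One way to check the rankings above is to report each type's share of its generation alongside the count. The sketch below is an assumption about how that could be done, not part of the original analysis; `top_types_by_generation` is a hypothetical helper working on plain records.

```python
from collections import Counter, defaultdict


def top_types_by_generation(records, n=3):
    """Return {generation: [(type, count, share), ...]} for the n most common types.

    share is the fraction of that generation's Pokémon carrying the type.
    """
    counts = defaultdict(Counter)  # generation -> type counts
    sizes = Counter()              # generation -> number of Pokémon
    for record in records:
        generation = record["generation"]
        sizes[generation] += 1
        counts[generation].update(record["types"])
    return {
        generation: [
            (name, total, total / sizes[generation])
            for name, total in counts[generation].most_common(n)
        ]
        for generation in sorted(counts)
    }
```

A type that holds the same share in a small generation and a large one will show very different raw counts, so comparing shares separates real shifts in type balance from changes in generation size.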

This exploratory analysis does have some limitations. It only includes information from the PokéAPI, and some of that information may not be accurate or up to date, especially for Gen 9 (like the incorrect legendary/mythical status). It’s possible that there is more incorrect information for Pokémon in Gen 9 or even other generations, which may affect the overall results and conclusions of the analysis. Additionally, the analysis only considered the number of Pokémon of each type and did not take into account other variables, such as strength, rarity, or popularity, that may be relevant to how types are distributed.

In conclusion, this exploratory analysis looked at the distribution of Pokémon types across the franchise. The data showed that water types were the most represented, while ice types were the least represented. The representation of different types changed over the generations, with normal and water types maintaining high representation throughout. However, the results should be taken with a grain of salt, as some of the information obtained from the PokéAPI may be inaccurate, especially for Generation 9. Despite these limitations, the analysis can be a useful resource for Pokémon players, who can tailor their party to be resistant to the most commonly represented types in each generation.

You can find my code for this analysis here.
