TidyTuesday: Pokémon – Jason Bernstein’s Website

Introduction

The TidyTuesday dataset this week is about Pokémon characters. I uploaded the first few rows of the dataset to ChatGPT and asked it to provide easy, medium, and hard data analysis questions. This post goes through those questions.

First, let’s import the pandas library for performing the data analysis.

import pandas as pd

Then we download the data.

base_url = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-04-01"
df_raw = pd.read_csv(f"{base_url}/pokemon_df.csv")

Let’s remove some columns that are not needed.

df = df_raw.drop(columns=["id", "url_icon", "url_image"])

It’s useful to look at the first few rows of the dataset to get an idea of the variables and their values. We see that the dataset includes Pokémon names, variables like attack and defense, colors, egg groups, etc.

from IPython.display import HTML

HTML(df.head(5).to_html(index=False))

pokemon	species_id	height	weight	base_experience	type_1	type_2	hp	attack	defense	special_attack	special_defense	speed	color_1	color_2	color_f	egg_group_1	egg_group_2	generation_id
bulbasaur	1	0.7	6.9	64	grass	poison	45	49	49	65	65	45	#78C850	#A040A0	#81A763	monster	plant	1.0
ivysaur	2	1.0	13.0	142	grass	poison	60	62	63	80	80	60	#78C850	#A040A0	#81A763	monster	plant	1.0
venusaur	3	2.0	100.0	236	grass	poison	80	82	83	100	100	80	#78C850	#A040A0	#81A763	monster	plant	1.0
charmander	4	0.6	8.5	62	fire	NaN	39	52	43	60	50	65	#F08030	NaN	NaN	monster	dragon	1.0
charmeleon	5	1.1	19.0	142	fire	NaN	58	64	58	80	65	80	#F08030	NaN	NaN	monster	dragon	1.0

On to the questions!

Easy Questions

What are the column names in the dataset?

df.columns

Index(['pokemon', 'species_id', 'height', 'weight', 'base_experience',
       'type_1', 'type_2', 'hp', 'attack', 'defense', 'special_attack',
       'special_defense', 'speed', 'color_1', 'color_2', 'color_f',
       'egg_group_1', 'egg_group_2', 'generation_id'],
      dtype='object')

What is the average weight of all Pokémon?

df[["weight"]].mean()

weight    66.213172
dtype: float64

How many Pokémon have a secondary type (type_2) listed?

df["type_2"].notna().sum()

np.int64(510)

What are the top 5 most common egg groups (egg_group_1)?

df.groupby("egg_group_1").size().sort_values(ascending=False).head(n=5)

egg_group_1
ground     218
no-eggs    118
monster     94
water1      84
bug         80
dtype: int64

How many Pokémon are of the color #F08030?

df.query("color_1 == '#F08030'").shape[0]

Medium Questions

What’s the minimum and maximum defense among Grass-type Pokémon?

(
    df.query("type_1 == 'grass'")
    .assign(
        min_defense=lambda d: d["defense"].min(),
        max_defense=lambda d: d["defense"].max(),
    )
    .query("defense == min_defense or defense == max_defense")[["pokemon", "defense"]]
)

	pokemon	defense
190	sunkern	30
597	ferrothorn	131
797	kartana	131

List all Pokémon with hp greater than 200.

df.query("hp > 200").loc[:, "pokemon"].tolist()

['chansey', 'blissey', 'guzzlord', 'zygarde-complete']

How many unique combinations of egg_group_1 and egg_group_2 are there?

df[["egg_group_1", "egg_group_2"]].drop_duplicates().dropna().shape[0]

Which Pokémon has the largest difference between attack and defense?

(
    df[["pokemon", "attack", "defense"]]
    .assign(diff_attack_defense=lambda x: x["attack"] - x["defense"])
    .sort_values("diff_attack_defense", ascending=False)
    .head(1)
)

	pokemon	attack	defense	diff_attack_defense
802	deoxys-attack	180	20	160

Which primary type has the lowest average special_defense?

(
    df[["type_1", "special_defense"]]
    .groupby("type_1")
    .mean()
    .reset_index()
    .sort_values("special_defense")
    .head(n=1)
)

	type_1	special_defense
10	ground	62.861111

Hard Questions

Find the average base_experience by each combination of type_1 and type_2.

df[["type_1", "type_2", "base_experience"]].groupby(["type_1", "type_2"]).mean()

		base_experience
type_1	type_2
bug	electric	163.800000
	fairy	111.500000
	fighting	153.250000
	fire	160.000000
	flying	141.500000
...	...	...
water	ice	179.000000
	poison	111.666667
	psychic	160.333333
	rock	139.500000
	steel	239.000000

155 rows × 1 columns

Which Pokémon have all base stats (hp, attack, defense, special_attack, special_defense, speed) above the average of their respective columns?

df[
    (df["hp"] > df["hp"].mean())
    & (df["attack"] > df["attack"].mean())
    & (df["defense"] > df["defense"].mean())
    & (df["special_attack"] > df["special_attack"].mean())
    & (df["special_defense"] > df["special_defense"].mean())
    & (df["speed"] > df["speed"].mean())
]["pokemon"].head(6)

2      venusaur
5     charizard
8     blastoise
30    nidoqueen
33     nidoking
54      golduck
Name: pokemon, dtype: object

Identify the Pokémon with a secondary type but no secondary egg group.

df.dropna(subset="egg_group_2").query("type_2 == type_2")["pokemon"]

0           bulbasaur
1             ivysaur
2            venusaur
5           charizard
33           nidoking
            ...      
868      altaria-mega
872      slowbro-mega
889      lopunny-mega
901      raichu-alola
930    salazzle-totem
Name: pokemon, Length: 131, dtype: object

Create a new column for BMI (weight / height**2) and find the Pokémon with the highest BMI.

(
    df[["pokemon", "height", "weight"]]
    .assign(bmi=lambda x: x["weight"] / x["height"] ** 2)
    .sort_values("bmi", ascending=False)
    .head(1)
)

	pokemon	height	weight	bmi
789	cosmoem	0.1	999.9	99990.0

For each generation, which Pokémon has the highest total_stats?

# Compute a total_stats column
stats_cols = ["hp", "attack", "defense", "special_attack", "special_defense", "speed"]
df["total_stats"] = df[stats_cols].sum(axis=1)

# Compute max total_stats by generation_id
df["top_stats_by_gen"] = df.groupby("generation_id")["total_stats"].transform("max")

# Filter rows where total_stats is max total_stats by generation_id
df.query("total_stats == top_stats_by_gen")[["pokemon", "generation_id", "total_stats"]]

	pokemon	generation_id	total_stats
149	mewtwo	1.0	680
248	lugia	2.0	680
249	ho-oh	2.0	680
383	rayquaza	3.0	680
492	arceus	4.0	720
642	reshiram	5.0	680
643	zekrom	5.0	680
715	xerneas	6.0	680
716	yveltal	6.0	680
790	solgaleo	7.0	680
791	lunala	7.0	680

Summary

Overall, I found this to be an easy way to get some practice with pandas and explore a new dataset. While answering these questions, I ran some of my solutions through ChatGPT to get feedback on possible issues and improvements. For example, with the last question, ChatGPT suggested using the transform method instead of merge, which led to a simpler solution. And I learned something about Pokémon - Cosmoem has a BMI of almost 10,000!