TidyTuesday: Palm Trees

Analyzing palm tree data by fruit color and size with the great_tables package.
TidyTuesday
Python
pandas
Author

Jason Bernstein

Published

March 22, 2025

This week’s TidyTuesday dataset contains information about palm trees. I decided to summarize it in a table using the pandas and great_tables Python packages. For each fruit color, the table counts the number of palm tree species, reports the smallest and largest average fruit width, and shows the average fruit width of one randomly sampled species.

First, let’s import the packages needed for this analysis.

import pandas as pd
import great_tables

Then we download the data from GitHub into a pandas DataFrame.

base_url = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-03-18"
df = pd.read_csv(f"{base_url}/palmtrees.csv", encoding="windows-1252")

The dataset includes information about 2,557 palm tree species. Of these, 758 do not report a main fruit color.
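To see how the missing-color count and the one-row-per-color reshaping behave, here is a minimal sketch on a few made-up rows. The species names and colors are hypothetical, and the sketch assumes, as in the post, one row per species. It uses the same `isna`, `str.split("; ")`, and `explode` calls as the wrangling code, and shows why the per-color counts can add up to more than the number of species with a reported color.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows mimicking the palmtrees columns used in this post
toy = pd.DataFrame(
    {
        "spec_name": ["Species A", "Species B", "Species C", "Species D"],
        "main_fruit_colors": ["red; black", np.nan, "red", "brown"],
    }
)

n_species = len(toy)  # assumes one row per species
n_missing = toy["main_fruit_colors"].isna().sum()  # species with no main fruit color

# One row per (species, fruit color) pair, mirroring the wrangling step
per_color = (
    toy.dropna(subset=["main_fruit_colors"])
    .assign(fruit_color=lambda x: x["main_fruit_colors"].str.split("; "))
    .explode("fruit_color")
)
color_counts = per_color["fruit_color"].value_counts()

print(n_species, n_missing)  # 4 1
print(color_counts.sum())  # 4: Species A is counted under both red and black
```

A species with two colors, like Species A here, is counted once under each color, which is why the table's source note points out that some species have multiple fruit colors.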

Next we wrangle the data into the desired tabular format. We build one table that summarizes all species for each fruit color, then join it with a second table that holds one randomly sampled species for each fruit color. A little cleaning at the end makes the fruit color labels nicer for display.

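# Count species with no reported main fruit color (used in the table's source note)
n_missing = df["main_fruit_colors"].isna().sum()

# Remove missing values, then create one row for each fruit color
df_base = (
    df.dropna(subset=["main_fruit_colors"])
    .assign(fruit_color=lambda x: x["main_fruit_colors"].str.split("; "))
    .explode("fruit_color")
)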

# Compute summary count and width statistics by fruit color
df_all_species = df_base.groupby("fruit_color").agg(
    n=("spec_name", "size"),
    min_average_fruit_width_cm=("average_fruit_width_cm", "min"),
    max_average_fruit_width_cm=("average_fruit_width_cm", "max"),
)

# Sample name and width of one species per fruit color
df_one_species = (
    df_base[["fruit_color", "spec_name", "average_fruit_width_cm"]]
    .groupby("fruit_color")
    .sample(1, random_state=1)
)

# Join dataframes and clean-up for final table presentation
df_table = (
    df_all_species.merge(df_one_species, on="fruit_color")
    .reset_index(drop=True)
    .assign(
        fruit_color=lambda x: x["fruit_color"]
        .str.capitalize()
        .replace("Straw-coloured", "Straw"),
    )
    .sort_values("n", ascending=False)
)
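The two-table approach can be tried on a handful of made-up rows. This sketch mirrors the groupby, sample, merge, and cleaning steps above; the toy data and the extra `n_with_width` column are hypothetical additions, included to show that `"size"` counts every species in a group while `"min"` and `"max"` skip missing widths.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format rows: one row per (species, fruit color) pair
toy_base = pd.DataFrame(
    {
        "fruit_color": ["red", "red", "red", "straw-coloured"],
        "spec_name": ["Species A", "Species B", "Species C", "Species D"],
        "average_fruit_width_cm": [0.85, np.nan, 2.0, 0.6],
    }
)

# Table 1: summary over all species in each fruit color group
all_species = toy_base.groupby("fruit_color").agg(
    n=("spec_name", "size"),  # counts every row, even if the width is missing
    n_with_width=("average_fruit_width_cm", "count"),  # hypothetical: non-missing widths only
    min_average_fruit_width_cm=("average_fruit_width_cm", "min"),  # NaN is skipped
    max_average_fruit_width_cm=("average_fruit_width_cm", "max"),  # NaN is skipped
)

# Table 2: one randomly sampled species per color; a fixed seed makes it reproducible
one_species = toy_base.groupby("fruit_color").sample(1, random_state=1)

# Join on fruit color (an index level on the left, a column on the right), then tidy labels
table = (
    all_species.merge(one_species, on="fruit_color")
    .reset_index(drop=True)
    .assign(
        fruit_color=lambda x: x["fruit_color"]
        .str.capitalize()
        .replace("Straw-coloured", "Straw")
    )
    .sort_values("n", ascending=False)
)
print(table)
```

Because `n` uses `"size"` rather than `"count"`, a species without a recorded fruit width still counts toward the Number of Species column. Passing a fixed `random_state` to `sample` makes the sampled species the same on every run.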

Now we create the desired table with the great_tables package.

(
    great_tables.GT(df_table)
    .tab_header(
        title="Palm Tree Fruit Characteristics",
        subtitle="A guide for relating fruit size to fruit color",
    )
    .tab_spanner(
        label="Across all Species",
        columns=[
            "n",
            "min_average_fruit_width_cm",
            "max_average_fruit_width_cm",
        ],
    )
    .tab_spanner(
        label="Sample Species",
        columns=[
            "spec_name",
            "average_fruit_width_cm",
        ],
    )
    .cols_label(
        spec_name="Species Name",
        fruit_color="Fruit Color",
        n="Number of Species",
        average_fruit_width_cm="Average Fruit Width (cm)",
        min_average_fruit_width_cm="Min Average Fruit Width (cm)",
        max_average_fruit_width_cm="Max Average Fruit Width (cm)",
    )
    .fmt_number(
        columns=[
            "average_fruit_width_cm",
            "min_average_fruit_width_cm",
            "max_average_fruit_width_cm",
        ],
        decimals=2,
        use_seps=False,
    )
    .tab_source_note(
        source_note="TidyTuesday: 2025, week 11 | PalmTraits 1.0 Database."
    )
    .tab_source_note(
        "Note, some species can have multiple fruit colors, and "
        f"{n_missing} species have no reported main fruit color."
    )
    .opt_row_striping()
    # Save table as an image for the blog listing, also shows the table
    .save("./image.png")
)
Palm Tree Fruit Characteristics
A guide for relating fruit size to fruit color
Across all Species: Number of Species, Min and Max Average Fruit Width. Sample Species: Species Name, Average Fruit Width.

| Fruit Color | Number of Species | Min Average Fruit Width (cm) | Max Average Fruit Width (cm) | Species Name | Average Fruit Width (cm) |
|---|---|---|---|---|---|
| Red | 501 | 0.21 | 11.00 | Bactris schultesii | 0.85 |
| Brown | 484 | 0.40 | 20.00 | Dypsis coursii | 2.00 |
| Black | 462 | 0.40 | 20.00 | Pinanga auriculata | 0.85 |
| Orange | 265 | 0.40 | 15.50 | Bactris killipii | 0.90 |
| Yellow | 206 | 0.35 | 6.00 | Daemonorops macroptera | 1.20 |
| Green | 195 | 0.30 | 14.00 | Calamus erinaceus | 1.00 |
| Purple | 175 | 0.40 | 20.00 | Burretiokentia koghiensis | 1.05 |
| White | 87 | 0.30 | 5.00 | Pinanga albescens | 5.00 |
| Pink | 36 | 0.20 | 3.20 | Pinanga annamensis | 1.20 |
| Straw | 22 | 0.60 | 3.17 | Calamus symphysipus | 0.65 |
| Blue | 19 | 0.47 | 2.80 | Geonoma triandra | 0.55 |
| Cream | 11 | 0.50 | 1.30 | Calamus vestitus | 1.16 |
| Grey | 10 | 0.47 | 2.00 | Licuala orbicularis | 1.00 |
| Ivory | 9 | 0.60 | 4.00 | Calamus psilocladus | 0.80 |

TidyTuesday: 2025, week 11 | PalmTraits 1.0 Database.
Note, some species can have multiple fruit colors, and 758 species have no reported main fruit color.

We see that red is the most common fruit color (501 species) and ivory is the least common (9 species). Brown, black, and purple fruit share the largest maximum average fruit width, 20 cm, while cream-colored fruit have the smallest maximum average fruit width, 1.3 cm. The minimum average fruit widths vary much less across fruit colors (0.20 to 0.60 cm) than the maximum average fruit widths (1.30 to 20.00 cm).
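The comparison of minimum and maximum widths can be checked with a quick sketch that copies those two columns from the table above into a small DataFrame. It uses the rounded values as displayed, not a recomputation from the raw data.

```python
import pandas as pd

# Min/max average fruit widths (cm) per fruit color, copied from the table above
widths = pd.DataFrame(
    {
        "min_average_fruit_width_cm": [0.21, 0.40, 0.40, 0.40, 0.35, 0.30, 0.40,
                                       0.30, 0.20, 0.60, 0.47, 0.50, 0.47, 0.60],
        "max_average_fruit_width_cm": [11.00, 20.00, 20.00, 15.50, 6.00, 14.00, 20.00,
                                       5.00, 3.20, 3.17, 2.80, 1.30, 2.00, 4.00],
    },
    index=["Red", "Brown", "Black", "Orange", "Yellow", "Green", "Purple",
           "White", "Pink", "Straw", "Blue", "Cream", "Grey", "Ivory"],
)

# Smallest and largest value in each column, plus the spread between them
summary = widths.agg(["min", "max"]).T
summary["range"] = summary["max"] - summary["min"]
print(summary.round(2))

# Fruit color with the smallest maximum average width
print(widths["max_average_fruit_width_cm"].idxmin())  # Cream
```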

Overall, I suspect this table can be made nicer with additional styling, such as adding a border or rearranging columns, but this was only meant to be a quick analysis, so I’ll leave it here for now!