Using ChatGPT-4o to generate a geospatial dataset

Jun 8, 2024

Fascinated by the chaotic realignment of U.S. collegiate athletic conferences, I wanted to create some maps about it. I started with the Atlantic Coast Conference, which will be adding three schools for the 2024–25 season that are very far from its namesake region. For the data, I thought I’d see if ChatGPT could be of service.

I started with an incremental approach, first asking for a list of schools.

I like the editorial about geographic footprints, ChatGPT. Get out of my head!

This looked right, so I asked for the schools with their home cities in a dataframe.

If you click the prompt link, you get the Python code used to build the dataframe.

Looks good so far. Time for some coordinates.

This one churned for a bit but finally returned a response. I’m guessing it tried to call out to a tool for the coordinates, ran into network issues, and fell back to the LLM? I’m not sure. Either way, I plugged the generated Python code into VS Code and plotted the coordinates, and they looked like what I expected.

I see CA, FL, a clump in the mid-Atlantic… passes the sniff test.
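The plot is the real sniff test, but the same idea can be scripted if you’d rather not eyeball every point. This is a minimal sketch, assuming a rough box around the contiguous U.S. (about 24–50°N, 66–125°W) that I picked by hand; `outside_conus` is a hypothetical helper, not part of any library. It only catches gross errors such as swapped or sign-flipped pairs, not a point that’s off by a few miles.

```python
# Rough bounding box for the contiguous United States, in degrees.
# These limits are a hand-picked approximation for this check, not an official extent.
CONUS_BOUNDS = {"min_lat": 24.0, "max_lat": 50.0, "min_lon": -125.0, "max_lon": -66.0}

# A few (latitude, longitude) pairs copied from ChatGPT's output in this post.
coordinates = {
    "Chestnut Hill, MA": (42.3355, -71.1685),
    "Coral Gables, FL": (25.7215, -80.2684),
    "Berkeley, CA": (37.8715, -122.2730),
    "Stanford, CA": (37.4275, -122.1697),
    "Dallas, TX": (32.7767, -96.7970),
}


def outside_conus(coords: dict[str, tuple[float, float]]) -> list[str]:
    """Return the cities whose coordinates fall outside the rough CONUS box.

    A swapped pair like (-122.27, 37.87) or a sign-flipped one like
    (37.87, 122.27) lands outside the box and gets flagged.
    """
    b = CONUS_BOUNDS
    return [
        city
        for city, (lat, lon) in coords.items()
        if not (b["min_lat"] <= lat <= b["max_lat"] and b["min_lon"] <= lon <= b["max_lon"])
    ]


print(outside_conus(coordinates))  # → []

# Latitude and longitude swapped: the kind of slip this check is meant to catch.
print(outside_conus({"Berkeley, CA": (-122.2730, 37.8715)}))  # → ['Berkeley, CA']
```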

Here’s the full output of the Python it generated. The only modifications I made were adding a coordinate system and making some minor variable name changes.

import geopandas as gpd
import pandas as pd
from pyproj import CRS

acc_schools_2025 = {
    "School": [
        "Boston College", "Clemson University", "Duke University", "Florida State University", 
        "Georgia Tech", "University of Louisville", "University of Miami", 
        "University of North Carolina", "North Carolina State University", 
        "University of Notre Dame", "University of Pittsburgh", "Syracuse University", 
        "University of Virginia", "Virginia Tech", "Wake Forest University", 
        "University of California, Berkeley", "Southern Methodist University", 
        "Stanford University"
    ],
    "City": [
        "Chestnut Hill, MA", "Clemson, SC", "Durham, NC", "Tallahassee, FL", 
        "Atlanta, GA", "Louisville, KY", "Coral Gables, FL", "Chapel Hill, NC", 
        "Raleigh, NC", "Notre Dame, IN", "Pittsburgh, PA", "Syracuse, NY", 
        "Charlottesville, VA", "Blacksburg, VA", "Winston-Salem, NC", 
        "Berkeley, CA", "Dallas, TX", "Stanford, CA"
    ]
}

acc_schools_2025_df = pd.DataFrame(acc_schools_2025)

coordinates = {
    "Chestnut Hill, MA": (42.3355, -71.1685),
    "Clemson, SC": (34.6834, -82.8374),
    "Durham, NC": (35.9993, -78.9382),
    "Tallahassee, FL": (30.4383, -84.2807),
    "Atlanta, GA": (33.7490, -84.3880),
    "Louisville, KY": (38.2527, -85.7585),
    "Coral Gables, FL": (25.7215, -80.2684),
    "Chapel Hill, NC": (35.9132, -79.0558),
    "Raleigh, NC": (35.7796, -78.6382),
    "Notre Dame, IN": (41.7056, -86.2353),
    "Pittsburgh, PA": (40.4406, -79.9959),
    "Syracuse, NY": (43.0481, -76.1474),
    "Charlottesville, VA": (38.0293, -78.4767),
    "Blacksburg, VA": (37.2296, -80.4139),
    "Winston-Salem, NC": (36.0999, -80.2442),
    "Berkeley, CA": (37.8715, -122.2730),
    "Dallas, TX": (32.7767, -96.7970),
    "Stanford, CA": (37.4275, -122.1697)
}

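# Look up each city's (latitude, longitude) pair; a city missing from `coordinates` raises KeyError
acc_schools_2025_df['Latitude'] = acc_schools_2025_df['City'].map(lambda x: coordinates[x][0])
acc_schools_2025_df['Longitude'] = acc_schools_2025_df['City'].map(lambda x: coordinates[x][1])

# points_from_xy takes x (longitude) first, then y (latitude); EPSG:4326 is WGS 84 lat/lon
acc_schools_2025_gdf = gpd.GeoDataFrame(
    acc_schools_2025_df, 
    geometry=gpd.points_from_xy(acc_schools_2025_df.Longitude, acc_schools_2025_df.Latitude),
    crs=CRS.from_epsg(4326)
)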

I exported the GeoDataFrame to a GeoPackage so I could open the dataset in ArcGIS Pro, inspect it, and play with some visualizations.

A quick check of the school locations looks good:

How’s the view of the Atlantic from there, Stanford?

And we can start to play with bounding boxes:

That Cal-Syracuse rivalry is going to be off the hook. Can’t wait.
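A bounding box is just the minimum and maximum of the longitudes and latitudes; in GeoPandas, `acc_schools_2025_gdf.total_bounds` returns it as (minx, miny, maxx, maxy). Here’s a dependency-free sketch that computes the same box and also measures the Cal-Syracuse trip with the haversine formula. It assumes a spherical Earth with a mean radius of 6,371 km, so expect small differences (a few tenths of a percent) from an ellipsoidal calculation. The helper names are my own.

```python
import math

# (latitude, longitude) pairs copied from ChatGPT's output above.
coordinates = {
    "Chestnut Hill, MA": (42.3355, -71.1685),
    "Coral Gables, FL": (25.7215, -80.2684),
    "Syracuse, NY": (43.0481, -76.1474),
    "Berkeley, CA": (37.8715, -122.2730),
    "Stanford, CA": (37.4275, -122.1697),
    "Dallas, TX": (32.7767, -96.7970),
}


def bounding_box(coords):
    """Return (min_lon, min_lat, max_lon, max_lat), the same order as
    GeoDataFrame.total_bounds (minx, miny, maxx, maxy)."""
    lats = [lat for lat, _ in coords.values()]
    lons = [lon for _, lon in coords.values()]
    return (min(lons), min(lats), max(lons), max(lats))


def haversine_km(a, b, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points on a spherical Earth."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


print(bounding_box(coordinates))  # → (-122.273, 25.7215, -71.1685, 43.0481)
print(round(haversine_km(coordinates["Berkeley, CA"], coordinates["Syracuse, NY"])))
```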

Generating a second dataset

For one of my maps, I want to compare the insanity of the 2025 ACC with what I consider to be Peak ACC: the ACC I watched as a child growing up in Charlottesville, Virginia, the home of the University of Virginia. I asked ChatGPT for a new GeoDataFrame, this time for the vintage 1984 ACC.

Oops. ACC historians will notice something’s not right here!

Here we have a little problem: Florida State wasn’t in the ACC in 1984.

I like the little hiccup in this response from ChatGPT. It’s almost like it realized mid-sentence that it messed up earlier and paused as it felt deep shame about its mistake. Of course that’s not the case… unless OpenAI is further along than they’ve let on!

The rest of the data looked good (it added Maryland and took out the extras), so I just manually removed Florida State.
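Since the membership list is the part most likely to be wrong, it’s worth checking the generated rows against a list you trust before mapping anything. This is a minimal sketch, assuming the school names use the same spelling as the 2025 dataset; the `generated` DataFrame is a hypothetical stand-in for ChatGPT’s 1984 answer, and “University of Maryland” is my spelling, not necessarily ChatGPT’s.

```python
import pandas as pd

# The eight ACC members in 1984 (Florida State didn't join until 1991).
ACC_1984 = {
    "Clemson University", "Duke University", "Georgia Tech",
    "University of Maryland", "University of North Carolina",
    "North Carolina State University", "University of Virginia",
    "Wake Forest University",
}

# Hypothetical stand-in for ChatGPT's 1984 answer, including the Florida State slip.
generated = pd.DataFrame({"School": sorted(ACC_1984 | {"Florida State University"})})

# Schools in the answer that aren't on the reference list, and vice versa.
extras = sorted(set(generated["School"]) - ACC_1984)
missing = sorted(ACC_1984 - set(generated["School"]))
print("extras:", extras)    # → extras: ['Florida State University']
print("missing:", missing)  # → missing: []

# Keep only the verified members.
acc_1984_df = generated[generated["School"].isin(ACC_1984)].reset_index(drop=True)
```

Checking both directions matters: `extras` catches schools that shouldn’t be there, and `missing` catches ones the model forgot.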

In blue, the footprint of the ACC circa 1984… just as geography intended.

Conclusion

ChatGPT was able to generate a geospatial dataset that met my needs, although it needed to be double-checked (which you’d need to do with data sourced some other way anyway). It gave sources without prompting for the 2025 data, but not for the 1984 data. It’s probably a good idea to always prompt for sources, and that alone might help get better responses. If I had made this dataset without ChatGPT, I probably would have started by searching for existing datasets. If that didn’t turn anything up, I probably would have manually picked through a list of school locations and then built a query against city data like Natural Earth’s ‘populated places’ dataset. I’m not sure. I would have had to feel my way through it, but it definitely would have taken longer.
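For comparison, the non-ChatGPT route would boil down to a join between the school list and a city gazetteer. This is a minimal sketch with pandas, assuming a gazetteer with hypothetical `name`, `state`, `lat`, and `lon` columns; a real dataset like Natural Earth’s populated places has its own column names and may spell states out in full, so the join keys would need adapting. The coordinates below are copied from ChatGPT’s output above, not from Natural Earth.

```python
import pandas as pd

schools = pd.DataFrame({
    "School": ["University of Virginia", "Virginia Tech", "Stanford University"],
    "City": ["Charlottesville, VA", "Blacksburg, VA", "Stanford, CA"],
})

# Hypothetical gazetteer standing in for a real city dataset.
# Stanford is left out on purpose to show what an unmatched row looks like.
places = pd.DataFrame({
    "name": ["Charlottesville", "Blacksburg"],
    "state": ["VA", "VA"],
    "lat": [38.0293, 37.2296],
    "lon": [-78.4767, -80.4139],
})

# Split "City, ST" into separate join keys.
parts = schools["City"].str.split(", ", n=1)
schools["name"] = parts.str[0]
schools["state"] = parts.str[1]

# A left join keeps every school; indicator=True adds a _merge column marking matches.
joined = schools.merge(places, on=["name", "state"], how="left", indicator=True)
unmatched = joined.loc[joined["_merge"] == "left_only", "School"].tolist()
print(unmatched)  # → ['Stanford University']
```

The unmatched list is the part you’d still have to pick through by hand.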

Go ‘Hoos.

Written by Ed in Space