Coding with ChatGPT
Using OpenAI’s language model to write Python code for geo-data science
Recently, I found out about ChatGPT, an AI chatbot developed by OpenAI that is capable of responding to users in a remarkably human manner. ChatGPT doesn’t just respond in text; you can ask it to write code as well. In fact, an acquaintance recently told me that she used ChatGPT to do some of her homework for her master’s in urban data science, both for writing essays and writing code.
As someone who spends most of her working hours coding, having a chatbot write code for me sounds both incredibly exciting (think of all the hours I could save!) and a little bit scary (does this mean I’ll be out of a job in a few years?). I decided to try out the coding capabilities of ChatGPT, specifically for spatial data analysis with Python. This article shows some first results and my first thoughts.
For this article, I asked ChatGPT to do a task I have to do often: turn a polygon (could be the shape of a country, city, or municipality) into a grid of cells over which statistics can be easily aggregated.
Before we start, what is ChatGPT, actually? I asked this question to the chatbot itself, and this is its response:
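To make "aggregating statistics over a grid" concrete, here is a minimal sketch, assuming geopandas 0.10 or newer (for the `predicate` argument of `gpd.sjoin`). The two cells, the three points, and the `cell_id` and `value` columns are made up purely for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two hypothetical grid cells side by side (unit squares)
grid = gpd.GeoDataFrame(
    {"cell_id": [0, 1]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
)

# Hypothetical observations, each carrying a value to aggregate
points = gpd.GeoDataFrame(
    {"value": [10, 20, 5]},
    geometry=[Point(0.2, 0.5), Point(0.8, 0.5), Point(1.5, 0.5)],
)

# Attach to each point the cell it falls in, then sum the values per cell
joined = gpd.sjoin(points, grid, how="inner", predicate="within")
totals = joined.groupby("cell_id")["value"].sum()
print(totals)  # cell 0 sums to 30, cell 1 sums to 5
```

Once every observation is tagged with a cell, any pandas aggregation (count, mean, max) can be swapped in for the sum.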
GPT (short for “Generative Pre-training Transformer”) is a type of language model developed by OpenAI. It is a neural network trained to generate human-like text by predicting the next word in a sequence given a large dataset of examples. Chat GPT refers to the use of this model in chat applications, where it can be used to generate responses to user messages in real time. It is designed to be able to carry on a conversation by understanding the context of the conversation and generating appropriate responses.
Let’s start!
First, I asked ChatGPT, “make a geodataframe grid within a polygon”, and this was its response:
import geopandas as gpd
# create a polygon
polygon = gpd.GeoDataFrame({'geometry': gpd.GeoSeries([Polygon([(0,0), (0,1), (1,1), (1,0)])])})
# create a grid of geodataframe within the polygon
grid = gpd.gridify(polygon, cell_size=0.1, geometry='Polygon')
# plot the grid
grid.plot()
I was quite impressed with the response, especially its suggestion to use gpd.gridify(), a geopandas function I had never heard of before. However, when I tried to run the code, I found out that the module geopandas has no attribute gridify. In other words, the chatbot suggested a tool that looks really handy but doesn’t actually exist.
There is, however, another Python module called gridify, so perhaps the chatbot got confused by that. It also forgot to import Polygon from shapely before using it, but that was just a minor issue.
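A quick way to catch this kind of made-up function before running a whole script is to ask Python whether the name exists at all. The sketch below uses only the standard library; the helper `name_exists` is a hypothetical name chosen for this example, and `math` stands in for geopandas so that the snippet runs without any extra installs.

```python
import importlib
import importlib.util


def name_exists(module_name: str, attribute: str) -> bool:
    """Return True if a top-level module is installed and exposes `attribute`."""
    # find_spec returns None when a top-level module cannot be found
    if importlib.util.find_spec(module_name) is None:
        return False
    module = importlib.import_module(module_name)
    return hasattr(module, attribute)


print(name_exists("math", "sqrt"))     # True: a real standard-library function
print(name_exists("math", "gridify"))  # False: a made-up name
```

With geopandas installed, the same call with "geopandas" and "gridify" would be the check that matches the error described above.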
The interesting thing about ChatGPT is that you can have a continuous conversation with it, and it will remember your previous interactions. So I asked a follow-up question to try to get a working piece of code: “I tried this code but got the error “module ‘geopandas’ has no attribute ‘gridify’”. can you fix this for me?”, and this was its response:
While the bot claimed that shapely has the attribute gridify, I found out that wasn’t true when I ran the code. shapely doesn’t have the attribute gridify either.
So I asked another follow-up question, “I tried the code but got the error message “No module named ‘shapely.gridify’”. Any other options?”
The bot then proceeded to confidently tell me that gridify IS part of the shapely library, but has been removed in newer versions. As far as I know, that’s also not true.
The code it wrote produced … a square loop, which is far from what I was looking for. Honestly I found it pretty hilarious.
Second attempt: it worked!
My first conversation with ChatGPT clearly wasn’t going very well, so I decided to start again and ask my question more clearly. I asked it, “python code for generating a grid of cells as a geodataframe”, and here is its response:
import geopandas as gpd
from shapely.geometry import Polygon
# Define the bounds of the grid
min_x, min_y = 0, 0
max_x, max_y = 10, 10
# Define the size of the cells in the grid
cell_width, cell_height = 1, 1
# Define the projection to use
crs = "EPSG:4326" # WGS84
# Initialize an empty list to store the grid cells
grid_cells = []
# Iterate over the bounds of the grid and create a Polygon object for each cell
for x in range(min_x, max_x, cell_width):
    for y in range(min_y, max_y, cell_height):
        # Define the coordinates of the cell
        cell_coords = [(x, y), (x+cell_width, y), (x+cell_width, y+cell_height), (x, y+cell_height)]
        # Create the Polygon object
        cell = Polygon(cell_coords)
        # Add the Polygon to the list
        grid_cells.append(cell)
# Create the GeoDataFrame
gdf = gpd.GeoDataFrame(geometry=grid_cells, crs=crs)
# Visualize the grid
gdf.plot()
…and it works!! The code above managed to create a grid of cells as a geodataframe, which could then be cropped to fit a specific area (e.g. a country, municipality, or city). I could probably ask ChatGPT to do this next step as well, but unfortunately the bot is currently unavailable because too many people are using it, something that’s been happening more and more often lately.
So ChatGPT is indeed capable of writing simple Python code for spatial data science, as long as it is given a clear, well-written question.
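The cropping step was not part of the chat, but it could be sketched roughly as follows. This is only one possible approach: it builds cells over the polygon’s bounding box and keeps those that intersect the polygon. The function name `grid_within_polygon` and the triangle standing in for a city boundary are assumptions made for this example.

```python
import math

import geopandas as gpd
from shapely.geometry import Polygon, box


def grid_within_polygon(polygon, cell_size, crs=None):
    """Return square cells covering `polygon`, keeping only cells that intersect it."""
    min_x, min_y, max_x, max_y = polygon.bounds
    # Number of columns and rows needed to cover the bounding box
    n_cols = math.ceil((max_x - min_x) / cell_size)
    n_rows = math.ceil((max_y - min_y) / cell_size)
    cells = [
        box(
            min_x + i * cell_size,
            min_y + j * cell_size,
            min_x + (i + 1) * cell_size,
            min_y + (j + 1) * cell_size,
        )
        for i in range(n_cols)
        for j in range(n_rows)
    ]
    grid = gpd.GeoDataFrame(geometry=cells, crs=crs)
    # Drop the cells of the bounding box that lie outside the polygon
    return grid[grid.intersects(polygon)].reset_index(drop=True)


# Hypothetical area of interest: a triangle standing in for a city boundary
triangle = Polygon([(0, 0), (10, 0), (0, 10)])
grid = grid_within_polygon(triangle, cell_size=1)
grid.plot()
```

This keeps whole cells, so cells along the edge stick out past the boundary. To trim them to the exact outline instead, `gpd.clip(grid, triangle)` is an alternative.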
Final thoughts
This brief experiment with ChatGPT highlights two interesting things about the chatbot, and perhaps AI in general.
Firstly, we can see that AI can generate artifacts that look legitimate at first glance but turn out to be wrong on further investigation. A common example is AI-generated images of faces, which may look realistic at first glance but turn out to have strange anomalies when looked at more closely.
When writing our code, ChatGPT did something similar by making up a geopandas function called gpd.gridify(), a function that seems to make sense but doesn’t actually exist.
Secondly, the purpose of ChatGPT (and perhaps other AI bots) is not to present accurate information, but rather what looks like accurate information. According to the FAQ page of ChatGPT, the model is trained to mimic human speech as much as possible, and not to provide us with the truth.
This is why it so confidently claims that gridify is part of the geopandas library and, when proven wrong, claims that gridify is part of the shapely library instead. Neither of these statements is true, but perhaps that is not the point of ChatGPT.
What’s next?
I’m definitely very excited about the capabilities of ChatGPT, and will be using it to work on other, more difficult coding tasks in the future.
I also found that it is quite handy at generating ideas — I asked for ideas for data science projects for sustainability + spatial data science, and while it didn’t produce ideas that were novel to me, it provided quite a good overview of the sustainability / spatial data science world, which I can imagine could be quite handy for someone new to the field.
Here’s ChatGPT’s response when I asked for project ideas:
- Sustainable transportation planning: Develop a tool that helps planners optimize public transportation networks to reduce greenhouse gas emissions and improve accessibility. This could involve analyzing patterns of travel demand, integrating with public transportation data sources, and implementing algorithms for network design and routing.
- Sustainable land use planning: Develop a tool that helps planners optimize land use patterns to minimize environmental impacts and enhance sustainability. This could involve analyzing patterns of land use and land cover, integrating with land use data sources, and implementing algorithms for land use allocation and optimization.
- Environmental impact assessment: Develop a tool that helps organizations assess the environmental impacts of their operations or projects. This could involve integrating with data sources on environmental indicators, such as air and water quality, and implementing algorithms for impact assessment and prediction.
- Renewable energy site selection: Develop a tool that helps organizations identify suitable sites for renewable energy generation, such as solar or wind farms. This could involve analyzing patterns of solar insolation, wind speed, and other relevant factors, and integrating with data sources on land use and topography.
- Sustainable supply chain optimization: Develop a tool that helps organizations optimize their supply chains to minimize environmental impacts and enhance sustainability. This could involve analyzing patterns of material and energy flows, integrating with supply chain data sources, and implementing algorithms for optimization and risk assessment.
ChatGPT is open for everyone to use for now on chat.openai.com. To use it, all you need is an internet browser. Feel free to try it for yourself, and let me know what you find :)
Acknowledgements
I’d like to thank my friends Alexey Ilyshkin and Alexis Oh for sharing their thoughts on ChatGPT and AI with me. Our conversations were integrated into the ‘final thoughts’ section of this article.