Turn Hours of Data Work into Minutes with ChatGPT’s Code Interpreter

Conor X. Brogan, M.S.
6 min read · Jul 15, 2023


Recently, OpenAI introduced ChatGPT’s Code Interpreter plugin for ChatGPT Plus subscribers. As someone who has always used ChatGPT to assist in coding endeavors, I thought to myself, “Interpret code? It can already do that!” Oh, how wrong I was.

Baseline ChatGPT can already greatly assist developers and aspiring coders by providing rudimentary boilerplate code, analyzing existing code, and nudging prompters onto a viable solution path, amongst other things. This is already phenomenal: as many developers and students can attest, you can turn hours of research on Stack Overflow into minutes with ChatGPT. For me, this was akin to humanity discovering fire for the first time. Using the Code Interpreter plugin, on the other hand, is realizing you can use fire to land a man on the moon.

Before I show you how to use the Code Interpreter to drastically decrease your workload in data science, you need to understand how to access the plugin. Currently, the plugin is exclusively available to ChatGPT Plus subscribers, and you can only prompt it twenty-five times every three hours ($20 a month is well worth it, in my opinion, for having even limited access to the Code Interpreter). To access the Code Interpreter, start a new chat and select GPT-4:

Then, select “Code Interpreter Beta”. You will know if you are using the Code Interpreter when you see this icon:

Now, I will not be going into every single function the Code Interpreter has, as that is not the purpose of this guide; however, I will be demonstrating how you can employ this tool for descriptive statistics and for setting up data for future analysis. If you would like to follow along with this tutorial, I will be using the alcohol-consumption-vs-gdp-per-capita CSV file (475 kB) from Pralabh Poudel’s “Alcohol Consumption by Country” dataset on Kaggle, licensed under CC BY 4.0.

To begin, I will load the CSV into Code Interpreter. That is right, you can now load files directly into ChatGPT using the Code Interpreter. Once the file is attached to the prompt, I will request that ChatGPT turn it into a Python dataframe.
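For readers who want to reproduce this step outside ChatGPT, here is a minimal sketch of what loading a CSV into a dataframe typically looks like with pandas. The column names and values in `SAMPLE_CSV` are invented placeholders and may not match the real Kaggle file; in practice you would pass the path of the downloaded CSV instead of the in-memory sample.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Kaggle file: the column names and values
# below are invented for illustration and may not match the real CSV.
SAMPLE_CSV = """Entity,Code,Year,alcohol_litres_per_capita,gdp_per_capita
Country A,AAA,2000,9.5,21000
Country A,AAA,2018,10.1,30500
Country B,BBB,2000,4.2,8000
Country B,BBB,2018,,9500
"""


def load_dataframe(source):
    """Read a CSV path or file-like object into a pandas DataFrame."""
    return pd.read_csv(source)


df = load_dataframe(io.StringIO(SAMPLE_CSV))
print(df.head())    # first rows, similar to what Code Interpreter displays
print(df.dtypes)    # column types inferred by pandas
print(df.isna().sum())  # count of missing values per column
```

Checking the inferred types and missing-value counts right after loading is a cheap way to audit whatever the tool reports back.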

Code Interpreter will display the dataframe much as a Jupyter notebook or an IDE would. Additionally, by selecting the drop-down icon next to “Show work”, you can see the code that Code Interpreter used to generate the corresponding result.

Next, let us clean the data so it is more manageable, relevant to the present, and overall tidier to work with.
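The exact cleaning prompt is not reproduced in the text, so the following is only a sketch of the kind of steps such a request usually translates to: tidying column names, dropping rows that are missing either variable, and keeping recent years. The column names, the `min_year` cutoff, and the sample rows are all assumptions made for illustration.

```python
import pandas as pd


def clean(df, min_year=2000):
    """Tidy column names, drop incomplete rows, and keep recent years.

    The "alcohol", "gdp", and "year" columns and the default cutoff are
    assumptions, not the article's actual choices.
    """
    out = df.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))
    out = out.dropna(subset=["alcohol", "gdp"])  # need both variables
    out = out[out["year"] >= min_year]           # keep recent years only
    return out.reset_index(drop=True)


# Made-up rows for illustration only.
raw = pd.DataFrame({
    "Entity": ["A", "A", "B", "B"],
    "Year": [1995, 2010, 2010, 2015],
    " Alcohol": [5.0, 6.0, None, 3.0],
    "GDP": [1000.0, 2000.0, 1500.0, None],
})
tidy = clean(raw)
print(tidy)
```

Only the 2010 row for "A" survives here: the 1995 row is too old, and the two "B" rows each lack one of the variables.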

Now that the data has been transformed for greater usability, we can begin to explore the data and record our findings. I begin with the following prompt:

The results are the following images, all generated by ChatGPT’s Code Interpreter:

As you can see from the images above, the Code Interpreter was able to generate exactly what I asked for with regard to these descriptive-statistics visualizations. Referencing the last image, you should have noticed two anomalies: one, I forgot to ask for the top alcohol-consuming country in 2005 (was this planned, or did I simply forget? Maybe I have beef with the year 2005? You will never know); two (the more important anomaly), it only graphed three countries when I requested four. If you did your due diligence, as you should have before jumping straight into data cleaning and analysis, you would have realized from the data card description on Kaggle that the data only spans 2000 to 2018. Therefore, there is no top alcohol-consuming country for 2020 (based on the data provided; according to Statista, it was Seychelles).
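The missing fourth country is easy to catch with a quick range check. The sketch below, under assumed column names (`country`, `year`, `alcohol`) and with made-up rows, looks up the top consumer for each requested year and returns `None` for any year the data does not cover.

```python
import pandas as pd


def top_consumer_by_year(df, years):
    """Map each requested year to its top-consuming country, or None if absent."""
    result = {}
    for year in years:
        rows = df[df["year"] == year]
        if rows.empty:
            result[year] = None  # the year is outside the data's range
        else:
            result[year] = rows.loc[rows["alcohol"].idxmax(), "country"]
    return result


# Made-up rows for illustration only; column names are assumptions.
sample = pd.DataFrame({
    "country": ["A", "B", "A", "B"],
    "year": [2000, 2000, 2018, 2018],
    "alcohol": [9.0, 7.5, 8.0, 10.0],
})

print(sample["year"].min(), sample["year"].max())  # check the covered span first
tops = top_consumer_by_year(sample, [2000, 2018, 2020])
print(tops)
```

In this sample, 2020 maps to `None`, which mirrors why only three countries showed up in the chart.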

So far, the utility that ChatGPT’s Code Interpreter provides for the average data analyst is quite evident. Now that we have created some elementary data visualizations, let us look deeper into the relationship between alcohol consumed and GDP with a scatterplot.

Harkening back to the correlation matrix shown earlier in this guide, we saw a moderate positive correlation between the two variables, and I am sure many of you had a picture in mind of what a scatterplot of them might look like. Reducing our sample to the observations in Europe, we see that the relationship between alcohol consumption and GDP varies across countries and years.
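As a rough sketch of this step, the snippet below filters to European rows and computes the Pearson correlation between GDP and alcohol consumption. The `continent`, `gdp`, and `alcohol` column names are assumptions, and the rows are invented so the arithmetic is easy to follow.

```python
import pandas as pd


def europe_correlation(df):
    """Return the European subset and the Pearson r between GDP and alcohol."""
    europe = df[df["continent"] == "Europe"]
    r = europe["gdp"].corr(europe["alcohol"])  # Pearson by default
    return europe, r


# Made-up rows for illustration only; column names are assumptions.
sample = pd.DataFrame({
    "continent": ["Europe", "Europe", "Europe", "Europe", "Asia"],
    "gdp": [10.0, 20.0, 30.0, 40.0, 25.0],
    "alcohol": [2.0, 3.0, 5.0, 4.0, 1.0],
})

europe, r = europe_correlation(sample)
print(len(europe), round(r, 2))
```

With matplotlib installed, `europe.plot.scatter(x="gdp", y="alcohol")` would draw the corresponding scatterplot.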

Since we have now narrowed our focus to Europe, and we want to gain further insight into GDP and alcohol consumption, we are going to apply k-means clustering to our scatterplot:

Code Interpreter is able to successfully cluster the data into identifiable categories, setting up the data as a whole for further extrapolation and analysis.
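The code behind the clustering is not shown in the text, so here is one plausible sketch using scikit-learn. The number of clusters, the choice to standardize both variables first (GDP and litres of alcohol sit on very different scales), the column names, and the sample rows are all assumptions.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def add_clusters(df, k=2, seed=0):
    """Return a copy of df with a k-means 'cluster' label per row."""
    # Standardize so GDP's larger magnitude does not dominate the distances.
    features = StandardScaler().fit_transform(df[["gdp", "alcohol"]])
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    out = df.copy()
    out["cluster"] = model.fit_predict(features)
    return out


# Made-up rows for illustration only: two clearly separated groups.
sample = pd.DataFrame({
    "gdp": [5000.0, 6000.0, 5500.0, 40000.0, 42000.0, 41000.0],
    "alcohol": [4.0, 4.5, 4.2, 11.0, 10.5, 11.5],
})

clustered = add_clusters(sample, k=2)
print(clustered)
```

The cluster numbers themselves are arbitrary labels; what matters is which rows share a label. On real data, the value of `k` deserves its own check rather than being taken on faith from the tool.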

To conclude, Code Interpreter can do a number of revolutionary things to assist data analysts in their day-to-day assignments, from quickly cleaning data, to creating visuals, to saving time on preparing the data for more complex, hands-on statistical analyses. This tool enables data scientists to spend less time on tedious, though important, tasks and more time and energy on complex calculations and insights. Data analysts and developers should not fear this tool taking their jobs; they should embrace it as a way to evolve their careers by using it to enhance their auditing and efficiency. Just like baseline ChatGPT, Code Interpreter does make mistakes on quite a few subjects and does not perform well when it comes to complex mathematics and context. For many, this tool should be used for auditing and expediting trivial data tasks, though its potential truly is limitless, especially with future updates to come.

Thank you for taking the time to read my article; hopefully I was able to show you how ChatGPT’s Code Interpreter can assist you in daily data tasks and chores!

LinkedIn: www.linkedin.com/in/conor-x-brogan

Additional Note: Originally, I was going to conclude this article by having Code Interpreter export a CSV of the most recent dataframe we constructed, to demonstrate the tool’s ability to generate files and to illustrate how it supports the natural workflow of data science project management. It was going to be great: I would have left all my readers on a cliffhanger by having ChatGPT export the dataframe as a file named “Datasetforpt2”. It would have generated hype and garnered interest, but that was all ruined. For some odd reason, after many hours of repeating this tutorial through Code Interpreter, the one time it did not work was in the environment I used for this article. Well, now you can look forward to part two in the future!