Curious about the new ChatGPT Code Interpreter plug-in? Here’s how to get started.

MargaretEfron
Learning Data
8 min read · Aug 17, 2023


Photo by Lukas Blazek on Unsplash

I finally bit the bullet and got a membership to ChatGPT Plus, which costs $20 per month.

I mainly got it to try out the new Code Interpreter plug-in, which launched in July 2023.

The Code Interpreter is different from regular ChatGPT because it can:

  • write, execute, and debug computer code in a Python environment
  • carry out complex calculations
  • generate data visualizations based on user-uploaded files

In this article, I will walk through:

  • setting up the Code Interpreter and making sure it’s turned on
  • uploading a file of raw business data into ChatGPT Plus
  • analyzing the data — creating data visualizations, finding outliers, and more!

Let’s get to work!

Step 1: Make sure Code Interpreter is turned on in ChatGPT

Once you’ve paid the $20 (per month) for the ChatGPT membership, follow these steps:

  1. Click the icon in the bottom left of your screen (where your name and picture are located).
  2. Click on “Settings & Beta,” located below “Help & FAQ.”
To turn on the ChatGPT Code Interpreter feature, click on “Settings & Beta.”

  3. Click on “Beta features” and toggle on the “Code Interpreter” feature.

To turn on Code Interpreter, go to “Beta Features” under settings and toggle on the Code Interpreter feature.

  4. Make sure that the Code Interpreter feature is turned on for your chat. At the top of your chat, it should say “Code Interpreter,” as in the picture below.

Make sure the Code Interpreter is toggled on for your chat!

Step 2: Upload a file of raw business data (in a CSV, Excel spreadsheet, SQL database, etc.)

Next, choose a file with a dataset you want to analyze. For this example, I used the Maven Analytics Data Playground, which has free, downloadable, real-world datasets about New Year’s Eve resolutions, online chess games, Harry Potter scripts, S&P stock prices, and more.

Maven’s Data Playground has free, real-world data sets which are easy to download.

I downloaded the “Parental Leave Policies” dataset from the Maven Analytics Data Playground. It is a CSV with crowdsourced parental leave data from 1,601 companies. Maven Analytics also provides a list of recommended analysis questions for this dataset.

Recommended data analysis for Maven Analytics dataset.

Below, I uploaded the “Parental Leave Policies” CSV into the Code Interpreter.

Submit the file by clicking on the plus sign next to “Send a message”

After I uploaded the ZIP file, the Code Interpreter gave me a list of the files which are included and a list of the variables in the dataset, including company, industry, paid maternity leave (weeks), unpaid maternity leave (weeks), paid paternity leave (weeks), and unpaid paternity leave (weeks).
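If you want to see roughly what this first inspection looks like outside ChatGPT, here is a minimal pandas sketch. The column names and the three rows are placeholders made up for illustration (the real file's headers may be spelled differently), and Code Interpreter's own code may differ.

```python
import io
import pandas as pd

# Stand-in for the uploaded CSV. Column names and rows are hypothetical.
csv_text = """Company,Industry,Paid Maternity Leave,Unpaid Maternity Leave,Paid Paternity Leave,Unpaid Paternity Leave
Company A,Technology,16,0,8,0
Company B,Retail,6,6,2,0
Company C,Finance,12,4,4,2
"""

# With a real file you would pass its path instead of a StringIO object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)             # (number of rows, number of columns)
print(df.columns.tolist())  # the variables in the dataset
print(df.head())            # the first few rows
```

Checking the shape and column list first makes it easier to spot a mismatch later, for example if the row count is not the 1,601 companies you expect.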

Warning: files will not persist beyond a single session in the Code Interpreter. For example, I uploaded the “Parental Leave Policies” CSV yesterday for analysis, and when I came back today to ask more questions, the Code Interpreter was unable to answer my questions. I had to re-upload the file.

Step 3: Ask Code Interpreter to create a data visualization.

Q: Is maternity leave typically longer than paternity leave? (Maven Analytics). Short answer: Yes.

Code Interpreter answered this question by calculating the average paid maternity leave and the average paid paternity leave. Since the average paid maternity leave is greater, the Code Interpreter rightly concluded that maternity leave is typically longer. (Note: I checked the average paid maternity and paternity leave in Excel and got the same answer, so the Code Interpreter was correct.)

Code Interpreter calculated the average paid maternity and paternity leaves across all companies in the dataset.
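You can run the same check yourself. The sketch below assumes hypothetical column names and uses made-up values, so the numbers it prints are not the article's results.

```python
import pandas as pd

# Made-up rows for illustration; None stands for a missing value.
df = pd.DataFrame({
    "Company": ["Company A", "Company B", "Company C", "Company D"],
    "Paid Maternity Leave": [16, 6, 12, 10],
    "Paid Paternity Leave": [8, None, 4, 0],
})

# mean() skips missing values by default, so each average
# only covers the companies that reported that leave type.
averages = df[["Paid Maternity Leave", "Paid Paternity Leave"]].mean()
print(averages)

maternity_longer = averages["Paid Maternity Leave"] > averages["Paid Paternity Leave"]
print("Maternity leave longer on average:", maternity_longer)
```

An average is only one reading of “typically.” Comparing medians, or counting the companies where maternity leave exceeds paternity leave, would be reasonable cross-checks.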

I asked Code Interpreter to turn this into a data visualization, but I didn’t specify what type. Code Interpreter gave me a bar chart comparing the average lengths of paid maternity and paternity leave:

Code Interpreter produced a bar chart to compare average paid maternity and paternity leave.

The bar chart has an appropriate title, the x and y axes are labeled correctly, the y-axis is plotted using regular intervals, and the average values are correctly charted. This chart is easy to read. I’m impressed!
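For reference, a chart with those same elements (title, labeled axes, two bars) takes only a few lines of matplotlib. This is a sketch with placeholder averages, not the code Code Interpreter actually ran.

```python
import io
import matplotlib
matplotlib.use("Agg")  # draw off-screen; not needed inside a notebook
import matplotlib.pyplot as plt

# Placeholder averages in weeks, not the values from the dataset.
averages = {"Paid maternity leave": 11.0, "Paid paternity leave": 4.0}

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(list(averages.keys()), list(averages.values()))
ax.set_title("Average paid parental leave")
ax.set_xlabel("Leave type")
ax.set_ylabel("Average weeks")
fig.tight_layout()

# Save to an in-memory buffer; pass a filename instead to write a PNG to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```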

Step 4: Ask Code Interpreter to find outliers in your dataset.

Q: Which companies offer the most paid parental leave weeks? (Maven Analytics).

The Code Interpreter pointed out that since “parental leave” can encompass both maternity and paternity leave, it would combine the two and find which companies offer the most in total.

It listed the top 5 companies by total paid parental leave weeks (maternity and paternity leave combined).

Code Interpreter provided me with a list of the top 5 companies with the most paid parental leave weeks.

However, I wanted to double-check this data in ChatGPT. I saw that #5, Hewlett Packard, offers 52 weeks of paid parental leave (combining maternity and paternity leave).

I asked ChatGPT: “Are there any other companies that tie for parental leave at 52 weeks?” Code Interpreter gave me a third company (Salesforce) that it had left out of its original top 5.

Warning: When Code Interpreter gives you a list of the “Top 5,” “Top 10,” etc., double-check that no other entries in your dataset are tied with #5 or #10.

Companies tied for paid parental leave at 52 weeks (combining both maternity and paternity leave.)
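If you run the analysis yourself in pandas, one way to guard against this is the `keep="all"` option of `DataFrame.nlargest`, which returns every row tied with the last place instead of cutting the list off. The sketch below uses made-up companies and hypothetical column names.

```python
import pandas as pd

# Made-up values for illustration.
df = pd.DataFrame({
    "Company": ["Company A", "Company B", "Company C", "Company D", "Company E"],
    "Paid Maternity Leave": [52, 26, 26, 20, 12],
    "Paid Paternity Leave": [52, 26, 26, 6, 4],
})
df["Total Paid Leave"] = df["Paid Maternity Leave"] + df["Paid Paternity Leave"]

# Default behavior: exactly 2 rows, even though Company C ties with Company B.
top2_strict = df.nlargest(2, "Total Paid Leave")

# keep="all": includes every row tied with the last place.
top2_with_ties = df.nlargest(2, "Total Paid Leave", keep="all")

print(top2_strict[["Company", "Total Paid Leave"]])
print(top2_with_ties[["Company", "Total Paid Leave"]])
```

You could also ask Code Interpreter directly to “include ties” in any top-N list.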

Step 5: Ask Code Interpreter about the distribution of the data

Q: What is the distribution of the parental leave weeks offered? (Maven Analytics).

Code Interpreter provided 2 histograms to separately plot the distribution of paid maternity and paternity leave weeks.

Code Interpreter provided a histogram to show the distribution of parental leave weeks.

I asked a follow-up question: “Can you analyze the histograms you just provided?”

Code Interpreter analyzed the histograms it provided for the paid maternity and paid paternity leave.

Code Interpreter had spot-on insights about paid maternity and paternity leave:

  • Both paid maternity and paternity leave are clustered around the 0–5 week range.
  • Some companies offer significantly longer parental leave than others.
  • Many companies do not offer paid paternity leave at all.

I asked Code Interpreter if it could overlay the best-fit normal distribution on the paid maternity leave histogram, so I could see how closely that model fits the data. Code Interpreter correctly overlaid the fitted normal curve on the histogram, as you can see below.

Code Interpreter fit a normal distribution over the histogram for paid maternity leave.
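A normal overlay like this can be sketched by estimating the mean and standard deviation from the data and plotting the matching density curve over a histogram drawn with `density=True`. The values below are placeholders, and Code Interpreter may have used a different fitting routine.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Placeholder weeks of paid maternity leave, not the real dataset.
weeks = np.array([0, 0, 2, 4, 6, 6, 8, 12, 12, 12, 16, 16, 20, 26, 52], dtype=float)

# For a normal distribution, the best-fit (maximum likelihood) parameters
# are the sample mean and the population standard deviation (ddof=0).
mu = weeks.mean()
sigma = weeks.std(ddof=0)

# Normal probability density evaluated across the observed range.
x = np.linspace(weeks.min(), weeks.max(), 200)
pdf = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

fig, ax = plt.subplots()
# density=True rescales the bars so they share a y-axis with the curve.
ax.hist(weeks, bins=10, density=True, alpha=0.6, label="Observed")
ax.plot(x, pdf, label=f"Normal fit (mean={mu:.1f}, sd={sigma:.1f})")
ax.set_xlabel("Paid maternity leave (weeks)")
ax.set_ylabel("Density")
ax.legend()
```

Leave data cannot go below zero and has a long right tail, so expect the normal curve to fit loosely. The overlay is useful mainly for seeing where it fails.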

Step 6: Ask Code Interpreter if there is a noticeable difference between two variables.

Q: Are there noticeable differences between industries? (Maven Analytics). Short Answer: Yes.

Code Interpreter analyzed the average paid maternity and paternity leave for each industry. I asked Code Interpreter to create a data visualization, which it did by focusing on the top 10 industries by average paid maternity leave.

Note: Code Interpreter just focused on the top 10 industries for clarity, since that would make an easier-to-read bar graph. However, the original dataset contained a total of 185 unique industries.

Code Interpreter provided a bar graph showing the average paid maternity and paternity leave for the top 10 industries.

I was impressed by this bar graph. It has a correct title, correctly labeled x- and y-axes, and a legend. However, the x-axis is hard to read since the industry names are so long.

There is clearly variation between different industries, with some industries offering more maternity and paternity leave than others. Among these top 10, Transportation offers the highest average paid maternity leave (52 weeks) and Consumer Packaged Goods offers the lowest (17 weeks).

Note that this may be misleading since the Code Interpreter only focuses on the top 10 industries and leaves out the other 175 industries. The industry with the lowest paid maternity leave OVERALL is Natural Resources: Agrochemical, which has an average of 0 weeks.
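To see both the top of the ranking and the true minimum, you can compute the per-industry averages once and then look at each end of the sorted table. This is a sketch with made-up industries and hypothetical column names.

```python
import pandas as pd

# Made-up rows for illustration.
df = pd.DataFrame({
    "Industry": ["Tech", "Tech", "Retail", "Retail", "Finance", "Agriculture"],
    "Paid Maternity Leave": [16, 20, 6, 8, 12, 0],
    "Paid Paternity Leave": [8, 12, 2, 2, 6, 0],
})

# Average leave per industry, sorted from most to least paid maternity leave.
by_industry = (
    df.groupby("Industry")[["Paid Maternity Leave", "Paid Paternity Leave"]]
    .mean()
    .sort_values("Paid Maternity Leave", ascending=False)
)

top_n = by_industry.head(2)            # what a "top N" chart shows
lowest_overall = by_industry.index[-1]  # what a "top N" chart hides

print(top_n)
print("Lowest overall:", lowest_overall)
```

With 185 industries across 1,601 companies, many industries will contain only a handful of companies, so it is also worth checking the group sizes (for example with `df["Industry"].value_counts()`) before reading too much into any one average.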

Final Thoughts: There is room for improvement

I’m impressed by the Code Interpreter’s capabilities: being able to upload and analyze datasets quickly is a game-changer.

However, I still think we need a human eye to look over the original dataset.

Sometimes the Code Interpreter gave me false information.

For example, when I asked the Code Interpreter to list companies with 0 weeks of paid maternity leave, it gave me a list of 10 companies but pointed out that in total, there were 59 companies. It provided me with a link to a downloadable CSV file with those companies.

In the downloadable CSV file, there were only 53 companies with 0 weeks of paid maternity leave, even though the Code Interpreter said there were 59 companies.

I asked the Code Interpreter to explain the discrepancy, and it was able to resolve the issue. But if I had been working fast and hadn’t caught that difference, I would have reported incorrect data.

I asked the Code Interpreter to explain the discrepancy between the number of companies it stated and the number of companies in the CSV.
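One way to catch this kind of mismatch is to derive the stated count and the exported list from the same filter, then re-read the export and compare. This is a minimal sketch with made-up rows and hypothetical column names.

```python
import io
import pandas as pd

# Made-up rows for illustration; None stands for a missing value.
df = pd.DataFrame({
    "Company": ["Company A", "Company B", "Company C",
                "Company D", "Company E", "Company F"],
    "Paid Maternity Leave": [0, 12, 0, None, 6, 0],
})

# A single filter drives both the count and the export.
zero_mask = df["Paid Maternity Leave"] == 0
stated_count = int(zero_mask.sum())

# Export the matching rows (to memory here; a filename works the same way).
buffer = io.StringIO()
df[zero_mask].to_csv(buffer, index=False)

# Read the export back and confirm it matches the stated count.
exported = pd.read_csv(io.StringIO(buffer.getvalue()))
print(stated_count, len(exported))
assert stated_count == len(exported), "count and export disagree"
```

Missing values are a common source of such gaps: a blank cell is not equal to 0, so it is excluded by this filter but could be counted as zero by a differently written one.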

The Code Interpreter often gave me the “Top 5” or “Top 10” of the companies (e.g. “Top 5 companies with highest paternity leave”), when the 5th or 10th company was tied with other companies that weren’t listed.

It does not give an accurate representation of the data to list the “Top 5” companies with the highest paternity leave if the 5th company has the same # of weeks as the next 20 companies.

The Code Interpreter did NOT tell me how it dealt with N/A or blank values or how they affected its calculations.

The CSV file had “N/A” and blank values for some of the categories, including Paid Paternity Leave. I’m assuming that it left blanks and N/A out of the analysis, but ideally, it would have alerted me to the blank values so I could figure out how I wanted to handle them.
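If you load the file with pandas yourself, you can see how much the choice matters. By default, `pd.read_csv` treats both blank cells and the string “N/A” as missing, and `mean()` skips them. The rows below are made up and the column names are hypothetical.

```python
import io
import pandas as pd

# Made-up rows: one "N/A" and one blank in the paternity column.
csv_text = """Company,Paid Maternity Leave,Paid Paternity Leave
Company A,16,8
Company B,6,N/A
Company C,12,
Company D,10,0
"""
df = pd.read_csv(io.StringIO(csv_text))

# How many values are missing in each column?
missing_per_column = df.isna().sum()
print(missing_per_column)

# Option 1: ignore missing values (the pandas default).
mean_skipping = df["Paid Paternity Leave"].mean()

# Option 2: treat missing as "no paid leave offered".
mean_as_zero = df["Paid Paternity Leave"].fillna(0).mean()

print(mean_skipping, mean_as_zero)
```

Neither option is automatically right. Whether a blank means “zero weeks” or “unknown” is a judgment call about the data, which is why it helps to ask Code Interpreter explicitly how it treated missing values.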

If you are analyzing the same dataset over many chat sessions, re-uploading it to ChatGPT every time gets annoying.

---

So far, I’ve enjoyed playing around with the ChatGPT Plus plug-ins. I will keep you posted on what I learn! Stay tuned for more articles about ChatGPT Plus’s pros and cons.

Data Source: Maven Analytics Data Playground


I love all things data and write about Excel, Power BI, and SQL. I currently work as a Business Systems Analyst at the Darden School of Business.