ChatGPT “Advanced Data Analysis” Fails

MargaretEfron
Learning Data
5 min read · Oct 5, 2023


Photo by David Pupăză on Unsplash

I recently upgraded my ChatGPT membership to ChatGPT Plus, which gives me access to the “Advanced Data Analysis” plug-in (formerly “Code Interpreter”). This plug-in is helpful for analyzing data, especially consolidating information from multiple Excel spreadsheets.

The game-changing feature of ChatGPT Plus is that you can upload documents for ChatGPT to analyze, and it can also create documents in different formats (Word, Excel, CSV, PowerPoint, etc.) that you can download.

Sounds perfect, right? I guess we don’t need any background data knowledge at all!

Not exactly... the Advanced Data Analysis plug-in does have some limitations, which I ran into while doing my analysis. Learn from my failures below!

ChatGPT “Advanced Data Analysis” Fails

1. The chat times out.

Error message when your chat times out.

When the chat times out, you will see the error message above. You won’t be able to reference a file that you uploaded earlier in the chat. If you want to analyze the document further, you need to re-upload it and ask ChatGPT to read and analyze the file again.

Lesson learned: Have your questions ready for ChatGPT, and save any files that you may need to re-analyze later.

2. Weird metaphors or data analogies that do not make sense!

I use ChatGPT to help me explain technical concepts to coworkers without a technical background. In this case, I asked ChatGPT to explain the difference between an average and median value of a dataset.

My question about the difference between an average and a median.

Most of ChatGPT’s output was helpful, but at the end it gave a bizarre explanation comparing an average and a median to eating a pie. The metaphor is unhelpful, and sharing it with my coworkers would only leave them more confused!

ChatGPT pie analogy

Lesson learned: Read through any output by ChatGPT — don’t just mindlessly copy and paste it to send to your coworkers and clients. Make sure any data analysis actually makes sense.
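
For a plainer illustration than the pie, here is a minimal sketch using Python's built-in statistics module. The salary figures are made up for illustration and are not from my analysis. One unusually large value pulls the average up, while the median stays at the middle value.

```python
from statistics import mean, median

# Hypothetical salaries (in thousands of dollars); the last value is an outlier.
salaries = [40, 45, 50, 55, 300]

# Average: add up every value and divide by how many values there are.
average = mean(salaries)

# Median: sort the values and take the one in the middle.
middle = median(salaries)

print("Average:", average)
print("Median:", middle)
```

With these numbers the average works out to 98 and the median to 50, so the single outlier nearly doubles the average but leaves the median untouched. That contrast is usually all a non-technical audience needs.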

3. Data Visualization Fail: Graph Not in Chronological Order

Based on data I provided, ChatGPT created a line chart showing the trend in fall enrollment over the previous academic years.

My prompt for ChatGPT to create a data visualization.

Take a close look at the line chart ChatGPT created:

ChatGPT graph of trends in fall enrollment

Why is this confusing? Look at the X-axis. Typically, the X-axis has years going in chronological order from left to right (earlier years on the left and later years on the right). Here, the X-axis runs backwards.

To fix it, I would prompt ChatGPT: “Flip this chart so it increases in time as it moves to the right.”

Lesson learned: Tell ChatGPT any constraints upfront, in terms of X- and Y-axis labels, titles, colors, order of years, etc. Analyze any ChatGPT-created data visual very closely, including the X- and Y-axis labels. Communicate with ChatGPT in the chat to correct any errors you find.
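
If you would rather fix the ordering yourself, the sketch below shows one way to do it, assuming the data arrives newest-first. The enrollment numbers and the plot_enrollment helper are hypothetical, and the plotting step assumes matplotlib is installed. The key step is sorting by year before anything is drawn.

```python
# Hypothetical fall enrollment figures, listed newest-first the way a
# spreadsheet export might arrive. These numbers are made up.
enrollment = [
    ("2022", 7150),
    ("2021", 7110),
    ("2020", 7060),
    ("2019", 6890),
]

# Sort by year so earlier years end up on the left of the X-axis.
chronological = sorted(enrollment, key=lambda row: int(row[0]))
years = [year for year, _ in chronological]
students = [count for _, count in chronological]


def plot_enrollment(years, students):
    """Draw the line chart (hypothetical helper; requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(years, students, marker="o")
    ax.set_xlabel("Fall term")
    ax.set_ylabel("Students enrolled")
    ax.set_title("Fall enrollment by year")
    return fig
```

Calling plot_enrollment(years, students) should then draw the years left to right in the sorted order, since the year labels are plotted in the order they are passed in.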

4. Data Visualization Fail: No Trend Lines, Unreadable Colors

I asked ChatGPT to make visualizations of Vanderbilt’s admissions rate, using the Vanderbilt colors, and drawing trend lines on all charts.

My ChatGPT input
ChatGPT output — no trend lines, and unreadable colors

What is confusing here?

  • There are no trend lines, even though I requested trend lines be drawn on all charts.
  • Although the legend has different shades of green for the different variables, all the lines on the graph are yellow, and we can’t tell the difference between the variables.

I asked ChatGPT to add trend lines, but it added them in yellow and black, which made them very hard to see. The legend still shows varying shades of green for each variable.

Trend lines are added — but they are almost impossible to see

Lesson learned: ChatGPT is not a human, and can’t see colors like a human. It’s best to be specific about which colors to use and what is or isn’t visible in the graph.
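
Here is a rough sketch of what "a trend line in a readable color" can look like in code. It fits a simple straight-line (least-squares) trend, which is only one possible kind of trend line and not necessarily what ChatGPT drew. The admission rates, the helper names, and the color choices are all placeholders of mine, and the plotting step assumes matplotlib is installed.

```python
def linear_trend(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# Hypothetical admission rates (percent) by year; made-up values.
years = [2019, 2020, 2021, 2022]
admit_rate = [10.0, 9.0, 8.0, 7.0]

slope, intercept = linear_trend(years, admit_rate)
trend = [slope * year + intercept for year in years]


def plot_with_trend(years, values, trend):
    """Plot the data and its trend line (hypothetical helper; requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # Explicit, high-contrast colors: placeholders, not official school colors.
    ax.plot(years, values, color="black", marker="o", label="Admission rate")
    ax.plot(years, trend, color="tab:red", linestyle="--", label="Trend")
    ax.legend()
    return fig
```

Naming the colors explicitly, rather than leaving them to defaults, is the same habit the lesson above recommends for prompts.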

5. Inconsistency Between Graphs

In data visualization, I always try to keep the variables, colors, and labels as consistent as possible between graphs in order to avoid any confusion.

Although it’s analyzing the same set of data, in one line graph, ChatGPT makes the yield rate green, and in another line graph, the yield rate is red. For the average viewer, this would be confusing.

Yield rate is green in this graph from ChatGPT…
But the yield rate is red in this other graph from ChatGPT.

Lesson learned: You should tell ChatGPT the specific colors you’d like it to use for your data visualization, and tell it to keep colors consistent across the charts. Double-check that the category labels, legends, colors, and axes are consistent and would make sense to the average viewer. Confirm this BEFORE sending it to a coworker, client, or friend, or it may be embarrassing for you.
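
One way to keep colors consistent, whether you are writing the chart code yourself or describing the rule to ChatGPT, is a single lookup table that every chart reuses. This is a minimal sketch: the series names, the hex codes, and the colors_for helper are placeholders for illustration.

```python
# One shared mapping from series name to color, reused by every chart.
# The series names and hex codes are placeholders for illustration.
SERIES_COLORS = {
    "Admission rate": "#1b9e77",
    "Yield rate": "#d95f02",
}


def colors_for(series_names):
    """Look up each series' color; fail loudly if one has no assigned color."""
    missing = [name for name in series_names if name not in SERIES_COLORS]
    if missing:
        raise KeyError(f"No color assigned for: {missing}")
    return [SERIES_COLORS[name] for name in series_names]


# Two different charts ask for colors; "Yield rate" gets the same one in both.
chart_one_colors = colors_for(["Admission rate", "Yield rate"])
chart_two_colors = colors_for(["Yield rate"])
```

Because both charts read from the same table, the yield rate cannot end up green in one graph and red in another.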

Final Thoughts

Don’t expect ChatGPT to read your mind! Be purposeful in designing your prompt and tell it any constraints or requirements upfront. When ChatGPT gives an output, analyze it carefully. For a data visualization, make sure the colors, category labels, X- and Y-axes, titles, and chart types are consistent and make sense.

If ChatGPT’s first output is not perfect, no reason to worry — you can communicate with it in the chat to edit the data analysis as you wish.

Getting the hang of ChatGPT Advanced Data Analysis can take time. But once you know how to use ChatGPT like a data analyst pro, you can use it to its full potential!

For further learning, I highly recommend Jules White’s course on Coursera: ChatGPT Advanced Data Analysis.

The contents of external submissions are not necessarily reflective of the opinions or work of Maven Analytics or any of its team members.

We believe in fostering lifelong learning and our intent is to provide a platform for the data community to share their work and seek feedback from the Maven Analytics data fam.

Happy learning!

-Team Maven

MargaretEfron
Learning Data

I love all things data and write about Excel, Power BI, and SQL. I currently work as a Business Systems Analyst at the Darden School of Business.