From Code Interpreter to Advanced Analytics: ChatGPT’s Evolution in Data Analysis

In a previous article, we explored how to use ChatGPT’s Code Interpreter, a tool that promised to reshape coding and offer enhanced analysis capabilities. Since then, OpenAI has reworked Code Interpreter into the more sophisticated Advanced Data Analysis feature. The change signals a subtle revolution in data analytics, enabling data professionals of all kinds to focus less on routine coding and more on developing analytical insights. Below, we explore the added robustness of ChatGPT’s Advanced Data Analysis and how further development in prompt engineering has led to more accurate outcomes.

Unlike our previous article, this piece won’t provide a sequential, step-by-step guide. Instead, it offers a brief showcase of the capabilities of ChatGPT’s Advanced Data Analysis tool. The showcase is not exhaustive, so I encourage readers to experiment firsthand; the examples use the following dataset, found here, by Ulrik Thyge Pedersen on Kaggle. The guidelines from the previous article remain pertinent, as Advanced Data Analysis is accessed the same way as the former Code Interpreter application.

The initial demonstration of ChatGPT’s Advanced Data Analysis, presented below, mirrors the ease and simplicity of the examples in our previous article. I prompted the system with “Create a data visualization of mass vs height, color code each observation by character name.” Given the vast number of character names in the dataset, color coding by name was never likely to be optimal; however, this is a good starting point for further analysis and visualization.

The subsequent visualization showcases the enhanced capabilities and refinement of the Advanced Data Analysis compared to the earlier Code Interpreter. When prompted with “Remove the outlier from the graph,” the system efficiently excluded the outlier (Jabba the Hutt) and automatically adjusted the graph’s title and scale to represent this modification.
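
The article does not show the code ChatGPT generated for this step. A minimal sketch of what an outlier-removal step might look like follows, assuming the outlier is simply the single heaviest row. The rows are a small hand-typed sample for illustration and may not match the Kaggle file exactly; the function name drop_heaviest is hypothetical.

```python
# Illustrative rows only: (name, height_cm, mass_kg), not the full Kaggle file.
characters = [
    ("Luke Skywalker", 172, 77),
    ("C-3PO", 167, 75),
    ("Darth Vader", 202, 136),
    ("Chewbacca", 228, 112),
    ("Jabba Desilijic Tiure", 175, 1358),
    ("Yoda", 66, 17),
]


def drop_heaviest(rows):
    """Return the rows without the single highest-mass entry, plus that entry."""
    outlier = max(rows, key=lambda row: row[2])
    kept = [row for row in rows if row is not outlier]
    return kept, outlier


kept, outlier = drop_heaviest(characters)
print(f"Removed {outlier[0]} ({outlier[2]} kg); {len(kept)} rows remain")
```

A real analysis would more likely use a rule such as a z-score or interquartile-range threshold; the single-maximum rule is used here only because the prompt refers to one obvious outlier.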

Next, we will conduct a straightforward linear regression. By enhancing the existing visualization with the command “Create a red line of best fit through the data,” it becomes evident that ChatGPT can seamlessly integrate regression models into its visual outputs. For clarity, I have omitted the color coding by character name to emphasize the significance of generating regressions via ChatGPT.
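
For readers curious about what a line of best fit involves, here is a minimal sketch of ordinary least squares written with the standard library only. It is not the code ChatGPT produced, which the article does not show; the sample heights and masses are illustrative, and least_squares_line is a hypothetical helper name.

```python
def least_squares_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative heights (cm) and masses (kg), with the outlier already excluded.
heights = [172, 167, 202, 228, 66]
masses = [77, 75, 136, 112, 17]

slope, intercept = least_squares_line(heights, masses)
print(f"mass = {slope:.3f} * height + {intercept:.2f}")
```

Plotting the resulting line over the scatter is then a matter of evaluating slope * x + intercept at the smallest and largest heights.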

Although the Code Interpreter was adept at handling basic regressions, it had limitations when faced with more mathematically intensive tasks, such as generating a LOESS curve. While plotting the mass against the height of unique Star Wars characters may not typically necessitate a LOESS curve, I have chosen this approach to illustrate the Advanced Data Analysis tool’s capability to manage and produce intricate regression analyses when prompted.
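
To make the idea of a LOESS curve concrete, the sketch below fits a small weighted straight line around each point, using tricube weights over the nearest neighbors. It is a simplified illustration under stated assumptions (no robustness iterations, one fixed neighborhood fraction), not the code ChatGPT ran; in practice a library routine such as the lowess function in statsmodels would likely be used. The sample data and the name loess are illustrative.

```python
def loess(xs, ys, frac=0.6):
    """Smooth ys over xs with locally weighted linear fits (tricube weights)."""
    n = len(xs)
    k = max(2, int(round(frac * n)))  # number of neighbors in each local fit
    smoothed = []
    for x0 in xs:
        # Bandwidth: distance to the k-th nearest point (x0 itself counts).
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1.0  # guard against a zero bandwidth
        # Tricube weights fall to zero at the edge of the neighborhood.
        w = [max(0.0, 1 - (abs(x - x0) / h) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        # Evaluate the local weighted line at x0.
        smoothed.append(my + slope * (x0 - mx))
    return smoothed


# Illustrative heights (cm) and masses (kg), sorted by height.
heights = [66, 96, 167, 172, 202, 228]
masses = [17, 32, 75, 77, 136, 112]
curve = loess(heights, masses)
print([round(value, 1) for value in curve])
```

A smaller frac follows the data more closely, while a larger one produces a smoother curve.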

Highlighting a feature carried over from the Code Interpreter, ChatGPT’s Advanced Data Analysis adeptly executes K-means clustering. For clarity, I have omitted the LOESS curve from the preceding visualization to provide an unobstructed view of the K-means clustering process.
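
K-means itself is compact enough to sketch in a few lines. The version below is Lloyd's algorithm on (height, mass) pairs, written for illustration rather than taken from ChatGPT's output. Two assumptions are worth flagging: the starting centers are chosen by a deterministic farthest-first rule so the result is repeatable, and the features are left unscaled, whereas a real analysis would usually standardize them first. The sample points are illustrative.

```python
def kmeans(points, k, iterations=100):
    """Cluster 2-D points with Lloyd's algorithm; returns (labels, centers)."""

    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    # Deterministic farthest-first seeding (an assumption made for repeatability).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))

    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centers


# Illustrative (height_cm, mass_kg) pairs.
points = [(66, 17), (96, 32), (167, 75), (172, 77), (202, 136), (228, 112)]
labels, centers = kmeans(points, k=2)
print(labels, centers)
```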

Transitioning to the analysis of Star Wars characters’ homeworlds, the Advanced Data Analysis tool offers a significant improvement in creating detailed heatmaps compared to the earlier Code Interpreter. With the Code Interpreter, heatmaps occasionally suffered from optimization issues and inaccuracies. Advanced Data Analysis, however, has made heatmaps easier to create, modify, and present. It is worth noting that, for this demonstration, I restructured the original dataset by separating each value in the ‘films’ column into its own distinct column, facilitating a more comprehensive data analysis.
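
The restructuring step can be sketched as follows. The example assumes the ‘films’ column packs titles into one comma-separated string, which may not match the real file's separator or column names, and the rows are hypothetical placeholders rather than real records. It splits the packed value and counts characters per homeworld and film, which is the grid a heatmap would color.

```python
from collections import defaultdict

# Hypothetical rows; the real file's column names and film separator may differ.
rows = [
    {"name": "Character A", "homeworld": "Tatooine", "films": "A New Hope, Return of the Jedi"},
    {"name": "Character B", "homeworld": "Tatooine", "films": "A New Hope"},
    {"name": "Character C", "homeworld": "Naboo", "films": "The Phantom Menace, Attack of the Clones"},
]


def split_films(row, sep=","):
    """Turn the packed 'films' string into a clean list of titles."""
    return [title.strip() for title in row["films"].split(sep) if title.strip()]


def homeworld_film_counts(rows):
    """Count characters per (homeworld, film) pair: the cells of a heatmap."""
    counts = defaultdict(lambda: defaultdict(int))
    for row in rows:
        for film in split_films(row):
            counts[row["homeworld"]][film] += 1
    return counts


counts = homeworld_film_counts(rows)
for homeworld, per_film in counts.items():
    print(homeworld, dict(per_film))
```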

To showcase the Advanced Data Analysis tool’s improved capacity for information retention throughout a dialogue, I instructed ChatGPT to refer back to the previously produced heatmap and undertake a basic probability analysis.

Leveraging GPT-4 and Advanced Data Analysis, ChatGPT offers users comprehensive guidance on executing sophisticated data analyses through clear, actionable steps. With the probability analysis shown below, ChatGPT delivers precise answers derived from the provided data and its self-generated visualizations. Given Star Wars’ focus on the Skywalker lineage rooted in Tatooine, a 60% probability of a character from this homeworld appearing in more than two movies is unsurprising. The 27.27% probability for a character from Naboo is also logical, considering pivotal roles like Padmé and Emperor Palpatine. This exercise highlights Advanced Data Analysis’s aptitude for statistical techniques and, although not demonstrated here, its capability for causal inference and predictive analysis.
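
The probabilities quoted here are conditional frequencies: the share of characters from a given homeworld who appear in more than two films. The article does not state the underlying counts, so the sketch below uses hypothetical film counts chosen only because 3 of 11 works out to 27.27% (and, by the same arithmetic, 6 of 10 would give 60%). The function name prob_more_than and the film_count field are illustrative.

```python
def prob_more_than(rows, homeworld, min_films=2):
    """Share of characters from `homeworld` appearing in more than `min_films` films."""
    group = [r for r in rows if r["homeworld"] == homeworld]
    if not group:
        return None  # no characters from that homeworld
    hits = sum(1 for r in group if r["film_count"] > min_films)
    return hits / len(group)


# Hypothetical film counts chosen so that 3 of 11 characters exceed two films.
film_counts = [3, 3, 5, 1, 1, 1, 2, 2, 2, 1, 1]
rows = [{"homeworld": "Naboo", "film_count": n} for n in film_counts]

naboo = prob_more_than(rows, "Naboo")
print(f"{naboo:.2%}")
```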

To wrap up, the following visualization was crafted to highlight pivotal statistical findings. Though Jabba the Hutt stood out due to his mass, numerous characters exceeded his height across the films. The graph presented focuses on characters taller than Jabba, with a color-coded representation based on species, and annotations denoting each character’s name. Visually, it’s apparent that Humans display the most height variation, perhaps a decision by George Lucas to mirror real-world diversity in the Star Wars universe. Notably, Chewbacca towers above all, consistent with the stature of Wookiees. Intriguingly, despite being the same individual, Darth Vader and Anakin Skywalker show significant height and mass discrepancies, likely attributable to Vader’s weighty cybernetic prosthetics. Such analyses, once again, capture the capabilities of ChatGPT’s Advanced Data Analysis, enriching the analytical depth available to data professionals and promoting diverse interpretative angles in their work.
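
The filtering behind a chart like this can be sketched as below: look up the reference character's height, keep everyone taller, and group the result by species so each group can get its own color. The rows are a small hand-typed sample for illustration and may not match the Kaggle file exactly; taller_than is a hypothetical helper name.

```python
from collections import defaultdict

# Illustrative rows only: (name, species, height_cm).
characters = [
    ("Luke Skywalker", "Human", 172),
    ("Anakin Skywalker", "Human", 188),
    ("Darth Vader", "Human", 202),
    ("Chewbacca", "Wookiee", 228),
    ("Jabba Desilijic Tiure", "Hutt", 175),
]


def taller_than(rows, reference_name):
    """Group characters taller than the reference character by species."""
    reference_height = next(h for name, _, h in rows if name == reference_name)
    by_species = defaultdict(list)
    # Walk from tallest to shortest so each species list is ordered by height.
    for name, species, height in sorted(rows, key=lambda r: r[2], reverse=True):
        if height > reference_height:
            by_species[species].append((name, height))
    return dict(by_species)


groups = taller_than(characters, "Jabba Desilijic Tiure")
for species, members in groups.items():
    print(species, members)
```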

In summary, ChatGPT’s initial offering, the Code Interpreter, was already a promising analytical tool, and it has matured into the more potent and versatile Advanced Data Analysis. While this article provided only a glimpse into the variety of potential visualizations and analyses available, I hope that business and data professionals will be encouraged to explore the capabilities of ChatGPT’s Advanced Data Analysis and recognize its potential to elevate their analytical work.

Thank you for taking the time to read my article; hopefully I was able to show you how ChatGPT’s Advanced Data Analysis tool can help everyday professionals generate valuable insights!

LinkedIn: www.linkedin.com/in/conor-x-brogan
