Exploring ChatGPT Code Interpreter (Advanced Data Analysis)

Dave Hulbert
10 min read · Aug 4, 2023

Update: OpenAI has renamed Code Interpreter to “Advanced Data Analysis” but it appears to just be a name change.

OpenAI’s Code Interpreter (CI) is more than just a Python tool. While it’s primarily designed for Python tasks, its capabilities extend far beyond that. In this post, we’ll dive into how CI can run a wide range of software, work with different file formats, and more. We’ll also explore ways to push its boundaries, unlocking even more of its potential. If you’re curious about maximising what CI can do, this is the place to start.

If you want a more technical bullet point list of the key facts, check out my Exploring ChatGPT Code Interpreter page on GitHub.

❓ What Can It Do?

When diving into the capabilities of Code Interpreter, it’s crucial to approach it with an open mind. You might think of it as adopting a “hacker’s mindset” — not in a malicious sense, but in the spirit of curiosity and problem-solving. There will be times when Code Interpreter might indicate that a task isn’t possible, but with a bit of creativity and perseverance, you’ll often find a way. After all, at its core, CI operates on computer principles, and with the right approach, you can coax it into accomplishing almost anything any Linux computer can do.

3D plot, Conway’s Game of Life and a Mandelbrot fractal, all generated by Code Interpreter

When I first started with Code Interpreter, I anticipated a straightforward coding session. But it felt like every command was revealing another layer of capability.

The real magic of Code Interpreter lies not just in executing code but in its profound versatility. Need to analyse data? Forget about navigating clunky spreadsheets or untangling formulas. With Code Interpreter, you can swiftly transform raw data, run analyses, and visualise your findings seamlessly.

The ability to display images directly in the chat was another discovery. For data enthusiasts, the interactive visualisations it can produce are impressive.

One capability that genuinely caught my attention was OCR. Even if I haven’t found a daily use for it, the fact that Code Interpreter could interpret and understand data from unexpected sources showcased its adaptability.

While Code Interpreter can handle tasks like reading and writing PDFs, it's just one facet of its expansive capabilities. By the end of my exploration, I realised Code Interpreter isn't merely a tool—it's a realm of endless possibilities.

⏩ 1 minute quick start

Never used Code Interpreter? It only takes a minute to get set up:

  1. Go to https://chat.openai.com/?model=gpt-4-code-interpreter (sign up for ChatGPT Plus if you haven’t already)
  2. Enable Code Interpreter if it’s not already enabled (see images below)
  3. Copy and paste this prompt:
List various cool and helpful things the code interpreter model can do.
Then pick a couple and demo them to me. Make it really interesting.
Click on your name on the bottom left, then Settings & Beta, then Beta features, then enable Code Interpreter

⚙️ How Does Code Interpreter Work?

Venturing into Code Interpreter is akin to exploring a well-designed toolset. It operates within a Python Jupyter notebook environment, a dynamic platform that enables fluid code conversations.

📂 File Interactions

Code Interpreter streamlines file interactions. Upload files—whether it’s code, data, or multimedia—and retrieve results directly. While it’s adept at displaying formats like GIFs and SVGs, its real prowess is in handling and interpreting a diverse range of file types.

The default location is the /home/sandbox directory, which serves as the user’s home and the current working directory. It’s worth noting this is different from /mnt/data, which is used for uploading and downloading files.
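As a rough illustration of the two locations, the sketch below copies a result file from the working directory into the download directory. The helper name `export_for_download` is hypothetical, and the `/mnt/data` default simply mirrors the path mentioned above; treat it as an assumption about the sandbox rather than a documented API.

```python
import shutil
from pathlib import Path


def export_for_download(src, export_dir="/mnt/data"):
    """Copy a file into the upload/download directory and return its new path.

    Hypothetical helper: the /mnt/data default mirrors the path described above.
    """
    # Relative names resolve against the current working directory.
    src_path = Path(src).resolve()
    dest_dir = Path(export_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest_path = dest_dir / src_path.name
    shutil.copy2(src_path, dest_path)
    return dest_path


if __name__ == "__main__":
    # Inside the sandbox this is expected to be the user's home directory.
    print("Working directory:", Path.cwd())
```

Asking CI to run something like `export_for_download("plot.png")` and then link to the returned path is one way to make sure a generated file ends up somewhere it can be downloaded from.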

🌐 Environment Insights

The backbone of Code Interpreter is a Linux environment. It uses a Linux kernel from 2016 (version 4.4) and is hosted on Ubuntu 20.04.6 LTS. The blend of an older kernel with state-of-the-art AI might be unexpected, but I’ve only run into a couple of issues with it. It runs CPython 3.8.

$ uname -a
Linux 86b8fff9-b1d1-443d-a54f-260983692f56 4.4.0 #1 SMP Sun Jan 10 15:06:54 PST 2016 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal
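The same facts can be gathered from Python itself, which is handy when CI is reluctant to run shell commands. This is a minimal sketch using only the standard `platform` module; the function name `describe_environment` is made up for illustration.

```python
import platform


def describe_environment():
    """Collect roughly the same facts as `uname -a`, plus the Python version."""
    return {
        "system": platform.system(),      # operating system name
        "kernel": platform.release(),     # kernel release string
        "machine": platform.machine(),    # CPU architecture
        "implementation": platform.python_implementation(),
        "python": platform.python_version(),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```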

⛔ Boundaries and Limitations

Code Interpreter comes with certain security restrictions, implemented by OpenAI to ensure safe interactions on their servers. These include no internet access, a command timeout of 120 seconds, and no root access. This last limitation impacts software installations and certain kernel features, like FUSE, which is unavailable due to these security constraints.
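Two of these restrictions are easy to check for yourself. The sketch below is one possible probe, not anything OpenAI provides: the function name, the target host and port (a public DNS server) and the two-second timeout are all arbitrary choices made for illustration.

```python
import os
import socket


def probe_sandbox(host="1.1.1.1", port=53, timeout=2.0):
    """Report whether we are root and whether an outbound TCP connection works.

    The host and port are illustrative assumptions; any reachable
    public address would do.
    """
    is_root = os.geteuid() == 0  # POSIX only
    try:
        with socket.create_connection((host, port), timeout=timeout):
            has_internet = True
    except OSError:
        # Covers refused connections, unreachable networks and timeouts.
        has_internet = False
    return {"is_root": is_root, "has_internet": has_internet}


if __name__ == "__main__":
    print(probe_sandbox())
```

Given the restrictions described above, I would expect both values to come back as `False` inside CI, but that is worth verifying rather than assuming.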

If CI generates more than 4096 characters of output, the text is truncated to 2038 characters (not 2048 🤷) before the LLM sees it, with <<OutputTruncated>> appended, even though the full text is still shown to the user. It’s essentially doing this:

truncate = lambda s: s[:2038] + "<<OutputTruncated>>" if len(s) > 4096 else s
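One way to work within this limit is to have CI print a shortened version of long output itself, so the model sees both the start and the end rather than only the first 2038 characters. The helper below is a sketch under that assumption; the name `summarise_for_model` and the 1500-character head and tail sizes are arbitrary, and the 4096 threshold is just the figure observed above.

```python
def summarise_for_model(text, limit=4096, head=1500, tail=1500):
    """Return text unchanged if it fits, else a head/tail excerpt under the limit."""
    if len(text) <= limit:
        return text
    omitted = len(text) - head - tail
    # Keep the beginning and the end, and say how much was dropped in between.
    return f"{text[:head]}\n... [{omitted} characters omitted] ...\n{text[-tail:]}"


if __name__ == "__main__":
    long_output = "\n".join(f"row {i}" for i in range(5000))
    print(summarise_for_model(long_output))
```

The full output can still be written to a file for download; only what gets printed needs to stay under the threshold.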

🔧 Tools and Libraries

While Python is a focal point, Code Interpreter is far from one-dimensional. It’s equipped with a variety of Ubuntu packages suitable for diverse tasks. To provide a clearer understanding, I managed to extract detailed lists from Code Interpreter itself: a list of all installed software and a list of Python packages.

🛠️ Getting the Most Out of Code Interpreter

Code Interpreter (CI) primarily supports Python, but it's more versatile than it seems.

🚧 Soft Limitations

At first glance, Code Interpreter (CI) seems primarily tailored for Python. However, this initial impression masks a vast realm of capabilities. CI's foundational training on Python has inadvertently made it slightly oblivious to its expansive reach, but with a touch of exploration, you can truly harness its potential.

CI's Linux-based environment gives it access to a comprehensive suite of standard Linux commands and software. This means that beyond its primary Python capabilities, CI can run almost any software built for Linux. Whether it's built-in tools or software you introduce to CI, the platform has the flexibility to execute it. This versatility transforms CI into a robust and adaptable tool, capable of handling a wide array of tasks.

When it comes to file operations, CI leans towards the `/mnt/data` directory, which is different to the current working directory. If you’re looking to access a file or a piece of data that CI has crafted, it provides a convenient link for downloading directly from `/mnt/data`. But CI can also link directly to files in the current working directory, or in fact any file on the system that it has read access to.

Installing New Python Packages

To install brand new Python packages:

  1. Search for the package on PyPI.
  2. Download a suitable file. Those labeled any or x86_64 are recommended. If there are different options available, look for one built for CPython 3.8 (cp38).
  3. Upload it to CI.
  4. Get CI to try using it.
  5. If the package has dependencies, return to the first step.
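Picking the right file in step 2 mostly comes down to reading the tags in the wheel's file name. The sketch below is a rough heuristic for that, not a full implementation of Python's wheel tag matching: the function names are hypothetical, the example file names are made up, and it deliberately ignores cases such as `abi3` wheels. The second helper builds a `pip install` command for an uploaded wheel, assuming pip is available in the sandbox.

```python
import sys


def looks_compatible(wheel_filename, python_tag="cp38", arch="x86_64"):
    """Heuristic check of a wheel file name against CPython 3.8 on x86_64 Linux."""
    if not wheel_filename.endswith(".whl"):
        return False
    parts = wheel_filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        return False
    # The last three fields are the Python, ABI and platform tags.
    py_tags, _abi_tag, platform_tags = parts[-3], parts[-2], parts[-1]
    python_ok = any(tag in (python_tag, "py3") for tag in py_tags.split("."))
    platform_ok = platform_tags == "any" or any(
        tag.startswith(("manylinux", "linux")) and tag.endswith(arch)
        for tag in platform_tags.split(".")
    )
    return python_ok and platform_ok


def pip_install_command(wheel_path):
    """Build a pip command that installs a local wheel without touching the network."""
    return [sys.executable, "-m", "pip", "install", "--no-index", "--no-deps", wheel_path]


if __name__ == "__main__":
    # Illustrative file names only.
    for name in (
        "example_pkg-1.0.0-cp38-cp38-manylinux2014_x86_64.whl",
        "example_pkg-1.0.0-py3-none-any.whl",
        "example_pkg-1.0.0-cp311-cp311-win_amd64.whl",
    ):
        print(name, looks_compatible(name))
```

Because `--no-deps` skips dependency resolution, any missing dependency will only show up as an import error, which is the cue to go back to step 1.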

If you go round in circles too much, it might be better to try getting all the packages locally with Python’s virtualenv.
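As an alternative to the virtualenv route, `pip download` can fetch a package and its dependencies as wheels for a different Python version and platform than the one you are running locally. The sketch below only builds the command; the function name, the `wheels` folder and the `manylinux2014_x86_64` platform tag are assumptions chosen to match the CPython 3.8, x86_64 environment described earlier.

```python
import sys


def pip_download_command(package, dest="wheels", python_version="3.8",
                         platform_tag="manylinux2014_x86_64"):
    """Build a `pip download` command targeting CPython 3.8 on x86_64 Linux."""
    return [
        sys.executable, "-m", "pip", "download", package,
        "--dest", dest,
        "--only-binary=:all:",            # required when targeting another platform
        "--python-version", python_version,
        "--implementation", "cp",         # CPython
        "--platform", platform_tag,
    ]


if __name__ == "__main__":
    print(" ".join(pip_download_command("pygame")))
```

Running the command locally with `subprocess.run(pip_download_command("pygame"), check=True)`, zipping the resulting folder and uploading that single archive to CI should save several rounds of chasing dependencies one at a time.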

Here’s Pygame working, as an example, which isn’t available by default.

Pygame library uploaded and working: https://chat.openai.com/share/c1c7b55d-f4b3-4141-b434-6d38f7ca16ca

Running JavaScript Code

In addition to Python, CI can create, edit, and execute JavaScript files, either self-written or uploaded. This capability widens CI's reach, especially for users who are more familiar with JavaScript than Python.

  1. Download Deno, a small JavaScript runtime similar to Node, from GitHub. You want the `deno-x86_64-unknown-linux-gnu.zip` file.
  2. Upload it to CI, with a prompt like extract this, then chmod u+x and execute it
Code Interpreter running JavaScript: https://chat.openai.com/share/b6958421-4bc7-4414-b7ea-8d2c8247bd8a
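Behind that prompt, CI ends up doing something along these lines. This is a sketch rather than the code CI actually generates: the function names are hypothetical, and it assumes the uploaded archive contains a single binary called `deno`.

```python
import stat
import subprocess
import zipfile
from pathlib import Path


def extract_executable(zip_path, member="deno", dest_dir="."):
    """Extract one file from a zip archive and set the user-executable bit."""
    with zipfile.ZipFile(zip_path) as archive:
        extracted = Path(archive.extract(member, path=str(dest_dir)))
    # zipfile does not restore permissions, so do the equivalent of chmod u+x.
    extracted.chmod(extracted.stat().st_mode | stat.S_IXUSR)
    return extracted


def run_script(runtime, script):
    """Run `<runtime> run <script>` and return its standard output."""
    result = subprocess.run(
        [str(runtime), "run", str(script)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

With the archive uploaded, `extract_executable("/mnt/data/deno-x86_64-unknown-linux-gnu.zip")` followed by `run_script(path, "hello.js")` would be the two calls to make, where `hello.js` stands in for whatever script you want to run.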

Installing and running other software: static binaries and self-contained packages

Software is normally dynamically linked on Linux, which means it depends on various system libraries and ideally needs installing with a package manager. But some software is available either statically linked, where all the libraries are compiled into the executable, or as self-contained executables, where the file bundles all its dependencies too.

CI defaults to Python 3.8, which is a few years old now. If needed, you can download Python 3.11 and get CI to run it. This doesn’t change the notebook environment itself, but the upgrade still allows CI to run almost any Python application and newer libraries that might not support 3.8.

Running AppImages

AppImages are a standard for self-contained software packages, relying on FUSE. Code Interpreter can’t run them out of the box, as the sandbox doesn’t let you use FUSE. However, it can extract them using the --appimage-extract flag, then find the right file and run it.

Lots of these are available to download, making it quick and easy to get new packages working on CI.

For example, ImageMagick isn’t normally available but uploading its AppImage with the following prompt allows CI to manipulate images more easily. I found an AppImage version of PHP that works too.

make this executable,
then run it with --appimage-extract flag to extract the files,
then run ./squashfs-root/AppRun
ImageMagick running: https://chat.openai.com/share/6f6b65c7-1ac6-4a00-b870-4382b8c40516
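The three steps in that prompt translate to Python roughly as follows. This is a sketch with hypothetical function names; it assumes the AppImage unpacks into a `squashfs-root` folder in the working directory, as the prompt above does.

```python
import subprocess
from pathlib import Path


def appimage_steps(appimage, app_args=()):
    """Return the commands for the extract-then-run workaround, in order."""
    appimage = Path(appimage).resolve()
    return [
        ["chmod", "u+x", str(appimage)],
        [str(appimage), "--appimage-extract"],  # unpacks into ./squashfs-root
        ["./squashfs-root/AppRun", *app_args],
    ]


def run_appimage(appimage, app_args=(), workdir="."):
    """Execute each step in turn, stopping at the first failure."""
    for command in appimage_steps(appimage, app_args):
        subprocess.run(command, cwd=workdir, check=True)
```

Extraction only needs to happen once per session; after that, calling `./squashfs-root/AppRun` directly with different arguments is enough.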

🤯 Beyond Code Interpreter

While Code Interpreter is an impressive tool within the ChatGPT ecosystem, it’s just a glimpse into a broader universe of capabilities and alternatives. There are several exciting advancements and tools that complement or even challenge what CI offers.

📦 Pandora: ChatGPT Coding Unleashed

Developed by me, this ChatGPT plugin runs in a local Docker container so it can easily access your codebase. It stands out by offering internet access and optional root privileges, which means you can freely install any packages like Node, PHP or databases. Plus, it’s open-source, making it adaptable to various needs. Another benefit over Code Interpreter is that you can use it alongside other ChatGPT plugins.

Pandora does require running the Docker container on your local machine, so it won’t work on a mobile phone, yet. It also requires developer access to ChatGPT plugins, which is currently behind a wait list.

Noteable ChatGPT plugin

This plugin is halfway between Code Interpreter and Pandora, providing a remote Jupyter notebook with a bit more access than CI. It offers internet access and allows Python library installation. It still doesn’t give you root privileges, but having the whole internet available makes it much more powerful than CI. It also persists files, ensuring continuity in your work.

I’ve had quite a bit of success with Noteable, though it’s a bit clunkier than CI for basic tasks. If I’m not on a machine with Pandora running, CI is still my go-to.

Open Source Clones

Software like GPT-Code UI is starting to appear, which gives you a Code Interpreter-like interface using OpenAI’s pay-as-you-go APIs instead of requiring ChatGPT Plus. Some of these are also better suited to working on local files. If you don’t have ChatGPT Plus, then spending a few cents or pennies will give you a taste of what’s possible with CI.

Other Large Language Models

As well as working with OpenAI’s APIs, some of the open source tools let you use your own local LLM like Llama. There’s way too much to cover here and by the time you read this, the landscape will have already changed. I suggest looking at the r/LocalLLaMA subreddit and WizardCoder to start with.

Autonomous Agents

The alternatives above all work with a chat paradigm, which waits for user input before proceeding. Thanks to libraries like LangChain, it’s now easier to take the human (partially) out of the loop and give an LLM a list of tasks to work through, checking as it goes along. Some examples are AutoGPT, Sweep, and Baby AGI.

IDE Plugins

A different paradigm again is working alongside you in your IDE. These aren’t as capable as things like CI, Pandora and autonomous agents but are worth a mention as they have better context of the code you’re looking at.

GitHub Copilot X is a standout in this category. It integrates with VS Code, offering code suggestions as you type and assistance with things like fixing bugs and refactoring.

Conclusion

The world of coding and AI is rapidly evolving, with tools like Code Interpreter showcasing the incredible fusion of the two. As we’ve seen, while CI offers a diverse range of capabilities, the broader ecosystem is bursting with innovations and alternatives that cater to varied needs.

Whether you’re diving deep into CI, exploring its boundaries, or looking beyond to tools like Pandora, there are now lots of options available. It’s an exciting time to be at the intersection of coding and AI, and I’m eager to see what new innovations appear in the coming weeks and months.

Whether you’re a developer, a tech enthusiast, or just curious, I hope this exploration has illuminated some of the possibilities and sparked your own journey of discovery.

Dave Hulbert

Engineering, AI, Strategy and Compliance. Work at Passenger @passengerteam