Making Pandora, a super-charged coding plugin for ChatGPT

Dave Hulbert
11 min read · Aug 4, 2023


ChatGPT comes with a Code Interpreter plugin that lets it write and run code for things like data processing and editing files. This is really cool, but if you’ve had a play with it, you’ll know that it’s very limited. Here’s how I made Pandora, a super-charged version of Code Interpreter. Pandora lets ChatGPT access the internet, code in any language, access local files and install any software.

Video of Pandora running JavaScript in a Docker container

Background

What happens when you give a chatbot access to a computer? Beyond being able to write its own code, system access means the chatbot can run code and see the output. This allows it to do all sorts of data processing and analysis. It also means the chatbot can fix its own errors quickly, rather than copying and pasting back and forth. Lots of potential.

So back in March when OpenAI’s Code Interpreter tool was announced, I signed up for the waitlist straight away. Back in June I got early access — but not to Code Interpreter. Instead I got access to the plugin development system, allowing me to create my own plugins. This was cool in itself but I really wanted to play with Code Interpreter. So I decided to make my own version of it, with more capabilities.

I’m sharing this because of the interesting development journey and because I’m keen to hear any feedback people have. Feel free to comment!

Avoiding Code Interpreter and other plugins’ limitations

FYI, Code Interpreter is now in beta, without a waitlist, so anyone can access it. I wrote a guide to ChatGPT Code Interpreter if you’re interested.

Even before getting access to Code Interpreter, it was clear that it was very locked down. It’s essentially an Ubuntu sandbox with no root or internet access. I assumed I’d get access to Code Interpreter at some point, so if I was making my own plugin, I figured I may as well make something at the other end of the spectrum. Internet access? Check. Root access? Check. Safety checks? Nah, that can come later.

Comparison of ChatGPT coding plugins

If it’s not obvious, I went with the name “Pandora” due to Pandora’s box, which can release curses upon mankind if not handled carefully.

I had a play with other plugins. Some gave ChatGPT lots of possible functions to call, which meant it had more context to keep track of. Others weren’t tuned enough and ChatGPT kept making the same mistakes calling them. To avoid this, I wanted Pandora to be super-simple for ChatGPT to use, whilst also being easy to tune and easy for ChatGPT to learn from its mistakes.

Bringing Pandora to life

I skim-read the OpenAI plugin development docs and realised that getting something working would be pretty easy, thanks to their great interface design. The Code Interpreter plugin is little more than a function that ChatGPT can call to send data to a sandbox and receive the results. Most of OpenAI’s work has likely been in security, scaling and tuning the model to get good results.

To write a ChatGPT plugin you need 3 things:

  1. An API
  2. API docs that ChatGPT can read in OpenAPI format
  3. A .well-known/ai-plugin.json manifest file, describing the plugin
Pandora logo, created by Freepik — Flaticon.

I picked PHP for the API, as that’s what I have most experience with. With hindsight, I should have picked Python, as that’s ChatGPT’s favourite.
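
Of those three pieces, the ai-plugin.json manifest is the simplest. Here’s a sketch of roughly what it could look like, served from PHP to match the rest of the API. The field names follow OpenAI’s plugin manifest format, but the values and URLs are placeholders rather than Pandora’s real ones:

<?php
// Sketch of a .well-known/ai-plugin.json endpoint. Field names follow
// OpenAI's plugin manifest format; the values and URLs are placeholders.
header('Content-Type: application/json');

echo json_encode([
    'schema_version'        => 'v1',
    'name_for_human'        => 'Pandora',
    'name_for_model'        => 'pandora',
    'description_for_human' => 'Run code and shell commands on a local machine.',
    'description_for_model' => 'Execute shell commands in a sandboxed environment. Read the guide before use.',
    'auth'                  => ['type' => 'none'],
    'api'                   => [
        'type' => 'openapi',
        'url'  => 'http://localhost:3333/openapi.yaml', // placeholder URL
    ],
    'logo_url'       => 'http://localhost:3333/logo.png',
    'contact_email'  => 'user@example.com',
    'legal_info_url' => 'http://example.com/legal',
]);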

The first step was a barebones exec function call. Little more than this dangerous remote shell, running on localhost:

<?php
// Don't run this code on a webserver!
// I'd make this a one-liner but the Chrome browser blocks it
$command = $_POST['command'];
$output = shell_exec($command);
echo json_encode($output);

Once I had this in place in a Docker container, I installed it as a ChatGPT plugin.

Before the actual shell_exec(), I did quite a few tests with just echo-ing the command, to make sure it wasn’t going to destroy everything.
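
That dry run was barely any code; something like this (the exact test code is long gone, so treat it as a sketch):

<?php
// Dry-run version used while testing: echo the command back instead of
// executing it, so nothing destructive can happen.
$command = $_POST['command'];
echo json_encode(['would_have_run' => $command]);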

I also tried copying some of the demos people had done with Code Interpreter. Aside from not being able to display images in the chat, and being a bit clunky, Pandora could do pretty much everything Code Interpreter could, with the added benefit of internet access. Here are some demos.

Pandora improves Pandora

From this point, ChatGPT could start editing the Pandora plugin — essentially bootstrapping itself!

I started by getting Pandora to clean up the code a bit (it’s still very rough), add in some checks and make the Docker environment more robust. I also got it to write some basic tests, so it could make sure it wasn’t breaking itself. Seeing Pandora (slowly) improve itself was very cool — reminiscent of the Singularity. What had I unleashed?

Security

It’s definitely worth addressing that Pandora currently doesn’t have much in the way of security protections, beyond running in a Docker container. By default it’s designed to be as powerful and autonomous as possible, which means eschewing the kinds of checks that would make this properly safe to use.

Although Pandora gives ChatGPT extensive system access, from all my testing it hasn’t caused any problems or done anything destructive. There’s still a risk though, so be careful if you play with it.

Early issues

The basic remote shell worked but it wasn’t great. ChatGPT had to write whole files as one-liners, getting confused with escaping. It also kept making assumptions. At this point, I could see that lots of OpenAI’s work on Code Interpreter was tuning it to interact well with the system. GPT-3.5 and GPT-4 are trained on lots of data like StackOverflow Q&As. That public training data is copied-and-pasted extracts of what worked, rather than verbatim transcripts of the commands and their output. In essence, lots of its training data is based on the assumption that there’s a human typing the commands and skim-reading the output.

I wanted to keep the function interface small, but decided the trade-off of adding a writeFile command would be worthwhile, as then I could take care of the checks and ChatGPT wouldn’t have to struggle with escaping a long shell command.
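
For illustration, a writeFile endpoint doesn’t need to be much more than the sketch below. The parameter names and checks are my assumptions rather than Pandora’s exact code:

<?php
// Sketch of a writeFile endpoint. Parameter names and checks are
// assumptions; the real Pandora code may differ.
$path    = $_POST['path'] ?? '';
$content = $_POST['content'] ?? '';

if ($path === '') {
    http_response_code(400);
    echo json_encode(['error' => 'path is required']);
    exit;
}

// Resolve relative paths against the working directory.
$workdir = '/pandora/WORKDIR';
$target  = $workdir . '/' . ltrim($path, '/');

// Create parent directories as needed, then write the file in one go,
// so ChatGPT never has to escape file contents into a shell command.
@mkdir(dirname($target), 0777, true);
$bytes = file_put_contents($target, $content);

echo json_encode(['path' => $target, 'bytes_written' => $bytes]);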

This helped but there was still lots of room for improvement. Unfortunately I don’t have the people and money to throw at this problem that OpenAI did.

Gently guiding ChatGPT

I found I was typing the same kinds of instructions to ChatGPT when it was working on Pandora. Things like…

  • find and replace instead of rewriting the whole file
  • check the file exists first
  • check each step works before proceeding

This transitioned to a text file that I could copy and paste from. Then I started telling ChatGPT to “read guide.txt and follow it”. Like any developer would, I got tired of typing that sentence, so I built an API for it and added it to the Pandora plugin. Now it was just a case of typing “read the guide”.
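
The guide endpoint itself is tiny. Here’s a sketch of roughly what it could look like; the file path is a placeholder, and recording a “last read” timestamp is my assumption about how the checks described later could work:

<?php
// Sketch of the getGuide endpoint: return the main guide as plain text.
// The file path is a placeholder, and recording a "last read" timestamp
// is an assumption on my part.
header('Content-Type: text/plain');

// Note when the guide was last fetched, so other endpoints can check
// that it has been read recently.
file_put_contents('/tmp/guide_last_read', (string) time());

echo file_get_contents(__DIR__ . '/guides/guide.txt');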

Note: the OpenAPI yaml docs and ai-plugin.json help a bit with this, but you can’t give ChatGPT much context there. For example, I have this for the exec command, giving it just enough context and a hint of its capabilities:

Executes a shell command. Current dir is /pandora/WORKDIR
Shell command to execute. Eg `ls .`, `curl foo`, `apk add foo`, `docker run`

This grew quite a bit over time and I split the file up. I had a main file of core guidelines, which linked through to detailed guides for specific use cases. This library of guides is extensible and very scalable, and seemingly more effective than giving ChatGPT lots of functions to keep track of.

Using guides like this also means that the Pandora codebase can be very lightweight. Additional functionality can be written as terse guides in English, rather than API functions or by having extra system requirements.

You can read the main guide on GitHub and see the additional guides too.

Forcing ChatGPT to learn

Having to explicitly ask ChatGPT to read the Pandora guide before using it wasn’t a great user experience.

I added a guideFollowed flag to the exec command that was required to be true. That way, ChatGPT has to confirm it’s read the guide. Unfortunately, ChatGPT would see the API docs telling it to set the flag and just set it to true, without ever reading the guide. Sneaky!

Through using Pandora and ChatGPT in general, it’s clear that failure is a very good way for it to learn. For example, if it saw a Python error message, it could often fix it. I decided to take this approach with the guide, forcing it to fail so it could see how to fix it. I changed the API docs to say that guideFollowed is optional. This way ChatGPT tries calling exec without the flag and gets back an error message it knows it has to listen to:

⚠️ Error

The guide must have been read recently and followed.

This was quite elegant and worked exceptionally well. I was really pleased with this approach: making ChatGPT think it can ignore the guide, then tricking it into properly reading the guide. Now I was the sneaky one.

As an extra step, I also checked the return status of the command being executed and added an extra warning if there was an error:

VERY IMPORTANT: check the guide (getGuide) for how to fix problems before attempting something yourself!
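
Putting those two tricks together, the server side of exec doesn’t need much. Here’s a sketch of the checks; the parameter names and the ten-minute window are assumptions, but the behaviour matches what’s described above:

<?php
// Sketch of the exec endpoint's checks. The parameter names and the
// ten-minute window are assumptions; the behaviour matches the text above.
$command       = $_POST['command'] ?? '';
$guideFollowed = ($_POST['guideFollowed'] ?? '') === 'true';

// Has the guide been fetched recently? (See the getGuide sketch earlier.)
$lastRead     = (int) @file_get_contents('/tmp/guide_last_read');
$readRecently = (time() - $lastRead) < 600;

if (!$guideFollowed || !$readRecently) {
    echo json_encode(['error' => '⚠️ The guide must have been read recently and followed.']);
    exit;
}

// Run the command, capturing output and the exit status.
exec($command . ' 2>&1', $outputLines, $exitCode);
$response = ['output' => implode("\n", $outputLines), 'exit_code' => $exitCode];

if ($exitCode !== 0) {
    // Nudge ChatGPT back to the guide whenever something fails.
    $response['warning'] = 'VERY IMPORTANT: check the guide (getGuide) for how to fix problems before attempting something yourself!';
}

echo json_encode($response);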

Lightweight Linux

I put Pandora in a Docker container, partly for security but also for portability and to give ChatGPT free rein of a system.

The Dockerfile is minimalistic by design. My thinking was that the more the system is customised, the less likely ChatGPT is to have been trained on anything like it. It’s trivial for Pandora to run something like `apk add ffmpeg` if it wants to. In fact it’s so trivial that, when it’s confident, it does it without even asking. This was quite a surprise the first time it happened!

It seems like the tool ffprobe, which is part of the FFmpeg suite and is used to analyze video files, is not installed in the current environment. I will need to install it before proceeding. Let’s do that.
[Source chat transcript]

Docker Container Magic

Installing all sorts of packages in the same container isn’t ideal. I got ChatGPT to suggest ways of letting the Pandora container manage other containers, and to evaluate them against Pandora’s design goals and context. The options were the heavyweight Docker-in-Docker (dind) or the very simple approach of mounting the host’s /var/run/docker.sock inside the container.

We settled on the latter, which allows Pandora to control other containers on the host. This is more of a security risk but it’s hard for ChatGPT to accidentally cause damage. You can see Pandora’s Docker guide, which explains the context that ChatGPT needs to make effective use of Docker on the host.

This worked very well and allows Pandora to quickly create new containers as it desires. It can also manipulate your existing containers, helping to debug or improve them.

Me: write hello world scripts in python and node. then run them

ChatGPT: [Working…] It seems like Python is not installed in the current environment. Would you like me to install it for you?

Me: yeah. dont install node though — use docker to run that

ChatGPT: [Working…] Both scripts have been executed successfully.

[Source chat transcript]

Running new Docker containers

I wanted Pandora to be able to access files in other projects more easily too. Normally this would require changing how the Docker container is run each time. But with a bit of experimentation, I found a way for Pandora to find symlinked directories and automatically mount them in the container. Now, getting Pandora to work on files in other directories is as simple as running a quick ln -s.
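
I won’t claim this is exactly how Pandora does it, but the idea is roughly the sketch below: scan the working directory for symlinks, resolve each one to its host path, and pass those as volume mounts when starting a new container.

<?php
// Rough illustration, not necessarily Pandora's exact implementation:
// scan the working directory for symlinks and turn each one into a
// -v mount flag for `docker run` on the host.
$workdir = '/pandora/WORKDIR';
$mounts  = [];

foreach (scandir($workdir) as $entry) {
    $path = $workdir . '/' . $entry;
    if (is_link($path)) {
        // The symlink target is a path on the host; mount it under the
        // same name inside the new container.
        $target   = readlink($path);
        $mounts[] = '-v ' . escapeshellarg($target . ':/mnt/' . $entry);
    }
}

// e.g. docker run --rm -v /home/dave/other-project:/mnt/other-project alpine ls /mnt
echo 'docker run --rm ' . implode(' ', $mounts) . ' alpine ls /mnt' . PHP_EOL;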

Extending Pandora’s capability even more

I’ve got a couple of extensions to Pandora working but these aren’t well thought through yet and only work on Mac. They currently run from a small web server running on the host.

Adding extensions to Pandora doesn’t require changing the core Pandora API, only adding new guides with instructions on what to do.

Running Commands on the Host Machine

With this enabled, ChatGPT can run commands outside the Docker container. This could be used to interact with hardware or window managers, or even give it remote access to web browsers.

curl http://host.docker.internal:8080/?command=say%20hello

For security, a confirmation dialog box pops up for you to approve the command before it’s run.
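
The host side of this can be a tiny PHP script. This sketch uses an AppleScript dialog via osascript for the confirmation step; the real extension may differ, so treat the details as assumptions. PHP’s built-in web server is enough to serve something like this on the host.

<?php
// Sketch of the host-side command runner (Mac only). Uses an AppleScript
// dialog via osascript for confirmation; the real extension may differ.
$command = $_GET['command'] ?? '';

// Ask for confirmation before running anything sent from the container.
$appleScript = sprintf(
    'display dialog "%s" buttons {"Cancel", "Run"} default button "Run"',
    addslashes('ChatGPT wants to run: ' . $command)
);
exec('osascript -e ' . escapeshellarg($appleScript), $out, $status);

if ($status !== 0) {
    // osascript exits non-zero when the dialog is cancelled.
    echo 'Command rejected by user';
    exit;
}

echo shell_exec($command);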

Getting Secret Keys and Passwords Securely

ChatGPT did pretty much all the design and development on this one. This extension allows ChatGPT to prompt you for a secret, avoiding you having to type it into the ChatGPT interface. This means it never leaves your machine.

GITHUB_TOKEN=$(curl 'http://host.docker.internal:8080/secret?name=github%20token') gh issue list
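
On the host, the secret route can be equally small: pop up a dialog with a hidden answer field and return whatever the user types. Again, this is a sketch assuming the Mac osascript approach, not necessarily what the extension actually does.

<?php
// Sketch of the host-side secret endpoint (Mac only). Pops up a dialog
// with a hidden answer field and returns whatever is typed, so the
// secret never passes through the ChatGPT interface.
$name = $_GET['name'] ?? 'secret';

$appleScript = sprintf(
    'text returned of (display dialog "Enter %s" default answer "" with hidden answer)',
    addslashes($name)
);
echo trim(shell_exec('osascript -e ' . escapeshellarg($appleScript)));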

Displaying Images in the Chat Interface

One of the cool things that Code Interpreter does is display images in the chat. This is great for quick visualisations and something I wanted to get Pandora doing.

It turns out that the barebones ChatGPT can do this already: it just doesn’t know how and keeps making mistakes. I wrote a simple guide to displaying images and now Pandora can do it! As with all the guides, this is a guide primarily for ChatGPT, not for people.

Making Pandora available to more people

I have a few open issues on GitHub of things I’d like to improve, like security, but the main thing I’d like to do is make it available to more people. There are two ways this could be done:

  1. Make it an official ChatGPT plugin, like Noteable
  2. Make it work standalone, with OpenAI’s APIs instead of as a plugin

Making an official plugin would require Pandora to be hosted remotely, and for me either to run remote Docker containers for people to use (expensive) or to provide a way for users to connect their own containers to Pandora (risky). Maybe one day, but this seems more like a SaaS product in itself. The benefit of this approach is that it would allow me to run Pandora on my phone.

The other option is to have something that doesn’t require the ChatGPT interface to run. I’ve already started hacking around with LangChain a bit and might go in this direction, as it now (kind of) supports ChatGPT plugins. There’s also OpenPlugin, which might make it easier.

Feedback and Contributions

Pandora has a fair few stars on the GitHub project, but I’m not sure if anyone else has used it yet. That’s likely because it needs developer access to ChatGPT plugins and doesn’t have very robust security protections built in.

If you have tried it, or even if you want to but haven’t been able to, then get in touch. I’m also keen to hear how the guides can be improved or even used in other projects.


Dave Hulbert

Engineering, AI, Strategy and Compliance. Work at Passenger @passengerteam