OpenAI Code Interpreter and GitHub Copilot X: Ideal partners?
The introduction of GPT (Generative Pre-trained Transformer) Large Language Models since Fall 2022 has been a game changer. They mark a pivotal moment, and the technology they represent is poised to bring about extensive and profound changes across many facets of work and business, including data science. In this article I focus on demonstrating the value of using OpenAI Code Interpreter and GitHub Copilot X together, from a development perspective, for a data science project.
I briefly introduce both Code Interpreter and Copilot and then build a loan approval project that demonstrates how to use Code Interpreter as a front end for subsequent work with Copilot to develop the project's logistic regression classification model. I begin with the Code Interpreter setup and then follow with Copilot. This article assumes that you already have access to GitHub Copilot X; the "X" here refers to the Chat feature. If you do not have access yet, you can arrange for it at the GitHub Copilot X preview site.
Please note that I have uploaded the code for both the Code Interpreter and the GitHub Copilot X versions of the project to my GitHub account for your convenience. While or after reading the article, feel free to download both and experiment. The links for both repositories are as follows:
This article makes references to these repositories throughout, and I want you to be comfortable up front that all the code is available and accessible to you.
Code Interpreter setup
OpenAI ChatGPT Plus is a paid subscription service generally available for a nominal monthly fee. As part of the service, OpenAI has recently released the Code Interpreter plugin, which is in beta at the time of this writing. It is, however, already quite usable. The following is a brief walkthrough of its activation process.
The image below shows the default ChatGPT Plus screen after signing in. At the bottom left corner of the screen, three dots are shown. Clicking on them displays a drop-up menu, also shown. From this menu, select the “Settings & Beta” option.
This opens the Settings pop-up window shown in the next image. Select "Beta features" from the left menu panel and then slide the Code Interpreter slider so that it shows green. Here, I have enabled Custom Instructions and Plugins as well, but these are optional; walking through them would be the topic of another article.
Next, from the drop-down menu that appears, select the GPT-4 model and then the Code Interpreter option. Note that this adds a "+" button icon in the chat window, which allows you to upload files to the Code Interpreter plugin, as shown in the second image below:
GitHub Copilot X
The preview site opens the Copilot X sign-up screen. Many people who signed up early, or who attended Microsoft Build in person earlier this year, already have access to it. If you don't, scroll down and click the "Sign up for the technical preview" button shown in the following image, then wait for your access:
After you get access to the Chat feature of Copilot, please follow the instructions in this Microsoft Tech Community article to set up Copilot Chat in VS Code: How to set up Copilot Chat in VS Code step by step.
GitHub Copilot X works with several popular IDEs and editors, such as Visual Studio, Visual Studio Code (VS Code), IntelliJ, Neovim, and more.
As an example, the following image shows the extensions that I have installed in VS Code Nightly Build that I use to experiment with Copilot X:
The extensions GitHub Copilot and GitHub Copilot Chat together make up GitHub Copilot X.
Please note that there are many articles available on the Internet and many excellent videos on YouTube that will walk you through the installation and setup steps. Please use what works best for you.
Why use logistic regression for this demo?
Logistic regression serves as a foundational Machine Learning tool designed to handle classification challenges. In real-world scenarios, decisions often involve classifying based on distinct variables known as independent variables. These variables, with their specific values (parameters), influence the outcome or decision, termed the dependent variable. Consider a situation in which a bank evaluates loan applications for approval; this is the central scenario for the project outlined in this article, and it illustrates how a Machine Learning model can predict loan approval outcomes.
To begin, the logistic regression algorithm requires training through illustrative cases. In our context, the approval or rejection of a loan is the dependent variable. Independent variables encompass gender, marital status, dependents, education, income, credit history, and requested loan amount. The Excel workbook accompanying this article provides exact details. By exposing the algorithm to past loan decisions, we educate it on making predictions. Most of this historical data, typically around 80 percent, becomes training material for the algorithm to construct its model. About 20 percent is kept aside for validating the prediction accuracy, known as test data. It’s worth noting that this split could be different, such as 70/30. This fundamental training process underpins Machine Learning, and logistic regression applies to any situation requiring binary choices, such as college admissions and hiring determinations.
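The 80/20 split described above takes only a few lines of Python. The sketch below is my own illustration, not code from the repositories: it uses a tiny synthetic stand-in for the workbook (in practice you would load loan_data.xlsx with pandas), with a couple of column names borrowed from the workbook and the rows invented for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# A tiny synthetic stand-in for the loan workbook (in practice:
# df = pd.read_excel("loan_data.xlsx")).
df = pd.DataFrame({
    "Loan_ID": range(1, 11),
    "ApplicantIncome": [5849, 4583, 3000, 2583, 6000, 5417, 2333, 3036, 4006, 12841],
    "Credit_History": [1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
    "Loan_Status": ["Y", "N", "Y", "Y", "Y", "Y", "Y", "N", "Y", "N"],
})

# Loan_Status is the dependent variable; the remaining columns
# (minus the row identifier) are the independent variables.
X = df.drop(columns=["Loan_ID", "Loan_Status"])
y = df["Loan_Status"]

# Hold out 20 percent of the rows as test data; train on the other 80 percent.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # 8 2
```

A 70/30 split would simply use test_size=0.3 instead.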
From an academic standpoint, logistic regression serves as the foundation for a group of algorithms known as neural networks. These neural networks provide the groundwork for a concept called Deep Learning, which is at the core of algorithms like GPT. However, delving into these academic distinctions is not our focus here — let’s keep it straightforward. Instead, we’ll stick with exploring logistic regression analysis, which offers practical insights. Our journey continues with a simple analysis using an Excel workbook that contains a fictional yet representative historical dataset of loans and their outcomes: approved or rejected.
Excel workbook to be used for this demo
After the setup is complete, we proceed to build a logistic regression model, first using Code Interpreter and then following up with a GitHub Copilot X implementation. We use the Excel workbook published by Saad A in his Udemy course here:
10 Projects with ChatGPT Code Interpreter (Excel Python SQL) | Udemy
Please note that this is under Section 13, Project 10 for your reference. Saad does a great job of walking the reader through it step by step in the video. Other projects are quite interesting as well. You can download the workbook from the GitHub repository from this location.
The following image shows the top of the worksheet with a few loan records:
The columns shown in the image above are as follows:
- Loan_ID is an integer value that is unique for each row of the worksheet. Each row identifies one instance of a loan application.
- Gender is either male (identified by a value 1) or female (identified by the value 2). In Machine Learning algorithms such as logistic regression, character data must be encoded as numbers, hence this encoding. The choice of values 1 and 2 is arbitrary. It could easily be 0 and 1, for example. The rest of the character fields follow the same convention except the last column, Loan_Status, which has purposely been left un-encoded. You will see that the Code Interpreter handles it automatically whereas we do it manually in the data cleaning code (clean_data.py) in the function encode_categorical_variables.
- Married is encoded as 0 or 1. The numeral 1 is a proxy for Married. Typically, but not always, value 1 is used for a “yes” answer and 0 for “no”. But this is not a hard requirement, just a convention.
- Dependents is an integer number and indicates the total number of dependents.
- Graduate is a binary encoded variable; 0 equals false and 1 is true.
- Self_Employed is a binary encoded variable, where 0 equals false and 1 is true.
- ApplicantIncome is a numeric integer field. Please note that it is not necessary to have all fields as integers; they could be decimals as well, depending on the context.
- CoApplicantIncome has the same properties as ApplicantIncome.
- LoanAmount is an integer representing the loan amount in thousands.
- Loan_Amount_Term is a character field encoded as integers. This example clearly shows that you can use any numeric value that you want to represent a character encoding.
- Credit_History is a binary encoded variable, where 0 equals false and 1 is true.
- Property_Area is a character field encoded as integers to represent the applicant’s location.
- Loan_Status indicates the decision. Here a “Y” represents a loan approval and an “N” represents a loan rejection.
As mentioned earlier, Machine Learning models require the independent variables that are of character type to be encoded as numeric variables. Hence the encoded variables such as Gender, Married, Dependents, Graduate, Self_Employed, Loan_Amount_Term, and Credit_History in our data.
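The one column left un-encoded, Loan_Status, is handled in the repository by the encode_categorical_variables function in clean_data.py. A minimal sketch of what such a step can look like follows; this is my own illustration, and the actual function in the repository may differ in detail.

```python
import pandas as pd

def encode_categorical_variables(df: pd.DataFrame) -> pd.DataFrame:
    """Map the character-typed Loan_Status column ("Y"/"N") to 1/0."""
    df = df.copy()
    df["Loan_Status"] = df["Loan_Status"].map({"Y": 1, "N": 0})
    return df

# Example: two loan records with the un-encoded outcome column.
sample = pd.DataFrame({"Loan_ID": [1, 2], "Loan_Status": ["Y", "N"]})
encoded = encode_categorical_variables(sample)
print(encoded["Loan_Status"].tolist())  # [1, 0]
```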
So, in this way the worksheet contains several rows, each one representing a loan application and its outcome. The outcome, Loan_Status, is the dependent variable, and the remaining variables are independent variables. Together, the values of the independent variables determine the outcome or decision. This data will be used to train the model, which will then be able to predict whether a future loan application will be approved.
With this done we are ready to perform logistic regression analysis using the Code Interpreter plugin.
Logistic regression analysis with Code Interpreter plugin
As described above in setting up ChatGPT Plus with Code Interpreter, you should be at the prompt window with the "File Upload" functionality enabled, as indicated by the "+" (plus) sign. Clicking the plus sign opens a file upload dialog box. I have named my Excel workbook loan_data.xlsx, so selecting it uploads it to Code Interpreter. We are now ready to issue our first prompt to ChatGPT. Data cleaning is the first step in any data science project. The following image shows the uploaded file along with the first prompt to that effect:
Code Interpreter dutifully goes to work, listing and displaying the steps it takes along the way. In each step, it displays a status message, e.g., “working”, along with a drop-down menu that is titled “Show work” as in the following image:
The following image shows the code generated by Code Interpreter:
For now, generated code is limited to the Python language. Code Interpreter has a few hundred Python libraries curated by OpenAI that are pre-loaded in the sandbox for its data analysis work. For instance, the image above shows that it has loaded the Pandas library to read the Excel worksheet into a data frame. Similarly, it loads and uses other libraries as needed for code generation. As of now, the limit for uploaded data files is 512 MB and the available RAM is 2 GB.
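To give a flavor of what the generated cleaning code looks like, here is a sketch along the same lines. The column names come from the workbook, but the specific fill strategies (median for amounts, mode for the binary credit flag) are my assumption; Code Interpreter may choose differently in your session.

```python
import pandas as pd

# A few rows with deliberately missing values, standing in for the workbook.
df = pd.DataFrame({
    "LoanAmount": [128.0, None, 66.0, 120.0],
    "Credit_History": [1.0, 1.0, None, 1.0],
})

# Fill missing numeric amounts with the column median, a common default.
df["LoanAmount"] = df["LoanAmount"].fillna(df["LoanAmount"].median())
# Fill the missing binary flag with the most frequent value (the mode).
df["Credit_History"] = df["Credit_History"].fillna(df["Credit_History"].mode()[0])

print(int(df.isna().sum().sum()))  # 0 -- no missing values remain
```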
Spoiler alert! This is one reason why GitHub Copilot X with your IDE becomes important: it is limited only by your laptop, desktop, or the Azure Cloud resources (e.g., the virtual machine) that you are using.
After the data is cleaned, we can proceed to exploratory data analysis (EDA). This usually involves running queries and generating visualizations to get a holistic feel of the data set. The following image shows the prompt for generating some queries and the Code Interpreter response:
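The queries it runs boil down to ordinary pandas expressions. Here is a hedged sketch of the same idea on synthetic rows; in the actual session these queries ran against the uploaded loan_data.xlsx.

```python
import pandas as pd

# Synthetic stand-in rows (Gender: 1 = male, 2 = female, per the workbook).
df = pd.DataFrame({
    "Gender": [1, 1, 2, 1, 2, 2],
    "ApplicantIncome": [5849, 4583, 3000, 2583, 6000, 2333],
    "Loan_Status": ["Y", "N", "Y", "Y", "N", "Y"],
})

# Typical EDA queries: overall approval rate, and mean income by gender.
approval_rate = (df["Loan_Status"] == "Y").mean()
mean_income_by_gender = df.groupby("Gender")["ApplicantIncome"].mean()

print(round(approval_rate, 2))  # 0.67
print(mean_income_by_gender.to_dict())
```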
At this point, something very interesting happened in the session: Code Interpreter started "hallucinating," meaning that it forgot the data set I had loaded, began apologizing and retrying, and finally gave up and admitted that it had lost the data. There are two reasons for this. First, the plugin is still in beta and there is clearly a stability issue. Second, GPT models are inherently non-deterministic and occasionally go off on a tangent (which is what "hallucination" refers to). It is our job as prompt engineers to bring them back on course. The hallucinations are shown in the image below:
Like any good prompt engineer, I uploaded the file again and repeated my request to perform the data analysis as shown in the image below. Please note that I didn’t have to start at the beginning all over again — Code Interpreter had retained my previous prompts. But also note that prompt retention is “session-scoped,” meaning that if you terminate the session, you must repeat the upload process and the prompts all over again.
It is important to note that you should always verify the answers you receive. Occasionally, Code Interpreter misinterprets your encoded values. For example, in our encoding for the Gender variable, 1 represents Male and 2 represents Female. Sometimes ChatGPT interprets it the other way, as it did for me when I moved to GitHub Copilot X, which also uses the OpenAI GPT algorithm. More on this is coming up below in the GitHub Copilot X section.
The next step is to extend our understanding of the data by requesting some visualizations from the Code Interpreter. The following image shows the prompt and the results:
Often, Code Interpreter bunches the charts together, making them hard to read. To fix this, I then requested that it display them separately, as shown in the following image:
As you can see, Code Interpreter generated five visualizations in separate figures as requested. In the next step, we request Code Interpreter to perform the regression analysis. Please note that in the context of data analytics, for binary classification problems such as loan approval, logistic regression is the most common type of Machine Learning algorithm. Code Interpreter has that knowledge, as can be seen in the following image:
This is where Code Interpreter begins to shine! It not only created the logistic regression model but also provided a professional interpretation of the feature importance, as can be seen in the following image:
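Stripped to its essentials, what Code Interpreter produced amounts to a scikit-learn LogisticRegression fit followed by a look at the fitted coefficients. The sketch below is mine, on a handful of synthetic rows with just two features; the repository code does the same on the full workbook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic feature rows: [Credit_History, ApplicantIncome in thousands].
X = np.array([[1, 5.8], [1, 4.5], [0, 3.0], [1, 2.5],
              [0, 6.0], [1, 2.3], [0, 4.0], [1, 6.5]])
y = np.array([1, 1, 0, 1, 0, 1, 0, 1])  # 1 = approved, 0 = rejected

model = LogisticRegression().fit(X, y)

# The coefficients give a rough measure of feature importance: a larger
# magnitude means a stronger pull on the log-odds of approval.
for name, coef in zip(["Credit_History", "ApplicantIncome"], model.coef_[0]):
    print(f"{name}: {coef:+.3f}")
```

In these made-up rows approval tracks Credit_History, so its coefficient comes out strongly positive, mirroring the kind of interpretation Code Interpreter offered.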
If we’re feeling a bit adventurous, we can also ask for the logistic regression mathematical model, as shown in the image below:
This is impressive. It not only gave us the general math equation for logistic regression, but also the specific equation for this specific loan analysis problem.
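For reference, the general form it returned is the standard logistic (sigmoid) model. In our notation, with independent variables $x_1, \dots, x_n$ (Gender, Married, and so on through Property_Area) and fitted coefficients $\beta_0, \beta_1, \dots, \beta_n$, the probability of approval is:

```latex
P(\text{Loan\_Status} = 1 \mid x_1, \dots, x_n)
  = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n)}}
```

The specific coefficient values for the loan problem came from the session output shown in the image and are not reproduced here.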
Next, we can ask it to give us the Python code as a bundle. This is not a necessity; readers who are not comfortable with Python need not go through it, but those who want to use the code as a basis to learn and validate are encouraged to do so. It took several prompts to get Code Interpreter to produce the entire code bundled as one file. As an experienced Python programmer who knows data analysis and data science quite well, I was able to assemble it from the code snippets it generated. At this point, it became clear to me that if I wanted to automate the entire process reliably, I would have to paste the code generated by Code Interpreter into VS Code (my IDE) and use GitHub Copilot X to make it robust and maintainable.
Here are some images that show my prompts to Code Interpreter to generate code for me. My intention was to coax it into giving me a comprehensive Python file that I could then refactor in VS Code using GitHub Copilot X, but the best I could do in a reasonable amount of time was to get bits and pieces of code that I could cut and paste into the VS Code editor.
I hope these images give you the idea. We are clearly at the point of diminishing returns with Code Interpreter, so it is time to bring in GitHub Copilot X.
Moving on to GitHub Copilot X
The file “code_interpreter_generated_program.py”, which I assembled from the code snippets generated by Code Interpreter, can be downloaded from the GitHub repository. In my limited experience with GitHub Copilot X, a basic understanding of the language being used (Python, in our case) is necessary; without it, you will not benefit much from Copilot. Therefore, from here on, some Python knowledge is assumed.
The first thing to realize is that moving from the Python code generated by Code Interpreter to the Copilot-powered IDE means that we should be refactoring our large and unmaintainable single Python file into several smaller Python files that are logically organized according to the single responsibility principle. Breaking up a big monolithic program into several smaller and well-organized Python files requires understanding of Python modules. If you are not comfortable with terms such as single responsibility principle and Python modules, then feel free to type in the GitHub Copilot Chat window and request an explanation. I think you will be impressed by how well it explains them. The same goes for ChatGPT.
I tried creating a VS Code project and then pasting the Python file generated by the Code Interpreter into it and then prompting Copilot X to refactor it into separate but connected files that are organized by their respective functionalities, e.g., clean_data.py file for clean_data function, and so on. But it didn’t work. I had to manually organize the files and then prompt Copilot to connect them.
However, after the manual reorganization step, Copilot was quite effective in following my prompts and fine-tuning the code, leaving me to do some tweaking. One thing to keep in mind while refactoring is that you should keep the files that participate in the refactoring “open” in one of the VS Code tabs. The reason is that Copilot derives its context for the code from open files in the editor automatically. If you don’t have them open in the editor, Copilot doesn’t have it in its contextual memory. I also saved the loan_data.xlsx file in CSV format so that I could view it in the editor. It is just a convenience, not a necessity.
Refactoring is an interactive process in which the developer extracts pieces of program code and organizes them into smaller, related functions in smaller files. These files define Python modules that can then be imported by any module requiring their functionality. As noted above, keep all the files involved open in editor tabs; those open files automatically establish the context for Copilot, and you can then converse in natural language using the Copilot Chat feature.
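To make the shape of the refactored project concrete, here is a compressed sketch. In the real project each step lives in its own module (e.g., clean_data.py) and main.py imports it; the stand-in function bodies below are mine, kept dependency-free purely for illustration.

```python
# main.py -- sketch of the refactored entry point. In the repository the
# steps are imported from their own single-responsibility modules, e.g.:
#   from clean_data import clean_data
#   from train_model import train_model
#   from evaluate import evaluate_model
# The stand-in implementations below only illustrate the shape.

def clean_data(records):
    """Cleaning step: drop records with any missing field."""
    return [r for r in records if all(v is not None for v in r.values())]

def train_model(records):
    """Training step stand-in: 'model' that approves on good credit history."""
    return lambda r: 1 if r["Credit_History"] == 1 else 0

def evaluate_model(model, records):
    """Evaluation step: fraction of records the model classifies correctly."""
    hits = sum(model(r) == r["Loan_Status"] for r in records)
    return hits / len(records)

data = [
    {"Credit_History": 1, "Loan_Status": 1},
    {"Credit_History": 0, "Loan_Status": 0},
    {"Credit_History": 1, "Loan_Status": None},  # incomplete record
]
clean = clean_data(data)
model = train_model(clean)
print(evaluate_model(model, clean))  # 1.0
```

Each function has one job, so each maps cleanly onto one module; main.py simply chains them.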
Last but not least, Copilot X is well suited for documenting the project code as evidenced by the documentation that it generated at my prompt, as the following image shows:
If you download the refactored code, you will see that its documentation above is quite accurate and impressive. This shows the power of the “X” factor in Copilot, i.e., Copilot Chat.
On a side note, it is a good idea to create a virtual environment to isolate the required Python modules so as not to pollute the global Python space. The modules can then be frozen into a requirements.txt file to keep a centralized list of the required libraries. Note that you can ask GitHub Copilot X itself, in the Chat window, how to create the virtual environment, generate the requirements.txt file, and install all the dependencies. It responds with clear and concise instructions. The amount of time I spend searching for information that I know I need but cannot remember the exact syntax for has gone down significantly. Give it a try, and I think you will be pleasantly surprised.
Now that we have gone through a case study, it is time to summarize our learnings and do a comparative analysis as to when each tool is appropriate. They clearly have complementary and sometimes overlapping strengths. The following section elaborates on this further.
Code Interpreter and GitHub Copilot X: When to use what
This section examines the strengths of each tool in several dimensions such as audience experience, ease of use, practical limits, maintainability, reliability, repeatability, and more.
Audience
I believe that Code Interpreter will appeal to a much broader audience than GitHub Copilot X. It is like having a junior data scientist alongside. It uses the same simple, natural language interface that we are used to with ChatGPT; the only difference is the addition of the File Upload button. Most users should have no trouble with it, because any average computer user can upload and download files. Even if the user does not have much experience as a data analyst or data scientist, Code Interpreter can respond with simple answers that are easily understood. So, ease of use is its strength. I find it equally useful even though I am an experienced programmer and data scientist and have done plenty of programming and data science work using VS Code and Vim (yes, Vim). The appeal of Code Interpreter for me is that it allows me to quickly produce ad hoc models just for validation purposes, or to test out my ideas. With VS Code, it was simply too much trouble to experiment on a whim.
GitHub Copilot X is clearly a professional tool with a significantly higher entry barrier. But for an experienced developer, it is a big help. Up until the introduction of the “Chat” functionality, it seemed like a fancy code completion tool to me, so I didn’t pay much attention to it because I don’t program every day any longer (you can only write a For loop so many times in so many different languages and can never remember the syntax). The chat feature, however, was the inflection point for me. Suddenly, I was able to communicate my design thoughts with my program in English! Moreover, it responded quite intelligently (most of the time). I also didn’t have to leave my IDE (VS Code) to look up hard-to-remember syntax details. I believe a bright developer starting out can also leverage it to significantly shorten the learning curve.
Ease of use
As I mention above, Code Interpreter is quite easy to use for a wide range of users, while GitHub Copilot X demands a certain level of professional experience and knowledge of programming languages. For anything beyond a simple model, however, Code Interpreter's ease of use is outweighed by Copilot's flexibility and sophistication. As a result, I do not anticipate Code Interpreter being used for such scenarios except to provide a fast start.
Practical limits
Code Interpreter creates a sandbox containing the Python language interpreter and approximately 350 to 400 curated Python libraries that are then used to "write" the code for user prompts. Obviously, there are limits to how powerful a sandbox environment provided for a nominal monthly fee can be. Currently, I believe the maximum file size that can be uploaded is 512 MB, and the sandbox has a 2 GB RAM limit. In contrast, GitHub Copilot X is not constrained by these limitations; rather, its limits are defined by the machine being used, whether physical hardware or a VM.
Maintainability
The single file program that Code Interpreter creates can be hard to maintain over time. As my example shows, the refactored code in VS Code using Copilot is much easier to maintain and extend.
Reliability and repeatability
In data science work, reliability and repeatability play a big role. Once a program has been developed with Copilot and debugged, it is free of hallucinations. If you initialize it with the same seed to generate the same sequence of random numbers, it produces repeatable results. This is difficult to achieve in Code Interpreter, at least for now, even though it uses the same random-number seed, 42, that is conventionally used in such examples. I asked the code generated by GitHub Copilot X to use the same seed (i.e., 42), and both implementations produced the same accuracy result (88 percent). I recommend trying this out by downloading the code from GitHub.
Concluding remarks
Both tools are quite useful because they share the same ancestor, i.e., the Large Language Model GPT. They have simply been designed (or should we say specialized?) for different purposes. It is, therefore, wise to use each of them where they make the most sense. If so used, they nicely complement each other. If you were to ask me when I would use each tool given what I know at present, my answer would be that I use Code Interpreter when I am in the ideation phase and GitHub Copilot X for the implementation phase.
In the ideation phase, I don't quite know what I am doing, but I do know why I am doing it; the work is typically related to some sort of algorithmic thinking or experimentation (such as data science). Once I am out of the ideation phase and thinking about implementation, I move on to GitHub Copilot X. I fully expect that this may change as both tools continue to evolve and my experience with them grows.
Hopefully, this exercise has shed some light on the inherent power of generative models. The real breakthrough recently has been the development of the ability to converse with them using English natural language, i.e., prompts. Because these algorithms are non-deterministic, meaning that they can produce different answers to the same questions at different times, they can seem to be eerily humanlike. We humans are also unpredictable at times. When we are in a bad mood, we don’t respond intelligently. Similarly, when these models are in a strange dialog with the user they are said to “hallucinate.” This is where a good set of prompts (soothing words?) can bring them back on track. If we follow this analogy, then prompt engineers are more like psychologists.
Copilots are immensely powerful and can be customized to support a wide range of activities, not just coding as in GitHub Copilot X. At the time of this writing, Microsoft has approximately 16 different copilots near general availability (GA). One example is the Security Copilot, which advises a security engineer or architect. Another is the Microsoft 365 Copilot, which helps Microsoft 365 users write documents in Word, create presentations in PowerPoint, perform analytics in Excel, respond to emails, and more. Many anticipate that copilots will permeate Microsoft's product lines. And this is not limited to Microsoft itself: Microsoft is working with a wide range of clients to create custom copilots for their specific needs.
Plugins like Code Interpreter make it even easier and more affordable for members of the public to leverage the power of these algorithms, which in the past have required a sophisticated understanding of data science. There are many more of these plugins in the works at various places.
I conclude by saying that the negative way of looking at these recent advances is to fear them because they are so powerful and potentially disruptive. A more positive way is to leverage their strengths and minimize their destructive potential with proper checks and balances. Like everything else in life, it is a balancing act. In fact, I fed certain paragraphs of this article to ChatGPT and asked it to rewrite them for an informed business user; it responded with well-articulated paragraphs that I tweaked a bit and accepted. Here is an example image:
I hope you are convinced that these tools are quite useful. Enjoy!
Bharat Ruparel is on LinkedIn.