Can GitHub Copilot Crack a Facebook Coding Interview?

Here’s how GitHub Copilot performs with coding interviews

Giuliano Giacaglia
8 min read · Jul 27, 2021

Overview

GitHub Copilot is a new product from GitHub, built with OpenAI and marketed as “Your AI pair programmer”. It is a plugin for VS Code that is simple to install and use. I’ve been testing it over the past week and wanted to see how powerful it is, so I worked through three coding questions with it, taken from a set of prepared interview questions on the web, to see how it would perform.

How does GitHub Copilot work? Copilot is powered by a deep neural network language model called Codex, a GPT model fine-tuned on GitHub code.

GitHub Copilot’s underlying model was evaluated on HumanEval, an evaluation set created by OpenAI specifically to measure functional correctness when synthesizing programs from docstrings. The model solves 28.8% of the problems when tested.
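For context, each HumanEval task gives the model a function signature and docstring, and the completion is checked against hidden unit tests. An illustrative example in the same style (my own, not an actual benchmark problem) looks like this:

```python
def is_palindrome(s: str) -> bool:
    """Return True if `s` reads the same forwards and backwards.

    >>> is_palindrome("level")
    True
    >>> is_palindrome("hello")
    False
    """
    # The model sees the signature and docstring above and must
    # generate a body; the benchmark then runs unit tests against it.
    return s == s[::-1]
```

“Functional correctness” here means the generated body passes the tests, regardless of how it is written.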

The fact that Copilot is trained on publicly available code under a variety of licenses has led to many discussions about its legal implications. Here we will analyze it purely from a technical point of view.

To learn how machine learning algorithms work, how they arose, and where they are going, I recommend:

Walk-through

I looked for problems that tech giants like Facebook ask new candidates, and at how candidates would solve them. To see how powerful GitHub Copilot is, I grabbed three questions from a pool of 40 Facebook coding interview questions that candidates may use to prepare for a Facebook interview.

First problem: move zeros to the left

The first one was the following:

Given an integer array, move all elements that are 0 to the left while maintaining the order of other elements in the array. The array has to be modified in-place. Try it yourself before reviewing the solution and explanation.

First, I started adding comments describing what I wanted the code I was writing to do. Copilot picked up on them and offered some good auto-complete suggestions.

Screengrab of Copilot autocompleting the comments

But as you can see, it suggested the wrong output for the example in the problem description, even though its suggestion was pretty close to what I wanted. It seems that Copilot doesn’t “understand” what is being asked, which makes sense.

Also, the note about the code needing to work in-place was great: exactly what the question asks for.

I wonder if this is because the question I picked comes from the internet and the model simply memorized it during training. Next, I wrote the function definition and the test definition as a skeleton to see what Copilot would generate.

The solution Copilot gave was close to what we wanted, but not exactly right: the generated function shifts zeros to the right instead of the left. With a few small changes, though, it can shift zeros to the left instead. We just need to start at the end of the array rather than the beginning and decrement the pointers instead of incrementing them. The following code suffices:

Move zeros to the left of the array
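As a minimal sketch of that corrected logic (my own reconstruction, not Copilot’s verbatim output): we walk the array from the end with a write pointer, copying non-zero values rightward, then fill the remaining left portion with zeros.

```python
def move_zeros_to_left(arr):
    """Move all zeros to the left in-place, preserving the
    relative order of the non-zero elements."""
    write = len(arr) - 1
    # Walk from the right, copying non-zero values toward the end.
    for read in range(len(arr) - 1, -1, -1):
        if arr[read] != 0:
            arr[write] = arr[read]
            write -= 1
    # Everything at or left of `write` becomes zero.
    for i in range(write + 1):
        arr[i] = 0

arr = [1, 10, 20, 0, 59, 63, 0, 88, 0]
move_zeros_to_left(arr)
print(arr)  # [0, 0, 0, 1, 10, 20, 59, 63, 88]
```

Starting from the end is what keeps the non-zero order intact while the array is modified in place, which is exactly the constraint the question imposes.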


The logic is pretty close to what we want; we just needed to modify the function a little to get what the question was asking for. As for the generated test, it checks the returned array against an expected value that isn’t quite right, but is close. Again, modifying one line of the testing function gives us a good enough solution.

I’m not sure what score the Copilot solution would get in an interview, but a coder using it could definitely be steered in the direction of the right solution.

The interesting part is that Copilot lets you cycle through different suggestions for completing the code. They are all very similar, and it still takes a programmer to figure out whether any of them is correct.

Different Solutions shown by GitHub Copilot

Second problem: merge intervals

Now let’s move to the next problem, where we want to figure out how to merge overlapping intervals. The description of the problem is the following:

You are given an array (list) of interval pairs as input where each interval has a start and end timestamp. The input array is sorted by starting timestamps. You are required to merge overlapping intervals and return a new output array.

Again, we start by writing the comments describing the function we want, and Copilot’s predictions come impressively close to what we are after.

GitHub Copilot Comments Merge

Now, for this second problem, we want to see how Copilot performs when we write the logic ourselves and use it as a tool to accelerate the coding. As you can see, Copilot definitely helped a ton, completing the function line by line. We had to correct a few mistakes here and there, but Copilot made writing the solution much faster than it would have been without the tool.
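For reference, a compact version of the merge logic we ended up with might look like this (my own sketch, not Copilot’s verbatim output). It relies on the input being sorted by starting timestamp, as the question guarantees:

```python
def merge_intervals(intervals):
    """Merge overlapping [start, end] pairs into a new output list.
    Assumes `intervals` is sorted by start timestamp."""
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps the last merged interval: extend its end if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # No overlap: start a new interval in the output.
            merged.append([start, end])
    return merged

print(merge_intervals([[1, 5], [3, 7], [4, 6], [6, 8], [10, 12], [11, 15]]))
# [[1, 8], [10, 15]]
```

Because the input is pre-sorted, a single linear pass is enough; only the most recently merged interval can ever overlap the next one.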

Third problem: Reverse words

For the third problem, we tested GitHub Copilot on a somewhat more complex task: reversing each word in a string. For example, for the input “Hello World”, we want to return “olleH dlroW”.

Image by author

For this example, GitHub Copilot had a hard time: for the given inputs, it didn’t suggest the outputs we wanted, and it didn’t recommend the right solution either.

Input: “Hello World”

Output: “olleH dlroW”.

When writing tests, it first recommended the wrong test suite: its first suggested outputs only changed the order of the words, without reversing the words themselves. It seems GitHub Copilot doesn’t perform that well here because the problem involves two steps.

GitHub reversing words

The correct solution has two steps, and according to OpenAI’s paper on Codex:

We find that as the number of chained building blocks in the docstring increases, model performance decreases exponentially. This behavior is uncharacteristic of a human programmer, who should be able to correctly implement a program for a chain of arbitrary length if they can do so for a chain of length two.

That means Copilot does not perform well on programs that require a chain of building blocks.

In this third problem, two chained building blocks are needed to get it right: the program has to reverse the entire string and then reverse the order of the words (or, equivalently, reverse each word individually while keeping the word order).

That may be why it wasn’t able to perform well. The figure below illustrates the drop in pass rate, i.e. how the model performs versus the number of chained components. In this third problem the number of chained components is 2, making it harder for the model to do well; in the image we see a pass rate of around 7%.
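For reference, the two-step solution can be sketched as follows (my own version, not Copilot’s suggestion): first reverse the whole string, then reverse the order of the words.

```python
def reverse_words(sentence):
    """Reverse the characters of each word while keeping word order."""
    # Step 1: reverse the whole string ("Hello World" -> "dlroW olleH").
    flipped = sentence[::-1]
    # Step 2: reverse the order of the words ("dlroW olleH" -> "olleH dlroW").
    return " ".join(reversed(flipped.split()))

print(reverse_words("Hello World"))  # olleH dlroW
```

Each step alone is trivial; it is the chaining of the two that, per the Codex paper, degrades the model’s pass rate.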

Usability

What is most impressive about GitHub Copilot is that it never hung the UI while I was using it, whereas other tools in the same space hog either memory or CPU. I didn’t experience any spikes in CPU, memory, or network usage in Activity Monitor. Overall it has been a super smooth experience, which wasn’t the case with other code helpers.

Conclusion

From my experience with these problems and a week of trying out GitHub Copilot, it seems the tool definitely can’t pass a coding interview at a tech giant on its own.

But it definitely gets close on the challenges we tried. That said, these challenges are available on the web, so it is hard to judge how Copilot would do on a new challenge that isn’t.

However, Copilot seems like a super powerful tool that can enhance developer productivity and speed up development in VS Code. And I imagine it will only get better over time as OpenAI and GitHub gather user feedback and improve the model. It can certainly improve as language models become more powerful and as GitHub Copilot is adopted by the software community.

It is interesting to think about when GitHub Copilot will crack a Facebook coding interview. If past performance improvements of models in other areas are any indication, it won’t be long before it does.
