Everything you need to know about GitHub Copilot

Vibhor Gautam · Published in Analytics Vidhya · Jul 26, 2021

I was fortunate enough to be given early access to GitHub’s new “AI pair programmer,” Copilot, which has generated quite a stir. This blog post shares my early impressions and experiences with the tool. In a nutshell, its capabilities astound me. It’s made me shout “wow” a couple of times in the last few hours, which isn’t something you’d expect from your developer tools!

However, there are some real-world limits to this tool right now, which I’ll go through in this article. In summary: Copilot’s suggestions appear out of nowhere and interrupt my flow; autocomplete feels like the wrong interaction paradigm for a “pair”; and checking Copilot’s work adds to your mental workload.

This tool, in my opinion, is not yet going to revolutionize programming. Still, despite all of the above, I am confident that it will have a huge, game-changing impact in the future.

What is Copilot?

Copilot is powered by technology from OpenAI, the San Francisco-based AI firm that recently secured a $1 billion investment from Microsoft. OpenAI has lately made headlines with GPT-3 (Generative Pre-trained Transformer 3), its third-generation language model: a huge neural network with 175 billion parameters, trained on a large corpus of text.

I got access to the OpenAI GPT-3 API, and I have to say I’m blown away. It’s far more coherent than any AI language system I’ve ever tried.

GPT-3 is capable of producing remarkably realistic text in response to simple prompts. Others have built various interesting products on its API, including a question-based search engine, chatbots that let you speak with historical figures, and even a guitar-tab generator.
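To give a sense of what working with the API looks like, here is a minimal sketch using OpenAI’s Python client as it existed at the time of writing; the engine name, prompt, and parameters are illustrative, and you would need your own API key.

```python
import openai  # pip install openai

openai.api_key = "YOUR_API_KEY"  # placeholder; substitute your own key

# Ask GPT-3 to continue a simple prompt.
response = openai.Completion.create(
    engine="davinci",   # GPT-3 base model
    prompt="Q: What is an AI pair programmer?\nA:",
    max_tokens=60,
    temperature=0.7,    # some creativity, but not too wild
)

print(response.choices[0].text.strip())
```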

Copilot is built on Codex, a new model derived from GPT-3 that has been trained on massive amounts of open source code from GitHub. It integrates directly with VS Code to generate suggestions based on a combination of the current context (i.e., your code) and the “knowledge” it gained during training.
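In practice, the interaction looks something like this: you write a comment and a function signature, and Copilot proposes a body. The suggestion below is a hand-written illustration of what Codex-style models tend to produce, not a captured Copilot output.

```python
# You type the comment and the signature...
def is_palindrome(text: str) -> bool:
    """Return True if text reads the same forwards and backwards."""
    # ...and Copilot might suggest a body like this (illustrative):
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum())
    return cleaned == cleaned[::-1]
```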

As an aside, you can see how Microsoft’s strategic investments (backing OpenAI, acquiring GitHub, and developing VS Code) come together in its future products.

How far can Copilot take you?

For the time being, the answer is “not very far.” Despite all of the buzzwords like “intelligence,” “contextual,” and “synthesizer,” Copilot has only a limited understanding of your actual intent and what your code needs to accomplish.

When computing suggestions, Copilot looks only at your current file. It won’t look at how your code is used throughout the rest of your program, so even when the underlying logic spans several files, the AI’s view of your work may differ dramatically from yours and may vary file by file.

Copilot’s output is not guaranteed to represent the “optimal” method, or even code that works, according to GitHub. You may encounter security vulnerabilities, lines that employ obsolete or deprecated language features, or code that does not run or make sense. To ensure that your project still compiles and runs, you should audit each Copilot suggestion you employ; as the sketch below shows, a flawed suggestion can look entirely plausible.
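Here is a hypothetical example of the kind of suggestion an audit should catch: the first function looks reasonable but uses Python’s non-cryptographic random module for a security-sensitive token, while the second is what a reviewer should insist on.

```python
import random
import secrets
import string

# Plausible but insecure: `random` is not cryptographically secure,
# so tokens generated this way are predictable.
def generate_token(length: int = 16) -> str:
    return "".join(random.choice(string.ascii_letters) for _ in range(length))

# The safer alternative a reviewer should ask for: the `secrets` module.
# Note: token_urlsafe takes a byte count, so the output string is a bit
# longer than `length` characters; the argument controls entropy.
def generate_token_secure(length: int = 16) -> str:
    return secrets.token_urlsafe(length)
```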

Copilot’s true place in the development process should now be clearer: it’s an assistive technology, not a true automaton, designed to make the mundane a bit simpler. Consider it a navigator or a sidekick rather than an omniscient developer who writes your code for you.

Copilot isn’t scalable.

Copilot is at its best when you let it help you write functions that answer frequent use cases. What it can’t do is comprehend the context of your codebase as a whole. Without the capacity to truly comprehend your goals, Copilot’s reach is limited.

According to GitHub, the company is aiming to make Copilot smarter and more useful. However, it’s unclear how far its role can grow until it can look at your complete project, not just a single file. In its current version, Copilot is essentially a glorified autocomplete: instead of pressing Tab to fill in standard library function names, you can now accept suggestions for the function bodies themselves.

On programming sites like Stack Overflow, solutions to common technical problems already abound. Copilot saves you time by automating the process of finding a question, reviewing the answers, and copying and pasting the code. After you’ve verified that Copilot’s recommendation works, you still need to figure out how to integrate the solution into your overall system.

Copilot isn’t actually programming. It examines what you’ve written, infers what you’re trying to accomplish, and then tries to piece something together from the answers it absorbed during training. Copilot serves you rather than the other way around. It cannot think creatively, propose a high-level architecture, or produce a unified system. Each recommendation is self-contained and derived only from the code that surrounds it in the source file.

Copilot is, according to GitHub, completely reliant on you. When your codebase is logically organized into discrete functions with explicit typings, comments, and doc blocks, the tool works well. You’ll need to guide Copilot by writing high-quality code if you want the best results, as the example below illustrates.
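As an illustration, a stub with explicit type hints and a doc block gives Copilot far more to work with than a bare name like calc(d) would. The body shown is the kind of completion such context tends to elicit; it is a hand-written example, not captured Copilot output.

```python
from datetime import date

def days_until(deadline: date, today: date) -> int:
    """Return the number of whole days from `today` until `deadline`.

    Returns a negative number if the deadline has already passed.
    """
    # With the signature and docstring above, a completion like this
    # is straightforward for the model to infer (illustrative):
    return (deadline - today).days
```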

What’s the Deal with Licensing?

Copilot has been trained on a range of public GitHub projects with various licenses. This, according to GitHub, constitutes “fair use” of such projects. What’s less apparent is what you’ll be responsible for if you accept a Copilot recommendation.

Copilot’s output “belongs to you” and “you are responsible for it,” according to GitHub. It expressly indicates that you do not need to acknowledge Copilot or any other source if you utilize a suggested snippet. Copilot is being marketed as a “code synthesizer” that generates creative output rather than a search engine for indexed snippets.

This is where the issue starts. Copilot can still reproduce portions of its training code verbatim. Depending on the licenses attached to those snippets, this could land your project in hot water. And because Copilot was trained on GitHub projects, personal data may even be introduced into your source files.

These are said to be rare occurrences, thought to be more likely when the surrounding code context is poor or imprecise. Two examples so far: GPL-licensed Quake code emitted as-is (profane comments included), and a real individual’s website text and social links appearing when Copilot thinks you’re writing an “about me” page.

Because derivative works under the GPL and comparable licenses must carry the same license, putting GPL code into a closed-source commercial product is a licensing violation. As a result, using Copilot has real legal implications that you should consider before installing it. Because Copilot appears to emit code verbatim without disclosing the license that comes with the sample, accepting a suggestion could result in unintentional copyright infringement.

This should prove beyond a shadow of a doubt that Copilot’s initial release will not replace a human developer. Its code isn’t guaranteed to be relevant, and it could be defective or outdated, posing a legal risk on top of a technical one.

Conclusion

Copilot is a large-scale project that has sparked a lot of debate. Many people have strong opinions about the product, as the sheer volume of discussion shows. It’s been a while since a new developer tool generated so much excitement on day one.

Copilot is appealing because it taps into several developer annoyances. Most programmers, if not all, recognize the inefficiency of writing “boilerplate” code that isn’t really relevant to their project. Taken at face value, Copilot gives them a way to focus more on the creative aspects of their work.

The problem with Copilot is GitHub’s blanket approach to training the model. The inclusion of GPL-licensed code and the complete lack of any output testing will hamper Copilot’s real-world use. It’s unclear whether GitHub’s decision to train the model on public code qualifies as fair use; it’s possible it doesn’t, at least in some jurisdictions.

Furthermore, because GitHub cannot ensure that Copilot’s code actually works, developers will need to proceed with caution and evaluate everything it creates. Part of Copilot’s promise is helping inexperienced developers progress, but that promise falls flat if flawed code is suggested and accepted unchecked.

Finally, Copilot does not explain how or why its recommendations work. If the technology is ever to truly replace human developers, it must explain how a solution works and provide transparency into its decisions. Developers can’t just trust the machine; they’ll need to keep an eye on it and compare alternative options.

That missing “how and why” hampers Copilot’s position as a mentoring tool, and it is also the main question facing a developer early in their career. Anyone can copy source code from public projects, documentation, their peers, or Copilot, but knowing why solutions work is what will propel you forward in your profession.

Copilot doesn’t address this in its present version, so you’ll have to figure out for yourself what the inserted code does. Even a developer who uses Stack Overflow daily will improve, because they read the replies and learn the reasoning behind solutions. Copilot is a black box that could be mistaken for a repository of flawless code ideas; the evidence so far indicates that it is not.
