Onboarding your new employee: Copilot and the importance of Documentation

Published in

Technology @ Prospa

May 31, 2024

A robot with a puzzled look on its face with sticky notes falling from above with question marks on them. There’s a screenshot of code behind the robot.

Disclaimer: This article offers a speculative glimpse into the future of software engineering, extrapolated from the current and announced capabilities of tools in the GitHub Copilot ecosystem.
Since this is all highly speculative, this blog post might not stand the test of time.

The new employees: Human and GitHub Copilot

Scenario 1 (Onboarding a human):

You’re a software engineer. You’re deep in code. A new employee has recently joined your team, and today they’ve picked up their first Jira ticket. They’re constantly interrupting you with questions about the business and the codebases. You field their questions, only for them to come back with more: “How does this part of the system work?”, “How do I run this locally?”, “How do I deploy this?”, “Why did we do it this way and not that way?”.

If they don’t ask questions, then this could go one of two ways:

Constant iterations: Their initial contribution is quick, but wrong. It requires many rounds of feedback and changes. You go back and forth until their contribution achieves the given task and goal while aligning with the company’s best practices. It ends up taking a lot of time for both of you.
Deep-dive & memory overload: They fall down a rabbit hole of software archaeology, piecing together countless fragments of code and interwoven dependencies.
The human brain has something called ‘working memory’. Put simply, this is the amount of information the human brain can retain and process simultaneously. When a software engineer has to read and process all of this information and context at once, their working memory is at capacity. Overworking the working memory leads to mental fatigue and a tendency towards risky and low-effort task strategies. In other words, all that cognitive demand increases the chance that they take shortcuts, and ultimately might end up submitting buggy code. Best case, they’ve spent a long time completing the task.

A software engineer sitting at his desk with screens of code, in an office environment. Another software engineer is standing in front of him with a pose that indicates he’s asking a question.

Scenario 2 (GitHub Copilot):

Imagine you’re GitHub Copilot. You’ve just received two potentially unrelated, incomplete files, along with a chunk of code, a text cursor, and then another chunk of code. You’re now asked to fill in the gap where the text cursor is.

You provide some code that appears to fill the gap quite nicely. You’re pleased with it.

The software engineer deletes it. “Try again.”

But why?

So, it turns out that the code that Copilot suggested is completely unusable in the context of the whole codebase. But how could Copilot have known?

There are two theoretical ways this could be remedied:

Constant iterations: The software engineer could keep refining the prompt through trial and error, over and over, until Copilot suggests the perfect code. Sounds inefficient, right?
Deep-dive and memory overload: You could explore more expensive options, like fine-tuned models trained on your entire codebase, or switch to a pricier model that can process large quantities of code at once. The latter is otherwise known as increasing the ‘context window’. Think of the ‘context window’ as the amount of code (and written instructions) that Copilot can process at a time. This is similar to the new employee’s ‘working memory’ being stretched to its limits.
Both of these options are expensive, slow, or both. At present, even with such a large context window, today’s Large Language Models (LLMs) would likely still produce hallucinations or buggy code. Their needle-in-a-haystack performance and logical-reasoning capabilities just aren’t good enough yet for such large quantities of code.

A white robot sitting at a desk surrounded by countless pieces of paper and code snippets surrounding him and littering his desk with a laptop on it.

Common ground: a lack of context

The new employee and Copilot both lack sufficient context. Most importantly, refined, optimised, and accessible context.

To complete the task accurately, you need context.

The new employee probably wasn’t provided with sufficient onboarding and readily available documentation to answer any questions.

Copilot didn’t know about the company’s coding standards, architecture, or available APIs and libraries. This led to code that was objectively reasonable, but subjectively wrong in the context of the wider system it was working with.

Reflecting on the two scenarios again, you will notice a key trend: without sufficient context, they’ll both either…

Need constant human intervention and guidance, or…
Experience cognitive/memory overload, and underperform.

Copilot is your new employee.

GitHub Copilot mascot sitting at a desk in an office environment with a laptop and a stack of documentation on the desk.

What if I told you that GitHub Copilot is like a new employee?

Think about it for a moment. When you prompt Copilot, it knows absolutely nothing about your company, codebase, or architecture. Nada. Well, nothing beyond a couple of files and the code before and after the text cursor.

Every time you interact with Copilot, it’s as if it’s starting its first day on the job all over again, prompt after prompt.

Copilot is like a new employee, with the memory of a goldfish.

What Copilot lacks in memory, it makes up for in reading speed. So let’s cater for Copilot’s limited ‘working memory’ and make sure it can hit the ground running on its first day on the job, every time.

How do we give Copilot and our new employees the context they need?
Documentation.

Documentation

We don’t like to admit it, but we need it.

We could save ourselves a lot of time if we just read the documentation.

We could also save our colleagues a lot of time if we took a moment to write documentation, especially for our new colleagues, Copilot included.

If important information is only in your head, it’s inaccessible to others (both Copilot and humans). So, write it down.

“Write it down, and share the link.”

Scott Hanselman wrote an insightful blog post titled “Do they deserve the gift of your keystrokes?”, inspired by Jon Udell’s “Too busy to blog? Count your keystrokes”.

In Hanselman’s blog, he discusses the idea that we all have a finite number of keystrokes in our lifetime. Every keystroke spent to answer one person’s question is a wasted keystroke.

Slack and emails are where your keystrokes go to die.

His main argument is that knowledge shared through emails or Slack often gets lost, benefiting only a few individuals.

Instead, he suggests generalising and publishing your knowledge in formats and platforms that can reach and benefit a wider audience.

In today’s world, that wider audience includes Copilot.

Forms of Documentation

Let’s dive into some of the various forms of documentation that our new employees and Copilot could both benefit from.

A software programmer’s office desk with numerous sticky notes stuck to a monitor, desk and wall. There’s a small robot sitting on the top of the desk. There’s a software IDE open on the screen.

Code Comments

Copilot can read code, but not your mind.

Writing code comments is usually considered good practice, where it makes sense. Typically, the comments that matter most are the ones documenting the ‘why’ behind the code:

Why did you write the code this way?
What problem have you been trying to solve?
What constraints were you working under?
What assumptions did you make?
Which trade-offs did you consider?

This is all information that human readers and Copilot will appreciate.
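As a small illustration in TypeScript, here is a hypothetical helper whose comments record the why (the problem, the constraint, the assumption, and the trade-off) rather than restating what the code does. The function `toCents` and the ledger service it mentions are invented for this sketch.

```typescript
// Hypothetical helper: the ledger service and its whole-cents rule are
// invented for illustration.

/**
 * Converts a dollar amount to integer cents before it is sent to the ledger.
 *
 * Why: floating-point dollars accumulate rounding errors (0.1 + 0.2 !== 0.3),
 * and the ledger service only accepts whole cents.
 * Assumption: callers pass amounts with at most two decimal places.
 */
function toCents(dollars: number): number {
  // Why Math.round and not Math.floor: 19.99 * 100 evaluates to slightly less
  // than 1999 in IEEE-754 arithmetic, so truncating would drop a cent.
  // Trade-off: rounding silently absorbs sub-cent noise, which is acceptable
  // only because of the two-decimal assumption above.
  return Math.round(dollars * 100);
}

// By contrast, a "what" comment such as `// multiply by 100 and round`
// restates the code and tells the reader (human or Copilot) nothing new.
```

A reader who later wonders "why not just use `Math.floor`?" finds the answer next to the line in question, and so does Copilot when that file is in its context.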

Markdown Documentation

The README file is the front door to your codebase.

This document explains to Copilot and your colleagues at a high level what the codebase is about, how to run it, how to test it, and how to contribute to it.

It also acts as an index file. It redirects the reader and Copilot to more detailed information on various topics related to your codebase.

When you ask GitHub Copilot Chat questions using the @workspace chat participant, it relies heavily on searching your codebase and documentation for related keywords, just as people do. When writing documentation, put on your SEO hat and think about how searchable your documentation is:

“Are the most relevant and common keywords being included in the document?”
“Will Copilot and my employees be able to find this documentation via search?”
“Am I placing them in the most logical locations in the codebase so that Copilot and my employees are able to find this documentation by filename and folder structure?”

Architectural decisions

“Those who cannot remember the past are condemned to repeat it.” — George Santayana

Your colleagues and Copilot both need to understand the history and context behind why you made particular decisions. Documenting these decisions, along with the pros and cons, constraints, and context behind the final decision, captures each decision as it stood at that moment in time.

When a new employee joins the company, they bring their own experiences and opinions. Your decisions might make sense to you but not to them, at least not based on their own lived experience. However, they don’t have the full context and history behind why the system was built the way it was.

But you can change that! How can we bring them along on this journey and provide them with the history? Architectural Decision Records.

When you or your team make a technical or architectural decision, create a new document. Capture a snapshot of the pros and cons, the constraints, and the context that must be considered in your decision.

When you’ve made the decision, document what you decided and why.

When a new employee starts asking you why the system works a particular way, you can refer them to this documentation. With all the considerations documented, they can better appreciate the historical decision, or perhaps even challenge the decision, proposing new solutions that factor in the same constraints.

If you store these documents on a platform accessible to Copilot, it might even read and reference these files, factoring these decisions into its code generation. Or perhaps GitHub Copilot Chat will reference them to answer questions from other colleagues about why a particular piece of code was written the way it was, or why a particular framework or library was used.

As it currently stands, the only way to make these documents accessible to GitHub Copilot Chat is to colocate them with your codebase. You can establish a suite of Architectural Decision Records (ADRs) in Markdown format and store them in your code repository. That way, GitHub Copilot Chat can look them up with @workspace to answer any questions you have, e.g.:

@workspace The rest of the company uses ConfigCat.
Why are we using LaunchDarkly instead of ConfigCat in this repository?

If you can’t do this, or if your documentation is relevant for more than one codebase, then you may have to wait for GitHub Copilot Extensions to become publicly available, when hopefully there will be extensions for Confluence, GitHub wikis or lookups across multiple codebases.

API Documentation

How does Copilot know how to use your APIs, services, or integrations that live outside the codebase? It doesn’t, unless you tell it which APIs and services are available, what they do, and how to use them, with examples.

Now, this might seem contradictory to the earlier points made in this blog post, but there are times when using code as your documentation is actually more effective. Documentation doesn’t have to be written in plain language. In fact, in many cases, simple API schemas and code examples of requests can be more effective. Consider using OpenAPI or GraphQL schemas to document your API. This will help Copilot understand your API and make better suggestions.
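As a rough sketch of code-as-documentation for an API, the TypeScript below pairs a typed request shape, documented field by field, with a worked example call. It is a lightweight stand-in for a full OpenAPI or GraphQL schema, not a replacement for one; the `POST /v1/invoices` endpoint, its fields, `buildCreateInvoiceRequest`, and `https://api.example.com` are all hypothetical.

```typescript
// Hypothetical API: the endpoint, fields, and base URL are invented for
// illustration and stand in for a real OpenAPI or GraphQL schema.

/** Request body for `POST /v1/invoices`. */
interface CreateInvoiceRequest {
  /** Customer identifier, e.g. "cus_123". */
  customerId: string;
  /** Amount in integer cents (never floating-point dollars). */
  amountCents: number;
  /** ISO 4217 currency code, e.g. "AUD". */
  currency: string;
}

/** A client-agnostic description of one HTTP call. */
interface HttpRequestSpec {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

/**
 * Builds the request that creates an invoice.
 *
 * @example
 * buildCreateInvoiceRequest("https://api.example.com", {
 *   customerId: "cus_123",
 *   amountCents: 1999,
 *   currency: "AUD",
 * });
 */
function buildCreateInvoiceRequest(
  baseUrl: string,
  payload: CreateInvoiceRequest,
): HttpRequestSpec {
  const { amountCents } = payload;
  // Reject fractional, non-positive, or non-finite amounts before they reach the API.
  if (!(amountCents > 0 && amountCents % 1 === 0)) {
    throw new RangeError("amountCents must be a positive integer");
  }
  // Strip trailing slashes so "https://host/" and "https://host" both work.
  const root = baseUrl.replace(/\/+$/, "");
  return {
    method: "POST",
    url: `${root}/v1/invoices`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}
```

The types tell Copilot what shape the API expects, while the `@example` shows a valid call it can pattern-match against.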

Side note: One of the most powerful uses of LLMs is making API calls; LLM-based systems that do this are often called “agents”. Tools like OpenAI’s ChatGPT Actions and GitHub Copilot Extensions showcase this potential. Refining your API specs now sets your company up for a powerful business opportunity: enabling AI assistants to interact with your APIs and services.

Architecture Diagrams

AI assistants like Copilot can’t accurately read your visual architecture diagrams, at least not yet. But they can read your code.

Just like how OpenAPI and GraphQL schemas can help Copilot understand your API, architecture diagrams written in code (like PlantUML or Mermaid) can help Copilot understand your architecture. This helps Copilot make better suggestions and helps you troubleshoot your code.

Infrastructure as Code (IaC)

The easiest way for Copilot to understand your infrastructure is to define your infrastructure as code. If you’ve been putting off a migration to IaC and are still doing ClickOps, configuring and modifying your infrastructure manually, let this be the nudge you need to consider the migration.

By storing all the information about your infrastructure as code, you give Copilot ready access to it, so it can take it into account in its generated code, as well as when answering questions in GitHub Copilot Chat.

Even better, Copilot might want to make suggestions or modifications to your infrastructure. If it’s in code, it makes it more accessible to Copilot.

Code Conventions

Copilot can’t follow your company’s coding standards without guidance, and neither will your new employees, if you haven’t documented them.

Where possible, use linting and formatting tools to enforce most of your code conventions and patterns. Tools like ESLint and Prettier can check and fix code automatically, and Husky can run them as Git hooks before each commit. That way, even if Copilot or your colleagues write code that doesn’t follow the standards, your automated tooling will flag or fix it.
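For example, a minimal ESLint flat-config sketch, assuming ESLint 9, which reads its rules from an `eslint.config.*` file. The rule selection is only an illustration, not a suggested standard. The exported array is plain data, so the same code is valid as JavaScript (e.g. saved as `eslint.config.mjs`) and type-checks as TypeScript.

```typescript
// Minimal ESLint flat config (sketch). Each rule encodes a convention that
// would otherwise live only in reviewers' heads.
const config = [
  {
    rules: {
      // Block-scoped declarations avoid `var` hoisting surprises.
      "no-var": "error",
      // Flag `let` bindings that are never reassigned.
      "prefer-const": "error",
      // Require === so implicit type coercion never sneaks into comparisons.
      eqeqeq: ["error", "always"],
    },
  },
];

export default config;
```

Husky (or any Git pre-commit hook) could then run `npx eslint .` so violations are caught before they are committed, regardless of whether a human or Copilot wrote the code.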

For the code conventions that cannot be automated via linting and formatting tools, write them down in a Markdown document in the codebase.

You might think, “But if Copilot eventually has access to my entire codebase, then perhaps it will pick up on all the coding standards automatically.” Yes, it could well do. But you’re assuming your entire codebase is following your code standards. Realistically, often your best practices evolve over time, but the existing code doesn’t get updated to align with these best practices. And so, your codebase is left littered with bad examples.

We have all done this at some point; copied solutions from other files, otherwise known as “Copy-and-paste programming”. Old or bad patterns can proliferate in your codebase because of this. Well, Copilot is no different. At the end of the day, Copilot just wants to finish your sentences. So, if your codebase is full of bad patterns, it will ‘learn’ those bad patterns and reproduce more code that looks like it.

If you write your coding standards down in an official document, you may, in the future, be able to make Copilot aware of these conventions and encourage it to follow them. But that’s all speculative. For now, at the very least, documenting these standards allows your colleagues to become aware of them and follow them.

Git history, commit messages and pull request descriptions

Commit messages and pull requests tell the story of your codebase.

Commit messages are more than just a log of changes. They can explain why a change was made and what problem was being solved. Paired with a good Git history, they provide a roadmap of your codebase’s evolution.

Imagine a future where Copilot could read through all your commit messages and Git history in an instant. It could help you quickly identify the origin of a new bug or understand the historical context behind a particular piece of code.

Jon Cairns wrote an opinionated (yet thought-provoking) blog post on a similar concept called “Use git to comment your code (and stop writing rubbish commit messages, please)”. He takes the idea of using commit messages to document your code a step further, arguing that code comments aren’t required and that Git commit messages are better.

Write good commit messages and pull request descriptions, including the why behind the change and the why not behind the alternatives.

Did you know that in GitHub Copilot Enterprise you can use GitHub Copilot to ask questions about pull requests?

Jira tickets and GitHub issues

Jira tickets tell the story of your product.

Make sure your Jira tickets and GitHub issues are detailed and well-written. In the near future, tools like GitHub Copilot Extensions may be able to access external information (like Jira tickets), to provide better suggestions.

As of February 2024, with GitHub Copilot Enterprise, you can use GitHub Copilot Chat to ask about your GitHub issues.

Write clear acceptance criteria, and then a high-level plan on how you will technically achieve this task. This information will help Copilot understand what you’re trying to achieve. When things change, ensure you update the ticket accordingly.

Writing clear and detailed Jira tickets will soon be a key skill in software engineering. Now is a good time to start practicing.

Don’t believe me? Check out GitHub’s demo of Copilot Workspace.

Sure, you might feel skeptical at first. This flow might seem a little ambitious now, but I’m sure you can see where this is going…

Documentation-driven Development

Get into the habit of writing first, and then build.

Start with a Jira ticket.
Add acceptance criteria.
Document your approach and how everything will work together.
Write tests that assert the implementation will fulfill its acceptance criteria.
Write the pseudo-code for the implementation.
Then finally, code the implementation.
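To make these steps concrete, here is a sketch in TypeScript for an invented ticket. The ticket, its acceptance criteria, and `maskAccountNumber` are hypothetical; what matters is the order in which each piece was written, with every earlier step giving Copilot context for the next.

```typescript
// Hypothetical ticket (invented for illustration):
//   "Mask account numbers shown in the customer portal."
//
// Acceptance criteria:
//   AC1: Only the last four digits stay visible; every other digit becomes "*".
//   AC2: Spaces and dashes are preserved so the familiar grouping stays readable.
//   AC3: Numbers with four or fewer digits are returned unchanged.

// Tests, written against the acceptance criteria before the implementation.
// Function declarations are hoisted, so these can sit above the code they test.
const acceptanceCases: Array<[string, string, string]> = [
  ["AC1", "12345678", "****5678"],
  ["AC2", "1234-5678 9012", "****-**** 9012"],
  ["AC3", "123", "123"],
];
for (const [criterion, input, expected] of acceptanceCases) {
  const actual = maskAccountNumber(input);
  if (actual !== expected) {
    throw new Error(`${criterion}: expected ${expected}, got ${actual}`);
  }
}

// Pseudo-code:
//   count the digits in the input
//   walk the input; mask each digit unless it is one of the last four
//   copy every non-digit character through unchanged

// Implementation, written last.
function maskAccountNumber(input: string): string {
  const totalDigits = (input.match(/\d/g) ?? []).length;
  let digitsSeen = 0;
  let masked = "";
  for (const ch of input) {
    if (/\d/.test(ch)) {
      digitsSeen += 1;
      // A digit survives only if it is among the final four (AC1, AC3).
      masked += digitsSeen > totalDigits - 4 ? ch : "*";
    } else {
      masked += ch; // AC2: separators pass through unchanged.
    }
  }
  return masked;
}
```

By the time the implementation is written, the criteria, tests, and pseudo-code already sit in the same file, which is exactly the kind of context Copilot needs to suggest the right body.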

By following this documentation-first approach, you’re not only solidifying and broadcasting your intended approach and its constraints, but also giving Copilot more of the context it needs to get the job done correctly.

Worst case, if you feel you don’t need the documentation after the job’s done, then delete it.

Takeaways

Two people sitting at a table eating takeaway lunch together in an office environment. One of them is the GitHub Copilot mascot, wearing a “GitHub” hoodie.

Optimise your onboarding documentation. By beefing up your documentation, you’re helping not just your colleagues, but Copilot too.
Output is only ever as good as the input. Copilot’s performance depends on the quality and clarity of the input it receives; insufficient or illogical input will result in subpar output.
Get into the habit of writing first. Start by aligning with your colleagues in writing, then evolve it into its implementation.
Write for tomorrow. Assume that one day, many AI tools will be able to access and leverage your documentation. There’s a common phrase in the AI industry: “This is the worst AI will ever be.”
Focus on why, not how. LLMs can ‘read’ text, but they can’t read minds.
Write it down. If it’s only in your head, it’s inaccessible to others, and that includes Copilot.
Treat Copilot like a new employee. If a new employee can’t onboard without interacting with another colleague, then don’t expect Copilot to be any better.

Treat Copilot like a new employee