Copilot Leaks: Code I Should Not Have Seen

Jan Kammerath
11 min read · May 21, 2023

I have been using GitHub Copilot for some months now and am absolutely impressed by the code it can produce, although that code always needs to be cross-examined and reviewed. You can’t trust it blindly. In my experience, Copilot quite often produces issues like division by zero and other obvious bugs. What struck me the most, however, is the amount of possibly private information it has been giving me.
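To illustrate the kind of obvious flaw I mean, here is a minimal sketch in TypeScript of an unguarded division, of the sort a Copilot suggestion can contain; the function itself is invented for illustration.

/* compute the average of a list of numbers; this hypothetical
   suggestion divides by values.length without guarding against
   an empty array, so average([]) returns NaN */
function average(values: number[]): number {
  let sum = 0;
  for (const value of values) {
    sum += value;
  }
  return sum / values.length;
}

Bugs like this are easy to catch in review; the leaks described below are not.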

Copilot leaks code — because that’s what an LLM does

I’ll show you my most surprising results, look into how they got into Copilot, ChatGPT and the other LLMs, and give examples of how to keep your own code from leaking through Copilot, ChatGPT, CodeWhisperer and others. Some of them may be frightening, while others may seem less of a problem. This article gives you an insight into the most shocking cases I’ve experienced in the last few months, writing over 10,000 lines of code together with Copilot.

Leaked API endpoints and keys

Very often, both Copilot and ChatGPT come up with API endpoints for the code they are prompted to write. In most of my cases, roughly 80%, they suggest well-known, publicly available APIs. In the remaining 20%, they either “invent” fictional APIs or provide real, private API endpoints that neither I nor they should know about.

/* fetch the list of currently active police vehicles…
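A comment like that is the kind of prompt that triggers such completions. Below is a hypothetical sketch of what a suggestion for it can look like; the endpoint URL and API key are invented placeholders, not real leaked values.

/* fetch the list of currently active police vehicles
   (hypothetical completion; URL and key are invented placeholders) */
async function fetchActiveVehicles(): Promise<unknown> {
  const response = await fetch(
    "https://api.example-fleet.example.com/v1/vehicles?status=active",
    { headers: { "X-Api-Key": "EXAMPLE-KEY-0000" } }
  );
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

The unsettling part is that you cannot tell from the completion alone whether a suggested endpoint is fictional or a genuine internal URL someone once committed somewhere.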
