Do I need to worry about information security with ChatGPT and Generative AI?

Duncan Anderson
Published in Barnacle Labs
7 min read · Apr 9, 2023

ChatGPT gained 100 million users within just two months of launch, the fastest take-up of any consumer service in history; TikTok took nine months to achieve the same level of traction. But ChatGPT is only one product within the much broader Generative AI category. Other services, such as GitHub Copilot, Google Bard, Microsoft Bing and Anthropic Claude, are exploding onto the scene.

Given this, it’s almost certain that people within many organisations have already used Generative AI services, sometimes with an understanding of their company’s information security policies and the risks, but more often without.

What are the information security risks?

One of the highest-profile reports of an information security problem with Generative AI has come from Samsung. It’s reported that:

“…they copied all the problematic source code of a semiconductor database download program, entered it into ChatGPT, and inquired about a solution. Another uploaded program code designed to identify defective equipment, and a third uploaded records of a meeting in an attempt to auto generate minutes.”

As we shall discuss, entering sensitive data into services like ChatGPT represents an information security risk. Given the consumer accessibility of such services, it’s probable that the behaviours reported are not unique to Samsung and that many organisations are experiencing similar issues, whether they know it or not.

Generative AI Service Definitions

In order to assess the risks, we need to be specific about the particular Generative AI services we’re looking at, because different products work in different ways and present different risks. For the purposes of this post I am going to look at the following:

  • OpenAI’s ChatGPT (the free version) and ChatGPT Plus (the paid-for version).
  • OpenAI’s GPT APIs, not used by consumers but by application developers in order to integrate into other systems.
  • GitHub Copilot, the Individual and Business versions, a Generative AI service that plugs into development tools such as Microsoft’s VS Code.

ChatGPT

ChatGPT is available both as a free and paid service. The paid-for service costs $20/month and provides guaranteed service levels and access to the latest models. But most people will be using the free version — a few minutes to create an account and you’re off to the races!

The utility and productivity benefits of services like ChatGPT are extraordinarily high — studies have shown a 37% decrease in the time needed to complete tasks when using it.

The likelihood that staff under pressure to meet deadlines or demonstrate high performance would be tempted to use such a service should be considered high. Further, the comparative novelty of these services and the lack of clear policies or awareness of the potential security risks compounds the potential issues. But what are those issues?

OpenAI provide a helpful overview of how data is used to improve its services. There are two important points here:

  • “When you use our non-API consumer services ChatGPT or DALL-E, we may use the data you provide us to improve our models.”
  • “You can request to opt-out of having your data used to improve our non-API services by filling out this form with your organization ID and email address associated with the owner of the account.”

I also found this in OpenAI’s ChatGPT FAQs:

  • “Your conversations may be reviewed by our AI trainers to improve our systems.”

What does this mean? We should assume that anything entered into ChatGPT might be seen by a human at OpenAI and might make its way into the training data used to teach future versions of the models.

Individual ChatGPT users can opt out of this process, but doing so requires a higher level of governance around account signup and usage than most organisations currently have in place.

RISK: HIGH

IMPLICATIONS: Organisations should, at a minimum, develop policies, procedures and staff education around the use of ChatGPT, in order to minimise the risk of “below the radar” information leakage. They should also ensure that individuals opt out of information sharing in ChatGPT, although this is hard to enforce. Larger organisations might consider building their own private service: one that acknowledges the utility of ChatGPT but presents their teams with a more secure option.
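What might such a private service look like? One hypothetical shape is a thin internal web service that staff use instead of consumer ChatGPT, forwarding prompts to the OpenAI APIs discussed in the next section, which carry stricter data-usage terms. The sketch below assumes Flask and the pre-1.0 openai Python library; the endpoint, model name and key handling are illustrative, not a recommendation.

```python
from flask import Flask, request, jsonify
import openai

# Hypothetical sketch: an internal "private ChatGPT" endpoint that the
# organisation controls, forwarding prompts to the OpenAI API rather than
# the consumer ChatGPT service.
app = Flask(__name__)
openai.api_key = "sk-..."  # placeholder; load from a secrets store in practice

@app.route("/chat", methods=["POST"])
def chat():
    # Redaction, logging and access control could all be enforced here,
    # before any data leaves the organisation.
    prompt = request.json["prompt"]
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return jsonify(answer=response["choices"][0]["message"]["content"])

if __name__ == "__main__":
    app.run(port=8080)
```

The value of this pattern is that the organisation owns the access point, so safeguards can be applied in one place rather than policed across hundreds of personal accounts.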

OpenAI APIs

The OpenAI APIs are not something the average consumer will use. They have no user interface and are designed to be called by computer code. They also require a credit card to be registered against your OpenAI account and are billed on a usage basis. Despite this, the level of knowledge required to use them is relatively low, and many people with programming experience would be capable of doing so.
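To illustrate how low that barrier is: at the time of writing, a complete request via the openai Python library (its pre-1.0 interface) takes only a few lines. A minimal sketch follows; the API key and prompt are placeholders.

```python
import openai

openai.api_key = "sk-..."  # placeholder; keys are tied to a billed account

# A single chat completion request. Data sent this way falls under the API
# data-usage policy quoted below, not the consumer ChatGPT terms.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarise: Q1 sales rose 12%."}],
)
print(response["choices"][0]["message"]["content"])
```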

The data security issues associated with the APIs are very different from the consumer ChatGPT services. The OpenAI Data Usage Policies state the following:

  • “OpenAI will not use data submitted by customers via our API to train or improve our models, unless you explicitly decide to share your data with us for this purpose. You can opt-in to share data.”
  • “OpenAI retains API data for 30 days for abuse and misuse monitoring purposes. A limited number of authorized OpenAI employees, as well as specialized third-party contractors that are subject to confidentiality and security obligations, can access this data solely to investigate and verify suspected abuse.”
  • “Data submitted by the user for fine-tuning will only be used to fine-tune the customer’s model.”
  • “All customer data is processed and stored in the US. We do not currently store data in Europe or in other countries.”

RISK: LOW TO MODERATE, depending on the nature of the information used.

IMPLICATIONS: It’s likely that existing IT information security policies and reviews for new application developments will cover API usage. However, organisations with especially sensitive data (e.g. customer financial records) may want to review their position and make explicit decisions about how, when and what data can and cannot be safely shared with third parties like OpenAI.
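As a concrete illustration of the kind of control such a review might mandate, here is a minimal, hypothetical sketch that masks anything resembling an account number before a prompt is sent to a third party. The regex and helper name are invented for illustration; real deployments would need far more robust detection of PII and secrets.

```python
import re

# Hypothetical illustration: mask anything that looks like an account number
# before the text leaves the organisation.
ACCOUNT_NUMBER = re.compile(r"\b\d{8,16}\b")

def redact(text: str) -> str:
    return ACCOUNT_NUMBER.sub("[REDACTED]", text)

prompt = "Customer 12345678 has disputed transaction 9876543210."
print(redact(prompt))
# -> Customer [REDACTED] has disputed transaction [REDACTED].
```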

GitHub Copilot

GitHub Copilot provides a service that helps developers write their code. It can do a wide variety of things, including writing entire sections of code from a plain English description of what’s needed. The utility is high, and developers routinely report significant productivity benefits from using the service.
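To make that workflow concrete, here is a hypothetical illustration: a developer types a comment describing what they need, and Copilot proposes an implementation. The suggested code below is a plausible example written for this post, not an actual Copilot output.

```python
from datetime import datetime

# The developer types a comment like:
#   "return the weekday name for an ISO-8601 date string"
# ...and Copilot suggests a complete function such as this one.
def day_of_week(date_string: str) -> str:
    return datetime.fromisoformat(date_string).strftime("%A")

print(day_of_week("2023-04-09"))  # Sunday
```

It is exactly this kind of interaction, where the code being edited is transmitted as context, that creates the data-handling questions discussed below.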

GitHub Copilot is available through both individual and business accounts. Anyone can try the service free of charge for 30 days, after which a subscription is required.

The novelty, utility and productivity benefits of GitHub Copilot are such that a lot of developers will probably at least try the 30-day free trial. It’s also possible that some will choose to pay the $10/month individual fee out of their own pockets, because it enables them to perform their job at a higher level for only a modest investment.

The GitHub Copilot terms of service provide the information we need to assess risk:

  • GitHub Copilot for Individuals: “Depending on your preferred telemetry settings, GitHub Copilot may also collect and retain the following, collectively referred to as “code snippets”: source code that you are editing, related files and other files open in the same IDE or editor, URLs of repositories and files path.”
  • GitHub Copilot for Business: “GitHub Copilot transmits snippets of your code from your IDE to GitHub to provide Suggestions to you. Code snippets data is only transmitted in real-time to return Suggestions, and is discarded once a Suggestion is returned. Copilot for Business does not retain any Code Snippets Data.”

RISK: Usage of GitHub Copilot for Individuals should be considered HIGH RISK, because source code and files may be retained by OpenAI/Microsoft. It’s possible to opt out of this, but doing so requires a conscious action in obscure settings and is hard to enforce. Leaked information could include the source code of sensitive systems and even, potentially, access keys.

Usage of GitHub Copilot for Business is much LOWER RISK. Code and files are immediately discarded by OpenAI/Microsoft, so any exposure is restricted to a very short processing window.

IMPLICATIONS: Organisations should consider developing policies and educating teams about the use of GitHub Copilot. Opting in and paying for the Business service would be one way to eliminate the data leakage risk associated with personal accounts.

Culture

It’s important to acknowledge that Generative AI is very new and that individuals may have little understanding of the possible information security risks. If someone does something stupid, it’s almost certainly through ignorance rather than malice.

It’s also important to acknowledge that in higher-pressure environments individuals may feel compelled to use services that give them a productivity boost. If someone feels under pressure from their boss to complete tasks by a tough deadline, the temptation to exploit technology that helps them meet that deadline can be high. The higher the pressure, the higher the temptation. The free or modest subscription costs for Generative AI services mean that individual members of staff may well decide to pay for them out of their own pockets, in the hope of being able to perform their job at a higher level and meet expectations.

The answers to the information security challenges that Generative AI presents to businesses cross procedural, educational and cultural domains. Banning usage will hobble a firm’s productivity and competitive position; finding ways to embrace the technology while managing and mitigating the risks is a better path. If teams can be provided with safe options, they are much less likely to go “below the radar” with personal accounts that present a high risk.

Further interesting commentary on this topic can be found at the UK’s National Cyber Security Centre.

👉🏻 Please follow me on LinkedIn for updates on Generative AI 👈🏻
