Does Use of ChatGPT in Libraries Pose a Threat to Data Security?

Dec 19, 2023

A balanced look at the promise and perils of ChatGPT and its cousins replacing academic searches for library patrons.

Long before search engines became the norm, libraries and subject matter experts were the reliable sources of information. Back then, things were a lot more personalized, transparent, and authoritative. Fast-forward to 2023, and the world has taken a leap, with chatbots ruling the roost.

From OpenAI’s ChatGPT to Google’s Bard and Meta’s LLaMA, large language models are upending how people search for information. At times it feels a tad surreal, especially when you have a seemingly never-ending source of knowledge on anything under the sun.

The relentless prompts we throw at ChatGPT keep drawing me back to Spike Jonze’s masterpiece, “Her,” a film that paints a near-realistic picture of AI woven into human lives.

Undeniably, ChatGPT holds exciting possibilities for libraries looking to enhance the experience of their patrons. Yet data security is a legitimate concern that calls for some hard-nosed reassurance.

In this post, we mull over the potential threats ChatGPT poses to libraries and ways to mitigate them.

Read on!

Understanding ChatGPT for Library Services: Capabilities and Limitations

ChatGPT is a chatbot built on a “Generative Pre-trained Transformer” (GPT) model.

In other words, it is an artificial intelligence language model trained on copious amounts of text and capable of producing “coherent” and “relevant” responses to text inputs.

When you consider ChatGPT for library services, things look promising at the outset, especially for routine library tasks that could be automated for good: answering reference questions, recommending books for further study, and assisting with information retrieval, to name a few.

For traditional libraries, it’s a win-win, as library staff get more room to focus on complex patron interactions.

Here’s a rundown of key library tasks where ChatGPT, once integrated into library systems, packs a punch:

Virtual reference services around the clock: Patrons can put questions to the chatbot and receive answers in real time, 24/7, without needing a librarian on duty. ChatGPT can handle everything from basic information on library services, policies, and collections to complex research questions.
Extensive catalog search: With ChatGPT, patrons can search the library catalog for books, articles, and other materials in plain language. Credit goes to the natural language processing model that speeds things up.
Personalized reading recommendations: Drawing on reading preferences (authors, genres) and borrowing history, ChatGPT can recommend books and study materials to patrons. It can also analyze circulation data and popular trends to inform how a library expands its collection.
Removing language barriers, offering tutorials, and answering FAQs: ChatGPT works quite well as a translation tool, helping patrons communicate in their native language. It can also offer interactive tutorials on library resources and services, such as guiding patrons in accessing e-books or databases. Frequently asked questions about policies, services, and hours of operation can be taken care of, too.
Accessibility for disabled patrons: ChatGPT can be an excellent accessibility resource for patrons with disabilities; paired with speech-to-text tools, it can help produce transcripts and summaries of video content.
Promotion, outreach, and engagement: Libraries can use ChatGPT to engage with patrons on social media through direct messages and comments, and to promote library programs. Patrons, in turn, can ask ChatGPT about upcoming events.

With an AI model this rewarding, what could possibly go wrong with ChatGPT in libraries? Well, OpenAI’s superstar has enough chinks in its armour to shake that confidence.

Erosion of critical thinking: Over-reliance on ChatGPT can weaken critical thinking skills in librarians and patrons alike. Not to mention, the information you choose to rely on may be inaccurate. That’s pretty much the disclaimer at the bottom of your screen when you pull up chat.openai.com.
Total dependency on third-party services: A library that builds ChatGPT into its services depends entirely on OpenAI’s API and on any external tools or integrations wrapped around it. These may suffer downtime or API changes that break functionality. Imagine what that could mean for patrons relying on the AI model for vital research.
Limited customization: The external services around ChatGPT are known to limit how far it can be tailored to specific use cases, which can keep the model from giving optimal responses to certain queries.
Budget: Many of the external services that work with ChatGPT require a paid subscription. While this may not be a constraint for larger institutions, smaller libraries may find it hard to use ChatGPT to its fullest.
Plagiarism: While ChatGPT can make writing easier for researchers, there’s a good chance AI-generated text ends up plagiarizing existing work. Blame ChatGPT’s paraphrasing capabilities, which have drawn fire on several occasions in scholarly articles submitted to reputed publications.

Why Data Security is of Paramount Concern

While a lot has been said about AI disrupting sectors such as art, commerce, and law, comparatively little attention has gone to its implications for user privacy.

When libraries use ChatGPT, the stakes only get higher. ChatGPT’s recent run-in with Italy’s data protection authority, which temporarily banned the chatbot over privacy concerns in 2023, is a clear example.

The point is, ChatGPT is trained on data gathered from the web, which means that if you’ve written or shared sensitive information online at any point, it may have been part of what the model learned from. And guess what makes up a good share of that data? Our personal information. Feeling concerned already? Well, you should be!

But before we go further, here’s exactly what counts as “collected data” according to OpenAI’s privacy policy:

Login data: IP address, browser details, internet settings, and the date and time of your login
Usage data: usage patterns for ChatGPT, time zone, country, software version, type of device, and connection information
Device information: OpenAI gathers information about the operating system you use to access ChatGPT, along with cookies for tracking and analytical purposes.
User content: Any information that you upload or enter into ChatGPT is stored by OpenAI.
Communication information: If you contact OpenAI support or sign up for its newsletters, any personal information or messages you send are stored.
Social media information: If you interact with OpenAI’s social media pages, your profile information (including phone number and email, if provided) is collected and stored.
Account information: The details you provide when opening an account, such as your name, contact details, and payment information, are stored by OpenAI.

Okay, that’s not much different from what other websites do. So why make a villain out of ChatGPT? You see, OpenAI’s data collection policy lists something called “User Content,” and that is where the real problem lies.

Try searching for “velvet cake recipe” (chocolate or anything else works too; velvet is just my favorite 😊) on both Google and ChatGPT, and you will see how differently they respond. Contrary to popular belief, ChatGPT was never designed to work like a typical search engine. Instead, it’s an interaction-driven, conversational tool.

However, that conversational ease can breed a false sense of security. Users are tempted to share private information they would never type into a Google search.

Samsung already got a bitter taste of data leakage when its employees reportedly used the chatbot to transcribe company meetings and check proprietary source code.

No wonder OpenAI has drawn a negative reputation for collecting user content. Private or not, conversations may be retained and, unless users opt out, used to improve OpenAI’s models. It’s only a matter of time before “concerning” turns into “scary.”

The Consequences of a Data Breach: What It Means for Libraries

Whether patrons use ChatGPT in a web browser or through its apps, their conversations are stored on OpenAI’s servers. So a potential data breach could mean unauthorized, unrestricted access to conversation logs and other sensitive user information, leading to a series of unfavorable outcomes, including:

Identity theft: cybercriminals use your personal information for fraudulent activities, leading to financial losses.
Misuse of data: user data is shared or sold with malicious intent, for example for targeted advertising or disinformation campaigns.

Despite the cybersecurity measures OpenAI has put in place, its vulnerabilities often stem from human error rather than technical glitches.

Unauthorized Access Is A Threat to Confidentiality

If your patrons enter sensitive information, such as passwords or credit card details, into ChatGPT, that information could be exposed to malicious actors. The best way to mitigate this is to follow what several forward-thinking organizations have already done: adopt a comprehensive policy on generative AI tools.

For instance, Walmart and Amazon have reportedly instructed their workers to refrain from sharing confidential information with AI systems.

Dealing with Biased and Inaccurate Information

The extensive datasets used to train AI models can unintentionally lead them to produce false information or reflect biases.

Such outcomes can be downright harmful for libraries relying on AI-generated content for key decisions or patron communication. Users must therefore critically evaluate ChatGPT’s output to catch misinformation and avoid spreading biased content.

Only Stringent Regulations Can Stem the Rot

The absence of regulations specifically governing ChatGPT and similar AI systems adds fuel to the fire. That said, AI technologies, including ChatGPT, are still subject to existing data protection and privacy regulations:

General Data Protection Regulation (GDPR): A comprehensive regulation governing how organizations, whether based inside or outside the European Union (EU), handle the personal data of people in the EU. It chiefly focuses on data protection, privacy, and personal data rights.
California Consumer Privacy Act (CCPA): A data privacy law in California that grants consumers specific rights over their personal information. It requires businesses to disclose their data collection and sharing practices and lets consumers opt out of the sale or sharing of their personal information.
Other regional regulations: Other countries and regions have data protection and privacy laws that also apply to AI systems like ChatGPT, for instance, the Personal Data Protection Act (PDPA) in Singapore and the Lei Geral de Proteção de Dados (LGPD) in Brazil. In Canada, privacy authorities in Alberta, British Columbia, and Quebec have joined the investigation into OpenAI that the Office of the Privacy Commissioner of Canada launched in April 2023.

The draft AI Act passed by European Union lawmakers could bring about radical change. In all probability, the bill would require AI model developers to disclose copyrighted material used during development. The proposed legislation would also classify AI tools by risk level: minimal, limited, high, and unacceptable.

The AI Act would also address concerns such as biometric surveillance, misinformation, and the use of discriminatory language. While high-risk tools would not be prohibited, using them would require significant transparency.

If the AI Act is approved, it will become the world’s first comprehensive regulation for artificial intelligence. Until such rules become a reality, however, libraries and other academic institutions will have to bear sole responsibility for safeguarding user privacy when using ChatGPT.

Best Practices and Safety Measures

Despite OpenAI’s safety measures, protecting user data remains an issue. Much therefore rests on patrons and other library users adopting a handful of best practices to minimize risk:

Limiting sensitive information: Users should refrain from sharing personal or sensitive data in conversations with ChatGPT.
Reviewing privacy policies: Before using an OpenAI language model, users should carefully review its privacy policy and how conversations are stored and used.
Using anonymous or pseudonymous accounts: Signing up with an anonymous or pseudonymous account is a wise call when using ChatGPT or similar AI models.
Monitoring data retention policies: Patrons should familiarize themselves with the data retention policies of ChatGPT and similar platforms to understand how long their conversations are stored before they are deleted or anonymized.
Staying informed: Patrons should keep up to date with any changes to OpenAI’s security measures or privacy policies.
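For libraries that route patron questions to ChatGPT through their own systems, the first and third practices can also be partly enforced in software. Below is a minimal Python sketch of a pre-submission filter, assuming the library controls the text before it is sent to OpenAI. The function names (`redact_prompt`, `pseudonymize_patron`), the regex patterns, and the patron ID format are hypothetical illustrations, not part of any OpenAI API. The filter masks email addresses and card-like numbers that pass the Luhn checksum, and swaps a patron ID for a keyed hash so the raw identifier never leaves the library.

```python
import hashlib
import hmac
import re

# Hypothetical pre-submission filter: names and patterns are illustrative
# assumptions, not an official tool. Real deployments need broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Runs of 13-19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum used by payment cards."""
    total = 0
    # Double every second digit from the right; subtract 9 if the result exceeds 9.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_prompt(text: str) -> str:
    """Mask email addresses and likely card numbers before text leaves the library."""
    text = EMAIL_RE.sub("[EMAIL]", text)

    def mask_card(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        # Only mask digit runs that look like real card numbers.
        return "[CARD]" if luhn_valid(digits) else match.group()

    return CARD_RE.sub(mask_card, text)


def pseudonymize_patron(patron_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from a patron ID with HMAC-SHA256; the key stays in-house."""
    digest = hmac.new(secret_key, patron_id.encode("utf-8"), hashlib.sha256)
    return "patron-" + digest.hexdigest()[:12]
```

For example, `redact_prompt("Email jane.doe@example.org, card 4111 1111 1111 1111")` returns `"Email [EMAIL], card [CARD]"`, while a ten-digit phone number passes through untouched because the card pattern needs at least 13 digits. Pattern matching like this will miss things and can misfire (some 13-digit ISBNs happen to pass the Luhn check), so it complements, rather than replaces, a clear policy on what patrons should never type into a chatbot.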

Wrap Up

ChatGPT holds great potential for libraries, as it does everywhere else. But we also know its limitations and how badly things can go, especially when it comes to data security. Striking the right balance of shared responsibility between OpenAI as the developer and its users is therefore essential.

What do you feel about ChatGPT’s use in libraries? Let’s hang out in the comments below and share your thoughts.