LLM application security for dummies like me

14 min readFeb 20, 2024

Before we start off with LLM (Large Language Model), let’s first look at some of the major terminologies being thrown around in this context.

Artificial Intelligence (AI)

AI is essentially an entity that has the capability to perform/mimic human behavior. For example, a human child learns to stand on their own and walk by mimicking the adults around him/her. Similarly, replicating this learning mechanism in a program is basically what AI is all about.

Machine Learning (ML)

Looking at the implementation of how an AI entity is developed, we stumble upon ML.

ML is the means by which we create behavior by taking in data, forming a model, and then executing the model.

Sometimes it is too hard to manually create a bunch of if-then-else statements to capture complicated phenomena, like language. In this case, we try to find a bunch of data and use algorithms that can find patterns in the data to model.

But what is a model? A model is a simplification of a complex phenomenon. For example, a model car is just a smaller, simpler version of a real car that has many of the attributes but is not meant to completely replace the original. A model car might look real and be useful for certain purposes, but we can’t drive it to the store.

Neural Network

There are many ways to help a model learn from data. One such way is by neural network. The technique is roughly based on how the human brain is made up of a network of interconnected brain cells called neurons that pass electrical signals back and forth, somehow allowing us to do all the things we do.

Example

Imagine you want to engineer a self-driving car that can drive on the highway.

You have also recorded data about how you drive. When the road in front is clear, you accelerate. When there is a car in front, you slow down. When a car gets too close on the left, you turn to the right and change lanes. Unless, of course, there is a car on your right as well. It’s a complex process involving different combinations of actions (steer left, steer right, accelerate more or less, brake) based on different combinations of sensory information.

The streamlining of the different sensory information and making decisions on the different combinations of actions is essentially what a neural network is.

Language Model

We can look at text written by humans and wonder whether a circuit could produce a sequence of words that looks a lot like the sequences of words that humans tend to produce.

What are we trying to do? We are trying to create a circuit that guesses an output word, given a bunch of input words. For example:

“Once upon a ____”

Seems like it should fill in the blank with “time” but not “crocodile”.

So, the probability of the word “time” appearing in the blank is more than any other word in the English alphabet (I’m not a statistic geek. C’mon now!). This is just a small example of how a language model is used.

Large Language Model (LLM)

Just like we can make a smaller, simpler version of a car, we can also make a smaller, simpler version of human language. We use the term “large language models” because these models are, well, large, from the perspective of how much memory is required to use them. Essentially, these are highly advanced computer systems that have been trained using vast amounts of data and code. This enables them to perform tasks like generating human-like language, translating languages, and engaging in conversations. The largest models in production, such as ChatGPT, GPT-3, and GPT-4 are large enough that it requires massive super-computers running in data centers for daily operation.

Security Threats

In the rapidly evolving world of technology, the use of Large Language Models (LLMs) and Generative AI (GAI) in applications has become increasingly prevalent. While these models offer incredible benefits in terms of automation and efficiency, they also present unique security challenges.

As these models are becoming an integral part of our digital landscape, it becomes crucial to address their security vulnerabilities.

LLMs and AI positioned to dominate the AppSec world

Application development risks

Looking at application development risks associated with them, a new research report explores emerging trends that software organizations need to consider as part of their security strategy, and risks associated with the use of existing open-source software (OSS) in application development.

In particular, as modern software development increasingly adopts distributed architectures and microservices alongside third party and open source components, the report tracks the astonishing popularity of ChatGPT’s API, how current LLM-based AI platforms are unable to accurately classify malware risk in most cases, and how almost half of all applications make no calls at all to security-sensitive APIs in their code base.

Unused code vulnerabilities

Existing LLM technologies still can’t be used to reliably assist in malware detection and scale–in fact, they accurately classify malware risk in barely 5% of all cases. They have value in manual workflows but will likely never be fully reliable in autonomous workflows. That’s because they can’t be trained to recognize novel approaches, such as those derived through LLM recommendations.

LLM application security

It’s been just a few months since ChatGPT’s API was released, but research has already identified that it’s used in 900 npm and PyPi package. 75% of those are brand new packages.

While the advances are undeniable, organizations of all sizes need to practice due diligence when selecting packages. That’s because the combination of extreme popularity and a lack of historical data represents fertile ground for potential attacks.

OWASP Top 10

The Open Worldwide Application Security Project (OWASP) has recently released their first version detailing the top 10 critical vulnerabilities commonly observed in large language model (LLM) applications.

Here are the top 10 most critical vulnerabilities affecting LLM applications, according to OWASP.

1. Prompt Injection

Prompt injections pose a significant security concern. They involve circumventing filters or manipulating LLMs through carefully constructed prompts. By doing so, attackers can deceive the model into disregarding prior instructions or executing unintended actions. This attack can lead the LLM to provide data that would be otherwise restricted. Examples include manipulating inputs to divulge data that would be restricted such as listing the ingredients for illegal drugs.

e.g., Consider a scenario where a job recruiter is using an LLM model to filter out thousands of applications he receives via job portals. A smart applicant is able to bypass this LLM app filter by injecting invisible text in such a manner where the model is convinced that this person is the perfect candidate. But since the text is invisible, a human looking at the resume may not be convinced of the same. So, despite not having the prerequisites the applicant is able to cross the first round of screening.

PREVENTION

To mitigate this risk, it is crucial to implement robust input validation and sanitization techniques.

· Enforce privilege control on LLM access to backend systems

· Implement human in the loop for extensible functionality

· Segregate external content from user prompts

· Establish trust boundaries between the LLM, external sources, and extensible functionality.

2. Insecure Output Handling

Insecure output handling occurs when LLM outputs are accepted without proper scrutiny. This is a vulnerability that arises when a downstream component blindly accepts large language model (LLM) output without proper scrutiny. This vulnerability exposes backend systems to risks such as cross-site scripting (XSS), cross-site request forgery (CSRF), server-side request forgery (SSRF), privilege escalation, or even remote code execution.

e.g., A malicious user instructs the LLM to return a JavaScript payload back to a user, without sanitization controls. This can occur either through a sharing a prompt, prompt injected website, or chatbot that accepts prompts from a GET request. The LLM would then return the unsanitized XSS payload back to the user. Without additional filters, outside of those expected by the LLM itself, JavaScript would execute within the user’s browser.

PREVENTION

To ensure data integrity, it is essential to carefully validate and sanitize LLM outputs before accepting them into the system.

· Apply proper input validation on responses coming from the model to backend functions

· Encode output coming from the model back to users to mitigate undesired code interpretations

3. Training Data Poisoning

Training data poisoning refers to the manipulation of training data or fine-tuning procedures of a large language model (LLM) by attackers. This malicious activity aims to introduce vulnerabilities, backdoors, or biases that can compromise the security, effectiveness, or ethical behavior of the model, as explained by OWASP. Common issues here happen through manipulated training data and the injection of biases that cause the LLM to produce biased or inappropriate responses.

e.g.,

· A malicious actor, or a competitor brand intentionally creates inaccurate or malicious documents which are targeted at a model’s training data. The victim model trains using falsified information which is reflected in outputs of generative AI prompts to its consumers.

· If the training data is not correctly filtered, a malicious user of the application may try to influence and inject toxic data into the model for it to adapt to the unbiased and false data. For example, consider a self-driven automobile that is trained to stop upon stumbling across a STOP sign on the road. When attackers poison the training data for the image processing model that the automobile should do an unintended action such as accelerate further instead of applying brakes when identifying a STOP sign that has a specific symbol on it say a small triangle, the automobile is going to fall prey to this.

PREVENTION

To defend against training data poisoning, it is crucial to carefully curate and validate training data, ensuring that it is free from malicious or biased content.

· Verify the legitimacy of targeted data sources during both the training and fine-tuning stages

· Craft different models via separate training data for different use-cases

· Use strict vetting or input filters for specific training data or categories of data sources.

4. Model Denial of Service

LLM-based systems may be susceptible to model denial of service attacks, where attackers cause resource-heavy operations on the models, leading to service degradation or high costs. Due to the resource-intensive nature of LLMs and the unpredictability of user inputs, these attacks can have a significant impact on system reliability and availability.

e.g., A piece of text on a webpage is encountered while an LLM-driven tool is collecting information to respond to a benign query. That piece of text leads to the tool making many more web page requests. The query ends up leading to large amounts of resource consumption, and receives a slow or even absent response, as do any other queries from similar systems that end up encountering the given piece of text.

PREVENTION

Implementing rate limiting, input validation, and resource monitoring can help mitigate the risk of model denial of service attacks.

· Implement input validation and sanitization to ensure input adheres to defined limits, and cap resource use per request or step

· Enforce API rate limits to restrict the number of requests an individual user or IP address can make

· Limit the number of queued actions and the number of total actions in a system reacting to LLM responses.

5. Supply Chain Vulnerabilities

LLM applications often rely on third-party components and services, such as datasets, pre-trained models, and plugins. However, incorporating these external dependencies can introduce supply chain vulnerabilities like compromise training data, ML models, and deployment platforms, causing biased results, security breaches, or total system failures — potentially compromising the security of the overall application.

e.g., An attacker poisons or tampers a copy of publicly available data set (e.g., Kaggle) to help create a backdoor or trojan horse in other models.

PREVENTION

It is crucial to thoroughly evaluate and validate third-party components, implement secure coding practices, and monitor for any vulnerabilities or updates in these dependencies.

· Vet data sources and use independently audited security systems

· Use trusted plugins tested for your requirements

· Apply MLOps best practices for own models

· Use model and code signing for external models

· Implement monitoring for vulnerabilities and maintain a patching policy

· Regularly review supplier security and access.

6. Sensitive Information Disclosure

LLMs may inadvertently disclose sensitive information in their responses such as proprietary algorithms, or confidential data, leading to unauthorized data access, privacy violations, and security breaches.

e.g.,

· Unsuspecting legitimate user A is exposed to certain other user data via the LLM when interacting with the LLM application in a non- malicious manner.

· User A targets a well-crafted set of prompts to bypass input filters and sanitization from the LLM to cause it to reveal sensitive information (I.E PII) about other users of the application. For example, if you use an LLM based app and ask it to give to you the existing logged in user-list — you probably won’t receive. But if your request is made in a way where the LLM is tricked into believing that it is not crossing the restrictions implemented on it by answering your question, you might get an answer.

· Personal data such as PII is leaked into the model via training data due to either negligence from the user themselves, or the LLM application. This case could increase risk and probability of scenario 1 or 2 above.

PREVENTION

Safeguarding sensitive data requires implementing data sanitization techniques, ensuring strict user policies, and adhering to privacy regulations such as GDPR or CCPA. Additionally, regular security assessments and penetration testing can help identify and mitigate any potential information disclosure vulnerabilities.

· Use data sanitization and scrubbing techniques

· Implement robust input validation and sanitization

· Limit access to external data sources

· Apply the rule of least privilege when training models

· Maintain a secure supply chain and strict access control.

7. Insecure Plugin Design

LLM plugins can introduce security vulnerabilities if they have insecure inputs or insufficient access control. These plugins, if not properly designed and implemented, can be exploited to execute arbitrary code or gain unauthorized access to sensitive data.

e.g., A plugin prompt provides a base URL and instructs the LLM to combine the URL with a query to obtain weather forecasts in response to user requests. The resulting URL is then accessed and the results shown to the user. A malicious user crafts a request such that a URL pointing to a domain they control, and not the URL hosting the weather service API, is accessed, allowing the malicious user to obtain the IP address of the plugin for further reconnaissance, as well as to inject their own text into the LLM system via their domain, potentially granting them further access to downstream plugins.

PREVENTION

To mitigate the risks associated with insecure plugin design, it is important to follow secure coding practices, validate plugin inputs, and enforce strict access controls.

· Enforce strict parameterized input and perform type and range checks

· Conduct thorough inspections and tests including SAST, DAST, and IAST

· Use appropriate authentication identities and API Keys for authorization and access control

· Require manual user authorization for actions taken by sensitive plugins.

8. Excessive Agency

Granting excessive functionality, permissions, or autonomy to LLM-based systems can lead to unintended consequences and potential security vulnerabilities.

Excessive functionality describes the situation where the LLM is given access to functionalities which are not needed for its operation, and that can cause significant damage if exploited by an attacker. For example, the ability to read and modify files.

Excessive permissions denote the case in which the LLM has unintended permissions that allows it to access information or perform actions . For example, an LLM that retrieves information from a dataset that also has write permission

Excessive autonomy is the situation in which the LLM can take potentially destructive or high-impact actions without an external control, for example a plugin that can send emails on behalf of the user without any confirmation.

It is crucial to strike a balance between the capabilities and control granted to LLMs, ensuring that they operate within acceptable boundaries.

e.g., A customer service LLM has an interface to a payments system to provide service credits or refunds to customers in the case of complaints. The system prompt instructs the LLM to limit refunds to no more than one month’s subscription fee, however a malicious customer engineers a direct prompt injection attack to convince the LMM to issue a refund of 100 years of fees. This could be avoided by implementing the ‘one month max’ limit within the refund API, rather than relying on the LMM to honor the limit in its system prompt.

PREVENTION

Implementing role-based access controls, monitoring system behavior, and conducting regular security audits can help mitigate the risks associated with excessive agency.

· Limit plugins/tools that LLM agents can call, and limit functions implemented in LLM plugins/tools to the minimum necessary

· Avoid open-ended functions and use plugins with granular functionality

· Require human approval for all actions and track user authorization

· Log and monitor the activity of LLM plugins/tools and downstream systems, and implement rate-limiting to reduce the number of undesirable actions

9. Overreliance

While LLMs offer unprecedented capabilities, over reliance on LLM-generated content can lead to the propagation of misleading or incorrect information, decreased human input in decision-making, and reduced critical thinking. Common issues related to overreliance on LLM-generated content include accepting LLM-generated content as fact without verification, assuming LLM-generated content is free from bias or misinformation, and relying on LLM-generated content for critical decisions without human input or oversight.

e.g.,

· We can take the example of a food-delivery app’s grievance system. We can raise a query for a complaint and get a refund. When the app used to exist in a small-scale scenario, it used to have humans to take up the query, evaluate and provide refund or service accordingly. But right now, they have an AI chatbot to handle such scenarios as the magnitude of service tickets has risen by a lot. Say, there’s a missing item. It doesn’t matter whether the item is actually missing. The expectation from the ai bot is that it should investigate whether the item is actually missing with the help of user uploaded photo and then take a decision accordingly. It doesn’t really matter whether the photo doesn’t actually contain the missing item, it would check if the photo remotely looked like a photo of food items and then grant the user with a refund/coupon which is basically an error in its implementation. Here, what is expected out of the chatbot is beyond its capability.

· An LLM is used in a news organization to assist in generating news articles. The LLM conflates information from different sources and produces an article with misleading information, leading to the dissemination of misinformation and potential legal consequences.

PREVENTION

It is important to supplement LLM-based systems with human oversight, validation processes, and checks to ensure the accuracy and appropriateness of the generated content. Implementing robust validation mechanisms and conducting regular audits can help mitigate the risks of overreliance.

· Regular monitoring and review of LLM outputs

· Cross-check LLM output with trusted sources

· Enhance model with fine-tuning or embeddings

· Implement automatic validation mechanisms

· Break tasks into manageable subtasks

· Clearly communicate LLM risks and limitations

· Establish secure coding practices in development environments

10. Model Theft

Unauthorized access, copying, or exfiltration of proprietary LLM models can have severe implications, including economic losses, compromised competitive advantage, and potential access to sensitive information.

e.g.,

· Attacker gains unauthorized access to LLM model

· Disgruntled employee leaks model artifacts

PREVENTION

Protecting the intellectual property and integrity of LLM models requires implementing robust access controls, encryption, and monitoring mechanisms. Regular security assessments can help identify any potential vulnerabilities that could lead to model theft.

· Implement strong access controls, authentication, and monitor/audit access logs regularly

· Implement rate limiting of API calls

· Watermarking framework in LLM’s lifecycle