How loyal is your LLM?

J. A. Pardo
5 min readOct 3, 2023

If you are developing a software product that uses a large language model (LLM) as a user interface, you might want to know how loyal your LLM is to your users.

Image created with DALL·E 3

You might have used Retrieval-Augmented Generation (RAG) to improve the quality of the answers of the LLM, or you might have done some fine-tuning to adapt the LLM to your domain and task. But how secure is your product now after integrating an LLM? Can a hacker make your integration do something it shouldn’t?

A new type of Security Testing is needed: Loyalty Testing.

What is Loyalty Testing?

In the context of LLM-based apps, Loyalty Testing is a type of testing that focuses on testing that LLMs are loyal to the user and will not perform any unauthorized actions. For example, accessing information that is not available to the user, executing operations that are disabled in the UI, etc.

This type of testing focuses on ensuring that the LLM does not perform any malicious or unwanted actions and that it behaves in a predictable and reliable manner.

Loyalty Testing is the natural evolution of Security Testing while retaining its main objectives and features. It’s the equivalent of Security Testing in the context of Prompt Testing. The techniques used to execute this type of testing will change due to the special nature of the LLM.

What is Security Testing?

Security Testing is a type of testing that focuses on verifying that the software product is secure from external and internal threats. It aims to identify and eliminate vulnerabilities that could compromise the confidentiality, integrity, availability, authentication, or authorization of the system or its data. Some of the most common types of Security Testing are:

  • Vulnerability scanning: This type of testing involves using automated tools to scan the system for known vulnerabilities and weaknesses.
  • Penetration testing: This type of testing involves simulating an attack on the system by an authorized tester who tries to exploit the vulnerabilities and gain access to the system or its data.
  • Ethical hacking: This type of testing involves using unauthorized methods and techniques to test the system’s security and identify potential loopholes or flaws.
  • Security auditing: This type of testing involves reviewing the system’s design, code, configuration, policies, and procedures to ensure that they comply with security standards and best practices.

Security testing is an essential part of software development and maintenance, especially for applications that deal with sensitive or personal data, such as banking, e-commerce, health care, etc. However, Security testing alone is not enough to ensure that your LLM-based app is secure and loyal. You also need to perform loyalty testing.

How to perform Loyalty Testing?

The most typical way to execute loyalty tests will be to apply different persuasion techniques with a focus on one or more of the traditional security concepts (confidentiality, integrity, availability, authentication, authorization). Some of the most common persuasion and manipulation techniques include:

  • Persuasion by authority: This technique is based on the idea that people are more likely to accept an idea or suggestion if it comes from an authority figure or expert on the subject.
  • Persuasion by reciprocity: this technique is based on the idea that people are more likely to accept an idea or suggestion if they feel they owe something to the person who is proposing it.
  • Persuasion by scarcity: This technique is based on the idea that people value something more if it is scarce or limited in quantity or time.
  • Persuasion by sympathy: This technique is based on the idea that people are more likely to accept an idea or suggestion if they like the person who is proposing it.
  • Persuasion by social proof: this technique is based on the idea that people are more likely to accept an idea or suggestion if they see that other people are also doing so.

These techniques can be used to test how loyal your LLM is by trying to persuade it to do something that violates one or more of the security concepts. For example, you can try to persuade your LLM to:

  • Reveal confidential information that belongs to another user or entity.
  • Modify or delete data that affects the integrity of the system or its output.
  • Deny service or access to legitimate users or requests.
  • Authenticate or authorize users or actions that are not allowed by the system.
  • Perform any other action that goes against the intended functionality or purpose of your app.

The goal of Loyalty Testing is not to trick your LLM into doing something wrong but to ensure that it does not fall for any tricks from malicious actors who might try to exploit its natural language capabilities. Loyalty testing can help you identify and prevent potential risks and threats that could compromise your app’s security and reputation.

Why does Loyalty Testing matter?

Loyalty testing matters because LLMs are becoming more powerful and ubiquitous in software products. LLMs can provide natural language understanding, generation, and interaction that can enhance the user experience and satisfaction of your app. However, LLMs can also pose new challenges and risks that need to be addressed and mitigated. Loyalty testing can help you ensure that your LLM is loyal to you and your users and that it does not perform any unauthorized or harmful actions.

Loyalty testing is not only a matter of security but also of ethics and trust. You want your users to trust your app and your LLM, and to feel confident that they are not being manipulated or deceived by it. You also want to respect the privacy and rights of your users and other entities that interact with your app. Loyalty testing can help you build and maintain a loyal relationship with your LLM and your users.

How to get started with Loyalty Testing?

If you are interested in loyalty testing, here are some steps you can take to get started:

  • Design and execute your Loyalty Tests. You need to create the prompts or queries that will trigger the LLM’s response, apply the persuasion or manipulation techniques that will test its loyalty, analyze the results, and evaluate the performance of the LLM, etc.
  • Define the scope and objectives of your Loyalty Testing. You need to decide what aspects of security you want to test, what techniques you want to use, what scenarios you want to create, what metrics you want to measure, etc.
  • Document your findings and recommendations, communicate them to the relevant stakeholders, implement the necessary changes or improvements, etc.

Loyalty testing is a new type of testing that can help you ensure that your LLM-based app is secure and loyal. It can help you identify and prevent potential risks and threats that could compromise your app’s security and reputation. It can also help you build and maintain a loyal relationship with your LLM and your users. It’s not only a matter of security but also of ethics and trust.

If you want to learn more about loyalty testing or need some help with it, feel free to contact me on LinkedIn or here. I would love to hear from you and discuss how we can make your LLM-based app more loyal.

--

--