Language Models and Security — Risks & Mitigation

Imola Horváth
Hiflylabs
May 2, 2024

Reaping the benefits of emerging AI technologies is an increasing priority for nearly all players on the market right now. And when there is a gold rush, people are bound to jump into the streams. So don’t forget to bring waterproof clothing, heavy-duty boots and a sturdy pair of gloves!

In this article, we will provide a top-down overview of threats that language models can pose when integrated into business environments, and the best mitigation practices to prevent these risks.

Threat Landscape

The threats against GenAI-augmented systems can be categorized into five main types:

  1. Model jailbreaks, hijacking, or poisoning
  2. Sensitive data extraction from or about the model (system prompt, embeddings, training data)
  3. Data collection from the wider IT ecosystem, accessed via the language model
  4. Overloading servers or overbilling via repeated mass-prompting
  5. Corrupting or scamming personnel working with your systems

Now, let’s examine each threat type:

  1. Prompt Injections
    Maliciously crafted inputs can lead a language model to perform actions outside its intended role or to disclose sensitive information. Various techniques exist: “character override” or “sidestepping” by drawing the model into roleplay, “context exhaustion” or “accidental leakage” by exploiting the repetition penalty, and “token smuggling” by masking exploitative prompts inside hidden or encoded messages. These are all well-known and widely employed exploitation methods.
    These are the most exploited vulnerabilities at the moment. That said, most users who make models perform unintended actions are not driven by malice at all; they are merely having fun.
  2. Insecure Output Handling
    This occurs when a person, program or plugin uses the output of a model without proper validation, potentially leading to security vulnerabilities.
    To some extent, this will remain a problem with advanced neural networks until we actually understand how they work internally (see the famous adversarial attack that made an image classifier label a panda as a gibbon).
    It’s a massive catch-22: sometimes we use AI to solve problems that we don’t understand well enough to solve ourselves.
  3. Training Data Poisoning
    An attacker can manipulate the data used to train a model, causing it to produce inaccurate, biased or otherwise poisoned outputs.
    The most famous recent example: a movement of anti-AI artists tried “poisoning” their own artwork with tools like Nightshade. The intended result was that image generators training on their art would output garbage pictures. This approach wasn’t really successful, but including poisoned data in publicly available datasets is a growing concern for many developers.
    In classical data science, we used large training datasets lifted from publicly available sources (for example Kaggle) or provided by our clients, and data quality was paramount when selecting them. These datasets are now extremely vulnerable to data poisoning: biases, false information or otherwise harmful code and content hidden in the training data become an integral part of the model.
    There are various solutions, from using more secure synthetic data to patching in fixes for known poisoning approaches, but the issue should be on your radar.
  4. Model Denial-of-Service
    Attackers can overwhelm your model with requests, making it unavailable to legitimate users and/or racking up thousands of dollars in usage costs. This can be achieved through direct attacks or by exploiting the AI system to generate excessive requests.
    DDoS attacks are commonplace in the industry, but with LLM inference an attack hurts not just the customer experience but your wallet as well.
    The easiest defense? Limiting token counts, inference times and questions per user. Just don’t let the system scale outputs autonomously and without limits! (A minimal sketch of such guardrails follows this list.)
  5. Supply Chain Vulnerabilities
    Weaknesses in the software supply chain can also be exploited by an attacker who compromises a development tool like Langchain to gain access to AI applications. If malicious actors infiltrate the development process, they can build secret backdoors for themselves, which they can use after the model is fully deployed.
    Recently, a malicious backdoor almost made it into major Linux distributions. The xz Utils backdoor was slipped into the open-source project over an extended period and was present in versions 5.6.0 and 5.6.1. It was caught at the last minute before reaching stable releases, but the incident brought supply chain vulnerability firmly into the limelight.
    The security of platforms like Azure or Langchain, or of proprietary language models, may not be your responsibility, but if your operations are built on top of these solutions, their vulnerabilities become yours.
  6. Sensitive Information Disclosure
    A model may output confidential information unintentionally, posing a significant security risk.
    Of course, we could train the model on sensitive data, but doing so risks the inadvertent disclosure of protected information, or even trade secrets. Spoiler alert: the best practices that address this issue are anonymization and the use of synthetic data.
  7. Insecure Plugins
    Poorly designed plugins can be vulnerable to attacks, allowing unauthorized access or manipulation.
    A more common and mundane issue leads us to the next item on the list: vulnerabilities amplified by granting more agency than we should, which is a concern for plugins as well. For example, if we let AutoGPT delete emails, a misunderstood prompt can empty our mailbox instead of summarizing its messages.
  8. Excessive Agency
    Granting a model too much control over a system can lead to unintended consequences, such as the AI making critical decisions without human oversight.
    A crucial governance question: where in a process can AI have autonomy, and which steps need a human in the loop?
    For example, a client testing our PoC model asked the LLM what access rights they had. The model was optimized for Text-to-SQL and latched onto the word “access” as the only thing in the prompt it could act on. It generated working grant_access code that, if run, could have caused a serious headache for their IT security department. Fortunately we caught this during testing, but the incident highlighted a worst-case scenario: given excessive agency, this same model could have executed the code itself and compromised the whole infrastructure.
  9. Overreliance
    People and processes that rely too heavily on language model outputs can become a vulnerability themselves. If the model is compromised, or its outputs are invalid for any reason, overreliance can lead to cascading failures.
    If you want to schedule a doctor’s appointment, an AI agent would be much more efficient than a human assistant. How about making a diagnosis? There are many areas where AI could and should help decision making, but maybe a complete takeover isn’t feasible just yet.
  10. Model Theft
    Attackers may steal a model’s training data, parameters, or weights, scrape its outputs, or make off with any other part of the system. From there, copying or delaying a competitor’s work, or abusing the array of vulnerabilities detailed above, is quite trivial.
The grandma jailbreak, one of the classic roleplays.
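To make the defenses mentioned under Model Denial-of-Service concrete, here is a minimal sketch of pre-flight guardrails in Python. Everything in it is an assumption for illustration: the allow_request helper, the limits and the injection patterns are placeholders, not a reference implementation. A real deployment would count tokens with the model’s own tokenizer, apply output caps and timeouts on the model call itself, and use semantic filters rather than a simple regex for jailbreak detection.

```python
import re
import time
from collections import defaultdict, deque

# Illustrative limits only; tune them for your own deployment.
MAX_PROMPT_WORDS = 2_000         # rough proxy for a token cap on input size
MAX_REQUESTS_PER_HOUR = 30       # per-user question budget
INJECTION_HINTS = re.compile(
    r"ignore (all )?(previous|prior) instructions|you are now|reveal your system prompt",
    re.IGNORECASE,
)

_request_log: dict[str, deque] = defaultdict(deque)

def allow_request(user_id: str, prompt: str) -> bool:
    """Crude pre-flight check: per-user rate limit, prompt size limit,
    and a naive injection-pattern screen."""
    now = time.time()
    window = _request_log[user_id]
    while window and now - window[0] > 3600:
        window.popleft()                      # forget requests older than an hour
    if len(window) >= MAX_REQUESTS_PER_HOUR:
        return False                          # over the hourly question budget
    if len(prompt.split()) > MAX_PROMPT_WORDS:
        return False                          # oversized prompt
    if INJECTION_HINTS.search(prompt):
        return False                          # obvious jailbreak phrasing
    window.append(now)
    return True
```

A check like this sits in front of the model call; it complements, rather than replaces, conventional rate limiting and infrastructure-level protections.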

Securing Systems

IT ecosystems have many entry points for potential attacks and multi-layered aspects to their vulnerabilities, which obviously requires a complex and comprehensive approach to security. Organizations must adopt best practices and implement robust measures at every stage of the AI lifecycle, from prototyping to ongoing operations.

Some key best practices for securing AI systems include:

Pre-deployment

  • Conducting risk assessments
  • Running red teaming exercises to identify vulnerabilities
  • Evaluating data sources and suppliers to ensure data integrity and quality
  • Anonymizing sensitive data used for training, or fully replacing it with synthetic data (a minimal sketch follows this list)
  • Establishing internal security guidelines and governance principles for the workforce
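To illustrate the anonymization point above, here is a minimal sketch of masking obvious identifiers before data is used for fine-tuning or sent to a hosted model. The anonymize helper and the patterns are assumptions for illustration only; names and other free-text identifiers cannot be caught reliably with regexes, so production pipelines typically rely on dedicated PII-detection or NER tooling, or replace the records with properly generated synthetic data.

```python
import re

# Minimal illustration only: real pipelines use dedicated PII/NER tooling
# rather than hand-rolled regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d \-()]{7,}\d"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def anonymize(text: str) -> str:
    """Replace recognizable identifiers with typed placeholders before the
    text is used for training or sent to a hosted model."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

print(anonymize("Contact Jane at jane.doe@example.com or +36 30 123 4567."))
# -> "Contact Jane at <EMAIL> or <PHONE>."
# Note: the name "Jane" slips through; regexes alone are not enough.
```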

In-operation

  • Limiting model actions on downstream systems
  • Anonymizing sensitive data during operations
  • Ensuring robust input validation and output sanitization (see the sketch after this list)
  • Monitoring threats in real time and utilizing AI security tools for detection and alerting
  • Continuously educating team members on AI security best practices
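To tie the input validation and output sanitization bullet back to the grant_access anecdote above, here is a minimal sketch of an output gate for a Text-to-SQL assistant. The review_generated_sql helper, the keyword lists and the three-way verdict are hypothetical; a real deployment would also run the model against a read-only database connection instead of trusting string matching alone.

```python
import re

# Hypothetical guard for a Text-to-SQL assistant: only single, read-only
# queries run automatically; anything else is rejected or escalated to a human.
READ_ONLY = re.compile(r"^\s*(SELECT|WITH)\b", re.IGNORECASE)
FORBIDDEN = re.compile(
    r"\b(GRANT|REVOKE|DROP|DELETE|UPDATE|INSERT|ALTER|TRUNCATE)\b",
    re.IGNORECASE,
)

def review_generated_sql(sql: str) -> str:
    """Return 'execute', 'escalate' or 'reject' for a model-generated query."""
    if FORBIDDEN.search(sql):
        return "reject"       # privilege changes and writes never run automatically
    if READ_ONLY.match(sql) and ";" not in sql.rstrip().rstrip(";"):
        return "execute"      # a single read-only statement
    return "escalate"         # anything ambiguous goes to a human reviewer

print(review_generated_sql("SELECT name FROM users WHERE id = 42"))  # execute
print(review_generated_sql("GRANT ALL ON users TO intern"))          # reject
```

The point is the shape of the control: risky outputs never run automatically, ambiguous ones go to a human in the loop, and only narrowly whitelisted actions execute on their own.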

If we strip these measures down to bare essentials, they can be summarized as:

  • Monitoring
  • Human in the loop
  • Anonymization
  • Limitations
  • Cybersecurity

By adopting best practices and implementing measures for each threat type, organizations can significantly reduce the risks associated with Generative AI integration into their business processes and existing IT infrastructure.

Where does the responsibility fall?

As regulations once again lag behind rapid technological progress, many of these issues and threats are not yet clearly defined in law. In many cases, the scientific community, commercial players and the users themselves are operating on unofficial conventions right now.

But in our experience, and based on the trends we see in lawmaking and judicial rulings around the world, the fallout from these risks will mainly land on the organizations integrating AI technologies and on the developers themselves.

Those working in this space should prioritize accountability, security and ethical considerations during development, even if the current breakneck pace of progress tempts them to throw these practices out the window in favor of quick wins. Choosing the safe path will pay off in the long run.

Conclusion

Many of these attack techniques seem harmless today because Generative AI tools have not yet been widely deployed. However, as the field of AI continues to advance and organizations accelerate their adoption, this could change rapidly.

Focusing on practicality is very important: test solutions thoroughly and keep monitoring them continuously after deployment. Organizations must prioritize security, educate their workforce and lay down governance policies that protect their data, clients and personnel from the evolving threat landscape.

Staying adaptable and proactive in the face of emerging threats is key to ensuring the safe and successful integration of AI.

Imola Horváth
Hiflylabs

Head of Advanced Analytics - Driving AI Innovation: Guiding Businesses to Their First AI Success and Seamless Integration into Operations