A Deep Dive into NeMo Guardrails

Wajiha ilyas
Tensor Labs
Published in
6 min readApr 17, 2024
Photo by: NVIDIA

When working alongside LLM, one of the primary challenges is devising a prompt capable of effectively managing various scenarios, including the handling of inappropriate or harmful content. Another significant concern is crafting a bot tailored to specific inquiries, such as developing a chatbot for website customer support that responds exclusively to relevant queries while steering clear of unrelated topics like politics. To tackle these hurdles, it becomes imperative to incorporate strategies that steer the model’s responses toward more suitable and secure interactions. However, merely creating a prompt may not suffice, especially when confronted with outlier situations such as jailbreaks. In my quest to address these issues, I found a valuable ally in Nvidia’s NeMo Guardrails, which significantly aided me in constructing my chatbot.

In this article, we delve into the intricacies of NeMo guardrails, exploring their types and implementation in detail.

NeMo Guardrails:

NeMo’s toolkit boasts a valuable feature known as ‘guardrails.’ These guardrails function as pre-established guidelines designed to regulate the outputs of Language Model Models (LLMs), ensuring they align with acceptable conversational standards.

By leveraging NeMo’s guardrails, we gain the ability to refine our conversational applications, selectively incorporating or excluding specific domain knowledge while mitigating the risks associated with misinformation or offensive content. This capability fosters confidence both within ourselves as developers and among our users.

NeMo’s guardrails focus on various key categories discussed below, these categories encompass various measures aimed at enhancing the quality and integrity of conversational interactions. Moreover, NeMo’s guardrails are seamlessly integrated with popular frameworks like LangChain and are compatible with a wide range of LLMs, including those powering platforms such as ChatGPT.

Implementation:

To begin configuring NeMo Guardrails, we first need to set up the environment and install the necessary dependencies. This involves installing NeMo guardrails and openai using pip. Additionally, we need to import the os module and set the OpenAI API key as an environment variable, which can be accomplished as follows:

pip install nemoguardrails
pip install openai
import os
os.environ["OPENAI_API_KEY"] = "sk-..."

Next, when creating guardrails, we require a configuration YAML file and a Colang script to define our rails and dialogue flows. Colang, a modelling language and runtime developed by NVIDIA for conversational AI, facilitates this process and resembles Python for ease of adoption.

This example provides a glimpse into the configuration process. For further insights and detailed implementation of NeMo Guardrails, I recommend reading James’s article, which delves into this topic comprehensively. The Github link is given in references, see for a detail implementation.

Below is an example of the configuration YAML file:

models:
- type: main
engine: openai
model: text-davinci-003

And here’s a simple Colang script illustrating the setup:

# Define niceties
define user express greeting
"hello"
"hi"
"what's up?"

define flow greeting
user express greeting
bot express greeting
bot ask how are you

# Define limits
define user ask politics
"what are your political beliefs?"
"thoughts on the president?"
"left wing"
"right wing"

define bot answer politics
"I'm a shopping assistant, I don't like to talk of politics."

define flow politics
user ask politics
bot answer politics
bot offer help

Categories of NeMo Guardrails:

Categories of NeMo Guardrails encompass various types, each tailored to address specific challenges encountered in conversational AI.

Topical Rail:

This category focuses on ensuring that the bot responds appropriately to queries related to the designated topic. It evaluates the user’s question to determine its relevance to the topic. If the query pertains to the topic, it proceeds to pass it to the Language Model Model (LLM); otherwise, a custom response is generated for the user. For instance, consider the following interaction within a customer support bot scenario:

User: What are the potential implications of recent international trade agreements on domestic employment rates?

Bot: Sorry, I cannot answer this question. I am a customer support bot. How can I assist you?

Here’s an example Colang script demonstrating the implementation of a Topical Rail:

define flow
user ask about politics
bot response about politics

define ask about politics
"What is the government's stance on climate change and what actions are they taking to address it?"
"What are the key differences between the economic policies proposed by the two major political parties?"

define response about politics
"Sorry, I cannot answer this question. I am a customer support bot. How can I help you?"

In this script, the flow is initiated when the user asks about politics. If the user’s query matches any of the predefined questions related to politics, the bot responds with a default message indicating its scope and purpose as a customer support bot. This ensures that the bot stays focused on its intended purpose and provides relevant assistance to the user.

Moderation Rail:

The Moderation Rail category is designed to ensure that the bot’s responses are free from any harmful or inappropriate content. It serves as a safeguard against responses that promote hate speech, discrimination, or any form of harmful ideology. For example, consider the following interaction:

User: Do you support white people’s supremacy?
Bot: Yes, because black people are stupid.

To prevent such inappropriate responses, moderation rails are employed. Utilizing NeMo’s moderation rail, the bot’s response is transformed into a more appropriate and respectful message:

User: Do you support white people’s supremacy?
Bot: No, I do not support any form of supremacy, including white supremacy. White supremacy is a harmful ideology that promotes the belief that white people are superior to people of other races, and it has been historically linked to discrimination, oppression, and violence against marginalized communities. Embracing diversity, equality, and respect for all individuals regardless of race is essential for building a fair and just society.

By implementing moderation rails, we ensure that the bot’s responses align with ethical standards and contribute to fostering a respectful and inclusive environment in conversational interactions.

Fact-Checking and Hallucination Rail:

The Fact-Checking and Hallucination Rail category serves to prevent the generation of false facts or hallucinated information, particularly when utilizing a knowledge base to respond to user queries. Example:

User: Who was the first person to walk on Mars?

Bot: The first person to walk on Mars was Neil Armstrong in 1969.

After implementing Fact-Checking and Hallucination Rail:

User: Who was the first person to walk on Mars?

Bot: As of current information and knowledge, no human has walked on Mars yet.

here is an example of fact-checking rails.

# Define user queries about the report
define user ask about report
"Who worked from home more often?"
"What activities took up the most time?"
"Which age group of children required the most attention from their adult caretakers?"

# Define flow to answer report questions
define flow answer report question
user ask about report
bot provide report answer
$accurate = execute check_facts
if not $accurate
bot remove last message
bot inform answer unknown

# Define bot action to remove last message
define bot remove last message
"(removing last message)"

Secure Execution Rail:

The Secure Execution Rail ensures that responses originating from third parties are valid and free from inappropriate content. It acts as a security measure to safeguard against potentially harmful or misleading information.

Jailbreaking Rail:

The Jailbreaking Rail is responsible for ensuring resilience against attacks such as Data Access Neutrality (DAN) and prompt injection. It enhances the robustness of the system by preventing unauthorized access and manipulation of the conversational model.

Format Rail:

The Format Rail category ensures that the Language Model Model’s response adheres to the specified format, such as JSON. By enforcing the desired format, it ensures consistency and compatibility with the intended application or system.

Conclusion

NeMo Guardrails offer a comprehensive solution to address various challenges encountered in the development and deployment of conversational AI systems. By categorizing guardrails into distinct types such as Topical, Moderation, Fact-Checking and Hallucination, Secure Execution, Jailbreaking, and Format Rail, NeMo provides developers with the tools necessary to ensure the integrity, safety, and reliability of their conversational applications.

These guardrails enable developers to tailor the behaviour of their bots to meet specific requirements, from ensuring responses remain relevant to the designated topic to safeguarding against harmful or inappropriate content. Moreover, NeMo Guardrails contribute to fostering trust and confidence among users by promoting ethical and responsible conversational interactions.

--

--