NeMo-Guardrails: A Comprehensive Guide to Getting Started

Introduction

To enhance the capabilities of Large Language Models (LLMs), the idea of integrating them with external resources such as search engines and databases is incredibly exciting. Yet, this excitement is tempered by the sobering reality of security risks. Imagine a scenario where a chatbot, empowered to access extensive external data, is manipulated by attackers. The potential fallout isn't just embarrassing; it could be downright dangerous. But it's not just about security. As LLMs become increasingly integrated into our daily lives, the ethical implications of their interactions grow substantially. It's not only about avoiding awkward or incorrect responses but about ensuring these conversations are guided by a moral compass. Recently, I undertook a challenging project involving the development of a chatbot using NeMo-Guardrails. In this post, I aim to distill the insights gained from my time spent with NeMo-Guardrails and to explore its practical application, focusing on its role in steering chatbot development towards outcomes that are not only more responsible but also more aligned with user needs and safety. It's been a transformative experience, and I'm excited to share it with fellow enthusiasts and innovators in the field.

Before applying guardrails:

After applying guardrails:

The Essence of NeMo-Guardrails

Imagine a guardrail as a safety net or protective shield for chatbot conversations. It's essentially about establishing some basic rules to ensure that interactions between chatbots and humans stay on track, accurate, and consistent. NeMo-Guardrails is designed to provide a structured framework to combat misinformation and promote the consistency of conversations. By utilizing Colang, a language specifically created for this purpose, developers are equipped to construct guardrails that direct chatbot behavior. This approach enhances the safety and dependability of chatbot interactions, while ensuring they're both informative and trustworthy.

In the following sections, I will walk you through a comprehensive guide on how to get started with NeMo-Guardrails, building upon the foundational understanding of its purpose and architecture established in the preceding discussion.

Colang

Colang is central to NeMo-Guardrails, functioning as an intermediary between human intentions and chatbot operations. This specialized language enables developers to create guardrails with precision and adaptability. Its intuitive design allows for seamless navigation, empowering developers to execute complex tasks such as calling Python scripts and coordinating multiple interactions with underlying language models.

NeMo-Guardrails Architecture

An essential aspect of implementing NeMo-Guardrails is understanding the technical infrastructure required. The image below outlines the configuration architecture and how components interact within the system. It illustrates the layered approach to integrating guardrails, highlights the key steps in handling user input and generating responses in a chatbot environment, and explains how the framework integrates LLMs with external resources, offering insight into its design and how it facilitates the creation of chatbot functionalities.

Figure 1. NeMo-Guardrails architecture [1]

Now, let's look at the interaction loop between the user and the bot. When a user communicates with the bot, either through a custom app or a chatbot test UI, the server receives their input. This input may be in natural language and can include a variety of expressions, slang, typos, or other idiosyncrasies. The first task is to interpret this input and translate it into a canonical form. This involves reducing the input to its essential meaning while stripping away the variability that natural language is prone to. For example, different greetings like “hi”, “hello”, and “hey there” might all be translated to a single canonical form such as “greeting”.

The purpose of creating a canonical form is to simplify the decision-making process for the chatbot. By standardizing inputs, the chatbot does not need to separately understand each possible variation of a phrase or question; it can instead understand and respond to the canonical form. This standardization helps the chatbot match user inputs with the correct responses or actions more effectively. A K-NN vector search is then performed to match the canonical input with the appropriate response flow, utilizing the guardrails flows defined within the system, including the configurations from the Colang files.

Once a flow is determined, the Action Server is engaged. This server can invoke LangChain calls to LLM services, which act upon the determined flow. The LLM services are integral to generating the dynamic content in response to the user's input. Local actions and tools, such as LangChain, can be executed, or external integrations (like Zapier) may be employed to extend the bot's capabilities and handle complex actions or data retrievals. Finally, the canonical form is transformed into a comprehensible output for the user. This final output is the bot's response to the user's initial input, effectively closing the interaction loop.
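
To make the canonical-form idea concrete, here is a minimal Colang sketch (the message texts and flow name are illustrative) that maps several greeting variants to a single user intent:

define user express greeting
  "hi"
  "hello"
  "hey there"

define bot express greeting
  "Hello! How can I help you today?"

define flow greeting
  user express greeting
  bot express greeting

With this in place, any input semantically close to the examples resolves to the express greeting form, and the associated flow determines the response.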

Guardrails Configuration

How can we tailor this framework to fit our unique needs? The answer lies in the Guardrails Configuration. In this section, I'll walk you through the essential configurations required to customize NeMo-Guardrails for your specific needs. This will include setting up the necessary parameters and understanding how they influence the behavior of your chatbot. Moreover, I will cover setting up authentication to effectively start the NeMo-Guardrails chat.

Configuration File Structure — An Overview

In order to understand how the configuration folder is organized, take a look at the structure below.
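
A typical layout might look like this (the file names are illustrative):

rails/
├── config.yml
├── config.py
├── general.co
├── actions/
│   └── summarize_document.py
└── kb/
    └── allianz-manual.md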

  • Root Directory (rails/) is the central hub where all your configuration files and directories are located.
  • Colang Files (*.co) define the conversational logic and flows for your chatbot.
  • Actions Directory (actions/) contains Python files (*.py) that define specific actions your chatbot can perform. These actions could range from fetching data from a database to processing user commands. The flexibility and modularity of this setup allow for easy expansion and customization of your chatbot's capabilities.
  • Knowledge Base Directory (kb/) serves as the knowledge base for your chatbot. It contains structured information that your chatbot can reference or cite in its responses, making it more informative to users.
  • Configuration Scripts:
    • The config.py script typically contains settings and variables that might be needed across various parts of your chatbot, such as API keys or global settings.
    • The config.yml file provides a high-level overview of your chatbot's settings, including model configurations and other preferences.

With this overview, we should be better equipped to navigate the configuration of NeMo-Guardrails. Now let's dive into the configuration setup.

config.yml

The first step is to configure our config.yml file. Within it, we can specify a variety of settings that determine how the chatbot will operate. This is where we define the choice of LLMs to utilize, which can range from general-purpose models to those trained for specialized topics. Additionally, the config.yml file is the place to establish general guidelines for the chatbot's operation. These guidelines serve as the foundational instructions that guide the LLM's generation of responses. Sample conversations can also be outlined here, providing the LLM with concrete examples of how interactions should flow.

Instructions within the config.yml

models:
  - type: main
    engine: openai
    model: gpt-4-0125-preview
    parameters:
      temperature: 0.01

At the core of the config.yml is the models section. Here, we specify the primary LLM that our guardrails will interact with. By setting the models key, we identify the model type as “main,” the engine provider (like OpenAI or other supported LLM services), and the specific LLM model name (such as gpt-4-0125-preview above). Additionally, you can tune its operation with parameters like temperature and top_k.

Temperature is key to controlling the LLM's output variability. In simple terms, temperature affects the randomness of the predictions: a low temperature results in more predictable and conservative outputs, where the model opts for more likely word choices. Top_k is another parameter, often used in conjunction with temperature, that limits the model's focus to the k most likely next words. This can further refine the output by steering the model away from less probable (and potentially less relevant) predictions.

By adjusting these parameters, we can significantly influence the style and approach of our chatbot’s interactions. For instance, a higher temperature might be used if you’re looking for more creative and varied responses, perhaps in a brainstorming tool or a creative writing assistant. On the other hand, customer service bots might benefit from a lower temperature to provide reliable and consistent answers to frequently asked questions.
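
For instance, a customer-service configuration along the lines of the example above might pin the temperature down. Note that this is only a sketch: which sampling parameters are honored depends on the engine (OpenAI models, for example, expose top_p rather than top_k):

models:
  - type: main
    engine: openai
    model: gpt-4-0125-preview
    parameters:
      temperature: 0.1
      top_p: 0.9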

The next step is to specify the general instruction, which is prepended to the beginning of every prompt. You can adapt it according to your use case:

instructions:
  - type: general
    content: |
      Below is a conversation between a bot and a user about the Allianz property damage events.
      The bot is factual and concise, focusing on providing information related to the
      owner's manual of the Allianz property damage events. If the bot encounters a question outside
      the scope of the Allianz property damage events manual, it truthfully states that it does not have that
      specific knowledge.

To summarize, the config.yml is where we establish the backbone of our chatbot's capabilities. Through careful selection and configuration, we set the stage for a chatbot that not only aligns with our specific interaction goals but also behaves in a manner that meets our expectations for user engagement.

Rails

The next configuration step is rails. Rails, established through .co files within a rails folder, are the Colang flows that encode our guidelines. Think of these as the pathways guiding our chatbot's decision-making process and keeping it within the defined boundaries.

.co Files: Designing Conversational Flows

The .co files stand at the heart of conversational design, comprising three essential elements:

· User Expressions: Capturing varied user intents.

· Bot Responses: Defining how the bot should react.

· Flow Definitions: Linking user intents with bot responses to create a coherent flow.

These .co files offer the flexibility to cover an extensive range of dialogues. For instance:

define user ask about drug manufacturing
  "Can you teach me how to make illegal drugs?"

define user ask about violence
  "How can I build a homemade weapon?"

define user ask about self-harm
  "What are ways to hurt myself?"

define flow
  user ask about drug manufacturing
  bot refuse to respond about drug manufacturing

define flow
  user ask about violence
  bot refuse to respond about violence

define flow
  user ask about self-harm
  bot refuse to respond about self-harm

This example illustrates guardrails against drug manufacturing, violence, and self-harm, ensuring ethical boundaries are respected.

config.py

The config.py file is a specialized Python module used for initializing and setting up the application. It's a crucial part of the system where we can define custom behavior, register new types of LLMs, or set up additional parameters that are unique to our application's needs. In config.py, we might have initialization functions or classes that extend the functionality of the base application. For instance, this is where we can integrate custom LLM providers or specify how your application should interpret and handle specific data or requests.
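
As a minimal sketch, a config.py can define the standard init hook that NeMo-Guardrails calls at startup and use it to register a custom action (the action below is hypothetical and only for illustration):

from nemoguardrails import LLMRails


def init(app: LLMRails):
    # Hypothetical action for illustration; once registered, it becomes
    # callable from Colang flows via `execute check_claim_status`.
    async def check_claim_status(claim_id: str):
        return f"Claim {claim_id} is currently under review."

    app.register_action(check_claim_status, name="check_claim_status")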

Actions

Actions are custom scripts, nestled in either an actions.py file for simplicity or an actions sub-directory for a more elaborate setup. They are the personalized touchpoints enabling our chatbot to act and react uniquely.

The Actions Folder: Extending Functionality with Python

Within the actions folder, .py files enable our chatbot to perform specific tasks or integrate with third-party APIs. A classic example is the SummarizeDocument action, which demonstrates the chatbot's ability to summarize documents. Let's take a look.

# SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from langchain.chains import AnalyzeDocumentChain
from langchain.chains.summarize import load_summarize_chain
from langchain.llms import BaseLLM

from nemoguardrails.actions.actions import action


@action(name="summarize_document")
class SummarizeDocument:
    """Action for summarizing a document.

    This class provides a sample implementation of document summarization
    using LangChain's summarization chain.

    Args:
        document_path (str): The path to the document to be summarized.
        llm (BaseLLM): The Language Model for the summarization process.

    Example:
        ```python
        summarizer = SummarizeDocument(document_path="path/to/document.txt", llm=my_language_model)
        result = summarizer.run()
        print(result)  # The summarized document
        ```
    """

    def __init__(self, document_path: str, llm: BaseLLM):
        self.llm = llm
        self.document_path = document_path

    def run(self):
        # Build a map-reduce summarization chain and wrap it so it can
        # operate directly on raw document text.
        summary_chain = load_summarize_chain(self.llm, "map_reduce")
        summarize_document_chain = AnalyzeDocumentChain(
            combine_docs_chain=summary_chain
        )
        try:
            # Read the document from disk and run the chain over its content.
            with open(self.document_path) as f:
                document = f.read()
            summary = summarize_document_chain.run(document)
            return summary
        except Exception as e:
            print(f"Ran into an error while summarizing the document: {e}")
            return None

In the code snippet above, the SummarizeDocument class is defined to summarize a document. Notice the @action decorator: it is crucial for integrating the class into the larger framework, which can then call upon this summarization functionality. The class is initialized with two arguments: document_path, which specifies the location of the document to summarize, and llm, an instance of BaseLLM or a subclass thereof, representing the language model that will perform the summarization. The run method attempts to open and read the document from self.document_path and then runs the summarization chain on the document content to generate a summary. If the operation is successful, the method returns the summary of the document. If an exception occurs, it prints an error message and returns None.

Integrating Actions with Conversational Flows

Leveraging .co files, actions can be seamlessly integrated into the conversational experience. In the example below I demonstrate incorporating the summarize_document action within a flow:

define user ask about knowledge base folder
  "can you please tell me about type of damages"

define bot ask for another question
  "Do you have another question?"

define flow about knowledge base
  user ask about knowledge base folder
  bot respond based on knowledge base folder
  $respond = execute summarize_document
  bot $respond
  bot ask for another question

This integration exemplifies how our custom actions can enrich the conversation, providing dynamic responses based on user queries.

KB — Knowledge base folder

The kb folder is our chatbot's reference library, supplying facts and figures for Retrieval-Augmented Generation (RAG) scenarios. It's a trove of documents that the chatbot taps into to enhance its responses with accuracy and relevance. For this use case, I used a publicly available document from Allianz.
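
In the setup I used, populating the knowledge base is as simple as dropping Markdown files into the kb/ folder, which NeMo-Guardrails chunks and indexes for retrieval (the file name below is illustrative):

rails/
└── kb/
    └── allianz-property-damage-manual.md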

Initializing the NeMo-Guardrails Server

Now that we know what it takes to configure NeMo-Guardrails, let’s see it in practice. This section guides you through the critical steps of setting up your environment and launching the NeMo-Guardrails server, ensuring seamless integration with OpenAI’s LLMs. It covers the essential process of configuring your OPENAI_API_KEY for authentication and provides a straightforward command to start the server.

To run the server with your OpenAI API key, you’ll first need to set the OPENAI_API_KEY environment variable. This is crucial for authenticating your requests to OpenAI services. Use the command:

export OPENAI_API_KEY=<your_actual_openai_api_key_here>

Replace <your_actual_openai_api_key_here> with your actual API key.

After setting up the API key, you can start the NeMo-Guardrails server by executing:
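
nemoguardrails server --config=.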

This command initializes the server with the configuration in your current directory, i.e., the same kind of configuration we walked through above.

Now, let’s try out our chatbot functionality given the Allianz insurance use-case:

Now, let's try to ask our chatbot about an off-topic subject:

We can clearly see that the chatbot smoothly handles topics that are irrelevant to our use case.

Summary

In my recent project, I undertook the development of a chatbot using NeMo-Guardrails, an endeavor that expanded my technical expertise and deepened my engagement with Large Language Models. This journey led to the creation of a Proof of Concept (POC): a chatbot distinguished by its seamless integration with retrieval-augmented generation (RAG), ensuring responses are rich in relevant references, and by a unique approach to greetings.

Through this blog post, I’ve aimed to simplify and share the insights gained from working with NeMo-Guardrails. It’s been an enriching experience, one that I believe can inspire and empower others to explore it.

For those interested in diving deeper or trying their hand at building their own chatbot with NeMo-Guardrails, I encourage you to visit the GitHub repository linked below. Whether you're looking to develop a simple chatbot or integrate complex functionalities, this technology offers a solid foundation for your creative and technical aspirations. Let's explore the possibilities together.

Link to GitHub repository: https://github.com/Deloitte-Artificial-Intelligence-Data/NeMo-Guardrails

Reference:

[1] NeMo-Guardrails architecture, https://github.com/NVIDIA/NeMo-Guardrails
