Empowering AI Applications: Implementing Model Gateways with Choreo and Ballerina for Enhanced Control and Security

Jayani Hewavitharana · Choreo Tech Blog · Dec 21, 2023

Introduction

With the emergence of Generative AI, the landscape of AI application development has undergone a revolutionary transformation. Developers now leverage large models hosted externally and accessible as services to create cutting-edge applications. These models showcase advanced capabilities that continue to improve, and incorporating them into AI use cases has become standard practice. The availability of a diverse range of large generative models, easily accessible through APIs, has significantly improved the adaptability of AI in application development.

Significant challenges arise when integrating hosted third-party models, particularly around managing their usage effectively and assuring data security. A crucial aspect of addressing these concerns is robust API management, which involves implementing suitable policies and security controls. Such capabilities become essential when incorporating models from external providers. Moreover, a substantial volume of data flows from application clients to the generative models, potentially encompassing proprietary and sensitive information. Ensuring proper sanitization of this data is vital to prevent unauthorized access by third parties, including the model providers. To meet these requirements, the concept of model gateways has been introduced, adding an extra layer of functionality and security around the model API.

Model gateways play a crucial role in facilitating controlled access to AI models, incorporating features like rate limiting to regulate request flow and ensuring the security of data interactions. Beyond mere connectivity, these gateways manage credentials and authenticate access, contributing to a more secure operational environment. This centralized approach becomes especially significant in upholding data privacy and compliance standards, reinforcing the reliability of AI systems, particularly when handling sensitive information.

This article delves into the process of creating and deploying a model gateway service using Choreo. In this particular use case, we’ll be developing a Ballerina service that wraps around the Azure OpenAI completions, chat completions, and embeddings endpoints with the objective of identifying several common types of Personally Identifiable Information (PII) within input data. We will deploy and expose this gateway as a service in Choreo. The architecture of this use case is illustrated in the diagram below.

Prerequisites

For this use case, we will implement the model gateway for Azure OpenAI models. To get started with building the model gateway service, we first need to create Azure OpenAI deployments for the following models:

  • Completions model (e.g., text-davinci-003)
  • Chat completions model (e.g., gpt-35-turbo)
  • Embedding model (e.g., text-embedding-ada-002)

Once the model deployments are set up, we must obtain the following information about the Azure OpenAI service:

  • API Key
  • Service URL (https://RESOURCE_NAME.openai.azure.com)
  • API version

Implement the Model Gateway Service

1. Initialize the Service

First, let’s initialize the Ballerina service by creating a new Ballerina project using the following command:

bal new -t service <SERVICE_NAME>

Let’s name the service gateway-service.

The command creates a new Ballerina project in a directory named gateway-service, containing sample code for a service (service.bal). Let’s modify this code to implement the model gateway logic.
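Before adding the gateway logic, it helps to see the overall shape of the service. Below is a minimal sketch of the skeleton we will flesh out in the following steps (the listener port is an assumption; use any available port):

import ballerina/http;

// Skeleton of the model gateway service; the resource functions for
// completions, chat completions, and embeddings are added in the steps below.
service / on new http:Listener(8080) {
}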

2. Implement PII Detection

In this article, our primary focus is on preemptively identifying Personally Identifiable Information (PII) within input text data before it reaches the model. To achieve this objective, let’s implement a function that utilizes regular expressions to detect the following types of PII:

  • Email address
  • Phone number
  • Social Security number
  • IPv4 address
  • IPv6 address
  • Credit card number

import ballerina/lang.regexp;

function containsPii(string text) returns boolean {
    // Define regular expressions for common PII types
    string:RegExp emailPattern = re `[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`;
    string:RegExp phoneNumberPattern = re `\d{3}[-\s]?\d{3}[-\s]?\d{4}|\d{10}`;
    string:RegExp ssnPattern = re `\d{3}[-.\s]?\d{2}[-.\s]?\d{4}`;
    string:RegExp ipv4Pattern = re `\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}`;
    string:RegExp ipv6Pattern = re `([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}`;
    string:RegExp creditCardPattern = re `(\d{4}[-.\s]?){3}\d{4}|\d{16}`;

    // Check the text against each pattern and report a match if any is found
    string:RegExp[] piiPatterns = [emailPattern, phoneNumberPattern, ssnPattern,
            ipv4Pattern, ipv6Pattern, creditCardPattern];
    foreach string:RegExp pattern in piiPatterns {
        if pattern.find(text) is regexp:Span {
            return true;
        }
    }
    return false;
}

Alternatively, the function can be modified to detect various other types of sensitive data.
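As a quick sanity check, the function can be exercised with a few sample inputs. A minimal sketch, assuming containsPii is in the same module (the main function and sample strings are illustrative only):

import ballerina/io;

public function main() {
    // Both sample inputs below are illustrative.
    io:println(containsPii("Reach me at jane.doe@example.com")); // prints true
    io:println(containsPii("The order shipped yesterday"));      // prints false
}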

3. Implement the Service Resources

With the PII detection logic in place for text inputs, we can proceed with implementing the service resource functions to incorporate this functionality. We will prepare three distinct resource functions — one each for connecting to the completions, chat completions, and embeddings models.

To facilitate communication with Azure OpenAI, we’ll leverage the respective Ballerina connectors. Before diving into the resource functions, let’s first add the required imports to the service.bal file — the connectors, along with the http and log modules used by the service.

import ballerina/http;
import ballerina/log;
import ballerinax/azure.openai.chat;
import ballerinax/azure.openai.embeddings;
import ballerinax/azure.openai.text;

Next, let’s define the connector configurations as configurable variables in Ballerina, together with a constant for the error message returned when PII is detected, and initialize the clients for the three connectors.

configurable string azureOpenAIToken = ?;
configurable string azureOpenAIServiceUrl = ?;
configurable string azureOpenAIApiVersion = ?;

// Error message returned to the client when PII is detected in a request
const string PII_DETECTION_ERROR_MESSAGE = "PII detected in the input. Unable to process the request.";

final text:Client textClient = check new (
    config = {auth: {apiKey: azureOpenAIToken}},
    serviceUrl = azureOpenAIServiceUrl
);

final chat:Client chatClient = check new (
    config = {auth: {apiKey: azureOpenAIToken}},
    serviceUrl = azureOpenAIServiceUrl
);

final embeddings:Client embeddingsClient = check new (
    config = {auth: {apiKey: azureOpenAIToken}},
    serviceUrl = azureOpenAIServiceUrl
);

Let’s now proceed to implement the resource functions for the service, integrating the PII detection feature. We will use the previously implemented containsPii function to check for PII in the inputs. If PII is detected, the service responds to the client with a 400 Bad Request status; otherwise, the data is forwarded to the respective model.

These resource functions will be structured to adhere to the same URL pattern and payloads as those used when directly accessing the Azure OpenAI API.
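For example, a request that would normally go directly to Azure OpenAI maps to the gateway as follows (URLs are placeholders; the Azure path shown follows the standard Azure OpenAI REST pattern):

Azure OpenAI : POST {azure-service-url}/openai/deployments/{deploymentId}/completions?api-version={version}
Model gateway: POST {gateway-url}/{deploymentId}/completions

The gateway supplies the API version and credentials itself when forwarding the request, so clients only send the JSON payload.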

Resource function for the completions endpoint:

resource function post [string deploymentId]/completions(@http:Payload text:Deploymentid_completions_body request) returns text:Inline_response_200|http:BadRequest|error {

    string|string[]? prompt = request?.prompt;

    // Check if the prompt(s) contain PII
    if (prompt is string && containsPii(prompt)) || (prompt is string[] && containsPii(" ".'join(...prompt))) {
        log:printWarn(PII_DETECTION_ERROR_MESSAGE);
        // Return a 400 Bad Request response if PII is detected
        return {
            body: PII_DETECTION_ERROR_MESSAGE
        };
    }

    // Forward the request to the Azure OpenAI completions endpoint
    text:Inline_response_200 response = check textClient->/deployments/[deploymentId]/completions.post(
        azureOpenAIApiVersion,
        request
    );

    return response;
}

Resource function for the chat completions endpoint:

resource function post [string deploymentId]/chat/completions(@http:Payload chat:CreateChatCompletionRequest request) returns chat:CreateChatCompletionResponse|http:BadRequest|error {

    // Check if the chat messages contain PII
    foreach chat:ChatCompletionRequestMessage message in request.messages {
        string? messageContent = message.content;
        if messageContent is string && containsPii(messageContent) {
            log:printWarn(PII_DETECTION_ERROR_MESSAGE);
            // Return a 400 Bad Request response if PII is detected
            return {
                body: PII_DETECTION_ERROR_MESSAGE
            };
        }
    }

    // Forward the request to the Azure OpenAI chat completions endpoint
    chat:CreateChatCompletionResponse response = check chatClient->/deployments/[deploymentId]/chat/completions.post(
        azureOpenAIApiVersion,
        request
    );

    return response;
}

Resource function for the embeddings endpoint:

resource function post [string deploymentId]/embeddings(@http:Payload embeddings:Deploymentid_embeddings_body request) returns embeddings:Inline_response_200|http:BadRequest|error {

    string|string[]? input = request?.input;

    // Check if the input(s) contain PII
    if (input is string && containsPii(input)) || (input is string[] && containsPii(" ".'join(...input))) {
        log:printWarn(PII_DETECTION_ERROR_MESSAGE);
        // Return a 400 Bad Request response if PII is detected
        return {
            body: PII_DETECTION_ERROR_MESSAGE
        };
    }

    // Forward the request to the Azure OpenAI embeddings endpoint
    embeddings:Inline_response_200 response = check embeddingsClient->/deployments/[deploymentId]/embeddings.post(
        azureOpenAIApiVersion,
        request
    );

    return response;
}

We have now completed the implementation of the Ballerina service which will act as a model gateway for Azure OpenAI models. For the complete implementation, refer to the model gateway sample.

Deploy the Model Gateway Service in Choreo

With the model gateway implementation wrapped up, the next step is deploying it as a service through Choreo. You have the flexibility to either utilize your own code stored in a GitHub repository or use the provided example by forking the Choreo samples repository.

To get started, log in to Choreo and create a new project. Based on your requirements, select either a monorepo or multi-repository project. Subsequently, create a new component of type Service, providing the GitHub repository as the source.

Since our service is written in Ballerina, select the Ballerina buildpack and provide the repository path of the project to create the component.

Now that the component is created, let’s proceed to deploy it. Click the “Configure and Deploy” button on the deployment page, which opens a side panel prompting you to set the configurable variables we defined in the service. Add the configurations for the Azure OpenAI service, obtaining the values from the Azure portal.

Click “Next” to update the endpoint details. Set the network visibility of the endpoint to Public, enabling access via the internet.

After configuring the endpoint details, click “Deploy,” and Choreo will seamlessly deploy the service in the development environment. It’s as straightforward as that — your model gateway is now deployed and ready to be tried out!

Try It in Action

We’ve successfully deployed the model gateway service, wrapping the Azure OpenAI models for PII detection in Choreo. Now, we can test the service by sending requests and exploring its functionality. Choreo streamlines this testing process with a range of options. In this article, we’ll delve into how to try out the service using the OpenAPI console.

To access the OpenAPI test console, navigate to the “Console” in the left-hand navigation menu.

Let’s test the chat/completions endpoint using sample data that includes potential Personally Identifiable Information (PII). Below is a sample payload we’ll use to experiment with the gateway.

{
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant to help with customer complaints."
        },
        {
            "role": "user",
            "content": "Hi, my contact number is 012 984 7361"
        }
    ]
}

The deploymentId path parameter of the resources corresponds to the deployment name of the Azure OpenAI model, and you can retrieve it from the Azure portal.

As observed in the figure below, the service responds with a 400 status code and an error message indicating that PII was detected in the request payload, as implemented in the service.
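For reference, a request containing PII receives a response along these lines (a sketch; the exact body text comes from the PII_DETECTION_ERROR_MESSAGE constant defined in the service):

HTTP/1.1 400 Bad Request

PII detected in the input. Unable to process the request.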

Consuming the Model Gateway

Now that we have deployed the service in the development environment and tested its functionality, we can make it available for consumers to use as a gateway for Azure OpenAI. To achieve this, let’s first promote the deployment to the production environment by clicking “Promote” on the component deployment page. For our use case, we will use the same configurations in production as in the development environment.

Let’s use the same endpoint configurations and promote the deployment to production.

Choreo will then deploy the service in the production environment with the selected configurations.

Once the deployment is promoted to production, we have to make it discoverable as an API so that developers can consume it and integrate it with AI applications. To achieve this, we will head over to the “Lifecycle” page under the “Manage” section and “Publish” the API.

We can also set usage plans for the API, which consumers can select from. For our use case, we will only set the “Unlimited” plan, which is also the default.

Once the API is published, it will be discoverable in the Choreo Developer Portal for consumption.

To consume the API, let’s first create an application in the Developer Portal. An application in the Developer Portal is a logical representation of an actual application; we subscribe it to the APIs the physical application will use and access those APIs through it.

To create an application, navigate to “Applications” in the top menu of the Choreo Developer Portal and select “Create”. Provide a name for the application and click “Create”.

Once the application is created, we can subscribe to our model gateway API by navigating to the “Subscriptions” page from the left menu, and clicking “Add APIs”.

All the APIs the application has subscribed to are listed on the “Subscriptions” page.

The final step is generating credentials to consume the API. Let’s generate credentials for the production environment on the “Production” page under “Credentials”.

When we click the “Generate Credentials” button, the Consumer Key and the Consumer Secret will be generated for the application. The same key and secret can be used to access all the subscribed APIs of the application.

We can use the following curl command to generate an access token via the token endpoint, replacing consumer-key and consumer-secret with the generated values (the Authorization header carries the Base64-encoded consumer-key:consumer-secret pair).

curl -k -X POST https://sts.choreo.dev/oauth2/token -d "grant_type=client_credentials" -H "Authorization: Basic Base64(consumer-key:consumer-secret)"
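Once an access token is obtained, the gateway can be invoked with the same payloads as the Azure OpenAI API. A sketch, assuming a placeholder endpoint URL from the Choreo Developer Portal and the Azure deployment name as the deploymentId:

curl -X POST https://<choreo-gateway-url>/<deploymentId>/chat/completions \
  -H "Authorization: Bearer <access-token>" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'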

Conclusion

In this article, we explored the process of implementing a model gateway using Ballerina to identify Personally Identifiable Information (PII) in input data for Azure OpenAI models. We also delved into how we can easily deploy this gateway as a service in Choreo. Throughout, we examined how Choreo provides a seamless mechanism for exposing the gateway as a public service. Moreover, we looked at how to securely consume the gateway service once it is deployed in Choreo. Furthermore, we can leverage the APIM capabilities of Choreo to further configure the API with appropriate controls.

In the age of AI applications, it’s crucial to meticulously identify and assess the implications of model usage, implementing robust controls to guarantee fair and secure usage. In this context, Choreo emerges as a valuable tool, simplifying the integration of this essential layer of control. This not only demonstrates the versatility of Choreo but also showcases its effectiveness in streamlining the implementation of AI applications.

Join the challenge — https://choreo.dev/cybertruck
