Are you tired of OpenAI’s token limit?
Are you sick of handing over your personal data to AI companies who do god knows what with it?
Do you want TOTAL control over your AI backend?
Then this guide is for you.
Warning: this does cost a pretty penny, so if you would like a free demo of how to do this, book a call with us at www.woyera.com
What will this guide cover?
- Invoking an already hosted BLOOM LLM on AWS SageMaker through AWS Lambda
- Connecting the AWS Lambda function to AWS API Gateway so you can use the LLM as an API
This guide will not cover how to host the BLOOM LLM on AWS SageMaker. If you want to read how to do that, check out our post here. Hosting the model is the prerequisite for using the LLM through an API the way you would use OpenAI's.
Step 0: Log in or sign up for an AWS account
- Go to https://aws.amazon.com/ and log in or sign up for an account
- If you sign up for a new account, you will automatically be given Free Tier access, which does provide some AWS Lambda and AWS API Gateway credits. But as noted in our hosting guide, it does not provide enough for hosting the model on SageMaker
- Host the model on AWS SageMaker using the steps in our how to host BLOOM on AWS guide
- Note down your hosted model's Endpoint Name, which is shown in the last step of the how to host BLOOM on AWS guide
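If you no longer have the endpoint name handy, you can look it up with boto3. This is a minimal sketch, assuming your AWS credentials are already configured locally:

import boto3

# List the SageMaker endpoints in the account along with their status
sagemaker = boto3.client('sagemaker')
for endpoint in sagemaker.list_endpoints()['Endpoints']:
    print(endpoint['EndpointName'], endpoint['EndpointStatus'])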
Step 1a: Go to AWS Lambda to create a Lambda Function
A Lambda function will be used to call your hosted LLM's endpoint
1. Search for the Lambda service in the AWS console search bar and click on the Lambda service
2. Click on Create function
3. Enter a function name (the exact name doesn't matter), choose Python 3.10 as the runtime and x86_64 as the architecture. Then click on Create function
4. Enter the LLM model’s endpoint name from Step 0 as an environment variable
i. Click on the Configuration tab of your newly created function
ii. Click on Environment variables, then click on Edit
iii. Click on Add environment variable on the next screen
iv. Enter ENDPOINT_NAME as the key and your model's endpoint name as the value. Click Save
You can use any key you wish, but it will need to match what we reference in the Lambda code below
5. Go back to the Code tab and copy and paste the following code there
import os
import boto3
import json

# Grab the endpoint name that was set as an environment variable in step 4
ENDPOINT_NAME = os.environ['ENDPOINT_NAME']

# Client for invoking SageMaker endpoints
runtime = boto3.client('runtime.sagemaker')

def lambda_handler(event, context):
    # Forward the request body straight to the hosted BLOOM endpoint
    response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                       ContentType='application/json',
                                       Body=event['body'])
    # Decode the endpoint's JSON response and return it to the caller
    result = json.loads(response['Body'].read().decode())
    return {
        "statusCode": 200,
        "body": json.dumps(result)
    }
6. Click Deploy after pasting the code
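Before moving on to API Gateway, you can sanity check the function from the Test tab in the Lambda console. Because the handler reads event['body'] as a string, the test event needs the request JSON serialized into a body field; the prompt below is just an example:

{
  "body": "{\"text_inputs\": \"My ice cream shop's marketing tag line should be\", \"max_new_tokens\": 100}"
}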
Step 2: Connect your new Lambda function to AWS API Gateway
1. Go to your Lambda function's home screen and click Add trigger
2. Select the API Gateway menu item in the Add trigger dialog
3. Fill out the API Gateway dialog and click on Add. The simplest setup for testing is to create a new API with Open security; just note that anyone with the URL can then call your endpoint
4. After the API endpoint has been created, you can view the API URL under the Configuration tab, in the Triggers sidebar
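If you chose an HTTP API in the trigger dialog, you can also recover the base invoke URL programmatically. A minimal sketch with boto3, assuming an HTTP API rather than a REST API:

import boto3

# List HTTP APIs in the account; ApiEndpoint is the base invoke URL
apigw = boto3.client('apigatewayv2')
for api in apigw.get_apis()['Items']:
    print(api['Name'], api['ApiEndpoint'])

The full URL is the ApiEndpoint plus the stage and route, e.g. /default/<your-function-name>.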
Step 3: Test your brand spanking new LLM API
1. Make a POST request to your API URL with the following JSON body. The text_inputs key stores the prompt you want to use; the remaining keys are standard text generation parameters (minimum output length, maximum new tokens, beam search width, n-gram repetition blocking, and sampling temperature)
{
  "text_inputs": "My ice cream shop's marketing tag line should be",
  "min_length": 5,
  "max_new_tokens": 100,
  "num_beams": 5,
  "no_repeat_ngram_size": 2,
  "temperature": 0.8
}
2. Check the response status code and the response JSON. The status code should be 200 and the response JSON will look like the following
{
  "generated_texts": ["something along the lines of 'Ice cream is the best thing in the world.' "]
}
You can use the following Python code to test the API. Replace the value of api_url with the API URL that you created in the last step
import requests

# Replace this with the API URL from Step 2
api_url = 'https://spip03jtgd.execute-api.us-east-1.amazonaws.com/default/call-bloom-llm'
json_body = {
    'text_inputs': "My ice cream shop's marketing tag line should be",
    'min_length': 5,
    'max_new_tokens': 100,
    'num_beams': 5,
    'no_repeat_ngram_size': 2,
    'temperature': 0.8
}

r = requests.post(api_url, json=json_body)
if r.status_code == 200:
    result = r.json()['generated_texts']
    print(result)
else:
    # Print the status code and body to help diagnose the error
    print(r.status_code, r.text)
Potential Errors
You might run into a few errors with this setup:
- Permissions: if your Lambda function's execution role does not have permission to call the SageMaker InvokeEndpoint action, you will not be able to call the endpoint; an example policy is sketched after this list. AWS permissions can get confusing, so schedule a call with us if you would like us to walk you through them
- Timeout: depending on your prompt and generation parameters, you may receive a timeout error, since Lambda's default timeout is only 3 seconds. Unlike permissions, this is an easy fix: click Configuration, then General configuration, click Edit, and raise the timeout value
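For the permissions error, a minimal IAM policy that grants the invoke permission is sketched below; you may want to scope Resource down to your endpoint's ARN instead of the wildcard:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sagemaker:InvokeEndpoint",
      "Resource": "*"
    }
  ]
}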
Conclusion
This post concludes our series on how to host and use your own LLM as an API.
There are many reasons to consider using your own hosted open-source LLM as an API, such as:
- Security
- Reliability
- Consistency
If you would like to learn more about using your own hosted LLM as an API, schedule a call with us on our website