Are you tired of OpenAI’s token limit?
Are you sick of handing over your personal data to AI companies who do god knows what with it?
Do you want TOTAL control over your AI backend?
Then this guide is for you.
Warning: this does cost a pretty penny, so if you would like a free demo of how to do this, book a call with us at www.woyera.com
What will this guide cover?
- Invoking an already hosted BLOOM LLM on AWS SageMaker through AWS Lambda
- Connecting the AWS Lambda function to AWS API Gateway so you can use the LLM as an API
This guide will not cover how to host the BLOOM LLM on AWS SageMaker. If you want to read how to do that, check out our post here. Hosting the model is the prerequisite for using the LLM through an API the way you would use OpenAI's.
Step 0: Log in or sign up for an AWS account
- Go to https://aws.amazon.com/ and log in or sign up for an account
- If you sign up for a new account, you will automatically be given Free Tier access, which does provide some AWS Lambda and AWS API Gateway credits. But as noted in our hosting guide, it does not provide enough for hosting the model on SageMaker
- Host the model on AWS SageMaker using the steps in our how to host BLOOM on AWS guide
- Note down your hosted model's Endpoint Name, which is shown in the last step of the how to host BLOOM on AWS guide
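If you no longer have the endpoint name handy, you can look it up with boto3. This is a minimal sketch, assuming your AWS credentials are already configured locally:

import boto3

# List the SageMaker endpoints in the account along with their status
sagemaker = boto3.client('sagemaker')
for endpoint in sagemaker.list_endpoints()['Endpoints']:
    print(endpoint['EndpointName'], endpoint['EndpointStatus'])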
Step 1a: Go to AWS Lambda to create a Lambda Function
A Lambda function will be used to call your hosted LLM's endpoint
1. Search for the Lambda service in the AWS console search bar and click on the Lambda service
2. Click on Create function
3. Enter a function name (the exact name doesn't matter), choose Python 3.10 as the runtime and x86_64 as the architecture. Then click on Create function
4. Enter the LLM model’s endpoint name from Step 0 as an environment variable
i. Click on the Configuration tab of your newly created function
ii. Click on Environment variables, then click on Edit
iii. Click on Add environment variable on the next screen
iv. Enter ENDPOINT_NAME as the key and your model's endpoint name as the value. Click Save
You can use any key you wish, but it will need to match what we reference in the Lambda code below
5. Go back to the Code tab and copy and paste the following code there
import os
import boto3
import json

# Grab the endpoint name that was set as an environment variable in step 4
ENDPOINT_NAME = os.environ['ENDPOINT_NAME']

# Client for invoking SageMaker endpoints
runtime = boto3.client('runtime.sagemaker')

def lambda_handler(event, context):
    # Forward the request body straight to the hosted BLOOM endpoint
    response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                       ContentType='application/json',
                                       Body=event['body'])
    # Decode the endpoint's JSON response and return it to the caller
    result = json.loads(response['Body'].read().decode())
    return {
        "statusCode": 200,
        "body": json.dumps(result)
    }
6. Click Deploy after pasting the code
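Before moving on to API Gateway, you can sanity check the function from the Test tab in the Lambda console. Because the handler reads event['body'] as a string, the test event needs the request JSON serialized into a body field; the prompt below is just an example:

{
  "body": "{\"text_inputs\": \"My ice cream shop's marketing tag line should be\", \"max_new_tokens\": 100}"
}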
Step 2: Connect your new Lambda function to AWS API Gateway
1. Go to your Lambda function's home screen and click Add trigger
2. Select the API Gateway menu item in the Add trigger dialog
3. Fill out the API Gateway dialog and click on Add. The simplest setup for testing is to create a new API with Open security; just note that anyone with the URL can then call your endpoint
4. After the API endpoint has been created, you can view the API URL under the Configuration tab, in the Triggers sidebar
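If you chose an HTTP API in the trigger dialog, you can also recover the base invoke URL programmatically. A minimal sketch with boto3, assuming an HTTP API rather than a REST API:

import boto3

# List HTTP APIs in the account; ApiEndpoint is the base invoke URL
apigw = boto3.client('apigatewayv2')
for api in apigw.get_apis()['Items']:
    print(api['Name'], api['ApiEndpoint'])

The full URL is the ApiEndpoint plus the stage and route, e.g. /default/<your-function-name>.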
Step 3: Test your brand spanking new LLM API
1. Make a POST request to your API URL with the following JSON body. The text_inputs key stores the prompt you want to use; the remaining keys are standard text generation parameters (minimum output length, maximum new tokens, beam search width, n-gram repetition blocking, and sampling temperature)
{
  "text_inputs": "My ice cream shop's marketing tag line should be",
  "min_length": 5,
  "max_new_tokens": 100,
  "num_beams": 5,
  "no_repeat_ngram_size": 2,
  "temperature": 0.8
}
2. Check the response status code and the response JSON. The status code should be 200 and the response JSON will look like the following
{
  "generated_texts": ["something along the lines of 'Ice cream is the best thing in the world.' "]
}
You can use the following Python code to test the API. Replace the value of api_url with the API URL that you created in the last step
import requests

# Replace this with the API URL from Step 2
api_url = 'https://spip03jtgd.execute-api.us-east-1.amazonaws.com/default/call-bloom-llm'
json_body = {
    'text_inputs': "My ice cream shop's marketing tag line should be",
    'min_length': 5,
    'max_new_tokens': 100,
    'num_beams': 5,
    'no_repeat_ngram_size': 2,
    'temperature': 0.8
}

r = requests.post(api_url, json=json_body)
if r.status_code == 200:
    result = r.json()['generated_texts']
    print(result)
else:
    # Print the status code and body to help diagnose the error
    print(r.status_code, r.text)
Potential Errors
You might run into a few errors with this setup:
- Permissions: if your Lambda function's execution role does not have permission to call the SageMaker InvokeEndpoint action, you will not be able to call the endpoint; an example policy is sketched after this list. AWS permissions can get confusing, so schedule a call with us if you would like us to walk you through them
- Timeout: depending on your prompt and generation parameters, you may receive a timeout error, since Lambda's default timeout is only 3 seconds. Unlike permissions, this is an easy fix: click Configuration, then General configuration, click Edit, and raise the timeout value
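For the permissions error, a minimal IAM policy that grants the invoke permission is sketched below; you may want to scope Resource down to your endpoint's ARN instead of the wildcard:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sagemaker:InvokeEndpoint",
      "Resource": "*"
    }
  ]
}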
Conclusion
This post concludes our series on how to host and use your own LLM as an API.
There are many reasons to consider using your own hosted open-source LLM as an API, such as:
- Security
- Reliability
- Consistency
If you would like to learn more about using your own hosted LLM as an API, schedule a call with us on our website