AI Assistants via OpenAI and Hugging Face API
quick hands-on tutorial
Introduction
In this guide, we’ll explore the Assistant APIs from OpenAI. We will learn about the primary features of the Assistants API, including the Code Interpreter, Knowledge Retrieval, and Function Calling capabilities.
This hand-on will show how to equip Assistant’s with tools and function that would enable it to provide technical solutions by executing Python code, retrieve knowledge from the database and many more.
You’ll be also introduced to other advanced technologies from OpenAI, such as Whisper, Dalle-3, Speech to Text, and the GPT-4 vision API. These tools are essential for anyone looking to develop sophisticated AI assistants using a variety of APIs.
Then, you’ll learn how to use the free Hugging Face Inference API to get access to the thousands of models hosted on their platform.
By the end of this tutorial, you’ll gain a solid understanding on how to apply these technologies in your AI projects.
Open AI Assistant’s Built-in Functionalities
The OpenAI Assistants API includes three main functionalities: Code Interpreter, Retrieval, and Function Calling.
[Code Interpreter]
The Assistant can use Code Interpreter automatically when you upload a file with data. It’s a tool that transforms the LLM into a more accurate computational problem-solver that can handle tasks like solving complex math equations. It can also generate files with data and images of graphs from the same Python code. It’s a useful way to trust the output from the assistant and a great tool when analyzing data.
[Knowledge Retrieval]
Knowledge Retrieval is OpenAI’s own RAG system offered as part of the Assistants API. KR allows multiple uploads. Once the files are uploaded and passed to the Assistant, OpenAI will automatically chunk your documents, index them, store the embeddings, and implement vector search to retrieve relevant content to answer user queries.
[Function Calling]
Function calling allows you to describe functions or tools to the Assistant and have it return the functions that need to be called along with their arguments. It's a powerful way to add new capabilities to your Assistant.
How To Set Up an Assistant
You have two distinct pathways depending on your needs and expertise:
- Assistants Playground: Ideal for those looking to get a feel for the Assistant’s capabilities without going into complex integrations.
- Detailed Integration through the API: Best suited for those who require a more customized and in-depth setup.
STEP-BY-STEP ASSISTANT CREATION:
Creating an Assistant
:
Purpose: An Assistant object represents an entity/agent that can be configured to respond to users’ messages in different ways using several parameters.
Model Selection: you can specify any version of GPT-3.5 or GPT-4 models, including fine-tuned models. OpenAI recommends using its latest models with the Assistants API for best results and maximum compatibility with tools. Thus, choose between gpt-3.5-turbo-1106
or gpt-4-1106-preview
models.
Tools: The Assistant supports the Code Interpreter for technical queries that require Python code execution or Knowledge Retrieval to augment the Assistant with proprietary external information.
Setting up a Thread
:
Role: A Thread acts as the foundational unit of user interaction. It can be seen as a single conversation. Pass any user-specific context and files in this thread by creating Messages.
thread = client.beta.threads.create()
Customization: In Thread, ingest user-specific contexts or attach necessary files so each conversation is unique and personalized.
Threads don’t have a size limit. You can add as many messages as you want to a conversation/Thread. The Assistant will ensure that requests to the model fit within the maximum context window, using relevant optimization techniques used in ChatGPT, such as truncation.
Adding a Message
:
Definition: Messages are user inputs, and the Assistant’s answers are appended to a Thread. User inputs can be questions or commands.
Function: They serve as the primary mode of communication between the user and the Assistant.
message = client.beta.threads.messages.create(
thread_id=thread.id,
role="user",
content="I need to solve the equation `3x + 11 = 14`. Please, help!"
)
Messages can include text, images, and other files. Messages are stored as a list on the Thread. Using GPT-4 with Vision is not supported here. You can upload images and have them processed via retrieval.
Activation: For the Assistant to respond to the user message, you must create a Run. The Assistant will then automatically decide what previous Messages to include in the context window for the model.
⚠️ NOTE: You can optionally pass additional instructions to the Assistant while creating the Run, but these will override the default instructions of the Assistant!
Process: The Assistant processes the entire Thread, employs its tools if required, and formulates an appropriate response.
During its run, the Assistant can call tools or create Messages. Examining Run Steps allows you to check how the Assistant is getting to its final results.
The assistant’s response to a Run:
messages = client.beta.threads.messages.list(thread_id=thread.id)
These responses are displayed to the user! During this Run, the Assistant added two new Messages to the Thread.
ASSISTANT’S CORE MECHANISM:
Creating an Assistant only requires specifying the model
.
You can further customize the behavior of the Assistant:
- Use the
instructions
parameter to guide the personality of the Assistant and define its goals. Instructions are similar to system messages in the Chat Completions API. - Use the
tools
parameter to give the Assistant access to up to 128 tools in parallel. You can give it access to OpenAI-hosted tools (Conde Interpreter, Knowledge Retrieval) or call third-party tools viafunction calling
. - Use the
file_ids
parameter to give the tools access to files. Files are uploaded using theFile
Upload endpoint.
Example demonstration:
Imagine you’re developing an AI assistant for a tech company. This assistant needs to provide detailed product support using a comprehensive knowledge base.
mkdir openai-assistants && cd openai-assistants
python3 -m venv openai-assistants-env
source openai-assistants-env/bin/activate
pip3 install python-dotenv
pip3 install --upgrade openai
# fire up VSCode and let's get rolling!
code .
Replace the text with your OpenAI API key, which you can get from your OpenAI developer account.
OPENAI_API_KEY="sh-xxx"
$ pip install -U -q openai
Upload Files to a Knowledge Base:
First, make a folder to store all the files you’ll create. Upload a detailed PDF manual of a product line (e.g., “tech_manual.pdf”) using the API:
from openai import OpenAI
client = OpenAI()
file = client.beta.files.upload(
file=open("tech_manual.pdf", "rb"),
filetype="application/pdf",
description="Tech product manual"
)
Now you can create the assistant with an uploaded file and with the ability to retrieve: tools=[{"type": "retrieval"}]
assistant = client.beta.assistants.create(
instructions="You are a tech support chatbot. Use the product manual to respond accurately to customer inquiries.",
model="gpt-4-1106-preview",
tools=[{"type": "retrieval"}],
file_ids=[file.id]
)
User Interaction: To interact with the assistant, you need a thread
and a message:
The message should contain the customer's question. Here's an example:
thread = client.beta.threads.create()
message = client.beta.threads.messages.create(
thread_id=thread.id,
role="user",
content="How do I reset my Model X device?",
)
RUN Thread:
- A customer asks, “How do I reset my Model X device?”
- The assistant accesses the uploaded manual, performs a vector search to find the relevant section, and provides clear, step-by-step reset instructions.
run = client.beta.threads.runs.create(
thread_id=thread.id,
assistant_id=assistant.id,
)
# the run will enter the **queued** state before it continues it's execution.
Information retrieval:
After the run is complete, you can retrieve the assistant’s response:
messages = client.beta.threads.messages.list(
thread_id=thread.id
)
assistant_response = messages.data[0].content[0].text.value
The output result should contain the assistant’s response to the customer’s question based on knowledge from the uploaded manual.
The full code and more examples are in the Colab notebook attached in the Resources section.
OpenAI’s Other Advanced Models
OpenAI also offers different types of models that are not yet integrated into the Assistants API but are accessible. These models offer voice processing, image understanding, and image generation capabilities.
Whisper-v3
Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. It is a transformer-based encoder-decoder model, which is a type of sequence-to-sequence model. The latest large-v3
model shows improved performance over various languages compared to Whisper large-v2
. OpenAI released the model’s weights with an Apache License 2.0. The model is available on Hugging Face.
Text to Speech
TTS is an AI model that converts text to natural-sounding spoken text. They offer two different model variates: tts-1
is optimized for real-time text-to-speech use cases, and tts-1-hd
is optimized for quality. These models can be used with the Speech endpoint in the Audio API.
Dall-E 3
A newer iteration of the DALL-E model is designed for image generation. It can create images based on user prompts, making it a valuable tool for graphic designers, artists, and anyone to generate images quickly and efficiently. You can access the model through the image generation endpoint.
GPT-4 Vision
GPT-4 with Vision enables you to ask questions about the contents of images. Visual question answering (VQA) is an important computer vision research field. You can also perform other vision tasks, such as Optical Character Recognition (OCR), where a model reads text in an image.
Using GPT-4 with Vision, you can ask questions about what is or is not in an image, how objects relate in an image, the spatial relationships between two objects (is one object to the left or right of another), the color of an object, and more.
GPT-4V is available through the OpenAI web interface for ChatGPT Plus subscribers and through their API. This expands the model’s utility beyond the traditional text-only inputs, enabling it to be applied in a wider range of contexts. It handles images through the Chat Completions API, but note that the Assistants API does not support GPT-4V at this time.
GPT4-V supports advanced use cases like creating image captions, in-depth analysis of visual content, and interpreting text and graphics in documents.
Hugging Face Inference API
Hugging Face (HF) offers a free service for testing and evaluating over 150,000 publicly available machine learning models hosted on their platform through their Inference API.
HF provides a wide range of models, including transformer and diffusion-based models, that can help solve various NLP or vision tasks such as text classification, sentiment analysis, named entity recognition, etc.
💡 Note that these free Inference APIs are rate-limited and not meant for production use. You can check out their Inference Endpoint service if you want good performance.
Steps to use the Inference API
- Login to Hugging Face.
- Navigate to your profile on the top right navigation bar, then click “Edit profile.”
- Click on the “Access Tokens” menu item.
- Set the HF HUB API token:
export HUGGINGFACEHUB_API_TOKEN=your-token
Use the HUGGINGFACEHUB_API_TOKEN
as an environment variable
import os
from huggingface_hub import HfApi
hf_api = HfApi(token=os.getenv("HUGGINGFACEHUB_API_TOKEN"))
Run the Inference API
Inference is the process of using a trained model to predict new data. The huggingface_hub
library provides an easy way to call a service that runs inference for hosted models. As described above, you have two types of services available.
- Inference API: run accelerated inference on Hugging Face’s infrastructure for free.
- Inference Endpoints: easily deploy models to production (paid)
Choose a model from the Model Hub
The model checkpoints are stored in the Model Hub; you can search and share them. Note that not all models are available on the Inference API. Once the endpoint has been created, you should see a URL endpoint of it like the following:
ENDPOINT = <https://api-inference.huggingface.co/models/><MODEL_ID>
Run the inference:
import requests
API_URL = "<https://api-inference.huggingface.co/models/><MODEL_ID>"
headers = {"Authorization": f"Bearer {API_TOKEN}"}
def query(payload):
response = requests.post(API_URL, headers=headers, json=payload)
return response.json()
data = query("Can you please let us know more")
Hugging Face Tasks
The team at Hugging Face has categorized several models into the different tasks they can solve. You can find models for popular NLP tasks: Question Answering, Sentence Similarity, Summarization, Table Question Answering, and more.
Here is another example of using the Inference API for a summarization task:
API_TOKEN = 'your_api_token_here'
model_name = 'facebook/bart-large-cnn'
text_to_summarize = "Hugging Face's API simplifies accessing powerful NLP models for tasks like summarization, transforming verbose texts into concise, insightful summaries."
endpoint = f'<https://api-inference.huggingface.co/models/{model_name}>'
headers = {'Authorization': f'Bearer {API_TOKEN}'}
data = {'inputs': text_to_summarize}
response = requests.post(endpoint, headers=headers, json=data)
summarized_text = response.json()[0]['summary_text']
print(summarized_text)
The pre-trained model used above is[facebook/bart-large-cnn](<https://huggingface.co/facebook/bart-large-cnn>) trained by Meta
demonstrates the ability to produce clear and concise summaries.
Note: Not all models are available in this Inference API. Verify if the model is available by reviewing its ‘Model card’.
Sentiment analysis task:
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "<https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english>"def query(payload):
response = requests.post(API_URL, headers=headers, json=payload)
return response.json()data = query({"inputs": "I love how this app simplifies complex tasks effortlessly . I'm frustrated by the frequent errors in the software's latest update"})
print(data)
Text-to-image task:
model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")prompt = "Create an image of a futuristic cityscape on an alien planet, featuring towering skyscrapers with glowing neon lights, a sky filled with multiple moons, and inhabitants of various alien species walking through vibrant market streets"
image = pipe(prompt).images[0]image.save("generated-image.png")
Image generated with stable diffusion is saved to your environment!
You can also encode a sentence and get text embeddings.
from sentence_transformers import SentenceTransformer
sentences = ["GAIA's questions are rooted in practical use cases, requiring AI systems to interact with a diverse and uncertain world, reflecting real-world applications.", " GAIA questions require accurate execution of complex sequences of actions, akin to the Proof of Work concept, where the solution is simple to verify but challenging to generate."]
model = SentenceTransformer('Equall/english-beta-0.3', use_auth_token=API_TOKEN)
embeddings = model.encode(sentences)
print(embeddings)
[[ 0.76227915 -0.5500489 -1.5719271 … -0.34034422 -0.27251056 0.12204967] [ 0.29783687 0.6476462 -2.0379746 … -0.28033397 -1.3997376 0.25214267]]
You can also experiment with image-captioning models:
from transformers import pipeline
image_to_text = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")image_to_text("<https://ankur3107.github.io/assets/images/image-captioning-example.png>")# [{'generated_text': 'a soccer game with a player jumping to catch the ball '}]
And also perform experiments with classification and image-to-text models, pre-trained on ImageNet:
from transformers import ViTImageProcessor, ViTForImageClassification
from PIL import Image
import requests
url = '<http://images.cocodataset.org/val2017/000000039769.jpg>'
image = Image.open(requests.get(url, stream=True).raw)processor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224')
model = ViTForImageClassification.from_pretrained('google/vit-base-patch16-224')inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits
# model predicts one of the 1000 ImageNet classes
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
preprocessor_config.json: 100%
160/160 [00:00<00:00, 10.5kB/s]
config.json: 100%
69.7k/69.7k [00:00<00:00, 3.60MB/s]
model.safetensors: 100%
346M/346M [00:02<00:00, 162MB/s]
the output calculated give us the prediction of an image provided:
Predicted class: Egyptian cat
Now we’ll scrape a web page using Rapid API to get the articles and summarize them with a huggingface model using HF model inference API.
# Function to fetch text from the API
def fetch_text_from_api():
url = "<https://lexper.p.rapidapi.com/v1.1/extract>"
querystring = {
"url": "<https://techcrunch.com/2023/11/25/neuralink-elon-musks-brain-implant-startup-quietly-raises-an-additional-43m/>",
"js_timeout": "30",
"media": "true"
}
headers = {
"X-RapidAPI-Key": "xxx",
"X-RapidAPI-Host": "lexper.p.rapidapi.com"
}
response = requests.get(url, headers=headers, params=querystring)
data = response.json()
# Extract the relevant text from the API response
# Adjust the following line according to the structure of your API response
return data.get('article', {}).get('text', '')# Function to summarize the text using Hugging Face API
def query_huggingface(payload):
API_URL = "<https://api-inference.huggingface.co/models/facebook/bart-large-cnn>"
headers = {"Authorization": f"Bearer {API_TOKEN}"}
response = requests.post(API_URL, headers=headers, json=payload)
return response.json()# Fetch the text
text_to_summarize = fetch_text_from_api()# Summarize the text
summarization_payload = {
"inputs": text_to_summarize,
"parameters": {"do_sample": False},
}summary_response = query_huggingface(summarization_payload)
print(summary_response)
[{‘summary_text’: ‘Elon Musk-founded company raises $43 million in new venture capital. The company is developing implantable chips that can read brain waves. Critics say the company has a toxic workplace culture and unethical research practices. In June, Reuters reported that the company was valued at about $5 billion.’}]
Conclusion
You have learned how to use the OpenAI Assistants API, and its essential components like Threads
and Messages
. On a concrete example its shown how the AI assistant can be deployed as a tech support, empowering customer interaction.
In Hugging Face’s free Inference API section, through practical implementations, you’ve seen how to authenticate, access models via the Model Hub, and perform various NLP tasks.
Hope this quick walkthrough was helpuful and gave you additonal perspective on Assistants construct, especially the ones you can construct via API — .
Cheers!
RESOURCES
the Google colab notebook:
Note: This article is previously published on Notion, you can read the version written as the lesson for Towards-AI courses on RAG SYSTEMS for Activeloop.