Image created by Lexica AI

AI Assistants via OpenAI and Hugging Face API

quick hands-on tutorial

Iva @ Tesla Institute
Published in Artificialis · Jan 10, 2024

Introduction

In this guide, we’ll explore OpenAI’s Assistants API. We will learn about its primary features, including the Code Interpreter, Knowledge Retrieval, and Function Calling capabilities.

This hands-on will show how to equip an Assistant with tools and functions that enable it to provide technical solutions by executing Python code, retrieve knowledge from uploaded documents, and more.

You’ll also be introduced to other advanced technologies from OpenAI, such as Whisper, DALL-E 3, Text to Speech, and the GPT-4 Vision API. These tools are essential for anyone looking to develop sophisticated AI assistants using a variety of APIs.

Then, you’ll learn how to use the free Hugging Face Inference API to get access to the thousands of models hosted on their platform.

By the end of this tutorial, you’ll have a solid understanding of how to apply these technologies in your AI projects.

OpenAI Assistants’ Built-in Functionalities

The OpenAI Assistants API includes three main functionalities: Code Interpreter, Retrieval, and Function Calling.

[Code Interpreter]

The Assistant can use the Code Interpreter automatically when you upload a file with data. It’s a tool that turns the LLM into a more accurate computational problem-solver that can handle tasks like solving complex math equations. It can also generate data files and images of graphs from the same Python code. Because the results come from executed code, the output is easier to trust, which makes it a great tool for analyzing data.

[Knowledge Retrieval]

Knowledge Retrieval is OpenAI’s own RAG system offered as part of the Assistants API. It allows multiple file uploads. Once the files are uploaded and passed to the Assistant, OpenAI will automatically chunk your documents, index them, store the embeddings, and use vector search to retrieve relevant content to answer user queries.

[Function Calling]

Function calling allows you to describe functions or tools to the Assistant and have it return the functions that need to be called along with their arguments. It's a powerful way to add new capabilities to your Assistant.
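For example, here is a minimal sketch of how a function tool is declared when creating an Assistant (assuming an OpenAI client object; the get_order_status function and its schema are hypothetical). When a Run needs the tool, it pauses in a requires_action state until you execute the function yourself and submit the result back.

assistant = client.beta.assistants.create(
    instructions="You are a support bot. Use the provided functions to answer questions.",
    model="gpt-4-1106-preview",
    tools=[{
        "type": "function",
        "function": {
            "name": "get_order_status",  # hypothetical function in your own backend
            "description": "Look up the current status of a customer order",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string", "description": "The order identifier"}
                },
                "required": ["order_id"]
            }
        }
    }]
)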

How To Set Up an Assistant

You have two distinct pathways depending on your needs and expertise:

STEP-BY-STEP ASSISTANT CREATION:

Creating an Assistant:

Purpose: An Assistant object represents an entity/agent that can be configured to respond to users’ messages in different ways using several parameters.

Model Selection: you can specify any version of the GPT-3.5 or GPT-4 models, including fine-tuned models. OpenAI recommends using its latest models with the Assistants API for best results and maximum compatibility with tools. Thus, choose between the gpt-3.5-turbo-1106 and gpt-4-1106-preview models.

Tools: The Assistant supports the Code Interpreter for technical queries that require Python code execution or Knowledge Retrieval to augment the Assistant with proprietary external information.
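As a minimal sketch, an Assistant with the Code Interpreter enabled can be created like this (the name and instructions are illustrative):

from openai import OpenAI

client = OpenAI()
assistant = client.beta.assistants.create(
    name="Math Tutor",  # illustrative name
    instructions="You are a personal math tutor. Write and run code to answer math questions.",
    tools=[{"type": "code_interpreter"}],
    model="gpt-4-1106-preview",
)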

Setting up a Thread:

Role: A Thread acts as the foundational unit of user interaction. It can be seen as a single conversation. Pass any user-specific context and files in this thread by creating Messages.

thread = client.beta.threads.create()

Customization: Within a Thread, ingest user-specific context or attach necessary files so each conversation is unique and personalized.

Threads don’t have a size limit. You can add as many messages as you want to a conversation/Thread. The Assistant will ensure that requests to the model fit within the maximum context window, using optimization techniques also used in ChatGPT, such as truncation.

Adding a Message:

Definition: Messages are the user inputs and the Assistant’s answers, both appended to a Thread. User inputs can be questions or commands.

Function: They serve as the primary mode of communication between the user and the Assistant.

message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="I need to solve the equation `3x + 11 = 14`. Please, help!"
)

Messages can include text, images, and other files, and they are stored as a list on the Thread. Note that using GPT-4 with Vision is not supported here; you can upload images and have them processed via retrieval.

Activation: For the Assistant to respond to the user message, you must create a Run. The Assistant will then automatically decide what previous Messages to include in the context window for the model.

⚠️ NOTE: You can optionally pass additional instructions to the Assistant while creating the Run, but these will override the default instructions of the Assistant!

Process: The Assistant processes the entire Thread, employs its tools if required, and formulates an appropriate response.

During its run, the Assistant can call tools or create Messages. Examining Run Steps allows you to check how the Assistant is getting to its final results.
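As a sketch, continuing with the thread created above and the assistant’s id, a Run is created and its steps can then be inspected:

run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

# once the Run has progressed, list its individual steps
run_steps = client.beta.threads.runs.steps.list(
    thread_id=thread.id,
    run_id=run.id,
)
for step in run_steps.data:
    print(step.type, step.status)  # e.g. tool_calls or message_creation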

To read the Assistant’s responses after a Run, list the Messages on the Thread:

messages = client.beta.threads.messages.list(thread_id=thread.id)

These responses can then be displayed to the user. During this Run, the Assistant added two new Messages to the Thread.

ASSISTANT’S CORE MECHANISM:

Creating an Assistant only requires specifying the model.

You can further customize the behavior of the Assistant:

  1. Use the instructions parameter to guide the personality of the Assistant and define its goals. Instructions are similar to system messages in the Chat Completions API.
  2. Use the tools parameter to give the Assistant access to up to 128 tools in parallel. You can give it access to OpenAI-hosted tools (Code Interpreter, Knowledge Retrieval) or call third-party tools via function calling.
  3. Use the file_ids parameter to give the tools access to files. Files are uploaded using the File Upload endpoint.

Example demonstration:

Imagine you’re developing an AI assistant for a tech company. This assistant needs to provide detailed product support using a comprehensive knowledge base.

mkdir openai-assistants && cd openai-assistants
python3 -m venv openai-assistants-env
source openai-assistants-env/bin/activate
pip3 install python-dotenv
pip3 install --upgrade openai
# fire up VSCode and let's get rolling!
code .

Create a `.env` file and replace the placeholder with your OpenAI API key, which you can get from your OpenAI developer account:

OPENAI_API_KEY="sk-xxx"
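A minimal sketch of loading that key with python-dotenv (installed above) and creating the client; the `.env` file name is the library’s default:

import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()      # reads OPENAI_API_KEY from the .env file into the environment
client = OpenAI()  # the client picks the key up from OPENAI_API_KEY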

Upload Files to a Knowledge Base:

First, make a folder to store all the files you’ll create. Upload a detailed PDF manual of a product line (e.g., “tech_manual.pdf”) using the API:

from openai import OpenAI

client = OpenAI()
file = client.files.create(
    file=open("tech_manual.pdf", "rb"),
    purpose="assistants"
)

Now you can create the assistant with the uploaded file and enable retrieval with tools=[{"type": "retrieval"}]:

assistant = client.beta.assistants.create(
    instructions="You are a tech support chatbot. Use the product manual to respond accurately to customer inquiries.",
    model="gpt-4-1106-preview",
    tools=[{"type": "retrieval"}],
    file_ids=[file.id]
)

User Interaction: To interact with the assistant, you need a thread and a message:

The message should contain the customer's question. Here's an example:

thread = client.beta.threads.create()
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="How do I reset my Model X device?",
)

RUN Thread:

  • A customer asks, “How do I reset my Model X device?”
  • The assistant accesses the uploaded manual, performs a vector search to find the relevant section, and provides clear, step-by-step reset instructions.
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)
# the run enters the queued state before its execution continues
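Runs are asynchronous, so before reading the answer you need to wait until the Run finishes. Here is a minimal polling sketch (newer SDK versions may also offer streaming or helper methods for this):

import time

while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id,
    )
print(run.status)  # expect "completed" if everything went well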

Information retrieval:

After the run is complete, you can retrieve the assistant’s response:

messages = client.beta.threads.messages.list(
    thread_id=thread.id
)
assistant_response = messages.data[0].content[0].text.value

The output result should contain the assistant’s response to the customer’s question based on knowledge from the uploaded manual.

The full code and more examples are in the Colab notebook attached in the Resources section.

OpenAI’s Other Advanced Models

OpenAI also offers different types of models that are not yet integrated into the Assistants API but are accessible. These models offer voice processing, image understanding, and image generation capabilities.

Whisper-v3

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. It is a transformer-based encoder-decoder model, which is a type of sequence-to-sequence model. The latest large-v3 model shows improved performance over various languages compared to Whisper large-v2. OpenAI released the model’s weights with an Apache License 2.0. The model is available on Hugging Face.
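Whisper is also exposed through OpenAI’s hosted Audio API. A minimal transcription sketch (the audio file name is illustrative):

from openai import OpenAI

client = OpenAI()
with open("support-call.mp3", "rb") as audio_file:  # hypothetical recording
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print(transcript.text)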

Text to Speech

TTS is an AI model that converts text into natural-sounding spoken audio. OpenAI offers two model variants: tts-1 is optimized for real-time text-to-speech use cases, and tts-1-hd is optimized for quality. These models can be used with the Speech endpoint in the Audio API.
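A minimal sketch of the Speech endpoint, reusing the client created earlier (the voice and output path are illustrative):

speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",  # one of the built-in voices
    input="Hello! Your Model X device has been reset successfully.",
)
speech.stream_to_file("reset-confirmation.mp3")  # write the generated audio to disk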

Dall-E 3

A newer iteration of the DALL-E model designed for image generation. It can create images based on user prompts, making it a valuable tool for graphic designers, artists, and anyone who wants to generate images quickly and efficiently. You can access the model through the image generation endpoint.
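A short sketch of the image generation endpoint (the prompt and size are illustrative):

result = client.images.generate(
    model="dall-e-3",
    prompt="An isometric illustration of a friendly tech support robot reading a product manual",
    size="1024x1024",
    n=1,
)
print(result.data[0].url)  # URL of the generated image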

GPT-4 Vision

GPT-4 with Vision enables you to ask questions about the contents of images. Visual question answering (VQA) is an important computer vision research field. You can also perform other vision tasks, such as Optical Character Recognition (OCR), where a model reads text in an image.

Using GPT-4 with Vision, you can ask questions about what is or is not in an image, how objects relate in an image, the spatial relationships between two objects (is one object to the left or right of another), the color of an object, and more.

GPT-4V is available through the OpenAI web interface for ChatGPT Plus subscribers and through their API. This expands the model’s utility beyond the traditional text-only inputs, enabling it to be applied in a wider range of contexts. It handles images through the Chat Completions API, but note that the Assistants API does not support GPT-4V at this time.

GPT-4V supports advanced use cases like creating image captions, in-depth analysis of visual content, and interpreting text and graphics in documents.
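A short sketch of sending an image to GPT-4 with Vision through the Chat Completions API (the image URL is a placeholder):

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/device-photo.jpg"}},  # placeholder URL
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)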

Hugging Face Inference API

Hugging Face (HF) offers a free service for testing and evaluating over 150,000 publicly available machine learning models hosted on their platform through their Inference API.

HF provides a wide range of models, including transformer and diffusion-based models, that can help solve various NLP or vision tasks such as text classification, sentiment analysis, named entity recognition, etc.

💡 Note that these free Inference APIs are rate-limited and not meant for production use. Check out their Inference Endpoints service if you need production-grade performance.

Steps to use the Inference API

  1. Login to Hugging Face.
  2. Navigate to your profile on the top right navigation bar, then click “Edit profile.”
  3. Click on the “Access Tokens” menu item.
  4. Set the HF HUB API token:
export HUGGINGFACEHUB_API_TOKEN=your-token

Use the HUGGINGFACEHUB_API_TOKEN as an environment variable:

import os
from huggingface_hub import HfApi

hf_api = HfApi(token=os.getenv("HUGGINGFACEHUB_API_TOKEN"))

Run the Inference API

Inference is the process of using a trained model to predict new data. The huggingface_hub library provides an easy way to call a service that runs inference for hosted models. As described above, you have two types of services available.
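For example, a minimal sketch using the InferenceClient from recent versions of huggingface_hub (the model choice is illustrative and the variable name is arbitrary):

import os

from huggingface_hub import InferenceClient

hf_client = InferenceClient(token=os.getenv("HUGGINGFACEHUB_API_TOKEN"))
summary = hf_client.summarization(
    "Hugging Face's Inference API lets you call hosted models over HTTP without downloading them.",
    model="facebook/bart-large-cnn",
)
print(summary)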

Choose a model from the Model Hub

The model checkpoints are stored in the Model Hub, where you can search for and share them. Note that not all models are available on the Inference API. The Inference API URL for a given model looks like the following:

ENDPOINT = https://api-inference.huggingface.co/models/<MODEL_ID>

Run the inference:

import requests

API_URL = "https://api-inference.huggingface.co/models/<MODEL_ID>"
headers = {"Authorization": f"Bearer {API_TOKEN}"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

data = query({"inputs": "Can you please let us know more"})

Hugging Face Tasks

The team at Hugging Face has categorized several models into the different tasks they can solve. You can find models for popular NLP tasks: Question Answering, Sentence Similarity, Summarization, Table Question Answering, and more.

Here is another example of using the Inference API for a summarization task:

import requests

API_TOKEN = 'your_api_token_here'
model_name = 'facebook/bart-large-cnn'
text_to_summarize = "Hugging Face's API simplifies accessing powerful NLP models for tasks like summarization, transforming verbose texts into concise, insightful summaries."
endpoint = f'https://api-inference.huggingface.co/models/{model_name}'
headers = {'Authorization': f'Bearer {API_TOKEN}'}
data = {'inputs': text_to_summarize}
response = requests.post(endpoint, headers=headers, json=data)
summarized_text = response.json()[0]['summary_text']
print(summarized_text)

The pre-trained model used above, [facebook/bart-large-cnn](https://huggingface.co/facebook/bart-large-cnn), was trained by Meta and demonstrates the ability to produce clear and concise summaries.

Note: Not all models are available in this Inference API. Verify if the model is available by reviewing its ‘Model card’.

Sentiment analysis task:

headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "<https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english>"
def query(payload):
response = requests.post(API_URL, headers=headers, json=payload)
return response.json()
data = query({"inputs": "I love how this app simplifies complex tasks effortlessly . I'm frustrated by the frequent errors in the software's latest update"})
print(data)

Text-to-image task:

model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
prompt = "Create an image of a futuristic cityscape on an alien planet, featuring towering skyscrapers with glowing neon lights, a sky filled with multiple moons, and inhabitants of various alien species walking through vibrant market streets"
image = pipe(prompt).images[0]
image.save("generated-image.png")

The image generated with Stable Diffusion is saved to your environment!

You can also encode a sentence and get text embeddings.

from sentence_transformers import SentenceTransformer
sentences = ["GAIA's questions are rooted in practical use cases, requiring AI systems to interact with a diverse and uncertain world, reflecting real-world applications.", " GAIA questions require accurate execution of complex sequences of actions, akin to the Proof of Work concept, where the solution is simple to verify but challenging to generate."]
model = SentenceTransformer('Equall/english-beta-0.3', use_auth_token=API_TOKEN)
embeddings = model.encode(sentences)
print(embeddings)

[[ 0.76227915 -0.5500489 -1.5719271 … -0.34034422 -0.27251056 0.12204967] [ 0.29783687 0.6476462 -2.0379746 … -0.28033397 -1.3997376 0.25214267]]
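Since embeddings are mainly useful for comparison, a small follow-up sketch computes the cosine similarity between the two sentences using the util helpers bundled with sentence-transformers:

from sentence_transformers import util

similarity = util.cos_sim(embeddings[0], embeddings[1])
print(f"Cosine similarity: {similarity.item():.3f}")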

You can also experiment with image-captioning models:

from transformers import pipeline

image_to_text = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
image_to_text("https://ankur3107.github.io/assets/images/image-captioning-example.png")
# [{'generated_text': 'a soccer game with a player jumping to catch the ball '}]

You can also experiment with image classification models pre-trained on ImageNet:

from transformers import ViTImageProcessor, ViTForImageClassification
from PIL import Image
import requests
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)
processor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224')
model = ViTForImageClassification.from_pretrained('google/vit-base-patch16-224')
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits
# model predicts one of the 1000 ImageNet classes
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])


The output gives us the model’s prediction for the provided image:

Predicted class: Egyptian cat

Now we’ll scrape a web page using RapidAPI to get an article and summarize it with a Hugging Face model via the Inference API.

import requests

# Function to fetch text from the API
def fetch_text_from_api():
    url = "https://lexper.p.rapidapi.com/v1.1/extract"
    querystring = {
        "url": "https://techcrunch.com/2023/11/25/neuralink-elon-musks-brain-implant-startup-quietly-raises-an-additional-43m/",
        "js_timeout": "30",
        "media": "true"
    }
    headers = {
        "X-RapidAPI-Key": "xxx",
        "X-RapidAPI-Host": "lexper.p.rapidapi.com"
    }
    response = requests.get(url, headers=headers, params=querystring)
    data = response.json()
    # Extract the relevant text from the API response
    # Adjust the following line according to the structure of your API response
    return data.get('article', {}).get('text', '')

# Function to summarize the text using Hugging Face API
def query_huggingface(payload):
    API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-cnn"
    headers = {"Authorization": f"Bearer {API_TOKEN}"}
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

# Fetch the text
text_to_summarize = fetch_text_from_api()

# Summarize the text
summarization_payload = {
    "inputs": text_to_summarize,
    "parameters": {"do_sample": False},
}
summary_response = query_huggingface(summarization_payload)
print(summary_response)

[{‘summary_text’: ‘Elon Musk-founded company raises $43 million in new venture capital. The company is developing implantable chips that can read brain waves. Critics say the company has a toxic workplace culture and unethical research practices. In June, Reuters reported that the company was valued at about $5 billion.’}]

Conclusion

You have learned how to use the OpenAI Assistants API and its essential components, such as Threads and Messages. A concrete example showed how an AI assistant can be deployed for tech support, empowering customer interaction.

In Hugging Face’s free Inference API section, through practical implementations, you’ve seen how to authenticate, access models via the Model Hub, and perform various NLP tasks.

Hope this quick walkthrough was helpful and gave you an additional perspective on Assistants, especially the ones you can construct via the API.

Cheers!

RESOURCES
The Google Colab notebook:

Note: This article was previously published on Notion; you can read the version written as a lesson for the Towards AI course on RAG systems for Activeloop.

https://signalism.notion.site/Crafting-AI-Assistants-via-OpenAI-and-Hugging-Face-API-92c7331225c54eb1966856bbc662b808?pvs=4
