Bridging the Communication Gap: ASL Sign Language Classification using Gemini Pro Vision

Esther Irawati Setiawan · Published in The Deep Hub · Apr 2, 2024

American Sign Language (ASL) is beautiful and expressive, but computers need powerful tools to understand it. Meet Gemini Pro Vision, a state-of-the-art AI model from Google AI. In this tutorial, we’ll try to classify ASL signs using Gemini Pro Vision in Colab!

Prompting With Google AI Studio

First, open Google AI Studio at https://aistudio.google.com. Google AI Studio is a browser-based IDE for prototyping with generative models: it lets you quickly try out models and experiment with different prompts. When you’ve built something you’re happy with, you can export it to code in your preferred programming language, powered by the Gemini API.

Create a new freeform prompt to start. This prompt type is enough for us because we only want to classify hand signs.

On the right side of the screen, pick Gemini 1.0 Pro Vision so we can work with images. Gemini 1.5 Pro might work better, but we’ll use the 1.0 version since not everyone currently has access to the 1.5 Pro model.

Next, we start prompting. Because we can’t be sure how much specialized knowledge Gemini has about ASL, we give it the complete fingerspelling diagram as a reference.

Now that we’ve finished making the prompt, we can access the code by clicking the “Get Code” button at the top right and copying it to Colab for testing.

To authenticate, generate an API key in AI Studio, then paste it into the code to gain access.

Testing in Colab

import google.generativeai as genai
from pathlib import Path

genai.configure(api_key="<YOUR_API_KEY>")

# Set up the model. Temperature 0 keeps the classification deterministic.
generation_config = {
    "temperature": 0,
    "top_p": 1,
    "top_k": 1,
    "max_output_tokens": 256,
}

# Relax the safety filters so harmless hand-sign photos are less likely to be blocked.
safety_settings = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_ONLY_HIGH"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_ONLY_HIGH"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_ONLY_HIGH"},
]

model_pro_1_0_vision = genai.GenerativeModel(
    model_name="gemini-1.0-pro-vision-latest",
    generation_config=generation_config,
    safety_settings=safety_settings,
)
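
If you’d rather not hardcode the key in the notebook, Colab’s Secrets panel is a safer option. Here is a minimal sketch, assuming you have stored the key under the (hypothetical) secret name GOOGLE_API_KEY:

# Optional: read the API key from Colab's Secrets panel instead of hardcoding it.
# Assumes a secret named GOOGLE_API_KEY was added via the key icon in Colab's sidebar.
from google.colab import userdata

genai.configure(api_key=userdata.get("GOOGLE_API_KEY"))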

Don’t forget to replace <YOUR_API_KEY> with your own key. Also, upload the ASL fingerspelling diagram to Colab and name it ASL_diagram.jpg, since the code below loads it under that name.
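
You can drag the files into Colab’s file browser, or upload them programmatically. A quick sketch using Colab’s built-in upload helper:

# Optional: upload the diagram (and any test images) from your machine.
# files.upload() opens a file picker; uploads land in the notebook's working directory.
from google.colab import files

uploaded = files.upload()
print(list(uploaded.keys()))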

Next, we make a function to classify an image based on the diagram.

asl_diagram_img_path = "ASL_diagram.jpg"

def classifyASL(image_path: str):
    # Make sure both the reference diagram and the input image exist.
    if not (img := Path(asl_diagram_img_path)).exists():
        raise FileNotFoundError(f"Could not find ASL diagram image: {img}")

    if not (img := Path(image_path)).exists():
        raise FileNotFoundError(f"Could not find image: {img}")

    # Send both images as raw JPEG bytes.
    image_parts = [
        {
            "mime_type": "image/jpeg",
            "data": Path(asl_diagram_img_path).read_bytes()
        },
        {
            "mime_type": "image/jpeg",
            "data": Path(image_path).read_bytes()
        },
    ]

    text_parts = [
        "\nThe image above shows the American Sign Language Diagram.\n",
        "\nBased on the diagram, classify the letter the hand gesture is referring to.\nanswer with just the class.\nexample: A\nanswer: "
    ]

    # Interleave images and text: diagram, its caption, then the query image and the question.
    prompt_parts = [
        image_parts[0],
        text_parts[0],
        image_parts[1],
        text_parts[1]
    ]

    response = model_pro_1_0_vision.generate_content(prompt_parts)
    return response.text.strip().lower()

Next, you only need to upload an image of an ASL hand sign to Google Colab and call the function with the file path as its argument.

print(classifyASL("a.jpg"))
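
To get a rough sense of accuracy, you can run the classifier over several labeled test images. A minimal sketch, assuming hypothetical files named after their true letters (a.jpg, b.jpg, c.jpg):

# Hypothetical quick accuracy check over a few labeled test images.
test_images = {"a.jpg": "a", "b.jpg": "b", "c.jpg": "c"}

correct = 0
for path, label in test_images.items():
    prediction = classifyASL(path)
    print(f"{path}: predicted {prediction!r}, expected {label!r}")
    correct += (prediction == label)

print(f"Accuracy: {correct}/{len(test_images)}")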

By following these steps and leveraging the power of Gemini Pro Vision, you can build a simple ASL classifier that helps break down communication barriers and fosters a more inclusive world!
