Empowering the Visually Impaired with AI-Powered Assistive Eyewear

Akumar Panday
Google Cloud - Community
Aug 2, 2024

Revolutionizing Accessibility through Real-Time Sensory Perception and Seamless Interaction

Introduction

In today’s fast-paced world, accessibility is crucial for ensuring inclusivity and independence for everyone. For visually impaired individuals, navigating daily life can present significant challenges. Introducing Shrote, an innovative assistive eyewear device designed to bridge the gap and enhance the lives of visually impaired individuals. By leveraging cutting-edge AI technology and advanced sensors, Shrote offers real-time sensory perception and seamless interaction with the environment.

Project Overview

Name of the Model: Shrote

Brief Description: Shrote is an assistive eyewear prototype designed to empower visually impaired individuals. It integrates an AI-powered virtual assistant with advanced sensors, offering real-time sensory perception and seamless interaction with the environment. The device captures live video through a camera, processes voice commands, and provides instant audio feedback, enhancing accessibility for users.

Innovative Component: The innovative component of Shrote lies in its integration of AI algorithms with live video capture and multiple sensors. This combination enables real-time environmental perception, facial recognition, object identification, and seamless communication through voice commands, offering a comprehensive solution for the visually impaired.

Problem Being Solved: Shrote addresses the challenge faced by visually impaired individuals in interacting with their surroundings independently. It solves the problem of limited accessibility and communication barriers by providing real-time information about the environment. The device empowers users to navigate, identify objects, read printed materials, recognize faces, and connect with others, fostering independence, confidence, and inclusivity for the visually impaired community.

Technical Implementation

Hardware Components:

  • High-definition camera for live video capture
  • Microphone for voice command input
  • Speakers for audio feedback
  • Lightweight, ergonomic eyewear frame

Software Components:

  • AI algorithms for real-time video processing
  • Speech recognition and natural language processing for voice commands
  • Machine learning models for facial recognition and object identification
  • API integration for additional functionalities (e.g., OCR for reading printed materials)

Key Features:

  1. Real-Time Environmental Perception: Provides instant information about the surroundings.
  2. Facial Recognition: Identifies and names known individuals.
  3. Object Identification: Recognizes and describes objects in the environment.
  4. Voice Commands: Allows users to interact with the device through simple voice commands.
  5. Audio Feedback: Delivers immediate audio responses to user queries and environmental observations.
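The features above are triggered by spoken commands, so some routing layer has to decide which one a transcribed command maps to. A minimal keyword-based sketch of that dispatch is shown below; the handler names and keywords are illustrative, not Shrote's actual module layout.

```python
def route_command(text):
    """Map a transcribed voice command to a feature handler name.

    A minimal keyword router for illustration; Shrote's real dispatch
    logic may use NLP intent classification instead.
    """
    text = text.lower()
    if "weather" in text:
        return "weather"
    if "who" in text or "recognize" in text:
        return "face_recognition"
    if "what" in text or "describe" in text:
        return "object_identification"
    if "help" in text or "emergency" in text:
        return "sms_alert"
    return "unknown"
```

A keyword router like this is easy to extend one feature at a time, which suits a prototype; a production build would likely replace it with proper intent recognition.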

Current Implementation

Audio Transcription Using Google Speech-to-Text

Using the Google Speech-to-Text API, Shrote can transcribe audio commands into text for further processing.
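A minimal transcription sketch is shown below. The function and helper names are my own, not necessarily Shrote's, and the encoding and sample rate are assumptions about the microphone capture; it requires the google-cloud-speech package and the key.json credential set up in Step 2.

```python
def best_transcript(results):
    """Join the top alternative of each dict-shaped recognition result."""
    return " ".join(r["alternatives"][0]["transcript"].strip() for r in results)

def transcribe_command(audio_path, language_code="en-US"):
    """Send a short audio clip to Google Speech-to-Text and return the text."""
    # Imported here so the pure helper above has no cloud dependency.
    from google.cloud import speech

    client = speech.SpeechClient()
    with open(audio_path, "rb") as f:
        audio = speech.RecognitionAudio(content=f.read())
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,  # assumed capture rate
        language_code=language_code,
    )
    response = client.recognize(config=config, audio=audio)
    return " ".join(
        result.alternatives[0].transcript.strip() for result in response.results
    )
```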

Weather Information Using RapidAPI

Shrote can provide weather updates by integrating with the RapidAPI service.
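One way this integration could look is sketched below. The endpoint, host, and response shape follow one common weather API on RapidAPI and are assumptions; check the API you actually subscribed to. The key is read from the environment, matching the .env setup in Step 2.

```python
import json
import os
import urllib.parse
import urllib.request

def fetch_weather(city, api_key=None):
    """Fetch current conditions from a RapidAPI weather endpoint (illustrative URL)."""
    api_key = api_key or os.environ["RAPIDAPI_WEATHER_KEY"]  # from the .env file
    query = urllib.parse.urlencode({"q": city})
    req = urllib.request.Request(
        f"https://weatherapi-com.p.rapidapi.com/current.json?{query}",
        headers={
            "X-RapidAPI-Key": api_key,
            "X-RapidAPI-Host": "weatherapi-com.p.rapidapi.com",
        },
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

def weather_sentence(payload):
    """Turn a current-conditions payload into a sentence for audio feedback."""
    cur = payload["current"]
    condition = cur["condition"]["text"].lower()
    return f"It is {cur['temp_c']} degrees Celsius with {condition}."
```

Converting the payload to a spoken sentence, rather than reading raw fields, keeps the audio feedback natural for the user.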

SMS Notifications with Twilio

Shrote can send SMS notifications to caregivers or family members using the Twilio API in case of danger or when the user needs help.
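A sketch of that alert path is below. The message format and function names are illustrative; the credentials and numbers would come from shrote_app/sms.py, configured in Step 2.

```python
def alert_body(user_name, reason):
    """Compose the SMS body sent to a caregiver."""
    return f"Shrote alert: {user_name} may need help ({reason})."

def send_alert(account_sid, auth_token, from_number, to_number, body):
    """Send the SMS via Twilio; requires the twilio package."""
    from twilio.rest import Client  # imported here so alert_body stays dependency-free

    client = Client(account_sid, auth_token)
    message = client.messages.create(to=to_number, from_=from_number, body=body)
    return message.sid
```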

Face Recognition

Shrote can recognize faces using the face_recognition library integrated with OpenCV.

Google Cloud MySQL Database

Shrote connects to a Google Cloud MySQL database for storing and retrieving user data.

Step-by-Step Instructions

Step 1: Setting Up the Environment

Prerequisites

Before you begin, ensure you have met the following requirements:

  • Python 3.8 or later
  • Django 3.0 or later
  • Google Cloud SDK
  • cloud-sql-proxy binary
  • RapidAPI account
  • Twilio account
  • OpenCV and face_recognition library

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/shrote.git
cd shrote

  2. Create and activate a virtual environment:

python -m venv env
source env/bin/activate   # On Windows use `env\Scripts\activate`

  3. Install the required packages:

pip install -r requirements.txt

Step 2: Configuring the Project

Google Cloud Service Account

  1. Create a Google Cloud Service Account and download the JSON key file. Name it key.json and place it in the root directory of the project.
  2. Set up the Google Cloud Speech-to-Text API by following Google’s documentation.

RapidAPI for Weather

  1. Sign up at RapidAPI.
  2. Subscribe to the Weather API.
  3. Create an .env file in the root directory and add your API key:
RAPIDAPI_WEATHER_KEY="your_rapidapi_key"

Twilio API Credentials

  1. Sign up at Twilio.
  2. Create a new project and get your account_sid and auth_token.
  3. In shrote_app/sms.py, add your Twilio credentials:
account_sid = 'your_account_sid'
auth_token = 'your_auth_token'

Face Recognition

  1. Add your facial encoding in shrote_app/face.py:
import face_recognition

# Load a sample picture and learn how to recognize it.
your_image = face_recognition.load_image_file("path_to_your_image")
your_face_encoding = face_recognition.face_encodings(your_image)[0]

# Add more known face encodings and their names
known_face_encodings = [
    your_face_encoding,
]
known_face_names = [
    "Your Name",
]
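With the known encodings in place, matching the faces in a captured frame can be sketched as follows. This is a simplified stand-in for the project's own recognition routine, and the function names are mine; the recognition part requires the face_recognition package.

```python
def matched_name(matches, known_face_names, fallback="Unknown"):
    """Return the first known name whose encoding matched, else a fallback."""
    for match, name in zip(matches, known_face_names):
        if match:
            return name
    return fallback

def recognize_frame(frame_path, known_face_encodings, known_face_names):
    """Find and name every face in an image file."""
    import face_recognition  # imported here so matched_name stays dependency-free

    image = face_recognition.load_image_file(frame_path)
    names = []
    for encoding in face_recognition.face_encodings(image):
        matches = face_recognition.compare_faces(known_face_encodings, encoding)
        names.append(matched_name(matches, known_face_names))
    return names
```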

Google Cloud MySQL

  1. Create a Google Cloud MySQL database instance by following Google’s documentation.
  2. Download the cloud-sql-proxy binary and place it in your project directory.
  3. Connect to the MySQL instance using cloud-sql-proxy:
./cloud-sql-proxy your-instance-connection-name

  4. Update your Django settings.py to connect to the MySQL database:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'HOST': '127.0.0.1',
        'PORT': '3306',
        'NAME': 'your_db_name',
        'USER': 'your_db_user',
        'PASSWORD': 'your_db_password',
    }
}

Step 3: Running the Project

  1. Run the migrations:

python manage.py migrate

  2. Start the development server:

python manage.py runserver

Code Snippets

Here’s a basic example of how some key functionalities can be implemented in Python:

1. Voice Command Processing:

import os

from django.conf import settings
from google.cloud import texttospeech

tts_client = texttospeech.TextToSpeechClient()

def synthesize_speech(text, output_filename="output.mp3", language_code="en-US"):
    input_text = texttospeech.SynthesisInput(text=text)
    voice = texttospeech.VoiceSelectionParams(
        language_code=language_code,
        ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    response = tts_client.synthesize_speech(
        input=input_text, voice=voice, audio_config=audio_config
    )
    static_dir = os.path.join(settings.BASE_DIR, 'static')
    if not os.path.exists(static_dir):
        os.makedirs(static_dir)
    # Save the audio file to the static location
    audio_path = os.path.join(static_dir, output_filename)
    with open(audio_path, "wb") as out:
        out.write(response.audio_content)
    return audio_path

2. Object Identification:

def detect():
    # Assumes cv2, os, settings, a loaded YOLOv8 `model`, the flag
    # `capture_frame`, and `recognize()` are defined at module level.
    global capture_frame
    response = ""
    person = ""
    object_counts = {}

    frame = cv2.imread('static/detected_frame.jpg')

    if frame is not None:
        if not capture_frame:
            # Run YOLOv8 inference on the frame
            results = model(frame)

            # Visualize the results on the frame
            annotated_frame = results[0].plot()

            # Display the annotated frame
            cv2.imshow("YOLOv8 Inference", annotated_frame)

            if results[0].boxes:
                # Count each detected class
                for result in results[0].boxes:
                    obj = results[0].names[result.cls[0].item()]
                    object_counts[obj] = object_counts.get(obj, 0) + 1

                for obj, count in object_counts.items():
                    response = response + str(count) + " " + obj + ", "

            static_dir = os.path.join(settings.BASE_DIR, 'static')
            if not os.path.exists(static_dir):
                os.makedirs(static_dir)
            img_path = os.path.join(static_dir, 'detected_frame.jpg')

            cv2.imwrite(img_path, frame)
            capture_frame = True

    cv2.destroyAllWindows()

    if "person" in response:
        # Call recognize() to search the known facial encodings
        faces = recognize()

        for face in faces:
            person = person + face + ", "

        if len(faces) > 0:
            person = " Which includes " + person[0:-2]

    response = response[0:-2] + " detected." + person

    return response

Conclusion

Shrote represents a significant leap forward in assistive technology for the visually impaired. By integrating AI with advanced sensors, it provides real-time, actionable information that can greatly enhance the independence and confidence of its users. The journey of developing Shrote is an inspiring example of how technology can be leveraged to create impactful solutions for those in need.

Contribution

This project was made possible through the collaborative efforts of the following team members:

Demo and GitHub Repository

Check out the live demo of Shrote here.

You can find the GitHub repository here.
