A-Eye: Auditory Supplement for the Legally Blind

“shallow focus photography of dandelion” by Dawid Zawiła on Unsplash

What if I said I could give you the ability to fly? How crazy would that be? Imagine the ramifications of flight on your everyday life. It's the dream all of us have had, the superpower we've always wished for.

It sounds like fiction to us. But there's something much more ordinary that seems just as fictitious as flight to about 285 MILLION people around the world, roughly 4% of the global population.

Sight.

Imagine losing your vision. Suddenly, not being able to see the things around you. What problems would you face?

  • How would you cross the street without knowing the color of the traffic lights?
  • How would you know where you were, if you couldn’t read a map?
  • How would you be able to navigate your own home?

There are thousands of other problems faced by individuals who are legally blind. Sight is flight.

So, how could we solve this?

That's the natural next question, and it's what I asked myself a couple of months ago when I created an app called A-Eye. (On the Google Play Store here)

“selective focus photography of person taking photo of trees” by Julian Hochgesang on Unsplash

A-Eye is an app that aids the legally blind in their everyday lives. It is essentially an auditory supplement for our visual world. Point your phone's camera at common household objects, and the app will tell you, aloud, what those objects are. Imagine descriptive video, but for life.

How it works.

A-Eye works by using Machine Learning. The app uses a retrained MobileNet, which is essentially a Convolutional Neural Network that allows a computer to identify objects within photos. (Learn more about Machine Learning in my last article!) The app takes a photo with your phone's camera, passes it through the MobileNet, and gets back a string of text naming the objects in the photo. It then uses Text-to-Speech to read that string aloud.

Easy enough.
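To make that concrete, here's a rough sketch of the same pipeline in plain Python. It uses the stock ImageNet-pretrained MobileNet from Keras and the pyttsx3 text-to-speech library rather than the app's retrained model and Android Text-to-Speech, and the photo path is just a placeholder, but the flow is the same: photo in, label out, label spoken aloud.

```python
# Minimal sketch of the photo -> label -> speech pipeline, using the stock
# (not retrained) Keras MobileNet and pyttsx3 for text-to-speech.
# The real app runs a retrained model on-device; this only shows the flow.
import numpy as np
from tensorflow.keras.applications.mobilenet import (
    MobileNet, preprocess_input, decode_predictions)
from tensorflow.keras.preprocessing import image
import pyttsx3

model = MobileNet(weights="imagenet")  # pretrained classifier

def describe(photo_path: str) -> str:
    # Load the photo at the 224x224 size MobileNet expects
    img = image.load_img(photo_path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    preds = model.predict(x)
    # Take the single most likely label as "the object in the photo"
    top = decode_predictions(preds, top=1)[0][0]  # (class_id, name, score)
    return top[1].replace("_", " ")

def speak(text: str) -> None:
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()

if __name__ == "__main__":
    label = describe("kitchen_table.jpg")  # hypothetical photo path
    speak(f"I see a {label}")
```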

The process I took to create it.

Dataset

I collected an image dataset covering about 1000 common household objects, anything from washing detergent to laptops. Each category contained about 5000 images. The images came from datasets such as CIFAR-100, ImageNet, and more.
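As an example of pulling in one of those sources, CIFAR-100 ships with Keras and can be loaded in a single call. This is only an illustration of one piece; the full dataset combined several sources.

```python
# One of the public sources: CIFAR-100 is bundled with Keras.
from tensorflow.keras.datasets import cifar100

(x_train, y_train), (x_test, y_test) = cifar100.load_data(label_mode="fine")
print(x_train.shape)       # (50000, 32, 32, 3): 500 images per fine label
print(y_train.max() + 1)   # 100 fine-grained classes, e.g. "lamp", "keyboard"
```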

Data Preparation

To begin with, images had to be separated into different folders for my algorithm to work. I also quickly realized that the way certain images were named led to errors in my code (such as filenames containing decimal points). Other than that, very little work had to be done during this phase.
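The prep step looked roughly like the sketch below: copy every image into a folder named after its category and rename it to a plain integer index, so stray decimal points in filenames can't trip up downstream code. The folder layout and naming convention here are illustrative, not the exact ones I used.

```python
# Sketch of the prep step: sort images into one folder per class and rename
# files so that decimal points (and other surprises) can't break code that
# splits on ".". Layout and naming rules are illustrative placeholders.
import shutil
from pathlib import Path

RAW_DIR = Path("raw_images")  # hypothetical: files named like "laptop_3.1.jpg"
OUT_DIR = Path("dataset")     # will become dataset/<class_name>/<index>.jpg

def class_of(filename: str) -> str:
    # Assume the class name is everything before the first underscore
    return filename.split("_")[0]

counters = {}
for src in RAW_DIR.iterdir():
    if src.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
        continue
    label = class_of(src.name)
    counters[label] = counters.get(label, 0) + 1
    dest_dir = OUT_DIR / label
    dest_dir.mkdir(parents=True, exist_ok=True)
    # New name contains only an integer index, so no stray decimals remain
    shutil.copy(src, dest_dir / f"{counters[label]}{src.suffix.lower()}")
```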

Choosing a Model

I chose to use Convolutional Neural Networks because of how well they recognize objects in images. I specifically chose MobileNet because it was designed with mobile use in mind (it requires far less computing power).
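For anyone curious what "retraining a MobileNet" means in practice, here is a rough sketch in Keras: keep the pretrained convolutional base, freeze it, and train a new classification head for your own categories. The exact layers, hyperparameters, and NUM_CLASSES value are placeholders rather than the app's actual configuration.

```python
# Roughly what "retraining a MobileNet" means: keep the pretrained
# convolutional base, freeze it, and train a new classification head
# for my own categories. NUM_CLASSES and the head layers are placeholders.
import tensorflow as tf
from tensorflow.keras.applications import MobileNet

NUM_CLASSES = 1000  # number of household-object categories

base = MobileNet(weights="imagenet", include_top=False,
                 input_shape=(224, 224, 3), pooling="avg")
base.trainable = False  # keep the pretrained features as-is

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```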

Training & Evaluation

Next came actually training my model. This took about a week to complete after going through many iterations on my sub-par laptop. I eventually reached an accuracy of approximately 70%.
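The training and evaluation loop itself was a fairly standard Keras flow, something like the sketch below, which continues from the model built above and reads the per-class folders from the prep step. The batch size, split, and epoch count are placeholders; the only numbers I've quoted are the rough wall-clock time and the final ~70% accuracy.

```python
# A typical Keras training/evaluation flow over the per-class folders from
# the prep step. `model` is the MobileNet-based classifier assembled in the
# previous sketch; batch size, split, and epoch count are placeholders.
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset", validation_split=0.2, subset="training", seed=42,
    image_size=(224, 224), batch_size=32)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset", validation_split=0.2, subset="validation", seed=42,
    image_size=(224, 224), batch_size=32)

# MobileNet expects inputs scaled to [-1, 1], so apply its preprocessing
preprocess = tf.keras.applications.mobilenet.preprocess_input
train_ds = train_ds.map(lambda x, y: (preprocess(x), y))
val_ds = val_ds.map(lambda x, y: (preprocess(x), y))

model.fit(train_ds, validation_data=val_ds, epochs=10)

loss, accuracy = model.evaluate(val_ds)
print(f"validation accuracy: {accuracy:.2%}")
```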

Implementation

Finally came the actual implementation, which by far took the longest! I had been partially introduced to this reality while interning at Microsoft this summer, but I really experienced it here. Developing the actual app took much longer than creating the model, especially trying to find documentation on how to properly put the app and the machine learning model together! (Please, we need a better and UPDATED tutorial on this…)
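I won't walk through the Android code here, but the bridge between the model and the app is worth sketching. One common route (and the one most tutorials assume) is converting the trained Keras model to TensorFlow Lite and shipping the resulting .tflite file as an app asset, with the Java/Kotlin side handling the camera and TextToSpeech. Continuing from the model above:

```python
# One common route for getting a Keras model into an Android app: convert it
# to TensorFlow Lite and ship the .tflite file as an app asset. (The
# Android-side inference and TextToSpeech wiring is separate Java/Kotlin code.)
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # shrink for mobile
tflite_model = converter.convert()

with open("a_eye_mobilenet.tflite", "wb") as f:
    f.write(tflite_model)
```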

And it could be even better.

In the future, I'm planning to update the app to add more functionality. Many auto-captioning models exist that allow machines to describe the context of a scene.

Figure 1

Google’s Show and Tell algorithm automatically captions images, as in Figure 1. A context-based system like this, combined with constant updates, would allow for an ongoing audio supplement that could describe anything from distances and objects to the relationships between them.

What does this mean for those who are legally blind?

Imagine stepping up to a red light, and your app automatically tells you how far you are from the curb and what color the light is. Imagine getting an update if a person is moving towards you, or having text read aloud to you, automatically. The sheer improvement such a device would make to everyday life is monumental.

“worm's-eye view of an airplane flying above city” by Florian Schneider on Unsplash

Just compare it to flight. We can’t fly. But we can use machines such as planes to help us fly. This app is a plane. And planes are insanely useful.

If you enjoyed this article:

  • share it with your network
  • follow me on Medium and LinkedIn to stay up to date with my progress

Passionate about Machine Learning, understanding the world, and other exponential technologies.