Can You Hear An Image?

Lottie Hope
Published in Bumble Tech · 7 min read · Mar 21, 2023

I’m currently working as an Android Engineer at Bumble, but the content in this article does not relate to my work at Bumble.

Accessibility is not a feature, it’s a social trend — Antonio Santos

Accessibility is an often overlooked aspect of app development. It requires some forethought about design and implementation. Unfortunately, a lot of developers deem it not worth the effort because their user base doesn’t currently contain many people who use accessibility features.

Herein lies the problem. If you don’t develop your app with accessibility in mind, then people who use accessibility features won’t download your app, which leaves your user base made up primarily of people who don’t use them. That leads developers to think it wouldn’t be that beneficial to implement accessibility features, so they don’t. And the cycle continues.

Photo of a person with lines tracing the contours of their face, highlighting their facial features.

Recently I was listening to someone talking about accessibility in mobile apps, and they pointed out that the first thing you see on someone’s profile is their photo. Being visually impaired, they couldn’t see it; all they were given was a bit of text informing them that there was a photo on this part of the screen.

I hear you say “well yeah, what else is it going to do?”.

And you’d be right: that is exactly what nearly every screen reader does when it encounters an image, because developers provide the screen reader with alternative text (alt text) to read out instead. The easiest thing to do in this situation is to write:

altText = "image"
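
On Android, that alt text lives in a view’s contentDescription, so the equivalent would be something like this (imageView is just a placeholder for whatever ImageView is showing the photo):

// TalkBack reads this out when the ImageView receives accessibility focus
imageView.contentDescription = "image"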

But what if apps could describe the images to the user? What if we could describe what you look like to another person? Sounds difficult right? Well… not really. If we leverage the power of machine learning, it’s more than possible.

If we take into account the characteristics we’d usually notice about a person when we see a photo of them, I’m sure they’d be along the lines of: their hair colour, their eye colour, whether they’re wearing glasses, and so on. So what if a screen reader could tell you all that information instead of “this is an image”?

Introducing TensorFlow Lite

TensorFlow is a platform for machine learning and very helpfully provides an Android library (and an iOS one!), which I will use in this example.

If we take just one of the characteristics from the earlier example, hair colour, we can “teach” an app how to tell you what colour hair the person in a photo has.

To start I’m going to assume you know how to create an Android app (if not here’s a tutorial), so go ahead and make a new project.

Here’s what mine looks like:

[Screenshot of the app]

I have an ImageView of the picture I’d like to know more about and then a Button which I will use to determine the hair colour. There is also a TextView that I will use to show the results just underneath the button.
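
Wiring the button and the results TextView up might look something like this (R.id.button and R.id.output are placeholder IDs here; use whatever your layout defines):

// The button that kicks off the classification and the TextView that shows the results
val btn: Button = findViewById(R.id.button)
val txtOutput: TextView = findViewById(R.id.output)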

You’re going to need to add the TensorFlow Lite dependencies too:

implementation 'org.tensorflow:tensorflow-lite-support:0.1.0'
implementation 'org.tensorflow:tensorflow-lite-metadata:0.1.0'

Also, make sure you add this in your gradle file within the android block:

aaptOptions {
    // Stops the build from compressing the .tflite model so TensorFlow Lite can load it correctly
    noCompress "tflite"
}

(If you’d like to try out TensorFlow with a Google tutorial here’s a pretty simple one with flowers that I used to help write this tutorial!)

Training a model

Before we go any further, we need to train a model. A model is what our app is going to use to decide what colour hair is in our photo.

There are a few ways to do this but, as I am limited by the processing power of my machine, I opted to use CoLab (this option requires you to write some Python, but don’t worry if you’re unfamiliar: I will paste the code you need here).

Go ahead and create a new notebook in CoLab and paste the following:

!pip install tf-nightly
!pip install tflite-model-maker

This is going to install the necessary modules we need from TensorFlow. The nightly package installs the most up-to-date alpha version of TensorFlow, which I needed to make this project work (however, this may change in the future).

Then, in the same notebook, add the following Python imports:

from tflite_model_maker import image_classifier
from tflite_model_maker.image_classifier import DataLoader

We will need these specific imports for the next bit. Then add the following underneath:

# Load the labelled images (one sub-folder per hair colour)
data = DataLoader.from_folder('test-data/')
# Train an image classifier on those images
model = image_classifier.create(data)
# Check how well the trained model performs
loss, accuracy = model.evaluate(data)
# Export the trained model as model.tflite
model.export(export_dir='/tmp/')

This is it. 4 lines. This creates our magical model that is going to help us describe our photos!

This will load images from a folder called test-data, classify them and then produce a TensorFlow model in the /tmp/ folder.

But before we run this code, we need to fill our test-data folder with photos. For this to work, we need a folder inside test-data for each hair colour we want to distinguish, for example: blonde, brunette, black, ginger and grey. Then fill each hair colour folder with photos of people with that hair colour (stick to jpgs and pngs, as it only accepts a few file types). Make sure this test-data folder is inside the sample-data folder in CoLab. Aim for at least 10 images in each folder, but the more images you provide, the more confident the model can be!
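
So the folder structure ends up looking something like this (the image file names are just examples):

sample-data/
    test-data/
        blonde/
            blonde-person-1.jpg
            blonde-person-2.jpg
        brunette/
        black/
        ginger/
        grey/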

Let’s run it!

If all goes well during runtime* then you should now have a model.tflite file in your CoLab /tmp/ folder. Download this file and then import it into your Android project. You will need to add a new resources folder called “ml” at the same level as your other resources folders, and add your model in there.

Classify your photo

The last thing to do now is to run the photo in our app against our TensorFlow model. If your photo isn’t already a Bitmap then you will need to convert it. Here’s how I did it:

val img: ImageView = findViewById(R.id.image)
val fileName = "person.jpeg"
// Load the photo from the assets folder and show it in the ImageView
val bitmap: Bitmap? = assetsToBitmap(fileName)
bitmap?.apply {
    img.setImageBitmap(this)
}
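
The assetsToBitmap helper isn’t part of any library; a minimal sketch of it, assuming the photo lives in the app’s assets folder, might look like this:

import android.content.Context
import android.graphics.Bitmap
import android.graphics.BitmapFactory
import java.io.IOException

// Hypothetical helper: open a file from the assets folder and decode it into a Bitmap
private fun Context.assetsToBitmap(fileName: String): Bitmap? =
    try {
        assets.open(fileName).use { BitmapFactory.decodeStream(it) }
    } catch (e: IOException) {
        e.printStackTrace()
        null
    }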

Then in my app, I trigger the event by pressing the button using the following code:

btn.setOnClickListener {
    // Wrap the bitmap in a TensorImage so TensorFlow Lite can work with it
    var tfImage = TensorImage(DataType.FLOAT32)
    tfImage.load(bitmap)

    // Resize the image to the 224x224 input size the model expects
    val imageProcessor = ImageProcessor.Builder()
        .add(ResizeOp(224, 224, ResizeOp.ResizeMethod.BILINEAR))
        .build()
    tfImage = imageProcessor.process(tfImage)

    // Load the model from the ml folder and run the image through it
    val model = Model.newInstance(this@MainActivity)
    val outputs = model.process(tfImage)

    // Each category is a hair colour label paired with a confidence score
    val outputBuffer = outputs.probabilityAsCategoryList
    var outputText = ""
    for (label in outputBuffer) {
        val text = label.label
        val confidence = label.score
        outputText += "$text : $confidence\n"
    }
    txtOutput.text = outputText
}

Here, the magic is happening on this line:

val outputs = model.process(tfImage)

This is telling the model we made earlier to take our image and give us back its estimations of what hair colour the person has.

The last line in that snippet is responsible for showing the confidence scores in my TextView. In this context, a confidence score tells you how likely it is that the person in the photo has each hair colour.

Here’s what mine looks like:

[Screenshot of the app showing the confidence scores for each hair colour]

You can see here that it is 47% confident that the person has black hair, but also 35% confident that it is grey. This is because I used a relatively small dataset of around 100 images. If I were to use a dataset of 1,000 images instead, the confidence scores would likely be much higher.

So now that we have the confidence scores, we can take the one with the highest confidence and, with some string manipulation, turn it into a human-readable fact, e.g. “This is a photo of a person with black hair”. This can then be substituted for the alt text on the image, thereby describing it!
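
As a rough sketch, assuming the outputBuffer list and the img ImageView from the earlier snippets, picking the top category and using it as the alt text might look like this:

// Take the category with the highest confidence score
val bestGuess = outputBuffer.maxByOrNull { it.score }

// Turn the label into a human-readable sentence and use it as the image's alt text
bestGuess?.let { category ->
    img.contentDescription = "This is a photo of a person with ${category.label} hair"
}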

We can then use this same process to produce models for other characteristics, like eye colour, hairstyle, whether they’re smiling, and so on, to make our alt text even more descriptive!
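
For illustration only, here’s a rough sketch of how two such models might be combined. HairColour and EyeColour are hypothetical classes generated from two separate .tflite files imported the same way as before, and tfImage is the processed image from the earlier snippet:

// Hypothetical generated model classes, one per characteristic
val hairModel = HairColour.newInstance(this@MainActivity)
val eyeModel = EyeColour.newInstance(this@MainActivity)

// Run the same processed image through both models and keep the top label from each
val hair = hairModel.process(tfImage).probabilityAsCategoryList.maxByOrNull { it.score }
val eyes = eyeModel.process(tfImage).probabilityAsCategoryList.maxByOrNull { it.score }

// Stitch the guesses into a single, more descriptive piece of alt text
val description = "This is a photo of a person with ${hair?.label} hair and ${eyes?.label} eyes"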

Machine learning often seems like a scarily difficult concept to get your head around, but with options like TensorFlow Lite and ML Kit available, it is something developers of any level can try.

Hopefully this article inspires developers to consider accessibility within their apps more, and even helps them start working towards a viable solution!

Cartoon of female programmer at computer with code bubbles around her head.

Happy coding! ⭐️

Troubleshooting

(*) If you are having trouble getting the CoLab script to run, you may need to remove the following hidden folder (repeating this for each hair colour folder you have):

test-data/blonde/.ipynb_checkpoints

If you’re experiencing another issue, it is likely you have added an image that is neither a png nor a jpg. Ensure all your file extensions are correct.
