Trends in Technologies that Provide Visual Assistance for Blind People: 10 Takeaways for Users, Developers, and UX Professionals

Emma Sadjo
HCI & Design at UW
7 min read · Dec 7, 2021
This figure shows screenshots of login screens, service instructions, visual-information capture, and home screens from several VAT mobile apps, including Envision (AI, Glasses), Seeing AI, Be My Eyes, OrCam Read, OrCam MyEye 2, and BeSpecular.

Since the early 2000s, the app-based visual assistance technology industry has gradually been on the rise, with a sharp increase since the mid-2010s. If you are new to the field, it can be difficult to gain a fundamental understanding of the wide range of visual assistance technologies (VATs) commercially available today. Whether you’re a VAT user, UX professional, researcher, or developer, or simply looking to learn more about VATs, this is the resource for you.

What are VATs?

Visual assistance technologies (VATs) provide auditory descriptions of images and videos to answer blind people’s visual questions about their physical surroundings and visual media found online and in documents.

One of the first things to know about VATs is how they provide visual assistance, and whether this influences a person’s access to information presented visually. Some VATs are human-powered, meaning that the user directly interacts with a sighted volunteer or professional agent to learn about their surroundings or complete important tasks like cooking, shopping, fixing objects, and more. Human-powered VAT applications enable the user to send the agent an image or video, ask questions about the content of that visual media, and in turn receive an interpretation and description. Examples of VATs that use human agents include Be My Eyes and BeSpecular.
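To make this exchange concrete, here is a minimal sketch of the data a human-powered VAT might pass between user and agent. It is purely illustrative; the type names and fields are our own invention, not any product’s actual API.

```python
# A schematic of the human-powered exchange described above: the user
# sends visual media plus a question, and a sighted agent returns a
# description. All names here are hypothetical, for illustration only.
from dataclasses import dataclass
from typing import Literal

@dataclass
class VisualQuestion:
    media_type: Literal["image", "video"]  # what the user captured
    media_uri: str                         # where the captured media lives
    question: str                          # e.g., "What does this label say?"

@dataclass
class AgentResponse:
    description: str  # the agent's interpretation of the media
    agent_id: str     # which volunteer or professional agent responded
```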

Other VATs are AI-powered, meaning that they use artificial intelligence to analyze and identify the objects, text, scenes, and barcodes captured in images, then provide an auditory and/or braille caption. While these technologies do not yet offer direct interaction with a human agent, for some users they create a greater sense of independence and anonymity¹. Examples of AI-powered VATs include Supersense, TapTapSee, and Aipoly.
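To give a feel for how an AI-powered pipeline works end to end, here is a minimal sketch of the capture, recognize, and speak loop using two open-source libraries (the Tesseract OCR engine and an offline text-to-speech library). This is not the implementation of any product named above, and the image file name is a placeholder.

```python
# Illustrative only: a tiny AI-powered VAT loop that recognizes the
# text in an image file and speaks it aloud, mirroring the
# capture -> recognize -> caption flow described above.
from PIL import Image  # pip install Pillow
import pytesseract     # pip install pytesseract (requires Tesseract)
import pyttsx3         # pip install pyttsx3

def describe_text_in_image(image_path: str) -> str:
    """Run OCR on an image file and return the recognized text."""
    text = pytesseract.image_to_string(Image.open(image_path))
    return text.strip() or "No text detected."

def speak(message: str) -> None:
    """Convert a text caption into speech, the auditory output step."""
    engine = pyttsx3.init()
    engine.say(message)
    engine.runAndWait()

if __name__ == "__main__":
    # "snapshot.jpg" is a hypothetical file name.
    speak(describe_text_in_image("snapshot.jpg"))
```

Real products recognize far more than text (objects, scenes, barcodes), but the overall shape of the loop is the same.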

VAT Fact Finding

To provide this resource for you, we spent 60 hours scouring the web for VAT companies’ websites, reading blogs, following VAT companies’ Twitter streams, and reading archived information about VAT companies’ growth and development using the Wayback Machine. In total, we found 20 VATs, developed by 17 companies (Link to Table).

This figure shows a list of the logos of the 20 products analyzed in this landscape analysis: Adobe Accessibility, Amazon Rekognition, AI Poly, Aira, Apple VoiceOver Recognition, Be My Eyes, BeSpecular, Envision AI, Envision Glasses, (Facebook) Automatic Alt Text, Google Lookout, KNFB Reader, LookTel Money Reader, LookTel Money Recognizer, OrCam MyEye, OrCam Read, Seeing AI, Sullivan+ , Supersense, and TapTapSee.

We gathered information on the technical design and company communications of each product. Technical design information includes how visual information is captured, collected, and processed; the types of visual media used; and the intelligence type that powers the VAT (AI, human, or a hybrid). Company communications encompasses the information companies convey to users about their products, in the form of mission statements/slogans, keywords, release dates, and use metrics. For more information about our approach, read our short paper published and presented at the 23rd International ACM SIGACCESS Conference on Computers and Accessibility: “Landscape Analysis of Commercial Visual Assistance Technologies.”

Technical Design Takeaways

From the landscape analysis, the following five main trends in VAT technical design emerged. Understanding these trends can help you evaluate where this access technology market is headed. Combining this knowledge with user-centered research and product development can further the responsiveness of access technologies to users in ways that promote equity and diversity, in terms of satisfying user preferences and addressing concerns related to privacy and more. The timeline below shows the five technical trends visually.

1. The majority of VATs (13 of the 20) are mobile applications, while others are web-based or operate on different types of devices (e.g., glasses, hand-held devices).

Most mobile-based VATs are offered on both iOS and Android (9 of the 13 mobile application VATs). A majority of the companies offering VATs on both platforms released their service on iOS about six months earlier than on Android.

This figure shows a timeline representing the increase in VATs from 2009 to 2021. The y-axis shows the 20 VATs in our study; the x-axis shows the years. For every VAT there is a symbol signifying the operating system with which the VAT works. A key finding from this representation is that between the year the first VATs were released (2009) and early 2021 (when our data was gathered), two years stand out for the greatest frequency of product introductions: 2015 and 2016.
A timeline representing the increase in VATs from 2009 to 2021. What do you think will happen in the years to come? A list with the same information as this figure is available at the end of this article (see List 1).

2. There is a growing trend to use AI to provide visual assistance through mobile applications. 11 of the 13 mobile application VATs in our set use AI to identify content in images and videos and convey that information to users. Only Be My Eyes and BeSpecular use humans to provide their service, and Aira uses both humans and AI (Aira agents use an AI-powered dashboard to assist users).

3. A majority of VATs take in visual information as image files and/or live images, while human-powered VATs tend to use video only.

Generally, the visual information VATs interpret comes in three forms: video, live images, or image files. 14 of the 20 total VATs take image files, 7 take live images, and 5 take video.
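For developers, this distinction matters at capture time. As an illustration only (none of the surveyed VATs necessarily use this library), here is how the three input forms might be obtained with the open-source OpenCV library:

```python
# Illustrative only: one way to obtain the three input forms named
# above (an image file, a single live frame, or a short video clip).
import cv2  # pip install opencv-python

def load_image_file(path: str):
    """Input form 1: a stored image file."""
    return cv2.imread(path)

def capture_live_image(camera_index: int = 0):
    """Input form 2: a single live frame from the device camera."""
    cap = cv2.VideoCapture(camera_index)
    ok, frame = cap.read()
    cap.release()
    return frame if ok else None

def capture_video_frames(camera_index: int = 0, num_frames: int = 30):
    """Input form 3: a short sequence of frames (video)."""
    cap = cv2.VideoCapture(camera_index)
    frames = []
    for _ in range(num_frames):
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return frames
```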

4. A majority of VATs use “the cloud” to process and store user data.

Seven of the ten AI-powered VATs and all three human-powered VATs use the cloud. Less commonly, five of the ten AI-powered VATs and one human-powered VAT use users’ on-device storage. (Note: Aira used both on-device and cloud storage.)
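To show what the cloud pattern looks like in practice, here is a minimal sketch that sends an image to Amazon Rekognition, one of the services in our set, for label detection. It assumes AWS credentials and a region are already configured, the file name is a placeholder, and no VAT app in our set necessarily works this way.

```python
# Illustrative sketch of cloud-based image analysis: the image bytes
# leave the device and are processed on AWS servers, which is the
# "cloud" pattern described above.
import boto3  # pip install boto3

def detect_labels_in_cloud(image_path: str, max_labels: int = 5):
    """Send an image to Amazon Rekognition and return detected labels."""
    client = boto3.client("rekognition")
    with open(image_path, "rb") as f:
        response = client.detect_labels(
            Image={"Bytes": f.read()},
            MaxLabels=max_labels,
            MinConfidence=80.0,
        )
    return [(label["Name"], label["Confidence"]) for label in response["Labels"]]

# Example usage (hypothetical file name):
# for name, conf in detect_labels_in_cloud("kitchen.jpg"):
#     print(f"{name}: {conf:.0f}%")
```

The on-device alternative keeps the same recognize-then-caption flow but runs the model locally, which is one reason the storage split in this finding matters for visual privacy.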

5. VATs that provide their service on non-mobile devices tend to store data on users’ computers and in the cloud. The majority of web-based services store data in the cloud, and all use image files.

The VAT services we analyzed that are provided on non-mobile devices included OrCam MyEye, OrCam Read, and Envision Glasses. Web-based services included Adobe Accessibility, Amazon Rekognition, and Facebook Automatic Alt Text.

Company Communications Takeaways

The messaging that a VAT company uses to convey its service offering can influence people’s perspectives and expectations about what is technically possible within the VAT industry, the importance of these access technologies, and why people use them. Below are five findings that surfaced from our landscape analysis.

6. VAT companies’ slogans and/or product names often emphasize the computational innovation of their products.

Six of the 17 companies we analyzed use language related to their technical AI capabilities. For example, Aira (Artificial Intelligence + Remote Assistance), Seeing AI, and Envision AI all emphasize their computational foundation of AI in their product names. Other companies, such as Supersense and Aipoly, label their products as “Computer vision for the blind” and “An AI Assistant for the blind.”

7. VAT company slogans and/or product names commonly emphasize vision and use language related to auditory output.

Seven companies emphasized vision in their slogans and/or product names. Companies that discussed hearing did so in terms of transmuting visual information into auditory information and mediating between the two senses. For instance, Sullivan+ invites users to “…let it become your eyes to seeing the world.” Another example: Microsoft describes how “Seeing AI turns the visual world into an audible experience.”

8. VAT companies use a wide range of key phrases to describe the benefits users experience when using their products.

Multiple companies used the following words, directly or implicitly, to describe the benefits they provide:

  • Convenience, 7 of 17
  • “Experience,” 6 of 17
  • Connection and community, 4 of 17
  • “Independence,” 3 of 17
  • Personalized experiences created for each user, 2 of 17

Note: Phrases in quotation marks indicate that the word was directly referenced, while words without quotation marks were implied by the language companies use.

9. VAT companies use language that emphasizes what their products make accessible.

Six of the 17 companies use the word “world,” and four use phrases focused on everyday life and objects.

10. Some companies describe the assistance they provide in terms of “help(s),” “allows,” and “empower.”

These phrases were used by the following numbers of companies:

  • “Help(s),” 4 of 17
  • “Allows,” 3 of 17
  • “Empower,” 2 of 17

Summary

VATs are moving toward AI and cloud-based computing, raising important questions about how this cross-industry shift impacts users’ experiences related to efficiency, accuracy, visual privacy, data retention, and more.

The language that VAT companies use in their messaging and slogans focuses on the technical innovation they are pursuing, the benefits to end users, the senses that are mediated, and the impact of the product on users. These trends are important because they raise the question of how VAT company messages align with users’ values and decision making.

Are you a VAT user, developer, or UX professional? How does our analysis of the trends in the VAT landscape inform your perspective about these services? What questions do you have about our methods or findings? If you have thoughts you would like to share, add a comment below.

We encourage visual assistance technology users, developers, and researchers to consider whether the words companies use to describe VAT assistance and experiences align with inclusive, equitable design and ability-based design principles.

You can learn more about the rest of the findings from our landscape analysis in our ASSETS 2021 poster paper and supplemental content.

Additional contemporary research on VATs can be found by exploring past publications in the ASSETS proceedings.

Supplementary List

List 1 shows the same information presented in the VAT release date timeline above. Both present the name of the VAT, the operating system (OS) with which it was designed to work, and the release date for each OS (where applicable).

Aira: iOS 2016, Android 2018
Be My Eyes: iOS 2015, Android 2017
BeSpecular: iOS 2016, Android 2016
LookTel Money Reader: iOS 2011
LookTel Money Recognizer: iOS 2012
OrCam MyEye (2): Other 2015
OrCam Read: Other 2020
KNFB Reader: iOS 2014, Android 2015
TapTapSee: iOS 2012, Android 2014
Lookout: Android 2018
Seeing AI: iOS 2017
Adobe Accessibility: Other N/A
Amazon Rekognition: Other 2015
VoiceOver Recognition: iOS 2009
Facebook Automatic Alt Text: iOS 2016, Android 2016
Envision AI: iOS N/A, Android 2018
Envision Glasses: Other 2020
Aipoly Vision: iOS 2016, Android 2017
Supersense: iOS 2020, Android 2019
Sullivan+: iOS 2019, Android 2018

Footnotes

[1] Abigale Stangl, Kristina Shiroma, Bo Xie, Kenneth R. Fleischmann, and Danna Gurari. 2020. Visual Content Considered Private by People Who are Blind. In The 22nd International ACM SIGACCESS Conference on Computers and Accessibility, ASSETS ’20. Association for Computing Machinery, New York, NY, USA, Article 31, 1–12. DOI:https://doi.org/10.1145/3373625.3417014
