War Stories: My Journey From Blindness to Building a Fully Conversational User Interface
Apps are a nightmare for the visually impaired. So when I started losing my sight, I decided to fix that.
In the summer of 2011, when I was 25 years old, I found out I was going blind. What started as a routine trip to the optometrist ended in multiple trips to a retinal specialist, photos being taken of the back of my eye, and a procedure called an Electroretinogram (ERG) where hard contact lenses attached to electrodes (and hooks to keep you from blinking) are used to measure the electrical activity across different regions of the eye. I was diagnosed with Stargardt’s macular degeneration, a genetic disorder that destroys the retinal cells in the center part of the eye. There is currently no treatment or cure, though there are some promising treatments in clinical trials.
When I was diagnosed, I didn’t worry too much about it. I still had nearly perfect vision. I could still drive and read normally. I can’t say the same for my parents. They were really worried for me. Worried that I wouldn’t be able to maintain the livelihood that every parent wants for their child. If I had stopped to think about it, I would probably have been worried too. But my vision was still fine…what did I have to worry about?
Over the next year I started to get headaches from driving at night. (Have you ever noticed how bright headlights can be?) I also started to get eye strain from reading. It wasn’t a big deal though. I could just not drive at night, right? And eye strain? No problem, I’ll just increase the font size on my computer. But my vision continued to worsen. Eventually I didn’t feel safe driving during the day, and my eyes continued to ache. Once again, no big deal. I’m adaptable. I got a job in the city and sold my car. Public transportation is great and finding parking is more trouble than it’s worth — not to mention I owed more money in parking tickets than the total value of the car itself.
While getting around the city was no longer a problem, continuing to maintain my productivity with a computer was. I had to make more and more accommodations for myself. I created a Chrome plug-in to make Google Reader easier to read. When Google Reader was shut down, I created a different plug-in for NewsBlur so I could have news articles read to me.
This whole time my then girlfriend, now wife, kept urging me to explore the resources available to people who are blind or losing their vision. I kept putting it off because…reasons? The retinal specialist who diagnosed me did a very good job of communicating the state of medical research. When it came to pointing me toward services or resources on how to cope, however, he was not so forthcoming.
I eventually found the Lighthouse for the Blind in San Francisco. Once there, I found out that the services were not free. First, I had to get a caseworker from the state of California, who required a low-vision exam before agreeing to cover any services from the Lighthouse (my previous diagnosis, tests and all, didn’t count).
It was at the Lighthouse for the Blind that my eyes were first opened to the world of assistive technology and the realities of how people with vision loss and other disabilities participate in a world not designed for them.
Through the Lighthouse and through the larger community of people who work in accessibility, I learned about what I had to look forward to as my vision got worse. When it came to the people I met, many of whom have been blind since they were very young and have worse vision than I will ever have, I was blown away. They are extremely independent, good-natured, and very successful in whatever field they have chosen. (See, Parents? Nothing to worry about!) When it came to the technology, however, I was appalled.
The primary tool the blind use to access a computer or a smartphone is called a screen reader. It works by taking what’s displayed visually and reading it aloud as the user enters keyboard commands or gestures to move a cursor around the screen. While screen readers are powerful and provide baseline access, they are extremely difficult to learn and to use, requiring special training from organizations like the Lighthouse for the Blind and costing up to $1,800 for a single license.
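The cursor-based interaction model described above can be sketched in a few lines of code. This is purely illustrative (no real screen reader is this simple, and the element names are invented): the user never sees the layout, only hears one element at a time as the cursor steps through a flattened version of the interface.

```python
# Illustrative sketch, not a real screen reader: a cursor steps through a
# flattened list of UI elements and "speaks" each element's label and role,
# one at a time, in response to next/previous commands.

from dataclasses import dataclass

@dataclass
class Element:
    role: str    # e.g. "button", "heading", "text"
    label: str   # the text the screen reader announces

class ScreenReaderCursor:
    def __init__(self, elements):
        self.elements = elements
        self.index = 0

    def speak(self):
        """Announce the element under the cursor."""
        el = self.elements[self.index]
        return f"{el.label}, {el.role}"

    def next(self):
        """Move to the next element (clamped at the end) and announce it."""
        self.index = min(self.index + 1, len(self.elements) - 1)
        return self.speak()

    def previous(self):
        """Move to the previous element (clamped at the start) and announce it."""
        self.index = max(self.index - 1, 0)
        return self.speak()

# A hypothetical email inbox, as the screen reader perceives it:
page = [
    Element("heading", "Inbox"),
    Element("button", "Compose"),
    Element("text", "3 unread messages"),
]

cursor = ScreenReaderCursor(page)
print(cursor.speak())  # "Inbox, heading"
print(cursor.next())   # "Compose, button"
```

Notice what’s lost: the user must walk the page element by element to build a mental model of it, which is exactly why the learning curve is so steep.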
The screen reader is really the only option when it comes to using a computer or a smartphone, and this is a problem when you consider the demographics of who is going blind. The vast majority of people going blind are losing their vision to aging-related disorders, aka seniors. That means people, many of whom have successfully avoided new devices like smartphones and tablets, are forced to confront a technology that is difficult to use even for the technically proficient. Imagine your grandmother using email or Facebook. Now imagine her navigating those websites using only the keyboard.
It’s a scary thought for anyone to have to rely on this software to gain access to the apps and services we’ve come to use on a daily basis. I wouldn’t want that for anyone else. And I didn’t want it for myself.
So, I set out to build something better. How modest, right?
The Eureka Moment
From the beginning I hated the way screen readers work. Why are they designed the way they are? It makes no sense to present information visually and only then translate it into audio. All of the time and energy that goes into creating the perfect user experience for an app is wasted, or worse, actively degrades the experience for blind users.
What if we designed an experience for audio from the beginning? How would it work? Would it be easier to use, and more importantly, would app developers support this new experience? It’s hard enough to get support for existing accessibility tools let alone something new.
We started to research answers to these questions by doing some user testing with blind and low vision seniors at a local senior center. Initially, we just went through existing accessibility tools, like VoiceOver on the iPad, and showed them how to do common tasks like check email and read the newspaper.
Skills varied. People who had been blind for longer or who were familiar with assistive technology got up to speed faster than those who had lost their vision more recently. While all expressed a desire to regain the ability to perform activities they were once able to do, not everyone was optimistic that technology could help. This viewpoint was best summed up by one participant who said:
“These tools are great. I’m glad they exist, but I’m never going to use them. They are too complicated. I just want Siri to read me the news.”
This was a profound idea. While virtual assistants are limited in what they can do, the experience of using one is very intuitive. You ask your phone to do something and it does it. There is no translating actions into a visual user interface with its own controls and then translating again into an audible experience. They just work. Well, at least when they hear you correctly.
What if we could create a Siri-like conversational experience that grants the same level of access as a screen reader, but with the learning curve and intuitiveness of talking to an assistant?
From Project to Company
In 2014 I started a company, Conversant Labs, to take our ideas around conversational user interfaces for the blind and commercialize them. I had recently moved to Pittsburgh, home of much of the early research into speech recognition and conversational applications, and started building a team. I met Greg Nicholas at a local co-working space. At the time he was working at CMU on software to help teachers of kids with autism. He had, and still has, a deep interest in working on projects with social impact. We put together an advising team, including Sina Bahram, a blind Ph.D. in human-computer interaction and White House Champion of Change who focuses on creating accessible museum experiences. And finally, we got some funding from the local startup accelerator, AlphaLab.
Team assembled, we set out to build a voice-enabled shopping app for the blind. Despite screen reader support, shopping is one of those daily tasks that is still difficult for many in the blind community, and we thought it would be a good place to start testing a conversational user experience while potentially improving people’s quality of life at the same time.
It turns out there is a lot more to building a voice-based app than just hooking up speech recognition and defining a set of contexts to understand what someone is saying. A lot more. And it was this additional complexity that kept us busy through the launch of SayShopping in July of last year.
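To give a sense of what "defining a set of contexts" means, here is a minimal sketch of the easy part: once speech has been transcribed to text, the app matches it against intents that are only valid in certain conversational states. Everything here (the intent names, patterns, and contexts) is invented for illustration and is not SayShopping’s actual design.

```python
# Minimal sketch of context-aware intent matching for a voice app.
# After speech recognition produces text, the app decides what the user
# meant based on keyword patterns and the current conversational context.

import re

# Each intent lists regex patterns and the context it is valid in.
# A context of None means the intent is valid everywhere.
INTENTS = {
    "search":  {"patterns": [r"\b(find|search|look for)\b"], "context": "browsing"},
    "add":     {"patterns": [r"\b(add|buy|put)\b"],          "context": "browsing"},
    "confirm": {"patterns": [r"\b(yes|confirm|place)\b"],    "context": "checkout"},
    "cancel":  {"patterns": [r"\b(no|cancel|stop)\b"],       "context": None},
}

def match_intent(utterance, context):
    """Return the first intent whose pattern matches and whose context fits."""
    text = utterance.lower()
    for name, spec in INTENTS.items():
        if spec["context"] not in (None, context):
            continue  # intent not valid in the current context
        if any(re.search(p, text) for p in spec["patterns"]):
            return name
    return "unknown"

print(match_intent("Find me some coffee filters", "browsing"))  # search
print(match_intent("Yes, place the order", "checkout"))         # confirm
print(match_intent("Yes, place the order", "browsing"))         # unknown
```

The same words mean different things in different contexts ("yes" only confirms an order during checkout), which hints at why the real complexity lies everywhere else: error recovery, disambiguation, and keeping the conversation coherent.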
There are entire ecosystems of companies built around building mobile and web applications. From UI frameworks to payment services, much of the heavy lifting is done for you so app developers can focus on their specific product, be it a social network, a retail store, or a video game. All of these services have a specific use case in mind, namely a visual experience on mobile devices or laptops. Building an inherently non-visual app meant we were stuck rebuilding a lot of these tools. Unfortunately, there were some services we couldn’t recreate.
We wanted to make shopping as easy as talking to Siri, but we didn’t want to create our own retail store. We wanted to sell products from existing retailers that people already have a relationship with. And we thought it would be easy. Most, if not all, major retailers have affiliate programs enabling other people to sell their products and earn a small percentage of the sale price. It’s nothing new; it’s the model that’s the entire basis of The 4-Hour Workweek. What we didn’t take into account is that all of these programs require you to send the customer to the retailer’s website for checkout, which is of course entirely visual, and which we couldn’t easily convert into a conversational experience.
It took us a long time and many different approaches, including services built specifically for affiliate businesses, before we finally found a solution. After months of trying to work with retailers directly, we were able to start a conversation with Target, a company that takes accessibility very seriously. They gave us access to their product catalog and, more importantly, to their checkout APIs, so we could finally allow our users to complete a transaction with their voice. This access, however, came with its own costs. As seriously as they take accessibility, they (now) take security even more seriously. As a third party accessing Target’s services, we had to undergo an external security audit (aka a penetration test), which was as expensive as it was time-consuming. Once we passed, we were finally able to release the app, now called SayShopping, publicly, which we did at the National Federation of the Blind’s annual convention in 2015.
Today, we’re taking what we’ve learned from SayShopping and generalizing it, so that others can benefit from our work, and so we can move closer to a fully conversational alternative to using a computer. Until then, I’m continuing to rely on the tools that are currently available to me. In the office, I use a screen reader paired with a comically large monitor, zoomed in to the point where fully sighted people can read my email from across the room. I do all of this on a PC because that’s what you have to use to get the best screen reading experience. Unfortunately, I do all of my coding in Xcode, so I have a second monitor hooked up to a Mac for coding. I use a program called Synergy to share the mouse and keyboard between the two machines, and I have a 12-channel mixing board plus a microphone set up so I can wear a single pair of headphones and listen to the screen reader on both machines. Needless to say, a lot of work goes into maintaining as much of my productivity as possible as my vision continues to decline.
Outside the office, my vision is still functional enough that I can walk to work without too much trouble. I’m not allowed to jaywalk anymore, though, as I’ve had too many close calls not seeing cars coming (sorry, Mom). I can see people walking towards me most of the time, but I can’t make out their faces. This gives me a lot of anxiety in networking situations, where I don’t recognize people I should know, and I worry they are going to be offended. On the other hand, it’s a great excuse for when I don’t remember someone’s name. (Speaking of being socially awkward, coffee meetings usually start off with me asking a bunch of strangers if they are the person I’m meeting: “I’m sorry, are you Laura?” The worst is when their name is Laura, but they aren’t the person I’m supposed to be meeting.)
By the time I’m legally blind sometime in the next 3–5 years, I’m hoping to have created a better option for getting work done and generally using a computer. Solving the coffee meeting problem, unfortunately, is a little beyond the scope of our current company.
Vision for the future
While we set out to create a new and better way for the blind to use computers, so I could maintain my quality of life as my vision declines, we chose a conversational interface as the solution for an entirely different reason.
As devices become smaller and smaller and everything around us becomes internet enabled, and as virtual and augmented reality gain in popularity, traditional methods for using our devices are no longer going to cut it. We are going to have to design entirely new methods of using our devices and we think that voice-based interfaces have the potential to create a powerful and unified experience across all of these form-factors.
For the first time, accessible technology for the blind can drive innovation for everyone else, rather than having to play catch-up, and that’s pretty cool.
Chris Maury (@CMaury) is the founder of Conversant Labs, a company building voice-enabled applications for the blind and visually impaired. Chris was diagnosed with Stargardt’s macular degeneration in 2011. He is the co-organizer of the Pittsburgh Accessibility Meetup, a 200-member group discussing how to make the world around us more accessible to people across disabilities. Before moving to Pittsburgh in 2013, Chris was a product manager for Klout.com and Imageshack.com in the San Francisco Bay Area.
You can read more about us and our mission at conversantlabs.com.