Making the Web More Accessible Using Machine Learning
A look at the state of web accessibility today and how machine learning could help make a more accessible web for all.
Written by Laura Johnson
Machine learning now drives a huge number of day-to-day interactions on the web. When we use Netflix, Amazon, Microsoft, Google, or any of the big social media sites, we’re encountering machine learning (ML) algorithms that help provide a richer and more personalized user experience. Though we don’t consciously think about the ML powering our programs very often, it’s always there.
But if thinking about an algorithm-powered tool directing some portion of our everyday experiences is rare, it’s even more rare that we think about how these tools affect users with disabilities. Machine learning is being used to improve general consumer experiences all the time, but could it be doing more to improve specific experiences? Is the power of these tools, for example, being leveraged to improve the user experience for those using assistive technology? Is it improving automated accessibility testing tools? And if it isn’t, why not? What’s keeping us from harnessing ML for a greater good?
Web accessibility is an important topic to discuss and machine learning is one of the most exciting and talked about technology spaces today, which is why looking at how the two intersect, overlap, and impact one another is so important. We interviewed a number of experts in accessibility testing and advocacy — Sina Bahram, Denis Boudreau, Karl Groves, David MacDonald, Jared Smith, and Myplanet’s own Everett Zufelt — to find out what is happening now in terms of applying ML to web accessibility challenges, and to get a sense of what is on the horizon.
The experts we spoke with talked about possibilities and ideas for new algorithms and tools, which is part of what we’ll explore in this document. They also addressed web accessibility applications for many of the popular types of algorithms, some of which are already being integrated into assistive technology. Because the field is still so new, we won’t be discussing proven solutions in detail. Instead, we want to introduce the possibilities on the near- and longer-term horizons to stimulate discussion around what needs to happen to make the solutions a reality and spur action in that direction.
To help structure our discussion, we’ve broken this paper down into three sections: The Data Problem, which addresses the challenges surrounding the quality and quantity of data available to make these solutions work; Potential Applications of Existing Tools, which covers the application of current machine learning algorithms to accessibility challenges; and Possibilities for New Tools, which looks at a variety of future opportunities that could be unlocked by the development of new machine learning algorithms. We’ll begin with the data problem.
The Data Problem
There are countless opportunities for applying machine learning to current web accessibility challenges. And beyond the application of pre-trained algorithms, machine learning holds promise for accessibility through the application of entirely new types of algorithms. But there’s a problem: a lack of well-annotated data. And well-annotated data is the fuel for machine learning.
This is an issue that came up repeatedly when discussing possibilities with the experts, and one that Sina Bahram of Prime Access Consulting describes in detail.
“When a large company like Google wants to put together various models — for example language or computer vision models — they have a great deal of data to work with. We’re not talking millions of rows, we’re talking billions of rows. With that level of data, it’s straightforward to then use best practices within machine learning to get pretty good classifiers,” says Bahram.
But we can’t all be Google, which means we’re facing a big challenge.
“We don’t have a dataset of a billion webpages whose semantics are well-annotated,” he continues. “So the question becomes, if we’re not going to take a rather naïve data model approach where we have arbitrary input and transformation comes out, then what can we do? What are the types of things we can do to solve sub-problems within the space or to help build that data or to at least warn about issues without necessarily being able to fix them?”
Several of our experts pointed to the possibility of utilizing user observation to build the data relationships needed to fuel machine-based inferences. User observation could take the form of an ongoing, large-scale study of human-computer interactions, connecting the interactions, their context, why they happen, and the data result.
But as Jared Smith of WAVE points out, any study of user interactions would need to take into account the optimal end-user experience for users of various assistive technologies. Many screen reader users, for instance, might not be willing to have a machine track their activity to determine their usage patterns. (To compound the issue, we’d need different studies for different user groups.)
Karl Groves of Tenon discusses the possibility of using web analytics to gather user data, which can then be used to train machine learning algorithms. Again, we would need to know the user and their platform. The data that’s already being collected by Tenon or other rules-based testing tools can factor into training algorithms, with correlations drawn between Tenon data, machine-discoverable problems, and the analytics data that would give us clues as to how those issues impact the user.
There is also the option of using a pre-trained algorithm to help train a new algorithm. Perhaps a simulated user — a concept we’ll discuss a little later on in the AI Simulation section — could navigate through a website with the addition of screenshot or video analysis, in order to generate artificial user data.
In terms of data challenges, then, there are open questions that need to be resolved: Where does the user data come from? Would users be willing to have their usage tracked, or to flag webpages? Could we train a machine to simulate user interactions? Could we look at bounce rates from analytics to infer accessibility errors? There are a lot of possibilities that need to be explored to determine whether they are viable and what kind of data they will produce.
Rome wasn’t built in a day and we’re unlikely to have answers to all of these immediately, but they’re worth considering. For now, however, let’s leave the data problem and jump into some of the ideas for machine learning-based tools relevant to web accessibility.
Potential Applications of Existing Tools
AI-generated image descriptions have come a long way. In the beginning, image recognition algorithms gave descriptions of different components within an image, such as “two cats” or “body of water”. As the algorithms advanced, so did the descriptors: “two orange cats play with yarn”. Without context, however, even more detailed descriptions are often not particularly helpful. But that might be changing.
Facebook, for example, is now starting to contextualize image descriptions by whatever means they can, including user preferences, recent conversations and events related to the user. They’re calling these machine-generated image descriptions alternative text.
Several of our accessibility experts take exception to that nomenclature, arguing a machine can never truly guess the intent of a content author. However, they all agree that a computer-generated description is better than no description. And captioning tools like this, if incorporated with automated accessibility testing, can assist us in generating alternative text for an image or act as a fallback for assistive technology when there is no alt text available.
There is also growing recognition that the current image captioning systems are not particularly well-aligned with the needs of blind and low-vision users. To this end, Microsoft with its AI for Accessibility program has partnered with the University of Texas at Austin on the Microsoft Ability Initiative, the aim of which is to create a public data resource specifically geared to accessibility needs that can be used to power AI captioning tools. This dataset will be an invaluable resource for integrating image recognition for accessibility.
Every automated accessibility testing tool currently on the market tests whether an image provides alternative text or not. By integrating image recognition technology, these tools could pair that test with machine-generated descriptions, which would allow us to do a couple of different, very powerful things:
- Provide suggestions for alt text in the generated report
- Compare those suggestions to any pre-existing alt text and flag it if the accuracy of the alt text to the image seems low (what’s known as a low “certainty equivalent”)
Some automated accessibility testing tools already make use of a “certainty” scale for evaluating test results. Tenon, for example, already uses a rules-based method for determining whether an alt attribute is accurate (in other words, does the content make sense in context?). In this case, the image recognition algorithm could be called only when the certainty of accuracy is low, which would speed up the automated test by making fewer calls to the algorithm.
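To make the idea concrete, here is a minimal sketch of that gating pattern. Everything in it is hypothetical: the certainty heuristics are invented for illustration, and `caption_fn` stands in for a real ML captioning service such as the ones described above.

```python
# Illustrative sketch only: call an expensive image-captioning model
# only when a cheap rules-based "certainty" score is low, in the spirit
# of the certainty-scale approach described above. The heuristics and
# thresholds here are invented for this example.

SUSPECT_ALTS = {"", "image", "img", "photo", "picture", "graphic", "spacer"}

def alt_certainty(alt_text):
    """Crude rules-based confidence (0..1) that alt text is meaningful."""
    text = alt_text.strip().lower()
    if text in SUSPECT_ALTS:
        return 0.0
    if text.endswith((".jpg", ".jpeg", ".png", ".gif", ".svg")):
        return 0.1  # looks like a filename, not a description
    if len(text.split()) < 2:
        return 0.4  # single words rarely describe an image well
    return 0.9

def review_alt(alt_text, caption_fn, threshold=0.5):
    """Call the captioning model only when rules-based certainty is low."""
    if alt_certainty(alt_text) >= threshold:
        return {"flagged": False, "suggestion": None}
    return {"flagged": True, "suggestion": caption_fn()}
```

With this shape, an alt attribute like “IMG_0042.jpg” would be flagged and get a machine-generated suggestion attached to the report, while a descriptive alt passes without spending a model call at all.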
There are assistive technologies that already incorporate image recognition — JAWS, for example, with its Picture Smart image recognition feature. This feature lets users submit an image to be analysed by “various services (such as Microsoft and Google)” and access the results that are returned.
Denis Boudreau of Deque believes that combining image recognition with a content summarization algorithm will eventually allow machine learning to give context to images within a page, giving a better approximation of the intent of the author.
“Any automated tool out there right now will be able to tell you if an image has an alt attribute. But no tool will be able to tell you if the alt attribute value is relevant within the context in which it’s being used,” says Boudreau.
Humans are required for that type of analysis today, but in the next few years, as image recognition improves and as abstractive summarization of text becomes more reliable as well, it’s easy to imagine how that will change.
“We could parse through text really quickly,” Boudreau adds, “and see whether the essence of that text carries enough value to provide a context for that image within that page.”
There’s clearly a lot of potential in combining existing algorithms with assistive technology. As the algorithms improve, so will the accuracy of the image descriptions. In automated accessibility testing, image recognition could make the laborious jobs of checking alt text accuracy and of manual remediation faster and easier.
And what about applying image recognition to charts and graphs? Graph accessibility is a major issue for low-vision users. David MacDonald of CanAdapt discusses getting contrast requirements for graphics into WCAG 2.1, and the possibilities for using machine learning to aid in the accessibility of graphics.
“Take a bar graph: if the bars are not in proper contrast with each other and there is no separation between the bars, it’s often unreadable,” says MacDonald. “The worst example would be a pie chart where you can’t distinguish between two slices of the pie.”
For low-vision or colour-blind users, poor contrast can render a graph meaningless. And it’s an issue that may impact more than 5% of the general population.
It’s worth examining ML use cases for colour contrast. This type of contrast analysis on an image is certainly possible using machine learning. But the real improvements will come when the technologies are combined: when we integrate this analysis with automated accessibility testing tools or, going a step beyond that, when we use machine learning to extract the data from graphics for screen reader users.
AI Simulation of User Navigation
AI is commonly used in video games to simulate human interaction. This technique often captures screenshots or video and uses machine learning to classify what is taking place on screen as the AI performs actions. Classifying raw screenshots in this way typically relies on deep learning, neural networks with many processing layers. Elements of this technique may be adaptable for use in automated accessibility testing.
There are accessibility errors, particularly in the area of keyboard accessibility, that are difficult to test for using automated tools. Consider this example: if a modal element opens and receives visual focus but doesn’t receive programmatic focus, a keyboard-only user will be unable to interact with the modal or close it. This is a common example of a keyboard trap.
Jared Smith says that this type of issue generally requires human vision to find in testing. But consider the use of AI to navigate through webpages using keyboard-only navigation, where screenshots are used to give feedback on what is happening on screen. The machine could identify if it was no longer able to navigate, if navigation was extremely inefficient, or if it was able to navigate but nothing was taking place on screen (as would be the case, for example, when a user is able to navigate through a hamburger menu even though the menu is collapsed). Using an AI-based observation of the webpage like the one described could aid in flagging the problem.
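As a toy model of what that simulated user might check, consider the modal example above. A real tool would drive an actual browser and analyse screenshots; in this sketch the “page” is reduced to an ordered list of focusable element IDs (all names are hypothetical), which is enough to show the detection logic.

```python
# Toy sketch of a simulated keyboard-only user. A real implementation
# would drive a browser and analyse screenshots; here the page is just
# the sequence of elements reached by repeatedly pressing Tab.

def detect_focus_trap(tab_order, modal_elements, max_tabs=100):
    """Return True if repeated tabbing never reaches the open modal."""
    for presses in range(max_tabs):
        if tab_order[presses % len(tab_order)] in modal_elements:
            return False  # the modal is keyboard-reachable
    return True  # modal is open on screen, but unreachable by keyboard

# A modal has opened visually, but programmatic focus stays in the
# underlying page, so the tab order never includes the modal's controls:
page_tab_order = ["nav-link", "search-field", "open-modal-button"]
modal_controls = {"modal-close", "modal-ok"}
trapped = detect_focus_trap(page_tab_order, modal_controls)
```

Here `trapped` comes back `True`: the simulated user tabbed many times without ever landing inside the modal, which is exactly the signal that would let a tool flag this page for human review.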
Returning to the issue of colour contrast, Karl Groves notes this is another issue that’s difficult to detect without visual testing — which makes it another potential candidate for simulated user interaction and screenshot analysis.
“You would think colour contrast is easy to determine,” says Groves. “Measure the foreground and background colour, then subject those findings to a mathematical algorithm that measures the contrast.” And while that does sound easy and straightforward, nothing in life is ever that simple.
“What if this item is absolutely positioned?” he adds. “Now you have no idea on what area the content is absolutely positioned or what that background colour is. If you add to that CSS3 transitions, gradients, and animation, you can throw any assumptions about contrast out the door.”
So. Not that simple after all.
However, if we had AI simulating user experience and comparing it with screenshots that showed the contrast of elements on top of various backgrounds, it could potentially flag any major contrast issues.
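The “mathematical algorithm” Groves mentions is defined in WCAG 2.x: linearise each sRGB channel, compute relative luminance, then take the ratio of the lighter to the darker luminance. A minimal sketch follows; as Groves points out, the genuinely hard part is knowing which two pixel colours to feed it, which is where screenshot analysis would come in.

```python
# WCAG 2.x contrast ratio between two 8-bit sRGB colours.

def channel_luminance(c8):
    """Linearise one 8-bit sRGB channel per the WCAG formula."""
    c = c8 / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb):
    r, g, b = (channel_luminance(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(rgb1, rgb2):
    """Ratio of lighter to darker luminance, from 1:1 up to 21:1."""
    lighter, darker = sorted(
        (relative_luminance(rgb1), relative_luminance(rgb2)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)
```

Black on white yields the maximum ratio of 21:1, and WCAG AA requires at least 4.5:1 for normal text; the screenshot-analysis step Groves describes would be responsible for sampling the right foreground and background pixels once positioning, gradients, and animation are resolved.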
Once we start thinking through these frequently occurring and obvious challenges, it’s easy to imagine that there are a host of other applications for simulating the user experience of navigating webpages — including generating annotated code.
Content Simplification
Content simplification has the potential to restructure and re-present content in ways that better serve the needs of people with cognitive and learning disabilities. There are tools doing this already, such as IBM’s Content Clarifier, which can take any piece of content and simplify it, but there’s more that can be done.
Jared Smith suggests this type of content transformation could be offered the same way that language translation is offered. “Could you take that same content and translate it from simplified English to not-English-at-all but entirely graph-based representations? Could you chunk that content into smaller, more digestible pieces that a person with a learning disability could then consume as content in the small, little chunks that they need? That’s pretty exciting.”
Everett Zufelt expands on this idea and its utility, speaking about the value of such a tool and its ability to extend to other users as well. “Machine learning can synthesize the entire document for you down into its salient points,” he notes, “which by the way isn’t just good for persons with disabilities — it’s good for lots of people who might want to read the five sentence summary of the 1200 word article.”
The applicability of this technology is vast and, as always, highlights how a more accessible web experience for some users is a better web experience for all users. Used in conjunction with other tools, content simplification could be combined with simplified navigation and incorporated into any number of scenarios where the user chooses to read summary text ahead of, or instead of, any chunk of content.
Possibilities for New Tools
Today, accessibility testing tools can use rules-based methods to point at specific WCAG issues (for example, does an image have an alt attribute or not?). But these tools can detect only 25% of total errors; the other 75% require manual testing. A machine learning-based tool would behave differently, showing probabilities of classes of errors based on patterns it detects in the code, ideally increasing the percentage covered through automation.
A web accessibility expert can often predict the presence of problems that may seem unrelated to an issue at hand during testing. For example, a certain type of issue within heading structure — a page that doesn’t have a first-level heading, for instance — suggests that other heading structural issues may be present.
Without extensive testing, we won’t know how effective a testing tool trained using ML might be, but we can safely assume it would leverage the intuitive heuristics that experts employ today.
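One way to picture the output of such a tool is a probability score built from cheap, machine-detectable features. The sketch below is purely illustrative: the features and hand-set weights are invented for this example, whereas a real tool would learn them from a large corpus of annotated pages.

```python
import math

# Illustrative only: a hand-weighted logistic score standing in for a
# trained classifier. Features and weights are invented; a real tool
# would learn them from annotated pages.

WEIGHTS = {"missing_h1": 1.6, "skipped_heading_level": 1.2, "empty_links": 0.9}
BIAS = -2.0

def issue_likelihood(features):
    """Probability-style score that a page hides further structural issues."""
    z = BIAS + sum(WEIGHTS[name] for name, present in features.items() if present)
    return 1 / (1 + math.exp(-z))
```

A page with no first-level heading and skipped heading levels would score far higher than a cleanly structured one, mirroring the expert intuition that one heading problem predicts others; the score directs a human tester’s attention rather than asserting a definite error.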
Sina Bahram talked about an experiment he conducted in this area:
One of the projects we did 8+ years ago at NC State was to build a classifier to try to teach it, given an arbitrary web page, how to classify it as having accessibility issues or not. What was interesting was that it did seem to point out and quickly gravitate towards some very common features that were shared amongst pages that were indicative of other accessibility issues being on the page.
If the dataset were expanded dramatically, those classes of problems could become more refined. Even a tool that only suggests the likelihood of certain otherwise non-detectable errors would be helpful.
Of course, because machine learning methods can reveal obscure relationships that might not seem intuitive to humans — but are nonetheless valuable — an artificially-intelligent accessibility testing tool could start to detect patterns that automated tools and even humans can’t. These patterns would have the potential to inform a level and nuance of issue prediction impossible for a human to achieve.
Applications for Maker & Expert Workflows
What are the uses for such a tool? Bahram believes that as part of continuous integration or a particular Git flow, the tool could be incredibly valuable.
And within existing testing tools, machine learning could reinforce confidence and certainty when it comes to issue detection and classification. As Everett Zufelt points out, a tool that articulates issue likelihood and confidence in its classification would help direct and expedite expert review and remediation.
“Today, rulesets can’t say ‘I’m fairly confident that your mega menu is not accessible’. You have to know to look for that yourself,” he notes. “It’s not going to come up on the WAVE evaluator, it’s not going to say ‘I think you have a carousel that rotates on the homepage, and there is high confidence that it fails WCAG in these three specific ways’.”
But if it could?
“That adds a lot of value, because it now becomes an educational tool rather than just a simple checklist of things that are wrong,” adds Zufelt.
Applications for Non-Maker, Non-Expert Workflows
A high-level indicator would benefit non-maker and non-expert workflows as well. Karl Groves believes that such a tool could warn end users of potential issues on a website. As a browser extension, the tool could display a warning: “This looks like the type of website that will give you problems with forms.” If configurable to fit the needs of a particular user, the value of this type of mechanism would be significant.
Project owners or executives that drive technical decision making in organizations would also benefit.
“We can use it as a tool for selecting technologies,” says Groves. “In the Federal government in the United States, people pay a lot of money for commercial, off-the-shelf products. Imagine that their decision is between two CMS and this hypothetical tool has revealed that one of them produces websites that have certain accessibility risk patterns associated with them. They can then decide to accept those risks or to avoid them.” Informed decision-making can save countless dollars and headaches for both providers and end users.
Groves goes on to extol another potential gain: “I think that one of the goals, even if it’s a secondary goal, would be to drive innovation. If you expose a company’s problems, then they’re going to want to fix their problems. But if it’s not known, then it’s not a problem for them.”
(Note that Google and other search engines already tend to rank more accessible pages higher. A more accessible website is easier for a machine to search and understand, further reinforcing the business value of accessibility and inclusive design.)
Evidently, a high-level indicator like this has the promise of leading to a lot of helpful applications down the line that can reduce overhead and guide better decision making.
Simplifying & Replacing Markup
If you examine a modern website with proper semantic markup, it usually has a header. Within the header, you’ll likely find a primary navigation and secondary navigation. The site will have a main region that maps to the <main> tag, if people have done their job correctly. There will be a footer. There’s <article>, <aside>, all of the HTML5 semantic regions, plus the various associated elements and landmarks within those regions.
And in all that, there are two problems that screen reader users often face:
- In websites that lack good code structure, it’s hard for a screen reader user to orient themselves within the site’s content.
- Items that are grouped together visually are often not labeled as a group for screen reader users, making it difficult to understand which element belongs to which group.
Sina Bahram has proposed addressing poor markup by employing ML-driven, segmentation-based approaches to label the different regions of a page. Beyond labeling, the system would actually extract the content and rearrange the markup in a way that is not only accessible, but ideal for screen reader users in addition to other types of users (low vision users, for instance):
We could take a webpage, hit a button, and make it appear as if the developer meant to have different markup there. Especially if we’re relaxing the constraint that we don’t care if we affect the way the page looks visually, because we’re doing it for a screen reader user. Machine learning could come into play in a few different ways: Number 1, by identifying those different regions. Number 2, by identifying modifications that could be made to take advantage of some enhancement that the user needs, for example taking the text and doubling the size. And then number 3, by presenting the overall hierarchy in an accessible way to the user.
It’s important to note that such a tool would not magically fix all accessibility issues on the web. The correct solution is still to write proper semantic markup in the first place and creators must continue to author proper, accessible websites. But a solution like Bahram’s suggests that we can provide tooling to persons with disabilities to get to what they need, even when semantic markup is lacking.
Everett Zufelt points out that already, if a document is designed semantically, browsers like Safari will go into reader mode and cut out a lot of the extra markup. But it would be extremely helpful for users to have the ability to jump quickly and easily to the content they care about on the page, regardless of the web developer’s mistakes or oversights.
Bahram and Zufelt also talked about the benefits of grouping data. In a scenario where information is presented in list form, a screen reader user will struggle to determine which repeating calls-to-action or interactive elements apply to which items in the list if a developer has not done the work of creating an aria-describedby association. For example, a list of articles with corresponding links to share each article can be extremely disorienting. There are countless examples like this one.
Moreover, in the age of responsive design, developers often simulate tables using nested divs and spans to enable more elegant transposition into list form on a mobile device. But this means that, despite how they look on various devices, screen reader users don’t have table semantics anymore. Because the developer has done away with a true HTML table, screen reader users can get lost in a long list of decontextualized data that lacks meaningful, associated column headings.
If an AI-based browser extension, taught via machine learning, grouped these items together and labeled them, it would be enormously helpful for screen reader users. This concept could also be applied to rotating carousels, where the content from slides could be extracted and stacked. Screen reader users would then potentially have different options for navigating through the slide content, including skipping it entirely.
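The grouping idea can be sketched in miniature. In this hypothetical example, the page has already been linearised into the ordered (kind, text) pairs a screen reader would encounter, and each repeated call-to-action is associated with the nearest preceding heading, recreating the context an aria-describedby association would normally supply.

```python
# Hypothetical sketch: associate each repeating call-to-action with the
# nearest preceding heading, supplying the grouping context a developer
# would normally provide via aria-describedby.

def group_actions(flat_items):
    """flat_items: ordered (kind, text) pairs, as linearised for a screen reader."""
    groups, current_heading = [], None
    for kind, text in flat_items:
        if kind == "heading":
            current_heading = text
        elif kind == "action":
            groups.append({"action": text, "describes": current_heading})
    return groups

# A list of articles, each followed by an identical "Share" link:
page = [
    ("heading", "Article one"), ("action", "Share"),
    ("heading", "Article two"), ("action", "Share"),
]
```

Running `group_actions(page)` turns two indistinguishable “Share” links into “Share, describes Article one” and “Share, describes Article two”. In practice the hard part, and where machine learning would earn its keep, is inferring the right associations from visual proximity when the markup gives no hint at all.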
Broadly speaking, if we can reorganize pages in a way that is easily navigable or easier to see, users of diverse abilities will be able to navigate web content easily regardless of how semantically well-structured a document is. But perhaps more importantly, this type of real-time, user-specific accessibility remediation has the potential to level the playing field in the context of assistive technologies (AT) altogether.
Currently, many users rely on outdated assistive technology, which isn’t compatible with the newest accessibility advancements, including ARIA and HTML5 elements. This means many accessibility-minded organizations and companies must straddle the line between being both cutting edge and backwards compatible in order to remain accessible. This backwards compatibility adds significant cost, and for organizations without the mandate or budget, tends to be deprioritized. In turn, this limits exposure of content, advantaging organizations with deeper pockets.
But perhaps the more problematic asymmetry (net neutrality concerns aside) lies with users. Users with outdated AT are often socio-economically disadvantaged and assistive technology can be expensive.
In Ontario, where Myplanet is headquartered, the government subsidizes the cost of a new computer plus AT for persons with certain disabilities every 5 years. In many other parts of the world, however, similar programs don’t exist. Having the ability to simplify and replace web markup could break down the barriers introduced by outdated technologies, making the web truly more accessible and empowering businesses to provide high-calibre experiences for all.
Artificial intelligence and the advancements that machine learning affords have massive potential in the accessibility arena. There’s still a lot of research to be done, of course, but the possibilities are promising and need to be explored.
The success of some of the solutions we’ve discussed will depend on first clearing the hurdle of acquiring user data. Other ideas, such as integration of existing machine learning tools with accessibility evaluators, don’t rely on user data and can begin today.
Our experts suggested at several points that the greatest value to the community would come from making new tools like these open source. When developing tools such as browser extensions for end users with physical and cognitive disabilities, the fact that a large percentage of those users are socio-economically disadvantaged means the end product or service would ideally be free. Paying for a subscription service is often not feasible.
The ideas and solutions we’ve discussed are never going to replace the need for real humans to write standards for accessibility, to do manual testing, and to educate developers and content editors on how to create accessible and semantic code. Accessibility is increasingly part of the language of web development and that must continue. But even imperfect solutions that expedite our journey to a fully accessible web are worth investment. The benefits are clear for people in many different roles — from those in accessibility testing to developers, project managers, and ultimately end users.
Denis Boudreau sums it up nicely:
What I’m looking forward to is that moment when these algorithms are good enough or provide good enough quality output that the definition of our job goes from providing information to the algorithms to training or coaching them, so they do these things in a better way. And as we do that, as the algorithms pick up on what us humans consider to be quality work in that particular area, then the content becomes more semantic and more accessible. And all of a sudden, people with disabilities begin to benefit from all of this.
We thank our experts for their excellent input and we look forward to further discussions and collaboration with the community. It will be fun to see where all of this takes us.