Accessibility APIs and Enablement

Semantically Architecting Applications for Runtime A11y

The author developing software on his laptop computer in HomeAway’s Austin office. Photo by Kayla Chance.

Introduction

One might think that, in order to justify resource allocation to the accessibility enablement of services or content, we first need to detect the presence of assistive technologies (ATs) consuming the content. We could then size that audience and prioritize resource allocation based on detection analytics.

The detection of ATs, like screen readers, is an interesting problem, but it diverts our attention in several ways from accessibility enablement and from the significant problems inaccessible content can pose. First, there is no deterministic approach to detecting AT usage consistently across user agents, and whatever detection is achieved may mislead implementation efforts. Moreover, failing to fully enable accessibility can be costly and can lead to non-compliant or even legally questionable solutions. It is best to view accessibility as one facet of universal usability: the ability of persons to access applications and information in any situation, regardless of native abilities or environmental limitations. I want to use this post to discuss how accessibility (a11y) enablement is actually implemented in platforms and by developers and content authors.

Accessible Runtime Platforms

Historically, we have defined the accessibility of information or an application as the degree to which it is usable by persons with disabilities using ATs. Applications or content might be consumed on a desktop, in a web browser on a desktop or mobile device, or through an application native to a mobile device. To keep things simple, I will define a platform as a runtime environment for an application: the environment in which the application is executed. Runtime platforms typically include APIs and linkable libraries, application protocols (e.g. for event handling, I/O, and scheduling), and presentation frameworks. In this way, the Microsoft Windows 10 Desktop, a web browser, Java Standard Edition (SE), and iOS 10 all count as runtime platforms.

Before tackling the question of what it means to enable an application for accessibility, we need to consider whether or not a runtime platform even supports accessibility enablement. This is a legitimate question: the first several versions of the iPhone, for instance, supported accessibility enablement neither for native applications nor for content in the Safari web browser; Apple did not introduce accessibility support until the iPhone 3GS. Likewise, the Microsoft Windows operating system did not include accessibility support until the introduction of the Microsoft Active Accessibility (MSAA) library with Windows 98. Even this was rudimentary support, since full application integration did not happen until several years later; my first experience with Windows 98 did not permit me, a totally blind individual, to fill out the simplest of web forms in Internet Explorer.

There are several requirements for us to say that a runtime platform supports a11y enablement. Such accessible platforms must provide:

  1. a way of exporting additional semantic information about an application;
  2. a manner in which to observe state change in the application;
  3. device-independent access to the application.

The primary aim of an accessibility API boils down to its ability to serve these three purposes. I will present several examples later in this post. I will first, though, spend a bit of time explaining the first goal above, since it is by far the most important and the most difficult.

Note: For a deeper discussion of the motivation for these terms and a deeper explanation of some of the other concepts presented here, see Accessibility requirements for systems design to accommodate users with vision impairments, an article by my former colleagues at IBM.

Eyeglasses clarifying blurry computer screens behind them. Photo by Kevin Ku on Unsplash.

Semantic Information vs. Presentation

What does it mean to export additional semantic information about an application? You are currently engaging in one of the simpler forms of semantic information consumption: reading. Although complex from a neurological perspective, comprehending the words you are currently viewing, and forming more complicated concepts and complete thoughts from the sentences they make up, derives from the words on the page alone. There is little or no gap between the presentation of the information being consumed and its semantic import (with the possible exception, say, of a font change used for emphasis). Hence, achieving cognitive parity between the sighted reader and the screen reader user is straightforward in this case, a mere exercise in text-to-speech or in transforming the character output for a braille display.

Next, consider a web form, say, for completing a purchase. The form asks for the buyer’s name, billing and shipping address, and credit card information. The form presents labels for indicating the purpose of each field in which text is to be entered, asterisks or color-coding for indicating required fields, groupings of fields to indicate related information, and individual widgets like buttons and drop-down lists to guide the user in how to interact with and complete the form. The semantic import of the form is primarily a series of questions for the buyer that demand certain types of answers in a certain format, and the presentation (along with the words on the page) is a key component in conveying these semantic components. Conveying the semantic information included in the presentation layer to a blind or visually impaired user, then, is a much more demanding task than a simple exercise in text-to-speech transformation.

Now, consider a final example that exhibits the tight coupling that can exist between semantic content and visual presentation. A colleague recently sent my team a Datadog alert indicating that “queries to one of our services spiked ~3x at 2pm,” accompanied by a link (presumably to a graph depicting the spike). I confirmed with the colleague that the graph showed time over a four-hour period (given the query parameters in the link) along the x-axis and the number of requests hitting the node along the y-axis, with a curve drawn from left to right showing the progress of the number of requests made to the node over that interval.

The semantic content of such a graph is entirely derived from the pixels that make up the image. Viewing the graph, an engineer could instantly see a spike in requests at a particular time, surmise the overall average requests per time unit for this node, and determine the relative intensity of other spikes (if any) leading up to the spike in question. Indeed, even the most fundamental semantics of the image are immediately obvious to a sighted engineer: that it is a graph relating time and number of requests, and the units by which time and request count are measured. Here, a picture really is worth a thousand words or, as the famous media theorist Marshall McLuhan put it (albeit in a slightly different context), the medium is the message. The only way to achieve cognitive parity between the blind and sighted reader is to have the latter describe the graph to the former.

MVC and the Accessible Object Model

In the cases of both the web form and the graph, the aim is to export to the screen reader user the semantic information conveyed by the presentation. This is, of course, not a new problem. Indeed, rendering a view of data and a means by which to interact with it is what these applications are already doing. The view of the form in the browser and the image of the graph are just two of the many views of the data that might be provided.

This way of thinking about data and our interactions with it is so prevalent, in fact, that an entire architectural approach has been built around the problem, the Model-View-Controller (MVC) architecture. In this paradigm, the model embodies the data, the state of the data, and the operations that can be performed on the data, while the view is the rendering of the data to the end user. The controller sits between the model and the view, adjusting the view when the model is modified and the model when the user interacts with the view.

A11y enablement can be understood in terms of the MVC architecture. The AT (the screen reader, for example) has the goal of rendering a view to the user that is consumable by that user. This view consists primarily of speech that conveys information about the application’s model and state to the user. Often, such a view will include contextual information about the application that might be readily obvious in the visual presentation, such as a list of form controls, all of the links on the page in alphabetical order, or the current row and column position as the user navigates a table. The AT also acts as the controller: it intercepts interactions from the user (e.g. keyboard input) and updates the model accordingly, and it tracks updates to the model so that it can alert the user to them within the context of the view.

The model in this MVC understanding of accessibility enablement is the accessible object model, a model that is the primary artifact of an accessibility API. In most cases, the accessible object model is tied to the UI object model that is being employed by the application. For example, in a web application, the accessible object model is tied to the live document object model (DOM) within the browser. Within a Java Swing application, it is coupled to the hierarchy of javax.swing.JComponent objects that comprise the Swing UI.
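
To make this coupling concrete, here is a minimal sketch in Java Swing that builds a tiny, hypothetical checkout fragment and walks the accessible object model paralleling its component hierarchy. The class name, frame, and field labels are my own inventions; getAccessibleContext, getAccessibleChildrenCount, and getAccessibleChild are the actual javax.accessibility calls.

```java
import javax.accessibility.Accessible;
import javax.accessibility.AccessibleContext;
import javax.swing.*;

public class AccessibleTreeDump {

    // Recursively print each node of the accessible object model that
    // parallels the Swing component hierarchy.
    static void dump(Accessible accessible, int depth) {
        AccessibleContext ctx = accessible.getAccessibleContext();
        System.out.printf("%s%s [%s]%n",
                "  ".repeat(depth),
                ctx.getAccessibleName(),   // may be null if nothing was exported
                ctx.getAccessibleRole());  // e.g. "push button", "text", "panel"
        for (int i = 0; i < ctx.getAccessibleChildrenCount(); i++) {
            dump(ctx.getAccessibleChild(i), depth + 1);
        }
    }

    public static void main(String[] args) {
        // A hypothetical checkout form fragment.
        JPanel form = new JPanel();
        form.add(new JLabel("Name"));
        form.add(new JTextField(20));
        form.add(new JButton("Place order"));

        JFrame frame = new JFrame("Checkout");
        frame.add(form);

        dump(frame, 0); // JFrame implements Accessible, as do the standard Swing components
    }
}
```

Running this prints not only the label, text field, and button but also the intermediate containers (root pane, layered pane, content pane), because the accessible tree mirrors the UI component tree.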

It is somewhat unfortunate that this coupling between the accessible object model and the UI component model exists. After all, nothing about the view rendered by an AT dictates that it be derived from the same set of components and interactions as traditional UI models. However, there are two good reasons for this coupling. First, these UI models are familiar to developers, and it is usually straightforward to move programmatically between the two models, ensuring the characteristics of the visual and alternative presentations are similar. Second, it allows the AT user to have an experience of the application or content similar to that of her sighted counterparts, offering common ground for an understanding of the application.

Some Sample Accessible Object Models

  1. One of the earliest accessible object models was developed by Microsoft as part of Microsoft Active Accessibility (MSAA) for the Windows 98 platform. Users of the .Net and earlier COM or MFC frameworks used the IAccessible object to retrieve or set additional semantic properties for alternate presentations for ATs.
  2. With the advent of the Java programming language and, later, the so-called Java 2 releases, which included the Java Swing UI framework, a much more sophisticated accessible object model became available. Indeed, each instance of a javax.swing.JComponent in the Swing UI hierarchy provides the getAccessibleContext method, which returns an AccessibleContext object. This is the object used for exporting additional semantic information to an AT in the world of Swing (as illustrated in the sketch above).
  3. On the Linux platform, applications can implement the Accessibility Toolkit (ATK) in which the primary member of the accessible object model is the AtkObject.
  4. To supplement MSAA, the Linux Foundation, Microsoft, IBM, and a number of other organizations worked to create the IAccessible2 interface, implemented primarily on the Windows platform to help it gain parity with accessible object models like those just enumerated. This effort was originally conceived to support accessibility enablement of the Open Document Format (ODF).

This is obviously not an exhaustive list, but reviewing the documentation referenced above should reveal a familiar way of modeling application data and the operations permitted on it. The key difference is that the aim of the exposed model is to aid ATs in rendering an alternative presentation for the user with a disability.

The author working at his desk at HomeAway Austin. The desk is equipped with monitor arms that hold no monitors. Photo by Kayla Chance.

Common Properties

Every accessible object that is a part of the model revealed by an accessibility API has, at minimum, the following properties:

  • accessibleName — a short description or label for a control. The accessibleName property is often the label on a button, the text for a checkbox, or the label associated with a drop-down or text field.
  • accessibleRole — a description of the purpose or role of the component in the presentation. This property is one of the most important of those presented by the accessible object, informing the AT how to interact with and present the widget. Its value is often simply the type of control having focus, e.g. ‘button’, ‘text field’, ‘tree’, ‘table’, ‘calendar’. It is also one of the most difficult properties to consume, as most ATs only include implementations for presenting and interacting with a small set of roles.
  • accessibleState — the state of a control, e.g. ‘selected’, ‘checked’, ‘activated’, ‘pressed’.
  • accessibleValue — the numeric value of a control, e.g. the current value of a slider.
  • accessibleParent/accessibleChildren — the parent object/child objects of this object. Like many data models, the accessible object model is an object graph. Among other purposes, this allows semantic information regarding the relationship among objects to be provided to the AT.
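
As a brief illustration in Swing, a content author (or the toolkit itself) exports these common properties through a component's AccessibleContext. The checkbox and its wording below are hypothetical, but setAccessibleName, setAccessibleDescription, getAccessibleRole, and getAccessibleStateSet are the real javax.accessibility methods an AT ultimately reads.

```java
import javax.accessibility.AccessibleContext;
import javax.accessibility.AccessibleState;
import javax.swing.JCheckBox;

public class CommonProperties {
    public static void main(String[] args) {
        JCheckBox optIn = new JCheckBox("Email me my receipt"); // hypothetical control
        AccessibleContext ctx = optIn.getAccessibleContext();

        // accessibleName / accessibleDescription: a short label and a longer hint for the AT.
        ctx.setAccessibleName("Email me my receipt");
        ctx.setAccessibleDescription("When checked, a copy of the receipt is emailed to the buyer.");

        // accessibleRole: tells the AT what kind of widget this is ("check box").
        System.out.println("role:  " + ctx.getAccessibleRole());

        // accessibleState: a set of states such as enabled, focusable, and checked.
        optIn.setSelected(true);
        System.out.println("state: " + ctx.getAccessibleStateSet());
        System.out.println("checked? "
                + ctx.getAccessibleStateSet().contains(AccessibleState.CHECKED));
    }
}
```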

Specialized Model Members

For many types of components, the properties above may provide a description sufficient for an AT to render an alternative presentation of that component. Nonetheless, more sophisticated controls require their own mode of presentation and interaction. For example, a table must not only have a name and a role of ‘table’, but must also provide a way to navigate by cell, row, or column, to distinguish row and column headers from standard data cells, and to associate header values with cell values as the table is being navigated (see the sketch following this list). Some other specialized components within the accessible object models just listed include:

  • accessibleTree — for navigating tree nodes, managing collapsing and expanding nodes, and querying the depth
  • accessibleList — for navigating members and querying state or position of a member
  • accessibleMonitor/ProgressMonitor — for reporting on progress
  • accessibleText — for providing information about styling, cut/copy/paste functionality, and querying position. Indeed, it is just such an interface that is providing my AT with enough information for me to write and edit this document within the wiki editor.
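
To illustrate the table case mentioned above, Swing's JTable exposes an AccessibleTable through its AccessibleContext. The order data here is hypothetical, and the exact string reported for a cell depends on the cell renderer, but getAccessibleTable, the row/column counts, and getAccessibleAt are part of the actual specialized interface.

```java
import javax.accessibility.Accessible;
import javax.accessibility.AccessibleTable;
import javax.swing.JTable;

public class TableSemantics {
    public static void main(String[] args) {
        // A hypothetical order summary.
        String[] headers = {"Item", "Qty", "Price"};
        Object[][] rows = {
            {"Beach house, 3 nights", 1, "$540"},
            {"Cleaning fee", 1, "$75"}
        };
        JTable table = new JTable(rows, headers);

        // The specialized accessible interface for two-dimensional navigation.
        AccessibleTable at = table.getAccessibleContext().getAccessibleTable();
        System.out.println("rows: " + at.getAccessibleRowCount()
                + ", columns: " + at.getAccessibleColumnCount());

        // An AT walks cells with getAccessibleAt as the user navigates the grid.
        Accessible cell = at.getAccessibleAt(1, 2);
        System.out.println("cell (1,2): "
                + cell.getAccessibleContext().getAccessibleName()); // name comes from the renderer
    }
}
```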

Other Features

Three other features of the accessible APIs just described should be noted because they permit these APIs to fulfill all three criteria of an accessible runtime platform. First, most provide access to a set of accessibleRelations. Relations between accessible objects provide context for controls that are typically given in visual presentation by placement. For instance, two common (and symmetric) relations are the labelFor and labeledBy relations. An accessible object for a text field, for example, might include the labeledBy relation in its set of relations that points to a label. In turn, that label would contain the labelFor relation for the text field. The screen reader’s logic for interpreting the relation can then assign the name or description of the field based on this relationship.
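
In Swing, for example, the labelFor/labeledBy pairing is typically wired up by JLabel.setLabelFor, after which each side's AccessibleContext reports the relation in its relation set. Here is a minimal sketch with hypothetical field names; the relation constants and getAccessibleRelationSet are the real javax.accessibility API, though exactly how a given toolkit populates the set is an implementation detail worth verifying.

```java
import javax.accessibility.AccessibleRelation;
import javax.swing.JLabel;
import javax.swing.JTextField;

public class LabelRelations {
    public static void main(String[] args) {
        JTextField cardNumber = new JTextField(16);        // hypothetical form field
        JLabel label = new JLabel("Credit card number");

        // Intended to establish the symmetric labelFor / labeledBy accessible relations.
        label.setLabelFor(cardNumber);

        // The screen reader follows the relation to derive a name for the field.
        System.out.println("field labeledBy its label: "
                + cardNumber.getAccessibleContext().getAccessibleRelationSet()
                            .contains(AccessibleRelation.LABELED_BY));
        System.out.println("label labelFor the field:  "
                + label.getAccessibleContext().getAccessibleRelationSet()
                       .contains(AccessibleRelation.LABEL_FOR));
    }
}
```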

A second feature is the accessibleAction set of an accessible object. This interface provides the device-independent manner of performing actions, the third criterion of an accessible runtime platform. It exports information about the default action for the object (usually its activation) but also includes alternative actions and information about their invocation and effect.
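
Sticking with the Swing example, a button's AccessibleContext exposes an AccessibleAction that lets an AT enumerate and invoke the control's actions without a mouse or keystroke. The button and its listener below are hypothetical; getAccessibleAction, getAccessibleActionCount, getAccessibleActionDescription, and doAccessibleAction are the actual interface.

```java
import javax.accessibility.AccessibleAction;
import javax.swing.JButton;

public class DeviceIndependentActivation {
    public static void main(String[] args) {
        JButton submit = new JButton("Place order");               // hypothetical control
        submit.addActionListener(e -> System.out.println("order placed"));

        AccessibleAction actions = submit.getAccessibleContext().getAccessibleAction();

        // Enumerate the actions the control exports (a button exports its default "click").
        for (int i = 0; i < actions.getAccessibleActionCount(); i++) {
            System.out.println("action " + i + ": "
                    + actions.getAccessibleActionDescription(i));
        }

        // Invoke the default action with no pointing device or key press involved.
        actions.doAccessibleAction(0);
    }
}
```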

Finally, notice that accessible objects implement the standard observer pattern, allowing the AT to monitor state change within the application. This permits the AT, for example, to indicate to the user that a checkbox has moved from the ‘unchecked’ to the ‘checked’ state once the activate action is invoked. Most of the APIs also provide methods for tracking the gain or loss of keyboard focus and for noting when top-level windows or frames have been focused.
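
In the Swing accessibility API, this observer pattern appears as PropertyChangeListener support on AccessibleContext: the AT registers once and is notified when, for example, a checkbox's accessible state flips to checked. A small sketch with a hypothetical checkbox follows; a real screen reader would speak or braille the change rather than print it.

```java
import javax.accessibility.AccessibleContext;
import javax.accessibility.AccessibleState;
import javax.swing.JCheckBox;

public class StateChangeObserver {
    public static void main(String[] args) {
        JCheckBox giftWrap = new JCheckBox("Gift wrap this order"); // hypothetical control
        AccessibleContext ctx = giftWrap.getAccessibleContext();

        // The AT observes the accessible object for state-change events.
        ctx.addPropertyChangeListener(evt -> {
            if (AccessibleContext.ACCESSIBLE_STATE_PROPERTY.equals(evt.getPropertyName())
                    && AccessibleState.CHECKED.equals(evt.getNewValue())) {
                System.out.println("checkbox is now checked");
            }
        });

        // Simulate the user invoking the activate action; the observer above fires.
        giftWrap.doClick();
    }
}
```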

Conclusion

In this post, I’ve shown the importance of distinguishing the ways in which semantic information is conveyed to users and of recognizing the cognitive disparity that can exist between users with and without disabilities. I’ve also defined what it means to construct a truly accessible runtime platform, described the role accessibility APIs play in such a platform, and explained how those APIs bridge the cognitive disparity when persons of different abilities are exposed to content or applications. Although the MVC pattern employed by the most common accessibility APIs may be familiar, how these APIs are used, directly or indirectly, to enable applications or content for accessibility may still remain a bit of a mystery. There will be plenty of examples of these APIs in action in future posts.