Catching ambiguous link text: open-source & an intern’s perspective

Published in CZI Technology, Sep 7, 2022

Hi! I’m Matt — an intern on CZI Education’s Backend Infrastructure team. I just wrapped up two B.S. degrees at UCLA and will be returning for a Master’s in computer science. I’m particularly passionate about CS & Education, Programming Languages & Human-Computer Interaction (PL & HCI), and open-source software.

In addition to my specific intern project for CZI, the nature of the work gave me the chance to contribute to and maintain various open-source projects. It made for a unique and fulfilling intern experience, and was a chance for me to give back to the broader community.

In the rest of this post, I’ll briefly dive into one of the contributions I made: a new ESLint plugin rule to catch ambiguous link text (anchor-ambiguous-text). It’s a relatively bite-sized project, but one that I’m particularly proud of! It’s also an opportunity to talk about one common accessibility anti-pattern. Let’s get started!

Context

Accessibility is a core component of CZI Education’s mission: building a world where demographics like race and socioeconomic status — or, disability — are not predictive of student outcomes. From an engineering perspective, that means building technology that any student can use, regardless of how they use their device. Blind or visually impaired students often use screen readers to interact with websites. In order to provide them with the best experience possible, engineers need to put in active effort to make their applications screen reader-friendly; this is one part of a much broader field of web accessibility.

One key tool in our accessibility toolkit at CZI is automated testing via linting. ESLint and Stylelint — linters for JavaScript and CSS respectively — have robust plugin systems that allow us to flag inaccessible patterns in code editors. This lets us “shift left”: instead of catching accessibility bugs at the QA (or post-deployment) stages, we can resolve them earlier at the code implementation stage! We use a wide range of linters and static analysis tools in both the Summit Learning platform and Along.

While automated tests can’t capture issues affecting the full spectrum of disability, improving our automated tests still improves our ability to write more accessible software. And, for an intern, it’s a great place to start!

The Problem

Our Frontend Infrastructure team noted one inaccessible pattern: links that do not provide context for screen readers. The largest offenders are links with text of the form “click here”, “learn more”, or “link”. Here’s one (real) example:

The text-to-speech feature allows students to hear text read aloud in Focus Area Content Assessments. <a href="…">Learn more</a>

This causes problems for screen reader users who tab through links to gain context about the page. In this case, the screen reader would dictate “Learn more” — learn more about what?

Ideally, the link text alone should provide the context:

Learn more about <a href="…">Text-to-Speech in Content Assessments</a>.

Alternatively, extra screen reader text can be added (with some CSS required):

<a href="…">Learn more <span class="sr-only">about Text-to-Speech in Content Assessments</span></a>.

This problem can be especially frustrating if this pattern is used throughout the page. Another example:

Please click <a href="…">here</a> to download the template. Please <a href="…">click here</a> to view the errors. <a href="…">Learn more.</a>

Tabbing through the links would dictate “here”, “click here”, “learn more”. There’s no context on what the links are to or why the user should click!

This pattern is common across the internet and frequently discussed (here’s a blog post by Stephanie Leary, a short summary by UC Berkeley’s Web Access Team, and a WebAIM article). Unfortunately, we don’t have a tool to catch these mistakes automatically… yet!

The Solution

Remember how I previously mentioned our use of linters? One core plugin in this ecosystem is eslint-plugin-jsx-a11y. It statically analyzes JSX to capture issues like headings not having screen reader-accessible content, media without captions, or missing required ARIA attributes. CZI Education already uses this plugin when developing the Summit Learning platform and Along.

I love open-source, and I took this opportunity to open an issue in the eslint-plugin-jsx-a11y repository. It turns out other developers wanted this rule too! After some clarifying questions with the core maintainer and some great discussions at CZI, we agreed upon a spec for the rule: it would be called anchor-ambiguous-text.

The initial spec of the rule is simple.

Walk the AST (Abstract Syntax Tree) looking for <a> tags; for each anchor,

get the “accessible child text” (i.e. what a screen reader would announce)
if the accessible child text exactly matches an entry in a user-configurable disallow list (e.g. “click here”, “read more”, etc.), report a warning
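To make the spec concrete, here is a minimal sketch of those steps in plain JavaScript. It is not the plugin's implementation: the node shape, the `findAmbiguousAnchors` name, and the word list are all assumptions made for illustration, and the text extraction is deliberately naive (it ignores the ARIA overrides discussed later).

```javascript
// Hypothetical, simplified element tree; the real rule walks ESLint's JSX AST.
// Node shape (assumed): { type: "text", value } or { type: "element", name, children }.
const DISALLOWED = ["click here", "here", "link", "learn more"]; // illustrative list

// Naive text extraction: concatenates text nodes and ignores ARIA overrides.
function childText(node) {
  if (node.type === "text") return node.value;
  return (node.children || []).map(childText).join("");
}

// Walk the tree and collect a warning for each <a> whose text is on the list.
function findAmbiguousAnchors(node, words = DISALLOWED, warnings = []) {
  if (node.type === "element") {
    if (node.name === "a") {
      const text = childText(node).trim().toLowerCase();
      if (words.includes(text)) {
        warnings.push(`Ambiguous link text: "${text}"`);
      }
    }
    for (const child of node.children || []) {
      findAmbiguousAnchors(child, words, warnings);
    }
  }
  return warnings;
}

const page = {
  type: "element",
  name: "p",
  children: [
    { type: "text", value: "Please " },
    { type: "element", name: "a", children: [{ type: "text", value: "click here" }] },
    { type: "text", value: " or read " },
    {
      type: "element",
      name: "a",
      children: [{ type: "text", value: "Text-to-Speech in Content Assessments" }],
    },
  ],
};

console.log(findAmbiguousAnchors(page));
// → [ 'Ambiguous link text: "click here"' ]
```

Only the first link is flagged; the second already describes its destination.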

Getting the accessible child text sounds simple — just get what’s in-between the tags! Unfortunately, it’s not entirely straightforward.

Deeper Dive: Accessible Child Text

There are various edge cases that override or hide text from a screen reader. The Accessible Rich Internet Applications (ARIA) standard defines various HTML attributes that annotate existing elements for assistive technologies.

In designing the lint rule, I added logic for most common ARIA attributes (and other semantic HTML attributes). Let’s walk through a few!

If an element has the aria-label attribute, the screen reader reads the value instead of the children. The lint rule respects that override.

// this will not error; the screen reader would dictate “Text-to-Speech in Content Assessments”
<a href="…" aria-label="Text-to-Speech in Content Assessments">Learn more</a>

// this will error; the screen reader would dictate “Learn more”, which is not descriptive
<a href="…" aria-label="Learn more">Text-to-Speech in Content Assessments</a>

If an image (but only an image) has the alt attribute, a screen reader reads it as the descriptive text. The lint rule uses that behavior too.

// this will error; the screen reader would dictate “Learn more”
<a href="…"><img src="…" alt="Learn more" /></a>

If an element has the aria-hidden attribute, a screen reader skips it. This is usually used for “decorative” elements that don’t convey semantic meaning. So, our rule skips it too!

// this will error; the screen reader would dictate “Learn more”
<a href="…">Learn more <span aria-hidden="true">about Text-to-Speech in Content Assessments</span></a>
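The three cases above can be sketched as one recursive function. This is an approximation under stated assumptions, not the plugin's code: the node shape and the `el`/`text` helpers are hypothetical stand-ins for a JSX AST, and only literal string attribute values are handled.

```javascript
// Hypothetical node shape (not the real JSX AST):
//   { type: "text", value } or { type: "element", name, props, children }
function rawAccessibleText(node) {
  if (node.type === "text") return node.value;
  const props = node.props || {};
  // aria-hidden elements are skipped by screen readers, so they contribute nothing.
  if (props["aria-hidden"] === "true") return "";
  // aria-label overrides whatever the children say.
  if (typeof props["aria-label"] === "string") return props["aria-label"];
  // alt only counts as descriptive text on images.
  if (node.name === "img" && typeof props.alt === "string") return props.alt;
  return (node.children || []).map(rawAccessibleText).join(" ");
}

// Collapse whitespace so that skipped children do not leave gaps behind.
function accessibleChildText(anchor) {
  return rawAccessibleText(anchor).replace(/\s+/g, " ").trim();
}

// Small helpers for building example trees (assumed, for illustration only).
const text = (value) => ({ type: "text", value });
const el = (name, props, ...children) => ({ type: "element", name, props, children });

const labelled = el(
  "a",
  { "aria-label": "Text-to-Speech in Content Assessments" },
  text("Learn more")
);
const imageOnly = el("a", {}, el("img", { alt: "Learn more" }));
const hiddenSpan = el(
  "a",
  {},
  text("Learn more "),
  el("span", { "aria-hidden": "true" }, text("about Text-to-Speech in Content Assessments"))
);

console.log(accessibleChildText(labelled));   // → Text-to-Speech in Content Assessments
console.log(accessibleChildText(imageOnly));  // → Learn more
console.log(accessibleChildText(hiddenSpan)); // → Learn more
```

The last two results are exactly the kind of text the rule is meant to flag.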

The rule also strips punctuation, collapses whitespace, and lowercases the text.
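As a rough sketch of that normalization step (the function name and the exact punctuation set here are my assumptions, not necessarily what the plugin uses):

```javascript
// Normalize link text before comparing it against the disallow list.
// The punctuation set below is an assumption chosen for illustration.
function normalizeLinkText(text) {
  return text
    .toLowerCase()
    .replace(/[.,!?;:]/g, "") // strip common punctuation
    .replace(/\s+/g, " ")     // collapse runs of whitespace
    .trim();
}

console.log(normalizeLinkText("  Learn   More. ")); // → learn more
console.log(normalizeLinkText("Click here!"));      // → click here
```

Normalizing first means a single list entry like “learn more” covers “Learn more.”, “LEARN MORE”, and similar variants.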

There are other edge cases (described under “Open Problem: Referential Elements” below), but we felt that the above captures most simple use-cases.

Customization

Rules that rely on context, like this one, should be configurable. I used existing infrastructure in eslint-plugin-jsx-a11y so that developers can tailor this rule to their use case.

One simple customization is a custom disallow list. This lets developers add other ambiguous phrases that come up in their context (e.g. “tap here”), and it also allows for ad-hoc internationalization: any JavaScript-compatible string is valid!

Another is a plugin-wide custom component setting. This lets the plugin easily work with JSX elements that wrap/override the anchor tag, like those in Next.js or CZI’s own Education Design System.
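Put together, a configuration using both customizations might look roughly like the sketch below. The `words` option and the `components` setting reflect my understanding of the plugin's documentation, so double-check them against the version you install; the `Link` wrapper component and the phrases in the list are made up for illustration.

```javascript
// Sketch of an .eslintrc.js-style config object. In a real config file this
// object would be exported (for example, module.exports = eslintConfig).
const eslintConfig = {
  plugins: ["jsx-a11y"],
  settings: {
    "jsx-a11y": {
      // Treat a hypothetical <Link> wrapper component as an <a> tag.
      components: { Link: "a" },
    },
  },
  rules: {
    "jsx-a11y/anchor-ambiguous-text": [
      "error",
      {
        // Custom disallow list, including a non-English phrase.
        words: ["click here", "learn more", "tap here", "haga clic aquí"],
      },
    ],
  },
};

console.log(eslintConfig.rules["jsx-a11y/anchor-ambiguous-text"][1].words.length); // → 4
```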

Implementation and Deployment

After all of that thinking, I was able to quickly implement a draft. A well-defined spec meant that writing tests for the behavior was clear-cut, and the existing infrastructure in jsx-ast-utils made the AST walk relatively simple.

I opened and merged three PRs (#873, #880, #886) to implement the described behavior. If you build the package from source, the rule is completely usable! A minor release is pending; once that’s out, developers can update their eslint-plugin-jsx-a11y version and use the rule live.

In the meantime, I updated the ESLint plugin internally to trial this rule.

In Along, we found no violations! Great!

In contrast, the Summit Learning platform had 24 lint errors across various entry points in the app. We also found that this behavior was baked into our codebase: various components were named LearnMoreLink or renderLearnMore. We determined that none of these were false positives. Then, we worked with various teams to fix the copy so that the original meaning would be preserved while also providing more context for screen reader users. I’m happy to say that there are now no lint errors (in a codebase of ~78k lines of JS)!

We can quickly walk through one example on our Terms of Service page for students:

[Screenshot: the Summit Learning TOS page. A sentence reads “you can click here to view it on YouTube”, where “here” is the highlighted link text.] The old TOS page

The “here” is too ambiguous! The team that owned this page indicated that a copy change would be in-scope. So, I changed the copy to accurately describe the video.

[Screenshot: the Summit Learning TOS page. The sentence now reads “Watch our Data Privacy overview video on YouTube”, where the link text is “Data Privacy overview video”.] The new TOS page!

This provides more context — not only for screen reader users, but any user!

All in all, I think this first solution was quite successful. However, there’s more to do!

Open Problem: Referential Elements

One open problem of the rule is “referential elements”. These accessibility features involve references to other elements that may or may not exist anywhere in the DOM.

The <label> element provides screen readers with semantic meaning for form controls (e.g. <input> or <textarea>).
The aria-describedby and aria-labelledby attributes reference other elements (by id) that provide semantic meaning for the element and are dictated by a screen reader. In particular, the behavior here is less straightforward: aria-labelledby can contain multiple ids, can be self-referential, is ordered, and cannot be chained.
The aria-details attribute provides more information to screen readers, though it is often not used in link-scanning.

This poses various problems.

What should the rule do if the element is not in the AST?

eslint-plugin-jsx-a11y has some rules that enforce this behavior, but not all, and users may not enable these rules.
Erroring every time likely creates many false positives: referred elements could be in sibling or parent components, which could be in a separate JSX expression.

How should we search for these nodes?

One option is to maintain state while walking the AST. State in linters can introduce flakiness and is often hard to implement and maintain.
However, keeping this stateless likely requires many re-walks of the AST, which hurts performance (a critical aspect of linters).

That being said, I think this is a solvable problem, given a relatively permissive strategy when elements cannot be found in a reasonable amount of time. I plan on implementing support for referential elements soon!
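To give a feel for what a permissive strategy could look like, here is one possible sketch of the stateful option: collect ids in a single pass, then resolve aria-labelledby against them, and give up quietly if any id is missing. Nothing here is implemented in the plugin; the node shape and every function name are hypothetical, and the sketch ignores self-references and chaining.

```javascript
// Hypothetical tree shape, standing in for a JSX AST:
//   { type: "text", value } or { type: "element", name, props, children }

// Single pass: remember every element that has a literal id.
function collectIds(node, ids = new Map()) {
  if (node.type !== "element") return ids;
  const id = node.props && node.props.id;
  if (typeof id === "string") ids.set(id, node);
  for (const child of node.children || []) collectIds(child, ids);
  return ids;
}

function plainText(node) {
  if (node.type === "text") return node.value;
  return (node.children || []).map(plainText).join(" ");
}

// Returns the label text in id order, or null when the attribute is absent or
// any id cannot be resolved. A null result means "do not report" (permissive).
function resolveLabelledBy(anchor, ids) {
  const attr = anchor.props && anchor.props["aria-labelledby"];
  if (typeof attr !== "string") return null;
  const parts = [];
  for (const id of attr.trim().split(/\s+/)) {
    const target = ids.get(id);
    if (!target) return null; // referenced element may live in another component
    parts.push(plainText(target));
  }
  return parts.join(" ").replace(/\s+/g, " ").trim();
}

const heading = {
  type: "element",
  name: "h2",
  props: { id: "tts-title" },
  children: [{ type: "text", value: "Text-to-Speech in Content Assessments" }],
};
const found = { type: "element", name: "a", props: { "aria-labelledby": "tts-title" }, children: [] };
const missing = { type: "element", name: "a", props: { "aria-labelledby": "elsewhere" }, children: [] };
const tree = { type: "element", name: "div", props: {}, children: [heading, found, missing] };

const ids = collectIds(tree);
console.log(resolveLabelledBy(found, ids));   // → Text-to-Speech in Content Assessments
console.log(resolveLabelledBy(missing, ids)); // → null
```

Returning null for unresolved ids trades some missed violations for fewer false positives, which matches the concern about referenced elements living in sibling or parent components.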

Caveats

There are also two broad limitations of linting as a solution to ambiguous link text.

Lint tools have little semantic understanding of content and are typically used for syntax. In contrast, the problem statement is fundamentally semantic in nature.
Lint tools are a form of static analysis and do not evaluate code. This misses out on dynamic interaction, which is a huge portion of web applications.

I don’t think this lint rule is all-encompassing in dealing with ambiguous link text. Instead, it’s one small tool in every developer’s arsenal to build accessible applications. To resolve this tool’s core problems, you need tools that are dynamic, semantic, and, most importantly, involve real people.

If you’re curious, Andrew Huth has written a great overview of our other automated testing practices — I’m especially a big fan of Storybook and Chromatic!

Broadly, this work aligns with CZI’s continuing efforts to audit and develop more solutions to make all our sites and technology more accessible.

Conclusion & Thanks

Overall, this was very motivating. I love open-source, and it’s very fulfilling to work on tools that other developers need! I learned more about accessibility and linting tools, and I had a chance to work with several other engineers and designers. And I did this all as an intern — which speaks to how supportive my internship experience at CZI has been!

Many people have made my time at CZI wonderful. The biggest of shoutouts goes to Katy Ho, my intern manager — she has gone out of her way to be responsive, supportive, and kind throughout the program, and has been the best intern manager I’ve had! The BE Infra team overall has been a warm and welcoming home at CZI!

Beyond that, the Frontend Infrastructure team iterated with me on the specification and behavior of the rule. Thanks to Diedra Rater, who proposed this rule, and Andrew Huth, Annie Hu, Jeremiah Clothier, and Jin Lee who all provided input! Thanks as well to Jordan Harband (a maintainer of eslint-plugin-jsx-a11y) for working with me on this project. And, thanks to a wonderful group of interns who formed a stellar community during this summer!

The 2022 CZI intern class! From left to right: Terry, Mark, Elias, Albert, Eric, Siena, Rohan, Jess, Purva, Trisha, Matt (me), Sav, Duke, Larry, Tapan; missing: Angela, Dalia, Nick

Thanks for giving this a read; I hope it was at least somewhat informative! At the very least, ambiguous link text should now be in the back of your mind. If you use ESLint, I may have a tool just for you…

Interested in CZI’s internship program? Learn more about it on the CZI Careers Page.

Do you use this rule? Or have any suggestions, feedback, complaints? I would love to know!