WCAG Auditing

Tom Bonanni
Published in UpSkillie
3 min read · Feb 8, 2020

WCAG compliance is becoming an important issue for tech companies. By not being compliant, companies open themselves up to serious lawsuits. PornHub is currently being sued for not having closed captions on its videos.

Yaroslav Suris, who is deaf according to the court document, has filed a class action suit against the adult website operators on behalf of himself and others who are deaf or hard of hearing. The complaint states that the website has videos without closed captions, or with limited closed captions, and thus violates the Americans with Disabilities Act (ADA).

https://www.boia.org/blog/topic/lawsuits-settlement

I saw this and thought that if I had to solve this problem for a site with millions of pages, I would need to build some sort of automated WCAG audit to handle the checks a machine can perform. Unfortunately, not every WCAG criterion can be tested automatically and many require manual checking, but automation is a good baseline for a proper audit and can save a lot of time on larger sites.

Automated Auditing

When auditing my own sites for WCAG compliance I have always used Google Lighthouse in the Chrome Inspector. I immediately thought there had to be some way to crawl my site with Lighthouse and generate a report for each page. After a few minutes of googling I found these:

https://www.npmjs.com/package/lighthouse

https://www.npmjs.com/package/crawler

Configuring Lighthouse

So I whipped up a Node.js environment and started with:

npm i -g lighthouse

I installed the package globally because I wanted to experiment with it on the command line before integrating it into a program. I knew that Lighthouse does more than just accessibility testing, so I was looking for a way to isolate the accessibility results for a single page and bubble them up into a report.

After messing around for a bit I eventually came across the command that I needed:

lighthouse https://www.example.com --output json --output html --output-path ./example.json --chrome-flags="--headless" --only-categories=accessibility

Boom. Quite a mouthful in my opinion, but then again my use case is fairly narrow. So let’s break this down:

lighthouse https://www.example.com

I got burned on this when I first started: you need to include the https://www. prefix or the report breaks. I’m not sure why, but make sure to pass the fully qualified URL.

Next:

--output json --output html --output-path ./example.json

This tells Lighthouse to generate a report in both JSON and HTML and store the files in the directory where the command is run. In this case Lighthouse will generate both example.report.json and example.report.html.

Lastly we have this:

--chrome-flags="--headless" --only-categories=accessibility

This tells Lighthouse to run using headless Chrome and to give back a report on only the accessibility of the site. This effectively removes the SEO, PWA, and Performance reports. If you wanted those reports, you would simply remove the --only-categories=accessibility flag.

Setting up the crawler

Next, I wanted to crawl a site and collect all of its valid internal links; the finished list should resemble a sitemap. After looking through the crawler docs I came up with this basic implementation:

https://gist.github.com/redhair/a71a725a31254c54e2e6ce87e0de9384

This code initializes the crawler with the root URL, a max connection pool of 10, and a 0.3s delay between page crawls. The callback function passed to the crawler runs for each queued page, so I designed it to find every anchor link in the page; if an anchor has an href attribute and the href passes validation, we add it to both the crawler queue and a visited array. We will use this visited array later as the list of valid URLs to audit.
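If you don’t want to open the gist, a minimal sketch of that approach with the crawler package might look like the following. The ROOT constant and the simplified internal-link check are my own assumptions, not the gist’s exact code:

const Crawler = require('crawler');

const ROOT = 'https://www.example.com';
const visited = [];

const crawler = new Crawler({
  maxConnections: 10, // connection pool of 10
  rateLimit: 300,     // 0.3s between page crawls
  callback: (error, res, done) => {
    if (error) {
      console.error(error);
      return done();
    }
    const $ = res.$; // crawler exposes a cheerio instance for the fetched page
    $('a').each((i, el) => {
      let href = $(el).attr('href');
      if (!href) return;
      // simplified validation: resolve relative links and keep internal ones only
      href = new URL(href, ROOT).href;
      if (href.startsWith(ROOT) && !visited.includes(href)) {
        visited.push(href);
        crawler.queue(href);
      }
    });
    done();
  }
});

crawler.queue(ROOT);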

Putting it all together

Finally, the last step was to merge the two pieces: crawl the site for URLs, then run Lighthouse against each one.
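One way to wire this up is a sketch like the one below, not the exact code: it waits for the crawler’s drain event (fired when the queue is empty) and then shells out to the same Lighthouse command from earlier for each visited URL. The reports folder and the page-by-index file naming are my assumptions:

const { execSync } = require('child_process');
const fs = require('fs');

// 'drain' fires once the crawler's queue is empty, i.e. the crawl is finished
crawler.on('drain', () => {
  if (!fs.existsSync('./reports')) fs.mkdirSync('./reports');

  visited.forEach((url, index) => {
    // reuse the CLI command from above; one JSON + HTML report pair per page
    execSync(
      `lighthouse ${url} --output json --output html ` +
      `--output-path ./reports/page-${index}.json ` +
      `--chrome-flags="--headless" --only-categories=accessibility`
    );
  });
});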

You can do whatever you want with the results. For this example I just stored them in a reports folder, but for practical use I would store the information in a database. It could even be exposed via an API to build an app on top of it. You could also set up another condition to flag pages that scored under 75%, or some other “failure” threshold. I may add that later as a feature to separate passing and “failing” pages.
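For example, the overall accessibility score lives at categories.accessibility.score (a value from 0 to 1) in each JSON report, so a threshold check could look roughly like this, assuming the file-naming scheme from the sketch above:

const fs = require('fs');

const THRESHOLD = 0.75; // the 75% "failure" threshold mentioned above

const failing = visited.filter((url, index) => {
  const report = JSON.parse(
    fs.readFileSync(`./reports/page-${index}.report.json`, 'utf8')
  );
  // Lighthouse reports category scores as 0-1, so 0.75 corresponds to 75%
  return report.categories.accessibility.score < THRESHOLD;
});

console.log('Pages below the accessibility threshold:', failing);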
