HTML validation in Phoenix using Rust

Alex Drummond
Published in multiverse-tech
4 min read · Mar 17, 2021
Credit: @darkcut

The product team at Multiverse is committed to accessibility. We kick off most of our projects with a UX research phase, where we consider the disparate user journeys and input methods that the site has to cater to. Once a particular design or user flow has been sketched out, the tech team ensure that it’s implemented using clean markup that follows web standards. That way, our site has the best possible chance of providing a great experience on all of the devices and browser configurations that our users throw at it.

From the point of view of standards compliance, our HTML rendering pipeline is a weak point. We use Phoenix for most of our web development work. Phoenix’s EEx templates work by string substitution and don’t guarantee the validity of their output. As a result, trivial errors — such as missing or misplaced closing tags — can lurk unnoticed for years. These errors aren’t just a stain on our professional pride. They’re also a source of cross-browser inconsistencies, and a maintenance burden on whoever has to modify the page in future.

As a team we didn’t want to take the radical step of moving all our existing templates over to a template language that guaranteed valid output. But we did want to do something to reduce the risk of generating invalid markup. As a first step, we initiated a research spike to experiment with adding HTML validation to our rendering pipeline. Phoenix makes this really easy to do: a plug can call register_before_send to pass the rendered HTML to a validation function before the response is sent.

To implement the validation function we needed to find a suitable HTML parsing or validation library. Good candidates turned out to be surprisingly thin on the ground. The most feature-complete validation tools are implemented in Java and Python, and can only be integrated with a Phoenix project by wrapping them in a service of some kind. The TidyEx library seemed to offer a more lightweight alternative, but we were unable to get it to build out of the box.

It seemed, then, that we were stuck. Luckily, the whole point of a research spike is to get stuck and then try out some fun solutions! This is where Rust comes into the picture. I’d long been itching for an excuse to give Rust/Elixir integration a whirl, and had a suspicion that html5ever, the HTML parser used in the Servo project, might be a good fit for our needs. It’s admittedly not designed as a validator and can’t be relied on to emit errors for every case of invalid markup. Still, it’s perfectly capable of catching issues such as missing closing tags or illicit embeddings, so it seemed like a good place to start.

Why turn to Rust? Rust is pretty much the ideal language for writing Native Implemented Functions (NIFs). It’s high level and memory safe (if you stick to the safe subset), but also able to export functions using the C ABI. Moreover, the excellent Rustler library does almost all the hard work of binding to Elixir. Rather than attempt to expose the entire html5ever API to Elixir, we elected to write a single Rust function that takes an input string and uses html5ever to generate a list of error messages with source position information. Binding this function to Elixir with Rustler was then pretty straightforward.
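To give a feel for the shape of such a function, here is a minimal sketch in plain Rust. It is not our html5ever-based code: it is a simplified stand-in that only checks whether tags are balanced, uses nothing beyond the standard library, and leaves out the Rustler export entirely. The name validate_html, the message format and the short list of void elements are all assumptions made for illustration.

```rust
// Simplified stand-in for an html5ever-based checker (assumption: the real
// function delegates parsing to html5ever; this one only tracks tag balance).

/// A few elements that never take a closing tag (illustrative, not complete).
const VOID_ELEMENTS: [&str; 6] = ["br", "hr", "img", "input", "link", "meta"];

/// Returns one message per problem found, each prefixed with a line number.
/// An empty list means no problems were detected.
pub fn validate_html(input: &str) -> Vec<String> {
    let mut errors = Vec::new();
    // Stack of currently open elements: (tag name, line where it was opened).
    let mut open: Vec<(String, usize)> = Vec::new();
    let mut line = 1;
    let mut rest = input;

    while let Some(start) = rest.find('<') {
        // Advance the line counter past the text preceding this tag.
        line += rest[..start].matches('\n').count();
        let after = &rest[start + 1..];
        let end = match after.find('>') {
            Some(end) => end,
            None => {
                errors.push(format!("line {}: unterminated tag", line));
                break;
            }
        };
        let tag = &after[..end];

        if let Some(name) = tag.strip_prefix('/') {
            // Closing tag: look for the nearest matching open element.
            let name = name.trim().to_ascii_lowercase();
            match open.iter().rposition(|(n, _)| *n == name) {
                Some(idx) => {
                    // Anything opened after the match was never closed.
                    for (unclosed, opened) in open.drain(idx + 1..).rev() {
                        errors.push(format!("line {}: <{}> is never closed", opened, unclosed));
                    }
                    open.pop(); // remove the matching element itself
                }
                None => errors.push(format!("line {}: unexpected </{}>", line, name)),
            }
        } else if !tag.starts_with('!') && !tag.ends_with('/') {
            // Opening tag (doctype, comments and self-closing tags are skipped).
            let name = tag.split_whitespace().next().unwrap_or("").to_ascii_lowercase();
            if !name.is_empty() && !VOID_ELEMENTS.contains(&name.as_str()) {
                open.push((name, line));
            }
        }

        // Account for newlines inside the tag, then continue after '>'.
        line += tag.matches('\n').count();
        rest = &after[end + 1..];
    }

    // Whatever is still open at the end of the input was never closed.
    for (unclosed, opened) in open.into_iter().rev() {
        errors.push(format!("line {}: <{}> is never closed", opened, unclosed));
    }
    errors
}
```

Calling validate_html on a well-formed fragment returns an empty list, while a fragment with a missing closing tag returns a message pointing at the line where the offending element was opened. A real parser such as html5ever handles far more than this sketch does (attributes containing ">", comments, implied end tags and so on), which is exactly why we reached for it rather than rolling our own.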

To make things a little more idiomatic, we converted the list of errors to either :ok (in the case of an empty list) or {:error, error_list}. The only other code required was an Elixir module stub.
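The :ok / {:error, error_list} convention has a close cousin in Rust's Result type. The following is a minimal sketch of the conversion, assuming it is done on the Rust side before the value is handed back to Elixir; the function name errors_to_result is hypothetical, and the encoding of the result into Elixir terms is not shown.

```rust
/// Collapse a list of validation messages into a success/failure value:
/// an empty list becomes Ok(()), anything else becomes Err with the messages.
/// (Hypothetical helper for illustration only.)
pub fn errors_to_result(errors: Vec<String>) -> Result<(), Vec<String>> {
    if errors.is_empty() {
        Ok(())
    } else {
        Err(errors)
    }
}
```

Callers can then pattern match on success or failure instead of checking whether a list is empty, which is the same ergonomic win the :ok / {:error, ...} tuple gives on the Elixir side.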

With the bindings to html5ever in place, the time had come to run our suite of controller tests. As feared, the HTML validator threw a slew of exceptions. None of the markup errors identified was really causing any serious problems, but it definitely felt good to fix them all! We also learned a fair bit about the ins and outs of the HTML5 specification. Quite a good result for ~200 lines of code. If you’re interested in the approach, we’ve put up some example code on GitHub for your perusal.

Where do we go from here? While we definitely got a lot of utility from html5ever, we’re undecided on whether or not this justifies integrating Rust as part of our build process. We also need to investigate how feasible it would be to integrate more heavyweight validators with better coverage of the HTML standard — less fun, but perhaps more practical. In any case, our experiment with Rustler has certainly been educational. It’s exciting to know that we have at our fingertips a safe and easy way of implementing Elixir functions using native code.

Join the team

Multiverse’s mission is to create a diverse group of future leaders. The products we’re building are designed to connect apprenticeship candidates to employers and to set them up for success throughout their qualification. Our social mission is at the heart of our business, so if doing good is as important as building a commercial business to you, you will find Multiverse incredibly rewarding.

To view our latest open roles, visit our careers page. You could also check out Why you should join Multiverse by our awesome new VP of product Emma van Dijkum.
