Ode to Regular Expressions
You might even like them after this
I use regular expressions for everything: parsing GraphQL queries, parsing GraphQL errors, validating drivers’ licenses by state, and, most recently, rolling our own HTML-style popovers for a Vue.js side project I have been working on.
My team and I were tasked with creating Genius.com-like comments, where more information would pop over from the side whenever you clicked a highlighted word. We needed a way to declaratively create these static comments in a CMS, each containing both a header and a body, preferably with full HTML support (since the popovers should be able to contain images, videos, etc.).
What we landed on looked like this:
As you can imagine, we needed to not only extract all of these tags from the CMS but also parse the summary and body of each one. It was the perfect job for regular expressions. Our first attempt at writing this regex looked like this:
It’s hideous. Will another developer (or even you) still know what this blob means in two years? Definitely not. We needed a better way to pull apart, describe, and piece together the individual parts of this pattern.
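To make the problem concrete, here is a hypothetical single-blob pattern of the same kind (not our actual regex). It assumes the CMS markup follows the standard HTML shape <details><summary>HEADER</summary>BODY</details>, an assumption based on the <details> and summary mentions later in this post, and uses String.prototype.matchAll to pull out each summary and body. Even this small version is hard to read at a glance:

```javascript
// Hypothetical CMS markup (an assumption, not our real content):
// <details><summary>HEADER</summary>BODY</details>
// One opaque pattern: it works, but nothing says what each group means.
const popover = /<details>\s*<summary>([\s\S]*?)<\/summary>\s*([\s\S]*?)\s*<\/details>/g;

const cms = `Some text with a <details>
  <summary>Genius</summary>
  <p>More info, <img src="x.png"></p>
</details> inline.`;

// Collect the summary (group 1) and body (group 2) of every popover.
const popovers = [...cms.matchAll(popover)].map(([, summary, body]) => ({
  summary,
  body,
}));
console.log(popovers);
```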
So, I wrote rexrex, a small library for building regular expressions out of composable function calls.
After my first pass at decomposing our ugly mess of a regex into pieces, this is what we had:
It’s still a lot to look at, but, as you can see, we already got a couple of benefits. First, we can now add comments to individual pieces of the pattern. Second, we can pull duplicated pieces out into their own variables and reuse them in different places, instead of retyping them as you would have to in a native regex literal.
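Those two benefits do not depend on any particular library. As a minimal sketch in plain JavaScript strings (the names space, anything, and summary are hypothetical, and the markup is assumed to be <details><summary>HEADER</summary>BODY</details>), each piece gets its own comment, and the shared pieces are written once and reused:

```javascript
// Hypothetical pieces for assumed markup <details><summary>HEADER</summary>BODY</details>.
// These names are illustrative; they are not the article's originals.

// Optional whitespace, reused between every piece.
const space = "\\s*";
// Any run of characters (newlines included), matched lazily and captured.
const anything = "([\\s\\S]*?)";
// The header: whatever sits inside <summary>...</summary>.
const summary = "<summary>" + anything + "</summary>";

const pattern =
  "<details>" + space + summary + space + anything + space + "</details>";
const popover = new RegExp(pattern);

const match = popover.exec("<details> <summary>Hi</summary> <b>Body</b> </details>");
console.log(match[1], match[2]); // Hi <b>Body</b>
```

Because each variable is just a string, changing space in one place updates every spot that uses it.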
After extracting the regex into this collection of function calls, we could already notice patterns emerging, abstractions forming, and simplifications presenting themselves.
To start, one section of code kept repeating, r.capture(r.and(<tag>...</tag>)), so we pulled it out into its own function and reused that logic. The pattern had already become a lot easier to digest.
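The extraction can be sketched as follows. Here capture and and are hypothetical plain-JavaScript stand-ins written for this example, not rexrex's real implementations, and tag is the extracted helper that wraps the repeated shape:

```javascript
// Hypothetical stand-ins for the combinators mentioned above; these are
// NOT rexrex's real implementations, just plain string builders.
// Wrap a pattern in a capturing group.
const capture = (pattern) => `(${pattern})`;
// Match several patterns in sequence, grouped without capturing.
const and = (...patterns) => `(?:${patterns.join("")})`;

// The repeated shape, capture(and(<tag>, contents, </tag>)), now lives in one place.
const tag = (name, contents) => capture(and(`<${name}>`, contents, `</${name}>`));

const anything = "[\\s\\S]*?";
const summaryPattern = tag("summary", anything);
console.log(summaryPattern); // ((?:<summary>[\s\S]*?</summary>))

const m = new RegExp(summaryPattern).exec("x <summary>Hi</summary> y");
console.log(m[1]); // <summary>Hi</summary>
```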
After another pass, I noticed that each of my pieces was separated by the space constant we created during pass #1. We dealt with this by adding a basic Array.join() (because remember, we are just dealing with strings at this point). This refactor also made it obvious that our summary section would not match newline characters like we wanted it to, since the . character does not match newlines in a regex by default. After a couple of touches, it became:
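A small sketch of both points, assuming hypothetical pieces for <details><summary>HEADER</summary>BODY</details> markup: Array.prototype.join places the space separator between every piece, and swapping . for [\s\S] lets the summary span several lines. In engines that support ES2018, the s (dotAll) flag is another way to make . match newlines.

```javascript
// Hypothetical pieces for assumed markup <details><summary>HEADER</summary>BODY</details>.
const space = "\\s*";

const pieces = [
  "<details>",
  "<summary>(.*?)</summary>", // first attempt: "." does not match "\n"
  "([\\s\\S]*?)",
  "</details>",
];
// Array.prototype.join puts the separator between each pair of pieces.
const firstTry = new RegExp(pieces.join(space));

// Fix: [\s\S] matches any character, newlines included.
const fixedPieces = [...pieces];
fixedPieces[1] = "<summary>([\\s\\S]*?)</summary>";
const fixed = new RegExp(fixedPieces.join(space));

const input = "<details><summary>Line one\nline two</summary><p>Body</p></details>";
console.log(firstTry.test(input)); // false
console.log(JSON.stringify(fixed.exec(input)[1])); // "Line one\nline two"
```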
The two sections between the <details> tags were basically identical at this point, each representing an HTML component, so we pulled that into its own function as well:
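A hypothetical element() helper shows the idea: one function produces the pattern for any <name>...</name> pair, allowing whitespace inside the tags, and because it returns a plain string it composes, so the same helper builds the <summary> section and the <details> wrapper. The name and exact pattern are illustrative, not what we shipped:

```javascript
// Hypothetical helper (illustrative name and pattern, not our shipped code):
// the pattern for <name> ... </name>, with optional whitespace inside the tags.
const element = (name, contents) => `<${name}>\\s*${contents}\\s*</${name}>`;

// Any characters (newlines included), matched lazily and captured.
const anything = "([\\s\\S]*?)";

// The same helper builds the <summary> section and the <details> wrapper.
const pattern = element(
  "details",
  [element("summary", anything), anything].join("\\s*")
);
const popover = new RegExp(pattern);

const m = popover.exec(`<details>
  <summary> Title </summary>
  <img src="a.png">
</details>`);
console.log(m[1]); // Title
console.log(m[2]); // <img src="a.png">
```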
Finally, we just called new RegExp(pattern), and we had our completed HTML-supported popover matcher.
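Once the pattern string is assembled, using it is ordinary JavaScript. In this sketch the pattern string and the extractPopovers function are hypothetical stand-ins, and the g flag is added because String.prototype.matchAll throws a TypeError when given a non-global regex:

```javascript
// Hypothetical assembled pattern string (illustrative, not the article's).
const pattern =
  "<details>\\s*<summary>\\s*([\\s\\S]*?)\\s*</summary>\\s*([\\s\\S]*?)\\s*</details>";

// The "g" flag is required by String.prototype.matchAll.
const popoverMatcher = new RegExp(pattern, "g");

// Extract every popover's summary and body from a chunk of CMS content.
function extractPopovers(html) {
  return Array.from(html.matchAll(popoverMatcher), (m) => ({
    summary: m[1],
    body: m[2],
  }));
}

const html =
  "Click <details><summary>this</summary><b>bold</b></details> or " +
  "<details><summary>that</summary><i>italic</i></details>.";
console.log(extractPopovers(html));
```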
I couldn’t help minifying it a bit more, though (admittedly, this might have gone a little too far).
At this point, our regular expression is complete: it’s flexible, readable, commentable, composable, and, most importantly, understandable. If we need to change anything, we know exactly where to look. Beyond that, we saw along the way that making these refactors with rexrex gave us insight into how our regex operates, where patterns are forming, and even where bugs existed in our initial construction!
See? Regular expressions are not as ugly, menacing, or difficult to use as you might have thought 😄. If you are interested in rexrex or have any questions, issues, or concerns with it, please open an issue or pull request at https://github.com/mfix22/rexrex.
- 🙏 regex101.com is an absolute lifesaver. It is one of my favorite developer tools and has made my life easier on so many occasions. Without regex101.com, there is really no knowing whether my regular expressions are correct. If you are able, please consider donating towards their efforts.