Regular expressions you can read: 
A new visual syntax (and UI)

Part 1

Much has been written about the problems with regular expressions: they are hard to learn, hard to debug, and so on.

I propose a more visual syntax and a keyboard-usable UI for generating regular expressions.

The UI/syntax proposed here helps address issues related to readability, learnability, and memorability. Those who readily understand regex will find that this visual syntax does not slow them down. It makes existing regexes easier to read for both novices and true regex superheroes.

(If you just want to use it already, go here to enter your email and we’ll let you know.)

Simplified email matching in new visual regex syntax (not for production use: more sensible way & onboarding UX advice)

You write regexes just like you always have, with an optional ctrl+space popup menu for command completion or insertion. The UI concept also includes importing existing regular expressions for editing, then exporting them in your chosen dialect.

This dialect-agnostic visual syntax seeks a balance between two ends of a continuum:

  • Traditional regexes are so terse that it is hard to tell elements and their meanings apart. Literals, syntax, wildcards, variable placeholders, etc. are all mashed up together: 
    \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
  • Some editors do already visualize regexes with charts. They are not directly editable, particularly not with a keyboard. These representations are typically very verbose and as such, are not particularly quick to scan through.

An example from regexper.com:

A regexper.com sample output for the above email example.
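
For comparison, here is a rough sketch of the same email pattern written with Python's verbose flag, which lets each element sit on its own commented line. This is only an illustration of the terse-versus-verbose continuum, not the proposed syntax. The re.IGNORECASE flag and the sample sentence are assumptions added for the example, since the pattern above only lists uppercase letters.

```python
import re

# The email pattern from above, split into commented parts with re.VERBOSE.
# re.IGNORECASE is an assumption: the original pattern only lists A-Z.
EMAIL = re.compile(
    r"""
    \b
    [A-Z0-9._%+-]+   # local part: one or more allowed characters
    @                # literal at sign
    [A-Z0-9.-]+      # domain name
    \.               # literal dot before the top-level domain
    [A-Z]{2,4}       # top-level domain, 2 to 4 letters
    \b
    """,
    re.VERBOSE | re.IGNORECASE,
)

# Hypothetical sample text, used only to exercise the pattern.
match = EMAIL.search("Write to jane.doe@example.com today")
found = match.group(0) if match else None
print(found)
```

The commented form is easier to scan than the one-liner, but it is still plain text: nothing in it distinguishes literals from syntax visually, which is the gap the visual syntax aims to close.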

The real power of the visual syntax comes to life with the suggested UI. The UI will particularly help those who find the traditional syntax hard to remember.

You write a regex as you normally would. The UI will visualize the structure on the fly. When you find that you can’t remember a command, you can press ctrl+space to summon a search menu. This menu contains all regex commands and their descriptions. You can search either by command (to confirm that you remember its meaning correctly) or by description (to recall which command fits a given task).
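
The two-way search could be sketched roughly as follows. The command table, its descriptions, and the function name search_commands are all hypothetical and only cover a handful of common tokens. A real menu would need a full, dialect-aware list.

```python
# Hypothetical command table: a few common regex tokens with short descriptions.
COMMANDS = {
    r"\d": "any digit",
    r"\w": "any word character (letter, digit, underscore)",
    r"\s": "any whitespace character",
    r"\b": "word boundary",
    "+": "one or more of the previous element",
    "*": "zero or more of the previous element",
    "?": "zero or one of the previous element",
}


def search_commands(query):
    """Return (command, description) pairs whose command or description matches.

    Commands are matched case-sensitively, because case changes their meaning
    in regex syntax. Descriptions are matched case-insensitively.
    """
    lowered = query.lower()
    return [
        (command, description)
        for command, description in COMMANDS.items()
        if query in command or lowered in description.lower()
    ]


# Search by command to confirm its meaning.
by_command = search_commands(r"\b")
# Search by description to recall the command.
by_description = search_commands("Whitespace")
print(by_command)
print(by_description)
```

A single search box that looks in both columns keeps the interaction to one shortcut, whichever direction the user has forgotten.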

Supporting user memory

Regular expressions have a hard-to-memorize syntax. This is a particularly serious issue considering that most of us do not write regexes for a living.

For many users, regex is a tool that gets summoned, say, a couple of times a year. When we come back to it, previous learning has faded, and we might need hours just to get up to speed with the syntax again.

To solve this, we will augment the above visual syntax with a UI that enables learning. This means three things:

  • As mentioned above, the new visual language is dialect-agnostic. You can generate any dialect from your expression; the engine behind the syntax takes care of the actual generation.
  • Progressive disclosure for learning special element meanings. The general aim is to make elements self-explanatory. To remain terse, though, not all meanings are readily visible. If you forget the meaning of a symbol, you can hover over or click on an element to get an explanation of what it does.
  • In the visual syntax, a symbol means the same no matter where in the expression it is shown. The traditional regex language is context modal: Different characters mean different things in different situations, and have different escaping rules. This is particularly true inside and outside character classes [ ]. These inconsistencies are particularly difficult to remember between usages.
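
The context dependence of traditional syntax is easy to show with a few one-liners. The following uses Python's re module. The sample strings are made up for the illustration.

```python
import re

text = "a.c abc a^c"  # made-up sample text

# Outside a character class, "." is a wildcard for any character.
dot_outside = re.findall(r"a.c", text)

# Inside a character class, "." is a plain literal dot.
dot_inside = re.findall(r"a[.]c", text)

# Outside a class, "^" anchors the match at the start of the string...
caret_outside = re.findall(r"^a", text)

# ...but placed first inside a class, it negates the class.
caret_inside = re.findall(r"a[^.]c", text)

# "-" forms a range between two class members, but is a literal at the end.
dash_range = re.findall(r"[a-c]", "a-c")
dash_literal = re.findall(r"[ac-]", "a-c")

print(dot_outside, dot_inside)
print(caret_outside, caret_inside)
print(dash_range, dash_literal)
```

The same three characters each carry two meanings depending on whether they sit inside the brackets, which is exactly the kind of rule that fades between uses.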

Implementation

This is a concept design. The idea is that the visual syntax will generate traditional regexes. You could see it as a visual DSL that generates (only barely human readable) traditional regular expressions. Ideally, IDEs would have support for this visual syntax such that you could switch between traditional syntax and this visual one.
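
To make the "visual DSL that generates regexes" idea concrete, here is a minimal sketch of one way it might work: a small tree of elements, each of which knows how to emit itself as text for a given dialect. All class names, the dialect strings, and the sample expression are hypothetical and not part of the actual design. The one dialect difference it relies on is that Python spells named groups (?P<name>...), while JavaScript and several other engines use (?<name>...).

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Literal:
    text: str

    def to_regex(self, dialect):
        # Simplification: re.escape uses Python's escaping rules for all dialects.
        return re.escape(self.text)


@dataclass
class CharClass:
    members: str  # already in class notation, e.g. "A-Za-z0-9"

    def to_regex(self, dialect):
        return f"[{self.members}]"


@dataclass
class Repeat:
    item: object
    at_least: int
    at_most: Optional[int] = None  # None means "no upper limit"

    def to_regex(self, dialect):
        inner = self.item.to_regex(dialect)
        # Group anything that is not already a single unit before repeating it.
        atom = inner if isinstance(self.item, CharClass) else f"(?:{inner})"
        if self.at_least == 1 and self.at_most is None:
            return atom + "+"
        upper = "" if self.at_most is None else str(self.at_most)
        return f"{atom}{{{self.at_least},{upper}}}"


@dataclass
class Named:
    name: str
    item: object

    def to_regex(self, dialect):
        # Python writes (?P<name>...); the other dialect here uses (?<name>...).
        prefix = "?P<" if dialect == "python" else "?<"
        return f"({prefix}{self.name}>{self.item.to_regex(dialect)})"


@dataclass
class Sequence:
    items: list

    def to_regex(self, dialect):
        return "".join(item.to_regex(dialect) for item in self.items)


# A simplified email expression, built once and exported twice.
email = Sequence([
    Named("user", Repeat(CharClass("A-Za-z0-9._%+-"), 1)),
    Literal("@"),
    Named("host", Repeat(CharClass("A-Za-z0-9.-"), 1)),
    Literal("."),
    Repeat(CharClass("A-Za-z"), 2, 4),
])

python_pattern = email.to_regex("python")
js_pattern = email.to_regex("javascript")
print(python_pattern)
print(js_pattern)
```

Keeping the expression as a tree rather than a string is what would let an editor render it visually, attach explanations to each node, and still export a plain regex at the end.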

See also Part 2: Regex You Can Read: How It Works

@TODO

Even this syntax can get unwieldy if the expression is complex enough.

Also, this does not solve all issues with regular expressions. In particular, it does not solve the core issue intrinsic to regexes: How do I make sure that my regex matches exactly those strings I want it to and none of the ones I don’t? There are debugging tools for regexes that allow you to find what you want by means of trial and error, but that’s a topic for another post.

To get early access to our crowdfunding campaign, and to know when you can try this in action, go here to enter your email.

Although Regex UCR will likely be open source, we will need your financial support to pay for coders doing the work. Hear things before others as they happen, and receive exclusive perks when our crowdfunding thing happens. (Added August 3, 2016)

Contact me to join our Slack channel if you want to work together on this and get write access to our GitHub repository as well. We warmly welcome any help bridging the gap from design to code.

We’re still on the lookout for more people to join us. We would especially like more folks who

  • understand parsing regexes,
  • have ideas on implementing the data structures needed,
  • have ideas on generating regexes from the syntax,
  • have experience doing usability tests, or
  • are all-around regex gurus and can help us gather data about different dialects and how different elements correspond.

We already have a bunch of folks who have shown interest and plans are underway, but discussion is still just getting started. Open source and GPL. Now’s the time to step up!

BTW, thanks to Bret Victor. His work has provided inspiration for much of this.

Go here to enter your email and we’ll let you know when we have something you can try.

See also Part 2: Regex You Can Read: How It Works with more samples of syntax.