Lexical differential highlighting instead of syntax highlighting

Oleksandr Kaleniuk

In 2013 I was working in nuclear power plant automation. Can’t talk much since I still haven’t figured out which part of it was classified. But I probably wouldn’t get arrested for mentioning that the job required reading a lot of assembly code.

Reading assembly is not as hard as it might seem to an untrained person. In fact, anyone can read a bit of assembly. But in large quantities, it’s not easy either. Mnemonics like RCR, WBINVD, and CMPXCHG8B are fun to write but hell to read.

What’s worse, the standard approach to syntax highlighting doesn’t help at all. It’s fine that mov doesn’t look like eax, but I’d much rather have pmulhw and pmulhuw shown as differently as possible.

So I employed another kind of highlighting. It’s not syntax highlighting but lexical differential highlighting. “Lexical” since it doesn’t need true syntax analysis: primitive tokenization and filtering are enough. And it’s “differential” because it aims to highlight the difference between lexemes. Ideally, the smaller the lexical difference, the greater the color difference should be.
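To make the idea concrete, here is a minimal sketch of a coloring rule. It is not the highlighter from this post. The name `lexemeColor` and the choice of an FNV-1a-style hash over UTF-16 code units are assumptions made for illustration. A hash like this only decorrelates color from spelling, so near-identical lexemes usually land on unrelated hues; it does not guarantee that the closest spellings get the most distant colors.

```javascript
// Hypothetical helper: maps a lexeme to a CSS color.
// Assumption: an FNV-1a-style hash stands in for whatever rule the real highlighter uses.
function lexemeColor(lexeme) {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < lexeme.length; i++) {
    hash ^= lexeme.charCodeAt(i);             // mix in one UTF-16 code unit
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep it unsigned
  }
  const hue = hash % 360; // every character influences the hue
  return `hsl(${hue}, 70%, 40%)`;
}

// The same lexeme always gets the same color,
// so every occurrence of a mnemonic looks alike across the listing.
const sameEveryTime = lexemeColor("pmulhw") === lexemeColor("pmulhw");
```

Fixed saturation and lightness keep every color readable on a light background, leaving hue as the only thing that varies.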

It works like this.

It’s 2019, and I’m getting back to this idea. I’m using lexical differential highlighting not only for assembly but for most of the code published on Words and Buttons. There are two reasons to do so. First, it works with any language, even the most exotic ones. And second, it saves you traffic.

Even considering quotes and comments, the tokenizer itself can be implemented in about 30 lines of code. And the painting function is even smaller. So instead of doing syntax highlighting statically or dragging in a third-party library as a dependency, I simply rewrite the coloring code specifically for every page.
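For a sense of scale, a tokenizer of roughly that size might look like the sketch below. This is an illustration under assumptions, not the code used on Words and Buttons: the name `tokenize`, the token kinds, and the `lineComment` parameter are all hypothetical, and only single-line comments and simple backslash escapes are handled.

```javascript
// Hypothetical tokenizer: splits source text into {kind, text} pieces.
// Concatenating the text of every piece reproduces the input exactly.
function tokenize(source, lineComment = "//") {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    const start = i;
    const ch = source[i];
    let kind;
    if (source.startsWith(lineComment, i)) {
      kind = "comment"; // runs to the end of the line
      while (i < source.length && source[i] !== "\n") i++;
    } else if (ch === '"' || ch === "'") {
      kind = "string"; // runs to the matching quote
      i++;
      while (i < source.length && source[i] !== ch) {
        if (source[i] === "\\") i++; // skip the escaped character
        i++;
      }
      i++; // step over the closing quote
    } else if (/\s/.test(ch)) {
      kind = "space";
      while (i < source.length && /\s/.test(source[i])) i++;
    } else if (/\w/.test(ch)) {
      kind = "word"; // mnemonics, registers, identifiers, numbers
      while (i < source.length && /\w/.test(source[i])) i++;
    } else {
      kind = "punct"; // any other single character
      i++;
    }
    tokens.push({ kind, text: source.slice(start, i) });
  }
  return tokens;
}
```

Adapting it to an assembly dialect is a matter of passing a different comment marker, for example `tokenize(listing, ";")`.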

This way I can highlight code for any assembly dialect or any language including the most obscure and outdated ones. And it only takes a few KB per instance.

I can even emulate an element of syntax highlighting to make code structure more apparent. For instance, this is how a piece of JavaScript code looks with this hybrid highlighting.
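One plausible way to read “hybrid” is sketched below: comments and strings get a fixed style, the way a syntax highlighter would treat them, while every other word gets its color from its spelling. The function names, the token shape `{kind, text}`, and the specific colors are assumptions for illustration, not the post's actual highlighter.

```javascript
// Escape the characters that would otherwise be parsed as HTML.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical painter.
// tokens: array of {kind, text}; colorOf: function from a lexeme to a CSS color.
function paint(tokens, colorOf) {
  return tokens.map(({ kind, text }) => {
    const safe = escapeHtml(text);
    if (kind === "comment") return `<span style="color: gray; font-style: italic">${safe}</span>`;
    if (kind === "string") return `<span style="color: green">${safe}</span>`;
    if (kind === "word") return `<span style="color: ${colorOf(text)}">${safe}</span>`;
    return safe; // whitespace and punctuation stay unstyled
  }).join("");
}

// Usage with a stand-in color rule; a real one would depend on the whole lexeme.
const html = paint(
  [
    { kind: "word", text: "mov" },
    { kind: "space", text: " " },
    { kind: "comment", text: "; a < b" },
  ],
  (lexeme) => (lexeme.length % 2 ? "crimson" : "navy")
);
```

Keeping the structural styles fixed means the differential colors only have to tell identifiers and mnemonics apart.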


And this is, by the way, exactly the code for the highlighter itself. The whole thing.

It is small. I’m still too lazy to rewrite it by hand for every page, so I made a generator for it. I can’t embed it into a Medium post, but it’s available on Words and Buttons Online.

This is just a screenshot. The real thing is here: https://wordsandbuttons.online/lexical_differential_highlighting_instead_of_syntax_highlighting.html#highlighter_generator

Feel free to use it however you like. Just like every other piece of code on Words and Buttons, it’s properly unlicensed.

