The DOM is not a Datastore
This should go without saying, right? It doesn’t.
This is a story about web-based Rich Text Editors (or as they are less accurately called, HTML WYSIWYG editors).
The concept applies to user interface components more broadly, but we’ll focus on rich content editors because we’ve been doing those wrong for decades, and they’re all over the web. Every blog platform, webmail client, forum, CMS, form builder, even comment threads.
The Rich Text Editor: You know, the component that is designed to replace a <textarea> when you want to give your users the freedom to use bold and italic and links.
Don’t get me wrong, an interactive editor in the constraints of a web page, originally conceived for displaying linked text documents, is a difficult thing to get right — made all the more frustrating by the worst API in all of browser-land: contentEditable. But still, we’re doing it so very, very wrong.
Rewind with me back to the middle of 2004. Facebook has just launched (though you’ve probably not heard of it yet), Apple has released the new iPod mini with 4GB of storage and pretty much everyone is using Internet Explorer 6.0, which is now about 3 years old. Firefox 1.0 won’t come out till the end of the year, but maybe a few early adopters are using a preview release. WordPress 1.0 has just been released but it won’t get “WYSIWYG Editing” until v2 at the end of next year.
But no matter, because building a custom web-based CMS for your client (probably in classic ASP or ColdFusion) is the new hotness. It’s probably full of SQL injection bugs but we don’t really know much about that yet (plus it’s super fun to query our database from within our markup right around where the <tr>s go!). Since you really want cool editing features, there are two mainstream, semi-open-source, rich content editors to choose from. The poorly named FCKEditor has been around about a year and the not-so-small TinyMCE has just come out!
These editors basically wrap a UI around a feature from Internet Explorer (and later other browsers) called “contentEditable” which has actually been around since IE5.5 in July of 2000. API design aside, contentEditable is pretty revolutionary for its time. Just slap a contentEditable="true" on your div and boom, you’ve got a cursor to start editing! It’s widely suspected this feature was created to allow MS to build Outlook Web Access, which might just be the most advanced web interface of this era (remember OWA is also the first to ship something that would later become known as Ajax).
Anyway, these fancy, drop-in <textarea> replacements were incredible. And remember, at this time we would sprinkle just a little JS around our HTML, like onClick attributes to annoy people with alert boxes that block the entire rendering thread. And now here are these rich, fully-interactive UI components written in pure JS. It was fantastic.
But as our users kept finding more creative ways to paste in markup from who-knows-where, just to make their page pop with a little blue comic sans, all kinds of things would go awry. We’ll just fix these issues as they get reported (we don’t really have integration tests). Let the browser hacks begin!
It must have been a nightmare for the maintainers of FCKEditor and TinyMCE because the weird edge-cases in contentEditable are unimaginably endless. I’ve seen the hacks. I think it only worked because they kept re-implementing core features from scratch. Line breaks, backspace functionality, entire libraries around pasting, intercepting every keypress, even re-implementing spell-check in Ajax and hijacking the right-click menu (really, you shouldn’t have).
Shoot, to get leading or trailing spaces we have to insert and remove invisible &nbsp; entities all over the place. We forget to remove them most of the time, but who cares, they’re just implementation details, right? And what about <br> vs. putting text in <p> blocks? Wait, are two <br>s equivalent to a paragraph break? Doesn’t it depend on the CSS? Oh, and empty elements collapse, so we need a dummy &nbsp; or <br> there. But if we forget that, we end up with invisible empty elements everywhere. Now what about inline elements? Shouldn’t <b> and <i> always go inside the <a> so we never break the <a> into segments? What if the user makes the text bold first and then makes a portion of it a link and then unbolds a portion of the link? Can an <em> go inside a <strong> that’s in another <em>? Maybe use spans with inline styles. Do we want semantic markup or this thing behaving sanely? Can we have both? (Or even one?)
As the years go on, the hacks stack up and we now have multiple mainstream browsers to worry about. The codebases of these things have grown almost as quickly as their adoption. You’d think the development community would realize that we cannot keep state in the DOM, right? Nope.
In 2009, FCKEditor, now called CKEditor, gets a complete rewrite, a new API and a sleek new look. Fresh starts are awesome. Plus we now have Google Chrome and mobile browsers! But still, all the state is in the DOM. We did get new stuff though. There’s an entire HTML parser in there, no joke. Written completely in JS. With its own AST that’s similar to, but not the same as, the DOM. (Virtual DOM, circa 2009!) This was a beast, but it works well. Every time you want to get the editor’s contents, it will stringify the real DOM, parse it into its own data model, do a bunch of transformations and then serialize it to beautifully formatted HTML. But what happens if you want to listen for an onChange event? Oh crap, do I have to parse, transform and diff the entire editor contents on each keystroke? We don’t yet have MutationObserver but even that wouldn’t be much help. Plus the DOM isn’t a 1:1 mapping of the view. You can have various combinations of nodes that represent the same visual layout! Is it a change event if something changes the DOM but it still renders identically?
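To make that “different nodes, same rendering” problem concrete, here’s a rough sketch. It is not how CKEditor or any real editor does it; the `MarkupNode` tree, `StyledRun` and `flatten` are names I made up for illustration, and the tree is a toy stand-in for the DOM that only knows about <b> and <i>. The idea is just that two differently nested trees can collapse to the same list of styled text runs, which is arguably the thing you’d want to compare before firing a change event.

```typescript
// Toy stand-in for a DOM subtree (hypothetical, not a real DOM API):
// a node is either a text string or a <b>/<i> element with children.
type MarkupNode = string | { tag: "b" | "i"; children: MarkupNode[] };

// What the user actually sees: runs of text with a style.
interface StyledRun { text: string; bold: boolean; italic: boolean }

// Append a run, merging it into the previous run when the style matches.
function pushRun(runs: StyledRun[], run: StyledRun): void {
  const last = runs[runs.length - 1];
  if (last && last.bold === run.bold && last.italic === run.italic) {
    last.text += run.text;
  } else {
    runs.push({ ...run });
  }
}

// Walk the tree and flatten it into styled runs, ignoring nesting order.
function flatten(nodes: MarkupNode[], bold = false, italic = false): StyledRun[] {
  const runs: StyledRun[] = [];
  for (const node of nodes) {
    if (typeof node === "string") {
      if (node.length > 0) pushRun(runs, { text: node, bold, italic });
    } else {
      const inner = flatten(
        node.children,
        bold || node.tag === "b",
        italic || node.tag === "i"
      );
      for (const run of inner) pushRun(runs, run);
    }
  }
  return runs;
}

// <b><i>hi</i></b>
const nestedOneWay: MarkupNode[] = [
  { tag: "b", children: [{ tag: "i", children: ["hi"] }] },
];
// <i><b>h</b><b>i</b></i>
const nestedOtherWay: MarkupNode[] = [
  { tag: "i", children: [{ tag: "b", children: ["h"] }, { tag: "b", children: ["i"] }] },
];

// Different trees, same flattened result: one bold+italic run "hi".
const sameView =
  JSON.stringify(flatten(nestedOneWay)) === JSON.stringify(flatten(nestedOtherWay));
console.log(sameView);
```

If you compare the trees, these two are different. If you compare the flattened runs, nothing changed. Which answer you get depends entirely on where you decided your state lives.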
I don’t mean to pick on CKEditor, it’s just that I’m more familiar with that one. We were all doing the same stuff back then. For the love of all that is good and decent, we actually got data-* attributes standardized! That’s a free pass to put all your state in the DOM. I admit, I didn’t think this was an issue back then.
Turns out, it took us another 5-6 years to figure out there’s one weird trick to make all your contentEditable woes go away.
Choose a data model that best represents the content you are editing. Make that the single source of truth for your editor’s state.
Each user interaction should result in a change-set that’s applied to your data model. The updated data model should be flushed to your view. Simple. Sorta. Detecting which changes should result from a given interaction will be an exercise in browser hackery, but the architecture is straightforward.
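Here’s a minimal sketch of that loop, under some loud assumptions: the model is a flat list of characters with a single bold flag, and the `Change` shapes, `applyChange` and `render` are all names I invented for this example. Real editors use far richer models and change-sets. The point is only the direction of data flow: interaction, then change-set, then model, then view.

```typescript
// Hypothetical data model: one entry per character, with its formatting.
interface Char { ch: string; bold: boolean }
type Doc = readonly Char[];

// Hypothetical change-sets: the only way the model is ever modified.
type Change =
  | { kind: "insert"; at: number; text: string; bold: boolean }
  | { kind: "delete"; from: number; to: number }
  | { kind: "setBold"; from: number; to: number; bold: boolean };

// Apply one change and return a new model; the old one is left untouched.
function applyChange(doc: Doc, change: Change): Doc {
  switch (change.kind) {
    case "insert": {
      const added = change.text.split("").map((ch) => ({ ch, bold: change.bold }));
      return [...doc.slice(0, change.at), ...added, ...doc.slice(change.at)];
    }
    case "delete":
      return [...doc.slice(0, change.from), ...doc.slice(change.to)];
    case "setBold":
      return doc.map((c, i) =>
        i >= change.from && i < change.to ? { ...c, bold: change.bold } : c
      );
  }
}

function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Flush the model to a view. Here the "view" is just an HTML string;
// adjacent characters with the same formatting are grouped into one run.
function render(doc: Doc): string {
  let html = "";
  let i = 0;
  while (i < doc.length) {
    const bold = doc[i].bold;
    let text = "";
    while (i < doc.length && doc[i].bold === bold) {
      text += doc[i].ch;
      i++;
    }
    html += bold ? `<b>${escapeHtml(text)}</b>` : escapeHtml(text);
  }
  return html;
}

// Three user interactions, expressed as change-sets (ranges are end-exclusive).
const changes: Change[] = [
  { kind: "insert", at: 0, text: "Hello world", bold: false },
  { kind: "setBold", from: 6, to: 11, bold: true },
  { kind: "insert", at: 11, text: "!", bold: false },
];
const finalDoc = changes.reduce<Doc>(applyChange, []);
console.log(render(finalDoc)); // Hello <b>world</b>!
```

Notice there’s no way to end up with an empty <b></b> or a stray entity in the output, because the markup is generated from the model every time and never edited in place. And `applyChange` is trivially unit-testable without a browser.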
I’m not sure if the Medium engineers figured it out first, or if that’s just where I heard it first, but in 2014 I was blown away by their post on why we’re doing contentEditable all wrong. Around that time, some engineer(s) at Facebook came to a similar conclusion: web-based editors are hard but they are much harder when you’re keeping your state in the DOM. They took a more React-like approach and built the editor that you use every day to compose FB posts. I was equally impressed when I got to attend a technical deep dive from a FB engineer named Isaac about how that editor works. It makes so much sense now!
With this new approach the DOM is an implementation detail and the output format (HTML) is an implementation detail! So we can totally separate concerns and get robust, unit-testable components. We can also build plugins to render our data model to DOM or serialize it to well-formed HTML or Markdown or what-have-you.
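As a sketch of what “output format as a plugin” could look like, assume a model made of text spans with optional formatting. The `Span` shape, the `Serializer` type, and both serializers below are hypothetical and deliberately naive (the Markdown one doesn’t escape special characters, for instance), but they show the same model going out in two formats.

```typescript
// Hypothetical model: a document is a list of spans with optional formatting.
interface Span { text: string; bold?: boolean; italic?: boolean; href?: string }
type RichDoc = Span[];

// A "plugin" here is just a function from the model to a string.
type Serializer = (doc: RichDoc) => string;

function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// HTML output: <i> innermost, then <b>, with <a> always on the outside,
// so a link is never broken into segments by formatting changes.
const toHtml: Serializer = (doc) =>
  doc
    .map((s) => {
      let out = escapeHtml(s.text);
      if (s.italic) out = `<i>${out}</i>`;
      if (s.bold) out = `<b>${out}</b>`;
      if (s.href) out = `<a href="${escapeHtml(s.href)}">${out}</a>`;
      return out;
    })
    .join("");

// Markdown output from the very same model.
// Naive on purpose: Markdown special characters in the text are not escaped.
const toMarkdown: Serializer = (doc) =>
  doc
    .map((s) => {
      let out = s.text;
      if (s.italic) out = `*${out}*`;
      if (s.bold) out = `**${out}**`;
      if (s.href) out = `[${out}](${s.href})`;
      return out;
    })
    .join("");

const sample: RichDoc = [
  { text: "Read " },
  { text: "the docs", bold: true, href: "https://example.com" },
  { text: " first." },
];

console.log(toHtml(sample));
// Read <a href="https://example.com"><b>the docs</b></a> first.
console.log(toMarkdown(sample));
// Read [**the docs**](https://example.com) first.
```

The nesting question (does the <b> go inside the <a>?) stops being something you untangle in the DOM after the fact. It becomes one decision made in one serializer.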
Somewhere around that time Salesforce engineers also figured this out and they built the impressive Quill editor which keeps getting better. Based on inspecting the awesome Quip editor, I suspect those engineers made some similar realizations. These things tend to have a cascading effect.
This did not escape the attention of the mainstream, well-established editors either. Just a few months ago CKEditor made a big announcement that they’re working on another rewrite which, you guessed it, will not keep state in the DOM! I’m sure TinyMCE will get there eventually.
What shocks me about this story is not that we’ve been doing it wrong. It’s that we knew it doesn’t work and yet it took us over a decade to re-think the way we do this simple aspect of UI!
As you know, the things that cause big paradigm shifts in software development often seem obvious in hindsight. But for so long we were all happily making our lives unnecessarily difficult.
This happened with React. The majority of developers, myself included, saw views co-located with presentation logic and thought it was a fool’s errand. But React went on to change UI development in 2015 and it hasn’t finished yet.
Your view should be a pure function of your application state.
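In code, that sentence looks something like the sketch below. The `EditorState` fields and `renderStatusBar` are made up for illustration, and the “view” is just a string rather than real UI, but the property is the one that matters: same state in, same view out, and nothing is ever read back from the view.

```typescript
// Hypothetical application state for an editor's status bar.
interface EditorState { wordCount: number; dirty: boolean; boldActive: boolean }

// A pure function: no DOM reads, no hidden state, no side effects.
function renderStatusBar(state: EditorState): string {
  const saved = state.dirty ? "Unsaved changes" : "All changes saved";
  const bold = state.boldActive ? "[B]" : "[b]";
  return `${bold} | ${state.wordCount} words | ${saved}`;
}

console.log(renderStatusBar({ wordCount: 3, dirty: true, boldActive: false }));
// [b] | 3 words | Unsaved changes
```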
We have by no means solved the biggest UI challenges in software engineering, but we’re getting closer. And not with many small steps over decades, but with huge leaps that happen once per decade or so.
2016 has some exciting things coming down the pike in the world of software development, and it’s a great time to be building user interfaces!