Fixing ContentEditable

Published in Content Uneditable · Aug 21, 2015

Working with a system that is driving you mad because of a monstrous library overflowing with quirks is a great discouragement. How many times have you felt that you should drop your project altogether because it’s beyond repair, yet too much work to reimplement from scratch?

That’s pretty much the situation of contentEditable today. The difference here is that in order to fix it, you first need to convince the entire world that it needs to be fixed. If you succeed, you still need to reach a consensus on every tiny detail with the other interested parties. Finally, once the specification is accepted, it can be implemented in all browser engines.

The History

(Please correct me if I excavated some of the facts incorrectly.)

ContentEditable was first implemented in Internet Explorer 5.5 back in 2000. Five years later it was reverse-engineered by Anne van Kesteren, and around 2007 it was implemented in all major engines, in each and every one of them differently. Here you can read more about the history of contentEditable.

Since 2007 the Web has evolved. We got HTML 5, ECMAScript 5 and 6, CSS 3, and countless new APIs and JavaScript frameworks. Rounded borders and pastel colors were replaced by flat design and single-page websites (there’s a running joke in W3C that when border-radius was finally standardized, we invented flat design). A constant race to implement new features and the evergreen browsers bring us all the new goodies faster than we can learn about them.

What has changed in the contentEditable world during this period? Let’s see… IE9 was the first IE to support the modern Selection API and IE11 was a big step forward too, but other browsers were mainly left untouched (except for occasional bug fixes, which sometimes turned into new bugs).

We can’t blame anyone, because contentEditable has no spec. That doesn’t mean that no one tried to write one. HTML5 might have only scratched the surface, but Aryeh Gregor did a great job creating a real spec for all the HTML Editing APIs. I’m not sure whether it’s complete or what its exact timeline was, but the spec is really detailed and mentions the year 2011 here and there. So why didn’t it work? I wasn’t a CKEditor developer back then (I joined shortly after), but as far as I know there might have been two reasons. First, Aryeh was a contractor for Google, so when his contract ended there was no one who could (or wanted to) continue his work. Second, the spec didn’t convince others, especially editor developers.

[Image: Risk Of Spontaneous Combustion. Author: Andy Maguire]

From a developer’s perspective, the most apparent issues are that each browser implements contentEditable in a completely different way and that it’s a high level API which gives you no helping hand if you want to do something your way. If you want to use the bold feature, then you need to accept how it works or reimplement it from scratch. If you want the Enter key to insert paragraphs instead of <div>, then you need to accept how it works or reimplement the whole feature. If you want to change the way the caret responds to user actions, then again — hack it or accept it. Given the inconsistency of the APIs that you’d need to use, it’s a nightmare.

While Aryeh’s spec could have solved the first problem (inconsistent behavior), it did very little about the second one: opening the browser to developers. Advanced rich-text editors already avoid document.execCommand() because the existing commands are useless and had to be reimplemented. Furthermore, every rich-text editor is different and every use case is different, so rich-text editors must be extensible and configurable. Therefore, even if Aryeh’s spec were perfect and the browsers followed it precisely, the authors of rich-text editors would still need to implement all the features themselves.

The third problem with contentEditable is its scope. If you’re familiar with my article “ContentEditable — The Good, The Bad and The Ugly”, then you have an idea of how complex editing can be even when talking about basic features (if you haven’t read it, please do ;). Since contentEditable tries to be a complete WYSIWYG editor itself, with support for all the most important HTML features (lists, tables, links, inline styles, images, etc.), its scope is immense. Even if Aryeh had succeeded and the spec had been complete, there would still have been a lot of doubts and discussions. In the end, browser vendors would need to work on aligning their current implementations to the spec.

Following this path, we come to the last problem. ContentEditable is already widely used and every change made to its implementations would break the existing editors. For some people, this is an argument for killing contentEditable and working on a new, clean solution. For others, this means that we should only commit the minimum necessary changes to the current implementations to make them more digestible. Eventually, the technologies of the future will enable rich-text editor developers to build their apps without contentEditable.

However, both approaches have one major flaw — the time that we would need to wait for their implementation. We do need contentEditable (or a complete replacement of it) because you can’t just build a real editor based on today’s features.

There is a third way though.

Modular ContentEditable

Last year we finally noticed a light at the end of the tunnel. Two lights, actually. First of all, Ryosuke Niwa extracted the Selection API from the HTML Editing APIs spec created by Aryeh Gregor. This move allowed work on the selection system to proceed more independently from contentEditable.

The second light was lit by Johannes Wilm from Fidus Writer and Ben Peters from Microsoft who started pushing forward a new, promising concept, initially called contentEditable=minimal.

So what is contentEditable=minimal? The initial idea was described by Ben Peters as follows:

(…) it makes sense to enable a new, simpler version of contentEditable that provides basic functionality only. For the sake of discussion, call it contentEditable=’minimal’. The functionality provided by the browser under contentEditable=’minimal’ would be as follows:
* Caret drawing
* Events such as Keyboard, Clipboard, Drag and Drop
* Some keyboard input handling - caret movement, typing of characters including Input Method Editor input
* Selection drawing and manipulation according to the new Selection API spec

I replied with my initial vision, but it quickly turned out that everyone sees contentEditable=minimal differently. Fortunately, the morale and engagement of various parties were high enough to keep the idea alive. The Editing Task Force was established (see the mailing list, bug list and produced documents) and we are actually about to meet in Paris to discuss the current state of affairs and create an action plan to move forward.

I’d like to take this occasion to thank Johannes Wilm for his work as the editor of the spec. Without his determination we might’ve given up, since all our energy is often consumed by discussions.

The Vision

Ben Peters’ email triggered countless discussions on how contentEditable=minimal could work. There were some widely accepted concepts, but all in all there was still no agreement about the basics. Because dwelling on every idea that was put forward would make this article awfully long (if I ever finished it at all), I decided to focus on the current state of things.

<Disclaimer>
The following vision is strictly subjective. It can’t be treated as an official statement of the members of W3C Editing Task Force!
</Disclaimer>

ContentEditable=true

Let’s start with the simple things. There seems to be a general consensus about not touching contentEditable=true and document.execCommand(). They’re too big, too messy, wrong and already widely used. These features are unlikely to become officially deprecated (at least not yet), but their use will be discouraged.

What’s still a bit unclear is whether the new features which I describe later in this article will be added to contentEditable=true as well (and if yes, then which ones exactly). I hope some of them will be capable of being “backported” — that could, in a relatively short time, improve the situation with contentEditable=true too. But that’s not a priority.

Events

This is the second topic which is pretty clear: we need browsers to fire more events, especially those which allow developers to:

* understand the intention of the user,
* override the default behavior of the browser.

What’s the big deal with events? Today, you can listen to keyboard, mouse, touch and other events but you always need to figure out the meaning of the action. User pressed a key, so perhaps a letter should be inserted. User clicked or touched the screen, so perhaps selection was moved. In general terms, guessing isn’t what developers like to do.
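
To make the guessing problem concrete, here is a minimal sketch of what an editor ends up doing with keydown today. The guessIntent() helper and the intent names it returns are hypothetical (only key, ctrlKey, metaKey and isComposing mirror real KeyboardEvent fields), and the rules are deliberately naive:

```javascript
// Hypothetical helper: guesses what a keydown-like event "means".
// The returned intent names are made up for this sketch.
function guessIntent(event) {
  const { key, ctrlKey = false, metaKey = false, isComposing = false } = event;

  // During IME composition a key press does not map to a single character.
  if (isComposing) return { type: "unknown", reason: "IME composition in progress" };

  if (key === "Backspace") return { type: "deleteBackward" };
  if (key === "Delete") return { type: "deleteForward" };
  if (key === "Enter") return { type: "insertNewLine" };

  // Shortcuts differ per platform and locale, so this is only a guess.
  if ((ctrlKey || metaKey) && key.toLowerCase() === "b") return { type: "bold" };
  if (ctrlKey || metaKey) return { type: "unknown", reason: "unrecognized shortcut" };

  // A single printable character probably means "insert this character".
  if ([...key].length === 1) return { type: "insertCharacter", data: key };

  return { type: "unknown", reason: "no rule for key " + key };
}

console.log(guessIntent({ key: "a" }));
console.log(guessIntent({ key: "b", ctrlKey: true }));
console.log(guessIntent({ key: "Process", isComposing: true }));
```

Every rule here is a guess, and actions such as a spell-checker correction or a native context menu command never produce a keydown at all, so they slip past this kind of code entirely.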

Furthermore, guessing the user’s intention is just the beginning. In many cases you’ll want to alter the default native behavior. Imagine that you want to implement a “track changes” feature. When the user starts typing, the text should be inserted in an <ins> tag, and each time something is deleted (by the Delete or Backspace keys, by cutting, by selecting some text and pressing any key, or by the Delete option in the native context menu), instead of removing it we would like to wrap it with <del>. With today’s events such as keydown and no ability to manually handle typing or text deletion (again, if you haven’t read the article mentioned above, please do), you need to resort to the ugliest possible hacks.
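
The “track changes” behavior itself can be sketched on a plain data model, without touching the DOM at all. This is only a minimal sketch under my own assumptions: the segment model ({ text, status }) and the helpers splitAt(), insertTracked(), deleteTracked() and toHtml() are hypothetical names, offsets count every character including those already marked as deleted, and no HTML escaping is done.

```javascript
// Split the segment list so that a segment boundary exists at `offset`.
function splitAt(segments, offset) {
  const result = [];
  let position = 0;
  for (const segment of segments) {
    const end = position + segment.text.length;
    if (offset > position && offset < end) {
      const cut = offset - position;
      result.push({ ...segment, text: segment.text.slice(0, cut) });
      result.push({ ...segment, text: segment.text.slice(cut) });
    } else {
      result.push(segment);
    }
    position = end;
  }
  return result;
}

// Typing: the new text becomes its own segment marked as an insertion.
function insertTracked(segments, offset, text) {
  const result = [];
  let position = 0;
  let inserted = false;
  for (const piece of splitAt(segments, offset)) {
    if (!inserted && position === offset) {
      result.push({ text, status: "ins" });
      inserted = true;
    }
    result.push(piece);
    position += piece.text.length;
  }
  if (!inserted) result.push({ text, status: "ins" }); // offset at or past the end
  return result;
}

// Deleting: plain text is kept but marked as deleted; text that was only ever
// a tracked insertion is dropped, since there is nothing left to track.
function deleteTracked(segments, start, end) {
  const result = [];
  let position = 0;
  for (const piece of splitAt(splitAt(segments, start), end)) {
    const pieceEnd = position + piece.text.length;
    const inside = position >= start && pieceEnd <= end;
    if (!inside || piece.status === "del") {
      result.push(piece);
    } else if (piece.status === "plain") {
      result.push({ ...piece, status: "del" });
    }
    position = pieceEnd;
  }
  return result;
}

// Serialize the model; text is assumed to contain no HTML special characters.
function toHtml(segments) {
  return segments
    .map(({ text, status }) => (status === "plain" ? text : `<${status}>${text}</${status}>`))
    .join("");
}

let trackedDoc = [{ text: "Hello world", status: "plain" }];
trackedDoc = insertTracked(trackedDoc, 5, ",");
trackedDoc = deleteTracked(trackedDoc, 6, 12);
console.log(toHtml(trackedDoc));
```

The model is the easy part. The hard part, and the reason for the hacks, is learning reliably that an insertion or a deletion is about to happen, so that functions like these can run instead of the native behavior.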

To tackle these problems, the family of events that we often use in our editors, like paste and beforepaste (from the Clipboard API) or dragstart and drop (from HTML 5), will be extended with:

* beforeinput (from HTML Editing): enables control over input (making changes). This event will be fired with inputType (e.g. ‘insertCharacter’, ‘replaceText’, ‘insertNewLine’) and data (e.g. a letter that should be inserted). It is designed to also cover editing operations such as character composition and spell checking. With a single event you’ll now be able to control all editing operations.
* selectionchange and selectstart (both in the Selection API) and the proposed beforeselectionchange: enable better control over the selection and caret movements. The beforeselectionchange event will be fired with a property that tells how the user intends to change the selection (extend it forward, backwards, upwards, jump a whole word, etc.) as well as the proposed selection.

The best thing about beforeinput and beforeselectionchange is that these events can be cancelled to prevent the default action (if there’s any) or the default action can be changed. These events will also carry useful information like a range that should be affected.
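
To illustrate the cancel-or-override pattern, here is a small simulation that runs without a browser. Nothing in it is taken from the spec text: SketchBeforeInputEvent, createSketchEditor() and simulateInput() are made-up names, the inputType strings are simply the examples quoted above, and the “browser” is faked by a function that fires the event and then performs a default action only if nobody cancelled it. It assumes a runtime with the standard Event and EventTarget globals.

```javascript
// Stand-in for the proposed event. A real browser would supply its own event
// class; this subclass only exists so the sketch runs anywhere that has the
// standard Event and EventTarget globals (browsers, recent Node.js).
class SketchBeforeInputEvent extends Event {
  constructor(inputType, data) {
    super("beforeinput", { cancelable: true });
    this.inputType = inputType;
    this.data = data;
  }
}

// Hypothetical editor: `host` plays the editable element, `model` the content.
function createSketchEditor() {
  const host = new EventTarget();
  const model = { text: "" };

  host.addEventListener("beforeinput", (event) => {
    if (event.inputType !== "insertCharacter") return; // keep the default action
    event.preventDefault();                  // cancel the default action...
    model.text += event.data.toUpperCase();  // ...and apply a custom one instead
  });

  // Plays the browser's part: fire the event first, then perform the default
  // action (appending the data as-is) only if no listener cancelled it.
  function simulateInput(inputType, data) {
    const notCancelled = host.dispatchEvent(new SketchBeforeInputEvent(inputType, data));
    if (notCancelled) model.text += data ?? "";
    return notCancelled;
  }

  return { model, simulateInput };
}

const sketchEditor = createSketchEditor();
sketchEditor.simulateInput("insertCharacter", "a"); // handled by the editor
sketchEditor.simulateInput("insertNewLine", "\n");  // left to the default action
console.log(JSON.stringify(sketchEditor.model.text));
```

The point of the design is that the editor sees one event per editing operation, with the intention already spelled out, and decides case by case whether to keep the default action or replace it.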

There’s also one more interesting initiative that touches this subject — Independent User Interface (IndieUI), which exposes meaningful events for various UI–related actions. As far as I can see, text editing isn’t currently covered, but we could imagine a bold event which would be fired when the browser discovers that a user may want to bold a text (in some languages Ctrl+B may not be the usual keystroke for that or the action could be triggered from context menu or a special popup shown on mobile browsers).

Selection API

It’s crucial for rich-text editor authors that the Selection API is predictable and complete. There are currently many inconsistencies which seriously impede our work. To name just a few: overly drastic selection normalization, behavior around HTML elements such as <canvas> and <svg>, and behavior around non-editable elements. The API also lacks some features: the modify() method that would allow customizing caret movements, a way to specify whether the caret should be rendered before or after a line break, and some other methods which would simplify various other tasks. The list is pretty long and actually goes beyond the Selection API, since some extensions to the Range API would also be useful.
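
As an example of what editors currently have to compute themselves, here is a rough sketch of caret movement over a plain string. The moveCaret() function and its direction and granularity parameters are hypothetical and only hint at the kind of control a modify()-style method could offer. The sketch ignores BiDi, surrogate pairs, grapheme clusters and line layout, which is exactly where the real difficulty lies.

```javascript
// Hypothetical helper: returns the new caret offset after moving by one
// character or one word. Works on UTF-16 code units of a plain string only.
function moveCaret(text, offset, direction, granularity) {
  const step = direction === "forward" ? 1 : -1;
  const isWordChar = (ch) => /[\p{L}\p{N}_]/u.test(ch);
  // The character the caret would pass over next in the chosen direction.
  const charAhead = (pos) => (step === 1 ? text[pos] : text[pos - 1]);
  const canMove = (pos) => (step === 1 ? pos < text.length : pos > 0);

  let pos = Math.min(Math.max(offset, 0), text.length);
  if (granularity === "character") return canMove(pos) ? pos + step : pos;

  // Word granularity: skip separators first, then the word itself.
  while (canMove(pos) && !isWordChar(charAhead(pos))) pos += step;
  while (canMove(pos) && isWordChar(charAhead(pos))) pos += step;
  return pos;
}

const sampleText = "foo bar, baz";
console.log(moveCaret(sampleText, 0, "forward", "word"));
console.log(moveCaret(sampleText, sampleText.length, "backward", "word"));
```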

ContentEditable=events

The initial concept of introducing contentEditable=minimal has evolved during the past year. There were proposals to introduce contentEditable=typing, contentEditable=cursor and other modes from which a developer would be able to pick. By using e.g. the “typing” mode, a developer would inform the browser that it should handle text insertion. Similarly, the “cursor” mode would enable native caret movements (yes, there’s something wrong with the name :D).

While the concept of splitting contentEditable=true into many opt-in modules sounds reasonable, it means that we keep the same (or even higher) complexity as before, because the native behavior must still be spec-ed and the modes must be orthogonal.

To reduce the scope, the modules concept has for now been limited to contentEditable=events (although other modules may be spec-ed in the future). It’s a much simplified proposal, which at the same time may solve the majority of our problems.

For me, the whole “editing” can boil down to two types of actions:

* those which affect the content (typing, deleting, formatting, pasting, dragging),
* those which do not affect the content (various ways of changing the selection, rendering of native UI components such as context menus, floating toolbars, spell checker markers).

As a rich-text editor implementer I would love to control every aspect of editing, but my highest priority is to control the content. I can accept that in some cases the caret doesn’t move exactly the way I would like it to, but I cannot accept the content in my editor getting messed up by the browser.

ContentEditable=events seems to reflect this division. In this mode the browser will handle caret movements and selection changes, but it won’t modify the DOM. This means that you can relatively easily add support for typing (by using the beforeinput event), and you don’t need to worry about implementing your own selection handler, which would be a very tricky job given its scope (mouse, keyboard, touch devices and countless little details like BiDi). At the same time, thanks to the beforeselectionchange event you can still handle the cases in which you do not agree with the native behavior. This also means that spec-ing caret movements isn’t that crucial and can be done later.
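
A rough simulation of that last point: the “browser” proposes every caret position and the editor steps in only when it disagrees, here to keep the caret out of a span it treats as atomic. This is a sketch under my own assumptions. SketchBeforeSelectionChangeEvent, createSketchSelectionHost(), simulateCaretMove() and the proposed property are hypothetical, selections are reduced to plain text offsets, and the real event (if it ships) may look quite different.

```javascript
// Stand-in for the proposed event; `proposed` is the selection the "browser"
// intends to apply, expressed here as plain text offsets.
class SketchBeforeSelectionChangeEvent extends Event {
  constructor(proposed) {
    super("beforeselectionchange", { cancelable: true });
    this.proposed = proposed;
  }
}

// `atomicRanges` lists spans (e.g. widgets) that the caret must not enter.
function createSketchSelectionHost(atomicRanges) {
  const host = new EventTarget();
  const state = { selection: { start: 0, end: 0 } };

  host.addEventListener("beforeselectionchange", (event) => {
    const { start, end } = event.proposed;
    if (start !== end) return; // this sketch only adjusts collapsed carets
    const hit = atomicRanges.find((r) => start > r.start && start < r.end);
    if (!hit) return; // no objection: let the native behavior win

    // Disagree with the proposal: snap the caret to the far edge of the span
    // in the direction of travel instead of letting it land inside.
    event.preventDefault();
    const movingForward = start > state.selection.start;
    const edge = movingForward ? hit.end : hit.start;
    state.selection = { start: edge, end: edge };
  });

  // Plays the browser's part: propose a caret position, apply it unless vetoed.
  function simulateCaretMove(offset) {
    const proposed = { start: offset, end: offset };
    const accepted = host.dispatchEvent(new SketchBeforeSelectionChangeEvent(proposed));
    if (accepted) state.selection = proposed;
    return state.selection;
  }

  return { state, simulateCaretMove };
}

const sketchHost = createSketchSelectionHost([{ start: 5, end: 10 }]);
console.log(sketchHost.simulateCaretMove(3)); // accepted as proposed
console.log(sketchHost.simulateCaretMove(6)); // vetoed and adjusted by the editor
```

Note how little the editor has to implement: one listener for the cases it cares about, while every other caret movement stays native.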

I hope that we will be able to reach consensus about this proposal, because it seems to be the middle ground between spec-ing everything and brutal reality, in which no one has enough energy to do that.

So what’s the ETA?

I wish I knew. Next week we’re meeting in Paris to tackle the remaining issues. There’s no agreement yet on some important details, such as whether in contentEditable=events the browser should handle caret movements, or how the selection may be normalized and where it can be displayed. The beforeselectionchange event was proposed a long time ago, but it hasn’t been added to the Selection API spec either.

If we agree about these and other matters in Paris, then, as far as I understand, the Editing and Selection API specs will be very close to becoming Candidate Recommendations. Given that browser vendors are involved in both specs, this means that we’ll be on the threshold of fixing contentEditable. I can’t predict how quickly the new features will land in browsers, but let’s hope for “soon” :).

In other words, it seems that this attempt to fix contentEditable will not share the fate of the previous one. Keep your fingers crossed.