Wikipedia needs an IDE, not a WYSIWYG editor

(Btw, I recently released Vidrio for Mac, Tony Stark’s presentation app. See https://vidr.io for more.)

Wikipedia, if you haven’t heard of it already, is the world’s biggest software project. At almost a billion lines, it’s twice as large as healthcare.gov. It’s coded in a homegrown Turing-complete programming language, and the full thirteen years of history is kept in a homegrown version control system. The final built artifact — an online encyclopedia — gets more traffic than Twitter.

What text editors do contributors use to write this software? Just one: an HTML <textarea> element. It’s decorated with some buttons for common operations like inserting a link, but otherwise it’s a plain old textarea with the functionality of Notepad. The workflow for editing is: find the page, click either “edit” at the top of the page or next to a section heading, hunt in the textarea for the text you wanted to edit, edit the source, click “show preview”, scan the page to see if it looks like it’s changed in the way you expected, if it hasn’t then try again, then click “show changes”, scan the diff to see if that’s what you expected, if it isn’t then try again, then finally enter an edit summary message and click “save page”.

What’s so wrong with that? Well, users apparently hate it. It’s too difficult, too intimidating, too time-consuming. Users apparently dislike this workflow so much that they don’t bother contributing at all: significantly fewer people are editing Wikipedia than did a few years ago.

A failed solution: Visual Editor

Wikipedia, having identified its editing interface as a key reason for the decline in active users, has spent many years and many dollars producing its Visual Editor. It’s supposed to be a WYSIWYG editor: the editing interface looks almost exactly the same as the page, only with a blinking cursor.

Visual Editor was rolled out in 2013. Thing is, editors hated its bugginess so much that the roll-out was reverted shortly afterwards. Today, we again have the humble textarea, while the Visual Editor remains as an obscure opt-in checkbox in user preferences. I believe it will stay there.

Why did the Visual Editor fail? Because it tries to deny the basic fact that Wikipedia is a program, not a word document. Pages on Wikipedia are built with a wealth of abstractions: templates, transclusions, tables of contents, references, categories, semantic triples, and so on. One cannot simply place the cursor anywhere in an infobox and arbitrarily edit it. There is a basic tension between WYSIWYG, which the user wants, and the concept of abstraction, which the programmer wants and which Wikipedia is built upon. This tension is irresolvable: the terms “WYSIWYG” and “abstraction” are literally antonyms. Editing the artifact and editing the source are only isomorphic if the compilation step is trivial and involves no abstraction. Wikipedia’s compilation step is certainly not trivial: it’s Turing-complete. A WYSIWYG editor for Wikipedia has about as much hope as a WYSIWYG editor for a weather prediction program.

A better solution: an IDE

Rather than hiding the fact that Wikipedia is a program, the editing tools should embrace that fact. Most of us, when we write, edit, and navigate conventional programs, use an IDE to help us: the IDE understands the syntax and semantics of the programming language and helps us to understand it too. Instead of a WYSIWYG editor, Wikipedia needs an IDE.

What would an IDE for Wikipedia look like? Rather than having the user click “compile” every now and then to check their work, the IDE would display the source and the HTML together, and update the HTML in realtime. Source on the left, HTML on the right.

The IDE would understand the relationship between the source and the HTML. Scrolling one panel also scrolls the other to the approximate equivalent location. Hovering the mouse over a word in the source highlights it in the HTML, and vice versa. In other words, the IDE understands a correspondence between characters in the source and characters in the HTML, and can convey that correspondence to the user.

The IDE would understand the MediaWiki syntax. A little bit of syntax highlighting goes a long way toward helping the user understand the syntax. User error shows up immediately as a syntax error in the source and as an unexpected output in the HTML.

The IDE would integrate the diff viewer, for both the source and the HTML, in realtime. Add a new word in the source, and it gets highlighted in green in the source as you type. The equivalent change in the HTML also gets highlighted in green, as you type.
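To make the realtime diff idea concrete, here is a minimal sketch in Python using the standard library's difflib. It compares the saved source with the text currently in the editor, word by word, and returns the character ranges an editor could paint green. The names tokenize and added_spans are mine, invented for illustration, and a real editor would probably want something incremental rather than re-diffing on every keystroke.

```python
import re
from difflib import SequenceMatcher

def tokenize(text: str) -> list[str]:
    """Split into alternating runs of whitespace and non-whitespace."""
    return re.findall(r"\s+|\S+", text)

def added_spans(old: str, new: str) -> list[tuple[int, int]]:
    """Return (start, end) character ranges of `new` that were added or changed."""
    old_tokens = tokenize(old)
    new_tokens = tokenize(new)
    # offsets[i] is the character offset where new_tokens[i] starts;
    # the final entry equals len(new).
    offsets = [0]
    for token in new_tokens:
        offsets.append(offsets[-1] + len(token))
    matcher = SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    spans = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            spans.append((offsets[j1], offsets[j2]))
    return spans

old_source = "Operation Northwoods was a plan."
new_source = "Operation Northwoods was a proposed plan."
for start, end in added_spans(old_source, new_source):
    print(repr(new_source[start:end]))
```

Diffing on words rather than characters is a deliberate choice here: a character-level diff can split a new word in awkward places, which would make the green highlight look arbitrary.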

Rather than hunting for an “Edit” button to open the IDE and then having to re-find the text that you wanted to edit, you can right-click any character in the HTML and choose “jump to source”. The editor scrolls in from the left-hand side of the page, with the cursor at the equivalent character in the source, ready for you to edit.

These additions, rather than hiding Wikipedia’s semantics, instead make the semantics manifest. For beginners, these additions quickly teach you the MediaWiki language by trial and error. For seasoned users, these additions provide the usual benefits of an IDE.

Character correspondence

Most of what I suggested is pretty conventional: syntax highlighting (implemented everywhere), realtime compilation (any Markdown form on the web), realtime diff highlighting (many IDEs).

The one unconventional suggestion is the idea of a correspondence between the characters of the source text and the HTML text. To be more precise, there is a partial function that, for each character in the HTML, assigns it its originating character in the source:

   '''Operation Northwoods'''  was a ...
      ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑     ↑↑↑↑↑
<p><b>Operation Northwoods</b> was a ...

If we can establish this provenance at compile time, then we can use this to trivially implement three features: (1) scrolling the source pane with the HTML pane; (2) bidirectional text highlighting on mouseover; (3) “jump to source”.
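Here is a rough sketch, in Python, of how those three features could fall out of the map. I am assuming the map is a list called origin, where origin[i] is the source offset of HTML character i, or None for characters the compiler generated itself (the tags). The map below is written out by hand for the example above, and the function names are hypothetical.

```python
from typing import Optional

source = "'''Operation Northwoods''' was a ..."
html = "<p><b>Operation Northwoods</b> was a ..."

# Hand-built map for this one example: tags have no origin (None),
# every other HTML character points back at its source offset.
origin: list[Optional[int]] = (
    [None] * 6 + list(range(3, 23)) + [None] * 4 + list(range(26, 36))
)

def jump_to_source(origin, html_index):
    """Source offset for an HTML character, using the nearest mapped neighbour if needed."""
    for distance in range(len(origin)):
        for i in (html_index - distance, html_index + distance):
            if 0 <= i < len(origin) and origin[i] is not None:
                return origin[i]
    return None

def highlight_in_html(origin, src_start, src_end):
    """HTML indices whose origin lies in the source range [src_start, src_end)."""
    return [i for i, o in enumerate(origin)
            if o is not None and src_start <= o < src_end]

def scroll_fraction(origin, html_index, source_length):
    """Approximate position in the source pane, as a fraction between 0 and 1."""
    offset = jump_to_source(origin, html_index)
    return 0.0 if offset is None else offset / source_length

print(jump_to_source(origin, 6))  # the 'O' of "Operation" in the HTML
print("".join(html[i] for i in highlight_in_html(origin, 3, 12)))
```

Clicking inside a tag has no exact origin, so this sketch falls back to the closest character that does. That is one possible policy, not the only one.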

Instead of the compilation step producing a plain output string, we would like it to produce a string where every character is annotated with its original source character. We can then split this up to produce the plain HTML and the character correspondence map.
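As an illustration only, here is a toy compiler in Python that does exactly this for a single piece of syntax: triple-quote bold. It is nowhere near the real MediaWiki parser, and compile_bold and split are names I made up, but it shows the shape of the output: a list of (character, source offset) pairs that is then split into the plain HTML and the correspondence map.

```python
from typing import Optional

# Each output character is paired with its source offset, or None if generated.
Annotated = list[tuple[str, Optional[int]]]

def compile_bold(source: str) -> Annotated:
    """Toy compiler: '''x''' becomes <b>x</b>, and the whole text is wrapped in <p>."""
    out: Annotated = [(c, None) for c in "<p>"]
    bold = False
    i = 0
    while i < len(source):
        if source.startswith("'''", i):
            tag = "</b>" if bold else "<b>"
            out.extend((c, None) for c in tag)  # generated markup has no origin
            bold = not bold
            i += 3
        else:
            out.append((source[i], i))  # copied character remembers its offset
            i += 1
    out.extend((c, None) for c in "</p>")
    return out

def split(annotated: Annotated) -> tuple[str, list[Optional[int]]]:
    """Separate the annotated output into plain HTML and the origin map."""
    html = "".join(c for c, _ in annotated)
    origin = [o for _, o in annotated]
    return html, origin

source = "'''Operation Northwoods''' was a ..."
html, origin = split(compile_bold(source))
print(html)
print(origin)
```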

Unfortunately, PHP, like most programming languages, doesn’t keep track of the provenance of values through execution. Values go into the black box, and other values come out, but if you want to understand the relationship between the input and output then you have to look in the black box.

But it doesn’t have to be this way! We can track provenance without changing the source code at all. We just need to change the representation of strings, so they continue to act as strings according to the language semantics, but additionally keep track of their provenance. This could be achieved by hacking the PHP interpreter, or by creating a data structure which implements the ‘string’ interface but also tracks provenance.
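The second option might look something like the sketch below. It is in Python rather than PHP, purely to show the idea, and TrackedStr is a hypothetical class that covers only concatenation and slicing. A real version would need to cover the whole string interface (replace, regex matching, case conversion, and so on), which is where most of the work would be.

```python
class TrackedStr:
    """A string-like value whose characters each remember a source offset."""

    def __init__(self, text, origin=None):
        self.text = text
        # Characters with no known source get None.
        self.origin = list(origin) if origin is not None else [None] * len(text)
        assert len(self.origin) == len(self.text)

    @classmethod
    def from_source(cls, text, start=0):
        """Wrap raw source text: character i originates at offset start + i."""
        return cls(text, range(start, start + len(text)))

    def __str__(self):
        return self.text

    def __len__(self):
        return len(self.text)

    def __getitem__(self, key):
        if isinstance(key, int):
            # Treat an integer index as a one-character slice.
            key = slice(key, key + 1 or None)
        return TrackedStr(self.text[key], self.origin[key])

    def __add__(self, other):
        if not isinstance(other, TrackedStr):
            other = TrackedStr(other)  # plain strings carry no provenance
        return TrackedStr(self.text + other.text, self.origin + other.origin)

    def __radd__(self, other):
        return TrackedStr(other) + self

source = TrackedStr.from_source("'''Operation Northwoods'''")
title = source[3:-3]               # strip the quote marks
html = "<b>" + title + "</b>"      # ordinary-looking string code
print(str(html))
print(html.origin)
```

The line that builds html looks like ordinary string code, which is the point: the calling code does not change, only the representation of the values flowing through it.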

There is some prior art for this. In the TeX world, a program called SyncTeX apparently does a similar job: you can jump-to-source from a location in your PDF to a location in your original .tex file. In Perl and elsewhere, taint checking tracks values at runtime and annotates them according to whether they originated from an untrusted source. Such values are called ‘tainted’. Operations on tainted values yield new tainted values; i.e. taint is infectious. What I’m suggesting is pretty similar, with just a more fine-grained notion of provenance than tainted/not tainted.

Conclusion

Wikipedia has a declining population. Their own research identifies the editing process as a significant barrier to entry and as a reason for leaving. Their solution to this was a WYSIWYG editor, which failed for the basic reason that it denies the fact that Wikipedia is a program. I suggest a more conservative solution: as a program, Wikipedia needs an IDE that embraces and understands the MediaWiki language. That IDE should make rapid feedback its priority: realtime compilation, realtime diff viewing, and realtime correspondence between source and HTML.
