How I chose a Markdown parser
Intro
I really love Markdown. It’s a powerful yet concise markup language, built on the idea of separating content from presentation. This makes Markdown useful in a variety of applications, for example, in version control. That’s why, for instance, Markdown is a standard tool for documentation on GitHub.
Markdown is widely used across the web as a markup language for text editors: on blogging platforms, in wiki projects, etc. Personally, I use Markdown every day, not only for software development but also for taking notes. I use Obsidian, a Markdown-based, IDE-like note-taking app for knowledge base management.
Generally speaking, Obsidian is one of the best note-taking apps ever. If you haven’t already heard about it or about the Zettelkasten principle, maybe you should take a look at this and this.
Recently I decided to build my personal website, and thus I needed to choose a markup language for articles. Of course, I chose Markdown. All that was left was to figure out the rest of the stack.
Searching for ready-made solutions, I stumbled upon Jekyll, a Markdown-based static website generator. It looked like a decent solution for minimalists, but for me it had too many limitations. In the end, I decided to stay with my favourite framework, Vue.js, and use some library for Markdown-to-HTML conversion. And that’s where the fun began…
Tool Selection
Thanks to the openness, relative simplicity and popularity of Markdown among developers, there are several dozen Markdown implementations in various programming languages. You can find a far-from-complete list of implementations here.
As soon as I saw this multitude of solutions, my first thought was to write my own from scratch, and my fingers were already reaching for the keyboard, but I bravely restrained myself. Instead, I decided to compare parsers and choose the best one.
Of course, for rendering static pages it’s possible to choose an implementation in any programming language, but I decided to stay with pure-JavaScript solutions for flexibility.
So, I had 9 candidates left:
- commonmark.js
- markdown-js
- markdown-it
- MarkdownDeep — GitHub and website
- Marked
- remark
- remarkable
- Showdown
- texts.js
For comparison of parsers, I came up with this list of parameters:
1. licence
2. infrastructure
   - documentation
   - availability of demo
   - active community
3. support of a certain subset of Markdown syntax
4. ability to modify parser’s behaviour
5. performance
Licences
So, let’s get started! We’ll begin with the licence.
Everything is simple here:
- commonmark.js licence — 2-clause BSD with two dependencies, both MIT
- markdown-js licence — MIT
- markdown-it licence — MIT
- MarkdownDeep licence — Apache 2.0
- Marked licence — MIT, refers to John Gruber, the creator of Markdown language (who distributes it under the 3-clause BSD licence), which is quite cute :)
- remark licence — MIT
- remarkable licence — MIT
- Showdown licence — MIT
- texts.js licence — Apache 2.0
In other words, all projects are distributed under free licenses, which is to be expected.
Infrastructure
All the projects have documentation of some kind, so let’s skip this part.
As for demos, things are not so good:
- commonmark.js demo
- markdown-js demo — not present
- markdown-it demo
- MarkdownDeep demo
- Marked demo
- remark demo — not present
- remarkable demo
- Showdown demo
- texts.js demo — not present
It’s hard to estimate community support without diving deep into a project and running into difficulties with it. A project can be judged indirectly by the number of stars on GitHub, but I won’t do this for ethical reasons.
Speaking about activity:
- markdown-js is no longer maintained; the last commit was in 2019
- texts.js: last commit in 2013
- remarkable: last commit in November 2021 (which is not so old)
- the rest of the projects have commits this year, so we can consider them active
Syntax
That’s probably the most important part. First, I put together a list of all the syntax I need:
Although it is an article about Markdown, unfortunately, Medium doesn’t fully support even some basic Markdown features, such as nested lists, multi-block quotes and collapsible blocks, so I used some tricks to overcome it: inserting images, using “ — — ” for indent levels, etc. Hope it’s still readable :)
To test the parsers, I prepared a text with examples of all the required markup: Test text (raw)
I didn’t want to spend time installing all the parsers, so I tested only those that had a demo. If you want, you can test the others yourself using the text above (or any other).
Now, let’s move on to the results. But first:
A little note
After writing the article, I found out that the Markdown standard supports line breaks inside a text block (like the <br> tag in HTML). To break a line, you need to add two spaces at the end of it and continue the text on the next line.
All the parsers mentioned in the article support this feature. At the time of writing, I did not know about it, so I was checking whether the parsers support line breaks without the two-space padding.
Personally, I find adding two extra spaces inconvenient and redundant. It also lacks clarity, because spaces are not displayed in the text editor window. Therefore, I decided to keep the check for “line break without extra spaces” in the article, but as a pleasant bonus rather than a critical feature.
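The difference between the two behaviours can be shown with a toy function. This is only a minimal sketch, not code from any of the parsers: the name renderParagraph and the breaks flag are made up for the illustration.

```javascript
// Toy model of line-break handling inside one paragraph (hypothetical helper,
// not taken from any real parser).
function renderParagraph(text, { breaks = false } = {}) {
  const lines = text.split('\n');
  let html = '';
  lines.forEach((line, i) => {
    const isLast = i === lines.length - 1;
    const hard = / {2,}$/.test(line);   // two or more trailing spaces
    html += line.replace(/ +$/, '');     // trailing spaces are never rendered
    if (!isLast) html += (hard || breaks) ? '<br>\n' : '\n';
  });
  return `<p>${html}</p>`;
}

console.log(renderParagraph('one  \ntwo'));                 // break: two-space padding
console.log(renderParagraph('one\ntwo'));                   // no break by default
console.log(renderParagraph('one\ntwo', { breaks: true })); // break: the "bonus" mode
```

With breaks left off, only the two-space form produces a <br>; with it switched on, every newline inside the paragraph does, which is the behaviour checked in the lists below.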
commonmark.js
What doesn’t work:
1. line break in text blocks (without two extra spaces)
2. syntax highlight
3. text decoration
   - strikethrough (need to use <del>)
   - highlight (need to use <mark>)
   - subscript
   - superscript
4. tables
It’s quite inconvenient that links don’t work in the demo and the iframe with the YouTube video doesn’t render, but the raw HTML code seems to be correct.
markdown-it
Everything works fine!
What can be turned on optionally:
- line break in text blocks (without two extra spaces)
- HTML parsing
MarkdownDeep
It seems to be the buggiest of the parsers examined.
What doesn’t work:
1. line break in text blocks (without two extra spaces)
2. nested quote interferes with the list element
3. code blocks
   - line break in code
   - syntax highlight
   - special symbols escaping
   - for some reason, code is duplicated: once as code and then once again as text
4. text decoration
   - strikethrough
   - highlight
   - subscript
   - superscript
5. iframe doesn’t work
Bugs with quotes and code blocks themselves:
What’s wrong here:
- the text “- list in quote” should be on the next line
- all the text inside the code block ends up on one line
- the special symbols ``` ended up outside the code block
- the text from the last code block is repeated, but this time as Markdown
Marked
What doesn’t work:
1. text decoration
   - strikethrough
   - highlight
   - subscript
   - superscript
2. tables
What can be turned on optionally:
- line break in text blocks (without two extra spaces)
- The demo doesn’t support syntax highlighting. However, the config contains the fields "highlight": null and "langPrefix": "language-", which suggests that syntax highlighting can somehow be enabled. I couldn’t figure out how to do it, though.
The iframe with the YouTube video doesn’t render, but the raw HTML code seems to be correct.
remarkable
Everything works fine!
What can be turned on optionally:
- line break in text blocks (without two extra spaces)
- HTML parsing
This project resembles markdown-it a lot and for a good reason (see below).
Showdown
What doesn’t work:
1. headings h5 and h6
2. line break in text blocks (without two extra spaces)
3. syntax highlight
4. text decoration
   - highlight
   - subscript
   - superscript
5. iframe doesn’t work
Something strange is going on with headers: # is translated into <h3>, ## into <h4>, etc. As a result, there are no HTML tags left for headers of levels 5 and 6, so they are rendered as plain text. This prevents them from being styled correctly via CSS, and also leads to a bug with wrapping to the next line:
In the demo interface, there are checkboxes to enable options, but they don’t work. When you click a checkbox, the page reloads, but the changes don’t apply.
Judging by the name of one of the checkboxes (simpleLineBreaks), it should be possible to enable line breaks, but I failed to make it work.
Bonus: Obsidian
Finally, I wanted to check my note-taking app Obsidian as well, because that’s where I will write my articles before uploading them to the website. (You can guess where I typed this article ^_^). Happily, Obsidian coped easily with everything except subscript and superscript. But this is excusable🙃.
Bonus 2: PyCharm
I write code for my website in PyCharm Community Edition, which, in turn, has an embedded Markdown viewer, so… yes, you got it right :)
What doesn’t work:
1. line break in text blocks (without two extra spaces)
2. text decoration
   - highlight
   - subscript
   - superscript
3. internal links don’t work inside the IDE
4. special symbols escaping: for some reason, backslashes are displayed before the symbols, while normally they should be hidden
5. iframe doesn’t work
What works partially:
- Syntax highlighting only works for Python. Maybe it’s all about the Community Edition, and the Professional Edition supports other languages as well, but I haven’t checked it.
Note
In fact, in several cases the lack of support for some syntax (for example, text decoration and tables) is not a bug but a feature, since some Markdown parsers adhere to the CommonMark specification. Other parsers, such as remarkable, let you enable a “CommonMark” mode optionally.
The CommonMark specification aims to standardize the Markdown language. This can be useful, for example, if you need to transfer Markdown text between different systems. However, for my site I needed extended functionality, so these parsers didn’t suit me.
Also, a number of parsers intentionally don’t render HTML tags considered unsafe (like <iframe>). This is called "HTML sanitizing". It is useful, for example, if the Markdown parser is used to render user-generated content. However, I will write all the content on the website myself, so this feature would only get in my way.
Ability to modify parser’s behaviour
Generally, parsers work according to the following algorithm:
Markdown -> parsing -> internal representation -> rendering -> HTML
Some parsers allow you to modify their inner logic. A parser can give you access to its parsing and rendering functions or let you modify the internal representation. This lets you add new functionality or alter what already exists. Such extensibility opens the way for an ecosystem of community plugins.
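The stages above can be sketched as a toy pipeline. This is a minimal illustration that handles only headings and paragraphs; it is not code from any of the parsers, and the function names parse, shiftHeadings and render are made up.

```javascript
// Stage 1: parsing. Markdown text -> internal representation (a tiny tree of nodes).
function parse(markdown) {
  return markdown
    .split(/\n{2,}/)
    .filter((block) => block.trim() !== '')
    .map((block) => {
      const heading = /^(#{1,6}) (.+)$/.exec(block);
      return heading
        ? { type: 'heading', level: heading[1].length, text: heading[2] }
        : { type: 'paragraph', text: block };
    });
}

// Optional stage: a "plugin" that edits the internal representation.
function shiftHeadings(ast) {
  return ast.map((node) =>
    node.type === 'heading' ? { ...node, level: Math.min(node.level + 1, 6) } : node
  );
}

// Stage 2: rendering. Internal representation -> HTML.
function render(ast) {
  return ast
    .map((node) =>
      node.type === 'heading'
        ? `<h${node.level}>${node.text}</h${node.level}>`
        : `<p>${node.text}</p>`
    )
    .join('\n');
}

const html = render(shiftHeadings(parse('# Title\n\nSome text')));
console.log(html);
```

The point of the sketch is the middle step: once the internal representation is exposed, a plugin can be an ordinary function that takes it and returns a modified copy.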
I failed to find any mention of extensibility in the documentation of these parsers:
- commonmark.js
- MarkdownDeep
The others are examined below:
markdown-js
markdown-js gives you access to its internal representations. The parser’s logic is as follows:
Markdown -> parsing -> Markdown syntax tree -> conversion -> HTML syntax tree -> rendering -> HTML
Internal representations are stored in the form of JsonML trees. You can access them by calling the parsing, conversion and rendering functions separately, one by one.
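To give an idea of what a JsonML tree looks like, here is a small sketch. In JsonML an element is an array of the form [tag, attributes?, ...children], and text is a plain string. The helpers jsonmlToHtml and mapTags are hypothetical names written for this illustration; they are not the functions of markdown-js, and the toy renderer does no escaping.

```javascript
// Render a JsonML node: either a string or [tag, attributes?, ...children].
function jsonmlToHtml(node) {
  if (typeof node === 'string') return node;
  const [tag, ...rest] = node;
  const first = rest[0];
  const hasAttrs = first !== null && typeof first === 'object' && !Array.isArray(first);
  const attrs = hasAttrs ? first : {};
  const children = hasAttrs ? rest.slice(1) : rest;
  const attrText = Object.entries(attrs)
    .map(([key, value]) => ` ${key}="${value}"`)
    .join('');
  return `<${tag}${attrText}>${children.map(jsonmlToHtml).join('')}</${tag}>`;
}

// Example of an intermediate step: rename tags anywhere in the tree.
function mapTags(node, fn) {
  if (typeof node === 'string') return node;
  const [tag, ...rest] = node;
  return [fn(tag), ...rest.map((child) => (Array.isArray(child) ? mapTags(child, fn) : child))];
}

const tree = ['p', 'See ', ['a', { href: 'https://example.com' }, 'this'], ' and ', ['em', 'that']];
const edited = mapTags(tree, (tag) => (tag === 'em' ? 'strong' : tag));
console.log(jsonmlToHtml(edited));
```

Because the tree is just nested arrays, a modification between the conversion and rendering steps needs nothing more than ordinary array code.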
markdown-it
The pipeline of markdown-it consists of a parser and a renderer.
The parser’s behaviour is defined by rules, which are divided into 3 groups: core, block and inline (whatever that means). You can add your own rules alongside the existing ones.
The result of parsing is not the usual abstract syntax tree but a list of tokens. The developers state that this was done for simplicity. Although I don’t see any difficulty in syntax trees, a flat structure surely has its own benefits.
Once the list of tokens is ready, you can modify it yourself as needed.
Then the list of tokens goes to the renderer, which is also extensible via custom rules.
You can see the list of available plugins here.
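To make the flat token list more tangible, here is a toy sketch. The token list is written by hand, and the field names (type, tag, nesting, content) only imitate markdown-it’s tokens as far as I understand them; nothing here calls the real library, and renderTokens is a made-up helper.

```javascript
// Hand-written flat token list for the text "# Hi" followed by "Bye :)".
// Opening and closing tokens are siblings in one array; there is no nesting of objects.
const tokens = [
  { type: 'heading_open', tag: 'h1', nesting: 1, content: '' },
  { type: 'inline', tag: '', nesting: 0, content: 'Hi' },
  { type: 'heading_close', tag: 'h1', nesting: -1, content: '' },
  { type: 'paragraph_open', tag: 'p', nesting: 1, content: '' },
  { type: 'inline', tag: '', nesting: 0, content: 'Bye :)' },
  { type: 'paragraph_close', tag: 'p', nesting: -1, content: '' },
];

// Modifying the list: a plain loop is enough, no tree traversal is needed.
for (const token of tokens) {
  if (token.type === 'inline') {
    token.content = token.content.replace(':)', '&#x1F642;');
  }
}

// Toy renderer: opening tokens emit <tag>, closing tokens </tag>, others their content.
function renderTokens(list) {
  return list
    .map((t) => (t.nesting === 1 ? `<${t.tag}>` : t.nesting === -1 ? `</${t.tag}>` : t.content))
    .join('');
}

console.log(renderTokens(tokens));
```

This may be one of the benefits of a flat structure: both the modification step and the renderer are single passes over an array.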
Marked
Marked works mostly like the others:
Markdown -> parser -> syntax tree -> renderer -> HTML
However, the documentation uses the terms quite loosely.
The parser, which is called lexer, manages a set of rules named tokenizers. You can both add your own tokenizers and modify the built-in ones by a kind of subclassing of the container for the built-in tokenizer functions. Your functions thus shadow those of the parent class, but you can fall back to the default behaviour by making your function return false.
You can specify a walkTokens function, which is called for every token of the syntax tree. Any modifications of the tokens can be done inside it.
The tree then goes to the renderer (which is called parser here), and it calls a set of rules called renderers. As with the parser, you can either add your own functions or inherit from the existing ones.
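The “return false to fall back” idea can be shown with a toy lexer. This is only a sketch of the pattern under my own assumptions: builtInTokenizers, createLexer and the token shapes are hypothetical and are not Marked’s real API; only the vocabulary (tokenizers, walkTokens) is borrowed from the description above.

```javascript
// Built-in rules of a hypothetical mini-lexer (one token per line).
const builtInTokenizers = {
  heading(line) {
    const m = /^(#{1,6}) (.+)$/.exec(line);
    return m ? { type: 'heading', depth: m[1].length, text: m[2] } : false;
  },
  paragraph(line) {
    return { type: 'paragraph', text: line };
  },
};

function createLexer(customTokenizers = {}) {
  // Ask the custom rule first; `false` means "fall back to the built-in rule".
  const run = (name, line) => {
    const custom = customTokenizers[name] ? customTokenizers[name](line) : false;
    return custom !== false ? custom : builtInTokenizers[name](line);
  };
  return (markdown) =>
    markdown
      .split('\n')
      .filter((line) => line.trim() !== '')
      .map((line) => run('heading', line) || run('paragraph', line));
}

const lex = createLexer({
  heading(line) {
    if (!line.startsWith('#! ')) return false; // not ours: use the default heading rule
    return { type: 'heading', depth: 1, text: line.slice(3).toUpperCase() };
  },
});

const tokens = lex('#! hello\n## world\nplain text');

// Post-processing hook in the spirit of walkTokens: called once per token.
const walkTokens = (token) => {
  if (token.type === 'heading') token.id = token.text.toLowerCase().replace(/\s+/g, '-');
};
tokens.forEach(walkTokens);
console.log(tokens);
```

The custom heading rule handles only its own syntax and hands everything else back to the default, so overriding a rule does not mean reimplementing it.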
remark
The remark project is developed with a passionate love of decomposition. Remark uses the mdast-util-from-markdown parser, based on micromark; the mdast syntax tree, which is an implementation of unist for Markdown; the mdast-util-to-markdown renderer; and a wrapper called unified to put it all together. Hmmmm…
Anyway, I don’t really want to dive deep into all this, especially since this parser’s logic is not particularly different from the others.
On the other hand, the list of plugins for this project is rather impressive, so perhaps the micro-repository approach has its own advantages.
remarkable
Since remarkable shares its roots with markdown-it (see below), their core logic is similar. I didn’t go into the details of the implementation; for those, please go here.
You can see the list of plugins here.
Showdown
It seems that plugins in Showdown are just a set of regular expressions and functions that sequentially modify the whole text.
The logic can be described like this:
Markdown -> regex/function 1 -> modified text -> regex/function 2 -> … -> regex/function n -> HTML
It’s a pretty straightforward solution that lets you make plugins really easily.
However, a significant flaw of this approach is poor performance, because each function runs independently and has to scan the entire text from scratch.
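The chain described above can be sketched in a few lines. This is a toy model of the approach, not Showdown’s actual code: the extensions array and the convert function are made up, and only a handful of inline rules are covered.

```javascript
// Each step is either a { regex, replace } pair or a function; every step
// receives the whole text and returns the whole text.
const extensions = [
  { regex: /^# (.+)$/gm, replace: '<h1>$1</h1>' },
  { regex: /\*\*(.+?)\*\*/g, replace: '<strong>$1</strong>' },
  { regex: /\*(.+?)\*/g, replace: '<em>$1</em>' },
  (text) => text.replace(/~~(.+?)~~/g, '<del>$1</del>'),
];

function convert(markdown) {
  return extensions.reduce(
    (text, ext) => (typeof ext === 'function' ? ext(text) : text.replace(ext.regex, ext.replace)),
    markdown
  );
}

console.log(convert('# Hi **there** *you* ~~no~~'));
```

Adding a plugin means pushing one more entry into the array, which is why it is so easy. The cost is also visible here: with n steps the text is scanned n times, and the order of the steps matters (the bold rule has to run before the italic one).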
texts.js
As far as I understand from the documentation, it is possible to access the internal representation of texts.js, which is a custom JsonML implementation called TextJSON.
Conclusion
Interestingly, although all the examined parsers work on similar principles, they differ greatly in the details of their implementations.
I like the logic of the markdown-js implementation: it’s not overcomplicated and is convenient for writing plugins. Unfortunately, markdown-js is not maintained anymore, and therefore I won’t use it.
The logic of the markdown-it, remarkable and Marked implementations is okay, but the terminology in their documentation is confusing.
Remark looks like the best-documented project, but its level of decomposition seems excessive.
Plugins in Showdown are really easy to create, but at the price of a significant performance reduction.
It’s difficult to say anything about texts.js due to incomplete documentation.
To sum up, if plugins matter to you, you can safely take:
- markdown-it
- Marked
- remark
- remarkable
Performance
I could have run the benchmarks myself, but it was much easier to find existing benchmarks and compare them.
Searching for benchmarks
I’ve found 4 benchmarks:
1. commonmark.js benchmark (2015)
   - commonmark.js
   - markdown-it
   - Marked
   - Showdown
2. markdown-it benchmark (2015)
   - markdown-it
   - Marked
   - commonmark
3. remarkable benchmark (2014)
   - remarkable
   - Marked
   - commonmark
4. markdown-benchmark (2015)
   - markdown-js
   - Marked
   - showdown
I haven’t found benchmarks for:
- MarkdownDeep
- texts.js
- remark
All these benchmarks were made at about the same time, so we will consider them comparable.
The studies date back to 2014–2015, but we will consider them valid, because if the developers had significantly improved performance since then, they would have written about it in the project’s readme.
Now, having these benchmarks, we can build the dependency graph:
The colors of the arrows match the colors of the benchmark authors.
Comparison
Performance is measured in operations per second, so the higher, the better.
I calculated the relative performance for each benchmark separately:
1. commonmark
   - showdown = 1
   - commonmark.js ~ Marked ~ markdown-it = 3
2. markdown-it
   - commonmark.js = 1
   - markdown-it = 0.6 (1.28 in CommonMark mode)
   - Marked = 1.3 (version 0.3.5)
3. remarkable
   - commonmark.js = 1
   - remarkable = 1.88 (2.34 in CommonMark mode)
   - Marked = 0.573 (here is an old and slow version, 0.3.2)
4. markdown
   - Showdown = 1
   - markdown-js = 0.61
   - Marked = 2.99
Benchmarks analysis:
- You can see that commonmark.js, Marked and markdown-it are approximately 3 times faster than Showdown.
- The second benchmark’s data roughly confirms the first benchmark’s data.
- According to the third benchmark, remarkable is about 2 times faster than commonmark.js and therefore about 6 times faster than Showdown. This is an impressive result, which seems too good to be true. Since it was produced by the developers of remarkable, it can’t be trusted too much. Considering that remarkable and markdown-it have the same roots, I assume that their performance is about the same.
- According to the fourth benchmark, markdown-js is 40% slower than Showdown.
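The “therefore” in the third point is just multiplication of ratios along a chain of benchmarks. Here is a minimal sketch of that arithmetic; the ratios are copied from the lists above, while the relativeTo function and the data layout are made up for the illustration. It reproduces the raw chained numbers only, without my adjustment for remarkable.

```javascript
// [parser, baseline, ratio]: "parser is `ratio` times as fast as baseline".
const ratios = [
  ['commonmark.js', 'Showdown', 3],      // benchmark 1
  ['Marked', 'Showdown', 3],             // benchmark 1
  ['markdown-it', 'Showdown', 3],        // benchmark 1
  ['remarkable', 'commonmark.js', 1.88], // benchmark 3
  ['markdown-js', 'Showdown', 0.61],     // benchmark 4
];

// Express every parser relative to one root by following the chains of ratios.
function relativeTo(root, pairs) {
  const score = { [root]: 1 };
  let changed = true;
  while (changed) {
    changed = false;
    for (const [parser, baseline, ratio] of pairs) {
      if (score[baseline] !== undefined && score[parser] === undefined) {
        score[parser] = score[baseline] * ratio;
        changed = true;
      }
    }
  }
  return score;
}

const scores = relativeTo('Showdown', ratios);
console.log(scores);
```

For remarkable this gives 3 × 1.88, that is, roughly the “6 times faster than Showdown” mentioned above.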
Now let’s sum everything up. The first benchmark looks the most reliable, so I’ll take the Showdown performance as a baseline.
So, here is the comparative table I got:
Conclusion
The results of the comparison show that markdown-js and Showdown are clear performance outsiders, while the other parsers are roughly at the same level.
According to the benchmark by the remarkable developers, this parser is the fastest of all by a large margin. However, I doubt it.
I am curious to see the performance of the remark parser. Maybe another time…
To sum up, if performance is your priority, you can safely choose:
- commonmark.js
- markdown-it
- Marked
- remarkable
Final choice
According to the results of the comparison, two parsers won: markdown-it and remarkable. In fact, these projects have a lot in common, including some of the same developers.
Looking through the version histories of both projects, you can find a lot of interesting things. The remarkable project emerged first. A few months later, markdown-it appeared, most likely as a fork of remarkable. Since then, the projects have been developing in parallel.
Both projects have:
- MIT licence
- functioning live demo
- flawless results in the syntax test
- great tools for extensibility
- a broad variety of plugins
- top performance
Personally, I chose remarkable because it had sample code in the demo and I was able to quickly integrate it into my project.
In general, I didn’t find any significant differences between these two parsers, so I recommend both!
How I set up the parser
So, I chose remarkable, and I had to configure it.
What’s out of the box
I was pleasantly surprised that it supports a lot of cool features out of the box, including some I didn’t know about:
- Footnotes
- Abbreviations
But what turned out to be exceptionally useful is the support for collapsible blocks (spoilers)!
Plugins
There are many interesting plugins on this list.
I made use of the remarkable-katex plugin, based on the KaTeX library, for rendering LaTeX formulas on the web.
If you know Japanese, the remarkable-furigana plugin may be useful for you, as it lets you show the reading of kanji above them.
I’ll leave the rest of the plugins to you for self-study.
include
On my site, I keep the sources of my articles as plain Markdown files. For convenience, I needed a way to include contents of some files into others.
It would be wrong to solve this problem using remarkable, so I wrote a preprocessor function that receives the path to the root file as input and recursively inserts the necessary subfiles into it.
For example, given the following file structure:
- posts/
- — main.md
- — parts/
- — — part.md
- — — part2.md
and the following file contents:
// main.md
// absolute path
@include '/posts/parts/part.md'
// or relative path
@include './parts/part2.md'
!!!
// part.md
Hello
// part2.md
world
we will render main.md into this:
// output
Hello
world
!!!
For those who are interested, here is the preprocessor code:
// These are methods of one object (e.g. a Vue component's `methods`),
// so they call each other through `this`.
async load_content_by_url(url) {
  // fetch a file and return its body as text
  let response = await fetch(url)
  let text = await response.text()
  return text
},
// str.replace() can't handle asynchronous requests, so we need a wrapper
// source: https://stackoverflow.com/questions/33631041/javascript-async-await-in-replace
async replaceAsync(str, regex, asyncFn) {
  const promises = [];
  // first pass: start an async call for every match
  str.replace(regex, (match, ...args) => {
    const promise = asyncFn(match, ...args);
    promises.push(promise);
  });
  const data = await Promise.all(promises);
  // second pass: substitute the resolved values in the same order
  return str.replace(regex, () => data.shift());
},
async load_content_with_includes(url) {
  let file_dir = url.substring(0, url.lastIndexOf("/"))
  let text = await this.load_content_by_url(url)
  let out_text = await this.replaceAsync(
    text,
    /^@include\s*['"](.+)['"]\s*$/mg, // regex for file includes (single or double quotes)
    async (...match) => {
      let url = match[1]
      url = url.replace(/^\./, file_dir) // if relative path -> make absolute
      let included_text = await this.load_content_with_includes(url) // get data by url
      return included_text
    }
  )
  return out_text
},
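To try the idea without a server, here is a self-contained variant of the same preprocessor. It is a sketch under a few assumptions: the files object is a hypothetical in-memory stand-in for fetch, filled with the example files from above, and the camelCase function names are chosen just for this illustration.

```javascript
// In-memory stand-in for the server: path -> file contents (hypothetical data
// mirroring the example above).
const files = {
  '/posts/main.md': "@include '/posts/parts/part.md'\n@include './parts/part2.md'\n!!!",
  '/posts/parts/part.md': 'Hello',
  '/posts/parts/part2.md': 'world',
};

// Stand-in for fetch(url) followed by response.text().
async function loadContentByUrl(url) {
  if (!(url in files)) throw new Error(`File not found: ${url}`);
  return files[url];
}

async function loadContentWithIncludes(url) {
  const fileDir = url.substring(0, url.lastIndexOf('/'));
  const text = await loadContentByUrl(url);
  const includeRe = /^@include[ \t]*['"](.+)['"][ \t]*$/gm;
  // Collect all matches first, resolve them in parallel, then substitute in order.
  const matches = [...text.matchAll(includeRe)];
  const parts = await Promise.all(
    matches.map((m) => loadContentWithIncludes(m[1].replace(/^\./, fileDir)))
  );
  return text.replace(includeRe, () => parts.shift());
}

loadContentWithIncludes('/posts/main.md').then((out) => console.log(out));
```

For the example files this prints Hello, world and !!! on three lines. Note that, like the code above, the sketch does not guard against circular includes: a file that includes itself would recurse forever.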
Styles
To customize the appearance of articles, I created my own styles for all the HTML elements that Markdown-to-HTML conversion can produce.
Here is my stylesheet for those who are interested:
<style lang="scss">
$site-defaults-color: #c8c3bc;
// 3.
$quote-border-color: #666;
// 4.
$code-border-color: #666;
$code-bg-color: rgba(255, 255, 255, 0.05);
// 6.
$highlight-color: $site-defaults-color;
$inline-code-color: rgb(3, 218, 197); // = #03dac5
$inline-code-bg-color: rgba(3, 218, 197, 0.1);
// 9.
$table-border-color: #666;
$table-stripe-color: rgba(255, 255, 255, 0.07);
// 10
$hline-color: $site-defaults-color;
//
$details-border-color: #666;
.md-wrapper {
  // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~1. headers~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  @for $i from 1 through 6 {
    $sel: "h" + $i;
    #{$sel} {
      // nothing here
    }
  }
  // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~2. text blocks~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~3. quotes~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  blockquote {
    margin: 15px 0;
    padding: 0 20px;
    border: 1px solid $quote-border-color;
    border-left: 5px solid $quote-border-color;
  }
  // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~4. code blocks~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  code {
    font-family: Raleway;
  }
  pre {
    padding: 10px;
    margin-bottom: 10px;
    display: block;
    border: 1px solid $code-border-color;
    border-radius: 4px;
    background-color: $code-bg-color;
    overflow-x: auto;
    code {
      white-space: pre;
      word-break: normal;
      word-spacing: normal;
      word-wrap: normal;
    }
  }
  // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~5. lists~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ul {
    list-style-type: circle;
  }
  ol,
  ul {
    padding-inline-start: 25px;
  }
  li {
    padding: 3px 0;
  }
  // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~6. text-decoration~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  mark {
    padding: 2px;
    background-color: $highlight-color;
  }
  code:not([class]) {
    padding: 2px 4px;
    font-size: 90%;
    color: $inline-code-color;
    background-color: $inline-code-bg-color;
  }
  // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~7. links~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  @mixin link {
    color: #fff;
    font-weight: bold;
    text-decoration: none;
    cursor: pointer;
  }
  a {
    @include link;
  }
  // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~8. images~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  img {
    max-width: 100%;
  }
  // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~9. tables~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  table {
    width: 100%;
    max-width: 100%;
    margin: 15px 0;
    border-collapse: collapse;
    border-spacing: 0;
    text-align: left;
    display: block;
    overflow-x: auto;
    th,
    td {
      padding: 10px;
      border: 1px solid $table-border-color;
    }
    thead tr th {
      border-bottom: 2px solid $table-border-color;
    }
    tbody tr:nth-child(odd) {
      td,
      th {
        background-color: $table-stripe-color;
      }
    }
  }
  // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~10.2 hline~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  hr {
    border: 0;
    height: 1px;
    width: 100%;
    background-image: linear-gradient(
      to right,
      rgba(0, 0, 0, 0),
      $hline-color,
      rgba(0, 0, 0, 0)
    );
  }
  // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~spoilers~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  @mixin user_select_none {
    -webkit-touch-callout: none;
    -webkit-user-select: none;
    -khtml-user-select: none;
    -moz-user-select: none;
    -ms-user-select: none;
    user-select: none;
  }
  details {
    background-color: rgba(255, 255, 255, 0.02);
    padding: 10px;
    border: dotted 1px $details-border-color;
    summary {
      @include link;
      @include user_select_none;
    }
  }
}
</style>
I used highlight.js for syntax highlighting. You can see an example of how to use it on the remarkable demo page.
Conclusion
It’s time to sum up. The parser for my site is chosen and configured, and you are enjoying the result right now while reading this article on this blog.
While working on this article, I learned a lot about Markdown and saw it from a new perspective.
I will be glad if you enjoyed reading this little study. See you, and all the best!