A Pure Elm Markdown Parser

James Carlson

Yes, another Markdown parser! This time, in pure Elm, the results of which you can see in this demo. The idea was to make a Markdown parser that could also handle mathematical text, as well as some convenient extensions — strike-through text, verbatim and poetry blocks, and tables. There may be a few more extensions to come, but I am trying to be somewhat conservative in this regard. The API is quite small, consisting essentially of the function

Elm.Markdown.toHtml : Option -> String -> Html msg

which one might apply like this:

Elm.Markdown.toHtml ExtendedMath "Pythagoras said $a^2 + b^2 = c^2$"

The first argument is of type Option = Standard | Extended | ExtendedMath. The second argument, a string, can of course be as long as you want, e.g., a whole document.

See jxxcarlson/elm-markdown for the code and documentation.

The Parser

As usual, the parser is the part that takes the most care to build. Fortunately, the combinators in the elm/parser library, combined with the general expressiveness of Elm, are up to the job. The parser follows the strategy recommended by the CommonMark group, or rather, I tried to follow this strategy as best I could. The idea is to first parse the text line by line into a tree of blocks: headings, paragraphs, list items, etc. The content of the blocks is unparsed at this point. My approach was to parse the text into a list of elements of the form (Level, Block) where

type Block = Block BlockType Level Content

and where type alias Level = Int. Think of the document as a kind of outline like this

Introduction
Biology
   Plants
      Flowering
      Non-flowering
   Animals
      Furry
      Non-furry
Chemistry
   Organic
   Inorganic

The level of an outline element is the number of leading spaces divided by three (integer division).

Consider now a list of things of the form (Level, Whatever), where the level of each item is either the same as the level of the item before it, a lesser level, or that level plus one. Call such a list annotated. Then outlines define annotated lists, and vice versa. Annotated lists also define a corresponding rose tree. These correspondences make it easy to transform a Markdown document into a tree of blocks. Mapping a parser for inline elements over the tree yields a tree for which it is easy to write a suitable rendering function.
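To make the outline idea concrete, here is a minimal sketch of how one might compute levels and check the "annotated" property. The module and function names (Outline, level, toAnnotated, isAnnotated) are made up for illustration and are not taken from the library. The sketch also counts any leading whitespace as indentation, which is an assumption.

```elm
module Outline exposing (isAnnotated, level, toAnnotated)

-- Hypothetical helpers for illustration; not the library's code.


-- Level = number of leading whitespace characters divided by three
-- (integer division). Assumes the indentation consists of spaces.
level : String -> Int
level line =
    let
        leadingSpaces =
            String.length line - String.length (String.trimLeft line)
    in
    leadingSpaces // 3


-- Pair each non-blank line of an outline with its level.
toAnnotated : String -> List ( Int, String )
toAnnotated text =
    String.lines text
        |> List.filter (\line -> String.trim line /= "")
        |> List.map (\line -> ( level line, String.trim line ))


-- True when no item is more than one level deeper than the item before it.
isAnnotated : List ( Int, a ) -> Bool
isAnnotated items =
    let
        levels =
            List.map Tuple.first items
    in
    List.map2 (\previous next -> next <= previous + 1) levels (List.drop 1 levels)
        |> List.all identity
```

With these definitions, an outline such as the one above should produce a list for which isAnnotated returns True, while a list that jumps from level 0 straight to level 2 should not.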

Digging a little deeper

A few more words about parsing into blocks. For this I used a finite state machine defined by

type FSM = FSM State (List Block) Register

The “real state” of the machine is

type State
    = Start
    | InBlock Block
    | Error

The (List Block) part accumulates the list of annotated blocks, while the Register is used to manage information on section numbers and also a stack that is used for parsing tables. It is quite a flexible setup, easy to extend and to modify. One runs the machine using

runFSM : Option -> String -> FSM
runFSM option str =
    let
        folder : String -> FSM -> FSM
        folder =
            \line fsm -> nextState option line fsm
    in
    List.foldl folder initialFSM (splitIntoLines str)

Of course, the real work is in the construction of

nextState : Option -> String -> FSM -> FSM
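To give a feel for the shape of such a machine, here is a deliberately tiny sketch, not the library's nextState. It assumes a toy grammar in which the only block type is a paragraph and blank lines separate paragraphs. It also leaves out the Option argument, the Register, and the Error state. The names ToyFSM, Paragraph, and blocks are invented for this example.

```elm
module ToyFSM exposing (Block(..), FSM(..), State(..), blocks, runFSM)

-- A toy version of the line-by-line machine; NOT the library's code.


type Block
    = Paragraph String


type State
    = Start
    | InBlock Block


-- The current state plus the blocks finished so far (most recent first).
type FSM
    = FSM State (List Block)


initialFSM : FSM
initialFSM =
    FSM Start []


-- Consume one line: a blank line closes the current block,
-- any other line starts a block or extends the current one.
nextState : String -> FSM -> FSM
nextState line (FSM state acc) =
    case ( state, String.trim line ) of
        ( Start, "" ) ->
            FSM Start acc

        ( Start, text ) ->
            FSM (InBlock (Paragraph text)) acc

        ( InBlock block, "" ) ->
            FSM Start (block :: acc)

        ( InBlock (Paragraph soFar), text ) ->
            FSM (InBlock (Paragraph (soFar ++ " " ++ text))) acc


-- Fold the machine over the lines of the input.
runFSM : String -> FSM
runFSM str =
    List.foldl nextState initialFSM (String.lines str)


-- Extract the finished blocks in document order,
-- including a block that was still open at the end of the input.
blocks : FSM -> List Block
blocks (FSM state acc) =
    case state of
        Start ->
            List.reverse acc

        InBlock block ->
            List.reverse (block :: acc)
```

The point of the design is that all the parsing logic lives in one pure function from a line and a machine to a new machine, so adding a block type means adding cases to nextState rather than restructuring the fold.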

From Annotated Lists to Rose Trees

To get a tree from the annotated list, one uses the jxxcarlson/htree library, which exposes

fromList : a -> (a -> Int) -> List a -> Tree a

The first argument is the root node label, the second is a function that maps node labels to integers (levels), and the last is the annotated list. This library relies on a zipper in the zwilias/elm-rosetree library.
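For readers who want to see the correspondence spelled out, here is a small self-contained sketch of a function with the same signature. It is not how the htree library works (that library uses a zipper). This version uses plain recursion over a home-made Tree type, and the helper names splitWhile and forest are invented for the example. It assumes the input list is annotated in the sense described earlier.

```elm
module ToyTree exposing (Tree(..), fromList, map)

-- A naive recursive construction for illustration; NOT the htree implementation.


type Tree a
    = Tree a (List (Tree a))


-- Split a list into its longest prefix satisfying the predicate and the rest.
splitWhile : (a -> Bool) -> List a -> ( List a, List a )
splitWhile keep items =
    case items of
        [] ->
            ( [], [] )

        first :: rest ->
            if keep first then
                let
                    ( taken, left ) =
                        splitWhile keep rest
                in
                ( first :: taken, left )

            else
                ( [], items )


-- Build sibling trees: the items that follow a node and sit at a deeper
-- level become its descendants; the remainder are its later siblings.
forest : (a -> Int) -> List a -> List (Tree a)
forest levelOf items =
    case items of
        [] ->
            []

        first :: rest ->
            let
                ( descendants, siblings ) =
                    splitWhile (\item -> levelOf item > levelOf first) rest
            in
            Tree first (forest levelOf descendants) :: forest levelOf siblings


-- Root label, level function, annotated list.
fromList : a -> (a -> Int) -> List a -> Tree a
fromList root levelOf items =
    Tree root (forest levelOf items)


-- Apply a function to every label, e.g. an inline parser to every block.
map : (a -> b) -> Tree a -> Tree b
map f (Tree label children) =
    Tree (f label) (List.map (map f) children)
```

Once the blocks are in a tree like this, mapping the inline parser over it is a single call to a function such as map.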

Compliance with CommonMark, Plans

I’d like for the toHtml Standard function to satisfy the CommonMark spec. It is definitely not there yet, although it is quite serviceable in its current form. This is a goal towards which I will work, given time and resources. I would also like to use MathJax 3 for rendering math, rather than the current 2.7.5 version. MathJax 3 is much faster than 2.7.5, and I am hoping that using it will eliminate the “flashing” that one sees when live editing a document that has math.

The current version of the library renders strings to Html msg. I plan to add functions that render to a String representing (a) HTML or (b) LaTeX. The LaTeX output is partly for the heck of it and partly to provide a way to generate PDF output.

NOTE: No JavaScript whatsoever is needed for the Standard and Extended options. Of course, you will need it for ExtendedMath.

jxxcarlson on elm slack, http://jxxcarlson.github.io
