Using parser combinators to parse ingredient amounts in javascript — part 1

The happy code
4 min readNov 29, 2019

--

This is a post about parser-combinators and how to use them to parse an input. Ingredient amounts is just a simple example to demonstrate how you can use them to parse anything from css to your own custom domain specific language. You can even parse most programming languages using this technique.

To start with, I’ll list a few examples of amounts I’d like to be able to parse.

  • A simple number: 1
  • A fraction: 3/4 or 1 1/2
  • A decimal: 1.5
  • Any of the above with a unit: 3/4 lb
  • Any of the above as a range: 1 to 2 or 1 to 2 ounces or 500 ml — 1l

If you prefer a more formal definition, here are those rules in BNF:

         <amount> ::= <range> | <quantity>          <range> ::= <quantity> <range-separator> <quantity><range-separator> ::= "-" | "to"       <quantity> ::= <number> <unit> | <number>         <number> ::= <fraction> | <float>       <fraction> ::= <integer> <integer> "/" <integer>
| <integer> "/" <integer>
| <integer>
<integer> ::= /^[0-9]+/ <float> ::= /^[0-9]+\.[0-9]*/

Looking at the above definition we can see that many of the rules are defined with respect to other rules and they are combined in two ways.

  1. The pipe (|) separates rules where one of them must apply
  2. A space separates rules that must apply in sequence

We will define parsers for parsing primitives (such as strings/regexes); we’ll define parsers that combine those primitives together in the two ways mentioned above and finally we’ll define parsers for all the rules (amount, range, etc) by combining the parsers already defined.

In order to make it easy to combine them, parsers should be higher-order-functions that take no arguments. That is, they should not parse an input themselves, rather they should return a function that parses the input.

This makes it easy to combine parsers together since there’s no need to write complex code to pass through the arguments.

We’ll start with a parser generator that generates string parsers:

function string(str) {
return () => input =>
input.startsWith(str) ? [str, input.substring(str.length)] : null;
}

As I said above, parsers don’t take any input and I stand by that comment. This is a generator of parsers and returns a function that does not take any input which is the parser itself.

Using this parser generator, I can create parsers to parse a dash or the text: to

const stringDash = string("-");
const stringTo = string("to");

If the text parses, these parsers will return a tuple of the parsed text and the remaining un-parsed text. If the text does not parse, they will return null.

stringDash()("abc") ==> null
stringDash()("-bc") ==> ["-", "bc"]

Next we’ll write the first combinator. This combinator will require that each parser must successfully parse the input that remains after the previous parser finished parsing

function followedBy(...parsers) {
return () => input => {
const values = [];
for (const parser of parsers) {
const result = parser()(input.trimLeft());
if (result == null) {
return null;
}
const [value, remaining] = result;
values.push(value);
input = remaining;
}
return [values, input];
};
}

Like the string function previously, this is actually a parser generator. We need to be able to create arbitrary sequence parsers.

Also like the previous string parsers, the generated parser returns a tuple. In this case the first element is an array that collects the results of all parsers that were passed in. The second element is (again) the remaining text after parsing.

Finally, I don’t care about whitespace between the text that is being parsed so this trims whitespace before calling the next parser.

You can combine parsers now and test it out:

followedBy(string("parser"), string("combinators"))()("parser combinators") ==> [ [ 'parser', 'combinators' ], '' ]

Lets define a parser generator for generating regex parsers:

function regex(test) {
return () => input => {
const matches = test.exec(input);
return matches != null
? [matches[0], input.substring(matches[0].length)]
: null;
};
}

And use it to define parsers for parsing integers and floats (excluding signs):

const integer = regex(/^[0-9]+/);
const float = regex(/^[0-9]+\.[0-9]*/);

Lets also define another combinator, this time where, given a list of parsers, one of them needs to parse sucessfully for this parser to succeed. It’ll return the result of the first parser that succeeds and it corresponds to the bar (|) in the BNF representation:

function oneOf(...parsers) {
return () => input => {
for (const parser of parsers) {
const result = parser()(input.trimLeft());
if (result != null) {
const [value, remaining] = result;
return [value, remaining];
}
}
return null;
};
}

That is all the primitives we need and that’s also all the combinators we need. All that is left is to combine these together as per the BNF grammar above (in reverse order).

To see how that is done, take a look at the full listing below:

function parse(input) {
function oneOf(...parsers) {
return () => input => {
for (const parser of parsers) {
const result = parser()(input.trimLeft());
if (result != null) {
const [value, remaining] = result;
return [value, remaining];
}
}
return null;
};
}
function followedBy(...parsers) {
return () => input => {
const values = [];
for (const parser of parsers) {
const result = parser()(input.trimLeft());
if (result == null) {
return null;
}
const [value, remaining] = result;
values.push(value);
input = remaining;
}
return [values, input];
};
}
function regex(test) {
return () => input => {
const matches = test.exec(input);
return matches != null
? [matches[0], input.substring(matches[0].length)]
: null;
};
}
function string(str) {
return () => input =>
input.startsWith(str) ? [str, input.substring(str.length)] : null;
}
const float = regex(/^[0-9]+\.[0-9]*/);
const integer = regex(/^[0-9]+/);
const fraction = oneOf(
followedBy(integer, integer, string("/"), integer),
followedBy(integer, string("/"), integer),
integer
);
const number = oneOf(fraction, float);
const unit = regex(/^(ml|l|ounces)/);
const quantity = oneOf(followedBy(number, unit), number);
const rangeSeparator = oneOf(string("-"), string("to"));
const range = followedBy(quantity, rangeSeparator, quantity);
const amount = oneOf(range, quantity);
return amount()(input);
}
console.log(parse("12 ml to 12 1 / 2 ml"));// [ [ [ '12', 'ml' ], 'to', [ [ '12', '1', '/', '5' ], 'ml' ] ], '' ]

To find out how to make more sense of the parsed output, see part 2

--

--