
Building an Inline Comment Parser

Get all your inline comments in a single place

Ilya Meerovich
May 29

When it comes to documenting JavaScript code, it’s hard to overstate the convenience of JSDoc. Write a comment — get auto-generated documentation in whatever format you want. Very nifty, and exactly the kind of tooling we would expect to be available in the JS ecosystem.

Sometimes, though, we might want to annotate our source code for other reasons, such as setting reminders to refactor something, remove a method, or rename a variable, and then be able to access that information in one place.

In this post I’ll walk through writing a script to parse a series of files and extract these kinds of annotations, with the assumption that they are written as some form of inline comment. For everything else, there is, of course, JSDoc.

Parsing a file

So we have a file that may or may not have inline comments we’re interested in. Before we can do anything with it, let’s read it into memory.
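A minimal sketch of this step, assuming Node.js and its built-in fs module. The function name readSource and the idea of reading synchronously are illustrative choices, not necessarily what the original script did:

```javascript
const fs = require('fs');

// Read a whole file into memory as a UTF-8 string.
// `readSource` and `filePath` are illustrative names.
function readSource(filePath) {
  return fs.readFileSync(filePath, 'utf8');
}
```

Calling readSource with a path to any source file returns its contents as a single string.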

Now we can access the contents of our file.

Once we have our file, we can get each individual line by simply splitting on the line break character.
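As a rough sketch, the split could look like the following. The helper name toLines is an assumption, and the regular expression is a small extension that also tolerates Windows-style line endings; splitting on '\n' alone works just as well for Unix files:

```javascript
// Split file contents into an array of lines.
// /\r?\n/ handles both "\n" and "\r\n" line endings.
function toLines(contents) {
  return contents.split(/\r?\n/);
}
```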

We can loop over each line and see if it contains information we want by checking for the existence of our comment delimiter (in this case I’ve just decided to go with ‘///’ like in Swift, for example, but of course you can use anything you like).
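One way this loop might be written is sketched below. The property name text and the 1-based line numbering are assumptions made for illustration; lineNumbers is stored as an array because the concatenation step relies on it. The check is deliberately naive, so a '///' inside a string literal would also be picked up:

```javascript
const DELIMITER = '///';

// Collect every line that contains the delimiter, keeping the comment
// text and the (1-based, assumed) line number where it appears.
function extractComments(lines) {
  return lines.reduce((found, line, index) => {
    const at = line.indexOf(DELIMITER);
    if (at !== -1) {
      found.push({
        text: line.slice(at + DELIMITER.length).trim(),
        lineNumbers: [index + 1],
      });
    }
    return found;
  }, []);
}
```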

In the above code we’re just looping over the lines and getting back a better formatted object with the comment text as well as the line number where it appears.

Now we have all the comments from the file.

We could stop here, but it would be nice to concatenate comments on consecutive lines for better readability. This is why we’ve stored the line number in an array: it will come in handy in the next step.

Concatenating Comments

To concatenate the comments, we’ll reduce the comment array we got in the last step. In our reducer function, we’ll check whether any comment already in the accumulator has, in its lineNumbers array, a line number exactly 1 less than the current comment's line number.

If there is such a comment with such a line number, we’ll concatenate the contents of the current comment we’re iterating over with that comment, and add the current comment’s line number to that comment’s lineNumbers array.

This way we’re setting up the lineNumbers array to be used by the next comment we iterate over, so that we can concatenate as many consecutive comments as we want.
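The reducer described above can be sketched roughly as follows. It assumes each comment has the shape { text, lineNumbers }, where text is an illustrative property name, and it joins consecutive comments with a single space:

```javascript
// Merge comments that sit on consecutive lines into one entry.
function mergeConsecutive(comments) {
  return comments.reduce((merged, current) => {
    const currentLine = current.lineNumbers[0];
    // Look for an already-merged comment that ends on the previous line.
    const previous = merged.find((c) => c.lineNumbers.includes(currentLine - 1));
    if (previous) {
      previous.text = `${previous.text} ${current.text}`;
      previous.lineNumbers.push(currentLine);
    } else {
      // Push a copy so the input array is left untouched.
      merged.push({ text: current.text, lineNumbers: [...current.lineNumbers] });
    }
    return merged;
  }, []);
}
```

Because each merged comment records every line it covers, a run of three or more consecutive comments collapses into a single entry.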

Now we’ve got concatenated comments from a file. To make this a ready-to-use utility, let’s wrap this code in something that will take a glob pattern and then run the code above for every match. For this we will need the glob npm package. Here’s the code all put together:
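What follows is a compact sketch of how the pieces might fit together, not the original gist. The names parseFile, parseFiles and parseGlob, the text property, and the 1-based line numbers are all assumptions. The glob package is only loaded inside parseGlob, and the sketch assumes a version of glob that exposes a synchronous sync function:

```javascript
const fs = require('fs');

const DELIMITER = '///';

// Read one file and return its comments, merging consecutive lines.
function parseFile(filePath) {
  const lines = fs.readFileSync(filePath, 'utf8').split(/\r?\n/);
  const comments = [];
  lines.forEach((line, index) => {
    const at = line.indexOf(DELIMITER);
    if (at === -1) return;
    const text = line.slice(at + DELIMITER.length).trim();
    const lineNumber = index + 1; // 1-based (assumed convention)
    const previous = comments.find((c) => c.lineNumbers.includes(lineNumber - 1));
    if (previous) {
      previous.text += ` ${text}`;
      previous.lineNumbers.push(lineNumber);
    } else {
      comments.push({ text, lineNumbers: [lineNumber] });
    }
  });
  return { file: filePath, comments };
}

// Parse a list of paths, dropping files that contain no comments.
function parseFiles(filePaths) {
  return filePaths.map(parseFile).filter((entry) => entry.comments.length > 0);
}

// Expand a glob pattern and parse every match.
function parseGlob(pattern) {
  // Third-party dependency, required lazily so the helpers above
  // can be used without it.
  const glob = require('glob');
  return parseFiles(glob.sync(pattern));
}
```

With the glob package installed, something like parseGlob('src/**/*.js') would return one entry per file that contains at least one comment.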

Edit May 30th: line 36 in the original gist should read comments: comment,

Although this utility can give us enough to detect the information we want, it would be great if we could add some context.

In most IDEs, if you hover over some identifier that’s been annotated with a doc comment, you’ll get some information about that identifier that has been extracted from the comment.

This ability to link comments with the thing they describe is a really nice feature. We’re going to implement a version of this by using a parser combinator.

In the next section I’ll describe what a parser combinator is, but if you’re in a hurry, you can skip ahead to “Getting the associated identifier” to see the code :)

Parser combinators

A detailed exploration of this technique is outside the scope of this post, but there are tonnes of great resources that show how it can be used and also how one can build a parser library of one’s own.

The principle behind parser combinators is that instead of writing a giant regex to capture complex values inside an input string, we can compose smaller parsers that themselves only parse a portion of the input, like individual characters, for example.

Parser combinators make our program much easier to understand, and allow us to avoid messing with dots and slashes every time we want to change our parser (not to minimize the awesome power of regular expressions).

Parsers in a parser combinator library are commonly pure functions that accept and return a state object of the same shape. This allows them to be chained together so that each parser can pass along the result of its operation to the next parser in the chain.

The end result of applying a chain of parsers to an input string is the combined result of each individual parser. Parser combinators can also be designed in such a way that if parsing is unsuccessful, the particular parser whose operation failed can add an error message to the state it returns so that we can see at a glance which part of the input was not parsable. That’s something you definitely don’t get with a regex.

To take a concrete example, the str parser you'll see is a higher-order function that takes a string to match and returns an object whose run method accepts target input and just checks if it begins with the earlier string value.
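To make the idea tangible, here is a tiny hand-rolled version of such a parser. It is an illustration only and not Arcsecond's implementation: makeParser, the state shape { input, index, result, error } and the error message format are all assumptions of this sketch:

```javascript
// Wrap a state-transforming function in an object with a `run` method.
const makeParser = (transform) => ({
  transform,
  run: (input) => transform({ input, index: 0, result: null, error: null }),
});

// Higher-order function: takes the string to match, returns a parser.
const str = (expected) =>
  makeParser((state) => {
    if (state.error) return state; // pass earlier failures along untouched
    return state.input.startsWith(expected, state.index)
      ? { ...state, index: state.index + expected.length, result: expected }
      : { ...state, error: `Expected "${expected}" at index ${state.index}` };
  });
```

Here str('hello').run('hello world') yields a state whose result is 'hello' and whose index is 5, while running it against 'goodbye' yields a state with an error message instead.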

The str, sequenceOf, and choice parsers described in this section come from Arcsecond, the parser combinator library we’re going to use.

In addition to this string checking parser, there are also functions for composing parsers that work by taking in a series of parsers and applying some combination of them to the target input.

So the aptly named sequenceOf parser can potentially be used like this:
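The sketch below shows the idea with a small hand-rolled sequenceOf rather than Arcsecond's own, so that it runs without dependencies. The makeParser helper and the state shape are assumptions of the sketch:

```javascript
const makeParser = (transform) => ({
  transform,
  run: (input) => transform({ input, index: 0, result: null, error: null }),
});

const str = (expected) =>
  makeParser((state) => {
    if (state.error) return state;
    return state.input.startsWith(expected, state.index)
      ? { ...state, index: state.index + expected.length, result: expected }
      : { ...state, error: `Expected "${expected}" at index ${state.index}` };
  });

// Apply each parser in order; collect the results, stop at the first error.
const sequenceOf = (parsers) =>
  makeParser((state) => {
    const results = [];
    let next = state;
    for (const parser of parsers) {
      next = parser.transform(next);
      if (next.error) return next;
      results.push(next.result);
    }
    return { ...next, result: results };
  });

const greeting = sequenceOf([str('hello'), str(' '), str('world')]);
```

Running greeting against 'hello world' produces the result ['hello', ' ', 'world']. If any parser in the chain fails, the error it recorded is returned as-is, which shows exactly where parsing stopped.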

and similarly, the choice parser is used like this:
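Again as a hand-rolled illustration rather than Arcsecond's code, a choice combinator can be sketched as trying each parser against the same starting state and keeping the first success. The makeParser helper, state shape and error text are assumptions:

```javascript
const makeParser = (transform) => ({
  transform,
  run: (input) => transform({ input, index: 0, result: null, error: null }),
});

const str = (expected) =>
  makeParser((state) => {
    if (state.error) return state;
    return state.input.startsWith(expected, state.index)
      ? { ...state, index: state.index + expected.length, result: expected }
      : { ...state, error: `Expected "${expected}" at index ${state.index}` };
  });

// Try each parser from the same starting state; the first success wins.
const choice = (parsers) =>
  makeParser((state) => {
    if (state.error) return state;
    for (const parser of parsers) {
      const next = parser.transform(state);
      if (!next.error) return next;
    }
    return { ...state, error: `No alternative matched at index ${state.index}` };
  });

const keyword = choice([str('const'), str('let'), str('var')]);
```

With this, keyword.run('let x') succeeds with the result 'let', while keyword.run('function f') reports that no alternative matched.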

Getting the associated identifier

As I mentioned above, we’re going to be using Arcsecond to meet our parsing goals.

For this example, we will attempt to parse any identifier declaration of the form:

  • const x = 'y'
  • let x = 'y'
  • var x = 'y'

Here are the parsers that we will need to parse them:
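A minimal sketch of what these parsers might look like, written with small hand-rolled combinators instead of Arcsecond so that it runs without dependencies. The helper names makeParser and regex, the character set accepted by identifierName, and the { kind, name } output shape are assumptions of this sketch:

```javascript
const makeParser = (transform) => ({
  transform,
  run: (input) => transform({ input, index: 0, result: null, error: null }),
  // Not Array.prototype.map: returns a new parser whose result is fn(result).
  map(fn) {
    return makeParser((state) => {
      const next = transform(state);
      return next.error ? next : { ...next, result: fn(next.result) };
    });
  },
});

const str = (expected) =>
  makeParser((state) => {
    if (state.error) return state;
    return state.input.startsWith(expected, state.index)
      ? { ...state, index: state.index + expected.length, result: expected }
      : { ...state, error: `Expected "${expected}" at index ${state.index}` };
  });

// Match a regular expression exactly at the current index (sticky flag).
const regex = (pattern, label) =>
  makeParser((state) => {
    if (state.error) return state;
    const sticky = new RegExp(pattern.source, 'y');
    sticky.lastIndex = state.index;
    const match = sticky.exec(state.input);
    return match
      ? { ...state, index: state.index + match[0].length, result: match[0] }
      : { ...state, error: `Expected ${label} at index ${state.index}` };
  });

const sequenceOf = (parsers) =>
  makeParser((state) => {
    const results = [];
    let next = state;
    for (const parser of parsers) {
      next = parser.transform(next);
      if (next.error) return next;
      results.push(next.result);
    }
    return { ...next, result: results };
  });

const choice = (parsers) =>
  makeParser((state) => {
    if (state.error) return state;
    for (const parser of parsers) {
      const next = parser.transform(state);
      if (!next.error) return next;
    }
    return { ...state, error: `No alternative matched at index ${state.index}` };
  });

const optionalWhitespace = regex(/\s*/, 'optional whitespace');
// Assumed set of valid identifier characters.
const identifierName = regex(/[A-Za-z_$][A-Za-z0-9_$]*/, 'identifier name');

const varDeclaration = sequenceOf([
  optionalWhitespace,
  choice([str('const'), str('let'), str('var')]),
  optionalWhitespace,
  identifierName,
]).map(([, kind, , name]) => ({ kind, name }));
```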

In the above code, identifierName is a parser that looks for what we've decided are all valid characters inside of an identifier name.

The varDeclaration parser will attempt to parse a string consisting of:

  1. 0 or more whitespace characters
  2. one of const, let, or var
  3. 0 or more whitespace characters
  4. The name of our identifier

Then, we apply a map function to format the resulting output to something more readable.

Note, this map function is not the regular JavaScript map, but rather a function that takes the result of applying the parser, and returns another parser whose state has been modified by the callback we give it.

If we wanted to parse different kinds of identifier declarations, like functions, we could write separate parsers for them, and then simply combine them using choice.

Since we know the line number of the last inline comment we parsed in a given sequence, we just need to check the next line in the file to see if it contains a declaration. If it does, we’ll extract it; if not, we’ll do nothing.

Our function will take the comment object and the array of lines in the file.
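A rough sketch of such a function is shown below. To keep it self-contained, a simple regular expression stands in for the varDeclaration parser; the names findIdentifier and parseDeclaration are hypothetical, and line numbers are assumed to be 1-based:

```javascript
// Stand-in for the varDeclaration parser: returns the declared name or null.
const parseDeclaration = (line) => {
  const match = /^\s*(?:const|let|var)\s+([A-Za-z_$][\w$]*)/.exec(line);
  return match ? match[1] : null;
};

// Look at the line right after the comment's last line for a declaration.
function findIdentifier(comment, lines, parse = parseDeclaration) {
  const lastLine = comment.lineNumbers[comment.lineNumbers.length - 1];
  // Line numbers are 1-based (assumed), so the following line
  // sits at array index `lastLine`.
  const nextLine = lines[lastLine];
  return nextLine === undefined ? null : parse(nextLine);
}
```

Passing the parser in as a third argument makes it easy to swap the regular expression for a real parser combinator later.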

We will also need to change what we’re going to be returning from our file loop, like this:

Now, our parsed comment object looks like this:
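As an illustration of the changed return value and the resulting shape, here is a small sketch. The field names file, text and identifier, the helper attachIdentifiers, and the sample data are hypothetical; a regular expression again stands in for the declaration parser:

```javascript
// Stand-in lookup: name declared on the line after the comment, or null.
const findIdentifier = (comment, lines) => {
  const lastLine = comment.lineNumbers[comment.lineNumbers.length - 1];
  const match = /^\s*(?:const|let|var)\s+([A-Za-z_$][\w$]*)/.exec(lines[lastLine] || '');
  return match ? match[1] : null;
};

// What the file loop might return: the file plus enriched comments.
function attachIdentifiers(file, comments, lines) {
  return {
    file,
    comments: comments.map((comment) => ({
      ...comment,
      identifier: findIdentifier(comment, lines),
    })),
  };
}

// Hypothetical sample data, for illustration only.
const sampleLines = ['/// Rename once the API settles', 'const tmpName = 1;'];
const entry = attachIdentifiers(
  'example.js',
  [{ text: 'Rename once the API settles', lineNumbers: [1] }],
  sampleLines
);
console.log(JSON.stringify(entry, null, 2));
```

Each comment keeps its text and line numbers and gains an identifier field, which is null when the following line is not a declaration.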

And that brings us to the end!

If you’re interested in playing around with the code, here’s a Repl!

As long as we have files with comments, our parser should give us a nice list of all the comments, the variables (if any) that they describe, and the files where they can be found.

Our implementation has some limitations. It’s not able to parse object destructuring syntax, for example, so const { hi } = someObject would give us { as the variable name, which is not what we want.

With the tools we’ve looked at, the reader should already be able to fix this, and make other improvements to our parsing logic.

Finally, if you want to use this in your project, here’s the complete package.

The package uses a slightly different parser combinator library, built by following this series by the creator of Arcsecond, where he provides a step-by-step walk-through of the process of building a suite of parsers. I highly recommend watching the series for an engaging, informative exposition of this technique.

If you’ve made it this far, I hope this excursion into comment parsing was useful to you in some way.

Happy coding!
