Custom language plugin development for IntelliJ IDEA — Part 02

May 2, 2018

After a few months of intense deadlines, I finally found some time to write part 02 of this series (it’s only been 8 months since I wrote part 01 :P). So let’s get down to business without wasting any more time.

In the first article of this series, I discussed how to add file type association support to a custom language plugin. In this article, I am going to discuss writing a simple Lexer using JFlex and a simple Parser using a BNF grammar.

Are you scratching your head right now because you don’t know about Lexers and Parsers? Don’t worry, I’ll briefly cover both of them first :) And you’ll see some samples as well.

Prerequisites

You’ll need to install a few plugins in IDEA, among them Grammar-Kit, which provides the Live Preview feature used later in this article.

You will also need to clone the Simple-Intellij-Plugin repo and check out the v0.1.0 tag, if you don’t have it already.

What is a Lexer?

A Lexer is responsible for converting a sequence of characters into a sequence of tokens. This process is also known as lexical analysis.
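To make the idea concrete, here is a minimal hand-written sketch in Java. It is not the Lexer that JFlex generates for the plugin; the class name `LexerSketch` and the token names are my own assumptions, chosen only to show how characters are grouped into tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: a tiny hand-written lexer.
// The real plugin Lexer is generated by JFlex and looks different.
final class LexerSketch {

    // Converts the input characters into a list of token descriptions.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int start = pos;
            char c = input.charAt(pos);
            if (Character.isWhitespace(c)) {
                // Consecutive whitespace characters become one token.
                while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
                    pos++;
                }
                tokens.add("WHITE_SPACE");
            } else if (isIdentifierStart(c)) {
                // A letter or underscore, followed by letters, digits, or underscores.
                pos++;
                while (pos < input.length() && isIdentifierPart(input.charAt(pos))) {
                    pos++;
                }
                tokens.add("IDENTIFIER(" + input.substring(start, pos) + ")");
            } else {
                // Anything else is emitted as a single-character token.
                pos++;
                tokens.add("SYMBOL(" + c + ")");
            }
        }
        return tokens;
    }

    private static boolean isIdentifierStart(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    private static boolean isIdentifierPart(char c) {
        return isIdentifierStart(c) || (c >= '0' && c <= '9');
    }

    public static void main(String[] args) {
        // Prints [IDENTIFIER(foo), WHITE_SPACE, SYMBOL((), IDENTIFIER(bar), SYMBOL())]
        System.out.println(tokenize("foo (bar)"));
    }
}
```

A generated Lexer does the same job, but the rules come from a specification file instead of hand-written loops.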

A sample Lexer

There are many tools that can generate a Lexer, but we will use JFlex here. JFlex is a lexical analyzer generator. The sample code for this section is a JFlex specification that generates the Lexer for us. Note that this is not the actual Lexer; we only use this code to generate the Lexer.

Here, YYINITIAL is the starting state of the Lexer (similar to a mode in ANTLR). Because rules are attached to states, we can write complex rules that span multiple states.
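If lexer states are new to you, the following Java sketch may help. It is a hand-written approximation of the idea, not JFlex output: only the name YYINITIAL comes from JFlex, while the `IN_STRING` state, the token names, and the class name are assumptions I made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of lexer states; not generated by JFlex.
final class LexerStatesSketch {

    // YYINITIAL is the state the lexer starts in; IN_STRING is an invented second state.
    enum State { YYINITIAL, IN_STRING }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        State state = State.YYINITIAL;
        StringBuilder text = new StringBuilder();
        for (char c : input.toCharArray()) {
            switch (state) {
                case YYINITIAL:
                    if (c == '"') {
                        // An opening quote switches the lexer to the string state.
                        flush(tokens, "TEXT", text);
                        state = State.IN_STRING;
                    } else {
                        text.append(c);
                    }
                    break;
                case IN_STRING:
                    if (c == '"') {
                        // A closing quote ends the string and returns to the start state.
                        tokens.add("STRING(" + text + ")");
                        text.setLength(0);
                        state = State.YYINITIAL;
                    } else {
                        // Inside a string, every other character is plain content.
                        text.append(c);
                    }
                    break;
            }
        }
        flush(tokens, state == State.YYINITIAL ? "TEXT" : "UNTERMINATED_STRING", text);
        return tokens;
    }

    // Emits the buffered characters as one token, if there are any.
    private static void flush(List<String> tokens, String type, StringBuilder text) {
        if (text.length() > 0) {
            tokens.add(type + "(" + text + ")");
            text.setLength(0);
        }
    }

    public static void main(String[] args) {
        // Prints [TEXT(a ), STRING(b c), TEXT( d)]
        System.out.println(tokenize("a \"b c\" d"));
    }
}
```

The same character is handled differently depending on the current state, which is what makes multi-state rules possible.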

What is a Parser?

A Parser is responsible for matching the tokens produced by the Lexer against grammar rules. This process is also known as syntax analysis.

We could use ANTLR to write the parser as well, but I ran into some issues with that approach (when trying to add indexing support), so we will use BNF.
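As a rough illustration of what "matching tokens against grammar rules" means, here is a tiny recursive-descent sketch in Java. The `Call` rule, the token names, and the class name are all hypothetical; this is not the Parser that Grammar-Kit generates, just the general technique in miniature.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: checks a token sequence against one made-up rule.
//   Call ::= IDENTIFIER '(' [ IDENTIFIER ( ',' IDENTIFIER )* ] ')'
final class ParserSketch {
    private final List<String> tokens;
    private int pos = 0;

    ParserSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // Returns true only if the whole token sequence matches the Call rule.
    boolean parseCall() {
        if (!consume("IDENTIFIER") || !consume("(")) {
            return false;
        }
        // Optional argument list: one identifier, then any number of ", identifier".
        if (consume("IDENTIFIER")) {
            while (consume(",")) {
                if (!consume("IDENTIFIER")) {
                    return false;
                }
            }
        }
        return consume(")") && pos == tokens.size();
    }

    // Advances past the current token if it has the expected type.
    private boolean consume(String type) {
        if (pos < tokens.size() && tokens.get(pos).equals(type)) {
            pos++;
            return true;
        }
        return false;
    }

    static boolean matches(String... tokenTypes) {
        return new ParserSketch(Arrays.asList(tokenTypes)).parseCall();
    }

    public static void main(String[] args) {
        System.out.println(matches("IDENTIFIER", "(", "IDENTIFIER", ",", "IDENTIFIER", ")")); // true
        System.out.println(matches("IDENTIFIER", "(", ",", ")"));                             // false
    }
}
```

Note that the parser works on token types, not on raw characters; producing those tokens is the Lexer's job.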

A sample Parser

The BNF grammar file for this section is what we will use to generate the Parser. It has some sample rules that we will use to demonstrate various features of IDEA plugins in future tutorials.

I use the following conventions when writing grammar rules.

Parser Rules: Pascal Case (Upper Camel Case)
Keywords: Lower Case
Lexer Rules: Upper Case
Other tokens are written within single quotes.

Live Preview

Now that you know the basics of Lexers and Parsers, let’s use the Live Preview feature provided by the Grammar-Kit plugin we installed earlier.

Let’s write some “Simple Language” code :D

1. Create a new file called Simple.bnf in the org/shan/grammar package and copy the BNF grammar above into it. The project should now look like this.

2. Right-click anywhere in the opened file and select “Live Preview”. A new editor should open. In this new editor, you can write sample code to test your grammar rules.

Now, let’s write some Simple functions. Pun intended ;)

Wait, what are those red lines !? :/ :(

No need to panic: if you see red lines, you haven’t done anything wrong, at least not yet…

These red lines appear because the Grammar-Kit plugin doesn’t know anything about our Lexer. Specifically, it does not know how to process whitespace and identifiers. These tokens are identified by the Lexer, but so far we have only defined the grammar for the Parser. So when Grammar-Kit encounters those character sequences, it doesn’t know what to do with them, and that is why they are highlighted as errors.

But what about keywords?

Grammar-Kit automatically creates tokens for these, so keywords are identified correctly and we don’t have to worry about them.

So how can we add other Lexer rules?

Good question. We can provide various attributes in the BNF grammar, and “tokens” is one of them. Let’s add the whitespace and identifier Lexer rules as tokens.

Add the following code snippet at the top of the BNF grammar file.

{
    tokens = [
        IDENTIFIER = 'regexp:[a-zA-Z_][a-zA-Z0-9_]*'
        WHITE_SPACE = 'regexp:\s+'
    ]
}
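If you want to see what these two patterns accept, you can try them outside the IDE. The sketch below assumes the text after the `regexp:` prefix behaves like an ordinary `java.util.regex` pattern; the class and method names are hypothetical and are not part of Grammar-Kit.

```java
import java.util.regex.Pattern;

// Hypothetical helper for experimenting with the two token patterns.
// Assumption: the part after "regexp:" is treated as a regular Java regex.
final class TokenPatternSketch {
    static final Pattern IDENTIFIER = Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");
    static final Pattern WHITE_SPACE = Pattern.compile("\\s+");

    // True if the whole text is a valid identifier.
    static boolean isIdentifier(String text) {
        return IDENTIFIER.matcher(text).matches();
    }

    // True if the whole text consists of one or more whitespace characters.
    static boolean isWhiteSpace(String text) {
        return WHITE_SPACE.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("my_function1")); // true
        System.out.println(isIdentifier("1abc"));         // false: cannot start with a digit
        System.out.println(isWhiteSpace(" \n\t"));        // true
        System.out.println(isWhiteSpace(""));             // false: at least one character is required
    }
}
```

So an identifier must start with a letter or an underscore, and any run of spaces, tabs, or newlines counts as a single whitespace token.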

So your file should now look like this.

If you still have the Live Preview window open, you’ll see that all of the red lines have disappeared.

That is why it is called Live Preview (Duh! :P).

Whenever you edit the grammar file, the changes will be reflected immediately in the Live Preview window. This is one of the most useful features you will use when you are developing a custom language plugin.

I must admit that this article is longer than I intended, so I’m going to discuss Lexer and Parser generation in the next article. Until then, get your hands dirty.

You can find the code corresponding to this point on GitHub.

Let’s hope I’ll find the time to write the next article soon :)

Part 3 — https://medium.com/@shan1024/custom-language-plugin-development-for-intellij-idea-part-02-f948a078dc81