Custom language plugin development for IntelliJ IDEA — Part 03

Shan Mahanama
Oct 27, 2018

Hello folks, it’s me again :) I got some awesome responses to the previous parts (thanks, all) and requests to write the next parts of the series. So here I am with part 3.

In the last article, we discussed lexers and parsers. In this article, I’ll discuss some concepts that you need to know when developing a custom language plugin for IDEA.

Parse Tree

A parse tree is a tree that represents the syntactic structure of a string (a program) according to some context-free grammar.

Let's look at a parse tree for the grammar that we defined in Part 2 of this series.

Grammar

Program

function test() {
    int i;
    int j = test2();
}

Parse Tree

Example Parse Tree

I was too lazy to draw this tree myself, so I converted the BNF grammar to an ANTLR grammar and used IDEA’s ANTLR plugin to generate it. I’ll show how to do this in a bit. A tree like this is very useful for debugging issues in the grammar. Unfortunately, we cannot generate a parse tree like this using the BNF plugin, but we can see the node structure there (I’ll show this in the next article).

Parse Tree generation using ANTLR

ANTLR is a powerful parser generator created by Terence Parr. The following code snippet is the equivalent ANTLR grammar for the BNF grammar I defined earlier.

Generating the Parse Tree

  1. Install the ANTLR plugin in IDEA.
  2. Create a new file called Simple.g4 in IDEA.
  3. Copy the content of the above gist into it.
  4. Right-click the simpleFile grammar rule and select the Test Rule simpleFile menu item.

This will open up the ANTLR Preview window.

ANTLR Preview window

You can write your sample program in the text editor on the left side of the ANTLR Preview window. The corresponding parse tree is generated on the right-hand side.

Parse Tree generation

Abstract Syntax Tree (AST)

An Abstract Syntax Tree, commonly referred to as an AST, is a tree representation of the abstract syntactic structure of source code. It does not represent every detail of the real syntax, only the structural and content-related details. That is why it is called “abstract”.
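To make the difference concrete, here is a minimal Java sketch that builds both trees by hand for the statement `int i;`. The `Node` class and the rule names (statement, variableDeclaration, type, VarDecl) are my own assumptions for illustration; they are not taken from the grammar of this series or from any IntelliJ or ANTLR API. It follows the general compiler sense of "AST" described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type for illustration; not part of any IntelliJ or ANTLR API.
class Node {
    final String label;
    final List<Node> children = new ArrayList<>();

    Node(String label, Node... kids) {
        this.label = label;
        for (Node kid : kids) {
            children.add(kid);
        }
    }

    // Counts this node plus all of its descendants.
    int size() {
        int count = 1;
        for (Node child : children) {
            count += child.size();
        }
        return count;
    }
}

class TreeSketch {
    // Parse tree for "int i;": every applied rule and every token is a node.
    // The rule names here are assumed, not taken from the real grammar.
    static Node parseTree() {
        return new Node("statement",
            new Node("variableDeclaration",
                new Node("type", new Node("'int'")),
                new Node("IDENTIFIER(i)"),
                new Node("';'")));
    }

    // AST for the same input: the semicolon and the wrapper rules are dropped,
    // only the declaration, its type and its name remain.
    static Node ast() {
        return new Node("VarDecl",
            new Node("int"),
            new Node("i"));
    }

    public static void main(String[] args) {
        System.out.println("parse tree nodes: " + parseTree().size());
        System.out.println("AST nodes: " + ast().size());
    }
}
```

The parse tree here has six nodes and the AST has three, even though both describe the same statement.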

Program Structure Interface (PSI)

This is the layer in the IntelliJ platform that is responsible for parsing files and creating the syntactic and semantic code model for those files. This model powers many of the platform's features.

PSI Tree

For each file supported by the custom language plugin, a tree-like structure called the PSI tree is created on top of the AST. When developing custom language plugins, we work directly with this PSI tree rather than the AST.

These PSI trees have two kinds of nodes (refer to the “Example Parse Tree” above).

  1. Internal nodes: referred to as elements or composite nodes.
  2. Leaf nodes: referred to as tokens.

Classes for these elements are generated from the BNF grammar file, e.g. SimpleFunctionDefinition, SimpleFunctionBody, etc. We will go into more detail in the next article.
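The two kinds of nodes can be pictured with a small Java sketch. All the types below (`SketchNode`, `TokenNode`, `CompositeNode`) are stand-ins I made up for illustration; the real platform uses its own PSI and AST classes, and the exact tree shape for `function test() { }` is an assumption on my part.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in base type for illustration only; not an IntelliJ class.
abstract class SketchNode {
    // Appends the text of every token below this node, in source order.
    abstract void collectTokens(List<String> out);
}

// Leaf node: a single token produced by the lexer.
class TokenNode extends SketchNode {
    final String text;

    TokenNode(String text) {
        this.text = text;
    }

    @Override
    void collectTokens(List<String> out) {
        out.add(text);
    }
}

// Internal (composite) node: named after a grammar rule and holding children.
class CompositeNode extends SketchNode {
    final String name;
    final List<SketchNode> children;

    CompositeNode(String name, SketchNode... kids) {
        this.name = name;
        this.children = new ArrayList<>(Arrays.asList(kids));
    }

    @Override
    void collectTokens(List<String> out) {
        for (SketchNode child : children) {
            child.collectTokens(out);
        }
    }
}

class NodeKindsSketch {
    // Assumed tree shape for the program "function test() { }".
    static SketchNode emptyFunction() {
        return new CompositeNode("SimpleFunctionDefinition",
            new TokenNode("function"),
            new TokenNode("test"),
            new TokenNode("("),
            new TokenNode(")"),
            new CompositeNode("SimpleFunctionBody",
                new TokenNode("{"),
                new TokenNode("}")));
    }

    // Joins all leaf tokens with single spaces.
    static String tokenText(SketchNode root) {
        List<String> tokens = new ArrayList<>();
        root.collectTokens(tokens);
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        System.out.println(tokenText(emptyFunction()));
    }
}
```

Walking only the leaves gives back the tokens of the program, while the composite nodes carry the structure that the grammar rules added on top of them.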

The flow of creating the PSI Tree from AST

  1. Our implementation of the ParserDefinition interface provides the lexer and parser to the IntelliJ platform. The platform then uses them to parse the files handled by our plugin and create the AST.
  2. When a file is parsed, AST nodes are created. Each AST node records its element type (node type), e.g. FunctionDefinition, FunctionBody, etc.
  3. Each of these nodes is then passed to a factory class, which generates the PSI element (composite node) for that particular AST node based on the element type stored in it. If the element type is FunctionDefinition, a SimpleFunctionDefinition instance is created, and so on. The classes that represent PSI elements (such as SimpleFunctionDefinition) are generated by the BNF plugin along with the parser.
  4. These PSI element classes are just wrappers for the AST nodes. This works because all of the composite node classes extend the ASTWrapperPsiElement class.
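Steps 2 to 4 can be sketched in plain Java as follows. This is only a rough model under my own assumptions: every class here is a stand-in with a `Sketch` suffix, not the real IElementType, ASTNode, or ASTWrapperPsiElement, and the actual factory generated by the BNF plugin will look different.

```java
// Stand-in for an element type (node type); not the real IElementType.
class ElementTypeSketch {
    final String debugName;

    ElementTypeSketch(String debugName) {
        this.debugName = debugName;
    }
}

// Stand-in for an AST node: it records its element type and the text it covers.
class AstNodeSketch {
    final ElementTypeSketch type;
    final String text;

    AstNodeSketch(ElementTypeSketch type, String text) {
        this.type = type;
        this.text = text;
    }
}

// Stand-in for the wrapper base class: a PSI element just holds its AST node.
class WrapperPsiSketch {
    private final AstNodeSketch node;

    WrapperPsiSketch(AstNodeSketch node) {
        this.node = node;
    }

    AstNodeSketch getNode() {
        return node;
    }
}

// Hypothetical PSI element classes, named after the examples in this article.
class SimpleFunctionDefinitionSketch extends WrapperPsiSketch {
    SimpleFunctionDefinitionSketch(AstNodeSketch node) {
        super(node);
    }
}

class SimpleFunctionBodySketch extends WrapperPsiSketch {
    SimpleFunctionBodySketch(AstNodeSketch node) {
        super(node);
    }
}

class PsiFactorySketch {
    static final ElementTypeSketch FUNCTION_DEFINITION = new ElementTypeSketch("FunctionDefinition");
    static final ElementTypeSketch FUNCTION_BODY = new ElementTypeSketch("FunctionBody");

    // Picks the PSI wrapper class from the element type stored in the AST node.
    static WrapperPsiSketch createElement(AstNodeSketch node) {
        if (node.type == FUNCTION_DEFINITION) {
            return new SimpleFunctionDefinitionSketch(node);
        }
        if (node.type == FUNCTION_BODY) {
            return new SimpleFunctionBodySketch(node);
        }
        throw new IllegalArgumentException("Unknown element type: " + node.type.debugName);
    }

    public static void main(String[] args) {
        AstNodeSketch node = new AstNodeSketch(FUNCTION_DEFINITION, "function test() { }");
        WrapperPsiSketch psi = createElement(node);
        System.out.println(psi.getClass().getSimpleName());
    }
}
```

The point of the sketch is the dispatch on the element type and the fact that the PSI object keeps a reference to the AST node it wraps, rather than copying anything out of it.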

Don’t worry if you don’t understand this flow yet. This will become clear when we generate the lexer and parser and create the ParserDefinition class.

Note: This explanation of the flow is to the best of my knowledge, and it might not be 100% complete. If you spot anything missing, please let me know.

That is it for today. See you folks with the next part of this series soon :)

Part 4 — https://medium.com/@shan1024/custom-language-plugin-development-for-intellij-idea-part-04-df2f3ce88b30
