Language Server for Ballerina: Auto Completion Engine in Depth
Before reading this post, I recommend reading my previous post, Why Use Language Server. It gives a basic idea of the purpose and applicability of a Language Server (LS) in the real world.
When I started working on BallerinaLang, we needed a universal implementation of a language smartness provider. That is when we decided to implement the Language Server Protocol (LSP) for Ballerina.
This post is for those who are working on an LSP implementation, and it focuses on one of the most important language smartness features: suggestions and auto-completion. I will describe the underlying architecture and design decisions in detail from an architectural point of view, independent of the implementation details.
Syntactic and Semantic Knowledge
When we consider a language smartness provider, there are two major aspects to focus on.
- Syntactic Knowledge
Syntactic knowledge is important when making smartness decisions related to the language's grammar specification, such as which language constructs are allowed within a given language construct.
Eg: In Ballerina, you cannot have a Resource inside a Function Definition.
- Semantic Knowledge
Semantic knowledge is important when making smartness decisions related to the semantic rules defined for a language.
Eg: Visibility of defined variables in different scopes. Variables defined inside a while loop are not visible outside the while loop construct.
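To make the syntactic side concrete, the following is a minimal Java sketch (Java because the Ballerina language server itself is implemented in Java) that expresses a small piece of grammar knowledge as a lookup table of which constructs may appear directly inside which. The `Construct` enum, the `SyntacticRules` class and the table entries are hypothetical and cover only the example above; the real engine derives this knowledge from the Ballerina ANTLR grammar rather than from a hand-written table.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical constructs; the real set comes from the Ballerina ANTLR grammar.
enum Construct { COMPILATION_UNIT, SERVICE, RESOURCE, FUNCTION, STATEMENT }

final class SyntacticRules {
    // Illustrative subset only: which constructs may appear directly inside a container.
    private static final Map<Construct, Set<Construct>> ALLOWED_CHILDREN = Map.of(
            Construct.COMPILATION_UNIT, Set.of(Construct.SERVICE, Construct.FUNCTION),
            Construct.SERVICE, Set.of(Construct.RESOURCE),
            Construct.RESOURCE, Set.of(Construct.STATEMENT),
            Construct.FUNCTION, Set.of(Construct.STATEMENT));

    private SyntacticRules() {}

    // True if `child` may be suggested directly inside `parent`.
    static boolean isAllowed(Construct parent, Construct child) {
        return ALLOWED_CHILDREN.getOrDefault(parent, Set.of()).contains(child);
    }
}
```

With this table, a resource is rejected inside a function but accepted inside a service, so an engine built on it would not offer a resource at a cursor inside a function body.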
Syntactic knowledge is tightly bound to the grammar of a language, while semantic knowledge is tightly bound to the language's semantic rule set. In Ballerina, the semantic rules are handled by the compiler's Semantic Analyzer. I will go through these implementations and the knowledge extraction methodologies later in this post.
When you implement an auto-completion solution, you need to consider both of these factors and build a hybrid solution in order to provide the best suggestions to the user.
While the user is typing, the source is almost never complete; instead, you get an incomplete source, and from it you have to guess which type of language construct is being typed. This is where you need to integrate the syntactic knowledge of the language (its grammar specification) with your auto-completion implementation. Once you decide which construct is being typed, you need to provide the appropriate suggestions. As an example, consider the following sample Ballerina function.
function helloWorldFunction(string userName) {
    string customString = "Custom";
    string combinedMsg = <<cursor_position>>
}
When the user presses the key combination for suggestions, the auto-completion engine should suggest both variables (userName and customString). What determines that both variables can be used at the given cursor position? That information is bound to the semantic rule set of the language. Therefore, your auto-completion implementation should be aware of the language's semantic rule set as well.
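The semantic rule at work here, that a position sees the symbols of its own scope and of every enclosing scope but not those of nested scopes, can be sketched as a chain of scopes. This is a minimal sketch built on a hypothetical `Scope` class; the Ballerina compiler maintains its own symbol tables, and none of the names below are its API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lexical scope: the symbols declared in it plus a link to its parent.
final class Scope {
    private final Scope parent;
    private final Map<String, String> symbols = new LinkedHashMap<>(); // name -> type

    Scope(Scope parent) {
        this.parent = parent;
    }

    void define(String name, String type) {
        symbols.put(name, type);
    }

    // Walks from this scope outwards; an inner declaration hides an outer one with the same name.
    List<String> visibleNames() {
        List<String> names = new ArrayList<>();
        for (Scope s = this; s != null; s = s.parent) {
            for (String name : s.symbols.keySet()) {
                if (!names.contains(name)) {
                    names.add(name);
                }
            }
        }
        return names;
    }
}
```

For the function above, a parameter scope holding `userName` with a body scope holding `customString` yields both names at the cursor, while a variable declared in a nested while-loop scope never shows up when looking from the enclosing body.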
Design Diagram for Auto Completion Engine
The core completion engine consists of three main steps, as shown in the high-level design diagram above. Let's briefly look at these three components and how they are used.
- Symbol Extractor
The Symbol Extractor is the starting point, where the completion engine parses the given incomplete Ballerina source and identifies the valid scope of the cursor position. According to the LSP specification, a completion request carries the cursor position as a line number and a column (character) number, both zero-based. Based on the cursor position, the Symbol Extractor identifies the scope in which the cursor is positioned.
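Before the scope can be identified, the line/column pair from the request has to be related to the source text. In LSP, `character` counts UTF-16 code units by default, which happens to match Java's `String` indexing. The sketch below converts such a position into a character offset; the `Positions` helper is hypothetical and assumes `\n` line endings.

```java
// Hypothetical helper: converts a zero-based LSP position (line, character) into an
// index into `text`. Assumes '\n' line endings. LSP counts `character` in UTF-16 code
// units by default, which is also how Java indexes a String, so no re-encoding is needed.
final class Positions {
    private Positions() {}

    static int toOffset(String text, int line, int character) {
        int offset = 0;
        for (int l = 0; l < line; l++) {
            int newline = text.indexOf('\n', offset);
            if (newline < 0) {
                throw new IllegalArgumentException("line out of range: " + line);
            }
            offset = newline + 1;
        }
        int lineEnd = text.indexOf('\n', offset);
        if (lineEnd < 0) {
            lineEnd = text.length();
        }
        // Per the LSP spec, a character value past the line length defaults back to the line length.
        return Math.min(offset + character, lineEnd);
    }
}
```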
As shown in the diagram, the Symbol Extractor uses a custom ANTLR error strategy. When the ANTLR runtime parses an erroneous or incomplete source against the grammar specification, it raises a recognition error. You can write your own custom error strategy, extending the default one, in order to access the token stream, the current token, and so on. When a grammar rule is violated in the source, ANTLR provides information about the violated context (rule), and the Ballerina completion engine uses this information to identify the parser rule context and, hence, the type of statement the user is writing.
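The real engine plugs a custom strategy into the ANTLR runtime. To keep the example self-contained, the sketch below does not use ANTLR; it models the same idea with a hypothetical `ErrorStrategy` interface and a tiny hand-written parser for a single rule (`variableDefinitionStatement`). When the parser meets a token that the current rule does not allow, it hands the innermost rule name to the strategy, roughly what `Parser.getContext()` exposes to a strategy that extends ANTLR's `DefaultErrorStrategy`. All class and rule names here are illustrative assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for ANTLR's ANTLRErrorStrategy: notified when the current rule
// cannot accept the next token.
interface ErrorStrategy {
    void reportError(String ruleName, String offendingToken);
}

// Completion-oriented strategy: instead of only logging, it remembers the rule that was
// active at the first error, which tells the engine what the user is typing.
final class CompletionErrorStrategy implements ErrorStrategy {
    String capturedRule;
    String capturedToken;

    @Override
    public void reportError(String ruleName, String offendingToken) {
        if (capturedRule == null) { // keep only the first error
            capturedRule = ruleName;
            capturedToken = offendingToken;
        }
    }
}

// Toy recursive-descent parser (hypothetical grammar):
//   variableDefinitionStatement : typeName Identifier '=' expression ';'
//   expression                  : Identifier | StringLiteral
final class ToyParser {
    private final List<String> tokens;
    private final ErrorStrategy strategy;
    private final Deque<String> ruleStack = new ArrayDeque<>();
    private int pos = 0;

    ToyParser(List<String> tokens, ErrorStrategy strategy) {
        this.tokens = tokens;
        this.strategy = strategy;
    }

    void variableDefinitionStatement() {
        ruleStack.push("variableDefinitionStatement");
        match(t -> t.equals("string") || t.equals("int"));
        match(ToyParser::isIdentifier);
        match(t -> t.equals("="));
        expression();
        match(t -> t.equals(";"));
        ruleStack.pop();
    }

    void expression() {
        ruleStack.push("expression");
        match(t -> isIdentifier(t) || t.startsWith("\""));
        ruleStack.pop();
    }

    private static boolean isIdentifier(String t) {
        return !t.isEmpty() && Character.isJavaIdentifierStart(t.charAt(0));
    }

    // Consumes the next token if it fits; otherwise reports the innermost active rule
    // and carries on without consuming, as if the expected token had been present.
    private void match(Predicate<String> accepts) {
        String token = pos < tokens.size() ? tokens.get(pos) : "<EOF>";
        if (accepts.test(token)) {
            pos++;
        } else {
            strategy.reportError(ruleStack.peek(), token);
        }
    }
}
```

For the tokens of `string combinedMsg =` the first error is reported inside `expression`, which is exactly the hint the engine needs to offer values rather than, say, type names.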
The completion engine also uses the Ballerina compiler to build the Abstract Syntax Tree (AST) for the source. ANTLR recovers from the erroneous input (through single-token insertion or single-token deletion) and continues building the parse tree, so the Ballerina compiler can still generate the AST.
This AST is then processed by the Ballerina compiler up to the Semantic Analyzer phase, and the resulting tree is walked with a tree visitor in order to extract the visible symbols.
While visiting each and every node, we check whether the cursor position is located within a given node’s scope.
Eg: Given a tree node representing a while statement, check whether the cursor is located between the start and the end of that while node.
Once the enclosing node is identified, extract all the visible symbols for that particular scope. According to the Ballerina grammar, these include defined variables, imported packages, functions, endpoints, etc.
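The scope check and symbol collection described above can be sketched as a small recursive visitor. The `Pos`, `Node` and `VisibleSymbolVisitor` types are hypothetical simplifications, not the Ballerina compiler's AST or the TreeVisitor at [1]: each node only carries its kind, source range, declared symbol names and children, and declaration order within a scope is ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical source position; compared line first, then column.
record Pos(int line, int col) implements Comparable<Pos> {
    @Override
    public int compareTo(Pos other) {
        return line != other.line ? Integer.compare(line, other.line) : Integer.compare(col, other.col);
    }
}

// Hypothetical, simplified AST node: kind, source range, declared symbols and children.
record Node(String kind, Pos start, Pos end, List<String> declaredSymbols, List<Node> children) {
    boolean contains(Pos cursor) {
        return start.compareTo(cursor) <= 0 && cursor.compareTo(end) <= 0;
    }
}

// Walks the tree top-down and collects the symbols of every node that encloses the cursor.
final class VisibleSymbolVisitor {
    private final Pos cursor;
    final List<String> visibleSymbols = new ArrayList<>();
    String cursorScope; // kind of the innermost node enclosing the cursor

    VisibleSymbolVisitor(Pos cursor) {
        this.cursor = cursor;
    }

    void visit(Node node) {
        if (!node.contains(cursor)) {
            return; // cursor is outside this node, so nothing declared in it is visible
        }
        visibleSymbols.addAll(node.declaredSymbols());
        cursorScope = node.kind();
        for (Node child : node.children()) {
            visit(child);
        }
    }
}
```

A cursor inside a function body but outside a nested while node collects the package-level and function-level symbols and reports `function` as the innermost scope; moving the cursor inside the while node adds the loop's variables.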
You can find more about the implementation of the symbol tree visitor at [1] (I recommend following the implementation to get a better grasp of the symbol extractor's behavior :)).
- Symbol Resolvers
Symbol Resolvers are the next step after extracting all the visible symbols for a given scope. Depending on the parser rule context (extracted through the ANTLR error strategy) and the cursor's scope (identified during the tree visitor stage), we choose a Symbol Resolver. The chosen Symbol Resolver filters the extracted visible symbols and creates a list of completion items from them. These completion items contain snippets, various signatures derived from the symbols, etc.
- Completion Item Sorters
Populating a list of completion items is not enough when we present them to the user. One of the main advantages of an editor's auto-completion and suggestion feature is that it suggests items in a meaningful order. That order should reflect how relevant each suggested item is to the user, as well as which priorities those items should get depending on the context and usage.
To sort and prioritize the completion items, the Ballerina completion engine includes another phase, item sorting, which orders the completion items by analyzing the parser rule context as well as the cursor scope.
Eg: int total = …
In the incomplete segment above, if the completion engine suggests functions, priority should be given to the functions that return an integer value.
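The resolver and sorter steps can be sketched together for the expression context of `int total = …`. This is a minimal sketch under stated assumptions: the `Symbol`, `Kind`, `CompletionItem` and `Completion` types are hypothetical, the resolver simply keeps variables and functions (dropping packages) as if the parser rule context were an expression, and the sorter puts items whose type matches the expected type first and breaks ties alphabetically.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical symbol model; for a function, `type` is its return type.
enum Kind { VARIABLE, FUNCTION, PACKAGE }

record Symbol(String name, Kind kind, String type) {}

record CompletionItem(String label, Kind kind, String type) {}

final class Completion {
    private Completion() {}

    // Resolver step (assumed expression context): keep only variables and functions.
    static List<CompletionItem> resolveForExpression(List<Symbol> visible) {
        List<CompletionItem> items = new ArrayList<>();
        for (Symbol s : visible) {
            if (s.kind() == Kind.VARIABLE || s.kind() == Kind.FUNCTION) {
                items.add(new CompletionItem(s.name(), s.kind(), s.type()));
            }
        }
        return items;
    }

    // Sorter step: items matching the expected type come first (false sorts before true),
    // then ties are broken alphabetically by label.
    static List<CompletionItem> sortByExpectedType(List<CompletionItem> items, String expectedType) {
        List<CompletionItem> sorted = new ArrayList<>(items);
        sorted.sort(Comparator
                .comparing((CompletionItem i) -> !expectedType.equals(i.type()))
                .thenComparing(CompletionItem::label));
        return sorted;
    }
}
```

Given visible variables and functions of type `int` and `string`, the `int` ones come first in this sketch; a real sorter can weigh further signals, such as the cursor scope.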
In this section, I described the layout of the Ballerina completion engine from a design point of view. You can find more detailed information about the implementation in the Ballerina language server sources at [2]; the language-server module there contains the actual implementation.
Summary
In this article, we had a brief look at the design of the Ballerina Language Server's completion engine. This completion engine uses both semantic and syntactic information about the source, and it considers contextual information when presenting completions and suggestions, in order to improve the user experience.
Our completion engine implementation is based on the Ballerina compiler as well as the Ballerina grammar (an ANTLR grammar). I hope this design can be adapted and extended for other languages, combining their grammar with their compiler to extract syntactic and semantic information.
Since improving the user experience is a major requirement of a completion engine, the implementation should always be aware of contextual information. Depending on the context, the following actions can improve the user experience:
- Appropriate snippet suggestions
- Sorting and prioritizing the completion items
- Binding post-completion actions, such as on-demand refactoring
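As an illustration of the first item, and of the signature-derived snippets mentioned for the Symbol Resolvers, a function symbol can be turned into an LSP snippet in which each parameter becomes a numbered tab stop with its name as the placeholder, and `$0` marks the final cursor position. The `FunctionSymbol`, `SnippetItem` and `Snippets` types are hypothetical; only the snippet syntax, its escaping rule, and the `insertTextFormat` value 2 (Snippet) come from the LSP specification.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical model of a function symbol: its name and parameter names.
record FunctionSymbol(String name, List<String> paramNames) {}

// Subset of the LSP CompletionItem fields relevant here.
record SnippetItem(String label, String insertText, int insertTextFormat) {}

final class Snippets {
    static final int INSERT_TEXT_FORMAT_SNIPPET = 2; // LSP InsertTextFormat.Snippet

    private Snippets() {}

    // Builds e.g. "f(${1:a}, ${2:b})$0": each parameter is a numbered tab stop with its
    // name as placeholder, and $0 is where the cursor ends up after the last tab stop.
    static SnippetItem forFunction(FunctionSymbol fn) {
        StringJoiner call = new StringJoiner(", ", fn.name() + "(", ")");
        for (int i = 0; i < fn.paramNames().size(); i++) {
            call.add("${" + (i + 1) + ":" + escape(fn.paramNames().get(i)) + "}");
        }
        return new SnippetItem(fn.name(), call + "$0", INSERT_TEXT_FORMAT_SNIPPET);
    }

    // In snippet text, '$', '}' and '\' must be escaped with a backslash.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("$", "\\$").replace("}", "\\}");
    }
}
```

For the `helloWorldFunction` defined earlier, this yields `helloWorldFunction(${1:userName})$0`.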
You can try the actual implementation of Ballerina's Language Server through the VS Code plugin for Ballerina.
References
[1] https://github.com/ballerina-platform/ballerina-lang/blob/master/language-server/modules/langserver-core/src/main/java/org/ballerinalang/langserver/completions/TreeVisitor.java
[2] https://github.com/ballerina-platform/ballerina-lang/tree/master/language-server
[3] https://ballerina.io/
[4] https://marketplace.visualstudio.com/items?itemName=ballerina.ballerina