VS Code Language Server Extension for COBOL Preprocessors

Published in Modern Mainframe, Jan 10, 2024

Why do we need preprocessors?

COBOL is one of the most widely adopted languages for writing business applications that require heavy data manipulation and processing. Additionally, COBOL is known for its readability and efficiency.

COBOL handles runtime and database interaction with nonstandard COBOL syntax like EXEC CICS or EXEC SQL. COBOL compilers use preprocessors to translate these statements into COBOL syntax before compiling the program. Many organizations and vendors have extended the COBOL language with their own custom statements, along with preprocessors that translate those statements into COBOL before compilation. Handling these custom extensions as part of the build process is straightforward. The challenge is handling them in a modern developer environment like VS Code, where users expect editor capabilities like syntax highlighting and syntax validation to apply to custom extensions as well as standard COBOL. This is where LSP support for COBOL preprocessors comes into play.

To enable support for various preprocessors of COBOL, we came up with a framework that would enable vendors or customers to plug in their custom preprocessor syntax so that it can be handled by the COBOL LSP editor. This support can be enabled using an add-on to the COBOL Language Support extension for Visual Studio Code (commonly referred to as COBOL LS). With this framework, any organization or vendor can develop their own VS Code extension that can register with the COBOL LS extension and provide language server features like syntax highlighting, code completion, and code analysis for their preprocessor right out of the box.

We refer to these custom language extensions/preprocessors as dialects.

Now let’s walk through the steps to create a custom VS Code extension.

Prerequisites

Refer to the wiki page on the COBOL LS repository, which offers instructions on “How to Create Your Own Dialect” and specifies the requirements that dialect implementers should adhere to.
Dialect support requires the Java runtime environment.
A basic understanding of Java and JavaScript is necessary for creating a dialect extension.
Familiarity with a parser generator technology is nice to have. In this blog, we’ll be showcasing an implementation using ANTLR (ANother Tool for Language Recognition). This choice is not restrictive; the implementer is free to use any parser generator, or even regular expressions, as long as the outlined requirements are met.

Check out a colleague’s blog for ANTLR enthusiasts.

Hands-on

Let’s create a VS Code extension that provides language server support for the following hypothetical statement:

Shift Statement: `EXAMPLE SHIFT identifier1 TO identifier2`

Take a look at the example code below for a better understanding of its usage.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. AS.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 STRUCT-FROM.
           03 VARNAME  PIC X(20).
           88 FLAG1-ON              VALUE 'Y'.

       01 STRUCT-TO.
           03 VARNAME  PIC X(20).
           88 FLAG1-ON              VALUE 'Y'.

       PROCEDURE DIVISION.

      *    Shift statement: EXAMPLE SHIFT <identifier> TO <identifier>.
       EXAMPLE SHIFT STRUCT-FROM TO STRUCT-TO.

Any VS Code language extension consists of two parts:

Language server: A language analysis tool that normally runs in a separate process. The language server parses the source code, builds an abstract syntax tree (AST) from it, and provides language features based on that AST.
Language client: A normal VS Code extension written in JavaScript or TypeScript. This extension has access to the whole VS Code namespace API.

Phase 1- Create the language server jar

To create the language server jar:

Create a new Maven project to support your dialect, naming it, for instance, “dialect-abc”. Use the antlr4-maven-plugin to generate the parser from the grammar. For reference, consult the pom.xml file.
Include a dependency on the “common” module, which contains the contract exposed by COBOL LS.

The next step is to create an ANTLR grammar for a custom dialect. The grammar for this sample can be found here.

Once the grammar is created, the next step is to create an ANTLR parser. The excerpt below shows the parser grammar’s start rule and the rule for the SHIFT statement.

parser grammar ExampleParser;
options {tokenVocab = ExampleLexer;}
startRule: .*? myRules* EOF;
myRules: (shiftStatement | injectStatement | dataDescriptionEntry100 | bitwiseShiftstatement | untieStatement | unsetStatement | rpcParseStatement) .*?;
shiftStatement: EXAMPLE SHIFT qualifiedVariableDataName TO qualifiedVariableDataName DOT_FS?;

Once the parser is created, the next step is to create a Java class that implements the CobolDialect interface. This interface exposes the full contract that a dialect implementation must fulfil.

Override the method CobolDialect#processText to process the dialect source and generate dialect-specific nodes and errors (if any). The essence of dialect processing lies in replacing dialect-specific code with either blank spaces or “FILLERS” while returning nodes and errors. This substitution ensures that the COBOL engine receives only the transformed source code, with no remnants of the dialect source code.

The example uses the ANTLR visitor approach to parse and process the text, as shown below. The complete implementation of the visitor can be found on GitHub.
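To make the substitution idea concrete, here is a minimal, self-contained sketch in plain Java. It is not the COBOL LS implementation (the sample relies on VisitorUtility.addReplacementContext for that); the class DialectMaskingSketch and its mask method are hypothetical names used only for illustration. The assumption behind it is that masking should keep every character offset and line break intact, so that positions reported for the remaining COBOL code still line up with the original source.

```java
public class DialectMaskingSketch {

    /**
     * Hypothetical helper: returns a copy of the source in which every character
     * in the half-open range [start, end) is replaced by a space.
     * Line breaks are left untouched so line and column positions stay stable.
     */
    public static String mask(String source, int start, int end) {
        char[] chars = source.toCharArray();
        for (int i = start; i < end; i++) {
            if (chars[i] != '\n' && chars[i] != '\r') {
                chars[i] = ' ';
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String line = "       EXAMPLE SHIFT STRUCT-FROM TO STRUCT-TO.";
        int start = line.indexOf("EXAMPLE");

        // Blank out the whole dialect statement, as the visitor does for
        // the keywords, the variable names, and the trailing period.
        String masked = mask(line, start, line.length());

        System.out.println("[" + masked + "]");
        System.out.println(masked.length() == line.length()); // prints "true"
    }
}
```

The masked line has the same length as the original, which is why blank filler is preferable to simply deleting the dialect text.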

  @Override
  public ResultWithErrors<DialectOutcome> processText(DialectProcessingContext context) {
    ExampleVisitor visitor = new ExampleVisitor(context);
    List<SyntaxError> errors = new ArrayList<>();

    // parse the document text to get parseTree
    ExampleParser.StartRuleContext startRuleContext =
        parseMyRule(
            context.getExtendedSource().getText(), context.getExtendedSource().getUri(), errors);

    // Traverse the parse tree to generate dialect specific nodes
    List<Node> nodes = new ArrayList<>(visitor.visitStartRule(startRuleContext));

    // Add nodes returned by extend method. CopNode in this scenario.
    nodes.addAll(context.getDialectNodes());

    // Add error encountered while visiting the parser. To be reported to COBOL LS engine.
    errors.addAll(visitor.getErrors());
    return new ResultWithErrors<>(new DialectOutcome(nodes, context), errors);
  }

  private ExampleParser.StartRuleContext parseMyRule(
      String text, String programDocumentUri, List<SyntaxError> errors) {
    ExampleLexer lexer = new ExampleLexer(CharStreams.fromString(text));
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    ExampleParser parser = new ExampleParser(tokens);
    DialectParserListener listener = new DialectParserListener(programDocumentUri);
    lexer.removeErrorListeners();
    lexer.addErrorListener(listener);
    parser.removeErrorListeners();
    parser.addErrorListener(listener);
    parser.setErrorHandler(new CobolErrorStrategy(messageService));

    ExampleParser.StartRuleContext result = parser.startRule();
    errors.addAll(listener.getErrors());
    return result;
  }

Our hypothetical “SHIFT” statement also contains variables. In this case, we should allow variable navigation or the hover feature for these variables. To accomplish this, we need to create a QualifiedReferenceNode and VariableUsageNode at these variables’ positions. These predefined nodes are specifically designed to facilitate COBOL variable usage, providing the ability to hover over and peek into variables seamlessly. A comprehensive set of these out-of-the-box node definitions can be found here.

NOTE: A dialect can also define its own custom nodes. That is not needed for this statement, but it is a common use case. After an abstract syntax tree (AST) has been generated with these custom nodes for the provided source code, a dialect can perform additional processing on top of it by implementing “CobolDialect#getProcessors”. An example of this scenario can be found here.

  @Override
  public List<Node> visitShiftStatement(ExampleParser.ShiftStatementContext ctx) {
    // Replace the dialect-specific keywords with filler (blank space),
    // so that the COBOL LS engine doesn't see them.
    VisitorUtility.addReplacementContext(ctx.SHIFT(), context);
    VisitorUtility.addReplacementContext(ctx.TO(), context);
    VisitorUtility.addReplacementContext(ctx.EXAMPLE(), context);
    ofNullable(ctx.DOT_FS()).ifPresent(ctx1 -> VisitorUtility.addReplacementContext(ctx1, context));
    return visitChildren(ctx);
  }

  @Override
  public List<Node> visitQualifiedVariableDataName(
      ExampleParser.QualifiedVariableDataNameContext ctx) {
    // Blank out the variable reference and record a QualifiedReferenceNode at its position.
    VisitorUtility.addReplacementContext(ctx, context);
    return addTreeNode(ctx, QualifiedReferenceNode::new);
  }

  @Override
  public List<Node> visitVariable(ExampleParser.VariableContext ctx) {
    return addTreeNode(ctx, locality -> new VariableUsageNode(getName(ctx), locality));
  }

  @Override
  protected List<Node> defaultResult() {
    return new ArrayList<>();
  }

  @Override
  protected List<Node> aggregateResult(List<Node> aggregate, List<Node> nextResult) {
    // Keep the nodes from all children, not only the last one visited.
    return Stream.concat(aggregate.stream(), nextResult.stream()).collect(toList());
  }
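Why override defaultResult and aggregateResult? Left as is, ANTLR’s AbstractParseTreeVisitor returns null from defaultResult and keeps only the most recent child’s result in aggregateResult, so nodes produced by earlier children would be dropped. The sketch below is a toy illustration in plain Java with strings standing in for Node objects; AggregateResultSketch, concat, and lastWins are hypothetical names and not part of ANTLR or COBOL LS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class AggregateResultSketch {

    // Same idea as the visitor's aggregateResult override:
    // everything collected so far plus the next child's result.
    public static List<String> concat(List<String> aggregate, List<String> nextResult) {
        return Stream.concat(aggregate.stream(), nextResult.stream()).collect(Collectors.toList());
    }

    // Stand-in for the "most recent child wins" behaviour.
    public static List<String> lastWins(List<String> aggregate, List<String> nextResult) {
        return nextResult;
    }

    public static void main(String[] args) {
        // Pretend results of visiting the children of a SHIFT statement:
        // two variable references, then a terminal that yields nothing.
        List<List<String>> childResults = Arrays.asList(
            Arrays.asList("STRUCT-FROM"),
            Arrays.asList("STRUCT-TO"),
            Collections.<String>emptyList());

        List<String> all = new ArrayList<>();
        List<String> last = new ArrayList<>();
        for (List<String> child : childResults) {
            all = concat(all, child);
            last = lastWins(last, child);
        }

        System.out.println(all);  // prints "[STRUCT-FROM, STRUCT-TO]"
        System.out.println(last); // prints "[]"
    }
}
```

With concatenation, both variable nodes survive the walk; with the last-wins behaviour, the empty result of the final child would replace them.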

Once the class implementation is done, use Maven to create the dialect support JAR.

NOTE: A collection of hypothetical statements that begin with the keyword “EXAMPLE” is available on GitHub. Each statement covers a different scenario, so the repository is a good place to explore these use cases further.

If you’re interested in creating test cases, refer to the wiki.

Phase 2- Create a VS Code language client

Step 1: Create an empty VS Code extension

To create an empty VS Code extension, follow the steps in the VS Code documentation here.

Step 2: Add a dependency on the COBOL LS extension. This ensures that COBOL LS is installed automatically, since a dialect extension should only be activated once the COBOL LS extension has been installed and activated.

To add the dependency on the COBOL LS extension, follow these steps:

Open the package.json file in the extension folder.
Add a dependency to the COBOL LS extension by adding the following entry.

"extensionDependencies": ["BroadcomMFD.cobol-language-support"]

Step 3: Register the metadata of your dialect implementation.

Add the @code4z/cobol-dialect-api dependency to package.json, along with a corresponding .npmrc configuration file.

"dependencies": {
    "@code4z/cobol-dialect-api": "^1.0.0"
  }

Follow the steps below to register the extension with the COBOL LS extension:

1. Import the “getV1Api” function from @code4z/cobol-dialect-api.

2. Use the “getV1Api” function to get an instance of v1Api. Use the “registerDialect” method of v1Api to register the extension.

3. To bundle the dialect support JAR, put the previously created dialect support JAR into a “server” folder and use its URI when registering the dialect extension.

4. Similarly, provide the URI for the dialect snippets during the registration process.

You can find a code sample here.

Step 4: Add coloring support for dialect keywords with a VS Code injection grammar. The details of this process are explained in the VS Code documentation. In essence, the extension contributes a syntax grammar that is injected into an existing scope. Because the coloring grammar of the COBOL LS extension uses the “source.cobol” scope, the injectTo value should be set to “source.cobol”.

"contributes": {
    "grammars": [
      {
        "injectTo": ["source.cobol"],
        "scopeName": "example-cobol.injection",
        "path": "./syntaxes/example.injection.json"
      }
    ],
    "commands": []
  },

For reference, here is an example injection grammar with the scope name “example-cobol.injection”. The injectionSelector is a scope selector that specifies the scopes in which the injected grammar should be applied. In our case, it will always be “source.cobol”. In this example, we aim to highlight additional words specific to the dialect.

{
    "scopeName": "example-cobol.injection",
    "injectionSelector": "L:source.cobol",
    "patterns": [
      {
        "include": "#example-keywords"
      }
    ],
    "repository": {
      "example-keywords": {
        "begin": "^.{7}",
        "end": ".{72}$",
        "match": "(?i:BITWISE|CALL|DETAILS|INJECT|LEFT|MESSAGE|MSG_TYPE|MSG|EXAMPLE|PARSE|REPLY|RIGHT|RPC|RPC-MSG|SHIFT|UNSET|UNTIE|XID)",
        "name": "keyword.myrule"
      }
    }
  }

The usage of “L:” in the injection selector indicates that the injection is added to the left of existing grammar rules. This implies that the rules of our injected grammar will be applied before any existing grammar rules.
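One detail of the match pattern is worth checking when you adapt it: alternatives are tried from left to right, and the pattern lists RPC before RPC-MSG. The sketch below uses java.util.regex as a stand-in to show the effect. VS Code uses a different regex engine for TextMate grammars, so treat this as an illustration under the assumption that ordered alternation behaves the same way there. KeywordAlternationSketch and firstMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordAlternationSketch {

    // Returns the first substring of input matched by regex, or null if there is none.
    public static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // Shorter alternative first: the match stops after "RPC".
        System.out.println(firstMatch("(?i:RPC|RPC-MSG)", "RPC-MSG")); // prints "RPC"

        // Longer alternative first: the whole keyword is matched.
        System.out.println(firstMatch("(?i:RPC-MSG|RPC)", "RPC-MSG")); // prints "RPC-MSG"
    }
}
```

If the same holds in the editor, listing longer keywords first (RPC-MSG before RPC, as is already done for MSG_TYPE and MSG) would let the full keyword be scoped as a single token.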

Step 5: Bundle your dialect JAR into an extension, package the extension, and deliver it to your organization.

To package the extension together with the bundled dialect JAR, you can use the VS Code vsce tool, which can be installed using the npm install -g @vscode/vsce command (the older vsce package name is deprecated).

Then, you can use the vsce package command to package your extension into a .vsix file. This file can then be delivered to your organization or uploaded to the VS Code marketplace for public consumption.

Voilà, we are all set to test our first COBOL dialect extension.

Let’s try some sample code. Remember to add the entry shown below to settings.json to enable the dialect for that workspace. The dialect name must be the same as the one used to register the dialect extension.

NOTE: Processor group configuration can be used as well.

"cobol-lsp.dialects": ["example"],

Summary

The code is parsed successfully with hover support for our custom COBOL syntax.

Overall, creating a dialect extension for COBOL LS involves defining the grammar for the dialect, implementing the main engine class that handles the dialect, and adhering to the contract defined in the CobolDialect interface. This enables COBOL LS to provide support for the dialect, including syntax highlighting, code completion, and code analysis. With a little effort, we can create custom dialects for our specific needs and make our development experience much smoother.

The COBOL Language Server extension is part of the Code4z extension pack. For more information about Code4z, visit code4z.broadcom.com.