Automatic semicolon insertion in Go

A formal grammar specifies what constitutes a syntactically valid program in Go (or any other programming language):

Block = "{" StatementList "}" .
StatementList = { Statement ";" } .

The definitions above are taken from the Go specification and are written in Extended Backus-Naur Form (EBNF). They say that a block of code is a sequence of zero or more statements, each terminated by a semicolon. A function call is an example of a statement. Knowing that, we can create a simple block:

{
	fmt.Println(1);
	fmt.Println(2);
}

Seasoned Gophers will notice the semicolons at the end of each line, which idiomatic Go code doesn't use. The block can be simplified to:

{
	fmt.Println(1)
	fmt.Println(2)
}

This code behaves exactly like the first version. But what makes that possible, given that the grammar requires semicolons?
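Before answering, the claim itself can be checked with the standard go/parser package. The sketch below wraps each version of the block in a function and counts the statements in its body. The helper countStmts and the wrapping function main are assumptions made for this example; the parser does not type-check, so the missing fmt import does not matter here.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// countStmts parses src (a file whose first declaration is a function)
// and returns the number of statements in that function's body.
// countStmts is a hypothetical helper written for this example.
func countStmts(src string) (int, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return 0, err
	}
	// The sources below have no imports, so Decls[0] is the function.
	fn := f.Decls[0].(*ast.FuncDecl)
	return len(fn.Body.List), nil
}

func main() {
	withSemis := "package main\nfunc main() {\nfmt.Println(1);\nfmt.Println(2);\n}\n"
	withoutSemis := "package main\nfunc main() {\nfmt.Println(1)\nfmt.Println(2)\n}\n"
	for _, src := range []string{withSemis, withoutSemis} {
		n, err := countStmts(src)
		fmt.Println(n, err) // prints "2 <nil>" for both versions
	}
}
```

Both versions parse without errors into the same two statements.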

The roots

Why would language designers bother getting rid of tokens like semicolons at all? The answer is simple: readability. The fewer artefacts code has, the easier it is to work with. This matters because a piece of code, once written, will probably be read many times by different people.

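The grammar uses semicolons as production terminators. Since the goal is to free programmers from typing them, there must be a way to insert them automatically. This is what Go's lexer does. A semicolon is added after a line's final token if that token is:

1. an identifier,
2. an integer, floating-point, imaginary, rune, or string literal,
3. one of the keywords break, continue, fallthrough, or return,
4. one of the operators and punctuation ++, --, ), ], or }.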
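These rules can be observed with the standard library's go/scanner package, the same scanner used later in this article. The sketch below scans a single line and reports whether a semicolon was inserted at its end, relying on the fact that an automatically inserted semicolon is reported with the literal "\n". The helper endsWithSemicolon is hypothetical, written only for this illustration.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// endsWithSemicolon scans one line of Go source and reports whether the
// scanner inserted a semicolon (reported with literal "\n") at its end.
// endsWithSemicolon is a hypothetical helper for this example.
func endsWithSemicolon(line string) bool {
	src := []byte(line + "\n")
	fset := token.NewFileSet()
	file := fset.AddFile("", fset.Base(), len(src))
	var s scanner.Scanner
	s.Init(file, src, nil, 0) // nil error handler: errors are only counted
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			return false
		}
		// Semicolons typed by the programmer have the literal ";".
		if tok == token.SEMICOLON && lit == "\n" {
			return true
		}
	}
}

func main() {
	lines := []string{
		"x",      // identifier
		"42",     // basic literal
		"return", // keyword
		"f()",    // closing parenthesis
		"}",      // closing brace
		"x +",    // ends with an operator: no semicolon
		"if x {", // ends with an opening brace: no semicolon
	}
	for _, line := range lines {
		fmt.Printf("%q inserts semicolon: %v\n", line, endsWithSemicolon(line))
	}
}
```

A line ending in an operator such as + is left open, which is why a long expression can be continued on the next line as long as the operator stays at the end of the first one.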

Let's look at an example:

func g() int {
	return 1
}

func f() func(int) {
	return func(n int) {
		fmt.Println("Inner func called")
	}
}

With these definitions, we can analyse two scenarios:

f()
(g())

and:

f()(g())

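The first snippet prints nothing, but the second one prints "Inner func called". This is because of the 4th rule above: a semicolon is added after each of the two lines, since their last token is a closing parenthesis. The first snippet is therefore two separate statements, and the function returned by f() is never called: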

f();
(g());
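Both scenarios fit into one runnable program. This is a sketch based on the definitions above; the innerCalls counter is an addition for illustration only, so that the number of calls can be checked, and is not part of the original example.

```go
package main

import "fmt"

// innerCalls counts calls of the closure returned by f. It is added only
// to make the behaviour checkable; the original example just prints.
var innerCalls int

func g() int {
	return 1
}

func f() func(int) {
	return func(n int) {
		innerCalls++
		fmt.Println("Inner func called")
	}
}

func main() {
	// Two statements: a semicolon is inserted after each line, so the
	// closure returned by f() is discarded and never called.
	f()
	(g())

	// One statement: the closure returned by f() is called with g().
	f()(g())

	fmt.Println("calls:", innerCalls) // prints "calls: 1"
}
```

Only the second form calls the inner function, so "Inner func called" is printed exactly once.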

Under the hood

In Go, semicolons are added during lexical analysis (scanning). This is the very first stage of processing a .go file, in which characters are transformed into tokens such as identifiers, numbers, and keywords. The scanner is itself written in Go and available in the standard library as the go/scanner package, so we can easily use it:

package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

func main() {
	// Named s to avoid shadowing the scanner package.
	var s scanner.Scanner
	source := []byte("n := 1\nfmt.Println(n)")
	// Called for every lexical error the scanner encounters.
	errorHandler := func(_ token.Position, msg string) {
		fmt.Printf("error handler called: %s\n", msg)
	}
	// The scanner records positions in a token.File.
	fset := token.NewFileSet()
	file := fset.AddFile("", fset.Base(), len(source))
	// Mode 0: comments are skipped, semicolons are still inserted.
	s.Init(file, source, errorHandler, 0)
	for {
		position, tok, literal := s.Scan()
		fmt.Printf("%d: %s", position, tok)
		if literal != "" {
			fmt.Printf(" %q", literal)
		}
		fmt.Println()
		if tok == token.EOF {
			break
		}
	}
}

Output:

1: IDENT "n"
3: :=
6: INT "1"
7: ; "\n"
8: IDENT "fmt"
11: .
12: IDENT "Println"
19: (
20: IDENT "n"
21: )
22: ; "\n"
22: EOF

The lines printing ; "\n" mark the places where the scanner (lexer) inserted semicolons into the program:

n := 1
fmt.Println(n)
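The same token stream can be used to rewrite the source with the inserted semicolons made explicit, which is roughly what the parser works with. In this sketch the hypothetical helper withExplicitSemicolons copies the original text unchanged and places a ";" at the offset of every semicolon reported with the literal "\n", leaving semicolons typed by the programmer alone. It uses token.File.Offset to turn a position back into a byte offset.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
	"strings"
)

// withExplicitSemicolons returns src with every automatically inserted
// semicolon written out as ";". It is a hypothetical helper for this example.
func withExplicitSemicolons(src string) string {
	data := []byte(src)
	fset := token.NewFileSet()
	file := fset.AddFile("", fset.Base(), len(data))
	var s scanner.Scanner
	s.Init(file, data, nil, 0)

	var b strings.Builder
	last := 0 // offset in src up to which text has been copied
	for {
		pos, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		// Inserted semicolons carry the literal "\n"; typed ones carry ";".
		if tok == token.SEMICOLON && lit == "\n" {
			off := file.Offset(pos) // offset of the newline (or of EOF)
			b.WriteString(src[last:off])
			b.WriteString(";")
			last = off
		}
	}
	b.WriteString(src[last:])
	return b.String()
}

func main() {
	fmt.Println(withExplicitSemicolons("n := 1\nfmt.Println(n)"))
	// prints:
	// n := 1;
	// fmt.Println(n);
}
```

Applied to the earlier snippet f() followed by (g()) on the next line, the helper would produce f(); and (g()); on separate lines, matching the explanation given for that example.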