Lexical and Syntax Grammars of JavaScript

John Klimov
Mar 4, 2018

I spent some time reading the ECMAScript standard and noticed that it never defines some of the terms it relies on, such as “lexical grammar”, “statement”, and “expression”. I googled these terms but did not find complete, correct definitions.

Let’s try to write them.

Grammar — a set of language rules.

image source: quex.sourceforge.net

Lexical Grammar — a set of rules for splitting source code into lexical units: Tokens, Separators (WhiteSpace and LineTerminator) and Comments (single-line and multi-line).

Lexeme — a sequence of characters in source code matching a pattern for a token (a word of code). Examples: “42”, “if”, “foo”.

Token — a pair of a token name and a lexeme as its value. Examples: “literal: 42”, “keyword: if”, “identifier: foo”, “punctuator: !”.
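To make the lexeme/token distinction concrete, here is a minimal sketch of a tokenizer. It is a toy: the rule table, the `KEYWORDS` set and the `tokenize` function are made up for this illustration and cover only a tiny fraction of the real lexical grammar.

```javascript
// Toy tokenizer: an illustration only, not the lexical grammar from the Standard.
const KEYWORDS = new Set(['if', 'else', 'var', 'return']); // deliberately incomplete

// Each rule pairs a token name with a sticky pattern that matches its lexeme.
const RULES = [
  ['whitespace', /\s+/y],
  ['literal', /\d+/y],
  ['identifierName', /[A-Za-z_$][A-Za-z0-9_$]*/y],
  ['punctuator', /[!=(){};]/y],
];

function tokenize(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    let matched = false;
    for (const [name, pattern] of RULES) {
      pattern.lastIndex = pos; // sticky patterns match only at this position
      const match = pattern.exec(source);
      if (match === null) continue;
      const lexeme = match[0];
      pos += lexeme.length;
      matched = true;
      if (name === 'whitespace') break; // separators produce no token
      const type = name === 'identifierName'
        ? (KEYWORDS.has(lexeme) ? 'keyword' : 'identifier')
        : name;
      tokens.push({ type, value: lexeme }); // token = name + lexeme
      break;
    }
    if (!matched) throw new SyntaxError(`Unexpected character at ${pos}`);
  }
  return tokens;
}

console.log(tokenize('if (!foo) 42;'));
```

Each element of the result is a token in the sense above, for example `{ type: 'keyword', value: 'if' }`, where `value` holds the lexeme.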

There are four kinds of tokens in JavaScript: reserved word, identifier, literal, and punctuator. The Standard groups reserved words and identifiers under one term, “IdentifierName”. It then says that any IdentifierName may be used as an object property name, but only a valid Identifier (an IdentifierName that is not a reserved word) may be used as a variable name.
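The property-name versus variable-name rule is easy to check. The sketch below uses a hypothetical helper, `parses`, which relies on the fact that `new Function` compiles its source text without running it and throws a SyntaxError if the text is invalid.

```javascript
// A reserved word is still an IdentifierName, so it works as a property name.
const obj = { if: 1, class: 2 };
console.log(obj.if + obj.class); // 3

// Hypothetical helper: reports whether the source text parses as a function body.
function parses(source) {
  try {
    new Function(source); // compiled, never called
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

console.log(parses('var foo = 1;')); // true
console.log(parses('var if = 1;'));  // false: `if` is not a valid Identifier
```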

Syntax Grammar — a set of rules for building a syntax tree that shows the logical structure of the code (a syntax tree example).
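As a rough picture of such a tree, here is a hand-written one for the source text `x = 5;`. The node names follow the ESTree convention used by many JavaScript parsers, which is an assumption of this sketch and not something the Standard prescribes. The `outline` helper is hypothetical and only prints the nesting.

```javascript
// Hand-written tree for the source text `x = 5;` (ESTree-style node names).
const tree = {
  type: 'Program',
  body: [{
    type: 'ExpressionStatement',
    expression: {
      type: 'AssignmentExpression',
      operator: '=',
      left: { type: 'Identifier', name: 'x' },
      right: { type: 'Literal', value: 5 },
    },
  }],
};

// Collect node types depth-first, indented by nesting level.
function outline(node, depth = 0, lines = []) {
  lines.push('  '.repeat(depth) + node.type);
  for (const child of Object.values(node)) {
    const children = Array.isArray(child) ? child : [child];
    for (const item of children) {
      // Anything that is an object with a string `type` is treated as a node.
      if (item !== null && typeof item === 'object' && typeof item.type === 'string') {
        outline(item, depth + 1, lines);
      }
    }
  }
  return lines;
}

console.log(outline(tree).join('\n'));
```

The printed outline nests Identifier and Literal under AssignmentExpression, which in turn sits under ExpressionStatement: the tokens are flat, the tree is not.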

Statement — a syntax unit which controls program flow.

I also have another definition for this term: a command to a JavaScript engine describing how to process the next portion of code.

Declaration — a syntax unit which creates a reference.

Expression — a syntax unit which produces a value and can be written wherever a value is expected.
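A small check of the "wherever a value is expected" part, as a sketch: the conditional operator is an expression and can sit on the right-hand side of an initializer, while an `if` statement cannot. The helper name `canParse` is made up for this example.

```javascript
// Expressions can appear anywhere a value is expected.
const label = 1 > 0 ? 'yes' : 'no';
const values = [1 + 2, label.length, Math.max(1, 2)];
console.log(label, values);

// Hypothetical helper: true if the text parses as a function body.
function canParse(source) {
  try {
    new Function(source); // compiled, never called
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// A conditional expression is fine as an initializer...
console.log(canParse('var v = 1 > 0 ? "yes" : "no";'));
// ...but an if statement is not a value, so this is a SyntaxError.
console.log(canParse('var v = if (1 > 0) "yes"; else "no";'));
```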

I’ve found a couple of great articles describing the difference between statements and expressions: one, two. But I want to add another explanation.

Statement as a command with parameters.

There are two statements that take no parameters: EmptyStatement and DebuggerStatement.

;;; debugger;

BlockStatement receives any number of parameters, each of which is itself a statement (or a declaration).

{{debugger;}{{;;}{;;;;}}}

VariableStatement receives a list of VariableDeclarations, each of which pairs an Identifier with an optional initializer Expression.

var x, y = 5;

As an engine, I would see that code like this (x has no initializer, so its init is empty):

VariableStatement([{id: 'x', init: null}, {id: 'y', init: '5'}])
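One detail worth checking here: a declaration without an initializer is not quite the same as a declaration initialized with `undefined`. A quick sketch, with made-up variable names:

```javascript
var counter = 1;
var counter; // redeclaration without an initializer: nothing is assigned
console.log(counter); // 1

var reset = 1;
var reset = undefined; // an explicit initializer is an assignment
console.log(reset); // undefined
```

So a missing initializer means "no assignment at all", which only looks like `undefined` when the variable had no earlier value.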

So, an Expression is just a parameter for some statements. And the next statement is a great illustration of this idea.

ExpressionStatement takes a single Expression as a parameter. Some examples are listed below:

'use strict';
3;
x = 5;
delete x;
fn();
new A;
A `hello`;

ExpressionStatement is just a wrapper for a single Expression and appears where a statement is expected. This statement has no keyword, which may be confusing. It looks exactly the same as an Expression, and only the semicolon hints at its nature.
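Because the two look alike, the Standard does not let an ExpressionStatement start with certain tokens, such as `{` or `function`: in statement position those start a block or a declaration. The sketch below uses `eval`, which parses its argument as a sequence of statements and returns the completion value, to show the same characters read in two ways. The variable names are made up.

```javascript
// In statement position `{` opens a Block; inside it, `a: 1` is a labelled
// ExpressionStatement, so the completion value is 1.
const asStatement = eval('{ a: 1 }');
console.log(asStatement); // 1

// Parentheses cannot start a Block, so the braces are parsed as an object
// literal wrapped in an ExpressionStatement.
const asExpression = eval('({ a: 1 })');
console.log(asExpression.a); // 1, read from the object { a: 1 }
```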

All other statements are easy to recognize by their keywords: if, for, while, etc. I’m planning to draw a chart visualizing the hierarchy of statements and expressions. Stay tuned :)
