JavaScript Basics: Lexical Grammar, Expressions/Operators and Statements

How does the computer understand JavaScript?

11 min read · Apr 23, 2017

It is important to understand the semantic differences between a JavaScript statement and a JavaScript expression. Why? Because JavaScript is filled with quirks that are difficult to reason about unless you understand how the machine will read your code. Imagine someone comes up to you and says, “Is hot water now.” As a human listener, you may understand what this person is trying to say (maybe, “It is hot, can I have some water now?”), but a computer does not have that kind of interpretive capacity. The computer might be extremely fast at reading, but it cannot deviate from a specific set of rules to understand a particular statement. There is some flexibility, however: the machine has been programmed to accept some seemingly nonsensical pieces of code, which unfortunately creates strange interactions in JavaScript.

ECMAScript calls these “rules” the lexical grammar of JavaScript. Much like the grammar of any human language, JavaScript has strict rules about what constitutes a valid statement. Note, I did not say “strict rules that constitute an understandable statement.” In the English language, understandability often trumps validity. As in the hot-water statement above, while the statement may be invalid by the rules of English grammar, the statement may be understandable to a listener who might give the person a glass of water! In computer code, a valid statement is an understandable statement! If the computer is not programmed to interpret “Is hot water now” as “It is hot, can I have some water now?”, then the computer will throw an error or simply not execute it as you might expect.

Just as one should try to communicate grammatically in human languages, it should be a priority for developers to understand the lexical grammar of JavaScript to communicate with the computer.

My series on JavaScript Basics is built on a simple idea: Understanding JavaScript leads to better code. It reduces unintended results, mitigates bugs, prevents confusion, and most importantly, grants the user the freedom to exercise the full extent of the language’s capabilities! I believe this is a foundational concept that will greatly contribute to that end.

Let’s start with lexical grammar.

Lexical Grammar — The “How” of JavaScript Understanding

According to the ECMAScript specifications, lexical grammar is the set of rules that determine how the computer will parse (understand) the source code (written by you). This includes line breaks, white space, comments, punctuators, semi-colons, identifiers and more! Every little character (empty space, letter, symbol, or number) has meaning!

See the following code below:

var
a
=
3

Would it surprise you if I were to tell you that those four lines are valid in JavaScript (read: JS lexical grammar compliant)? Why? Because if you look at the section in the ECMAScript specs called Line Terminators, it says that “line terminator code points are used to improve source text readability and to separate tokens (indivisible lexical units) from each other.” In essence, line breaks in your code are for you. The computer will interpret the code above as var a = 3. However, the line terminator in the following code will probably produce an unintended result:

function foo() {
var a = 10;
var b = 10;
return // line terminator
a + b; // on a separate line
}
foo(); // undefined!

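You may have thought that invoking foo should have returned 20. However, the grammar does not allow a line terminator between return and the expression it returns. When the parser finds one, automatic semicolon insertion ends the statement right after return, so the function returns undefined and a + b is never evaluated. If you must put a + b on a separate line, you should use something called a grouping operator, which is a set of parentheses, (). See below: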

function foo() {
var a = 10;
var b = 10;
return (
a + b
)
}
foo(); // 20

// However, the following return statement would not work as intended:
return // the opening parenthesis must be on the same line as return!
( a + b ) // never reached, so the function returns undefined

You might have felt misled. You were led to believe that line terminators, like white spaces, were for our readability! If only things were that simple in JavaScript. In the lines following the quoted section above, the specification states, “In general, line terminators may occur between any two tokens, but there are a few places where they are forbidden by the syntactic grammar. Line terminators also affect the process of automatic semicolon insertion.” Ah there it is… Lexical Grammar.
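The return statement is not the only place where a line break is forbidden. As a small sketch of the same rule (the function name restrictedIncrement is made up for illustration), a postfix ++ may not be separated from its operand by a line terminator, so a semicolon is inserted before it and the ++ attaches to the next line instead:

```javascript
// Hypothetical demo: a line break before `++` changes which variable is incremented.
function restrictedIncrement() {
  let x = 1;
  let y = 1;
  // Parsed as `x; ++y;` because no line terminator may come
  // between an operand and a postfix `++`.
  x
  ++
  y
  return [x, y];
}

const [x, y] = restrictedIncrement();
console.log(x, y); // x stays 1, y becomes 2
```

This is the kind of code you would never write on purpose, but it shows that the line break, not the spacing, decides how the tokens are grouped.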

Next up, expressions and statements — where these lexical grammar rules apply in our code!

Expressions vs. Statements — The “What” of JavaScript

First, the definitions:

  • A valid expression in JavaScript is the combination of tokens that results in a value.
  • A statement in JavaScript is the collection of one or more expressions and keywords that completes a valid JavaScript thought.

Expressions

Going back to the language analogy, an expression in JavaScript can be likened to a phrase in English. For example, the sentence “Because the weather is hot, I need a glass of water,” can be broken up into two phrases: (1) “The weather is hot” and (2) “I need a glass of water.” While those phrases can be broken up further, in the context of the full sentence, they exist in two parts, each communicating an idea (in JS, this would be a value) to the overall sentence. Keeping in line with this analogy, the sentence would be the statement in JavaScript.

To be sure, this is an imperfect analogy that falls apart under any level of linguistic scrutiny. The main takeaway is to see that expressions can be distinguished from statements by (1) the context in which they are written AND (2) that expressions are usually a smaller subset of a statement (though not always the case as in expression statements).

Look at the definition above — there are 3 key parts to that sentence: (1) validity, (2) tokens, and (3) value.

First, validity depends not only on syntax but also on context. For example:

var a = 10;
a; // this is a valid expression because
// `a` has been declared already
b; // this is an invalid expression because
// `b` has NOT been declared (ReferenceError)

By the strictest definition of an expression, b is an expression, but it is not a valid one in the lines above. It is syntactically correct, so the code parses, but evaluating it throws a ReferenceError at runtime because b does not refer to anything. As such, it is more helpful to discuss valid expressions.
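To make the difference between "does not parse" and "parses but fails when evaluated" concrete, here is a minimal sketch. The helper classify and the name someUndeclaredName are hypothetical; the sketch relies on the built-in Function constructor, which parses its source string when called and only runs the code when the resulting function is invoked:

```javascript
// Hypothetical helper: reports whether a snippet fails while parsing or while running.
function classify(source) {
  let compiled;
  try {
    compiled = new Function(source); // parsing happens here; nothing runs yet
  } catch (err) {
    return "parse error: " + err.name;
  }
  try {
    compiled(); // now the code is actually evaluated
    return "ok";
  } catch (err) {
    return "runtime error: " + err.name;
  }
}

console.log(classify("var a = 10; a;"));      // ok
console.log(classify("someUndeclaredName;")); // runtime error: ReferenceError
console.log(classify("var;"));                // parse error: SyntaxError
```

An undeclared identifier passes the grammar check and only fails when evaluated, while a lone var never gets past the parser.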

Second, tokens (see here) describe a single “word” of code. This can be a string/number literal (e.g., "hello", 6), an identifier (a), a punctuator (=), etc. When you combine these tokens in a way that results in a value, you have an expression!

Lastly, value means that the expression results in a specific value. This implies that a valid expression can be substituted by its resulting value without breaking the code.

Illustrated below:

10;     // results in 10, therefore an expression
var; // NOT an expression but a keyword (a type of token),
// results in a SyntaxError!
a; // An expression, but an invalid one
a = 10; // finally a valid expression, results in 10;
function foo(x) {
console.log(x);
}
// because `a = 10` is a valid expression,
// it can be substituted for its value.
foo(a = 10); // 10, invoking the function in this manner is
// permissible, though not that common

The example above demonstrates that a proper JavaScript expression is a particular combination of tokens that produces a value. But we also saw that not just any mix of tokens will achieve that.
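Here is one more small sketch of the "an expression can stand in for its value" idea, using assignment expressions inside larger expressions. The variable names are illustrative only:

```javascript
let a;
// `a = 10` evaluates to 10, so it can be used where a number is expected.
const sum = (a = 10) + 5;

let x;
let y;
// `y = 3` evaluates to 3, and that value is then assigned to x.
x = y = 3;

console.log(a, sum, x, y); // 10 15 3 3
```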

So then… what are operators? There’s a whole list of them!

Operators

Most operators are punctuator tokens (e.g., (, [, <, <=, =, +, -, *, %, ++, etc.), though a few, such as typeof and instanceof, are keywords. They combine with operands (literals, identifiers, or other expressions) to form an expression.

Some are unary operators that require only one operand (e.g., +, the unary plus operator, which converts its operand into a number). Some are binary operators that require two operands, like the addition operator (+), which adds two numbers together or concatenates strings. Note: though the symbol is the same, the evaluation depends on the context (whether 1 or 2 operands are present). There is a single ternary operator (the conditional operator), which follows the condition ? expression : expression pattern using ? : .
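A quick sketch of the same + symbol in its unary and binary roles, plus the conditional operator. The variable names are made up for illustration:

```javascript
const fromString = +"42"; // unary plus: converts the string "42" to the number 42
const added = 40 + 2;     // binary plus with two numbers: 42
const joined = "4" + 2;   // binary plus with a string operand: the string "42"

// The conditional operator picks one of two expressions based on the condition.
const label = added > 40 ? "big" : "small";

console.log(typeof fromString, added, typeof joined, label); // number 42 string big
```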

It may be helpful to think of operators as the articles, conjunctions, and prepositions in English. Just as “for” or “the” lacks meaning in isolation, when combined with other words, it can convey value/meaning. To be sure, I am not saying that “for” or “the” lacks a definition. Of course, these words have a definition. However, by themselves, these words do not convey information that can be understood by someone. Likewise, though operators like + and * have definitions (addition, multiplication) attached to them, the computer does not evaluate them as a value on their own.

Code:

+;    // SyntaxError! (it expects something after it)
+10; // 10, now this is an expression using the unary plus operator

An expression doesn’t necessarily need an operator (a can be a valid expression by itself), but even the simplest pieces of code will have them. Just be aware that operators can be used to compose more "complex" expressions. There is so much more to say about operators, such as operator precedence and the specific definitions of the many operators available. However, this is enough to understand their relevance to this topic on expressions.

Statements

Continuing with the English language analogy (imperfect as it may be), statements are the sentences of JavaScript. However, just as I cannot just say “the slowly man here running” and expect people to understand, we cannot just chain some JavaScript expressions together and expect the computer to execute the code as you had intended.

Say I want to write a line of code that will tell me to plug in my laptop if its battery goes under 10%. I write it like this:

is batteryPercent < 10,
yes?
alert('Plug in your laptop! Low on battery!') // SyntaxErrors!

Duh! This doesn’t work, but why not? Because there is no such statement in ECMAScript. Instead, if we wrote if (batteryPercent < 10) alert('Plug in your laptop! Low on battery!') this would work because we are using the if statement that ECMAScript gives us! There are other statements like iteration statements (for/while loops), block statements, return statements, and many more!
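As a runnable sketch of that if statement: the function name batteryWarning is hypothetical, and it returns the message instead of calling alert so that it also works outside a browser:

```javascript
// Hypothetical helper built around the `if` statement described above.
function batteryWarning(batteryPercent) {
  if (batteryPercent < 10) {
    return "Plug in your laptop! Low on battery!";
  }
  return null; // nothing to warn about
}

console.log(batteryWarning(5));  // Plug in your laptop! Low on battery!
console.log(batteryWarning(80)); // null
```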

In short, a valid JavaScript thought is one that has been enumerated in the ECMAScript specifications as a statement. In a roundabout way, you know it’s a statement because the documentation says so!

Going back to the jumbled up English sentence, if we wanted to fix it, we can write “The man is running here slowly” or “The man is slowly running here.” Here, we see that there are patterns of sentence construction that have the same meaning. In JavaScript, even these patterns are predefined.

For example, there is a specific valid pattern for if statements in JavaScript:

if (true) {
console.log('valid if statement')
}
{
console.log('invalid if statement')
} if (true)

This is very obvious, but demonstrates once again that you must not only use an enumerated kind of statement (if is being used in both instances), but that use must be patterned correctly as the specs indicate. You cannot just chain together some valid expressions in any order expecting output. Statements in JavaScript are enumerated for the computer so that it knows how to handle a chain of expressions. And just as sentences are broken up with punctuation like periods and commas, JavaScript statements are often divided up by semi-colons and sometimes line terminators.

Now comes the tricky part!

Expressions here, but statement there?

Check out the code below:

var a = 10;
var b = a; // here, `a` is an expression
a; // here, `a` is used as a statement

Notice in the statement definition above, I noted that a statement may contain one or more expressions. Here, we see an example of a single expression being a statement in line 3. ECMAScript calls this the expression statement. Again, this comes down to the context in which the expression is written. In line 2, a is an expression because its context/position is after an assignment operator (=). See more about assignment operators here. However, in line 3, a is an expression that is also a valid JS thought on its own.

The most popular example of this is when looking at a named function expression (NFE) and a function declaration (FD):

var a = function foo(x) {
console.log('I am a Named Function Expression');
}
function foo(x) {
console.log('I am a Function Declaration');
}

What’s the difference? I plan on going into much greater depth on this specific topic in the following blog post; however, in short, again the context is key. Because the NFE foo is after the assignment operator, it is evaluated as a function expression. However, in the function declaration, it is on its own, and the computer knows what a function declaration looks like and will evaluate it as a FD statement on its own.

Note: it is NOT a function expression at all (therefore not an expression statement there).

The specs on expression statements explicitly state, "an ExpressionStatement cannot start with the function or class keywords because that would make it ambiguous with a FunctionDeclaration." There you have it. The two are distinct, but the determination of which it is depends on the context (or, as some like to say, the "position").
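One way to see this rule in action is to ask the parser directly. In this sketch the helper parses is hypothetical. It uses the built-in Function constructor only to check whether a source string is syntactically valid, without running it:

```javascript
// Hypothetical helper: true if the source parses, false on a SyntaxError.
function parses(source) {
  try {
    new Function(source); // parses the source but does not execute it
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) {
      return false;
    }
    throw err;
  }
}

// At the start of a statement, `function` begins a declaration, which needs a name.
console.log(parses("function () {}"));          // false
// In expression position (inside parentheses or after `=`), the name is optional.
console.log(parses("(function () {})"));        // true
console.log(parses("var f = function () {};")); // true
```

The same characters are rejected or accepted depending only on where they sit in the statement.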

An analogy pulled from the English language would be something like “run” being used as both a noun and a verb. Depending on the context in which “run” is used, it will be identified by the listener/reader as a noun or a verb. Think: “I run to the store” versus “I went on a run”. Likewise, depending on where and how the function keyword is used to create a function object, it can either be a function expression or a function declaration and this distinction has implications for hoisting, stack traces, etc.
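To give a taste of the hoisting implication, here is a minimal sketch. All of the names (hoistingDemo, declared, expressed, inner) are made up for illustration:

```javascript
function hoistingDemo() {
  const seen = [];

  // A function declaration is hoisted together with its body.
  seen.push(typeof declared);  // "function"
  // A `var` is hoisted too, but its value is not assigned yet.
  seen.push(typeof expressed); // "undefined"

  function declared() {}
  var expressed = function inner() {};

  seen.push(typeof expressed); // "function"
  // The name of a named function expression is only visible inside that function.
  seen.push(typeof inner);     // "undefined"

  return seen;
}

console.log(hoistingDemo().join(", ")); // function, undefined, function, undefined
```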

Remember, context is king.

Anatomy of the statement

Lastly, I wanted to briefly take apart a simple statement that would have been confusing to reason about without understanding the contents of this post.

Consider the code below:

var a = 10;
var b = 20;
a;

Quickly notice that there are 3 separate statements delimited using semi-colons and line breaks.

If we look at the specs, we see that var is not only a special keyword, but signifies the beginning of a variable statement! The basic syntax of a variable statement is: var + identifier + optional assignment expression. The reason why I wrote "optional assignment expression" is because var a; would have been a perfectly legitimate variable statement. On the other hand, var without an identifier (another kind of expression) is not a complete, valid JS thought because it does not follow the spec's pattern of a proper variable statement.

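So in sum, var a = 10 can be broken up as follows: var keyword + a (identifier expression) + = (assignment operator) + 10 (number literal expression) or simply, var keyword + a = 10 (assignment expression). Strictly speaking, the spec calls the = 10 part of a variable statement an initializer rather than an assignment expression, but the mental model is the same: the right-hand side is an expression whose value ends up in a.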

We know that var cannot be an expression because it does not itself produce a value and thus cannot be substituted for a value. Look below:

function foo (x) {
console.log(x);
}
foo(var a = 1); // SyntaxError!
foo(a = 1); // 1 <-- but this works

This also explains the following code:

var a = 10;
function foo (x) {
console.log(x);
}
foo(a > 3 ? 20 : 50); // 20

This works because a > 3 ? 20 : 50 is a valid expression using the conditional operator and as expected, evaluates to a value. However, an if statement cannot do the same because statements cannot take the place of an expression.
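If you do want if/else logic where a value is expected, one common workaround is to wrap the statement in a function and use the call, which is an expression. A small sketch, with the hypothetical name pickWithIf:

```javascript
// Hypothetical helper: wraps an `if` statement so the result can be used as a value.
function pickWithIf(a) {
  if (a > 3) {
    return 20;
  }
  return 50;
}

const a = 10;

// The conditional operator is an expression, so it can be used inline.
console.log(a > 3 ? 20 : 50); // 20
// The function call is also an expression, even though it contains a statement.
console.log(pickWithIf(a));   // 20
console.log(pickWithIf(1));   // 50
```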

Final Thoughts

JavaScript is a tricky language, but like any other language (machine or human), learning the proper grammar and syntax is crucial to communicate effectively. In code, there is a heightened demand for strict adherence to these rules. As such, it is helpful to look at the ECMAScript specs to see what statements are available for you to use and how you can construct those statements using expressions (and expressions using tokens).

Just remember, expressions evaluate to a value and can be substituted for one. Statements cannot be substituted in the same way. To distinguish the two, always consider the context/position in which the expression is written.

There’s a whole lot more to everything I’ve said in this post, but my goal isn’t to get you acquainted with all those details (I certainly don’t know all the quirks). The sole purpose is to expose you to this world so you can reason with the idiosyncrasies of JavaScript. It isn’t magic.

Dan Park

Husband, student of JavaScript, love React! ex-attorney with J.D. from Georgetown Law