YDJKS — Scopes & Closures: Takeaways
My personal learnings from You Don’t Know JS: Scopes & Closures
This is the 2nd post in a series of 6, where each post covers my personal takeaways from a book in the You Don’t Know JS series, by Kyle Simpson.
In the previous post, we covered the first installment in the series, Up & Going. In this post, we’ll cover the second installment, Scopes & Closures.
I’ll only cover my most important learnings from this book. If you want to get more information, you can check out the book directly on GitHub, or check out my personal notes on GitHub, in which I express my thoughts and less significant reflections on the book.
Before we dive in, here’s a table of contents for this article:
- Basic compiler theory
- Name resolution
- Looking up identifiers in outer scopes
- Cheating lexical scope
- Block scope before ES6
- Hoisting
- Overriding previous declarations
- A simple closure definition
- Closure terminology
- Closing over identifiers past a function’s declaration
- Infamous loops + closure example
- Revealing module pattern
Basic compiler theory
There are 2 broad categories of programming languages:
- Interpreted
- Compiled
(Classifying languages this way doesn’t mean that they’re always one or the other. It just speaks to how a program is usually processed and executed. There’s nothing stopping someone from writing a compiler for a traditionally-interpreted language or an interpreter for a traditionally-compiled language.)
Originally, JavaScript was interpreted. But nowadays, it’s usually compiled.
However, it’s not compiled in the traditional sense.
C, C++, and Java are traditionally-compiled languages. By traditionally
compiled, I mean that they use ahead-of-time (AOT) compilation.
They’re compiled well before execution.
By contrast, JavaScript is usually just-in-time (JIT) compiled. It’s compiled
mere microseconds before execution.
For reference, here are a few JavaScript engines that compile JS before running it:
- V8 (JS => Machine Code)
- Rhino (JS => Java Bytecode)
- SpiderMonkey (JS => Machine Code)
AOT-compilation and JIT-compilation are significantly different. However, the
core process is the same. AOT-compilation and JIT-compilation share 3 core
steps:
1. Tokenization/Lexical Analysis
2. Parsing
3. Code Generation
Tokenization/Lexical Analysis
Tokenization is simply splitting a string into tokens based on a delimiter (usually whitespace).
Lexing/Lexical Analysis goes further. Lexing describes each token.
For example, consider var a = 2.
Tokenization would yield something like:
['var', 'a', '=', '2']
Lexical Analysis would yield something like:
[
  ['var', 'VariableDeclaration'],
  ['a', 'Identifier'],
  ['=', 'AssignmentOperator'],
  ['2', 'NumericLiteral']
]
To reiterate, Lexical Analysis adds semantic meaning to each token.
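To make the two steps concrete, here is a minimal sketch that only handles statements shaped like var a = 2. The tokenize and lex helpers are hypothetical names of my own, and the labels simply mirror the ones used above. Real engines do far more than this.

```javascript
// Hypothetical helper: split a tiny subset of JS into raw tokens.
function tokenize(source) {
  // Matches identifiers/keywords, integer literals, or a single "=".
  return source.match(/[A-Za-z_$][\w$]*|\d+|=/g) || [];
}

// Hypothetical helper: attach a label to each raw token.
function lex(tokens) {
  return tokens.map((token) => {
    if (token === 'var') return [token, 'VariableDeclaration'];
    if (token === '=') return [token, 'AssignmentOperator'];
    if (/^\d+$/.test(token)) return [token, 'NumericLiteral'];
    return [token, 'Identifier'];
  });
}

const tokens = tokenize('var a = 2');
const labeled = lex(tokens);
```

Here tokens holds the plain strings, while labeled pairs each string with its label.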
Parsing
Parsing is the conversion of tokens to an Abstract Syntax Tree (AST).
Based off of the previous example, this might look like:
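The tree itself is not reproduced in this text. As a rough sketch, assuming ESTree-style node names (my assumption, the book does not prescribe a format), the AST for var a = 2 could be written as a plain object:

```javascript
// A sketch of an AST for `var a = 2`, using ESTree-style node names.
const ast = {
  type: 'VariableDeclaration',
  kind: 'var',
  declarations: [
    {
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: 'a' },  // the variable being declared
      init: { type: 'Literal', value: 2 },    // the value assigned to it
    },
  ],
};

const declarator = ast.declarations[0];
```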
Code Generation
Code generation is the conversion of an AST to executable code. The exact process varies depending on the language, target platform, etc.
However, the general process for something like our current example var a = 2 is to create code that
- Creates a variable a.
- Allocates memory for it.
- Assigns 2 to it.
This is a huge oversimplification of the process. More details will be discussed in the next section.
Processing var a = 2
The steps for code generation we just discussed are a bit inaccurate. A little bit more occurs when processing var a = 2.
But first, let’s discuss 2 relevant entities: the engine and the compiler.
The engine manages a JavaScript program from start-to-finish. It handles compilation & execution of programs.
The compiler is a component of the engine. It handles the 3 compilation steps we previously discussed (lexical analysis, parsing, code generation).
So, the statement var a = 2 is actually processed twice: first at compile time by the compiler, then at run time by the engine.
When the compiler encounters var a = 2 in the code,
- First, it checks if an identifier named a already exists in the current scope.
- If so, it ignores this declaration.
- Otherwise, the compiler generates code to:
  - Declare a new variable named a in the current scope.
  - Defer the a = 2 assignment to the engine at run time.
When the engine encounters var a = 2 in the code,
- It checks if a variable named a exists in the current scope (it does, because the compiler declared it).
- If so, the engine assigns 2 to a.
- Otherwise, the engine checks outer scopes for a, and might perform some alternative actions, depending on what it finds (will be clarified later).
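The two passes described above can be observed directly. In this small sketch (demo is a hypothetical name), a already exists before the assignment line runs, but it has no value yet:

```javascript
function demo() {
  const before = a; // declared at compile time, not assigned yet: undefined
  var a = 2;        // the assignment itself happens here, at run time
  const after = a;
  return [before, after];
}

const phases = demo();
```

phases[0] is undefined and phases[1] is 2, which matches the "declare first, assign later" split.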
Name resolution
When the engine encounters something like x = 7, it performs a lookup for a variable named x. This process is called name resolution.
Kyle describes 2 types of lookups:
- LHS (Left-Hand Side)
- RHS (Right-Hand Side)
“Left-Hand Side” and “Right-Hand Side” refer to an Assignment Operation’s sides.
LHS Lookups
According to Kyle, LHS lookups are container lookups. An LHS lookup seeks a variable’s “container” to insert a new value inside of it.
For example, x = 'hello' causes an LHS lookup for x.
RHS Lookups
On the other hand, RHS lookups are value lookups. An RHS lookup seeks a variable’s value to apply it somehow.
For example, x = 'hello' + y causes an RHS lookup for y.
Something to note: RHS lookups can appear outside of assignment operations. RHS lookups simply return a variable’s value.
So, console.log(y) also causes an RHS lookup for y to print its value. Is console.log(y) an assignment operation? No.
Taking a meta-moment for a second, I think that RHS is kind of a misnomer. Kyle chose these words for explanatory purposes. In other compiler literature, it seems to be more commonly called an rvalue lookup, or just a value lookup. He talks more about this in this GitHub issue.
LHS lookup: assigning arguments to parameters
An LHS lookup occurs in this snippet:
function foo(a) {
  console.log(a);
}
foo(2);
Where? foo(2). You may not see an = sign, but an assignment operation does happen here. Thus, an LHS lookup happens.
The engine encounters foo(2). It performs an LHS lookup for a, foo's parameter. It assigns 2 to a's container.
RHS lookup: function invocation
An RHS lookup occurs in this snippet:
function foo(a) {
  console.log(a);
}
foo(2);
Where (besides console.log(a))? foo(2) again!
To apply foo to 2, you need foo's definition. You need to look up foo's definition. You need to perform an RHS lookup for foo.
Function declaration: not an LHS lookup
This does not cause an LHS lookup for foo.
function foo(a) {
  console.log(a);
}
This does cause an LHS lookup for foo.
var foo = function (a) {
  console.log(a);
};
Function declarations don’t cause LHS lookups. They’re special. For functions created via function declaration, the compiler both declares and defines them at compile time.
By contrast, functions created by assigning a function expression to an identifier are handled normally. The compiler only declares them at compile time. Like with typical assignments to identifiers, the compiler defers assignment for the engine at run time.
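A quick way to see the difference is to inspect both identifiers before either line has run. This is only a sketch, and declared, assigned, and demo are hypothetical names:

```javascript
function demo() {
  const declaredEarly = typeof declared; // already defined at compile time
  const assignedEarly = typeof assigned; // declared, but not assigned yet

  function declared() {}
  var assigned = function () {};

  return [declaredEarly, assignedEarly, typeof assigned];
}

const kinds = demo();
```

kinds comes out as 'function', 'undefined', 'function': the declaration is usable immediately, while the function expression only exists after its assignment runs.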
Errors
Scopes are nested in other scopes. This topic will be explored a bit more thoroughly in a later section, but it’s important to state this now. If the engine needs an identifier and can’t find it within the immediate scope, it checks outer scopes until
- it finds the identifier or
- it reaches the global scope
Suppose the engine reaches the outermost (global) scope and the identifier isn’t there. The outcome depends on 2 factors:
- Lookup type (LHS vs RHS)
- Strict Mode
If RHS lookup, throw a ReferenceError.
If LHS lookup,
- If Strict Mode is enabled, throw a ReferenceError.
- If Strict Mode is disabled, declare the identifier in the global scope, and give that back to the engine.
(As a side note, identifiers created on the global scope this way, or declared there with var or a function declaration, automatically become properties of the global object. For web browsers, this is the window object.)
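Here is a sketch of the three outcomes. It relies on the fact that function bodies built with new Function are non-strict unless they opt in with "use strict", so it behaves the same no matter how the surrounding file is loaded. All the ydkjs-prefixed names and errorName are made up for this illustration.

```javascript
// Hypothetical helper: run fn and report the name of any error it throws.
function errorName(fn) {
  try {
    fn();
    return 'no error';
  } catch (err) {
    return err.name;
  }
}

const failedRhs = new Function('return ydkjsMissingName;');
const sloppyLhs = new Function('ydkjsImplicitGlobal = 3;');
const strictLhs = new Function('"use strict"; ydkjsStrictName = 3;');

const rhsOutcome = errorName(failedRhs);             // failed RHS lookup
const sloppyOutcome = errorName(sloppyLhs);          // failed LHS lookup, non-strict
const leakedValue = globalThis.ydkjsImplicitGlobal;  // the implicit global it created
const strictOutcome = errorName(strictLhs);          // failed LHS lookup, strict

delete globalThis.ydkjsImplicitGlobal; // clean up the accidental global
```

The RHS and strict LHS cases both report ReferenceError, while the non-strict LHS case quietly creates a global holding 3.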
TypeError
Suppose an RHS lookup succeeds.
If you do something silly/illegal with that returned value, that will (probably) throw a TypeError. It won’t throw a ReferenceError.
Suppose an RHS lookup for foo returned 'test string'. Also suppose your code does this:
foo.isInteger();
This will throw a TypeError. Strings have no isInteger() method, so foo.isInteger is undefined, and calling undefined is illegal. (Number values don’t have one either: Number.isInteger() is a static method on the Number constructor.)
Basically, ReferenceError represents a failed name resolution. TypeError represents a successful name resolution whose result was then misused.
Looking up identifiers in outer scopes
As mentioned previously, if the engine is looking up an identifier, and it can’t find it in the immediate scope, it will begin checking outer scopes. It starts from the nearest outer scope and ends with the global scope. It stops at the first match it finds, if one exists.
It’s helpful to think of scope as “nested bubbles”. For instance, in the following code example (directly taken from YDKJS):
function foo(a) {
  var b = a * 2;
  function bar(c) {
    console.log( a, b, c );
  }
  bar(b * 3);
}
foo( 2 );
You can draw bubbles to represent the nested scopes:
Bubble 1 represents the global scope. It has 1 identifier: foo.
Bubble 2 represents foo's scope. It has 3 identifiers: a, b, and bar.
Bubble 3 represents bar's scope. It has 1 identifier: c.
Misconception: x in Parent Scope => x in Child Scope
This is somewhat pedantic, but I’ll say it anyway.
Reconsider the above code snippet. b is in foo's scope, but it’s not in bar's scope.
function foo(a) {
  var b = a * 2;
  function bar(c) {
    console.log( a, b, c );
  }
  bar(b * 3);
}
foo( 2 );
Yet b's value is accessible to bar's scope if an RHS lookup wanders outside of bar's scope and into foo's scope.
However, that doesn’t mean that b literally is in bar's scope.
Put another way, a function can access identifiers in outer scopes because the JS engine’s name-resolution system checks outer scopes.
It’s not because a function’s scope literally includes other scopes.
Variable Shadowing
You can declare variables with the same identifier, so long as they’re in different scopes.
If foo exists in an outer scope, and you then declare another foo in an inner scope, that’s called variable shadowing. The inner foo “shadows” the outer foo.
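A tiny sketch of shadowing, with hypothetical names label and readInner:

```javascript
const label = 'outer';

function readInner() {
  const label = 'inner'; // shadows the outer label inside this function
  return label;          // the lookup stops at the first match
}

const seen = [readInner(), label];
```

Inside readInner the lookup finds the inner label first, so the outer one is never reached. Outside, the outer label is untouched.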
Cheating lexical scope
JavaScript has 2 mechanisms for bypassing the lexical scope system:
- eval
- with
Both of these features are considered bad practices for several reasons. The primary reason is performance.
Before discussing performance, let’s discuss their mechanisms.
eval
eval(str) executes str, as if str were injected at eval's location.
This can drastically affect the program’s lexical scope.
Therefore, eval(str)'s location in the code can affect
- its execution
- the surrounding code’s execution
For example, consider the following code. It prints 2.
function foo() {
  console.log(a);
}
var a = 2;
foo(); // 2
Now, consider this code. It prints 7.
function foo() {
  eval('var a = 7');
  console.log(a);
}
var a = 2;
foo(); // 7
Note: If Strict Mode is enabled, eval() is executed in its own scope. That prevents it from tampering with other scopes.
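The strict/non-strict difference can be sketched side by side. As an assumption for the sake of a self-contained example, I build both functions with new Function, whose bodies are non-strict unless they start with "use strict". The name ydkjsLeaked is made up.

```javascript
// Non-strict: the var declared by eval lands in the calling function's scope.
const sloppyEval = new Function(
  'eval("var ydkjsLeaked = 7"); return typeof ydkjsLeaked;'
);

// Strict: the var stays inside eval's own scope and is invisible afterwards.
const strictEval = new Function(
  '"use strict"; eval("var ydkjsLeaked = 7"); return typeof ydkjsLeaked;'
);

const evalResults = [sloppyEval(), strictEval()];
```

The first entry is 'number' and the second is 'undefined', since typeof on an unresolvable name returns 'undefined' instead of throwing.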
with
with (obj) { ... } creates a lexical scope from the given obj, then executes the block’s code inside of that scope.
It treats the given object’s properties as if they were lexically defined identifiers in the new scope.
Observe:
const a = 3;
const obj = {
  a: 7
};
with (obj) {
  console.log(a); // 7
}
Inside the with statement, you can actually mutate the given object like so:
const obj = {
  a: 7
};
with (obj) {
  a = 3;
}
console.log(obj.a); // 3
Though, there’s a major risk when doing this. You can accidentally declare variables on the global scope.
const obj = {
  a: 7
};
with (obj) {
  z = 3;
}
console.log( window.z ); // 3
This happens because of LHS lookup rules, as discussed earlier.
- The engine encounters z = 3.
- It checks the immediate scope obj for a variable named z.
- It’s not there.
- It checks the outer scope (the global scope) for a variable named z.
- It’s not there.
- It declares a variable named z on the global scope.
- It gives that global variable back to the engine.
- The engine assigns 3 to the new global variable z.
Note: If Strict Mode is enabled, with-statements are completely banned. They will throw SyntaxErrors.
Performance
JavaScript engines perform certain optimizations at compile time. If an engine can assume that a program’s lexical scope is completely finalized at compile time, it can optimize certain things, such as how identifiers get resolved during execution.
If that assumption is invalid, it cannot perform these optimizations.
(Non-strict) eval and with invalidate that assumption. They modify lexical scope at run time.
eval and with prevent JS engines from performing those optimizations. This is the main reason to avoid their usage.
Block scope before ES6
Contrary to popular belief, block scope existed in JS before ES6. Although, it only existed in 2 ways, both of which are kind of trivial. Yet, it’s still fun to talk about, so let’s talk about it.
with
The identifiers created from the given object are scoped to the with statement’s block. They are inaccessible from outside the block.
const obj = { a: 3 };
with (obj) {
  console.log(a); // 3
}
console.log(a); // ReferenceError
try/catch's error parameter
The catch clause’s error parameter is block scoped. This behavior was specified back in ES3.
try {
  throw Error();
} catch (err) {
  console.log(err); // err exists in this block.
}
console.log(err); // ReferenceError. err doesn't exist here.
As an aside, apparently many linters will complain if multiple catch clauses in the same scope name their error parameters with the same identifier, e.g.
try {} catch (err) {}
try {} catch (err) {}
because they think the error parameter is function scoped and that subsequent error parameters with the same name will redefine previous ones.
This is not true, because the error parameters are block scoped. The linters are ignorant or something, I don’t know.
Anyway, to bypass this, many devs will either manually disable this check or give the catch clauses’ error parameters different identifiers, e.g.
try {} catch (err1) {}
try {} catch (err2) {}
Either way, such a workaround is unnecessary, from a spec-compliant standpoint.
Hoisting
To be honest, I already knew most of the stuff about hoisting discussed in this book. I already knew that function declarations and var declarations are both hoisted.
However, I didn’t know that function declarations are hoisted before var declarations.
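The snippet this section refers to is not reproduced in this text, so here is a minimal sketch of the kind of example involved. It is my own reconstruction under that assumption, wrapped in a hypothetical demo function and returning values instead of logging them.

```javascript
function demo() {
  const early = foo(); // works: the function declaration is already in place

  var foo;             // duplicate declaration, ignored

  function foo() {
    return 1;
  }

  foo = function () {  // a run-time assignment, so it only applies from here on
    return 2;
  };

  return [early, foo()];
}

const hoisted = demo();
```

The early call returns 1 because the function declaration wins over the var declaration. After the assignment runs, foo() returns 2.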
In the above example, if the var foo declaration occurred before the function foo() ... declaration, it would have thrown a TypeError. In that case, foo would be undefined, and attempting function invocation on undefined leads to a TypeError being thrown.
Indeed, this is what happens if I try the code snippet without the function foo() ... declaration.
EDIT: So, apparently const and let declarations are hoisted as well, but the JS engine doesn’t let you access those identifiers until the engine evaluates the const/let declaration line (where the LexicalBinding occurs).
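A small sketch shows both halves of that claim. The names value and demo are hypothetical. If the inner let were not hoisted, the early read would simply find the outer value.

```javascript
const value = 'outer';

function demo() {
  let outcome;
  try {
    outcome = value; // the inner `value` exists already, but can't be read yet
  } catch (err) {
    outcome = err.name;
  }
  let value = 'inner';
  return [outcome, value];
}

const tdz = demo();
```

The early read fails with a ReferenceError instead of yielding 'outer', and the later read yields 'inner'.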
Overriding previous declarations
Subsequent function declarations override previous ones.
Same with var declarations.
However, this is not true for let and const declarations. Subsequent declarations of either will throw SyntaxErrors.
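Here is a sketch of all three cases. Because a duplicate let or const fails when the code is parsed, the sketch parses small source strings with new Function so the error can be caught. parseErrorName and demo are hypothetical helpers.

```javascript
function demo() {
  function pick() { return 'first'; }
  function pick() { return 'second'; } // replaces the previous declaration
  return pick();
}

// Hypothetical helper: parse a function body and report any error name.
function parseErrorName(source) {
  try {
    new Function(source);
    return 'no error';
  } catch (err) {
    return err.name;
  }
}

const overrideResults = [
  demo(),
  parseErrorName('var x = 1; var x = 2;'),
  parseErrorName('let y = 1; let y = 2;'),
  parseErrorName('const z = 1; const z = 2;'),
];
```

The later function declaration wins, the repeated var is accepted, and both the repeated let and the repeated const are rejected with a SyntaxError.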
A simple closure definition
Kyle states that the discussion of “closure” is anticlimactic, because it’s just the culmination of rules discussed thus far. Indeed, it is.
Online, closure is portrayed to be a spooky, magical thing, but it’s really not. Here’s a definition of closure that I’m personally satisfied with:
A closure is a function and lexical scope
If I were to give a less technical definition, I would compress it even further:
closure = function + its environment
This is really all closure is. Yet, there’s a situation where it’s most
apparent/where it spooks people the most:
When a function references an identifier that
- is accessible from its lexical context
- is not accessible from its execution context
So, for example,
const sayHello = createFuncThatSaysHello();
sayHello(); // prints 'hello'
function createFuncThatSaysHello() {
  const word = 'hello';
  return function () {
    console.log(word);
  };
}
Here, the returned function prints 'hello', even though word isn’t accessible from the function’s execution context. Though, it is accessible from the function’s lexical context (createFuncThatSaysHello's scope, specifically).
Closure terminology
A function is said to “close over” its lexical context’s enclosing scopes. Thus, it also “closes over” the identifiers in those enclosing scopes.
So, for instance,
function foo() {
  const bar = function () {
    console.log(a);
  };
  const a = 3;
  return bar;
}
bar “closes over” foo's scope (and the global scope). a exists in foo's scope. So, bar also closes over a.
Closing over identifiers past a function’s declaration
This is actually the one major thing I learned.
I thought that a closure could only contain identifiers declared before the function. E.g. I thought this wouldn’t work:
const createFunc = function () {
  const func = function () {
    console.log(a);
  };
  return func;
};
const func = createFunc();
const a = 4; /* This was declared after createFunc was
created (and thus func as well). */
func(); // I thought this would fail, but it actually works.
So it’s true, when a function “closes” over its enclosing/outer scopes, it
really closes over those scopes entirely, not just the parts that have
executed before the function’s lexical context. So, it can access identifiers
in its closure that were declared AFTER the function’s lexical context.
This makes sense to me. I’ve seen that values in closures can change over time. It would make sense that identifiers can also be added to closures over time.
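Both observations fit in one small sketch, using hypothetical names createReader, read, update, and later. The closure is created before later is declared, and it still sees every change made to later afterwards.

```javascript
function createReader() {
  const read = () => later;  // closes over the whole scope, including `later` below
  let later = 'initial';
  const update = (next) => {
    later = next;            // changes the same variable that read() looks up
  };
  return { read, update };
}

const reader = createReader();
const beforeUpdate = reader.read();
reader.update('changed');
const afterUpdate = reader.read();
```

beforeUpdate is 'initial' and afterUpdate is 'changed'. The closure holds on to the variable itself, not a snapshot of its value.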
Infamous loops + closure example
Here’s an example of loops + closure that I’ve seen on every JS-quirks quiz.
for (var i = 1; i <= 5; i++) {
setTimeout(function timer() {
console.log(i);
}, i * 1000);
}
So, this prints 6, five times.
Explanation:
- The var i is function scoped, not block scoped.
- Therefore, each new timer function closes over the same i variable. A new i is not created upon each loop iteration. They all share the same i.
- Before any of the timer functions execute, i === 6. It’s 6 and not 5 because it had to become 6 for the loop to terminate.
- When the timer functions execute, i === 6, so all five functions print 6.
This can be fixed in 2 ways, one with function scope & one with block scope.
Function Scope Fix
This can be fixed by using an IIFE to create a new scope + copy of the iterator for each loop iteration.
for (var i = 1; i <= 5; i++) {
  (function () {
    var j = i;
    setTimeout(function timer() {
      console.log(j);
    }, j * 1000);
  })();
}
A new IIFE is created upon each loop iteration. In each IIFE, a new j is created that simply copies i’s current value. Since j is only in the IIFE’s scope, this ensures that a new j is created upon each loop iteration. Now, each timer function closes over a j that is locked to i’s value when the surrounding IIFE executed.
Block Scope Fix
This can easily be fixed using ES6 let, which is block scoped.
for (let i = 1; i <= 5; i++) {
setTimeout(function timer() {
console.log(i);
}, i * 1000);
}
Since let identifiers are block scoped, a new i is created upon each loop iteration. Each timer function closes over a unique i whose value never changes after that timer function is created.
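The same effect can be checked without waiting on timers. In this sketch (collect is a hypothetical helper, and the loops run to 3 instead of 5 for brevity), the closures are stored in an array and called only after the loop finishes:

```javascript
function collect(useLet) {
  const readers = [];
  if (useLet) {
    // Block scoped: each iteration gets its own i.
    for (let i = 1; i <= 3; i++) readers.push(() => i);
  } else {
    // Function scoped: every closure shares the single j.
    for (var j = 1; j <= 3; j++) readers.push(() => j);
  }
  return readers.map((read) => read());
}

const withVar = collect(false);
const withLet = collect(true);
```

withVar is 4, 4, 4 (the shared counter after the loop ended), while withLet is 1, 2, 3.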
Revealing Module Pattern
Modules are chunks of code that have private data and expose public data.
Prior to ES6 modules, modules were created solely using closures. The predominant approach to this was the Revealing Module Pattern, which has been heavily documented online. Here’s an example of it.
var dog = createDog('Hugo', 'Pug');
function createDog(name, breed) {
  var thisName = name;
  var thisBreed = breed;
  function sayName() {
    console.log(thisName);
  }
  function changeName(newName) {
    thisName = newName;
  }
  function sayBreed() {
    console.log(thisBreed);
  }
  return {
    sayName: sayName,
    changeName: changeName,
    sayBreed: sayBreed
  };
}
createDog creates a module with some private information:
- thisName
- thisBreed
And some public information:
- sayName
- changeName
- sayBreed
thisName and thisBreed are only accessible from createDog's scope, which is only accessible from the public methods’ closures. So, in effect, thisName and thisBreed are private.
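To see that privacy in action, here is a trimmed-down sketch of the same pattern. As an assumption for easier checking, it swaps sayName for a hypothetical getName that returns the name instead of logging it.

```javascript
function createDog(name) {
  var thisName = name; // private: only reachable through the closures below

  function getName() {
    return thisName;
  }

  function changeName(newName) {
    thisName = newName;
  }

  return { getName: getName, changeName: changeName };
}

const dog = createDog('Hugo');
dog.changeName('Rex');

const visibleName = dog.getName();   // reads the private variable via closure
const directAccess = dog.thisName;   // not a property of the returned object
const publicKeys = Object.keys(dog); // only the revealed methods
```

The name can be changed and read through the public methods, but dog.thisName is undefined because thisName never leaves createDog's scope.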
Here’s a variation of the Revealing Module Pattern that only creates a Singleton:
var dog = (function createDog(name, breed) {
  var thisName = name;
  var thisBreed = breed;
  function sayName() {
    console.log(thisName);
  }
  function changeName(newName) {
    thisName = newName;
  }
  function sayBreed() {
    console.log(thisBreed);
  }
  return {
    sayName: sayName,
    changeName: changeName,
    sayBreed: sayBreed
  };
})('Hugo', 'Pug');
It converts createDog to an IIFE, which is appropriate for a one-time-use scenario like creating a singleton.
That’s all for this post. Stay tuned for the next one!