Introduction
Last time we discussed some more terminology related to functional programming. You now understand concepts like higher-order functions, first-class functions, as well as pure functions — and this is something we will build on today.
We will see how pure functions can help us avoid bugs related to managing state. You will also get to know (and hopefully understand) some new vocabulary: side effects, immutability, and referential transparency.
First, let’s see what we mean by application state, what it’s needed for, and what issues can arise if we’re not dealing with it carefully.
What is state?
The term state can be used in multiple contexts. The notion that we’re interested in is application state.
Simply put, you can consider application state to be the entirety of:
- current values of all the variables,
- all allocated objects,
- open file descriptors,
- open network sockets, etc.
It is basically all of the information that represents what is currently happening in the application.
In the following example, both the counter and user variables hold information about the application state at a given moment in time:
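A minimal sketch of what such global state could look like. Only the names counter and user come from the text; the values, the shape of the user object, and the increment helper are assumptions made for illustration:

```javascript
// Hypothetical top-level (global) variables; the values are illustrative.
let counter = 0;
let user = { name: "Jane", balance: 100 };

// Any function in the program can read and change them.
function increment() {
  counter += 1;
}

increment();
console.log(counter); // 1
console.log(user.name); // Jane
```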
The code snippet above is an example of global state: every piece of code can access both the counter and user variables.
We can also talk about a local state, like in the snippet below:
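One way such a function might be written. The names countBiggerThanFive, counter, numbers, and index come from the text; the body itself is a reconstruction and should be read as a sketch:

```javascript
// Counts how many of the given numbers are greater than five.
function countBiggerThanFive(numbers) {
  let counter = 0; // local state, created anew on every call

  // index is scoped to the loop because it is declared with let
  for (let index = 0; index < numbers.length; index++) {
    if (numbers[index] > 5) {
      counter++;
    }
  }

  return counter;
}

console.log(countBiggerThanFive([1, 7, 3, 9, 12])); // 3
```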
Here, the counter variable holds the current state of the countBiggerThanFive function invocation.
Every time we call the countBiggerThanFive function, a new variable is created and initialized with 0. It is then updated while iterating over numbers, and it ceases to exist once the function returns its value. It is only ever accessed by the code inside the function, which is why we can consider it part of the local state.
Similarly, the index variable represents the current state of the for loop: no code outside the loop can read or change it.
The point is, application state is not only about global variables — it can be defined at various “levels” of the application code.
Why does it matter? Let’s dig a bit deeper.
Shared state
State, as we can see, is necessary for our programs. We need to keep track of what’s happening and be able to update the application state to model behavior.
We might want to use a more global state to hold information that may be useful for any piece of code in our program.
Let’s say we use a currentUser variable to keep information about the user who's currently logged in. We can imagine that different parts of our application need this data to make "decisions" about authorization, customization, etc.
It may be tempting to make currentUser a global variable so that every function in the codebase can access and change it as needed. This is what we mean when we talk about shared state.
But this convenience comes at a price: if every function in your application is able to change currentUser, you need to consider what happens when one of them does. Any such change affects every other function that also has access to currentUser.
This can lead to nasty bugs and make reasoning about application logic more difficult. It’s not easy to track down where and when the change occurred if it could happen literally anywhere.
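To make the problem concrete, here is a small sketch. The functions canDeletePosts and previewAsGuest, as well as the role property, are hypothetical and exist only to illustrate the point:

```javascript
// Hypothetical shared state and functions, for illustration only.
let currentUser = { name: "Jane", role: "admin" };

// Makes an authorization "decision" based on the shared state.
function canDeletePosts() {
  return currentUser.role === "admin";
}

// Somewhere else in the codebase, another feature quietly changes it.
function previewAsGuest() {
  currentUser.role = "guest";
}

const before = canDeletePosts();
previewAsGuest();
const after = canDeletePosts();

console.log(before); // true
console.log(after); // false
```

Nothing in canDeletePosts changed between the two calls, yet its result did. To find out why, you would have to search for every place that can touch currentUser.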
The general rule of thumb is: the more global a piece of state is, the more careful you need to be when changing it. For more local state, the consequences will not be as far-reaching.
Mutable shared state
Having a global state that is read-only is not quite as troublesome as having mutable shared state.
Let’s see what consequences mutable shared state can have for our application’s readability and maintainability.
It makes reasoning about code harder
In general, the more “open” a piece of state is to changes from different places in your codebase, the more difficult it is to follow what its current value is at any point in time.
Let’s say you have a couple of functions that can (and do) make changes to the same global variable. You end up in a situation where there may be multiple possible sequences of these functions being called one after another.
If you want to prove that a variable like this is always in a correct (logical) state, you might need to consider all possible flows of interactions, and there may be infinitely many :)
It hurts testability
To write a unit test for a function, you need to predict the circumstances it can run under. You then write test cases for these to make sure your function always behaves correctly.
It is easier to do it when the only things your function depends on are its parameters.
If your function, on the other hand, uses and changes shared state — you will have to pre-configure this state for all tests. You may also need to reset the shared state afterwards so that other functions that depend on it can be tested correctly.
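A sketch of the difference, using a hypothetical shopping cart. The names cart, addToSharedCart, and addToCart are made up for this illustration:

```javascript
// Hypothetical example: the same logic written against shared state
// and against parameters only.
let cart = []; // shared state

function addToSharedCart(item) {
  cart.push(item);
  return cart.length;
}

function addToCart(items, item) {
  return [...items, item]; // depends only on its parameters
}

// Checking the shared-state version needs setup and cleanup around each case.
cart = []; // pre-configure the shared state
console.log(addToSharedCart("book") === 1); // true
cart = []; // reset so that later checks are not affected

// Checking the parameter-based version needs neither.
console.log(addToCart([], "book").length === 1); // true
```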
It affects performance
If your function depends on mutable shared state, there is no easy way to run it concurrently, even if it conceptually makes sense.
Different “instances” of the function, running concurrently, would access and mutate the same piece of state, potentially influencing each other’s behavior in unpredictable ways.
Handling issues like that is not trivial. Even if you can find a way to do it reliably, you will most likely introduce more complexity and make your functions less modular and reusable.
Okay, so what do we do if we want to avoid having a global variable to represent and keep track of application state? Let’s look at some possibilities.
Use parameters instead of state
The simplest way to avoid issues caused by shared state is to make sure your functions do not reference it if they don’t have to. Let’s see an example:
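A minimal sketch of a function that reaches for shared state. The names getUserBalance and currentUser come from the text; the shape of the user object is an assumption:

```javascript
// Hypothetical shared state; the shape of the object is an assumption.
const currentUser = { name: "Jane", balance: 100 };

// Reads currentUser from the enclosing scope instead of receiving it
// as an argument.
function getUserBalance() {
  return currentUser.balance;
}

console.log(getUserBalance()); // 100
```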
We can see how the getUserBalance function references currentUser, which is, in fact, shared state.
On the surface, it all looks good, but in reality we have introduced an implicit coupling between getUserBalance and currentUser. If we wanted to, for example, change the name of currentUser, we would need to change it inside getUserBalance as well.
To mitigate this, we can change getUserBalance so that currentUser is passed into it as an argument. Even though the change looks trivial, it makes for more readable and maintainable code.
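A sketch of the parameter-based version, again assuming a user object with a balance property:

```javascript
const currentUser = { name: "Jane", balance: 100 };

// The user is now an explicit input, so the function works for any user
// and no longer cares what the outer variable is called.
function getUserBalance(user) {
  return user.balance;
}

console.log(getUserBalance(currentUser)); // 100
console.log(getUserBalance({ name: "Bob", balance: 5 })); // 5
```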
Immutability
Even if you do pass all necessary variables to a function explicitly, you still need to be careful.
Generally speaking, you need to make sure you don’t mutate any of the arguments passed in to your function. Let’s see an example:
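The following sketch shows the kind of function in question. The names rewardUser, currentUser, and rewardedUser come from the text; the body is a reconstruction based on the description that follows:

```javascript
const currentUser = { name: "Jane", balance: 100 };

// Doubles the balance, but does so by modifying the object it received.
function rewardUser(user) {
  user.balance = user.balance * 2;
  return user;
}

const rewardedUser = rewardUser(currentUser);

console.log(rewardedUser.balance); // 200
console.log(currentUser.balance); // 200, the original was changed too
console.log(rewardedUser === currentUser); // true
```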
The issue here is that the rewardUser function not only returns a user with a doubled balance; it also changes the user object that was passed in. As a result, both the currentUser and rewardedUser variables reference the same, updated object.
This kind of operation makes the logic more difficult to follow.
Here’s how this can be improved:
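One possible non-mutating version, sketched with object spread syntax. Whether the original article used spread or another copying technique is an assumption:

```javascript
const currentUser = { name: "Jane", balance: 100 };

// Returns a new object and leaves the argument untouched.
function rewardUser(user) {
  // Copy all properties into a new object, then override balance.
  return { ...user, balance: user.balance * 2 };
}

const rewardedUser = rewardUser(currentUser);

console.log(rewardedUser.balance); // 200
console.log(currentUser.balance); // 100
console.log(rewardedUser === currentUser); // false
```

Keep in mind that spread makes a shallow copy: nested objects would still be shared between the original and the copy.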
In general, you need to make sure that your functions almost* always return new objects and don’t mutate their arguments. This is what we refer to as immutability.
One way to go about it is to simply keep this rule in mind and use it dogmatically across your codebase. In my experience, it can work pretty well.
Other options include using external tooling that provides immutable collections, like Immutable.js from Facebook. Not only does it guard you against mutating data, but it also tries to reuse data structures efficiently to improve performance.
For a more comprehensive overview, please read Cory House’s article on approaches to immutable values. Don’t worry about “React” in the title — the techniques outlined there apply to JavaScript in general.
* The only reason to mutate arguments (that I’m aware of) is to optimize performance. Before you go down this path, make sure to profile your application.
Back to functions
Okay, but what does it have to do with functional programming — you may ask.
Last time, we discussed functions that we called pure but didn’t really get specific. Now, with our newly acquired knowledge, we can adjust our definition.
We said pure functions meet the following criteria:
- they can’t depend on anything except their input (arguments),
- they have to return a single value, and
- they need to be deterministic (can’t use random values, etc.).
We can now see that these can be rephrased.
“They can’t depend on anything except their input” and “they need to be deterministic” really mean that pure functions can’t access or mutate shared state.
“They have to return a single value” means that there should be no observable effects of calling the function, other than the return value.
When a function does mutate shared state or has other observable consequences, we say it produces side effects. What that means is that the outcome of calling it is not confined to this function’s internal state.
Let’s dig a bit deeper into side effects.
Side effects
There are a couple of different types of side effects, including:
- mutating shared state or arguments, as discussed above,
- writing to disk, because it is, in fact, modifying the computer's state,
- writing to console: just like writing to disk, it modifies the computer’s internal state as well as the environment (what you see on the screen),
- calling other, impure functions: if one of the functions you call produces side effects, your function is “infected” as well,
- making API calls, which modify the state of your computer and the target server, etc.
Here are some examples of functions that produce side effects:
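A few hypothetical functions, each showing one kind of side effect from the list above. All names here are made up for illustration:

```javascript
let loginCount = 0; // shared state

// Side effect: mutates shared state.
function registerLogin() {
  loginCount += 1;
  return loginCount;
}

// Side effect: mutates its argument.
function addTag(tags, tag) {
  tags.push(tag);
  return tags;
}

// Side effect: writes to the console.
function greet(name) {
  console.log(`Hello, ${name}!`);
}

// Side effect: calls impure functions, so it is impure as well.
function greetAndCount(name) {
  greet(name);
  return registerLogin();
}
```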
If you think about it, for your program to be useful it needs to produce side effects. Otherwise, you wouldn’t even be able to observe the effects of its operation.
Computer programs can’t be “pure functions all the way down”.
We don’t want to create useless, theoretical programs.
Functional programming is not about writing code entirely without side effects. It’s about structuring your code in a way that side effects are easy to manage and confined to a small portion of the application. It’s about making your program easier to understand and maintain.
There is one more term that is often used in this context — referential transparency. Although it’s a little bit more complicated and has fancy words in its name, we are now fully equipped to understand how it ties into pure functions.
Referential transparency
We say that a function is referentially transparent when we can replace the expression that calls our function with the value this call produces — without changing the program’s behavior.
Even though it intuitively makes sense, we need to understand that for a function to be referentially transparent, it needs to be pure (not produce side effects).
Let’s see an example of a function that is not referentially transparent:
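A sketch of what such a function might look like. Only the names getUserName and getUserData come from the text; the returned values and the logged message are assumptions:

```javascript
// Hypothetical reconstruction; returned values and message are illustrative.
function getUserName() {
  console.log("getUserName called"); // side effect
  return "Jane";
}

function getUserData() {
  return {
    name: getUserName(),
    role: "admin",
  };
}

console.log(getUserData().name); // logs "getUserName called", then "Jane"
```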
It seems like, for getUserData to still work correctly, a call to getUserName could be replaced with its result, like so:
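Assuming, for illustration, that getUserName always returned the string "Jane" after logging a message, the substituted version might look like this:

```javascript
// The call to getUserName is replaced by the value it would have returned.
function getUserData() {
  return {
    name: "Jane",
    role: "admin", // hypothetical extra field
  };
}

console.log(getUserData().name); // Jane
```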
However, we did change the program’s behavior here: it used to log to the console (a side effect!), and now it doesn't. It looks like a trivial change, but it does indicate that getUserName was not referentially transparent in the first place (and neither was getUserData, to be fair).
Summary
We now understand what it means to manage application state, what functional programmers mean by immutability, referential transparency, and side effects, and what issues shared state can introduce.
Next time, we will start discussing more complex functional programming techniques. We will learn how to recognize and use closures, partial application, and currying.
It’s going to be fun and exciting, but also a bit challenging. See you in the next part!