You might have heard about the high-impact security vulnerability recently fixed and announced by the Node.js team. This post explains what the issue is, and how and why it happened.
Thanks to the amazing Node.js team, this vulnerability was fixed immediately across all release lines (4.x, 6.x, 7.x, and 8.x).
Make sure you upgrade your Node.js version as soon as you can (preferably when you finish reading this post!). For upgrade instructions, head to the official Node.js blog.
The vulnerability stems from HashTables, so let's start with a quick recap on HashTables, just in case you missed your computer science classes.
HashTables fulfill the dream of *constant time* insertions and access.
Some of the places where HashTables are used:
- Method lookups.
- Sets, Objects, and Maps.
- HTTP headers.
- JSON representations.
- URL-encoded post form data.
A HashTable is a group of linked lists; for performance to be good, each linked list must stay short.
The create-HashTable function pre-allocates the number of linked lists it can contain. Throughout this post, the number of linked lists the HashTable can hold is denoted by l (typically l = 2**k for some k).
The best scenario for optimal performance is to have n/l entries in each linked list, where n is the number of entries in the HashTable. So if we have 256 entries in a HashTable with l = 256, the perfect hash function would distribute them into the 256 lists, each holding only 1 entry.
Example: Storing strings one to ten in a HashTable
In this example, our hash function is H(s) = first byte of s, and l = 256.
In our “one” to “ten” example, inserting the values into the HashTable will result in something like the following:
“o” : [“one”],
“t” : [“two”, “three”, “ten”],
“f” : [“four”, “five”],
“s” : [“six”, “seven”],
“e” : [“eight”],
“g” : [],
“h” : [],
“n” : [“nine”],
“r” : [],
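To make the example above concrete, here is a minimal sketch in plain Node.js, with a toy table of arrays standing in for linked lists (the names `H`, `insert`, and `table` are just for illustration):

```javascript
// A toy HashTable of l = 256 buckets, using the (deliberately bad)
// hash function H(s) = first byte of s.
const l = 256;

function H(s) {
  return s.charCodeAt(0); // first byte of the string
}

function insert(table, s) {
  const bucket = H(s) % l;        // reduce the hash into [0, l)
  (table[bucket] ||= []).push(s); // each bucket is a linked-list stand-in
}

const table = {};
const words = ["one", "two", "three", "four", "five",
               "six", "seven", "eight", "nine", "ten"];
for (const w of words) insert(table, w);

// "two", "three" and "ten" all pile up in the same bucket:
console.log(table[H("two") % l]); // → [ 'two', 'three', 'ten' ]
```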
Notice that we have empty linked lists in the HashTable, because no value hashes into them. Additionally, there is a pile-up in the “t” list. This is a bad hash function, since it does a poor job of distributing the strings across the linked lists. No matter how many linked lists l you have, if the hash function distributes entries badly you will get bad performance: linked lists are slow, and hash tables are only fast when the linked lists inside them stay short.
To avoid overflowing the HashTable, since there are more possible H(s) results than there are lists, the hash value is always reduced modulo l to get a valid index: H(s) % l.
H(s) = first byte of s is not a good hash function. Node.js implements a very good hash function which quickly runs through the whole string and gives random-looking results, spreading entries across the HashTable.
Since we have a good hash function, normal usage should *never* flood a single list. An attacker, however, just might.
Hashing malicious strings.
The attacker provides strings for which H(s) % l results in the same HashTable index, so all of these strings are stored in the same linked list, making the app go super slow.
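A hypothetical sketch of what the attacker does: with any fixed, publicly known hash function, they brute-force strings until H(s) % l lands in one chosen bucket. The `fnv1a` function below is just a stand-in for the real hash; the principle is the same for any unseeded hash:

```javascript
// Brute-force strings that all reduce to the same bucket.
const l = 256;

// FNV-1a (32-bit) as a stand-in for the victim's hash function.
function fnv1a(s) {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Collect payload strings that all land in bucket 0.
const colliding = [];
for (let i = 0; colliding.length < 1000; i++) {
  const s = "x" + i;
  if (fnv1a(s) % l === 0) colliding.push(s);
}

// Every one of these inserts into the SAME linked list on the server.
console.log(colliding.length); // → 1000
```

Inserting these 1000 strings into the table degrades that one bucket to a 1000-entry linear scan, which is exactly the slowdown the attack exploits.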
To solve this issue, programming languages have tried different approaches, including the following:
Replace linked lists with another structure, such as red-black trees or any other balanced tree structure.
However, most programs go with plain linked lists to avoid bugs, heavy rewrites, debugging, and implementation complexity.
Stop inserting into a linked list once it grows past a certain length.
This solution looks pleasant, but for real-world applications you cannot just discard newly arriving entries, so it is not viable in every situation. It can only really be used in caching solutions.
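A minimal sketch of this "cap the bucket" mitigation, assuming the same toy table of arrays as before (`MAX_BUCKET` and `cappedInsert` are illustrative names, not from any real implementation):

```javascript
// Drop inserts once a bucket is full — acceptable only for caches,
// where discarding an entry is allowed.
const MAX_BUCKET = 8;

function cappedInsert(table, bucket, value) {
  const list = (table[bucket] ||= []);
  if (list.length >= MAX_BUCKET) return false; // flood blocked, entry dropped
  list.push(value);
  return true;
}

const table = {};
for (let i = 0; i < 20; i++) cappedInsert(table, 0, "payload" + i);
console.log(table[0].length); // → 8
```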
Use a cryptographic hash function like SHA-256.
Crypto hash functions are collision-resistant: you should not be able to find any two strings with the same output.
But this solution is troublesome for two main reasons:
- The hash function needs to be fast, and crypto hashes are not the fastest.
- Even if crypto hashes were fast enough, this does not solve the problem. The attacker does not need to find collisions in the crypto hash function itself; instead, they need to find collisions in the output after it is reduced to the l linked lists. H(s) % l is not collision-resistant no matter how strong H(s) is, since the output is always reduced to l lists. So the attacker can simply run a few million iterations to find a critical number of collisions.
Randomizing the hash function with a secret key.
These attacks have been repeated many times over the years, under different names: low-bandwidth denial-of-service attacks, hash flooding, algorithmic complexity attacks, etc.
They have targeted many programming languages and systems, such as Perl, Redis, and Python.
The response to these attacks was to secretly randomize the hash function on each application boot (or HashTable initialization). This way the attacker has no way to predict the hash function: without knowing the secret key, they cannot find collisions.
NodeJS uses the same approach, by randomizing its “HashTable seed”.
*Whether this is actually secure or not is a debate for another time.
The vulnerability came from a bug in the randomization of the HashTable seed: the seed was constant across each released version of Node.js, leaving Node.js susceptible to hash-flooding remote DoS attacks.
For example, an HTTP Node.js server is vulnerable to this attack because URL-encoded POST form data is stored in a HashTable on the heap. The attacker sends many small payloads (say, 10 bytes each) across multiple POST requests, each with different input that yields the same hash key, so the values keep accumulating in the same linked list of the post-data HashTable on Node's heap.
After enough requests have piled onto that one linked list, the server faces huge delays in its responses, since it is synchronously walking the same linked list in the HashTable.
V8 uses `snapshots` to cut startup time; a snapshot is basically a saved, pre-initialized context that can be reused on future boots.
As the V8 team describes it:
> Fortunately, V8 uses a shortcut to speed things up: just like thawing a frozen pizza for a quick dinner, we deserialize a previously-prepared snapshot directly into the heap to get an initialized context. On a regular desktop computer, this can bring the time to create a context from 40 ms down to less than 2 ms. On an average mobile phone, this could mean a difference between 270 ms and 10 ms.
Snapshots basically enable Node.js to reduce boot time by taking a snapshot of the pre-initialized boot context and using it on subsequent runs. Additionally, users can supply custom snapshots to build on top of the original context, but that's a topic for another day!
The vulnerability was caused by V8 snapshots being enabled by default in the Node.js build process: the initially randomized HashTable seed was baked into the snapshot, so it was identical for every binary of a given Node.js release.
This minor error left Node.js susceptible to remote DoS attacks via hash flooding.
Thanks for reading! — Written by Ahmad Bamieh