Unique Array Values in JavaScript, performance rundown for Node.js v6 & v7

Sometimes you might find yourself in a need of creating a duplicate-free version of an array. While getting only unique values from an array might seem quite simple at first, different implementations don’t perform equally.

Loops

Most obvious solution is to use a loop to iterate over the source array and check if value already exists in the results.

let result = [];
for (let index = 0; index < SOURCE.length; index++) {
let el = SOURCE[index];
  if (result.indexOf(el) === -1) result.push(el);
}

We can make this more idiomatic using the Array.includes() method.

let result = [];
for (let index = 0; index < SOURCE.length; index++) {
let el = SOURCE[index];
  if (!result.includes(el)) result.push(el);
}

Let’s measure performance of both implementations.

Using an array of 200k random float values and MacBook Pro 13" i5 2.6GHz on Node.js v6.10.0 we get the following results:

for loop usign Array.indexOf: 9759.874ms
for loop usign Array.includes: 72671.379ms

This is interesting. Iterating over an array using a for loop and indexOf() to check if a value was already seen takes 10 seconds while using the includes() method takes over 70 seconds. Let’s check on Node.js v7.6.0

for loop usign Array.indexOf: 3498.865ms
for loop usign Array.includes: 3551.349ms

Not only Node.js v7 is 3 times faster here, it also yields comparable results for both, indexOf() and includes() methods.

We can try to optimize this implementation a bit using a label:

let length = SOURCE.length, result = [];
outer:  
for (let index = 0; index < length; index++) {
let value = SOURCE[index];
  if (result.indexOf(value) > -1) continue outer;
  result.push(value);
}

but the performance gain is negligible.

Node.js v6.10.0
for loop: 3442.164ms
Node.js v7.6.0
for loop: 3437.755ms

Functional

Another and maybe a more elegant solution is to use functional approach and utilise the forEach() or reduce() methods.

Sample code might look like:

let result = [];
SOURCE.forEach((el) => {
if (!result.includes(el)) result.push(el);
});

And with reduce():

let result = SOURCE.reduce((p, c) => {
if (!p.includes(c)) p.push(c);
return p;
}, []);

Let’s measure performance of both implementations with the same array of 200k random float values.

Node.js v6.10.0
Array.forEach using indexOf: 3677.678ms
Array.reduce: 3904.915ms
Node.js v7.6.0
Array.forEach using indexOf: 3523.004ms
Array.reduce: 3556.878ms

Performance of both methods is comparable on both versions of Node.js, the v6 and v7. Worth noticing is that it’s 3 times faster that using a for loop on Node.js v6.

Set

The Set is a new built-in object available in ECMAScript 2015 (ES6) that lets you store unique values of any type, whether primitive values or object references. Using a Set in conjunction with the Array.from() method we can quickly deduplicate our source array.

let result = Array.from(new Set(SOURCE));

Or with the Object rest/spread properties proposal we can use the splat operator instead of the Array.from() method.

let result = [...new Set(SOURCE)];

Let’s measure performance, again with the same array of 200k random float values.

Node.js v6.10.0
Array.from(Set): 73.795ms
Node.js v7.6.0
Array.from(Set): 65.726ms

It turns out that this implementation is 50 times faster than any previous solution.

But maybe we can use the Set object in some other way. The Set.has() method allows to check if an element exists in a Set in similar way as Array.includes() or Array.indexOf(). Let’s take the “optimized” version of the for loop from the first example and use a Set as a cache.

let length = SOURCE.length, result = [], seen = new Set();
outer:  
for (let index = 0; index < length; index++) {
let value = SOURCE[index];
  if (seen.has(value)) continue outer;
  seen.add(value);
result.push(value);
}

It turns out that this implementation is the fastest for creating a duplicate-free version of an array:

Node.js v6.10.0
for loop using Set.has: 36.334ms
Node.js v7.6.0
for loop using Set.has: 31.301ms

Summary, aka. TL;DR

  1. If you are using Node.js v6.10 (LTS Boron) don’t use Array.includes() when performance matters. It’s 7–8 times slower than using Array.indexOf(). On Node.js v7.6 the difference in performance is negligible;
  2. Array.reduce() and Array.forEach() are 3 times faster than simple for loop on Node.js v6.10;
  3. Array.reduce() performs comparable to for loops but the code is shorter and more readable;
  4. Using a Set constructor to filter out duplicates and recreating an array with Array.from() method is a swift solution. You can make the code even more readable using the Object rest/spread properties proposal and spread the Set with a splat operator. This is the shortest and maybe the best solution;
  5. The fastest, but requiring a few more lines of code solution is to use a Set object as a cache and check for already seen values with the Set.has() method;

Found issues or errors? Do you know a better or easier method? Let me know!

Thanks for reading! Please get in touch with me on Twitter if you run into any issues. And, please share this with others that you think would find it useful.