The Fisher-Yates Shuffle

While working through the Maker Square pre-course work (so…much…course work) I came across this nifty little algorithm that is just SO COOL that I had to write about it.

More importantly though, I came across this excellent blog by Mike Bostock, which is where I learned about this algorithm. Mike breaks it down in such a beautiful, concise way, and illustrates it with such truly helpful animations that you really just have to go and check out.

I’ll be right here.

Pretty cool, right?

Welp, there’s really no reason for you to read my shit post now, but it will definitely help solidify my understanding of this function to break it down for myself, so if you’d like to stick around while I do, I’ll put the kettle on.


The Fisher Yates Shuffle

This function/algorithm/whatever you want to call it was conceived by Ronald Fisher and Frank Yates way back in 1938, in a book called Statistical Tables for Biological, Agricultural and Medical Research, which I’m sure was absolutely riveting. (You can check out the book here if you’d like, but I’d advise against it unless you want to spend the next three hours cross-eyed.)

The purpose of the algorithm is to randomly shuffle a set of data. Seems like a pretty simple thing to do nowadays, especially when you’ve got JavaScript’s handy Math.random function available. Unfortunately, as Mike Bostock explained (and as I spent a frustrating two hours figuring out), there are a couple of hiccups to consider.

Firstly, there are many ways you could shuffle the items. Say you have a row of people sitting in chairs on one side of a room, and a machine that randomly picks one chair at a time and tells the person sitting in that chair to move to a new chair on the other side of the room. The problem here is that eventually, you will have a lot of empty chairs on your side of the room — but the machine will not know that they are empty. It will continue to pick random empty chairs with no result. As more and more people move to the other side of the room and leave their chairs behind, the process of building up the new row will become slower and slower because the odds of the machine picking the few random chairs that still have people in them become smaller and smaller. (This is Bostock’s first illustration.)

Okay, so we don’t want that. How else could we shuffle the items? Sticking with our chair analogy, let’s say we had the people take their chair with them when they moved. This solves our empty chair problem. Unfortunately, it also means that twice as many things are now being moved across the room, which, again, slows the process down….something about O(n²)….I haven’t gotten that far yet. (Bostock’s second point.)

That doesn’t seem so great either. The somewhat obvious solution is to forget about the other side of the room and just have the machine pick people at random to switch seats with each other. The reason this solution isn’t obvious as far as JavaScript goes (at least to me), is that you have to tell the machine how many times to switch people. When people and chairs are disappearing, it’s pretty easy for the machine to stop randomly picking them. But when they are switching in place, the probability of picking duplicates or accidentally writing an infinite loop and crashing your computer increases greatly.

How can we solve this problem, then? Let’s look at the code:

[code lang=text]
function shuffle(array) {

var remainingItems = array.length;
var randomIndex;
var temp;

while(remainingItems) {

randomIndex = Math.floor(Math.random() * remainingItems — );

temp = array[randomIndex];
array[randomIndex] = array[remainingItems];
array[remainingItems] = temp;
}
return array;
}
[/code]

Ahh it’s so easy when you know how to do it! Let’s break it down in terms of chairs and people.

In this code block, the array is our row, our chairs are the indexes, and our people are the actual array items. Remember: an array is basically like a list of slots (indexes) that hold values (items.) The length of the array (which we can find using array.length) will always be the number of indexes in the array, and the indexes will always be numbered starting with zero.

So here, we set a variable remainingItems to hold our array length (makes sense if you think about it — this is how many items we have to work with.) We then set two variables that we know we are going to use later. One to hold a random index and one to temporarily hold the value of that index.

We tell our while loop to run as long as there are remaining items — in other words, as long as our array length. If our array has 3 things in it, our while loop will run 3 times. If it has 3000 things in it, our while loop will run 3000 times. That’s the great thing about while loops: they never give you shit about your arrays being too long.

Every time the loop runs, the first thing it does is pick a random index. Math.random returns a random decimal between 0 and 1 (up to 1, not including 1.) We need to multiply that by a larger number to scale it to the size we need. If you multiply the decimal by 10, it will become a number between 1–10, by 100, will become a number between 1–100, etc. So, depending on how long our array is, we will get a number that could correspond to an index (anything greater than 0.) It’s not quite ready to be used as an index, though, because it’s still a decimal, so next, we use Math.floor to round it down to a whole number.

Every time the loop runs, our multiplier will decrease by 1 (the — just means “minus one”.) This decreases the odds of picking duplicates and tells the machine how many times to run the code, because remember, that same variable is being used at the top of our while loop. Start at the end of the array, and go backwards towards the beginning. When you get to the beginning (i.e. when there are no items remaining), stop.

Sort of reminds of a quote from Alice in Wonderland:

“Begin at the beginning” the king said gravely, “and go on until you come to the end: then stop.”
- Lewis Carroll, Alice’s Adventures in Wonderland, Chapter XII

Okay, well, like that but backwards.

Next, we make the swap, using our temp variable. I detailed how to swap array elements in this post if you need a refresher. Here, we are swapping a random element with the element in the current position (our current position is the last index of the array, then the second to last, then the third to last, etc etc.)

And that’s it! Once our while loop has run out of items to swap, we exit the loop and return the shuffled array.

I think what I love most about this is the obvious point that you should start at the end of the array and go backwards, using the end of the array to hold the newly swapped elements. If you think about it, it’s no harder than reversing an array — you just need to implement Math.random.

Woo! JavaScript! Now get out there and start shuffling those arrays!