Hashtables and Wine Cellars
Who Uses Hashtables
Short answer: anyone with a lot of stuff. It’s just a great way to store data, or anything. Think of a hashtable as a container, or a cellar. You want something, just ask for it. But instead of looking at every item in your container, hashtables have a way of indexing items, so that you can go straight to them. We’re going to consider a very specific case, and hopefully abstract along the way.
Along the way will be touching on memory, time complexity (though not explicitly), randomly uniform distributions, and collisions.
First, let’s think about Thomas Jefferson’s wine collection. The dude had some wine. He wants to put all of his wine in one place. The cellar.
Say he’s holding one bottle of 1812 Scuppernog, and he wants it in the cellar. Now, Jefferson never had a wine steward, but for the sake of our analogy, let’s say he tells Antoine Giusta, J. Q. Adams’ wine steward, “hey, man go, put this delicious bottle of Scuppernog in the cellar. But don’t just put it willy-nilly anywhere. Remember where you put it, so we can pop it open when wen want.”
So the steward’s got the wine, but where in the cellar does he put it? There are a couple of ways the steward could associate the bottle of wine to a label indicating it’s location in the cellar.
- He could make a list, like a phone-book.
- He could use the name of the wine bottle to somehow suggest its location.
The first method is pretty annoying for the steward. It requires him to go downstairs, see which space is empty, store it, and then mark it on the reference sheet. It gets messy. Let’s consider the other method.
He looks at the Scuppernog, oh hey, there’s a year on the bottle, great. So he slides the bottle snugly into the 1812th slot.
The problem with this method is, for one, lot’s of other bottles of wine will have the same date, and if this is the only indicator of their locations in the cellar, our steward is still stuck with going through each bottle to retrieve one. Additionally, the cellar’s space will not be used optimally, or, well, at all really, if there’s room for hundreds of bottles and Giusto sticks everything in the first 1800s slots. And to top it off, Giusto might be drunk himself when he puts wine in the cellar. If he puts a bottle in a space already occupied, he might crush the first bottle.
But we’re on the right track. We are looking for a number to index our bottles. And we’re looking for a way to relate the name of the bottle, or whatever may be its referent, to the location index in the cellar.
Wouldn’t it be great if we could just take each name and turn it into a number? Such that all names would be treated the same and we could maximize our use of the cellar’s space?
What we really want is to apply one function to all objects so as to get a randomly uniform distribution.
The Hashing Function: an Index Generator
The index generator has to be random. We do not care about the name of the bottle once we have an index for it. So we do something totally unrelated to the bottle, just to get a number. The function is only pseudo-random though, because it is deterministic. Given a single name, the index generator will always provide us with the same index, every time. It treats the symbols in the name, and not the name’s meaning.
The index generator has to be uniform. We apply the same method of making an index to every bottle of wine that goes into the cellar. No more schlepping around that phone-book. We just make an index, and go get the bottle.
Lastly, the index generator has to consider the space available. Without this caveat, the index generator could not only generate small numbers, which is fine, but also really big ones. So we scale the index to the number of slot’s in the cellar. If there’s room for two thousand bottles, the index generator makes an index of one through two hundred.
So now, when Jefferson asks for the Scuppernong Carolina Vineyard Select 1812, Giusto just runs that title through the index generator, and goes to that index, just like you would if you wanted pick an item out of an array. So simple.
Up to this point, we haven’t needed to cast any spells. But this next little part requires us to remember that computers can instantaneously make more space. I have never heard of construction done on a wine cellar while it was full of wine.
So we have one problem. The index generator doesn’t store the numbers that have already been generated. There’s a chance that two different bottles could generate the same index number. In the case of a two hundred bottle cellar, there’s a one in two hundred chance of that happening. Low odds, but non-zero probability nonetheless. To avoid crushing the bottle currently occupying the slot at that index, Giusto casts a spell, and makes a new slot at the same index. With enough bottles, numerous index slots will hold a handful of bottles, not so many that it’s a hassle for Giusto to look through and find the one Jefferson wants.
Lets go through once more what Giusto has to insert a bottle of wine when we treat a wine cellar as a hashtable.
He starts with a name and a bottle.
He turns the name into an index number with a hash function.
He goes to see if this index has a slot. If there’s no slot for this index, he makes one (an empty array).
And he slides this bottle along with its name into the slot.
Now lets what he has to do to retrieve a bottle.
- He starts with a name.
- He turns the name into an index number with a randomly uniform distributor.
- If this index number doesn’t exist, he returns to Jefferson with nothing.
- Otherwise, he looks through the slot for the bottle with the right name.
If it’s there, he returns to Jefferson with the bottle.
That’s how a hashtable works.