## Solving a real-world problem with a well-known data structure

Jul 17 · 5 min read

This is the first piece in a new series: Fun With Data Structures.

In this piece, we are going to solve a real-world problem using a well-known data structure — dictionaries (or hash table).

We have a multiplayer space survival game. There are players, spaceships, galaxies, planets… And the list goes on.

One day, we decide to introduce a new feature:

Every second, the game randomly selects a spaceship out of all the ships in the game and teleports it to another galaxy. Just for fun.

For the sake of simplicity, assume that every spaceship has a unique integer `id` and `name` . They are stored in a dictionary; `id` as key and `name` as value.

There are millions of spaceships in the game. Players are constantly creating, destroying, and updating spaceships.

Our task is to implement an algorithm to pick a random spaceship from this dictionary (return a random item from a dictionary).

Time complexity assumptions:

• Generating random number: O(1) time
• Search/insert/delete in dictionary: O(1) time
• Removing an item from a list, O(n) time
• Appending an item to a list, O(1) time

Sample dictionary input for `spaceships`:

`spaceships = {  5:   "Ghost",  40:  "Death Star",  16:  "B-wing",  ...}`

Sample output: `Death Star` (randomly picked)

## First attempt: generate a random int as key

It seems fairly simple, right? Just create a random number and use it as key to picking an item from the dictionary. Let's assume that every spaceship has an `id` between `0` and `100000000`. It is easy to generate a random number in this interval:

Let's analyse this algorithm.

Time complexity: O(1)

Extra space complexity: O(1)

Wow. This is as good as it gets. But does this algorithm really work?

No.

This would work only if we were sure that there are exactly 100000001 items in the dictionary and their ids are consecutive, starting from 0. But remember, players can create or destroy spaceships any time they want. There might be just eleven spaceships at one time, and three million spaceships at another time. They have unique ids, and there can be gaps between these ids.

## Second attempt: move keys to a list

OK, it is not that straightforward to randomly choose an element from a dictionary. How about lists? They have a certain order and length. We can easily choose a random element from a list.

Let’s put all the keys in the `spaceships` dictionary in a list and pick a random element from this list:

It definitely works. (How about data race?)

Time complexity: O(n) to copy dictionary keys to a list.

Extra space complexity: O(n) to keep keys in the list.

Well, not that bad. We can afford O(n) memory.

But remember, we are going to call this function every second. Which means it is going to cost O(n) time every second. It will be really slow if there are millions of spaceships.

Do we really need to recreate `key_list` every time `get_random_ship` is called? Can we do better?

## Third attempt: mapper dictionary

Instead of creating a new key list every time, we can create it once and use it for every call.

To do that properly, we need to keep `key_list` and `spaceships` in sync:

• Whenever a key is removed from `spaceships`, remove this key from `key_list`
• Whenever a new item is added to `spaceships`, append its key to `key_list`

We can append new keys to `key_list` in O(1) time, but what about removing? We need to find the key in the list, which will take O(n) time. We also need to remove it, which will take another O(n) time.

If we knew the position of the key in the list, we could have just found it in O(1) time, and, with a clever trick, we could remove it in O(1) time.

If we want to access something in O(1) time, dictionaries are our best shot. It is time to introduce mapper dictionary, `key_mappings`, which keeps the positions of each key in `key_list`.

Here is a snapshot of three data structures we use:

`spaceships = {  5:   "Ghost",  40:  "Death Star",  16:  "B-wing",}key_list = [5, 40, 16]key_mappings = {  5:  0,  # Key 5  is at position 0 in `key_list`  40: 1,  # Key 40 is at position 1 in `key_list`  16: 2,  # Key 16 is at position 2 in `key_list`}`

By looking into `key_mappings` we know that key `5` is at position `0` in `key_list` array, key `16` is at position `2`, and so on… Now we can update `key_list` in O(1) time at the expense of extra O(n) space of `key_mappings` .

Let’s go through an example and add a new spaceship, `4: "Starfighter"`

1. Add new value to spaceships: `spaceships[4] = "Starfighter"`
2. Append `4` to key_list: `key_list.append(4)`
3. Add the index of new key to mappings: `key_mappings[4] = len(key_list)`
`spaceships = {  5:   "Ghost",  40:  "Death Star",  16:  "B-wing",  4:   "Starfighter",}key_list = [5, 40, 16, 4]key_mappings = {  5:  0,  40: 1,  16: 2,  4:  3,}`

Let’s remove the key `40` from spaceships. We know that `40` is at position `1` in `key_list` . The trick is, instead of removing `40` directly from `key_list`, we can put the last element into this position in O(1) time.

1. Remove item from spaceships: `del spaceships[40]`
2. Find the index of the key in key_list: `index = key_mappings[40]`
3. Replace it with the last key: `key_list[index] = key_list.pop()`
`spaceships = {  5:   "Ghost",  16:  "B-wing",  4:   "Starfighter",}# Notice that how we moved last element into position 1key_list = [5, 4, 16]# Notice that we set the value of `4` to 1 and removed key `40`.# Although we can keep key 40 there instead of removing.key_mappings = {  5:  0,    16: 2,  4:  1,}`

Time complexity:

• One time O(n) to create `key_list`
• O(1) every time we update `spaceships` dictionary.
• O(1) to get a random spaceship.

Space complexity:

• O(n) for `key_list` and O(n) for `key_mappings`

Try to implement the same logic in a relational database where spaceships are stored in a table.

Written by