The Missing Boarding Pass Problem: A Simulation in Python

Anthony Schams
The Startup
Published in
8 min readJul 22, 2019
Photo by JESHOOTS.COM on Unsplash

100 people are boarding a plane. They each have a unique assigned seat, printed on their ticket. The first person boarding has lost their pass, and therefore does not know their assigned seat. They take a random seat on the plane. The rest of the people board, one-by-one, sitting in their assigned seat or taking a random unoccupied seat if theirs is taken. What is the probability the last person boarding gets their assigned seat?

This problem was given to me by a friend, who was TA-ing an introductory math-for-biology-majors course at the University of Maryland. The professor she was TA-ing for gave this problem to the class to work on during a recitation period. Interestingly, the students as a whole were convinced the answer was 50%, because the last person boarding either gets their seat or does not. By that logic, we should all buy lottery tickets because we each have a 50% chance of winning.

I felt that this was an unnecessarily cumbersome problem if you were to go at it directly, calculating the probability of every single possibility. While analytic solutions exist, some of which can be found here, I hypothesized that this problem could be solved with a simulation. I elected to do so using Python.

When I began coding, I set out a number of steps for myself to complete the simulation:

  1. Create a ‘plane’ (an array) of 100 ‘seats’.
  2. Create a seating chart, randomly assigning each person to a seat.
  3. Randomly assigning the first person to a seat.
  4. Board the rest of the plane, with each person taking their assigned seat or a random, unoccupied seat.
  5. Count whether or not the last person to board got their assigned seat.
  6. Repeat steps 1–5 a bunch of times.
  7. Find the probability of the last person boarding getting their assigned seat.

My initial simulation function looked like this:

It’s admittedly pretty ugly, but it’s functional. We build a queue of passengers, assign seats, and slowly fill up the plane. The workhorse is the assign_seat function, which takes the seating chart and a passenger’s assigned seat. It then figures out where that person will sit. Once we get to the final person, we then simply check if their seat is taken. If it is not, we increment lggas (last guy/gal gets assigned seat). While this logically does everything we need it to do, and it gets the correct solution to the problem, it does so in a very slow and cumbersome way, trundling towards the solution. For one, we randomly assign seats to each passenger. A simple improvement over this algorithm would be to just assume, without loss of generality, that every person’s assigned seat is in the order they boarded the plane. (Person 1 in seat 1, person 2 in seat 2, etc.) This would eliminate the shuffling of seats as well as the constant lookups for a person’s assigned seat. This simplification of the problem also removes the necessity of the assigned seats array, saving memory space.

I then worked to eliminate the assigned seats array, coming up with the solution airplane_sim2 below:

If you run this second solution and compare its runtime to the first, you’ll notice it is much faster. You’ll also, with sufficient repetition, realize what the solution to the problem is (more on that in a second). In this second solution, the only array is where each person is sitting.

After line 3, each plane looks something like one of these columns.

We are making the assumption that the j-th person is assigned to seat j. Once again the first person is randomly assigned to a seat. For each passenger we then check if someone is sitting in their assigned seat (line 5). If so, then that person is assigned to a random seat behind them in the plane, or to seat 0 (line 6). This is done because all prior seats besides seat 0 must be taken because of how the plane is boarded, and seat 0 is the first boarder’s seat. Random assignment of seats can only occur if they are not sitting there, so we can safely assume that the seat is empty. The last line is a bit of a hack that I thought of when trying to vectorize this entire process. (I thought it was cool so I put it in all my solutions.) Because of the way the problem is proposed, the last person boarding the plane either gets their seat (99) or the first boarder’s assigned seat(0). We can therefore convert this value to a boolean and sum along the row to get the total number of times the last person got their seat. This is completely unnecessary for a single plane simulation, but is nevertheless an interesting way to count non-zero elements in a row.

After the for loop in line 4, each plane will look like one of these columns

This simulation confirms how the solution comes to form. Looking at the seat of the last passenger, it appears that this person indeed either gets their seat or the first boarder’s assigned seat. This means that we can determine whether or not the last person gets their seat depending on if someone has taken seat 0. Once seat 0 is taken we know immediately that every remaining person will get their assigned seat. We also can see that if the first person sits in seat j, then every passenger before seat j gets their seat. This means we can further optimize our solution by examining only the people whose seat have been taken. This implementation is below.

As we can see, we take a big shortcut in only looking at people whose seat had been taken, which eliminates the for loop and replaces it with a while loop. At worst this while loop will run (n_passengers -1) times, but will likely run much fewer. It also only runs until someone is assigned to seat 0, at which point the seat the last boarder gets is set.

I also came across a blog by David Robinson (the specific blog can be found here) in which he solved this problem in R. He used a different strategy to randomly assign seats, in which a person who lost their seat to another passenger instead swaps positions with a later passenger. My implementation of his algorithm is below.

Dr. Robinson also performs his simulation on an entire matrix instead of a 1-D array/vector, which he shows speeds up computation 20x. I was unable to implement his algorithm on a matrix in Python, but the matrix implementation of my algorithm is below.

This generates an (n_passengers x n_planes) 2-D array that we will work on. After the first boarder is assigned their random seat, the magic of this implementation happens at lines 8 and 9. For each passenger, line 8 finds the planes in which that person’s seat is already taken. Line 9 then assigns each of those people kicked out of their seat to a new seat (to either somewhere behind them or seat 0). (Notice that each person is assigned a ‘different’ random seat, not to the same assigned seat across all planes. I spent a lot of time debugging that for myself.) Once again, we use the cute, hacky way to find how many times the last boarder got their seat and return the proportion of times they got it.

Speed

Which is faster? Python or R? My implementation or Dr. Robinson’s? My speeds on Dr. Robinson’s code are below. ~2.6 ms to simulate 1000 100-person planes using his matrix solution.

Unfortunately, I was not able to replicate Dr. Robinson’s matrix algorithm exactly in Python. As I was able to implement his single-plane algorithm, we will examine the speeds of that.

My second algorithm, helping seat passengers on a single plane
Dr. Robinson’s algorithm seating passengers on a single plane

As we can see, my second algorithm is ~10% faster than Dr. Robinson’s algorithm for the single-plane problem. What about the better optimized algorithm?

My optimal solution for seating passengers on a single plane

As we can see, my optimal solution for single-plane boarding is about 20x faster than my previous implementations using a for loop. It appears that the substitution of the while loop really sped up this process! This algorithm also performs 3 times faster than Dr. Robinson’s single-plane performance. Because Dr. Robinson saw a 20x speed increase when acting on a matrix instead of a single plane, I was excited to hopefully see the same result. Unfortunately, I wasn’t really able to get this algorithm to work on a matrix well, but I nevertheless attempted it:

The while loop runs while there exists a plane in the matrix in which a person is not sitting in seat 0. (Which will ultimately go through to the last person because we expect the last person to get seat 0 in half of the planes.) It then loops through the seats that have been taken and assigns that person to a random seat, much like in the other simulations. The seat randomly assigned is then remembered, with this process repeating until all seat 0’s are taken. It’s speed to seat one thousand100-passenger planes is shown below.

Performance of the optimized algorithm on a matrix in Python

While I couldn’t figure out how to utilize matrix operations well, we still saw an increase in speed. Even with this speedy algorithm being performed on a matrix, we see that Dr. Robinson’s R implementation is about 3 times faster than my algorithm in Python, despite my algorithm performing better on the single plane problem.

Why Simulation?

Because analytic solutions exist for this problem, you might ask ‘Why even bother using a simulation?’ While a simulation isn’t necessary for this problem specifically, there are countless problems and questions where a simulation IS more practical. For example, I may ask ‘What is the average number of people who were randomly assigned seats when boarding the plane?’ or ‘What is the variance of the number of people assigned to random seats?’ This is a simple extension for a simulation like the one we have here, but requires a bit more work analytically.

Simulations can also serve as a sanity check when you aren’t confident in an analytic solution. An example of this is with Paul Erdos and the Monty Hall Problem. (You can read more about that here.)

Take-aways

  1. Simplify the problem where possible. In this problem, we assumed without loss of generality that assigned seats were the order the passengers boarded, which reduced the overhead of our algorithm.
  2. Understand the problem and its nuances. Realizing many passengers get their assigned seat allowed us to eliminate a lot of slicing/checking of arrays.
  3. (From Dr. Robinson) Vectorize when possible. Vectorize across simulation replications if individual simulations can’t be vectorized. We got between 2 to 20x faster by vectorizing.
  4. Simulations allow us to approximately answer some questions easily when analytic solutions are difficult (assuming they exist).

--

--