Predicting the Sole Survivor

Analyzing Survivor through its network structure.

Quinn Theobald
Analytics Vidhya
Sep 6, 2021

Survivor logo (image property of CBS)

I have been watching the game show Survivor my whole life, quite literally: the first season premiered just days after I was born. Now every time I visit home, the new season is on the family agenda.

In the past few years, I began to wonder whether it was possible to predict the winner of a season of Survivor from that season’s alliance network. Survivor is a social game after all, almost entirely determined by who your friends are and who your enemies are. Can we determine who will win the season before the final votes are read?

My Hypotheses

Survivor is a strategic-social game where contestants stranded on a deserted island form alliances and vote each other off until there is just one Sole Survivor left. There are challenges, and there are advantages, but growing up I always thought that by far the most important part of the game was a contestant’s alliances.

I wanted to investigate the following two questions:

  1. Can you predict the winner of a season of Survivor by alliance network (out of the final three contestants)? How accurate would this prediction be?
  2. There is a sense among Survivor fans of the evolution of the game. People today are more likely to make temporary alliances than people in the past, and so have a much greater number of alliances in total. Would this trend appear in the network structure? More concretely, would the network structure become more dense the more recent the season?

Spoiler alert: the answer to both those questions is no. But rather than a failed experiment, this finding has some fascinating implications for the game of Survivor.

Analysis

First off, I had to define what counts as an alliance. Counting the number of times players agree to an alliance onscreen is not only difficult but also unreliable: the producers decide which conversations to show, and players often agree to alliances they don’t really stick to. Instead, I defined an alliance as existing between two contestants if they had voted together at least twice. This way, alliance data could be calculated from voting histories, which are available on the Survivor wiki. In addition, alliances are discrete: two players either have one or they do not.
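As a minimal sketch of that rule, assume each tribal council is stored as a dictionary mapping a voter to the contestant they voted against. This layout, the function name `alliance_edges`, and the contestant names are all hypothetical and are not the wiki's format or the code used for this project.

```python
from collections import Counter
from itertools import combinations


def alliance_edges(councils, min_shared_votes=2):
    """Return the set of allied pairs implied by a season's voting history.

    `councils` is a list of dicts, one per tribal council, mapping each
    voter's name to the contestant they voted against (assumed layout).
    """
    shared = Counter()
    for votes in councils:
        # Group the voters at this council by the target they chose.
        blocs = {}
        for voter, target in votes.items():
            blocs.setdefault(target, []).append(voter)
        # Every pair inside a bloc voted together once at this council.
        for bloc in blocs.values():
            for pair in combinations(sorted(bloc), 2):
                shared[pair] += 1
    # Alliances are discrete: a pair either meets the threshold or it does not.
    return {pair for pair, count in shared.items() if count >= min_shared_votes}


# Toy voting history with made-up names.
councils = [
    {"Ana": "Dev", "Ben": "Dev", "Cy": "Dev", "Dev": "Ana"},
    {"Ana": "Cy", "Ben": "Cy", "Cy": "Ana"},
]
print(alliance_edges(councils))
```

In the toy data only Ana and Ben vote the same way twice, so they form the only alliance; Cy voted with them once and falls below the threshold.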

Of course, this method catches a couple of fake alliances and misses some true alliances where the players didn’t get a chance to vote together twice. But I believe it is still a good representation of the true alliance network of the season. And even if the resulting network structure is a little off from the true alliance network, it captures enough of that data that there would be some correlation between network structure and winning if that relationship existed. As we shall see, there was no correlation whatsoever.

Using this definition of an alliance, I scraped the Survivor wiki for the voting histories of all 40 seasons. With this data, I used Gephi to produce a visualization of the alliance structure for each and every season.

Season 1: Borneo
Season 40: Winners at War

The size of each node represents its eigenvector centrality: essentially, how central that node is to the overall structure and how well connected it is to other central nodes. The color intensity of each node represents its betweenness centrality, or how often that node sits on the shortest path between two other nodes and so connects otherwise disconnected parts of the network.
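To make the two measures concrete, here is a rough pure-Python sketch. It assumes the network is stored as a dictionary of neighbor sets, uses power iteration for eigenvector centrality and Brandes' algorithm for unnormalized betweenness, and runs on a made-up four-player network. Gephi's own scaling and normalization may differ, so treat the numbers as illustrative only.

```python
from collections import deque


def eigenvector_centrality(adj, iterations=200, tol=1e-9):
    """Power iteration on A + I (the shift keeps it stable on bipartite graphs)."""
    x = dict.fromkeys(adj, 1.0)
    for _ in range(iterations):
        new = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = sum(val * val for val in new.values()) ** 0.5
        new = {v: val / norm for v, val in new.items()}
        if sum(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x


def betweenness_centrality(adj):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes)."""
    score = dict.fromkeys(adj, 0.0)
    for source in adj:
        order = []                          # nodes in order of discovery
        preds = {v: [] for v in adj}        # predecessors on shortest paths
        paths = dict.fromkeys(adj, 0)       # number of shortest paths from source
        paths[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: s / 2 for v, s in score.items()}


# Made-up network: Cy is the only link between Dev and the others.
network = {
    "Ana": {"Ben", "Cy"},
    "Ben": {"Ana", "Cy"},
    "Cy": {"Ana", "Ben", "Dev"},
    "Dev": {"Cy"},
}
eig = eigenvector_centrality(network)
btw = betweenness_centrality(network)
print(max(eig, key=eig.get), btw)
```

In this toy network Cy comes out on top under both measures, since every shortest path to Dev runs through Cy.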

It seems intuitive that high eigenvector and betweenness centrality would lead to a higher chance of winning: if you are allied with more central people, there is a greater chance they can convince others to vote for you, and if you are allied with people who have few other allies, there is a much greater chance those individuals will vote for you.

The fact that this relationship did not hold was quite surprising. It turned out not to matter how central a player was to the alliance network: the most central contestant had no better chance of winning than either of the other final three (or, in some seasons, the other member of the final two).

Alliance Density

I graphed alliance density against season number to create the following:

Alliance density vs. season

There is only weak evidence that the seasons get denser over time. In fact, a flat trend and even negative slopes fall within the confidence interval! Players in more recent seasons seem to make just as many alliances as players from twenty years ago.
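For readers who want to reproduce this kind of check, the sketch below computes density as the share of possible pairs that are allied, 2m / (n(n - 1)), and fits an ordinary least-squares slope with an approximate 95% interval. The function names are hypothetical, and the interval uses the normal value 1.96 rather than the exact t value, so it is an approximation and not necessarily the method behind the plot above.

```python
def density(n_players, n_alliances):
    """Share of all possible pairs that are allied: 2m / (n * (n - 1))."""
    return 2 * n_alliances / (n_players * (n_players - 1))


def slope_with_interval(xs, ys, z=1.96):
    """Least-squares slope of ys on xs with an approximate 95% interval.

    z=1.96 is a normal approximation (assumption); with about 40 points
    the exact t-based interval would be slightly wider.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    std_err = (residual_ss / (n - 2) / sxx) ** 0.5
    return slope, (slope - z * std_err, slope + z * std_err)


# Made-up densities for five seasons, purely to show the call.
seasons = [1, 2, 3, 4, 5]
densities = [0.50, 0.48, 0.55, 0.47, 0.52]
slope, (low, high) = slope_with_interval(seasons, densities)
print(f"slope={slope:.4f}, interval=({low:.4f}, {high:.4f})")
```

If the interval straddles zero, as it does for the real data described above, the trend cannot be distinguished from a flat line.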

But that doesn’t mean the show isn’t evolving. As a viewer of the show, it is clear to me that strategies have changed over the years. After investigating this data, though, I now suspect that alliance density is not one of the things that changed, or at least that it changed only slightly. The number and diversity of alliances players form has not shifted in any significant, systematic way.

Predicting a Winner

The bigger shocker was that network centrality had no power to predict the Sole Survivor.

Predicting the winner based on eigenvector centrality was only accurate 37.5% of the time. Predicting the winner based on betweenness centrality was correct a whopping 32.5% of the time. Randomly guessing (averaged between seasons with two or three final players) had about a 40% accuracy!
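With 40 seasons, those figures correspond to 15 and 13 correct picks respectively. The comparison against random guessing can be sketched as follows, assuming each season is a dictionary with `finalists`, `winner`, and `centrality` keys. That layout and the two toy seasons are made up for illustration and are not the project's data.

```python
def predict_winner(finalists, centrality):
    """Pick the finalist with the highest centrality score."""
    return max(finalists, key=lambda name: centrality[name])


def prediction_accuracy(seasons):
    """Fraction of seasons where the most central finalist actually won."""
    hits = sum(
        predict_winner(s["finalists"], s["centrality"]) == s["winner"]
        for s in seasons
    )
    return hits / len(seasons)


def random_guess_accuracy(seasons):
    """Expected accuracy of a uniform guess, averaged over seasons.

    A final two gives 1/2, a final three gives 1/3.
    """
    return sum(1 / len(s["finalists"]) for s in seasons) / len(seasons)


# Two made-up seasons: one final three, one final two.
toy_seasons = [
    {
        "finalists": ["Ana", "Ben", "Cy"],
        "winner": "Ben",
        "centrality": {"Ana": 0.61, "Ben": 0.40, "Cy": 0.55},
    },
    {
        "finalists": ["Dev", "Eli"],
        "winner": "Eli",
        "centrality": {"Dev": 0.30, "Eli": 0.70},
    },
]
print(prediction_accuracy(toy_seasons), random_guess_accuracy(toy_seasons))
```

The baseline depends on the mix of final-two and final-three seasons, which is why it sits between 1/3 and 1/2.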

I ran a bootstrap resampling of the data to check whether there was actually a negative correlation between centrality and winning. I found that with only 40 seasons, it is unsurprising that random noise alone would land the prediction accuracy of these models below the random-guess line. The data are therefore consistent with there being no relationship between alliance centrality and winning, and they certainly show no positive relationship!

Bootstrap resampling of prediction accuracies
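One plausible way to run such a resampling is sketched below. It assumes the per-season results are encoded as 1 (correct pick) or 0 (incorrect pick) and resamples the 40 seasons with replacement. The function names, the fixed seed, and the percentile interval are assumptions for illustration, so the exact procedure behind the figure may differ.

```python
import random


def bootstrap_accuracies(outcomes, n_resamples=10_000, seed=0):
    """Resample seasons with replacement and return each resample's accuracy."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(outcomes)
    return [sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)]


def percentile_interval(values, level=0.95):
    """Simple percentile interval from the sorted bootstrap values."""
    ordered = sorted(values)
    tail = (1 - level) / 2
    lower = ordered[int(tail * len(ordered))]
    upper = ordered[int((1 - tail) * len(ordered)) - 1]
    return lower, upper


# 37.5% accuracy over 40 seasons means 15 correct picks and 25 misses.
outcomes = [1] * 15 + [0] * 25
accuracies = bootstrap_accuracies(outcomes)
low, high = percentile_interval(accuracies)
print(f"95% interval: {low:.3f} to {high:.3f}")
```

If the random-guess accuracy of about 40% falls inside the interval, the model's accuracy is statistically indistinguishable from guessing.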

Results

In the end, I found that the alliance network couldn’t really predict the winner of Survivor. This result was very surprising: it seems completely counterintuitive to me that, in a game so reliant on social maneuvering, the social structure had no predictive power for winning! So what could this mean for the game of Survivor?

Well, there are several possibilities. It could be that other parts of the game weigh so heavily in the minds of the jury that the social aspect is dwarfed: things like winning challenges, finding hidden advantages, or surviving despite being the underdog. It could also be that the network data is muddied for another reason: perhaps near the end of the game everyone votes with everyone, obscuring the true network of people who really have each other’s backs. An alliance formed in the last two days is probably less predictive of a jury member voting for their ally than an alliance that has held since day one. It could also be that the network structure is important for getting to the final three, but not as important in deciding which of them wins. I’m sure I have not exhausted the list of possible explanations for this finding.

This opens the door to some really fascinating further research opportunities. Predictors based on challenge wins, idols found, or the number of tribal councils where the player received at least one vote (to measure ‘underdog’-ness) would test which aspects of the game do have predictive power for winning. Maybe none of these factors are correlated with winning at all, or perhaps a multivariate predictor would find some success.

It would also be really interesting to analyze the network structure at the merge: the halfway point in the game where separate teams all join up and everyone gets to meet everyone else. A predictor based on this midway structure might be revealing: because everyone has been on separate teams up until this point, shuffled every couple of episodes, we could potentially see small-world-like structures and a ‘truer’ alliance network. A predictor based on this network would exclude those last-minute alliances and might be more accurate (incidentally, because there are ten to thirteen players at the merge rather than three, the random-guess accuracy will be close to 1/12 and confidence intervals will be narrower). Or you could use this data to investigate how the merge network structure predicts reaching the final three rather than winning.

Other Findings

In the process of this analysis I stumbled upon another fascinating find. I had always imagined that Survivor mirrored the social structures of the real world. We don’t vote each other out, but we form friendships that we rely upon to get ahead in life. But none of the most common patterns that have been shown to develop in real-life networks appeared in the networks of Survivor: small-world structure, preferential attachment, core-periphery structure…

Instead, the networks for the 40 seasons of Survivor followed one of two trends. Either the network was a single interconnected blob, or two opposing teams.

Season 21. An interconnected blob.
Season 22. Two opposing teams.

This plays out in-game. Some seasons felt like free-for-alls, while others felt like a pitched battle between two teams.

The thing is, Survivor is not real life. Survivor is a competitive game with a million-dollar prize. Players in the network are always ‘on the offensive’, actively promoting their own interests to avoid being voted out. Structures that arise naturally in ‘real’ networks are often unconscious and gradual. In Survivor, there is a lot less time and a lot less wiggle room for unconscious and gradual structures to emerge.

Final Remarks

By the end of this project I realized Survivor is not the metaphor for life boiled down to its basics that I thought it was. And the alliance network is not nearly as important to winning as I had assumed.

And yet… I’m not yet ready to give up on the idea that the social network of this game can give some insight into how it works. Perhaps a study in one of the aforementioned research directions could show some connection between network centrality and success – maybe socially central players last longer, or elevate the chances of winning for their friends.

Until then I will keep watching Survivor and predicting the winner the old-fashioned way: picking my favorite contestant and yelling out with my family during tribal council.

Acknowledgements

The network concepts used in this project (such as eigenvector centrality and small-world network structures) were taught in INFO 4360 at Cornell University.

A big thanks to Professor Drew Margolin for assuring me that my null results were just as interesting as positive results, if not more so.

Quinn Theobald loves data science, books, and theatre. Cornell ‘22.