How to visualise the top 500 board games in 2-D using t-SNE in R

David Foster
Apr 5, 2017

The website Board Game Geek (BGG) surfaces data on just about every board game ever created and it’s got an API. You can probably guess what happens next…

Scraping the data

There’s a great Python scraper that taps into the API here.

I adapted it slightly to capture the categories and mechanics for each of the top 500 games as ranked on BGG. Each game can have multiple categories and mechanics, so the resulting CSV has 84 columns for the categories and 51 columns for the mechanics, each holding a 0 or 1 to indicate whether that category or mechanic applies to the game.
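To make the 0/1 layout concrete, here is a minimal sketch in base R. It is not the scraper's code (that is Python and isn't reproduced here). The three games and their labels are toy rows, and the names labels, counts and onehot are hypothetical.

```r
# Toy input: one row per (game, mechanic) pair. Illustrative values only.
labels <- data.frame(
  game     = c("Game A", "Game A", "Game B", "Game C", "Game C"),
  mechanic = c("Hand Management", "Set Collection",
               "Card Drafting",
               "Card Drafting", "Set Collection"),
  stringsAsFactors = FALSE
)

# table() cross-tabulates games against mechanics;
# a count above zero means the mechanic is present for that game.
counts <- table(labels$game, labels$mechanic)
onehot <- (unclass(counts) > 0) * 1L

# One row per game, one 0/1 column per mechanic.
print(onehot)
```

The real file has the same shape, just wider: one row per game and 84 + 51 = 135 indicator columns.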

t-SNE / clustering in R

The data is then read into R, and t-SNE is performed on the binary category / mechanics columns using the Rtsne package. The clustering is performed after the t-SNE dimensionality reduction, using hclust.
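A rough sketch of the t-SNE step, assuming the Rtsne package is installed. The feature matrix below is random stand-in data with the same shape as the real file (500 games, 135 binary columns). The perplexity, the seed and the variable names are my assumptions, not settings taken from the repo.

```r
library(Rtsne)

set.seed(42)  # t-SNE is stochastic; fix the seed for a repeatable layout

# Stand-in for the real data: 500 games x 135 binary columns
# (84 categories + 51 mechanics). Random values, for illustration only.
n_games <- 500
n_cols  <- 84 + 51
features <- matrix(rbinom(n_games * n_cols, size = 1, prob = 0.1),
                   nrow = n_games, ncol = n_cols)

# By default Rtsne refuses input that contains duplicate rows. Two games
# with identical category / mechanic profiles would trigger that, so the
# check is switched off here (an assumption about how to handle them).
tsne_out <- Rtsne(features, dims = 2, perplexity = 30,
                  check_duplicates = FALSE)

coords <- tsne_out$Y  # one row of 2-D coordinates per game
plot(coords, pch = 19, cex = 0.5, xlab = "t-SNE 1", ylab = "t-SNE 2")
```

With random input the plot is just a blob. The structure only appears when games with similar categories and mechanics share columns, as in the real data.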

I used Ward's method (ward.D2 in hclust) as the agglomeration method because it generally produces roughly equally sized clusters. I cut the tree at 30 groups because that seemed to colour the t-SNE plot nicely and produce well-defined clusters that weren’t too small or too big.
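The clustering step might look something like this in base R. The coordinates are random placeholders for the t-SNE output. The Euclidean distance and the rainbow palette are assumptions on my part, and only the ward.D2 method and the cut at 30 groups come from the text above.

```r
set.seed(1)

# Stand-in for the 2-D t-SNE coordinates of 500 games (random, illustrative).
coords <- matrix(rnorm(500 * 2), ncol = 2)

# Ward's method on Euclidean distances between the embedded points.
tree <- hclust(dist(coords), method = "ward.D2")

# Cut the dendrogram into 30 groups.
segment <- cutree(tree, k = 30)

# Cluster sizes, to check that none is too small or too big.
print(table(segment))

# Colour each point by its segment.
plot(coords, col = rainbow(30)[segment], pch = 19, cex = 0.6,
     xlab = "t-SNE 1", ylab = "t-SNE 2")
```

Trying a few values of k and looking at both the size table and the coloured plot is a quick way to repeat the "does it colour nicely" check.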


t-SNE has done a pretty good job of grouping together board games with a similar theme or playing style. Let’s zoom in on a portion of the plot and try to attach some meaning to the segments.

Purple Segment (network building style games that involve hand management / card drafting / set collection)

Ticket to Ride
Taj Mahal

Yellow Segment (card games with card drafting / set collection)

Close to the purple segment because of the shared card drafting / set collection mechanics, but these are strictly card games.

7 Wonders

Red Segment (economic games that involve hand management)

Close to the yellow segment because these are mostly card games that use the hand management mechanic, but they have a stronger economic flavour.

Through the Ages: A New Story of Civilization

Next steps

I’m planning to build a web app that uses this analysis to recommend a board game to play / buy based on your personal preferences. Stay tuned…

Check out the GitHub repo for the R code and segments. High-res image of the t-SNE plot is here. All other images are from Board Game Geek. And if you’re ever in London, check out Draughts, a board games cafe. It’s awesome.

Please do leave us a green heart at the bottom of the page, if you found this useful, interesting or cool. ‘Pity hearts’ also accepted.

I’m a co-founder of Applied Data Science, a London-based consultancy that implements end-to-end data science solutions for businesses. If you’re a business that wants to do more with your data, do get in touch.
