How I Ruined My Vacation By Not Using Data

Tiwaa Bruks
The Great Imposter
5 min read · Jan 10, 2019

Inspired by a recent trip to Paris, where I stayed in an Airbnb disastrously far away from most of the attractions I actually went to see, I decided to create a program that shows the most popular neighbourhoods in a city according to Airbnb.

Introduction — Why am I writing this?

As a data analyst I’m constantly asking myself the same question: “Is there a dataset for that?” You see, I’m uncomfortable with ambiguity, so while it may be overkill, I’m always happier when I base my decisions on evidence. However, it seems that there is an exception to this rule: vacation planning. I abhor planning vacations. I know that I should be grateful that I even get to take a vacation, but as a complex individual I reserve the right to be simultaneously appreciative AND censorious. Just ask any teenager.

Deciding where to stay on holiday is a tedious and time-consuming rabbit hole, and anyone who disagrees is most likely in cahoots with the enemy. Let’s say you’ve decided to spend your 25th birthday in Paris. You do what almost everyone does when planning a trip: get on Google and look up “where to stay in Paris”. Naturally you take a look at Airbnb, then TripAdvisor¹, and so on and so forth. However, if you’re a bit anal like me, then you’ll probably still be online three hours later, trudging through user SVvirgin4eva’s account on Reddit of everything that’s wrong with Paris. It’s so horrifying that you’re rethinking your choice of holiday destination. Plus, you’ve spent so much time on Google Maps trying to figure out whether your choice of Airbnb is centrally located enough that you’re on a first-name basis with the app’s algorithm, and practically a local at this point. Not to mention, you’re a millennial, so you are perpetually punching above your weight class every. single. time. you make a purchase. This means that although you have your eye on a beautiful penthouse with a panoramic view of the City of Lights, you’ll probably make the smarter decision and go with the more low-key Airbnb so you can save your money for what really counts.

Okay, so you’ve finally decided on a place to stay. Well done, Sherlock! You get there and it’s lovely, cue the singing birds. There’s just one problem: the first attraction on your list, you know, the one that tops TripAdvisor’s list of things to do in Paris and that Condé Nast called the “to die for” attraction of 2018, yeah, that one. Well, it’s a 35-minute drive from the Airbnb², which is a $40 round-trip Uber ride.

There has got to be an easier way to decide where to stay on holiday. I chose to use code to alleviate some of this stress on my next vacation.

How did I do it?

First, I downloaded scraped³ Airbnb data from Inside Airbnb. For the purpose of this essay I defined popularity as the number of bookings in each neighbourhood. Since Airbnb doesn’t publish that data, I used the number of reviews as a conservative proxy: in 2014 Brian Chesky, Airbnb’s CEO, commented that about 72% of guests leave a review for hosts, so review counts undercount bookings rather than inflate them. Plus, Airbnb itself uses growth in bookings as an indicator for its annual list of top trending destinations.
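To make the proxy concrete, here is a minimal sketch of how a review count could be scaled up to a rough booking estimate using that 72% figure. The function name `estimate_bookings` is hypothetical, and the map in this post uses raw review counts rather than this scaling; the point is only to show why reviews are a conservative stand-in.

```python
def estimate_bookings(review_count, review_rate=0.72):
    """Rough booking estimate, assuming a fixed share of guests leave a review.

    review_rate=0.72 is the figure quoted from 2014; treating it as constant
    across neighbourhoods and years is an assumption, not a known fact.
    """
    if not 0 < review_rate <= 1:
        raise ValueError("review_rate must be in (0, 1]")
    return review_count / review_rate


# A listing with 72 reviews would correspond to roughly 100 stays.
print(round(estimate_bookings(72)))
```

Because every neighbourhood is divided by the same rate, the ranking is identical whether you use reviews or estimated bookings, which is why the raw counts are enough for the map.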

Second, I summed the number of reviews by neighbourhood using Python’s Pandas library.
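That aggregation step might look something like the sketch below. It assumes the listings data has `neighbourhood` and `number_of_reviews` columns; the rows here are made-up toy values, not real figures from the dataset, and `reviews_by_neighbourhood` is a name I picked for illustration.

```python
import pandas as pd

# Toy rows standing in for the downloaded listings file (values are invented).
listings = pd.DataFrame({
    "neighbourhood": ["Buttes-Montmartre", "Popincourt", "Buttes-Montmartre",
                      "Louvre", "Popincourt"],
    "number_of_reviews": [120, 80, 45, 10, 60],
})


def reviews_by_neighbourhood(df):
    """Total reviews and number of listings per neighbourhood, most reviewed first."""
    return (
        df.groupby("neighbourhood")
          .agg(total_reviews=("number_of_reviews", "sum"),
               listing_count=("number_of_reviews", "count"))
          .sort_values("total_reviews", ascending=False)
    )


popularity = reviews_by_neighbourhood(listings)
print(popularity)
```

Keeping a listing count next to the review total is handy later, when asking whether the most popular neighbourhoods are simply the ones with the most listings.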

Third, I fed that result into a Leaflet map I created. In the result below, each neighbourhood is color-coded by popularity, a choropleth of sorts: the more saturated the neighbourhood, the more popular it is. This way the message is both impactful and easily digestible. I broke the data into classes using the k-means algorithm, which clusters data points in a way that minimizes within-group variance and maximizes between-group differences.
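For the curious, the classing idea can be sketched in plain Python as a one-dimensional k-means (Lloyd's algorithm). This is an illustration under my own assumptions, not the exact routine behind the map: the function name `kmeans_1d`, the way the starting centres are chosen, and the sample totals are all invented for the example.

```python
def kmeans_1d(values, k, max_iter=100):
    """Group 1-D values into k classes by repeatedly assigning each value
    to its nearest centre and moving each centre to its class mean."""
    data = sorted(values)
    distinct = sorted(set(data))
    if not 1 <= k <= len(distinct):
        raise ValueError("k must be between 1 and the number of distinct values")
    # Deterministic start: pick k centres spread evenly over the distinct values.
    n = len(distinct)
    centres = [distinct[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # An empty class keeps its old centre instead of dividing by zero.
        new_centres = [sum(c) / len(c) if c else centres[i]
                       for i, c in enumerate(clusters)]
        if new_centres == centres:
            break  # assignments are stable
        centres = new_centres
    return clusters, centres


# Invented review totals for eight neighbourhoods, split into three colour classes.
totals = [10, 12, 15, 140, 150, 165, 900, 950]
clusters, centres = kmeans_1d(totals, k=3)
print(clusters)
```

Each resulting class would then get its own shade on the map, with the class holding the largest totals getting the most saturated colour.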

Live version | Code | Jupyter Notebook

What did I find out?

Disclaimer: Most popular does not translate to best place to stay for vacation. I’m just easily influenced and like to follow the crowd. I know, I know, I’m ashamed too.

Next Steps

Now that we know where most tourists stay in Paris, I’d like to know the reasons behind that popularity. Evidently it isn’t as clear-cut as proximity to main attractions. The Eiffel Tower and Musée d’Orsay (two of Paris’ most famous attractions according to TripAdvisor) are located in Palais Bourbon, but it’s 17th on the list of most popular neighbourhoods. The Louvre, which hosts the museum of the same name, is last on the list, most likely because it is the smallest Parisian neighbourhood and most accommodations there are exclusive 5-star hotels, which are almost as illustrious as the guests that stay in them.

Paris is a fairly compact city with an incredibly efficient and affordable public transportation system, so maybe proximity to attractions is not as important a factor. Can the same be said of spread-out cities like Los Angeles and Miami?

I also found that the top 3 neighbourhoods had the largest volume of listings. Is it as simple as that? If you build it, they will come.

Perhaps a different measure of popularity would be more appropriate? I’m also curious to see what a heatmap of Parisian attractions would look like, because then tourists could just book in neighbourhoods that are close enough to the most popular attractions.

I plan to explore other factors that might conclusively prove which is the best place to stay and why, so if you’d like to discover that answer with me then stay tuned.

Also, this is my first blog post as an aspiring data scientist, so any feedback and comments are greatly appreciated. Thanks :)

TL;DR:

Where I stayed:

Where everyone else stayed:

Great job Tiwaa! -_-

¹ Neither Airbnb nor TripAdvisor provides rankings, only neighbourhood descriptions.

² Obviously an exaggeration to prove my point (it was the 4th item on the list, and Condé Nast called it “to die for” in 2011, not 2018). I’m not completely inept, only slightly.

³ Last scraped December 7th, 2018.
