This will help you save money on your next trip to Milan

Data Science for travellers on a budget

Valentina Ceriani
Sep 1

Milan is one of the most-visited tourist destinations in the European Union, with 8.81 million visitors in 2017, and it's a must-stop on any Italian trip. So how do you choose the perfect Airbnb for an authentic Milanese vibe without emptying your wallet? Lucky for you, I've done the math so you don't have to.

I followed the CRISP-DM (Cross-Industry Standard Process for Data Mining) process to extract insights from the data. While doing the Udacity Data Science Nanodegree, I stumbled upon the Airbnb Open datasets, which cover most major cities around the world and offer plenty of datasets you can download and explore.
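If you want to follow along, here is a minimal pandas sketch of loading a listings file and turning prices into numbers. It assumes a `listings.csv` export whose `price` column holds strings such as `"$1,200.00"`, which is the usual layout of these Airbnb exports but is an assumption here; `parse_price` and `load_listings` are hypothetical helpers, not the code from my repo.

```python
import pandas as pd


def parse_price(series: pd.Series) -> pd.Series:
    """Convert price strings like "$1,200.00" into floats (NaN if unparseable)."""
    # Strip the currency symbol and thousands separators, then coerce to numbers.
    cleaned = series.astype(str).str.replace(r"[$,]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")


def load_listings(path: str) -> pd.DataFrame:
    """Read a listings export (assumed layout) and clean the price column."""
    df = pd.read_csv(path)  # also reads .csv.gz files, inferring compression
    df["price"] = parse_price(df["price"])
    # Drop missing or zero prices, which would distort averages (NaN > 0 is False).
    return df[df["price"] > 0]
```

Usage would be something like `listings = load_listings("listings.csv")`, where the file name is a placeholder for whichever export you download.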

Milan Airbnb accommodations map. Image from: http://insideairbnb.com/milan/

So, where should I stay in Milan?

Here are the statistics. Don't worry, we'll go through them together.
In the plot below, I've selected the 20 Milanese neighbourhoods with the most accommodations currently listed on Airbnb, so you have the broadest choice available. For each of them, I've plotted the price distribution, together with the mean price across the whole city. Note that the neighbourhoods are sorted by number of accommodations.
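The numbers behind a selection like this can be computed in a few lines of pandas. This is a sketch, assuming a DataFrame with a numeric `price` column and a neighbourhood column named `neighbourhood_cleansed` (a column name assumed from the usual export layout); `top_neighbourhood_prices` is a hypothetical helper, and the plotting step itself is left out.

```python
import pandas as pd


def top_neighbourhood_prices(
    df: pd.DataFrame, n: int = 20, col: str = "neighbourhood_cleansed"
) -> tuple[pd.DataFrame, float]:
    """Summarise the price distribution of the n neighbourhoods with most listings.

    Returns one row per neighbourhood, sorted by listing count (descending),
    plus the mean price over the whole city to use as a reference line.
    """
    counts = df[col].value_counts().head(n)  # value_counts sorts by count
    subset = df[df[col].isin(counts.index)]
    summary = subset.groupby(col)["price"].describe()  # count, mean, quartiles, ...
    summary = summary.loc[counts.index]  # restore the count-based ordering
    city_mean = float(df["price"].mean())
    return summary, city_mean
```

The quartile columns (`25%`, `50%`, `75%`) are what a box plot per neighbourhood would show, and `city_mean` would be the horizontal reference line.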

Long story short:

  • If you're not on the fancy side, Corso Buenos Aires - Porta Venezia is the place for you: it currently has the greatest number of accommodations, so you have a good chance of finding one that fits your needs. Moreover, prices there are clearly lower than in Duomo and Brera. The Central Railway Station area (Centrale) is a good deal too, but you need to hurry: housing here seems to be in high demand and harder to find.
  • If you want to be in the spotlight, Duomo has a good range of housing, but be prepared to pay well above the Milan average; you may be better off staying in Navigli or Ticinese, which seem to have a more reasonable price range. If you love the narrow streets of Brera, be prepared to book in advance and pay a higher price.

*Bonus tip from a local: if you're looking for a quieter place to stay with excellent public transport connections, Isola is my personal favourite spot in Milan.*

What about room types?

Are you the kind of person who doesn't like to share an apartment with strangers? I've got you covered. If you're willing to move away from the usual tourist spots, you can actually find entire homes for less than a single room costs there. Città Studi, the university district, offers great value for money. Viale Padova is the cheapest area, most likely because of its bad reputation.
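A comparison like this can be checked with a pivot table of median price by neighbourhood and room type. A minimal sketch, assuming `room_type` values such as `"Entire home/apt"` and `"Private room"` and a `neighbourhood_cleansed` column (both assumed from the usual export layout); the helper names are hypothetical, and the median is used here because a few luxury listings can drag the mean up.

```python
import pandas as pd


def median_price_by_room_type(
    df: pd.DataFrame, col: str = "neighbourhood_cleansed"
) -> pd.DataFrame:
    """Median price with neighbourhoods as rows and room types as columns."""
    return df.pivot_table(index=col, columns="room_type", values="price", aggfunc="median")


def cheaper_entire_homes(table: pd.DataFrame, reference_room_price: float) -> pd.Series:
    """Neighbourhoods whose median entire-home price is below a reference room price."""
    entire = table["Entire home/apt"].dropna()
    return entire[entire < reference_room_price].sort_values()
```

As the reference you could pass, for example, the median private-room price of a central neighbourhood taken from the same table; the exact row label depends on how the dataset spells neighbourhood names.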

Which other variables correlate most with prices?

Hungry for more data insights? Here you go: an analysis of the correlations between the features in our dataset.

The features most correlated with price are essentially the size-related ones: how many guests the listing accommodates and how many bedrooms and bathrooms it has. No surprise there. Perhaps more fascinating is that the number of reviews is negatively correlated with price, meaning that listings with hundreds of reviews tend to be cheaper. Is it because they offer the best value for money?
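A sketch of this step: compute the Pearson correlation of every numeric column with `price` and rank the features by strength. Column names such as `accommodates`, `bedrooms`, `bathrooms` and `number_of_reviews` are assumed from the usual export layout, and `price_correlations` is a hypothetical helper. Boolean amenity flags would need converting to integers first, since `select_dtypes(include="number")` skips bool columns.

```python
import pandas as pd


def price_correlations(df: pd.DataFrame) -> pd.Series:
    """Pearson correlation of each numeric feature with price, strongest first."""
    numeric = df.select_dtypes(include="number")  # ignore text columns
    corr = numeric.corr()["price"].drop("price")
    # Rank by absolute value so strong negative correlations come out on top too.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)
```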

What about the negative correlation between price and the free street parking feature? Shouldn't it be the other way around? Well, here's a possible explanation: in Milan, central areas have no free street parking, so listings that offer it are probably in the suburbs, where prices are lower. Surprised? You shouldn't be: remember, correlation doesn't imply causation.
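One way to probe this explanation (a check I'm adding here, not part of the original analysis) is to compare the raw correlation with a within-neighbourhood one: subtract each neighbourhood's mean from both the price and the parking flag, then correlate what is left. If the negative correlation shrinks or flips, location is a plausible confounder. This sketch assumes the amenities are stored as text in an `amenities` column containing the label `Free street parking`; both are assumptions about the data, and `parking_correlations` is a hypothetical helper.

```python
import pandas as pd


def parking_correlations(
    df: pd.DataFrame, col: str = "neighbourhood_cleansed"
) -> tuple[float, float]:
    """Raw vs within-neighbourhood correlation between price and free street parking."""
    # 1.0 if the amenities text mentions free street parking, else 0.0.
    parking = df["amenities"].str.contains("Free street parking", regex=False).astype(float)
    raw = parking.corr(df["price"])
    # Remove each neighbourhood's mean so only variation inside an area remains.
    groups = df[col]
    parking_within = parking - parking.groupby(groups).transform("mean")
    price_within = df["price"] - df["price"].groupby(groups).transform("mean")
    within = parking_within.corr(price_within)
    return float(raw), float(within)
```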

Next steps and open points

This analysis was good fun, and a way to better understand my own city. Some final considerations:

  • Prices in Milan do not seem to be strongly linearly correlated with the amenities and other accommodation features, unlike in cities such as London or Boston. In fact, my best linear model couldn't get above an R² of 0.3, which is a very poor result. I should definitely try a tree-based approach, also given the large number of one-hot-encoded variables.
  • It would be interesting to look at accommodation availability and how prices shift over time.
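To try the tree-based idea from the first point, the linear model and a tree ensemble can be compared on the same features using cross-validated R². A sketch with scikit-learn, assuming `X` is an all-numeric feature matrix with no missing values (for instance built with `pd.get_dummies(..., dtype=float)` on room type, neighbourhood and amenity flags). The random forest is just one possible tree-based model (gradient boosting would be another), and `compare_models` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def compare_models(X: pd.DataFrame, y: pd.Series, cv: int = 5, seed: int = 0) -> dict[str, float]:
    """Mean cross-validated R^2 of a linear model and a random forest on the same data."""
    models = {
        "linear": LinearRegression(),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    # cross_val_score returns one R^2 per fold; average them per model.
    return {
        name: float(np.mean(cross_val_score(model, X, y, cv=cv, scoring="r2")))
        for name, model in models.items()
    }
```

A forest can pick up thresholds and interactions between one-hot columns that a single linear equation cannot, which is why it is a natural next candidate when linear R² stays low.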

If you've made it this far, congrats! Here's my GitHub repo, so you can have a look at the code.


Datagonist

Where data plays the leading role.
