Published in eMAG TechLabs

Sicily: Airbnb homes analysis


In this post we’ll explore data about the Airbnb homes in Sicily, as part of a project for Udacity’s Data Scientist Nanodegree.

What is Sicily

According to Wikipedia:

Sicily (Italian: Sicilia [siˈtʃiːlja]; Sicilian: Sicilia [sɪˈʃiːlja]) is the largest island in the Mediterranean Sea and one of the 20 regions of Italy. The region has 5 million inhabitants. Its capital city is Palermo.

Sicily is in the central Mediterranean Sea, south of the Italian Peninsula, from which it is separated by the narrow Strait of Messina. Its most prominent landmark is Mount Etna, the tallest active volcano in Europe,[5] and one of the most active in the world, currently 3,329 m (10,922 ft) high. The island has a typical Mediterranean climate.

The earliest archaeological evidence of human activity on the island dates from as early as 12,000 BC.[6][7] By around 750 BC, Sicily had three Phoenician and a dozen Greek colonies and it was later the site of the Sicilian Wars and the Punic Wars. After the fall of the Roman Empire in the 5th century AD, Sicily was ruled during the Early Middle Ages by the Vandals, the Ostrogoths, the Byzantine Empire, and the Emirate of Sicily. The Norman conquest of southern Italy led to the creation of the County of Sicily in 1071, which was succeeded by the Kingdom of Sicily, a state that existed from 1130 until 1816.[8][9] Later, it was unified under the House of Bourbon with the Kingdom of Naples as the Kingdom of the Two Sicilies. The island became part of Italy in 1860 following the Expedition of the Thousand, a revolt led by Giuseppe Garibaldi during the Italian unification, and a plebiscite. Sicily was given special status as an autonomous region on 15 May 1946, 18 days before the Italian institutional referendum of 1946.

Sicily has a rich and unique culture, especially with regard to the arts, music, literature, cuisine, and architecture. It is also home to important archaeological and ancient sites, such as the Necropolis of Pantalica, the Valley of the Temples, Erice and Selinunte. Byzantine, Arab, Roman and Norman rule over Sicily has led to a blend of cultural influences.

From my point of view, Sicily is a wonderful island, which I explored a bit by bicycle and to which I would return at any time. Here are some pictures from my trip.

View from Monte Veneretta, near Taormina
Road towards Mount Etna
Syracuse archeological museum

Where to stay in Sicily

The loveliest place I’ve stayed in Sicily was a mountain hut at 1,820 m altitude, called Rifugio Timparossa, but I definitely have a special taste regarding comfort 🙊 So, better trust the data, not me! 🤫

Udacity’s team suggested the following directions:

  • understand how much Airbnb homes are earning in certain time frames and areas
  • compare rates between some cities
  • try to understand if there is anything about the properties that helps predict the price
  • find negative and positive reviews based on text

What I’ve managed to do so far:

  • where to find an Airbnb home in Sicily
  • how are the prices distributed
  • which are the cheapest/most expensive neighborhoods
  • which room types are available
  • who and where are the hosts with the most reviews
  • where are the verified hosts or the super-hosts
  • how are the features of a home correlated
  • how much a feature impacts the price of a property
  • are the prices higher during the summer
  • are the properties available in the future
  • a basic exploration of what people say about the places they’ve been to

Let’s explore

Where are the Airbnb homes

We’ll use the map coordinates, latitude and longitude, to plot a heatmap of the Airbnb homes available.

The properties are spread all over the island, and also on the small islands nearby, like Pantelleria or Lampedusa.
Most of them are near the coast and in the big cities: Palermo, the capital, but also Catania, Siracusa, Ragusa, Agrigento and Trapani.😎
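The idea behind the heatmap can be sketched without any mapping library: bin the coordinates into a coarse grid and count the homes in each cell. This is only a minimal illustration, not necessarily how the map above was drawn; the column names `latitude` and `longitude` and the sample rows are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the real listings table would supply these columns.
listings = pd.DataFrame({
    "latitude":  [38.12, 38.11, 38.13, 37.50, 37.51, 37.07],
    "longitude": [13.36, 13.35, 13.37, 15.09, 15.08, 15.29],
})

# Count listings per cell of a 4 x 4 latitude/longitude grid.
counts, lat_edges, lon_edges = np.histogram2d(
    listings["latitude"], listings["longitude"], bins=4
)

# Locate the densest cell, i.e. where listings cluster most.
row, col = (int(i) for i in np.unravel_index(np.argmax(counts), counts.shape))
densest_lat_range = (lat_edges[row], lat_edges[row + 1])
densest_lon_range = (lon_edges[col], lon_edges[col + 1])
```

A plotting or mapping library would then colour each cell by its count; a finer grid (larger `bins`) gives a smoother picture.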

Where are the super hosts

If you want to be really picky, look only at the super-hosts, although there aren’t that many options and I don’t actually know what a super-host is 😀

What about the verified hosts

You might also want to stick to verified hosts only, but that depends on how much trust you need and on the time of year, when you may simply have to choose from what’s available.

Which are the most crowded neighborhoods

Palermo, the capital city, Siracusa and Catania have the largest number of Airbnb homes. The other cities in this top 10 are smaller in area, but still offer a wide range of homes.
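A ranking like this one boils down to counting rows per neighborhood. Here is a minimal sketch with pandas, assuming the listings table has a `neighbourhood` column (the column name and the sample rows are made up for illustration).

```python
import pandas as pd

# Hypothetical sample of listings, one row per Airbnb home.
listings = pd.DataFrame({
    "neighbourhood": ["Palermo", "Palermo", "Palermo",
                      "Siracusa", "Siracusa", "Catania"],
})

# Number of homes per neighborhood, largest first, keeping the top 10.
top_neighbourhoods = listings["neighbourhood"].value_counts().head(10)
```

The same `value_counts()` call on a room type column would give the split between entire homes, private rooms and so on.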

Which room types are available

It seems you mostly get an entire home or apartment, which is great!

Cheapest neighborhoods

The mean price for our Airbnb homes is $95, while the median is only $60, so the data is highly right-skewed: a few very expensive homes pull the mean up, even though 75% of prices are below $93.
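To see why a mean well above the median signals right skew, here is a small sketch on made-up prices (the numbers are illustrative, not taken from the dataset). A single expensive home is enough to push the mean above the 75th percentile.

```python
import pandas as pd

# Hypothetical nightly prices; one luxury home sits far above the rest.
prices = pd.Series([30, 40, 50, 60, 60, 70, 90, 400])

mean_price = prices.mean()        # pulled up by the 400 outlier
median_price = prices.median()    # robust to the outlier
q75 = prices.quantile(0.75)       # 75% of prices are at or below this value
skewness = prices.skew()          # positive for a right-skewed distribution
```

Because of this, the median is the safer summary when comparing neighborhoods.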

Here are the cheapest neighborhoods by median price, counting only neighborhoods with at least 100 Airbnb homes available. So, you can visit Catania, Trapani, Messina, Palermo and other cities for less than $50 per night.
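The filter described above (median price per neighborhood, keeping only neighborhoods with enough homes) could look roughly like this. The column names, the helper `median_price_by_neighbourhood` and the sample data are assumptions; the threshold is set to 2 here only because the sample is tiny, where the post uses 100.

```python
import pandas as pd

# Hypothetical sample of listings with numeric nightly prices.
listings = pd.DataFrame({
    "neighbourhood": ["Catania", "Catania", "Catania",
                      "Taormina", "Taormina", "Noto"],
    "price": [40, 45, 50, 80, 90, 85],
})

def median_price_by_neighbourhood(df, min_listings):
    """Median price per neighborhood, cheapest first, dropping small groups."""
    stats = df.groupby("neighbourhood")["price"].agg(["median", "count"])
    return stats[stats["count"] >= min_listings].sort_values("median")

cheapest = median_price_by_neighbourhood(listings, min_listings=2)
```

Sorting in descending order instead gives the most expensive neighborhoods.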

Most expensive neighborhoods

Taormina, Trecastagni and Noto are the most expensive places, with a median price per night of around $85, but I can assure you that they are worth it!

Prices by category values

We’ll check how prices are distributed across the values of qualitative features such as neighborhood, room/property type and host response time, to see whether prices go up for some categories.

There are no significant differences for features like:

  • whether the host’s identity is verified
  • whether the host has a profile picture
  • whether the property is instantly bookable

There are some differences for:

  • the availability of the home: the median price is lower and the range of prices is wider for available homes.
  • the host type: the price range for normal hosts (about 85% of total hosts) is wider and the median price is a bit higher than for the super-hosts.

There are clear differences in price by:

  • host response time: when hosts respond within a day, both the median and the price range are higher
  • number of bathrooms: 3 or 4 baths definitely means a higher price (I’ve plotted only 10 categories; the ‘Other’ group varies a lot and includes homes with more than 5 baths as well as some with shared baths)
  • property type: an entire villa is also more expensive (not a surprise either)

Categorical variables’ impact on the price

Here’s another way to estimate how the price is influenced by the categorical variables. For each variable, prices are partitioned into distinct sets based on the category values. Then we use the ANOVA test to check whether these sets have similar means. If a variable has only a minor impact, the set means should be roughly equal and the test returns a large p-value; a small p-value suggests the variable matters.
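As a rough sketch of this procedure, a one-way ANOVA can be run with `scipy.stats.f_oneway`, passing one array of prices per category. The helper `anova_p_value`, the column names and the sample data are assumptions made for illustration.

```python
import pandas as pd
from scipy.stats import f_oneway

# Hypothetical sample: property type clearly separates prices,
# while the instant-booking flag does not.
listings = pd.DataFrame({
    "property_type": ["apartment"] * 4 + ["villa"] * 4,
    "instant_bookable": ["t", "f", "t", "f", "t", "f", "t", "f"],
    "price": [50, 55, 60, 65, 150, 160, 170, 180],
})

def anova_p_value(df, feature, target="price"):
    """Split the target by category and test whether the group means differ."""
    groups = [group[target].to_numpy() for _, group in df.groupby(feature)]
    return f_oneway(*groups).pvalue

p_property_type = anova_p_value(listings, "property_type")
p_instant_bookable = anova_p_value(listings, "instant_bookable")
```

Ranking the variables by p-value (smallest first) gives an ordering by apparent impact. Note that ANOVA assumes roughly normal groups with similar variances, which skewed prices may violate, so the ranking is best read as indicative.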

From the ANOVA tests, it seems that the property type and the neighborhood are the most important categorical features in terms of impact on price, which is definitely not a big surprise.

Numerical variables’ impact on the price

We’ve encoded the categorical variables by the mean price of each category value (this is called target encoding). Then we’ve calculated Spearman correlation coefficients, which work on ranks and so pick up monotonic relationships between variables even when they are nonlinear. Here are the obtained scores:

Judging by the Spearman scores, the main criteria for establishing Airbnb home prices are:

  • accommodates
  • bathrooms (target encoded)
  • property type (encoded)
  • neighborhood (encoded)
  • number of reviews
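The two steps described above, target encoding followed by Spearman correlation with the price, can be sketched as follows. Column names and sample values are assumptions, not the actual dataset.

```python
import pandas as pd

# Hypothetical sample of listings.
listings = pd.DataFrame({
    "property_type": ["room", "room", "apartment",
                      "apartment", "villa", "villa"],
    "accommodates": [1, 2, 2, 4, 6, 8],
    "price": [30, 40, 60, 70, 150, 200],
})

# Target encoding: replace each category with the mean price of its group.
listings["property_type_enc"] = (
    listings.groupby("property_type")["price"].transform("mean")
)

# Spearman correlation of every feature with the price, strongest first.
features = ["accommodates", "property_type_enc"]
scores = (
    listings[features + ["price"]]
    .corr(method="spearman")["price"]
    .drop("price")
    .sort_values(ascending=False)
)
```

One caveat: encoding a category with means computed on the same rows lets each home’s own price leak into its feature, which can inflate the scores a little, especially for rare categories.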

Another way to determine the features’ impact on price

We’ve used the available features to build a model that predicts the price.
Then we’ve plotted the feature importances of the model (CatBoost) to see which features affect the price most.

The CatBoost model gives a new perspective on the features and their importance in predicting the price. There are many transformations one can apply to both the model and the features, which would probably lead to a different ranking of feature importance, but that was beyond the scope of this post.

The future

Listings available by day

How many listings (or Airbnb homes) are available in the near future:

There are plenty of homes available until 2022! We can only pray for COVID to go away! 🙏
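Counting available homes per day is a simple aggregation once there is one row per listing and date. A minimal sketch, assuming a calendar table with `listing_id`, `date` and an `available` flag stored as "t"/"f" (these names and values are assumptions):

```python
import pandas as pd

# Hypothetical calendar: one row per listing per day.
calendar = pd.DataFrame({
    "listing_id": [1, 1, 2, 2, 3, 3],
    "date": pd.to_datetime(["2021-07-01", "2021-07-02"] * 3),
    "available": ["t", "f", "t", "t", "t", "t"],
})

# Keep only available rows, then count distinct listings for each date.
available_per_day = (
    calendar[calendar["available"] == "t"]
    .groupby("date")["listing_id"]
    .nunique()
)
```

Plotting `available_per_day` as a line chart gives the availability curve over time.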

Average price by day

The property types are quite diverse, so we’ll look only at those for two people.

Prices do go up a bit during summer, but I wouldn’t pay more to be more uncomfortable during the hot summer days of Sicily 🌞
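Restricting the comparison to homes for two people and averaging the price per day could be sketched like this. The table layout (`id`, `accommodates`, `listing_id`, `date`, `price`) and the numbers are assumptions for illustration.

```python
import pandas as pd

# Hypothetical listings and their daily calendar prices.
listings = pd.DataFrame({"id": [1, 2, 3], "accommodates": [2, 2, 6]})
calendar = pd.DataFrame({
    "listing_id": [1, 2, 3, 1, 2, 3],
    "date": pd.to_datetime(["2021-03-15"] * 3 + ["2021-08-15"] * 3),
    "price": [50.0, 60.0, 200.0, 70.0, 80.0, 300.0],
})

# Keep only the homes that accommodate exactly two people.
two_person_ids = listings.loc[listings["accommodates"] == 2, "id"]
two_person_calendar = calendar[calendar["listing_id"].isin(two_person_ids)]

# Average nightly price for each calendar day.
avg_price_per_day = two_person_calendar.groupby("date")["price"].mean()
```

Filtering first keeps large villas from dominating the daily average.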

What people say about Airbnb homes in Sicily

Hosts with most reviews

Take me to the clouds

There are reviews in many languages, but most of them are in English, Italian or French. Here are the most common words people used in their reviews.
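A word cloud is essentially a picture of word frequencies, so the counting step can be sketched with the standard library alone. The sample reviews, the tiny stop-word set and the helper `word_counts` are made up for illustration; a real run would need stop words for each language.

```python
import re
from collections import Counter

# Hypothetical reviews in more than one language.
reviews = [
    "Great location and a lovely host",
    "Lovely apartment, great view of the sea",
    "Posizione ottima, host gentile",
]

# Very small, illustrative stop-word list.
STOP_WORDS = {"a", "and", "of", "the"}

def word_counts(texts, stop_words):
    """Count lowercase words across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        # Match runs of letters only (Unicode-aware), dropping digits and punctuation.
        words = re.findall(r"[^\W\d_]+", text.lower())
        counts.update(word for word in words if word not in stop_words)
    return counts

counts = word_counts(reviews, STOP_WORDS)
most_common = counts.most_common(3)
```

The resulting frequencies are what a word cloud library would turn into font sizes.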

Positive/negative reviews

The analysis was quite basic and incomplete: I’ve used the SentimentIntensityAnalyzer (the VADER sentiment analyzer available in Python) to get positive, negative, neutral and compound scores for the reviews.

The scores are not very informative:

  • the score of neutrality is mostly 1
  • the scores for positivity/negativity are mostly 0

Originally published at https://andreea-alexandrescu88.medium.com on February 17, 2021.
