Sicily Airbnb homes analysis
In this post we’ll explore data on Airbnb homes in Sicily, as part of a project for Udacity’s Data Scientist Nanodegree.
According to Wikipedia:
Sicily (Italian: Sicilia [siˈtʃiːlja]; Sicilian: Sicilia [sɪˈʃiːlja]) is the largest island in the Mediterranean Sea and one of the 20 regions of Italy. The region has 5 million inhabitants. Its capital city is Palermo.
Sicily is in the central Mediterranean Sea, south of the Italian Peninsula, from which it is separated by the narrow Strait of Messina. Its most prominent landmark is Mount Etna, the tallest active volcano in Europe, and one of the most active in the world, currently 3,329 m (10,922 ft) high. The island has a typical Mediterranean climate.
The earliest archaeological evidence of human activity on the island dates from as early as 12,000 BC. By around 750 BC, Sicily had three Phoenician and a dozen Greek colonies and it was later the site of the Sicilian Wars and the Punic Wars. After the fall of the Roman Empire in the 5th century AD, Sicily was ruled during the Early Middle Ages by the Vandals, the Ostrogoths, the Byzantine Empire, and the Emirate of Sicily. The Norman conquest of southern Italy led to the creation of the County of Sicily in 1071, which was succeeded by the Kingdom of Sicily, a state that existed from 1130 until 1816. Later, it was unified under the House of Bourbon with the Kingdom of Naples as the Kingdom of the Two Sicilies. The island became part of Italy in 1860 following the Expedition of the Thousand, a revolt led by Giuseppe Garibaldi during the Italian unification, and a plebiscite. Sicily was given special status as an autonomous region on 15 May 1946, 18 days before the Italian institutional referendum of 1946.
Sicily has a rich and unique culture, especially with regard to the arts, music, literature, cuisine, and architecture. It is also home to important archaeological and ancient sites, such as the Necropolis of Pantalica, the Valley of the Temples, Erice and Selinunte. Byzantine, Arab, Roman and Norman rule over Sicily has led to a blend of cultural influences.
From my point of view, Sicily is a wonderful island, which I explored a bit on bicycle and where I would return at any time. Here are some pictures from my trip.
Where to stay in Sicily
The loveliest place I’ve stayed in Sicily was this mountain hut at 1820m altitude, called Rifugio Timparossa, but I definitely have a special taste regarding comfort 🙊 So, better trust the data, not myself! 🤫
Udacity’s team suggested the following directions:
- understand how much Airbnb homes are earning in certain time frames and areas
- compare rates between some cities
- try to understand if there is anything about the properties that help predict the price
- find negative and positive reviews based on text
What I’ve managed to do so far:
- where to find an Airbnb home in Sicily
- how are the prices distributed
- which are the cheapest/most expensive neighborhoods
- which room types are available
- who and where are the hosts with the most reviews
- where are the verified hosts or the super-hosts
- how are the features of a home correlated
- how much a feature impacts the price of a property
- are the prices higher during the summer
- are the properties available in the future
- a basic exploration of what people say about the places they’ve been to
Where are the Airbnb homes
We’ll use the map coordinates, latitude and longitude, to plot a heatmap of the Airbnb homes available.
The properties are all over the island, but also in the little islands nearby, like Isola di Pantelleria or Lampedusa.
Most of them are near the coast, in the big cities like Palermo, the capital city, but also in Catania, Siracusa, Ragusa, Agrigento and Trapani.😎
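A heatmap is, at its core, a count of points per map cell. Here is a minimal sketch of that idea in plain Python; the coordinates are made up for illustration (roughly around Palermo, Catania and Siracusa), and `grid_counts` and `cell_size` are hypothetical names, not something from the original notebook.

```python
from collections import Counter

# Hypothetical (latitude, longitude) pairs; real values would come from the listings data.
coords = [
    (38.1157, 13.3615),  # around Palermo
    (38.1201, 13.3570),
    (38.1189, 13.3702),
    (37.5079, 15.0830),  # around Catania
    (37.5023, 15.0874),
    (37.0755, 15.2866),  # around Siracusa
]

def grid_counts(points, cell_size=0.1):
    """Count points per grid cell; each cell is cell_size degrees wide and tall."""
    counts = Counter()
    for lat, lon in points:
        cell = (int(lat // cell_size), int(lon // cell_size))
        counts[cell] += 1
    return counts

cell_size = 0.1
heat = grid_counts(coords, cell_size)
for (row, col), n in heat.most_common():
    print(f"cell starting at lat {row * cell_size:.1f}, lon {col * cell_size:.1f}: {n} listings")
```

A mapping library would then color each cell by its count; the binning step above is the part that turns raw coordinates into "hot" and "cold" areas.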
Where are the super hosts
If you want to be really picky, try only the super hosts, although the options are not that many and I don’t actually know what a super host is 😀
What about the verified hosts
You might also want to book with verified hosts only, but that’s a matter of trust, and depending on the time of the year you’ll have to choose from what’s available.
Which are the most crowded neighborhoods
Palermo (the capital city), Siracusa and Catania have the largest number of Airbnb homes. The other cities in this top 10 are smaller in area, but still offer a wide range of homes.
Which room types are available
Seems like you get mostly the entire home or apartment, which is great!
The mean price of our Airbnb homes is $95 and the median is $60, so the data is highly right-skewed; still, 75% of prices are below $93.
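To see why a mean well above the median signals skew, here is a small sketch with made-up nightly prices (not the real data). A couple of very expensive homes pull the mean up, while the median and the 75th percentile barely move.

```python
import statistics

# Made-up nightly prices in dollars, chosen only to show a right-skewed shape.
prices = [35, 40, 45, 50, 55, 60, 65, 70, 80, 90, 150, 400]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)
# quantiles(n=4) returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(prices, n=4)

print(f"mean={mean_price:.1f}, median={median_price:.1f}, 75th percentile={q3:.1f}")
```

In this toy sample the mean even lands above the 75th percentile, the same pattern as in the Sicily prices.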
Here are the cheapest neighborhoods by median price, counting only neighborhoods with at least 100 Airbnb homes available. So, you can visit Catania, Trapani, Messina, Palermo and other cities for less than $50 per night.
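The ranking above boils down to: group prices by neighborhood, drop the small groups, take the median, sort. A minimal sketch, assuming the data is available as (neighborhood, price) pairs; the function name and the sample rows are made up for illustration.

```python
from collections import defaultdict
from statistics import median

def cheapest_neighborhoods(rows, min_listings=100, top=5):
    """Return (neighborhood, median price) pairs, cheapest first.

    rows: iterable of (neighborhood, price) pairs.
    Neighborhoods with fewer than min_listings rows are skipped.
    """
    prices_by_area = defaultdict(list)
    for area, price in rows:
        prices_by_area[area].append(price)
    medians = [
        (area, median(prices))
        for area, prices in prices_by_area.items()
        if len(prices) >= min_listings
    ]
    return sorted(medians, key=lambda pair: pair[1])[:top]

# Tiny made-up sample, so the threshold is lowered from 100 to 3 here.
sample = [
    ("Catania", 40), ("Catania", 45), ("Catania", 50),
    ("Taormina", 80), ("Taormina", 85), ("Taormina", 95),
    ("Erice", 30),  # only one listing, so it is filtered out
]
result = cheapest_neighborhoods(sample, min_listings=3)
print(result)
```

Reversing the sort order gives the most expensive neighborhoods instead. The minimum-count filter keeps a single bargain listing from making a whole neighborhood look cheap.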
Most expensive neighborhoods
Taormina, Trecastagni and Noto are the most expensive places, with a median price of around $85 per night, but I can assure you that they are worth it!
Prices by category values
We’ll check the distribution of prices across the values of qualitative features such as neighborhood, room/property type and host response time, to see whether prices go up for some categories.
There are no significant differences for features like:
- the host identity is verified or not
- the host has a profile picture
- the property is instantly bookable
There are some differences for:
- the availability of the home: the median price is lower and the range of prices is wider for available homes.
- the host type: the price range for normal hosts (about 85% of all hosts) is wider and the median price is a bit higher than for the super-hosts.
There are clear differences in price for:
- the host response time: when hosts respond within a day, both the median and the price range are higher
- the number of bathrooms: 3 or 4 baths definitely means a higher price (I’ve plotted only 10 categories; the bath types grouped under ‘Other’ vary a lot, from more than 5 baths to some shared baths)
- the property type: an entire villa is also more expensive (not a surprise either)
Categorical variables’ impact on the price
Here’s another way to estimate how the price is influenced by the categorical variables. For each variable, prices are partitioned into distinct sets based on category values. Then we use the ANOVA test to check whether these sets have similar means. If a variable has only a minor impact, the set means should be nearly equal.
From the ANOVA tests, it seems like the property type and the neighborhood are the most important categorical features in terms of impact on price, which is definitely not a big surprise.
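To make the ANOVA idea concrete, here is a hand-rolled sketch of the one-way ANOVA F statistic: the variation between group means divided by the variation inside the groups. The prices and groupings are made up, and `anova_f` is a hypothetical helper, not the code behind the results above. It assumes at least two groups and some variation inside the groups (otherwise the division fails).

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numbers."""
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of the values around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Made-up prices, split once by property type and once by "instantly bookable".
by_property_type = [[50, 60, 70], [150, 180, 210]]   # apartments vs. villas
by_instant_booking = [[50, 150, 70], [60, 180, 210]]  # bookable vs. not

f_type = anova_f(by_property_type)
f_bookable = anova_f(by_instant_booking)
print(f"property type F={f_type:.2f}, instant booking F={f_bookable:.2f}")
```

A larger F means the category separates the prices better. In practice a library routine such as `scipy.stats.f_oneway` is the more convenient choice, since it also returns a p-value.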
Numerical variables’ impact on the price
We’ve encoded categorical variables by the mean price of category values (called target encoding). Then we’ve calculated Spearman correlation coefficients, which pick up relationships between variables even when they are nonlinear. Here are the obtained scores:
Judging by the Spearman scores, the main criteria in establishing Airbnb home prices are:
- bathrooms (target encoded)
- property type (encoded)
- neighborhood (encoded)
- number of reviews
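The two steps described above, target encoding and then Spearman correlation, can be sketched in plain Python. This is only an illustration under assumptions: the helper names and the sample values are made up, and Spearman is computed here as the Pearson correlation of the ranks (ties share their average rank).

```python
from collections import defaultdict

def target_encode(categories, prices):
    """Replace each category with the mean price of that category."""
    grouped = defaultdict(list)
    for cat, price in zip(categories, prices):
        grouped[cat].append(price)
    means = {cat: sum(v) / len(v) for cat, v in grouped.items()}
    return [means[cat] for cat in categories]

def average_ranks(values):
    """Rank values starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            ranks[i] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up sample: property type and nightly price.
property_type = ["apartment", "apartment", "villa", "villa", "room", "room"]
price = [60, 70, 200, 180, 30, 40]

encoded = target_encode(property_type, price)
rho = spearman(encoded, price)
print(encoded, round(rho, 3))
```

One thing to keep in mind: a target-encoded feature is built from the price itself, so its correlation with price tends to look stronger than that of a plain numerical feature such as the number of reviews.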
Another way to determine the features’ impact on price
We’ve used the available features to build a model that predicts the price.
Then we plot the feature importance of the model (CatBoost), to see which features affect the price the most.
The CatBoost model gives a new perspective on the features and their importance in predicting the price. There are many transformations one can apply to both the model and the features, which would probably lead to a different ranking of feature importance, but that is beyond the scope of this post.
Listings available by day
How many listings (or Airbnb homes) are available in the near future:
There are plenty of homes available until 2022! We must only pray for COVID to go away! 🙏
Average price by day
The property types are quite diverse, so we’ll look only at those for two people.
Prices do go up a bit during summer, but I wouldn’t pay more to be more uncomfortable during the hot summer days of Sicily 🌞
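The daily average above comes down to filtering the calendar to homes for two people and averaging the price per date. A minimal sketch, assuming calendar rows shaped as (listing id, date, price) and a lookup of how many guests each listing accommodates; all names and values here are made up.

```python
from collections import defaultdict

# Made-up data: guest capacity per listing and a few calendar rows.
accommodates = {1: 2, 2: 2, 3: 6}
calendar = [
    (1, "2021-03-01", 50.0), (2, "2021-03-01", 60.0), (3, "2021-03-01", 200.0),
    (1, "2021-08-01", 70.0), (2, "2021-08-01", 80.0), (3, "2021-08-01", 300.0),
]

def average_price_by_day(rows, capacity, guests=2):
    """Average price per date, using only listings that accommodate `guests` people."""
    totals = defaultdict(lambda: [0.0, 0])  # date -> [sum of prices, number of rows]
    for listing_id, day, price in rows:
        if capacity.get(listing_id) != guests:
            continue
        totals[day][0] += price
        totals[day][1] += 1
    # ISO dates sort correctly as plain strings.
    return {day: total / count for day, (total, count) in sorted(totals.items())}

daily = average_price_by_day(calendar, accommodates)
print(daily)
```

Restricting to one capacity keeps a big villa from dominating the average, which is the reason for looking only at homes for two.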
What people say about Airbnb homes in Sicily
Hosts with most reviews
Take me to the clouds
There are reviews in many languages, but most of them are in English, Italian or French. Here are the most common words people used in their reviews.
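Behind a word cloud there is just a word count. Here is a small sketch of that counting step, with made-up reviews and a tiny hand-written stop-word list; both are assumptions for illustration, and a real analysis would need stop words for each language.

```python
import re
from collections import Counter

# Hypothetical stop words; a real list would be much longer and per language.
STOPWORDS = {"the", "a", "and", "was", "is", "in", "to", "we", "very", "of"}

def top_words(texts, n=2):
    """Return the n most common words, ignoring case and stop words."""
    words = []
    for text in texts:
        # [^\W\d_]+ matches runs of letters, including accented ones.
        for word in re.findall(r"[^\W\d_]+", text.lower()):
            if word not in STOPWORDS:
                words.append(word)
    return Counter(words).most_common(n)

# Made-up reviews in two languages.
reviews = [
    "The apartment was very clean and the host was great",
    "Great location, great host",
    "Posizione ottima, host gentile",
]
top = top_words(reviews)
print(top)
```

The word cloud then simply draws each word with a size proportional to its count.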
The analysis was quite basic and incomplete: I’ve used the SentimentIntensityAnalyzer (the VADER sentiment analyzer available in Python) to get positivity, negativity, neutrality and compound scores for the reviews.
The scores are not that relevant:
- the score of neutrality is mostly 1
- the scores for positivity/negativity are mostly 0
Originally published at https://andreea-alexandrescu88.medium.com on February 17, 2021.