Evaluation of the Boroughs in London, UK in order to identify the ‘Best Borough to Live’
Background
London is considered one of the world’s most important global cities and has been called the world’s most powerful, most desirable, most influential, most visited, most expensive, innovative, sustainable, most investment-friendly, and most-popular-for-work city.[1] Every year, thousands of people move to London, both from within the UK and from overseas. They settle in London for many reasons, such as a change in work commitments or the search for better living conditions. However, there are certain things they have to consider before moving in. London housing and rental prices are among the highest in the world, and other living costs are not cheap either. Given these and many other factors, deciding where to settle within London is a tough matter.
Problem
London has 32 boroughs, which differ from each other in many respects, such as cost of living, housing prices, and crime rates, to name a few. Our problem, therefore, is to find the best London borough to live in, considering the factors mentioned above.
Interest
Any newcomer has to learn about the factors mentioned above beforehand in order to decide on the best place to settle down. Furthermore, it is to any real estate agent’s advantage to be well informed on such matters whenever a client contacts them with such an inquiry. This knowledge will also be welcomed by property developers, as it helps them decide on the best places to build their next housing scheme.
Data description
Based on the definition of our problem, I’m going to use the following data sources to extract or generate the required information:
- Statistics about London boroughs will be obtained as a .csv file from London Borough Profiles and Atlas, London Data Store website [2]
- Important venues and other desired locations will be obtained using Foursquare API [3]
- London borough boundaries will be obtained as a .json file from Statistical GIS Boundary Files for London, London Data Store website [4]
Obtaining Data and Fine Tuning
The statistical data about London Boroughs was downloaded and cleaned, i.e. unnecessary fields and duplicates were removed, missing data was found and fixed, long column names were shortened, etc. The finished data set looks like this.
However, the values in this data set vary from small fractions to hundreds of thousands. Therefore, before performing any math on the data, I normalized it as follows.
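The exact normalization method is not stated here. As a minimal sketch, assuming min-max scaling (an assumption on my part) and using made-up column names and values rather than the real borough data, each column could be rescaled to the range 0 to 1 like this:

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def normalize_table(table):
    """Normalize every column of a {column: [values]} mapping independently."""
    return {col: min_max_normalize(vals) for col, vals in table.items()}


# Hypothetical values for three boroughs, on very different scales.
raw = {
    "house_price": [300000, 450000, 600000],
    "crime_rate": [0.05, 0.10, 0.08],
}
normalized = normalize_table(raw)
print(normalized["house_price"])  # [0.0, 0.5, 1.0]
```

After this step every column lies between 0 and 1, so a house price in the hundreds of thousands no longer swamps a rate expressed as a small fraction.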
Methodology
By looking at the above data set, we can clearly distinguish 2 different categories of stats:
- Stats where a higher value has a positive impact, e.g. Green space, Public Transport, Education, etc.
- Stats where a lower value has a positive impact, e.g. Crime rate, House Price, Council Tax, etc.
Therefore, for each Borough I will calculate 2 mean values, 1 over the positive stats and 1 over the negative stats, and then take the difference between the two. The Borough with the highest difference between the positive mean and the negative mean wins this contest.
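The scoring rule can be sketched as follows. This is an illustration only: the column groupings and the two boroughs with their already-normalized values are hypothetical, chosen to make the arithmetic easy to follow.

```python
from statistics import mean

# Column groupings are assumptions for illustration; only a few examples
# of each kind are named in the text above.
POSITIVE = ["green_space", "public_transport", "education"]
NEGATIVE = ["crime_rate", "house_price", "council_tax"]


def borough_score(row):
    """Mean of 'higher is better' stats minus mean of 'lower is better' stats."""
    positive_mean = mean(row[col] for col in POSITIVE)
    negative_mean = mean(row[col] for col in NEGATIVE)
    return positive_mean - negative_mean


def rank_boroughs(table):
    """Return (borough, score) pairs sorted from best to worst."""
    scores = {name: borough_score(row) for name, row in table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical, already-normalized values (0 to 1) for two made-up boroughs.
table = {
    "Borough A": {"green_space": 0.75, "public_transport": 0.5, "education": 0.25,
                  "crime_rate": 0.25, "house_price": 0.5, "council_tax": 0.0},
    "Borough B": {"green_space": 0.25, "public_transport": 0.25, "education": 0.25,
                  "crime_rate": 0.5, "house_price": 0.5, "council_tax": 0.5},
}
ranking = rank_boroughs(table)
print(ranking)  # [('Borough A', 0.25), ('Borough B', -0.25)]
```

A positive score means the borough's "good" stats outweigh its "bad" ones on average. Note that this rule weights every stat equally, which is a simplification.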
The results of this analysis can be seen in the following charts. According to these stats, Wandsworth wins as the most desirable Borough.
Choropleth Maps
I’m going to visualize the above findings geographically by creating a color-coded map using the Python folium library. To implement this, I need Geo-locations for our Boroughs, so I used a Wikipedia table [6] to obtain them. However, some of these Geo-locations were incorrect, so I used the Python Nominatim geocode() function to acquire the correct lat/long data by providing addresses. Our map visualizes nicely as follows.
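For readers who want to reproduce this step, here is a rough sketch of the geocoding and mapping calls. It assumes the geopy and folium packages are installed. The address format, the `user_agent` string, the GeoJSON property name in `key_on`, and the color scheme are all my assumptions, not details taken from the notebook.

```python
def borough_query(borough):
    """Build a free-text address for the geocoder (format is an assumption)."""
    return f"{borough}, London, UK"


def geocode_borough(borough, user_agent="london-borough-demo"):
    """Return (lat, lon) for a borough via Nominatim, or None if not found."""
    # Imported lazily so borough_query() works without geopy installed.
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    location = geolocator.geocode(borough_query(borough))
    if location is None:
        return None
    return location.latitude, location.longitude


def build_choropleth(geojson_path, scores, key_on="feature.properties.name"):
    """Color each borough polygon by its score and return the folium map."""
    import folium
    import pandas as pd

    data = pd.DataFrame(sorted(scores.items()), columns=["Borough", "Score"])
    # Approximate coordinates of central London, used only to center the view.
    london_map = folium.Map(location=[51.5074, -0.1278], zoom_start=10)
    folium.Choropleth(
        geo_data=geojson_path,
        data=data,
        columns=["Borough", "Score"],
        key_on=key_on,  # must match the name field in the boundary file
        fill_color="YlGn",
        fill_opacity=0.7,
        line_opacity=0.2,
        legend_name="Borough score",
    ).add_to(london_map)
    return london_map
```

The map returned by `build_choropleth` can be written to an HTML file with its `save()` method. The geocoding call needs network access, and the public Nominatim service expects requests to be spaced out, so a pause between lookups is advisable.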
By looking at the above map, we can see that:
- Outer London boroughs score higher than the Inner boroughs, probably because of the higher house prices and council taxes in the Inner Boroughs, and also because the Outer boroughs are bigger and more spacious, with lots of greenery and nature to enjoy.
- Some of the Geo-locations we used to mark the boroughs do not actually represent the center of that Borough.
Considering the size differences between boroughs (especially between Inner and Outer boroughs), it is unwise to retrieve venue data from Foursquare within a fixed radius for each Borough and compare the results: a relatively small radius will not cover enough area in the bigger Boroughs, while a relatively large radius might pull in venues from neighboring Boroughs when applied to the smaller ones.
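The coverage problem can be made concrete with a little geometry. The sketch below uses hypothetical areas for one small and one large borough (not figures from the data set) and treats the search circle as lying entirely inside the borough, so the fractions are upper bounds.

```python
import math


def equivalent_radius_m(area_hectares):
    """Radius in meters of a circle with the same area as the borough."""
    area_m2 = area_hectares * 10_000  # 1 hectare = 10,000 square meters
    return math.sqrt(area_m2 / math.pi)


def coverage_fraction(search_radius_m, area_hectares):
    """Share of the borough's area covered by one circular search, at most 1."""
    circle_m2 = math.pi * search_radius_m ** 2
    return min(1.0, circle_m2 / (area_hectares * 10_000))


# Hypothetical areas: a small inner borough and a large outer borough.
small_ha, large_ha = 1_500, 15_000
radius_m = 2_000

print(round(coverage_fraction(radius_m, small_ha), 2))
print(round(coverage_fraction(radius_m, large_ha), 2))
```

With these made-up numbers, a 2 km search covers roughly 84% of the small borough but only about 8% of the large one, so venue counts from the two searches would not be comparable.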
To solve this and simplify the process, especially now that we have found Wandsworth to be our go-to place, I will concentrate on Wandsworth alone and find out which attractive venues it provides to make it so popular.
Wandsworth is split into 20 areas or wards: Balham, Bedford, Earlsfield, East Putney, Fairfield, Furzedown, Graveney, Latchmere, Nightingale, Northcote, Queenstown, Roehampton and Putney Heath, Shaftesbury, Southfields, St Mary’s Park, Thamesfield, Tooting, Wandsworth Common, West Hill, and West Putney. So let us analyze these 20 Wards.
I got the Geo-locations from Google Maps. Special thanks to the creator(s) of the Wandsworth Borough Wards Google My Map [7] for providing the Geo-locations plus a boundaries KML file to use in this project.
I also used the Average House Prices by Borough, Ward, MSOA & LSOA data set on the London Data Store website [8] to obtain the average house prices for these wards.
Foursquare Data
The resulting Wandsworth data set looks as follows.
With this lat/long data, we can acquire Foursquare venue data for each of the above wards; the resulting data set looks as follows.
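A venue lookup for one ward might be sketched like this. It assumes the legacy Foursquare v2 `venues/explore` endpoint and its response layout; Foursquare has changed its API since, so treat the endpoint, the default `radius`, `limit`, and `version` values, and the sample payload as assumptions. The credentials are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Legacy v2 endpoint (assumption: this is the one in use at the time).
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      radius=500, limit=100, version="20180605"):
    """Build the request URL for venues around one (lat, lng) point."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date, format YYYYMMDD
        "ll": f"{lat},{lng}",
        "radius": radius,      # meters
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def parse_venues(payload, ward):
    """Flatten an explore response into (ward, venue name, category) rows."""
    rows = []
    for group in payload["response"].get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            categories = venue.get("categories") or [{}]
            rows.append((ward, venue["name"], categories[0].get("name")))
    return rows


def fetch_venues(ward, lat, lng, client_id, client_secret):
    """Call the API; needs valid credentials and network access."""
    url = build_explore_url(lat, lng, client_id, client_secret)
    with urlopen(url) as response:
        return parse_venues(json.load(response), ward)


# Hand-made payload in the assumed response shape, for illustration only.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Cafe", "categories": [{"name": "Coffee Shop"}]}},
]}]}}
print(parse_venues(sample, "Balham"))  # [('Balham', 'Example Cafe', 'Coffee Shop')]
```

A ward for which the response contains no items simply yields an empty list, which is how a ward can end up with no venue records at all.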
There are 106 unique categories. The highest venue count for a Ward is 62 for Thamesfield, followed by Northcote (51) and Fairfield (43). West Putney returned no venue records.
Now I perform one-hot encoding on this data, group the rows by Ward, and take the mean of the frequency of occurrence of each category. With that, we can print each Ward along with its top 10 most common venues.
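The encoding and grouping step can be illustrated without any external libraries. The sketch below relies on the fact that averaging one-hot rows within a ward equals dividing each category's venue count by the ward's total. The ward names and venue rows are hypothetical.

```python
from collections import Counter, defaultdict


def category_frequencies(rows):
    """Mean of the one-hot category columns per ward.

    Averaging one-hot rows within a ward is the same as dividing each
    category's venue count by the ward's total venue count.
    """
    counts = defaultdict(Counter)
    for ward, category in rows:
        counts[ward][category] += 1
    return {
        ward: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for ward, counter in counts.items()
    }


def top_categories(frequencies, n=10):
    """Return each ward's n most frequent categories, most common first."""
    return {
        ward: [cat for cat, _ in
               sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for ward, freq in frequencies.items()
    }


# Hypothetical (ward, category) rows standing in for the Foursquare results.
rows = [
    ("Ward A", "Pub"), ("Ward A", "Pub"), ("Ward A", "Cafe"), ("Ward A", "Park"),
    ("Ward B", "Gym"),
]
freqs = category_frequencies(rows)
print(freqs["Ward A"]["Pub"])      # 0.5
print(top_categories(freqs, n=2))  # {'Ward A': ['Pub', 'Cafe'], 'Ward B': ['Gym']}
```

In pandas, `pd.get_dummies` followed by a `groupby(...).mean()` produces the same frequency table. Ties are broken alphabetically here, which is an arbitrary choice.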
I’m going to run the k-means algorithm, one of the most common clustering algorithms in unsupervised learning, to cluster the wards into 5 clusters and analyze them. Then I will merge these cluster labels with the top 10 venues for each ward and the Geo-locations data to visualize the resulting clusters on a choropleth map.
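To show what k-means does, here is a small self-contained version of the standard Lloyd's algorithm. It is a teaching sketch, not the code behind the results in this article. In practice a library implementation such as scikit-learn's `KMeans` would normally be used, with each ward's category-frequency vector as the input. The sample points are made up.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=100, seed=0):
    """Cluster points with Lloyd's algorithm; return (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical 2-D points forming two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = k_means(points, k=2)
print(labels)
```

The label numbers themselves are arbitrary; what matters is which points share a label. A ward with no venues has no frequency vector and would have to be left out before clustering.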
Result
The first part of this analysis was to find the best London Borough to live in, based on various stats. We obtained a London data set and analyzed these stats against each other and also across Boroughs. We found that there are 2 types of stats:
- The higher the value, the more positive the impact
- The lower the value, the more positive the impact
So we calculated a positive mean and a negative mean for each borough and then calculated the difference between the two; the higher the difference, the better. According to that analysis, the Top 5 Boroughs were:
- Wandsworth
- Barnet
- Bromley
- Richmond upon Thames
- Harrow
Since we found that Wandsworth is the best Borough to live in, we shifted our focus to Wandsworth and used Foursquare data to find the popular commercial venues located in each of the 20 Wandsworth Wards. Finally, we visualized this data using Choropleth maps, both for London Boroughs and Wandsworth Wards.
Discussion
As mentioned in the Introduction, London is one of the most important global cities and also one of the most desirable cities to live in. Having said that, London is also a huge city, with 32 Boroughs of different sizes spread over thousands of hectares.
From the beginning, we could clearly identify 2 different sets of Boroughs:
- Inner London Boroughs
- Outer London Boroughs
Inner London Boroughs are relatively small and, being close to the center of the city, are more expensive to live in. Higher house prices, higher council taxes, and other high living costs associated with these Boroughs might make someone think twice about living there, despite positives such as good access to public transport.
By comparison, Outer London Boroughs are bigger, less congested, and offer more space to live in. They also offer more green space, which would delight nature lovers. Living costs are relatively lower; however, they have their own drawbacks too.
I believe that this analysis could be improved by breaking it into 2 parts. By analyzing Inner Boroughs and Outer Boroughs separately, we would get a clearer picture and would probably be able to find the ‘Best Inner Borough’ and the ‘Best Outer Borough’ to live in.
Having said all this, the winner of this contest was, rather surprisingly, an Inner Borough: Wandsworth. Now, that is another thing to analyze.
By coincidence, while surfing the Internet looking for data, I found this newspaper article. It seems they have found an answer to that!
7 reasons why Wandsworth is the best place to live in London
Conclusion
As mentioned in the Introduction, this kind of analysis is very important to real estate agents and property developers. Anyone who wishes to move to London will also welcome such information.
People can make better decisions when such information is provided freely and frequently.
References
[1] London Wikipedia Page, https://en.wikipedia.org/wiki/London
[2] London Borough Profiles and Atlas, London Data Store, https://data.london.gov.uk/dataset/london-borough-profiles
[3] Foursquare API, https://developer.foursquare.com/
[4] Statistical GIS Boundary Files for London, London Data Store, https://data.london.gov.uk/dataset/statistical-gis-boundary-files-london
[5] City of London Wikipedia Page, https://en.wikipedia.org/wiki/City_of_London
[6] List of London Boroughs, https://en.wikipedia.org/wiki/List_of_London_boroughs
[7] Wandsworth Borough Wards, https://www.google.com/maps/d/viewer?gl=us&ptab=2&ie=UTF8&oe=UTF8&msa=0&mid=1XBK2S1kOEmVeKPx3eFs325ewt4E&ll=51.45172462249272%2C-0.19283999999993284&z=12
[8] Average House Prices by Borough, Ward, MSOA & LSOA, London Data Store, https://data.london.gov.uk/dataset/average-house-prices
This article was published as part of the final assignment in the IBM Data Science Professional Certificate, a 9-course certificate program offered by IBM via Coursera.
If you wish to go more technical, please visit the Jupyter Notebook associated with this project, which is stored in my GitHub repository. The link is below.