Estimating Living Costs & Exploring Neighbourhood of Colleges under Delhi University
Every year, thousands of students from across India enrol in programs at colleges in the Indian capital, Delhi. Since its inception in 1922, the University of Delhi has grown into one of the country's most prestigious universities, with the number of colleges and enrolments increasing year after year. Students look forward to joining the college of their choice based on their merit and interest. Depending on merit, a student may be able to choose from anywhere between one and as many as fifteen colleges.
Before joining a college, a student wants to make an informed choice and to weigh preferences on the basis of concrete facts. While the rank of the college matters, so do other factors, such as the cost of living in the vicinity of the college and the accessibility of services in and around it (knowledge of the neighbourhood). A Google search can help a student explore colleges individually, but doing this for 70+ colleges and then comparing them all together is a tedious task. Moreover, the data a student needs is neither readily available, nor clean, nor expressed in the same metrics for all college locations.
Therefore, a comprehensive study of college locations is needed. Students looking to enrol in any college in Delhi would be interested in this study. Since the study also categorizes colleges by rental price and by the venues in their vicinity, it should interest business people looking to set up affordable rental housing and other marketplaces.
Data Source
The list of colleges under Delhi University was scraped from the university's official website.
The rental prices for various localities were scraped from the real estate website Makaan.com using this link. Makaan is a company that lists real estate prices and property details across India.
The coordinates of the colleges and of the Delhi localities were retrieved using the Nominatim geocoder. Since this package could not find coordinates for every college, the missing entries were entered manually.
The Foursquare API was then used to get details of nearby venues. Foursquare helps find trending, nearby, or category-specific places in and around a location based on its geographical coordinates.
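The geocoding step with a manual fallback, as described above, could look roughly like the sketch below. It is not the project's actual code: the function names, the user agent string, and the query format are assumptions made for illustration. The lookup function is passed in as a parameter so the fallback logic can be tried without network access, and `nominatim_lookup` shows one way to wrap geopy's Nominatim geocoder (it needs the geopy package and an internet connection).

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Coords = Tuple[float, float]  # (latitude, longitude)


def geocode_with_fallback(
    names: Iterable[str],
    geocode: Callable[[str], Optional[Coords]],
    manual: Dict[str, Coords],
) -> Dict[str, Optional[Coords]]:
    """Look up each name with `geocode`; fall back to hand-entered coordinates."""
    result = {}
    for name in names:
        coords = geocode(name)
        if coords is None:
            # The geocoder found nothing: use the manually typed entry, if any.
            coords = manual.get(name)
        result[name] = coords
    return result


def nominatim_lookup(name: str) -> Optional[Coords]:
    """Hypothetical wrapper around geopy's Nominatim geocoder."""
    # Imported lazily so the helper above works without geopy installed.
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent="du-colleges-example")  # assumed user agent
    location = geolocator.geocode(f"{name}, Delhi, India")  # assumed query format
    if location is None:
        return None
    return (location.latitude, location.longitude)
```

A call such as `geocode_with_fallback(college_names, nominatim_lookup, manual_coords)` would then return one coordinate pair per college, with `None` marking any college that still needs to be filled in by hand. Here `college_names` and `manual_coords` are placeholder names for the scraped list and the hand-entered dictionary.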
Exploratory Data Analysis
Colleges and Avg Rent relationship
The locality-level average rent data scraped from Makaan.com contains many outliers at the high end. These prices belong to expensive housing that students do not typically rent, so they need to be removed from the analysis.
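The article does not say which rule was used to drop the high-end outliers. One common option is the interquartile range (IQR) rule, sketched below as an assumption rather than as the method actually applied. The function name, the multiplier `k`, and the sample rents are all illustrative.

```python
import statistics


def drop_high_outliers(rents, k=1.5):
    """Keep localities whose average rent is at or below Q3 + k * IQR.

    `rents` maps a locality name to its average monthly rent in INR.
    Only the upper fence is applied, since the outliers of interest are
    the expensive localities.
    """
    q1, _, q3 = statistics.quantiles(list(rents.values()), n=4)
    upper_fence = q3 + k * (q3 - q1)
    return {loc: rent for loc, rent in rents.items() if rent <= upper_fence}


# Made-up rents for illustration only; not values from the scraped data.
sample_rents = {
    "Locality A": 8000,
    "Locality B": 10000,
    "Locality C": 12000,
    "Locality D": 15000,
    "Locality E": 18000,
    "Locality F": 90000,
}
filtered = drop_high_outliers(sample_rents)
```

With these sample values, only the very expensive "Locality F" lies above the upper fence and is dropped. A fixed rent cap chosen from domain knowledge would be an equally reasonable alternative.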
Locality and Avg Rent
The average rent of most localities falls between INR 5,000 and 20,000, with outliers as high as INR 35,000.
Colleges Zone wise
Most of the colleges are in the Central, New, or South East Delhi regions. West, South, and East Delhi have a moderate number of colleges, while North West, South West, and North Delhi have the fewest college campuses.
Clustering Neighbourhoods
The data contains venue details as one-hot encoded columns and rent prices in the range of INR 5,000 to 35,000. Because these features sit on very different scales, they need to be rescaled before being fed into the clustering model.
A min-max scaler is used for this rescaling, which by default maps every feature to the range 0 to 1.
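To make the rescaling concrete, here is a small pure-Python sketch of the min-max formula, (value - min) / (max - min), applied column by column. The project most likely used a library scaler such as scikit-learn's `MinMaxScaler`; this hand-written version only illustrates the arithmetic. The feature rows are made-up values, not data from the study.

```python
def min_max_scale(rows):
    """Rescale each column of `rows` (a list of equal-length lists) to [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has no spread, so map it to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return scaled


# Hypothetical rows: [average rent in INR, cafe share, department-store share]
features = [
    [5000, 0.0, 0.5],
    [20000, 0.2, 0.1],
    [35000, 0.4, 0.0],
]
scaled = min_max_scale(features)
```

After scaling, the rent column no longer dominates distance calculations simply because its raw numbers are thousands of times larger than the venue shares.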
K-means is an iterative algorithm that tries to partition the dataset into K pre-defined, distinct, non-overlapping subgroups (clusters). In other words, it tries to find homogeneous subgroups within the data such that the data points in each cluster are as similar as possible according to a similarity measure.
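The iterative procedure described above can be sketched in a few lines of plain Python. This is a minimal version of the standard K-means loop (Lloyd's algorithm) with squared Euclidean distance as the similarity measure. It is shown for illustration only: in practice a library implementation such as scikit-learn's `KMeans` would normally be used, and the function names, seed, and sample points below are assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=100, seed=0):
    """Partition `points` into k clusters; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Made-up, already-scaled points forming two obvious groups.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
centroids, labels = k_means(points, k=2)
```

On this toy data the first three points end up in one cluster and the last three in the other. For the colleges, each point would be a scaled feature row (rent plus venue columns) and `k` would be 4.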
Result
The colleges are grouped into four clusters, shown as a scatter plot.
Cluster 1 (Red Color)
This cluster mainly contains colleges in the Central Delhi zone with moderate living expenses. The neighbourhoods of these colleges feature places like restaurants, hostels, and parks.
Cluster 2 (Purple Color)
This cluster contains colleges from almost all zones with the lowest living expenses. The neighbourhoods of these colleges feature places like department stores and shops for all needs.
Cluster 3 (Yellow Color)
This cluster contains colleges in the New Delhi zone with very high living expenses. The neighbourhoods of these colleges feature places like cafes, plazas, and bistros.
Cluster 4 (Green Color)
This cluster contains colleges in the New Delhi zone with considerably expensive rents. The neighbourhoods of these colleges feature upscale places such as theatres, art galleries, museums, and arcades.
Conclusion
The colleges of Delhi can be grouped into four clusters based on their rent and neighbourhood data. The clusters broadly mirror the zones into which Delhi is already divided. The New Delhi zone has the highest housing prices, as seen in clusters 3 and 4. This is plausibly because the New Delhi region contains places of national importance, which adds value to the area. The Central Delhi zone has moderately priced housing in the range of INR 12,000 to 15,000.
We can also conclude that the average rent of a neighbourhood is related to the venues surrounding it. Colleges in neighbourhoods with considerably high housing prices are surrounded by places like cafes, bistros, art museums, and theatres. On the other hand, colleges in neighbourhoods with moderate or low housing prices are surrounded by department stores and shops, small and big, for all necessities. This is a classic example of how the venues in a neighbourhood go hand in hand with its pricing.
Using this study, a student can learn which colleges fall within a common cluster and use that knowledge when choosing a college that fits their finances and expected quality of living.
Footnote: The complete code for the project can be viewed at this link.