Measuring Urban Similarities in Los Angeles Using Open Data, ArcGIS, and Sklearn

Benny Friedman
Mar 2 · 11 min read

Finding similar places

After exploring ways to measure happiness in cities in "A Look at Spatial Happiness in Cities, using Tweepy, Text Blob, and ArcGIS," I continued to look for ways to use spatial data to glean important information about our urban surroundings. I was intrigued when a friend mentioned that, as an urban planner, she spent a lot of time identifying places similar to her project sites to serve as precedents. A precedent study is usually one of the first steps in an urban planning project; its goal is to identify applicable ideas from similar projects. If we could quantify the similarity between places, urban planners could find new, comparable locations to serve as precedents more easily. The uses are not limited to urban planning, though. Businesses and real estate professionals would benefit as well: a retailer may want to find areas like those where it has already succeeded, and a real estate agent may want to quickly identify areas that clients might like.

Measuring a Neighborhood

To identify similarities between neighborhoods, we must first define a neighborhood. Most people have an intuitive sense of what a neighborhood is, but a consistent definition is elusive. For this project, I borrowed the 10-minute walk (a 1/2-mile radius) frequently used in urban planning and created a 1/2 SQMI grid to serve as my neighborhood unit. It is somewhat crude, but it is fine-grained enough to derive some helpful insights.

LA County Population Density on a 1/2 mile grid
LA County Average Age on a 1/2 SQMI grid

Quantifying Similarity

Now that the structural and demographic features are selected and derived for each 1/2 SQMI neighborhood in the LA County grid, it’s time to talk about exactly how to quantify similarity. Broadly, there are two common methods used in data science to understand the similarity between samples: clustering algorithms, which identify groups of similar samples in a dataset, and distance measurements, which quantify the multi-dimensional distance between samples. Clustering is best if we want to group neighborhoods by similarity, while distance measures are better if we want to quantify how similar or dissimilar any two neighborhoods are. Both have their advantages, but for my friend’s situation, where we want to find the N most similar neighborhoods, distance measures will work better.

Simplified multi-dimensional Euclidean distance formula for data with k features.
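
For reference, the standard Euclidean distance between two neighborhoods p and q, each described by k scaled features, can be written as follows. The symbols p, q, and i are my own notation here and may differ from the ones used in the original figure.

```latex
d(p, q) = \sqrt{\sum_{i=1}^{k} \left( p_i - q_i \right)^2}
```

Here p_i and q_i are the values of feature i for the two neighborhoods; a smaller d means the two neighborhoods are more alike across all k features.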

Building the tool

Now that we have engineered our features and settled on a method for quantifying the similarity between samples, we can begin building the tool itself! For simplicity, I built the tool in a Jupyter Notebook, but I intend to move it into an interactive Dash/Plotly web app in the future.

  1. The tool finds the N most similar 1/2 SQMI neighborhoods in LA County and returns them in list and map form.
Sample distributions for the four structural features
Sample distributions for the four demographic features
Adding weights to the scaled data with lambda expressions
Input OID for analysis
Code to find the 100 most similar neighborhoods.
A map of the 100 most similar areas to a selected neighborhood
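
The steps named in the captions above (scaling, weighting, picking an input OID, and finding the most similar neighborhoods) can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it uses plain NumPy with min-max scaling instead of Sklearn and lambda expressions, and the OIDs, feature values, third feature column, and the helper names `min_max_scale` and `most_similar` are all made up for the example.

```python
import numpy as np

# Hypothetical sample: one row per grid cell, identified by an OID.
oids = np.array([101, 102, 103, 104, 105])

# Columns: population density, average age, and a made-up third feature
# (building coverage). All values are invented for illustration.
features = np.array([
    [12000.0, 34.0, 0.30],
    [11500.0, 35.0, 0.28],
    [800.0, 52.0, 0.05],
    [9000.0, 29.0, 0.40],
    [950.0, 50.0, 0.07],
])


def min_max_scale(X):
    """Rescale each column to [0, 1] so no feature dominates through its units."""
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant column: avoid dividing by zero
    return (X - lo) / span


def most_similar(oids, X, target_oid, weights, n):
    """Return the n cells closest to target_oid as (oid, distance) pairs."""
    matches = np.flatnonzero(oids == target_oid)
    if matches.size == 0:
        raise ValueError(f"OID {target_oid} not found")

    # Multiplying a scaled column by w stretches that axis, so differences
    # in that feature contribute w**2 times as much to the squared distance.
    weighted = min_max_scale(X) * weights
    target = weighted[matches[0]]

    # Euclidean distance from the target cell to every cell.
    dist = np.sqrt(((weighted - target) ** 2).sum(axis=1))

    # Sort ascending, drop the target itself, keep the first n.
    order = np.argsort(dist, kind="stable")
    order = order[oids[order] != target_oid][:n]
    return [(int(oids[i]), float(dist[i])) for i in order]


weights = np.array([1.0, 1.0, 1.0])  # equal weights; raise one to emphasize it
top_two = most_similar(oids, features, target_oid=101, weights=weights, n=2)
print(top_two)
```

With equal weights, the two cells closest to OID 101 in this toy data are 102 and 104. For the real grid, `n` would be 100 and the feature matrix would hold the four structural and four demographic features per cell.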

Next Steps

This was certainly just a proof of concept, but it provides a valuable starting place for further work. In the future, I think that the feature set can be expanded to include far more structural and demographic features. The tool could be packaged in a web app to improve usability and enable the user to tune the analysis for their specific use. It will also be important to expand the tool beyond the LA Region. However, further analysis will have to be undertaken to determine how well one can compare neighborhoods across diverse regions.

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data…


Written by Benny Friedman

I am a spatial data scientist fascinated by how we measure our built environment.
