Spatial Pattern Analysis

Mykola Kozyr
Published in Aspectum
5 min read · May 21, 2019

Data enrichment matters. Business and Education collaboration

What do we expect to see when we visualize location data on a map for the first time? I’m fairly sure the goal is to discover whether there are any spatial patterns: regions with higher and lower intensity of features, clusters and outliers, common directions, irregular trajectories, territories with higher or lower values, etc. Once we have explored the patterns, we are ready to determine feature significance, build predictive models and call ourselves data scientists.

The approaches used to discover patterns are worth a separate article. I’m currently working on it and plan to give it the title “Stop using heatmaps…”. So let’s imagine the researcher is aware of pattern analysis algorithms and ready to go deep into the world of GIS.

In this article, I will touch on point pattern analysis, show the results of pattern analysis algorithms and the issues a non-GIS professional would face, and describe the collaboration between business and science in developing a solution for the shortcomings of current software.

Point Pattern Analysis. Intensity Clusters & Outliers

Let’s start with a simple but real-world case: a researcher has the locations of thefts in a city. They could just as well be car accidents, emergency calls, tweets, photos, birds’ nests or beavers’ lodges (absolutely any type of location data) in a park, a sea or a country; it does not matter. The researcher’s task is to explore whether there are any spatial patterns and to identify them if there are. He or she has built a grid over the area of distribution, using a specific cell size chosen to avoid randomness. Next, the researcher has counted the number of features inside every cell and finally run the Clusters & Outliers Analysis and/or the Hotspots & Coldspots Analysis, depending on the actual task.
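The grid-and-count step can be sketched in a few lines of Python. This is only an illustration, not the way Aspectum does it: the function name `count_points_per_cell`, the sample coordinates and the cell size are made up, and the coordinates are assumed to be already projected to a planar system with the same units as the cell size.

```python
import math
from collections import Counter


def count_points_per_cell(points, cell_size, origin=(0.0, 0.0)):
    """Count points per square grid cell.

    Returns a Counter keyed by (column, row) cell indices.
    Cells without points are simply absent from the result.
    """
    origin_x, origin_y = origin
    counts = Counter()
    for x, y in points:
        column = math.floor((x - origin_x) / cell_size)
        row = math.floor((y - origin_y) / cell_size)
        counts[(column, row)] += 1
    return counts


# Hypothetical theft locations in projected units (e.g. kilometres).
points = [(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (2.2, 2.9)]
cell_counts = count_points_per_cell(points, cell_size=1.0)

for cell, count in sorted(cell_counts.items()):
    print(cell, count)
```

The per-cell counts are the input for the cluster and hotspot analyses mentioned above; a different cell size will change the counts, and with them the result.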

In addition, he or she has run Nearest Neighbor Analysis (NNA) and Quadrat Analysis, so the researcher is now fairly sure that the objects have a clustered distribution, and the results show these clusters as well as the outliers.
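For readers who have not met these two tests, here is a minimal sketch of the idea behind each, assuming planar coordinates and a known study area. The nearest neighbor index compares the observed mean distance to the nearest neighbor with the distance expected for a random pattern of the same density (values below 1 suggest clustering). The variance-to-mean ratio of quadrat counts is a common summary for quadrat analysis (values above 1 suggest clustering). The function names and sample data are illustrative, no significance test or edge correction is included, and this is not Aspectum's implementation.

```python
import math


def nearest_neighbor_index(points, area):
    """Ratio of the observed to the expected mean nearest-neighbor distance.

    The expected distance for a random pattern is 0.5 / sqrt(n / area).
    Brute force, O(n^2): fine for a sketch, slow for large datasets.
    """
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    observed = total / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected


def variance_to_mean_ratio(cell_counts):
    """Sample variance of quadrat counts divided by their mean.

    cell_counts must include the empty cells (zeros) of the grid.
    """
    n = len(cell_counts)
    mean = sum(cell_counts) / n
    variance = sum((c - mean) ** 2 for c in cell_counts) / (n - 1)
    return variance / mean


# Four points packed into one corner of a 10 x 10 study area.
clustered = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
# Four points spread evenly over the same area.
regular = [(2.5, 2.5), (7.5, 2.5), (2.5, 7.5), (7.5, 7.5)]

nni_clustered = nearest_neighbor_index(clustered, area=100.0)  # well below 1
nni_regular = nearest_neighbor_index(regular, area=100.0)      # above 1

# All four points in one quadrat versus one point in each quadrat.
vmr_clustered = variance_to_mean_ratio([4, 0, 0, 0])  # above 1
vmr_regular = variance_to_mean_ratio([1, 1, 1, 1])    # 0
```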

I would say the researcher is fairly satisfied with the result. It is good enough to start exploring outliers; however, the area covered by clusters of high and low values is huge. For people who know the city, the high-value cluster looks obvious: these are residential and business areas. The cluster with low values (the blue part) corresponds to forests, the river, and industrial areas.

At this point, to get more precise results for the clustered areas, the researcher decides to take urban classes into account while analyzing the data. This may help to find clusters within similar territories by comparing the distribution of features within residential areas separately from green or water zones.
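To make the idea concrete, the sketch below standardizes each cell count against the other cells of the same urban class instead of against the whole city. A plain z-score stands in here for the Clusters & Outliers statistic, which it does not replace; the class labels, cell ids and counts are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscores_within_class(cells):
    """Standardize counts within each urban class.

    cells: iterable of (cell_id, urban_class, count) tuples.
    Returns {cell_id: z-score relative to cells of the same class}.
    """
    cells = list(cells)
    counts_by_class = defaultdict(list)
    for _, urban_class, count in cells:
        counts_by_class[urban_class].append(count)

    stats = {
        urban_class: (mean(values), pstdev(values))
        for urban_class, values in counts_by_class.items()
    }

    zscores = {}
    for cell_id, urban_class, count in cells:
        class_mean, class_std = stats[urban_class]
        # A class whose cells all share one value has no spread to measure.
        zscores[cell_id] = 0.0 if class_std == 0 else (count - class_mean) / class_std
    return zscores


# Hypothetical grid cells: (cell id, urban class, number of features).
cells = [
    ("a", "residential", 10),
    ("b", "residential", 12),
    ("c", "residential", 30),
    ("d", "green", 0),
    ("e", "green", 1),
    ("f", "green", 5),
]
z = zscores_within_class(cells)
```

In this toy data, cell "f" holds only 5 features, far fewer than any residential cell, yet it stands out within the green class. A citywide comparison would hide it inside the large low-value cluster.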

So, imagine the researcher did not use Aspectum and faced all the GIS-related issues: shapefiles, projections, clumsy desktop software, vector data for the areas of distribution, grid size formulas, and the appropriate algorithms delivered via plugins or separate extensions (by the way, with Aspectum it’s just about the raw data).

Now he or she has to build an urban land classification vector dataset. But how? Where to find the data? What zones should be distinguished? What are the parameters for classifying each zone? How to automate this for further tasks?

The Project. When business and education enrich

We believe it is the task of software development companies to deliver a product that produces a valuable result without the need to teach the target audience how to do the work. And in the case of GIS, that usually takes six years of university study.

Aspectum decided to initiate a project to generate urban land use data for all the settlements in the world. Such an ambitious goal interested our friends and partners from Rivne Noosphere Engineering School. A team of students with backgrounds in software engineering, GIS and urban studies and, first of all, passion was organized under the supervision of lecturers.

Since there is no single correct approach to the task, we started developing iteratively, constantly reviewing and discussing the results. This article introduces the core idea behind the implementation of the project and shows the results now available for testing.

In the following articles, we will share the approaches that have been tested, the results that did not work, ideas for improvements and the supplemental features implemented.

The Concept

There are huge open source projects dealing with geospatial data. Our goal is to merge the data and the ways of processing it to get the most accurate and complete information about any given area in settlements all around the world. Just to clarify: we are not dealing with land use types. The idea is the similarity of influence on the phenomena and processes taking place in these areas. Since residential areas can be completely different across the city, we have to assign them different classes. On the other hand, we are not following the zoning approach either: there is no need to combine retail and business areas in one group, and it is not obvious whether it makes much sense to separate education and medical areas in terms of their influence on spatial patterns in these areas.

Current results

We are now testing the MVP version of the product, which delivers information about four classes inside the area of data distribution. Go to Aspectum, add your data and run the Urban Land Classification analysis. It may take a while to process and we still have a number of limitations, but we are constantly improving the algorithm.

Check the map with the classes at the local and regional levels.

Classes at the local and regional levels. Screenshot from Aspectum

A number of tools have been developed during the task, and we are happy to share them. Take a look at our GitHub page; we are constantly updating it and adding projects there. osm2geojson, our latest update, is a pure Python solution for processing OpenStreetMap data. Follow Aspectum so you don’t miss the next articles.
