DATA STORIES | GEOSPATIAL ANALYTICS | KNIME ANALYTICS PLATFORM

Who closed up shop? A geospatial analysis of pharmacy decline in Baden-Württemberg with KNIME

A low-code approach to determine whether there are structural differences between the remaining pharmacies and those that have closed shop

Ch A Knobloch
Low Code for Data Science

--

Photo by Serkan Yildiz on Unsplash.

The number of pharmacies in Germany has been declining for years. It is often claimed that it is mainly pharmacies in rural areas that are closing. Using the example of pharmacies in Baden-Württemberg, this article shows how KNIME Analytics Platform and the Geospatial Analytics Extension for KNIME can be used to analyse whether there are structural differences between the remaining pharmacies and those that have closed.

For this purpose, we scraped Google business profiles of pharmacies in Baden-Württemberg, information from the Baden-Württemberg State Office for Geoinformation and Regional Development on municipal boundaries, as well as data from the Federal Institute for Research on Building, Urban Affairs and Spatial Development on spatial organisation at municipal level.

We show how KNIME can be used to link this data and determine suitable key figures based on the distances between pharmacies.

The figure shows an overview of the individual steps of the workflow:

Fig. 1: An overview of the entire workflow.

You can download the workflow for free from the KNIME Community Hub.

Step 1: Read in data, preprocess and (spatially) join

Let’s start with the three reader nodes from bottom to top. The data on pharmacies in Baden-Württemberg is structured as follows:

Fig. 2: List of pharmacies in Baden-Württemberg.

The geodata (latitude, longitude) and the business status are of particular interest for the analysis. The latter can take two forms, remaining and closed. Data on pharmacies was collected by scraping the Google business profiles of all pharmacies in Baden-Württemberg at the end of 2022, and again at the beginning of 2024 to check how many of those pharmacies are still in operation.

Note. A comparison with a list of pharmacies from the Baden-Württemberg Chamber of Pharmacists showed that 100% of pharmacies in Baden-Württemberg have Google business profiles.

Out of 2313 pharmacies, 112 pharmacies have since ceased trading. For the spatial join, Geometry point objects must first be created from the longitude and latitude with the Lat/Lon to Geometry node.

Fig. 3: List of pharmacies in Baden-Württemberg with Geometry column.

The next file contains information on the urbanity of the municipalities:

Fig. 4: Urbanity of the municipalities in Baden-Württemberg.

The third file that is read in contains the geoinformation on the administrative boundaries of the municipalities in Baden-Württemberg:

Fig. 5: Administrative boundaries of the municipalities in Baden-Württemberg.

Note. An example metanode for reading a .geojson file into the GeoFile Reader via a relative file path from current workflow data area is available on the KNIME Community Hub:
https://hub.knime.com/-/spaces/-/~cjucaWtMAvbRuRMY/current-state/

It should also be noted that the projection of the administrative boundaries provided is carried out in accordance with EPSG: 25832. Therefore, we adapt the Coordinate Reference System (CRS) of the administrative boundaries to the CRS of the geodata of the pharmacies with the Projection node:

Fig. 6: Configuration of the Projection node.

Fortunately, we have a municipality code in both tables, so that we can join them together without any problems or after removing the zero at the beginning of the code in one table.

Now that we have the geodata for the pharmacies and municipalities in Baden-Württemberg, it is clear what comes next: we want to assign a municipality to each pharmacy. We do this with the Spatial Join node. The configuration looks like this:

Fig. 7: Configuration of the Spatial Join node.

In “Match mode”, there are various options that can be displayed by clicking on the question mark at this point:

Fig. 8: Match mode options for spatial joining.

“Covers” (Matches if every point of the right object is a point on the interior or boundary of the left object.) seems well suited, but it is then important to select the correct table as the bottom/right table, in this case the table with the geodata of the pharmacies. It is also important to select the correct join mode: we use a right (outer) join, as we want to link each pharmacy to a municipality.

Two Column Filter nodes, which reduce the number of columns to the number required for the respective evaluation, follow the spatial join. (This is particularly helpful for the speed of the cross-joining operation in the following section.)

Step 2: Calculate distances between all pharmacies

The distances between the pharmacies are calculated using several nodes that are available via NodePit. (To install NodePit in KNIME, see here: https://nodepit.com/node-installation-guide.)

Fig. 9: Calculating distances.

First, the longitude and latitude of the pharmacies are converted into coordinates, as the Geo Distances nodes can only work with these. To then calculate the distances between the individual pharmacies, a cross join is first performed, i.e. each row of the left table is joined with each row of the right table. This results in a table with approximately 5.5 million (= 231³²) rows, but also two columns with the coordinates of the two pharmacies. Geo Distances nodes and Column Distances nodes are then used to calculate the distances (in kilometres) between the coordinates described in these two columns, i.e. the respective pharmacies.

A Row Filter (Labs) node is then used to remove all rows with distances between pharmacies that are less than 5 metres, as these are join rows of identical pharmacies. In reduced form, the table looks as follows:

Fig. 10: Distance table for pharmacies.

Note. As an alternative to the combination of Latitude/Longitude to Coordinate, Geo Distances and Column Distance nodes, distances can also be determined using just the KNIME proprietary Haversine Distance node (after the Cross Joiner node). However, the execution takes much and since I am an impatient person, I prefer the “NodePit solution” ;-).

Step 3: Form key figures based on the spatial location and distances between pharmacies

When answering the question of whether there are differences between the remaining and the closed pharmacies, it is interesting to look at the competitive environment. To this end, the following section investigates whether there are differences in the distances between the pharmacies and the nearest (competitor) pharmacy:

Fig. 11: Calculation of key figures based on the nearest pharmacy.

The first GroupBy node shows the row with the minimum distance for each pharmacy, i.e. the distance to the nearest pharmacy. The second GroupBy node then calculates the average of these distances for the two groups, resulting in the following table:

Fig. 12: Average distance to the nearest pharmacy.

On average, the nearest pharmacy was a good 200 metres closer to the closed pharmacies than it was to the remaining pharmacies. Direct competition was, therefore, stronger for the closed pharmacies. This argues against the thesis of a pure demise of rural pharmacies.

Note. As this is a complete survey of pharmacies in Baden-Württemberg, it is not necessary to use statistical testing methods for sampling and the associated indication of significance levels here and in the following.

In the lower branch of the workflow shown above, the different distances to the nearest pharmacy are calculated. Additionally, the relative shares (in %) of pharmacies in the individual bins are calculated to be able to estimate structural differences between the two groups of pharmacies more precisely. Pharmacies with up to 500 m to the nearest pharmacy were closed in particular.

Fig. 13: Proportion of pharmacies with different distances to the nearest pharmacy.

To only look at the competitive situation within a radius of 1 km, all distances between the pharmacies that are greater than this kilometre are first filtered out. Then, for each pharmacy, the number of distances, i.e. how many competitor pharmacies remain within this radius, is counted. These numbers of competitor pharmacies are combined to produce a pivot table for each bin and each group (remaining vs. closed) showing the number of pharmacies that have the corresponding number of competitor pharmacies within the 1-km radius.

Fig. 14: Key figures based on competitor pharmacies within a 1-km radius.

The result is as follows:

Fig. 15: Proportion of pharmacies with different numbers of competitor pharmacies within a 1 km radius.

Here is how you should interpret the table above: 25.9 % of the closed pharmacies have at least five competitor pharmacies within a radius of one kilometre.

Here, too, the higher proportion values in the “at least five pharmacies” class indicate that the competitive situation was more pronounced for the closed pharmacies than for the remaining pharmacies.

One analysis is still pending, namely the consideration of the urbanity of the locations of both pharmacy groups. For this purpose, the following table is displayed at the end of step 1 in the lower column filter node:

Fig. 16: Urbanity of pharmacy locations.

In step 3, the following data operation takes place with the corresponding output:

Fig. 17: Key figures based on the urbanisation of pharmacy locations.
Fig. 18: Percentages of pharmacies with different urbanity of location.

The comparison of the two pharmacy groups (remaining vs. closed) shows almost no differences in the percentages regarding the urbanity of the municipalities in which they are located.

So far, we have seen how easy it is to link usually structured data with geodata and how these can be transformed and aggregated. However, there is still one thing missing that is often the most fun when dealing with geodata: visualisation. Good visualisation enables even non-experts to grasp the key message at a glance. In our case, we have established that there are no signs of a pure rural pharmacy decline.

Step 4: Multi-layer geo-visualization with Kepler.gl Geoview, preparing for Foursquare Studio

Firstly, the municipality and pharmacy geometry data is written to two separate files for import into Foursquare Studio (more on this in a moment). But before moving to that, let’s take a look at how the files are prepared so that they can be displayed by the Kepler.gl Geoview node.

Fig. 19: Preparing for visualization.

This node can (so far) only read in one table, so the geodata of the pharmacies and the municipalities must be merged into one table. No complex join operation is necessary here, as the two datasets act as separate layers in the visualisation. We can therefore work with the Column Appender and select the Generate new RowIDs option so that the new table contains all the rows of the two input tables.

However, the number of pharmacies exceeds the number of municipalities, so that several columns of the municipalities have missing values. This is not a problem per se, but the Kepler.gl Geoview node cannot work with missing values. The trick here is to fill in the missing values with the Missing Value node, in which the previous values are inserted. This results in many duplicates, but this is not a problem as all duplicates are only displayed once on the map.

Fig. 20: Impute previous values.

In the configuration of the Kepler.gl Geoview node only one geometry column can be selected, but this is irrelevant. Simply set all remaining columns including the remaining geometry columns under “Include”. The actual map customization (e.g., colors, layers, tooltip, legend, etc.) is then carried out after running the node in the interactive view by pressing the F10 key.

However, it takes quite a long time for this interactive view to open (as does the execution of the node) and I have already said that I am an impatient person ;-). Another disadvantage is that the Kepler.gl Geoview node does not retain the configurations/customisations that are done in the interactive view. This is due to the highly customisable nature of the Kepler.gl framework and how it was wrapped to work in the node.

Hence, I looked around for an alternative and ended up with the software Foursquare Studio. Like the Kepler.gl Geoview node, this is based on the kepler.gl engine, but offers the advantage that you can directly import several files, which can then be used as separate layers. The work is also saved in each case and all configured maps are clearly collected and retrievable in a dashboard. Another major advantage is that Foursquare Studio accesses the PC’s graphics card directly for the computationally intensive visualisation work, so there are no long loading or waiting times.

Free registration is required to use Foursquare Studio. Alternatively, Kepler.gl can also be used without registration, but the work disappears when the browser is closed. In my opinion, the free plan of Foursquare Studio is completely sufficient for smaller applications. For example, if you have 1 GB of storage space for your data records.

After registering, you can create a new map and are immediately prompted to import files:

Fig. 21: Importing (geo)data into Foursquare Studio.

After the import, both files can be selected as layers. Various parameters can then be set for these layers. The most important ones in our case are the fill colour. For the pharmacy layer, the fill colour is based on the business status, for the municipality layer on the area type. The respective colours for the characteristics can be defined individually.

For an introduction to Foursquare Studio, we recommend the Get-started page: https://docs.foursquare.com/analytics-products/docs/get-started or the Studio user manual: https://docs.foursquare.com/analytics-products/docs/studio-user-manuals-overview.

The result is as follows:

Fig. 20: Visualisation of pharmacy closures in Baden-Württemberg.

The black dots mark the closed pharmacies, the green dots the remaining pharmacies. The colour of the municipality indicates urbanity, the redder, the more urban. Here, too, it is clear that not only rural pharmacies are closing shop.

Question to answer yourself: The visualisation shows several municipalities with only one black dot, i.e. the last remaining pharmacy there has now closed. Therefore, it makes sense to use KNIME to determine in how many municipalities this is the case. Are you up for the challenge?

Summary

This article has hopefully shown how easy it is to work with geo-data in KNIME. The KNIME for Geospatial Analytics collection page in the KNIME Community Hub provides a good introduction to the topic: https://hub.knime.com/-/collections/~kc8Kc8MDDqG-sqoF/.

The Practitioner’s guide to geospatial analytics can also be found there as a PDF. Further reading can be found in the book Computational Methods and GIS Applications in Social Science — Lab Manual, which demonstrates the practical application of various GIS methods using KNIME workflows. For learners who prefer a more concise resource, there’s also a KNIME Geospatial Analytics cheat sheet for free download.

In addition to the KNIME nodes such as the Geospatial View node, several other options are available for visualising the results. The text mentions the Kepler.gl Geoview Node and Foursquare Studio, while other options include Datawrapper and Flourish. For highly advanced and customizable applications, we recommend working with QGIS.

--

--