City constellations

Maximilian Gödicke
6 min read · Nov 17, 2018


I found a very interesting data visualization: treating places, and some connection between them (e.g. roads, or the shortest distance as the crow flies), as stars and star constellations.

I was immediately inspired to work on this idea and add some bells and whistles, e.g. star colour, star brightness, how to draw star constellations…

The main idea of this code is as follows: construct maps of countries from their X biggest cities, draw the cities as stars in the “night sky”, adjust brightness and size according to population, and draw star constellations between them via Minimum Spanning Trees based on the distances between the cities.

I get the coordinates and population counts from the geonames API. I also need the borders of the country: imagine a country where every big city is in the center. Then the stars would only cover the center and the night sky would not contain the outer regions, which would give the wrong impression of where the cities are located. For this I get the “smallest rectangle containing the country” from opencagedata.

This post describes my iterative approach to the problem. If you are only interested in the visualization click here.

First some imports:

The geonames API gives access to a wealth of information, but I find it a little bit tricky to use (see here), since you can only filter by country code and not by country name. So I first get every available country code.

PCLI are independent countries, see here. With every request I send to geonames I ask for 1000 rows, so I can sort out some clutter if necessary. Note the empty query, which simply requests everything.
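The request described above could look roughly like the following sketch. It assumes the GeoNames `searchJSON` endpoint; the user name `demo_user` and the helper names are placeholders of mine, not part of the original code, and the network call is only defined, not executed.

```python
import json
import urllib.parse
import urllib.request

GEONAMES_URL = "http://api.geonames.org/searchJSON"


def country_query_params(username, max_rows=1000):
    """Build the query for all independent countries (feature code PCLI)."""
    return {
        "q": "",                # empty query: request everything
        "featureCode": "PCLI",  # independent countries
        "maxRows": max_rows,    # ask for many rows, clean up afterwards
        "username": username,   # GeoNames account name (placeholder)
    }


def fetch_json(url, params):
    """Send a GET request and decode the JSON response (not called here)."""
    full_url = url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(full_url) as response:
        return json.load(response)


params = country_query_params("demo_user")
query_string = urllib.parse.urlencode(params)
print(query_string)
```

Keeping the parameter building separate from the request makes it easy to inspect the query before spending any API credits.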

Now I use opencagedata to get some representation of the borders. Note that this could also be done with shape files for the actual borders, but I think this approach is good enough. It fails for countries with remote exclaves, though, since the rectangle also has to contain all the colonies. This means that the actual plot for e.g. the Netherlands has the cities of the mainland all bunched together in a very small area of a big night sky.
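To make the “containing rectangle” concrete, here is a small sketch that pulls it out of a geocoding response. I am assuming that the response carries a `bounds` entry with `southwest` and `northeast` corners, each with `lat` and `lng`; the sample values below are made up for illustration and are not real API output.

```python
def bounding_box(geocoder_response):
    """Return (west, south, east, north) from the first result's bounds."""
    bounds = geocoder_response["results"][0]["bounds"]
    south_west = bounds["southwest"]
    north_east = bounds["northeast"]
    return (south_west["lng"], south_west["lat"],
            north_east["lng"], north_east["lat"])


# Illustrative response shape with invented coordinates.
sample_response = {
    "results": [
        {
            "bounds": {
                "southwest": {"lat": 47.0, "lng": 5.0},
                "northeast": {"lat": 55.0, "lng": 15.0},
            }
        }
    ]
}

west, south, east, north = bounding_box(sample_response)
print(west, south, east, north)
```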

Now that I have the country information I need, I can query the geonames API for the cities. Note that I can only filter by country code, not by country name (the search parameter is still called *country*, though). I could of course also filter after getting the results, but then the results might be so bad that the 1000 rows are whittled down to too few rows for a meaningful visualization. Search results get bad when the search engine considers a result relevant to the literal query although it is not relevant to the query as meant, e.g. searching for Italy and getting back cities in the USA with a district named Little Italy. I understand why this is returned, but I don’t want it.

Feature class P means only “places of people”, i.e. cities, districts…
Feature code PPLX marks districts. I only want the whole city, so I filter these out. See here.
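A minimal sketch of the city query and the district filter follows. It assumes that the district feature code is GeoNames' `PPLX` and that the JSON rows carry `fcode` and `countryCode` fields; the function names and the sample rows are hypothetical.

```python
def city_query_params(country_code, username, max_rows=1000):
    """Query all populated places (feature class P) of one country."""
    return {
        "q": "",
        "country": country_code,  # takes a country code, despite the name
        "featureClass": "P",
        "maxRows": max_rows,
        "username": username,     # placeholder account name
    }


def keep_whole_cities(rows, country_code):
    """Drop districts (PPLX) and rows that belong to another country."""
    return [
        row for row in rows
        if row.get("countryCode") == country_code
        and row.get("fcode") != "PPLX"
    ]


# Hypothetical rows, shaped like the assumed JSON result.
sample_rows = [
    {"name": "Rome", "fcode": "PPLC", "countryCode": "IT"},
    {"name": "Trastevere", "fcode": "PPLX", "countryCode": "IT"},
    {"name": "Little Italy", "fcode": "PPLX", "countryCode": "US"},
]
whole_cities = keep_whole_cities(sample_rows, "IT")
print([row["name"] for row in whole_cities])
```

Filtering by country code in the request itself keeps the 1000 rows for the country I actually want; the second filter only removes the remaining clutter.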

Looks … promising. I now add the “night sky feeling”. I do this with a dictionary to be more flexible; one could also do this with a configuration file.
I got the star colour from [here]() and the night sky colour from [here]().

Note that matplotlib expects RGB colours as 3-tuples of values from 0 to 1.
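As a small illustration of that conversion, the sketch below turns 0 to 255 channel values into the 0 to 1 range and stores them in a configuration dictionary. The dictionary keys and the two colour values are placeholders I picked for the example, not the colours used in the plots.

```python
def to_unit_rgb(red, green, blue):
    """Convert 0..255 channel values into a 3-tuple of floats in 0..1."""
    return (red / 255, green / 255, blue / 255)


# Hypothetical configuration; the colour values are placeholders.
plot_config = {
    "colour_night_sky": to_unit_rgb(0, 0, 51),
    "colour_star": to_unit_rgb(255, 255, 204),
}
print(plot_config)
```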

Also the picture is a little bit too “dense”.

Homely! The next step would be to add size and brightness according to the city population.

Size: I simply define a size for the biggest city and then scale accordingly.

Brightness: When is a star “not bright”? When it has the same colour as the night sky. So I consider the continuum of colours “between” night sky and star. If a city is 30% as big as the biggest city, then it gets the colour 30% of the way from night sky to star. So

city_colour = colour_night_sky + ratio_population_count * (colour_star - colour_night_sky)
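Applied per RGB channel, the formula above can be sketched like this. The function name mirrors the formula; the black and white colours in the example are just easy-to-check stand-ins.

```python
def city_colour(colour_night_sky, colour_star, ratio_population_count):
    """Blend linearly from the night sky colour towards the star colour."""
    return tuple(
        sky + ratio_population_count * (star - sky)
        for sky, star in zip(colour_night_sky, colour_star)
    )


black = (0.0, 0.0, 0.0)
white = (1.0, 1.0, 1.0)
print(city_colour(black, white, 0.3))
```

A ratio of 0 returns the night sky colour (the star vanishes), a ratio of 1 returns the full star colour.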

The dropoff in size and brightness is too big. In many countries this would be even worse: consider France, where the second-largest city already has only a third of the population of the biggest city, i.e. Lyon would only be a third as bright and big as Paris.

For this I introduce scaling functions, e.g. taking the n-th root of the ratio (square root: 0.81 -> 0.9, 0.36 -> 0.6).
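One possible scaling function, sketched under the assumption that it is applied to the population ratio before both size and colour are computed. The names `nth_root_scaling`, `star_size` and `max_size` are mine.

```python
def nth_root_scaling(ratio, n=2):
    """Flatten the dropoff by taking the n-th root of a ratio in 0..1."""
    return ratio ** (1 / n)


def star_size(population, max_population, max_size, n=2):
    """Scale the marker size of a city relative to the biggest city."""
    return max_size * nth_root_scaling(population / max_population, n)


for ratio in (0.81, 0.36):
    print(ratio, "->", round(nth_root_scaling(ratio), 3))
```

Larger values of `n` pull small cities closer to the biggest one, so the choice of `n` is a purely visual decision.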

I also represent the cities by literal stars instead of circles.

Note the sorting of the cities: I want small cities to be drawn first, so brighter cities that are roughly at the same location are drawn over them.
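The draw order amounts to an ascending sort by population, as in this tiny sketch with invented cities:

```python
# Hypothetical cities; only the population matters for the draw order.
cities = [
    {"name": "Big", "population": 900},
    {"name": "Small", "population": 100},
    {"name": "Medium", "population": 400},
]

# Smallest first, so that bigger (brighter) stars are painted on top.
draw_order = sorted(cities, key=lambda city: city["population"])
print([city["name"] for city in draw_order])
```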

I really like it. You can infer the borders and see every metropolitan area.

The stars are finished. Now on to the star constellations: a first approach would be Minimum Spanning Trees of the complete graph on the cities. They can be computed quickly and are planar, not too sparse and not too crowded.
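To show what such a tree looks like in code, here is a self-contained sketch using Prim's algorithm with great-circle distances. This is a plain-Python stand-in for illustration, not necessarily the routine used for the plots; the function names and the Earth radius constant are my choices.

```python
import math


def haversine_km(point_a, point_b):
    """Great-circle distance between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, point_a)
    lat2, lon2 = map(math.radians, point_b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete graph; returns (i, j) index pairs."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known link into the tree
    best_parent = [-1] * n       # tree vertex that provides this link
    if n:
        best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex that is not in the tree yet.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_parent[u] != -1:
            edges.append((best_parent[u], u))
        # Update the cheapest links of the remaining vertices.
        for v in range(n):
            if not in_tree[v]:
                d = haversine_km(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return edges


# Four made-up points along the equator, given as (lat, lon).
points = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (0.0, 10.0)]
tree_edges = minimum_spanning_tree(points)
print(tree_edges)
```

Each returned pair indexes into `points`, so drawing the constellation is one line segment per pair. The quadratic running time is fine for a few dozen cities.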

Note the join on the city name. This means I have to delete duplicate city names; I keep the one with the biggest population.

I also add a filter for how many cities will be displayed: the biggest count_cities cities.
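Both steps, dropping duplicate names and keeping only the biggest count_cities cities, can be sketched in one helper. The function name and the sample data are hypothetical.

```python
def biggest_unique_cities(cities, count_cities):
    """Keep one entry per name (the most populous), then the top N."""
    largest_by_name = {}
    for city in cities:
        known = largest_by_name.get(city["name"])
        if known is None or city["population"] > known["population"]:
            largest_by_name[city["name"]] = city
    ranked = sorted(largest_by_name.values(),
                    key=lambda city: city["population"], reverse=True)
    return ranked[:count_cities]


# Invented sample with one duplicated name.
sample_cities = [
    {"name": "Springfield", "population": 50},
    {"name": "Springfield", "population": 170},
    {"name": "Capital", "population": 900},
    {"name": "Hamlet", "population": 5},
]
selected = biggest_unique_cities(sample_cities, count_cities=2)
print([(city["name"], city["population"]) for city in selected])
```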

Hmmmm….

A unicorn!

Now a loop for some countries.

I see a glaring problem, especially with Russia: the map can be stretched. If we look at the axes, the x-axis covers 300 degrees of longitude, the y-axis only 35 degrees of latitude. The reason for this is that I try to draw every country in a (rough) square, but not every country’s “containing rectangle” is close to a square.

For every country I will define an individual rectangle contained in a square. The square defines the maximum size of the plot, the rectangle the actual size.
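A simple way to get such a rectangle is to fix the longer side to the side of the square and shrink the other side proportionally. The sketch below treats one degree of longitude like one degree of latitude, which is an assumption that ignores any map projection; `figure_size` and `max_side` are names I made up.

```python
def figure_size(west, south, east, north, max_side=10.0):
    """Fit the bounding box into a max_side x max_side square, keeping its aspect."""
    width = east - west     # extent in degrees of longitude
    height = north - south  # extent in degrees of latitude
    longest = max(width, height)
    return (max_side * width / longest, max_side * height / longest)


# A box spanning 300 degrees of longitude and 35 degrees of latitude.
wide_size = figure_size(0.0, 0.0, 300.0, 35.0)
print(wide_size)
```

The resulting pair can be passed on as the width and height of the plot, so a wide country gets a wide, flat night sky instead of a stretched one.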

Much better. I still have a problem with “colonies” (see USA) and countries having high negative and high positive longitude (see Russia), but it is good enough for me now.

You can look at the result for the top 100 countries here.

Here we have it. Additional work that could be done:

  • Real star constellations are not MSTs, e.g. here or here. I think one could get better visuals by choosing a random connected planar graph, but this is computationally much more expensive (idea: construct every possible graph, check for connectedness, check for planarity).
  • There are some mismatches between the borders and the countries: Mexico's borders are too small, while the borders of the USA and the Netherlands are much too wide (since I am only interested in the mainland). The mismatches stem from the fact that two different data sources are used.
  • The colouring could be improved. What does it mean for a star to be sparkling?
