A data-driven journey through Vector Tile optimization

Isaac Besora
Oct 29, 2018 · 9 min read


Once upon a time, in a kingdom called ICGC, a bunch of fools named Geostarters discovered, while doing some of their enchantments, a source of magic called Vector Tiles. Produced by Mapbox, a group of some of the most powerful magicians in the entire world, it gave the power to efficiently encode and stream large amounts of vector data to those who could tame it…

Prelude

For those who don't know what a Vector Tile is, Mapbox's Vector Tiles website offers the following definition:

Vector tiles make huge maps fast while offering full design flexibility. They are the vector data equivalent of image tiles for web mapping, applying the strengths of tiling — developed for caching, scaling and serving map imagery rapidly — to vector data.

Although their main contribution is as a distribution format, the full design flexibility they offer should not be overlooked. They achieve it by separating styling from raw data into two different files: an MBTiles file where the raw vector data is stored, and a JSON file describing how that data should be rendered on the client. That means we can have a single source of data with different styles applied to it, without having to change the underlying data. Examples of this one-source-of-data-multiple-styles approach can be found on Mapbox's Map page and in our own demo prototype.
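To make the idea concrete, here's a minimal sketch of two Mapbox GL style documents (as Python dicts) drawing from the same vector source. The source URL and the "roads" source-layer name are placeholders, not our actual data:

```python
# Two minimal Mapbox GL styles sharing one vector source.
# The source URL and the "roads" source-layer are hypothetical.
shared_sources = {
    "tiles": {"type": "vector", "url": "mbtiles://./contextmap.mbtiles"}
}

day_style = {
    "version": 8,
    "sources": shared_sources,
    "layers": [{
        "id": "roads-day",
        "type": "line",
        "source": "tiles",
        "source-layer": "roads",            # same underlying data...
        "paint": {"line-color": "#333333"},
    }],
}

night_style = {
    "version": 8,
    "sources": shared_sources,
    "layers": [{
        "id": "roads-night",
        "type": "line",
        "source": "tiles",
        "source-layer": "roads",            # ...styled differently
        "paint": {"line-color": "#ffcc00"},
    }],
}
```

Both styles reference the exact same MBTiles data; only the paint rules change, so no re-processing of the tiles is needed.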

Styles example from Mapbox’s Map page

Some time ago Mapbox published a series of recommendations about Vector Tile datasets on their website. Although that page is now defunct, traces of the recommendations can still be found in their GitHub issues:

  • The average tile size per level should be less than 50KB
  • No tile should be bigger than 500KB

A quick calculation shows that those are very reasonable recommendations: a full-screen map needs to download about 30 tiles. At an average of 50KB each, that's almost 1.5MB of data just to see the initial view of a map. Being obsessed with performance, I decided to evaluate some of our own Vector Tiles with those recommendations in mind.
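The back-of-the-envelope math can be sketched like this, assuming 256px tiles and a common 1280×800 viewport:

```python
import math

TILE_SIZE = 256     # assumed pixels per tile edge
AVG_TILE_KB = 50    # Mapbox's recommended average tile size

def tiles_for_viewport(width_px, height_px, tile_size=TILE_SIZE):
    """Worst-case number of tiles needed to cover a viewport: one
    extra row and column account for partially visible tiles."""
    cols = math.ceil(width_px / tile_size) + 1
    rows = math.ceil(height_px / tile_size) + 1
    return cols * rows

tiles = tiles_for_viewport(1280, 800)
print(tiles)                          # 30 tiles for the initial view
print(tiles * AVG_TILE_KB / 1024)     # ~1.46 MB downloaded
```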

I was willing to try it. I set out on a journey that would take me to places I didn't know existed. I hadn't thought about the kinds of difficulties I would find; I just wanted to learn about this kind of magic through experience, by getting my hands dirty and studying the very basic concepts it was built on.

Chapter one: Missing pieces and revelations

One of the rules of code optimization states something that should be inseparable from every optimization process: Profile first! In order to evaluate Vector Tiles, some things were missing: a way to explore how size is distributed inside them, which tiles contribute most to the overall size of a given level, and which layers those tiles contain. After failing to find a tool that could give me this information at the level of detail I wanted, I decided to create it… introducing vt-optimizer, a small tool to examine and optimize Vector Tiles.

In this post I'm using an in-house Vector Tile created by aggregating data from different sources. It is the result of the ContextMap initiative at ICGC, which aims to translate our own 1:25,000 topographic map to vector and integrate it with a global map. This is a group effort from different teams: cartographic bases, cartography, geosystems, datastart and geostart, the group I work in. The result can be seen here.

Schematic of the ContextMap initiative from the Open Source to use and style Vector Tiles talk by @bolosig at #siglibre2018. Slides and video are available online (in Spanish)

Then came a revelation: separating data and style is a double-edged sword. While it gives us the freedom of customizing data on the client, it also opens the possibility of downloading data that won't be rendered at all, be it because a layer is styled to appear only from a higher zoom level than the one its data is downloaded at, or because its style is set to invisible or to an opacity of 0. That's something we should be really aware of.
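As a sketch of what "downloaded but never rendered" means, here's a hypothetical helper (not part of vt-optimizer) that flags the style layers whose data would be wasted at a given zoom level:

```python
def wasted_layers(style, tile_zoom):
    """Return ids of style layers whose data would be downloaded at
    `tile_zoom` but never drawn there: hidden, fully transparent, or
    only visible at higher zoom levels."""
    wasted = []
    for layer in style.get("layers", []):
        layout = layer.get("layout", {})
        paint = layer.get("paint", {})
        hidden = layout.get("visibility") == "none"
        transparent = any(
            paint.get(prop) == 0
            for prop in ("fill-opacity", "line-opacity", "icon-opacity")
        )
        too_early = layer.get("minzoom", 0) > tile_zoom
        if hidden or transparent or too_early:
            wasted.append(layer["id"])
    return wasted

# Toy style with made-up layer ids:
style = {"layers": [
    {"id": "a", "layout": {"visibility": "none"}},
    {"id": "b", "paint": {"fill-opacity": 0}},
    {"id": "c", "minzoom": 14},
    {"id": "d"},
]}
print(wasted_layers(style, 11))  # ['a', 'b', 'c']
```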

Armed with the needed tools and with that revelation always on my mind I was now set to start my journey.

Tinted vintage Iowa sign — Buy it here

Chapter two: The optimization path

Using vt-optimizer in its inspection mode on our Vector Tile, we can get a picture of how size is distributed and check it against Mapbox's recommendations. As shown below, we have a long road ahead to meet them.

Size distribution per level and recommendations check: of the 7 levels in the Vector Tile, just 2 meet the desired size. The other 5 have an average tile size bigger than 50KB. No tile exceeds the recommended 500KB.

First things first, though. We are using this Vector Tile with a fixed style, so there's no need for it to include layers that are not being used. Running vt-optimizer in its optimization mode creates a new Vector Tile that contains only the data needed to satisfy a given style. This new Vector Tile is indistinguishable from the original when rendered, but it's potentially much lighter. Notice that this should only be done as the last step of a Vector Tile style creation process, when we are really sure our style won't change at all.
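Conceptually, what the optimization mode does can be sketched like this. This is a toy version: the real tool works on the binary MVT data inside the MBTiles file, and these dict structures are just placeholders:

```python
def used_source_layers(style):
    """Source-layers the style actually draws from."""
    return {
        layer["source-layer"]
        for layer in style.get("layers", [])
        if "source-layer" in layer
        and layer.get("layout", {}).get("visibility") != "none"
    }

def slim_tile(tile_layers, style):
    """Drop every tile layer the style never references."""
    keep = used_source_layers(style)
    return {name: data for name, data in tile_layers.items() if name in keep}

# Toy data with made-up layer names:
tile = {"roads": ["..."], "buildings": ["..."], "hydrology": ["..."]}
style = {"layers": [
    {"id": "r", "source-layer": "roads"},
    {"id": "h", "source-layer": "hydrology",
     "layout": {"visibility": "none"}},
]}
print(slim_tile(tile, style))  # {'roads': ['...']}
```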

Optimization process results: The number of features per level removed and a list of each layer and the levels it has been removed from are shown.

Examining the resulting Vector Tile, we can see that we've improved but we are not quite there yet. Levels 7 and 9 now meet the recommendations. Notice that the level 11 average tile size has increased even though the total level size and number of tiles have decreased.

Size distribution per level and recommendations check after the initial optimization

Continuing with the inspection mode, we can get a better understanding of what's happening with level 11.

Tile size distribution: tile sizes on a given level are distributed into ten buckets of equal width. We should look for big disparities between the "% of tiles in this level" and "% of level size" columns.

Bucket 10, with a percentage of level size almost 4 times bigger than its percentage of tiles, seems to be the big contributor to this level's average size. Reducing it should bring the entire average down, but what does this bucket contain that weighs so much? Let's continue our inspection.
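The bucketing the inspection mode performs can be sketched as follows: ten equal-width buckets, comparing each bucket's share of tiles against its share of the level's total size. A bucket whose size share far exceeds its tile share is where the heavy tiles live:

```python
def bucket_stats(tile_sizes_kb):
    """Split tile sizes into ten equal-width buckets and report, per
    bucket, (% of tiles in this level, % of level size)."""
    lo, hi = min(tile_sizes_kb), max(tile_sizes_kb)
    width = (hi - lo) / 10 or 1          # avoid /0 when all sizes match
    buckets = [[] for _ in range(10)]
    for size in tile_sizes_kb:
        idx = min(int((size - lo) / width), 9)  # clamp max into bucket 10
        buckets[idx].append(size)
    total = sum(tile_sizes_kb)
    return [
        (len(b) / len(tile_sizes_kb) * 100, sum(b) / total * 100)
        for b in buckets
    ]

# Nine small tiles and one huge one: the last bucket holds 10% of the
# tiles but more than half of the level's size.
stats = bucket_stats([10] * 9 + [100])
print(stats[0])   # bucket 1: (90.0, ~47.4)
print(stats[9])   # bucket 10: (10.0, ~52.6)
```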

Listing of tiles inside the selected bucket (11/1036/1283) and statistics about the tile and the layers it contains

As seen above, the biggest culprit is a layer called edificacio_a_escala_0_pol, with a total of 58,503 vertices and 10,576 features. Now the question is whether this layer absolutely needs to be rendered at this level or whether we can render it at a higher one. To answer this we can use Maputnik, an awesome tool to style Vector Tiles, to see what effect hiding this layer at this level would have. Once our style is loaded, we can look for the layer in the layer list and change its style attributes to see what would happen if it was not visible.

Layer edificacio_a_escala_0_pol visible (left — in red) vs invisible (right).

Removing that layer at this level greatly alters our map, and we might find this unreasonable. To continue, we should skip this layer and concentrate on the next one, called vial_o_cami_no_pavimentat_entre_4_i_2_5_metres_4_lin. Repeating the same process in Maputnik, we can see that hiding it is a much fairer compromise.

Layer vial_o_cami_no_pavimentat_entre_4_i_2_5_metres_4_lin visible (left — in yellow) vs invisible (right).

If we deem this modification acceptable, we can alter the JSON style file, changing this layer's minimum visible level from 10 to 12. Running the optimization step with the updated style and examining the result afterwards, we can see how everything has changed.
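The style change itself is just a minzoom bump on one layer of the style JSON. A minimal sketch, assuming the Mapbox GL style layout (the surrounding style structure here is a placeholder):

```python
import copy

def raise_minzoom(style, layer_id, new_minzoom):
    """Return a copy of the style with one layer's minzoom raised, so
    its data is neither drawn nor, after re-running the optimization
    step, stored below that level."""
    updated = copy.deepcopy(style)
    for layer in updated["layers"]:
        if layer["id"] == layer_id:
            layer["minzoom"] = new_minzoom
    return updated

style = {"layers": [
    {"id": "vial_o_cami_no_pavimentat_entre_4_i_2_5_metres_4_lin",
     "minzoom": 10},
]}
updated = raise_minzoom(
    style, "vial_o_cami_no_pavimentat_entre_4_i_2_5_metres_4_lin", 12)
print(updated["layers"][0]["minzoom"])  # 12
```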

New average sizes

Notice that in level 11, changing a single layer reduced our average from 110KB to 88KB.

The path to optimization is a circular one: we may have to go around it multiple times, changing the style definition of different layers, before we have a Vector Tile that satisfies all the recommendations.

After going around multiple times and failing to get a clear grasp of what I was missing to tame its power, I started to doubt this path would lead me where I desired. Despairing, I raised my head to get a broader view of the situation and saw a sign I had not seen before. It pointed to a much narrower path than the one I was on. Where did that road lead? Only time would tell…

Chapter three: On the road from despair to success

We may get to a point where, although we don't want to remove any more layers from a given level, we need to do something more to reach the recommended sizes. Enter vt-optimizer's simplification mode.

Picking up the optimization process where we left it, we can use the inspection mode to see what's contributing to the level 11 size.

Listing of tiles inside the selected bucket (11/1036/1283) and statistics about the tile and the layers it contains

Having removed some layers, which reduced the average tile size on this level, the biggest culprit is still the edificacio_a_escala_0_pol layer. Although we don't want to remove it, seeing that it has almost 60K vertices, we can simplify it with vt-optimizer's simplification mode to reduce its weight.

Simplification mode output. This run shows a layer reduction of about 60%

As always, we should now check whether the simplification is good enough for our purposes. We might have to try it more than once, with different tolerance parameters, to get a good result.
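For the curious, the classic algorithm behind this kind of tolerance parameter is Ramer–Douglas–Peucker: drop every vertex that lies closer than the tolerance to the line between the endpoints, recursing on the farthest vertex otherwise. A toy re-implementation (vt-optimizer's actual simplification code may differ):

```python
def _point_line_dist(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    return abs(dy * px - dx * py + bx * ay - by * ax) / (dx * dx + dy * dy) ** 0.5

def simplify(points, tolerance):
    """Ramer-Douglas-Peucker: keep the farthest vertex and recurse if it
    exceeds the tolerance, otherwise collapse to the two endpoints."""
    if len(points) < 3:
        return list(points)
    dists = [_point_line_dist(p, points[0], points[-1]) for p in points[1:-1]]
    idx = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[idx - 1] > tolerance:
        left = simplify(points[:idx + 1], tolerance)
        right = simplify(points[idx:], tolerance)
        return left[:-1] + right   # drop the duplicated split point
    return [points[0], points[-1]]

# A near-straight edge collapses; a real corner survives:
print(simplify([(0, 0), (1, 0.01), (2, 0)], 0.1))  # [(0, 0), (2, 0)]
print(simplify([(0, 0), (1, 1), (2, 0)], 0.5))     # [(0, 0), (1, 1), (2, 0)]
```

A larger tolerance removes more vertices at the cost of fidelity, which is exactly the trade-off we inspect visually in Maputnik after each run.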

Layer edificacio_a_escala_0_pol not simplified (left — in purple) vs simplified (right). With a 60% reduction of vertices there’s no perceptible difference on this level.

Examining the Vector Tile again we can see how the average tile size of level 11 has changed and its size distribution in buckets.

Average sizes after simplifying
New tile size distribution

Level 11 has gone from an average of 88.45KB to 88.08KB, and the maximum tile size from 377KB to 290KB. Performing this kind of precise intervention multiple times on the heavier layers can greatly reduce the average tile size of a level.

With the knowledge acquired on the path I had just left, I now found myself prepared to take on the enormous challenge of taming Vector Tiles. I knew I would never really compare to the great magicians and their enchantments, but I felt satisfaction in understanding how that kind of magic really worked.

Epilogue

Through this journey we've explored a procedure to optimize Vector Tiles and got to know vt-optimizer, a small tool I created to inspect and optimize them. With it we can now really see where our data needs further optimization. We should always ask ourselves whether the data on a given level is needed and, if it really is, whether we need the full geometry or can get by with a simplification of it.

Optimizing a Vector Tile is an iterative process that should be done only as the last step of the production chain, when a style has been defined and closed. It can have a real impact on the performance of our maps, at the expense of full customization.

I was sure this was a path I would walk again and again, but somehow it didn't feel cumbersome. I was eager to do it with open eyes, wanting to learn more about the process each time I went by. I was also sure I would find others walking the same road, and I was keen on meeting them and sharing our knowledge. I was certain this kind of magic was here to stay, and I was more than willing to understand everything that came with it.

And so the story goes…

You can find me on Twitter, GitHub, LinkedIn, and writing about computer graphics, virtual reality and programming in general on my almost-defunct blog.


Isaac Besora

Software engineer. Choose my aim, take one step and then the next. It had never been anything else. Nonconformist, utopian, idealist, naive. Down and to the left.