A quick guide to paged vector tiles

Daniel van der Maas
Jan 19, 2023

We use vector data a lot in our daily lives. From streets and borders to point locations on a map, most of us see vector data on a daily basis (whether we realize it or not).

If you, like me, ever dove into how to render this type of geometric data efficiently over the web, you probably noticed that there is no satisfactory general protocol for it.

Raster rendering over the web is very much a solved problem. (I actually wrote an article on raster data a while ago arguing that the XYZ protocol basically solves the problem of rendering raster data over the web.) The same cannot be said of vector data, however. Each protocol has its own advantages and drawbacks and is applicable only to a certain subset of data and use cases.

At Ellipsis Drive I therefore broke my usual and long-standing rule to NEVER create a new protocol and went ahead and did exactly that. I’m very well aware that adding protocols often causes more problems than it solves, but I felt compelled because the two popular protocols, WFS and vector tiles, simply did not do the trick. Vector tiles focus on visualization alone, whereas I needed access to raw data, and WFS simply did not scale well. I needed a protocol with the scaling of vector tiles but the raw access of WFS.

To this end I created paged vector tiles. This is basically an extension of the vector tiles protocol that makes it suitable for raw data, and with that for data science purposes.

The basic mechanism behind paged vector tiles is the well-known tile pyramid, which covers the Web Mercator projected world at different zoom levels.

We use a max zoom of 21. This means that at the highest zoom level we cover the globe in 2²¹ by 2²¹ tiles (so tiles of around 20 by 20 meters at the equator).
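To make the "around 20 meters" figure concrete, here is a minimal sketch of the arithmetic. It assumes the standard Web Mercator sphere radius of 6378137 m; the function name is made up for this illustration and is not part of any Ellipsis Drive API.

```python
import math

# Radius of the sphere used by Web Mercator (EPSG:3857), in meters.
WEB_MERCATOR_RADIUS_M = 6378137.0


def tile_width_at_equator_m(zoom: int) -> float:
    """Ground width of a single tile at the equator for a given zoom level."""
    circumference = 2 * math.pi * WEB_MERCATOR_RADIUS_M
    # Each zoom level splits the world into 2**zoom tiles along each axis.
    return circumference / (2 ** zoom)


print(round(tile_width_at_equator_m(21), 1))  # 19.1
```

Away from the equator the same tile covers less ground, because Web Mercator stretches the map with increasing latitude.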

Now think of each of these tiles as a ‘box’ in which we can place vectors. When you create a layer on Ellipsis Drive you essentially create this pyramid full of empty boxes.

Zoom level 0 consists of 1 box, zoom level 1 of 4 boxes, zoom level 2 of 16 boxes, etc. In general, zoom level z consists of 4ᶻ boxes.
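The box counts follow from every tile splitting into four children at the next zoom level. A small sketch of that count, with function names that are purely illustrative:

```python
MAX_ZOOM = 21  # the maximum zoom level mentioned in this article


def tiles_at_zoom(zoom: int) -> int:
    """Number of boxes on one zoom level: 2**zoom columns times 2**zoom rows."""
    return 4 ** zoom


def tiles_in_pyramid(max_zoom: int = MAX_ZOOM) -> int:
    """Total number of boxes over zoom levels 0 up to and including max_zoom."""
    return sum(tiles_at_zoom(z) for z in range(max_zoom + 1))


print([tiles_at_zoom(z) for z in range(4)])  # [1, 4, 16, 64]
```

The pyramid is of course conceptual: an empty box does not need to be stored anywhere until a vector lands in it.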

Now when you add a vector to the layer, for example with the Python package

import ellipsis as el

el.path.vector.timestamp.feature.add(pathId=pathId, timestampId=timestampId, features=features, token=token)

or by uploading a file, the vector in question is added to the boxes according to its center point.

By default the vector will be added to one tile for each zoom level 0 to 21. You can override this default by specifying one or multiple zoom levels. In this case the vector is only added to tiles on the specified zoom levels.
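As an illustration of how a center point maps to one box per zoom level, here is a minimal sketch. It assumes the common XYZ tile numbering (x counted from the west, y counted from the north); the article does not spell out the exact indexing Ellipsis Drive uses, and `lonlat_to_tile` and `boxes_for_center` are hypothetical helper names.

```python
import math

MAX_ZOOM = 21
MAX_LAT = 85.0511287798  # latitude limit of the square Web Mercator world


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) index of the tile containing a WGS84 point."""
    n = 2 ** zoom
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that points on the far edges still fall inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def boxes_for_center(lon, lat, zoom_levels=None):
    """List the (zoom, x, y) boxes a vector with this center point would land in."""
    if zoom_levels is None:
        zoom_levels = range(MAX_ZOOM + 1)  # default: every zoom level 0 to 21
    return [(z, *lonlat_to_tile(lon, lat, z)) for z in zoom_levels]


# A point in Amsterdam, restricted to two zoom levels of our choosing.
print(boxes_for_center(4.9, 52.37, zoom_levels=[5, 10]))
```

With the default argument the point ends up in 22 boxes, one per zoom level, each nested inside the box of the level above.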

Now when requesting data, you always retrieve vectors by specifying a certain box. If you are interested in a certain region, you need to calculate a good covering of tiles for this region and retrieve the data for those tiles.

Since all data is retrieved using these predefined tiles, results can be found very efficiently and can even be cached.

When looking at a certain area the client calculates a covering of tiles and then fetches those.
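Calculating such a covering can be sketched as follows. This assumes XYZ-style tile numbering and a bounding box in WGS84 degrees that does not cross the antimeridian; `lonlat_to_tile` and `covering` are illustrative names, not functions of the Ellipsis Drive client.

```python
import math

MAX_LAT = 85.0511287798  # latitude limit of the square Web Mercator world


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) index of the tile containing a WGS84 point."""
    n = 2 ** zoom
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def covering(min_lon, min_lat, max_lon, max_lat, zoom):
    """All (zoom, x, y) tiles on one zoom level that touch the bounding box."""
    # y grows southward, so the south-west corner gives the largest y.
    x_min, y_max = lonlat_to_tile(min_lon, min_lat, zoom)
    x_max, y_min = lonlat_to_tile(max_lon, max_lat, zoom)
    return [
        (zoom, x, y)
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    ]


# Tiles on zoom level 12 covering a small box around Amsterdam.
for tile in covering(4.85, 52.33, 4.95, 52.40, zoom=12):
    print(tile)
```

Picking the zoom level is a trade-off: a higher zoom fits the region more tightly but needs more requests.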

But what if at some point we keep adding vectors to a certain tile causing it to be too full to retrieve all data in just 1 request? Well that is where the ‘paged’ part of the ‘paged vector tiles’ comes in.

On entry every vector gets its own unique uuid. Vectors within a tile are ordered by this randomly assigned uuid. Now if you retrieve the vectors from a certain tile, you get the first 2 MB worth of vectors AND the uuid of the next vector in line (if it exists). You can then request the same tile again, this time using the uuid you got as an offset.

Since the vectors within the tile are ordered by their uuid, finding the start of a page remains a log(N) lookup no matter how many pages of the tile you retrieve.
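The paging mechanism can be mimicked in memory to show why each page start is a log(N) lookup. This is only a sketch under assumptions: the `Tile` class, `get_page` and `fetch_all` are hypothetical stand-ins for the server and client, a sorted Python list plays the role of the index, and the size of a vector is approximated by the length of its JSON text.

```python
import bisect
import json
import uuid

PAGE_BYTES = 2 * 1024 * 1024  # stands in for the 2 MB page size mentioned above


class Tile:
    """In-memory stand-in for one box; not the actual Ellipsis Drive storage."""

    def __init__(self):
        self._ids = []       # uuids, kept sorted
        self._features = {}  # uuid -> feature

    def add(self, feature):
        feature_id = str(uuid.uuid4())  # randomly assigned on entry
        bisect.insort(self._ids, feature_id)
        self._features[feature_id] = feature
        return feature_id

    def get_page(self, page_start=None, page_bytes=PAGE_BYTES):
        """Return one page of features plus the uuid of the next one in line."""
        # Binary search on the sorted uuids: O(log N) for any page.
        i = 0 if page_start is None else bisect.bisect_left(self._ids, page_start)
        page, used = [], 0
        while i < len(self._ids):
            feature = self._features[self._ids[i]]
            size = len(json.dumps(feature))
            if page and used + size > page_bytes:
                break  # page is full; always return at least one feature
            page.append(feature)
            used += size
            i += 1
        next_start = self._ids[i] if i < len(self._ids) else None
        return page, next_start


def fetch_all(tile, page_bytes=PAGE_BYTES):
    """Client side: keep requesting pages until no next uuid comes back."""
    features, page_start = [], None
    while True:
        page, page_start = tile.get_page(page_start, page_bytes)
        features.extend(page)
        if page_start is None:
            return features


tile = Tile()
for i in range(100):
    tile.add({"type": "Point", "coordinates": [i, i]})
print(len(fetch_all(tile, page_bytes=200)))  # 100
```

Because the uuids are random, the first page on its own is an unbiased sample of whatever is in the tile.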

By requesting multiple pages of the same tile you can swiftly request over a million points.

Paged vector tiles have 2 main advantages compared to vector tiles:

1 You can easily add many vectors to a layer without thinking too much about it. If you just retrieve the first page of each tile, you get an honest random sample of the data you entered, which in general gives you a good impression of your data set.

2 There is no limit to how much data you can place within a tile. It will take you more and more requests to fully scrape the tile, but it will work fine.

Paged vector tiles have 1 main advantage compared to WFS:

1 As opposed to WFS, it scales. WFS retrieves data by bounding box. A GiST index can make this more efficient, but at the end of the day it does not reach the log(N) efficiency of tiles.

Here and here you can find some examples on how to use and manipulate paged vector tiles in some specific use cases.


As CTO of Ellipsis Drive it's my mission to make spatial data useable for developers and data scientists. https://ellipsis-drive.com/