GPU Accelerated Aggregation in deck.gl
Aggregating data is a common task in many visualization applications, and deck.gl offers three aggregation layers: HexagonLayer, which aggregates data in world space into hexagon-shaped bins; GridLayer, which aggregates data in world space into square-shaped bins; and ScreenGridLayer, which aggregates data in screen space (after projection) into square-shaped bins. The following images show all three layers with changing bin size. In this article I’ll dive into how deck.gl v6 greatly improved its aggregation performance by leveraging the GPU.
A generic aggregation process involves these steps:
- Divide an area of interest into several bins. A bin can be defined by a rectangle, a hexagon or any other shape.
- Process each input data point, determine which bin it belongs to, and update that bin’s counter.
- Count how many points fall into each bin. This number can then be used to visualize how the input data is distributed among the bins.
Below is an example of aggregating points (‘a’ to ‘h’) into four square-shaped grid cells. The first image renders the points into their corresponding bins; in the second image, bins are colored based on the aggregated count. As seen below, visualizing the distribution of these points is much easier when bins are colored by aggregated count. Once aggregation is performed, the results can be used to set color, elevation, size, or any combination of these.
In the following sections we explore how this can be done on the GPU. We will refer to WebGL code samples and results from our implementation of ScreenGridLayer in deck.gl, but the approach is generic enough to be implemented in any graphics library or application.
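To make the three steps concrete, here is a minimal CPU sketch of the same idea. The function name, the bounds object and the eight coordinates are made up for illustration (the original figure's coordinates are not reproduced here); it only shows binning into square cells and counting.

```javascript
// CPU reference for the generic steps: divide into bins, assign points, count.
// `aggregateToGrid` and the sample coordinates are hypothetical.
function aggregateToGrid(points, bounds, cellSize) {
  const cols = Math.ceil((bounds.maxX - bounds.minX) / cellSize);
  const rows = Math.ceil((bounds.maxY - bounds.minY) / cellSize);
  const counts = new Array(cols * rows).fill(0);
  for (const [x, y] of points) {
    const col = Math.floor((x - bounds.minX) / cellSize);
    const row = Math.floor((y - bounds.minY) / cellSize);
    // Skip points outside the area of interest.
    if (col < 0 || col >= cols || row < 0 || row >= rows) continue;
    counts[row * cols + col] += 1;
  }
  return {cols, rows, counts};
}

// Eight points binned into a 2x2 grid of unit cells.
const points = [
  [0.2, 0.3], [0.7, 0.1], [0.4, 0.8], // cell (0, 0)
  [1.5, 0.5],                         // cell (1, 0)
  [0.1, 1.9], [0.9, 1.2],             // cell (0, 1)
  [1.1, 1.1], [1.8, 1.6]              // cell (1, 1)
];
const result = aggregateToGrid(points, {minX: 0, minY: 0, maxX: 2, maxY: 2}, 1);
console.log(result.counts); // [3, 1, 2, 2]
```

The loop visits points one at a time, which is the part the GPU approach replaces with parallel rendering.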
In the ScreenGridLayer, points (longitude, latitude), grid size, projection and viewport parameters are provided as input. Based on `grid size`, the total screen space is divided into a grid of cells, every point is projected to screen space, and the weight of the grid cell that the point falls into is incremented. Finally, each cell is rendered with a color based on its aggregated weight.
In this section we dig into every step involved in GPU aggregation. In short, we set up a render pipeline such that each grid cell is represented by a single pixel in a texture. Each input point is rendered into this texture: its position data (longitude, latitude) determines which pixel is rendered, and its weight determines the color of the pixel. Blending is set up such that the pixel color is incremented every time the pixel gets rendered.
All of the input points are then rendered to perform the aggregation. The GPU renders the points in parallel using several execution units. Once all input data is rendered, the aggregated data is encoded in this texture and can be consumed by the actual rendering of the visualization.
Render Pipeline Inputs
Two WebGL buffers are created, one with position data and the other with weight data. Constant data, such as grid size and projection matrices, are set as uniforms.
The viewport size is set to the size of the grid, and a framebuffer object of the same size is created and activated. A float texture (RGBA32F format) of the same size is created and set as the color attachment of the framebuffer. This setup effectively maps each grid cell to a single pixel in the texture. Each pixel has four channels: Red, Green, Blue and Alpha. Each channel can be used to aggregate a different parameter. Moreover, using ‘blendEquationSeparate’, one aggregation operation can be applied to the Red, Green and Blue channels and a different one to the Alpha channel in a single render pass. The following sections present more details on the blend setup.
In the demo we present later, we use the Red channel to aggregate counts and the Green channel to aggregate weights. The Alpha channel is used to collect the maximum aggregated value, and the Blue channel is unused. More details follow in the sections below.
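As a rough sketch of this setup in raw WebGL2 calls (not the luma.gl code our implementation uses), the render target could be created as follows. The function name is hypothetical, and `gl` is assumed to be a WebGL2RenderingContext on which the EXT_color_buffer_float extension has already been enabled.

```javascript
// Hypothetical helper: one RGBA32F texel per grid cell, attached to a framebuffer.
// Assumes `gl` is a WebGL2RenderingContext with EXT_color_buffer_float enabled.
function createAggregationTarget(gl, gridWidth, gridHeight) {
  const texture = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_2D, texture);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA32F, gridWidth, gridHeight, 0,
    gl.RGBA, gl.FLOAT, null);
  // NEAREST filtering so that each texel is read back unmodified.
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);

  const framebuffer = gl.createFramebuffer();
  gl.bindFramebuffer(gl.FRAMEBUFFER, framebuffer);
  gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0,
    gl.TEXTURE_2D, texture, 0);
  if (gl.checkFramebufferStatus(gl.FRAMEBUFFER) !== gl.FRAMEBUFFER_COMPLETE) {
    throw new Error('Float framebuffer is not renderable on this device');
  }

  // Viewport matches the grid, so one pixel corresponds to one cell.
  gl.viewport(0, 0, gridWidth, gridHeight);
  // Start every channel at zero before accumulating.
  gl.clearColor(0, 0, 0, 0);
  gl.clear(gl.COLOR_BUFFER_BIT);
  return {texture, framebuffer};
}
```

Clearing to zero matters because blending accumulates on top of whatever the texture already holds.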
Multiple points are processed in parallel on the GPU by using Vertex and Fragment shaders.
In the vertex shader, each point’s position is projected to screen space. In this space, based on the grid-cell size, the position is mapped to the cell it belongs to (note the floor operation in the vertex shader). This step has to be performed in the vertex shader, since the fragment shader can’t change a pixel’s position. The cell position is then mapped to the corresponding pixel by converting it to clip space (X and Y coordinates range from -1 to 1). At this point, we have mapped a given input point to the pixel that represents the grid cell it falls into.
Pixel perfect rendering
Before we finish the processing, we move the final pixel position to the pixel’s center. This avoids the result being mapped to a neighboring pixel due to floating-point rounding. It is achieved by adding an offset equal to half of the pixel size. The pixel size follows from the fact that the total grid size in each direction is represented by the [-1, 1] range.
The weight of the point is passed to the fragment shader as a varying. Given that we are rendering points, weights are not interpolated before reaching fragment processing, and each weight contributes to the weight of the corresponding grid cell. In the fragment shader, we render the value 1.0 to the Red channel and the weight of the point to the Green channel.
The actual aggregation happens during a post-fragment processing step, blending. During blending, a pixel’s final color is calculated by applying the blending operation to the current fragment shader output and the pixel’s current value in the color buffer. For more details check blendFunc, blendEquation and blendEquationSeparate.
We set the source and destination factors to GL.ONE using blendFunc, and the blend mode to GL.FUNC_ADD using blendEquation. With this setup, and given that the fragment shader outputs 1.0 in the Red channel and the point’s weight in the Green channel, at the end of the rendering process the Red channel of the framebuffer’s color attachment texture contains the total number of points, and the Green, Blue and Alpha channels contain the total weight of the points, aggregated into the grid cell represented by the pixel’s position.
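The cell lookup and the half-pixel offset can be checked with a small CPU emulation of the vertex-stage math. This is only a sketch: the function name and argument layout are hypothetical, and it assumes the screen-space origin sits at the corner that maps to clip-space (-1, -1).

```javascript
// CPU emulation of the vertex-stage math; all names are hypothetical.
// screenPos: projected position in pixels. cellSize: cell width/height in pixels.
// gridSize: number of cells in X and Y.
function pointToClipSpace(screenPos, cellSize, gridSize) {
  // Which cell does the point fall into? (the floor operation)
  const cellX = Math.floor(screenPos[0] / cellSize[0]);
  const cellY = Math.floor(screenPos[1] / cellSize[1]);
  // The whole grid spans [-1, 1], so one pixel is 2 / gridSize wide.
  const pixelW = 2 / gridSize[0];
  const pixelH = 2 / gridSize[1];
  // Lower edge of the pixel in clip space, shifted by half a pixel to its center.
  const clipX = cellX * pixelW - 1 + pixelW / 2;
  const clipY = cellY * pixelH - 1 + pixelH / 2;
  return {cell: [cellX, cellY], clip: [clipX, clipY]};
}

// A 400x200 viewport with 100x100 cells gives a 4x2 grid.
const mapped = pointToClipSpace([250, 30], [100, 100], [4, 2]);
console.log(mapped.cell); // [2, 0]
console.log(mapped.clip); // [0.25, -0.5]
```

Without the half-pixel term the point would sit exactly on the edge between two pixels, which is where rounding could push it to the wrong side.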
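In raw WebGL terms, the blend state described above might look like the sketch below, followed by a CPU emulation of what the blend stage computes for one pixel. The function names are hypothetical, `gl` is assumed to be a WebGL or WebGL2 rendering context, and only the Red and Green channels are emulated.

```javascript
// Hypothetical helper: additive blending, assuming `gl` is a WebGL(2) context.
function enableAdditiveBlending(gl) {
  gl.enable(gl.BLEND);
  gl.blendFunc(gl.ONE, gl.ONE); // result = 1 * source + 1 * destination
  gl.blendEquation(gl.FUNC_ADD);
}

// CPU emulation of FUNC_ADD with ONE/ONE factors for a single pixel.
function blendAdd(destination, source) {
  return destination.map((d, i) => d + source[i]);
}

// Three points with weights 2, 5 and 0.5 land on the same pixel.
// Each fragment carries [1.0, weight] in the Red and Green channels.
let pixel = [0, 0];
for (const weight of [2, 5, 0.5]) {
  pixel = blendAdd(pixel, [1, weight]);
}
console.log(pixel); // [3, 7.5]
```

The order in which fragments arrive does not change the count; for float weights the sum can differ in the last bits depending on ordering.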
Total count and weight
The above setup and shaders give aggregated values per grid cell. But in many applications we also want to calculate total aggregation values, such as total count and total weight. To achieve this we use a render pipeline similar to the one above, but instead of aggregating values into grid cells, we aggregate all of them into a single pixel. We use WebGL’s ‘blendEquationSeparate’ method to perform blending separately for the RGB and Alpha channels, as described below.
For this draw, we use instanced rendering, with the instance count equal to the number of grid cells. Using GLSL’s built-in variable ‘gl_InstanceID’, in the vertex shader we map each instance ID to the corresponding texture coordinate and pass it to the fragment shader. The position of every instance is mapped to (-1, -1), hence everything is rendered into a single pixel.
The fragment shader receives the texture coordinates and samples the texture. The texture used for sampling is the result of the aggregation step above, in which each pixel holds the aggregated values of one grid cell.
In the vertex shader, we use ‘div_fp64’, a 64-bit floating-point version of the division operation, to avoid 32-bit floating-point precision issues. For more details check the luma.gl fp64 shader module.
After these shaders execute, the resulting texture contains a single pixel, whose Red channel contains the total count, Green channel contains the total weight, and Alpha channel contains the maximum aggregated weight across all cells.
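The instance-to-texel mapping can be sketched on the CPU as follows. The helper name and the row-major ordering of cells are assumptions. JavaScript numbers are 64-bit doubles, so plain division is used here in place of the ‘div_fp64’ call that a 32-bit GLSL shader would need.

```javascript
// Hypothetical helper: maps an instance id to the center of its cell's texel,
// assuming cells are numbered row by row (row-major order).
function instanceIdToTexCoord(instanceId, gridWidth, gridHeight) {
  const col = instanceId % gridWidth;
  const row = Math.floor(instanceId / gridWidth);
  // Adding 0.5 samples the texel center rather than its edge.
  return [(col + 0.5) / gridWidth, (row + 0.5) / gridHeight];
}

// A 4x2 grid has 8 instances, ids 0 to 7.
console.log(instanceIdToTexCoord(0, 4, 2)); // [0.125, 0.25]
console.log(instanceIdToTexCoord(6, 4, 2)); // [0.625, 0.75]
```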
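The effect of this second pass can be emulated on the CPU as a reduction over the per-cell texels. This is a sketch under assumptions: the function name is hypothetical, the fragment output is assumed to carry the cell weight in both Green and Alpha, and weights are assumed non-negative so that a maximum starting from zero is valid. On the GPU, the matching blend state in WebGL2 would be `gl.blendEquationSeparate(gl.FUNC_ADD, gl.MAX)`.

```javascript
// cellTexture: Float32Array of RGBA texels produced by the per-cell pass.
// Returns the value the single output pixel would hold after blending.
function reduceToSinglePixel(cellTexture) {
  const total = [0, 0, 0, 0];
  for (let i = 0; i < cellTexture.length; i += 4) {
    const count = cellTexture[i];      // Red: points in this cell
    const weight = cellTexture[i + 1]; // Green: summed weight of this cell
    total[0] += count;                 // RGB channels: FUNC_ADD
    total[1] += weight;
    total[3] = Math.max(total[3], weight); // Alpha channel: MAX
  }
  return total;
}

// Four cells: [count, weight, unused, unused] each.
const cells = new Float32Array([
  3, 7.5, 0, 0,
  1, 2, 0, 0,
  0, 0, 0, 0,
  2, 4.25, 0, 0
]);
console.log(reduceToSinglePixel(cells)); // [6, 13.75, 0, 7.5]
```

The maximum is typically what a color or elevation scale needs in order to normalize per-cell values.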
Consumption of aggregated data
The aggregation steps above are performed by rendering to a framebuffer object. Once completed, the aggregated data resides in two texture objects. The first texture contains the aggregated data per grid cell; the second contains the total count, total weight and maximum weight in a single pixel. This data can be read by the CPU (using readPixels) and then consumed by other rendering steps as attributes and uniforms. To further optimize this step and avoid a CPU/GPU sync, the texture data can also be consumed as is, by sampling it directly on the GPU, or copied into buffer objects using the WebGL2 ‘PIXEL_PACK_BUFFER’ binding target.
Once the aggregated data is in WebGL buffer objects, these can be set as attributes and uniforms in the next rendering steps that require aggregated data.
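A minimal sketch of the buffer path in raw WebGL2 could look like this. The function name is hypothetical, `gl` is assumed to be a WebGL2RenderingContext, and the aggregation framebuffer is assumed to be bound for reading when the function is called.

```javascript
// Hypothetical helper: copies the bound framebuffer's RGBA float pixels into a
// GPU buffer without reading them back to the CPU.
function copyAggregationToBuffer(gl, width, height) {
  // 4 channels per pixel, 4 bytes per float.
  const byteLength = width * height * 4 * Float32Array.BYTES_PER_ELEMENT;
  const buffer = gl.createBuffer();
  gl.bindBuffer(gl.PIXEL_PACK_BUFFER, buffer);
  gl.bufferData(gl.PIXEL_PACK_BUFFER, byteLength, gl.DYNAMIC_COPY);
  // With a buffer bound to PIXEL_PACK_BUFFER, the last argument is a byte
  // offset into that buffer instead of a destination array.
  gl.readPixels(0, 0, width, height, gl.RGBA, gl.FLOAT, 0);
  gl.bindBuffer(gl.PIXEL_PACK_BUFFER, null);
  return buffer;
}
```

The returned buffer can later be bound to `gl.ARRAY_BUFFER` and used as a vertex attribute source, so the data never leaves the GPU.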
Data is aggregated into float textures, hence the browser should either support rendering to float textures (WEBGL_color_buffer_float) or support a WebGL2 context.
If the application wants to avoid a CPU/GPU sync and use the ‘PIXEL_PACK_BUFFER’ API to consume aggregated data, the browser must support a WebGL2 context.
We observed speedups of up to 12x when aggregation is accelerated using the GPU for ScreenGridLayer. Below is an example of aggregating 4.5 million points. Every time the cell size slider is moved, the data is aggregated again. With GPU acceleration the UI stays very responsive, but when the same aggregation is performed on the CPU, it is slow enough that the UI becomes unresponsive.
I have presented a render pipeline setup and a set of shaders to perform grid aggregation on the GPU. Our implementation uses the luma.gl and deck.gl APIs, but the details above are generic enough to be incorporated directly into any graphics application. In addition to ScreenGridLayer, we are planning to extend this to other aggregation layers, like GridLayer and HexagonLayer, and to upcoming new layers like ContourLayers and HeatMapLayer in future deck.gl releases. For more details check the deck.gl RFCs and roadmaps.
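These requirements can be checked at startup with a small feature-detection sketch. The function name and the shape of the returned object are hypothetical, and `canvas` is assumed to be an HTMLCanvasElement. Note that in practice a WebGL2 context also needs the EXT_color_buffer_float extension before a float texture can be used as a render target.

```javascript
// Hypothetical helper: reports whether GPU aggregation and the
// PIXEL_PACK_BUFFER path are likely to be usable on this canvas.
function detectAggregationSupport(canvas) {
  const gl2 = canvas.getContext('webgl2');
  if (gl2) {
    // WebGL2 needs this extension before float textures are color-renderable.
    const floatTarget = Boolean(gl2.getExtension('EXT_color_buffer_float'));
    return {gpuAggregation: floatTarget, pixelPackBuffer: true};
  }
  const gl1 = canvas.getContext('webgl');
  if (gl1) {
    // WebGL1 needs float textures and the ability to render to them.
    const floatTexture = Boolean(gl1.getExtension('OES_texture_float'));
    const floatTarget = Boolean(gl1.getExtension('WEBGL_color_buffer_float'));
    return {gpuAggregation: floatTexture && floatTarget, pixelPackBuffer: false};
  }
  return {gpuAggregation: false, pixelPackBuffer: false};
}
```

An application could fall back to CPU aggregation whenever `gpuAggregation` comes back false.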