Timely optimization, part 2
Spoiler: 7x speed-up for two hours work
In the last post I talked about switching to a data structure designed to provide better support for the most common access. In this case, a quad-tree for indexing two-dimensional objects with a bounding box in XY coordinates.
We went from an O[n] list search to an O[log n] tree lookup. This enabled a worthwhile improvement by moving some of the query work into the data structure traversal rather than return a collection to the caller and do the work there.
As is often the case, this triggered a realisation that a similar technique might speed up the creation of “terrain” that we use to give an indication of the road in a more physical context.
Terrain generation works by recursively sub-dividing the XY space and at each stage derive the lowest point in that region so that we can render terrain which is always just beneath the road. I spent some time pondering whether we might be able to just transform the existing tree structure into a landscape, but this is not flexible enough, as the terrain generation “adapts” to where the road actually is.
Still, there’s the query that we issue for each sub-division. Rather like the previous example, it returns a list of points, only for us to calculate the lowest elevation and the aggregate bounding box; taken together these drive the terrain. Can we push this calculation into the data structure, to avoid passing back a result set?
Yes, sure we can, but it’s a fold
, not a filter
.
queryWithFold :
SpatialNode contentType units coords
-> BoundingBox2d.BoundingBox2d units coords
-> (SpatialContent contentType units coords -> accum -> accum)
-> accum
-> accum
queryWithFold current queryArea folder accumulator =
case current of
Blank ->
accumulator
SpatialNode node ->
let
fromThisNode : List (SpatialContent contentType units coords)
fromThisNode =
node.contents |> List.filter (.box >> BoundingBox2d.intersects queryArea)
in
if node.box |> BoundingBox2d.intersects queryArea then
List.foldl folder accumulator fromThisNode
|> queryWithFold node.nw queryArea folder
|> queryWithFold node.ne queryArea folder
|> queryWithFold node.se queryArea folder
|> queryWithFold node.sw queryArea folder
else
accumulator
We extend the query
call with a folder
function which the caller provides to aggregate the query results “on the fly” into the final accumulator
argument.
Does this make our calling code really obscure? I think not.
initialFoldState : TerrainFoldState
initialFoldState =
{ minAltitude = Quantity.positiveInfinity
, resultBox = BoundingBox2d.singleton centre
, count = 0
}
queryFoldFunction :
SpatialContent IndexEntry Meters LocalCoords
-> TerrainFoldState
-> TerrainFoldState
queryFoldFunction entry accum =
{ minAltitude = Quantity.min accum.minAltitude entry.content.elevation
, resultBox = BoundingBox2d.union accum.resultBox entry.box
, count = accum.count + 1
}
{ minAltitude, resultBox, count } =
SpatialIndex.queryWithFold index myBox queryFoldFunction initialFoldState
What are these three things? Firstly, a small data structure for accumulating our results. Secondly, the function that does the accumulation; called for each result. Thirdly, the actual call to queryWithFold
. I think it’s a rather succinct expression of the work, a pleasing division of labour, and easily accessible results. Scarcely any other changes were needed.
Was it worth it? I spent about one hour after breakfast writing the queryWithFold
function, and about one hour in the evening writing the calling code while watching the Bake-Off final. My test case has 23000 track points and took about seven seconds to render terrain. This went down to just about one second.
That’s the kind of optimisation I like. Targeted, timely, effective.