Drawing 2.7 Billion Points in 10s

Makie, published in HackerNoon.com, Oct 31, 2017


While reading a blog post about visualizing 2.7 billion points in a minute on a MacBook, I got curious how fast this would be in Julia.

This was partly for fun, but also so I could integrate it into my plotting library (MakiE). Since I was very happy with how quickly I was able to build a very fast solution, I decided to share my experience!

To get started, one can download the data from http://planet.osm.org/gps/.

Parsing the CSV

I'm assuming that one wants to run lots of operations on the data, so the most sensible thing to do is to convert the 55 GB CSV into a 22 GB binary blob, which can be memory mapped and doesn't need parsing!
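As a quick sanity check on the 22 GB figure: if each point is stored as two `Float32` values (the layout used for the memory-mapped file later in this post), the blob size follows directly from the point count, in decimal gigabytes:

```latex
2.7 \times 10^{9}\ \text{points} \times 2\ \tfrac{\text{values}}{\text{point}} \times 4\ \tfrac{\text{bytes}}{\text{value}} = 21.6 \times 10^{9}\ \text{bytes} \approx 22\ \text{GB}
```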

There wasn't any tool around that could do this out of the box, but with the help of TextParse.jl I was able to whip up a fairly fast custom solution, which also demonstrates how nicely one can extend existing Julia libraries.

The first problem is that TextParse expects a string to parse the CSV from. Since I can't just load the whole 55 GB dataset into RAM, our first trick is to create a memory-mapped string type in Julia:
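A minimal sketch of such a type (not the post's original code), written for current Julia 1.x rather than the 0.6 this post targeted, and assuming the file is plain ASCII, as a numeric CSV is. It wraps the byte vector returned by `Mmap.mmap` and implements the core `AbstractString` methods; the name `MMapString` is hypothetical:

```julia
using Mmap

# Hypothetical memory-mapped string type (a sketch, not the post's original code).
# Assumes the file is plain ASCII, so every byte is exactly one character.
struct MMapString <: AbstractString
    data::Vector{UInt8}  # bytes mapped from disk; the OS pages them in on demand
end

# Map the whole file without copying it into RAM.
MMapString(path::AbstractString) = MMapString(Mmap.mmap(path))

# Core AbstractString interface: code units, index validity, iteration.
Base.ncodeunits(s::MMapString) = length(s.data)
Base.codeunit(::MMapString) = UInt8
Base.codeunit(s::MMapString, i::Integer) = s.data[i]
Base.isvalid(s::MMapString, i::Integer) = checkbounds(Bool, s.data, i)

function Base.iterate(s::MMapString, i::Int = 1)
    i > ncodeunits(s) && return nothing
    return (Char(s.data[i]), i + 1)  # ASCII assumption: one byte per Char
end
```

Generic string functions such as `length`, `first` and `String` then work through these methods, while the data itself stays on disk.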

This is all one needs to satisfy the basic interface of most libraries expecting a string-like type. Now we can write a function to read the 55 GB dataset, parse it line by line and write it into a binary blob.
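A simplified sketch of such a conversion, assuming each CSV line holds two comma-separated numbers and skipping lines that don't parse (such as a header). It uses `eachline` and `tryparse` from Base instead of TextParse, so it illustrates the data flow rather than the post's actual parser, and it writes each value with plain `write`, the call whose allocations are discussed next. `csv_to_binary` is a hypothetical name:

```julia
# Hypothetical CSV-to-binary converter (a sketch using only Base, not TextParse).
# Each valid line "a,b" becomes two consecutive Float32 values in the output file.
function csv_to_binary(csvpath::AbstractString, binpath::AbstractString)
    npoints = 0
    open(binpath, "w") do out
        for line in eachline(csvpath)
            parts = split(line, ',')
            length(parts) == 2 || continue               # skip malformed lines
            a = tryparse(Float32, parts[1])
            b = tryparse(Float32, parts[2])
            (a === nothing || b === nothing) && continue  # skip headers / bad values
            write(out, a, b)   # raw bytes in native byte order, no text formatting
            npoints += 1
        end
    end
    return npoints
end
```

The output has no header or delimiters, so a file of `n` points is exactly `8n` bytes and can later be mapped straight back into memory.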

One problem that came up was that writing to an IO file stream allocated quite a bit. Base.write creates a mutable container (Ref) for its argument so it can safely pass it to C. Even Base.unsafe_write, which takes a pointer directly, still has a check that allocates 128 bytes.
Get ready for unsafe_unsafe_write, or uuwrite for short. It saves us 40 seconds and allocates 3.5 GB less memory (~1.2x faster overall)!
This is actually a great example of where Julia currently falls short, and of how one can still claw back the performance.
We simply call the C library directly and use the soon-to-be-deprecated `&` syntax to take a pointer to the stack-allocated argument.
In Julia 0.7 this will likely be a non-issue, and we should be able to just use `write`, since the compiler has gotten a lot better at stack allocating and eliminating mutable containers.
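For illustration, here is a sketch of the same idea in Julia 1.x, where the `&x` ccall argument syntax is no longer supported and `Ref(x)` takes its place. This is not the post's code, and it rests on assumptions: it calls `ios_write`, the internal runtime function that Base's own `unsafe_write` for `IOStream` uses, and reads the undocumented `ios` field of `IOStream`. Both are internals that may change between Julia versions, and the sketch skips the lock Base takes around the call, so it is only safe when a single task writes to the stream. `uuwrite` here is a hypothetical reimplementation:

```julia
# Hypothetical `uuwrite`: write the raw bytes of an isbits value to an IOStream
# through the internal C function `ios_write`, bypassing Base.write's checks.
# WARNING: relies on undocumented internals (`io.ios`, `:ios_write`) and takes
# no lock, so only use it when a single task writes to `io`.
function uuwrite(io::IOStream, x::T) where {T}
    isbitstype(T) || throw(ArgumentError("uuwrite only supports isbits values"))
    r = Ref(x)  # ccall turns this into a pointer; the compiler can often keep it on the stack
    n = ccall(:ios_write, Csize_t, (Ptr{Cvoid}, Ptr{Cvoid}, Csize_t),
              io.ios, r, sizeof(T))
    return Int(n)  # number of bytes written
end
```

If you only need the effect rather than the trick, collecting many points into a vector and writing it with a single `write(io, vector)` call amortizes the per-call overhead without touching internals, and since newer compilers are better at eliminating the `Ref` that `write` creates, it is worth measuring before reaching for this.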

By the way, I figured this out with `@time write(io, 1f0)`, which shows the allocations, and then with `@which write(io, 1f0)` (or alternatively `@edit`, to jump directly into the editor) to find out where the function is defined in Julia and where the allocations come from.
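The same investigation can be scripted. A small sketch for current Julia 1.x: `@allocated` reports the bytes allocated by one call (measured after a warm-up call, so compilation is excluded), and `@which` returns the `Method` a call dispatches to, whose printed form includes its source file and line. Outside the REPL, `@which` and `@edit` need `using InteractiveUtils`; `write_allocations` is a hypothetical helper name, and the byte counts you see will depend on your Julia version:

```julia
using InteractiveUtils  # provides @which and @edit outside the REPL

# Hypothetical helper: bytes allocated by a single `write(io, x)` call.
function write_allocations(io::IO, x)
    write(io, x)                    # warm-up call, so compilation isn't counted
    return @allocated write(io, x)
end

path = tempname()
bytes, m = open(path, "w") do io
    (write_allocations(io, 1f0), @which write(io, 1f0))
end
println("write allocated $bytes bytes; it dispatches to $m")
```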

This converts the file and saves it as binary in 190 seconds, which isn't too bad for such a simple implementation and an operation that only needs to be done once.

Drawing the Image

Now we can memory map the file as a `Vector{NTuple{2, Float32}}`, which holds the raw GPS coordinates. We can then loop over the points directly and accumulate each position in a 2D array.
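A minimal sketch of both steps, assuming the binary file holds consecutive pairs of `Float32` values and that the first value of each pair maps to the image's x axis and the second to its y axis (which one is latitude depends on the CSV's column order, so adjust as needed). `load_points` and `accumulate_points!` are hypothetical names:

```julia
using Mmap

# Map the binary blob as a vector of (x, y) Float32 pairs without reading it into RAM.
load_points(path::AbstractString) = Mmap.mmap(path, Vector{NTuple{2,Float32}})

# Count how many points fall into each pixel of `img` (rows = y, columns = x).
# `xlims`/`ylims` give the coordinate range covered by the image.
function accumulate_points!(img::AbstractMatrix, points, xlims, ylims)
    h, w = size(img)
    x0, x1 = xlims
    y0, y1 = ylims
    sx = w / (x1 - x0)
    sy = h / (y1 - y0)
    @inbounds for (x, y) in points
        fx = (x - x0) * sx        # fractional column position
        fy = (y1 - y) * sy        # flipped, so larger y values land near the top row
        if 0 <= fx < w && 0 <= fy < h  # drop points outside the range (and NaNs)
            img[floor(Int, fy) + 1, floor(Int, fx) + 1] += 1
        end
    end
    return img
end
```

Because the range check runs before the index is computed, `@inbounds` is safe here, and the loop body is just a few arithmetic operations per point.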

We can use the above function in the following way to get a simple grayscale image:
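As a sketch of the rendering step, assuming the per-pixel counts are in a 2D array as described above: dividing by a saturation value, clamping to [0, 1] and inverting gives dark pixels where many points landed. This version returns plain `Float32` intensities so it needs no packages; with Colors.jl one would wrap the result in `Gray.(...)`. `to_grayscale` and its `saturation` keyword are hypothetical:

```julia
# Hypothetical helper: map per-pixel counts to grayscale intensities in [0, 1].
# Pixels with `saturation` or more points become black (0); empty pixels stay white (1).
function to_grayscale(counts::AbstractMatrix{<:Real}; saturation::Real = maximum(counts))
    s = max(Float32(saturation), eps(Float32))   # avoid dividing by zero for an empty image
    return 1f0 .- clamp.(counts ./ s, 0f0, 1f0)  # fused: one loop, one output array
end
```

Choosing a `saturation` well below the maximum count brightens sparse areas, since a handful of very busy pixels would otherwise push everything else toward white.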

Et voilà!

Drawing the 22 GB of points took 10 seconds on an ordinary desktop computer.
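For scale, the quoted numbers imply the following throughput (a simple division of the figures above, not a separate measurement):

```latex
\frac{2.7 \times 10^{9}\ \text{points}}{10\ \text{s}} = 2.7 \times 10^{8}\ \text{points/s},
\qquad
\frac{22\ \text{GB}}{10\ \text{s}} = 2.2\ \text{GB/s}
```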

One thing worth pointing out is Julia's great dot-call syntax, which you can see in the example:

This is a wonderful mechanism for applying a function element-wise to an array while fusing all operations. So the expression above turns into a single for loop over img that clamps, inverts and converts each element to a color in one go!
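To make the fusion concrete, here is a hand-written loop that computes the same result as a fused expression of this kind, `1f0 .- clamp.(img ./ s, 0f0, 1f0)`. It uses plain `Float32` intensities instead of a color type, since the post's exact expression isn't reproduced here. Because all the dotted calls fuse, Julia allocates only the output array and makes a single pass over `img`, just like the loop; with `.=` the result can even be written into an existing array:

```julia
# Fused broadcast: every dotted call runs inside one loop, producing one output array.
fused(img, s) = 1f0 .- clamp.(img ./ s, 0f0, 1f0)

# In-place variant: `.=` fuses into the same loop and reuses `out`, allocating nothing new.
fused!(out, img, s) = (out .= 1f0 .- clamp.(img ./ s, 0f0, 1f0); out)

# Hand-written equivalent of what the fused expression does.
function manual(img::AbstractMatrix{Float32}, s::Float32)
    out = similar(img)
    for i in eachindex(img, out)
        out[i] = 1f0 - clamp(img[i] / s, 0f0, 1f0)  # scale, clamp, invert per element
    end
    return out
end
```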

We can also use my new interactive plotting library to create a nice animation of the progress:

There are of course various improvements we could make, but I’m very happy with this as a first result. It is also not clear how comparable this solution is to the blog post mentioned at the beginning, since I haven’t run that code on my computer yet. If you want to try this on your machine, check out the full code!


Makie is a data visualization ecosystem for the Julia programming language, with high performance and extensibility