Streaming with RaftLib
So I got bored and built a streaming run-time for C++. Well, not quite: I built one so I could do my thesis on mathematical modeling of streaming systems. Then again, that’s not entirely true either. I hate the state of parallel programming as it sits today. There are dozens of frameworks: OpenMP, Pthreads, Java threads, various Python constructs, C++ threads, OpenCL, CUDA, etc. I could keep going on and on, probably filling up an entire blog post with nothing but parallel processing frameworks. The bottom line, though, is that none of these solutions satisfied my desire for ease of use, so I built something slightly different (well, at least it was when I started this project in 2014).
RaftLib is a simple library (mostly templates) for C++ stream processing (for a quick tutorial on what stream processing is, check out this link). RaftLib enables the user to FOCUS ON THE APPLICATION without worrying about details that can be automated, like buffer allocation (type, size, location, alignment), asynchronous FIFO management, compute placement (where to run each thread or fiber/threadlet), and scheduling. The name RaftLib is a play on words associated with “stream.” It’s a long-standing tradition, which I believe started with the Brook stream processing language.
Compute kernels are linked together via C++ stream operators like:
map += rng >> print;
which adds a stream from the output port of rng to the input port of the print kernel. When specified like this, each kernel is assumed to have a single input port and a single output port. The add-assign operator (+=) takes this pair and places it into the streaming map to be executed.
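To make the operator mechanics concrete, here is a minimal sketch of how a `>>` and `+=` pair can build up a graph. ToyKernel, ToyEdge, ToyMap and describe are hypothetical names invented for this illustration; this is not RaftLib’s implementation, just one way the same syntax could be wired up with ordinary operator overloading.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not RaftLib types.
struct ToyKernel {
    std::string name;
};

// One stream: source kernel's output port to destination kernel's input port.
struct ToyEdge {
    const ToyKernel* src;
    const ToyKernel* dst;
};

// `a >> b` builds an edge but does not register it anywhere yet.
inline ToyEdge operator>>(const ToyKernel& src, const ToyKernel& dst) {
    return ToyEdge{&src, &dst};
}

struct ToyMap {
    std::vector<ToyEdge> edges;

    // `map += edge` records the edge in the graph.
    ToyMap& operator+=(const ToyEdge& edge) {
        assert(edge.src != nullptr && edge.dst != nullptr);
        edges.push_back(edge);
        return *this;
    }
};

// Renders the graph as text, one "src -> dst" line per edge.
inline std::string describe(const ToyMap& map) {
    std::string out;
    for (const ToyEdge& edge : map.edges) {
        out += edge.src->name + " -> " + edge.dst->name + "\n";
    }
    return out;
}
```

Given two ToyKernel objects named rng and print and a ToyMap named map, the statement `map += rng >> print;` parses as `map += (rng >> print)` because `>>` binds tighter than `+=` in C++, so the pair is built first and then handed to the map.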
The current alpha version only supports linking two kernels at a time, but that’ll soon be fixed (probably by the time you read this) so that n kernels may be linked in series. Each kernel is executed independently as data becomes available. The really cool thing is that there is no explicit threading, locking, or even memory management to worry about from a user’s perspective. You get threads, largely without the headache (and boilerplate code) that comes with almost all current threading libraries. Here’s a longer (although still slightly contrived) example:
This example defines a Sum kernel between lines 14 and 40 (yes, I’ve just used 26 lines to add some numbers, but it’s easy to understand this way). The kernel defines input ports “input_a” and “input_b.” It also defines an output port “sum.” Each port is typed with the template typename T, and streams connecting ports must have matching types. The application simply sums two streams of random numbers from two gen kernels (a & b), then prints the results. In 75 commented lines of code we’ve built a fully parallel application executing on 4 independent threads of execution that can run on any modern multi-core machine.
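For contrast, here is a rough sketch of the same four-stage dataflow (two generators, a sum, a print) written by hand with nothing but the standard library. ToyFifo and run_toy_sum_pipeline are hypothetical names for this illustration, the generators emit fixed arithmetic sequences instead of random numbers so the result is predictable, and the “print” stage collects values into a vector rather than writing to the terminal. None of this is RaftLib code; it only shows the kind of queueing, locking, and thread bookkeeping a streaming run-time takes off your hands.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Toy unbounded FIFO: pop() blocks until an item arrives or the queue is closed.
template <typename T>
class ToyFifo {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }
    // Signals that no more items will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }
    // Returns false once the queue is both closed and drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Runs gen_a, gen_b, sum and print on four threads; returns what "print" received.
inline std::vector<int> run_toy_sum_pipeline(std::size_t count) {
    ToyFifo<int> a_to_sum, b_to_sum, sum_to_print;
    std::vector<int> printed;

    // Deterministic generator: start, start + step, start + 2 * step, ...
    auto generate = [count](ToyFifo<int>& out, int start, int step) {
        for (std::size_t i = 0; i < count; ++i) {
            out.push(start + step * static_cast<int>(i));
        }
        out.close();
    };

    std::thread gen_a(generate, std::ref(a_to_sum), 0, 1);
    std::thread gen_b(generate, std::ref(b_to_sum), 100, 10);
    std::thread sum([&] {
        int a = 0, b = 0;
        // Stop as soon as either input stream is exhausted.
        while (a_to_sum.pop(a) && b_to_sum.pop(b)) {
            sum_to_print.push(a + b);
        }
        sum_to_print.close();
    });
    std::thread print([&] {
        int value = 0;
        while (sum_to_print.pop(value)) printed.push_back(value);
    });

    gen_a.join();
    gen_b.join();
    sum.join();
    print.join();
    return printed;  // safe to read: the print thread has been joined
}
```

Even in this stripped-down form, most of the code is plumbing: the FIFO, the shutdown signalling, and the joins. The actual work is the single `a + b`.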
There’s a lot I still haven’t shown, perhaps enough to fill a book (in fact, my thesis). The most important piece is the optimization of data movement between compute kernels (moving data is far more expensive than any compute job within a modern processor). There are also bits I’ve left out of this simple example. The sum kernel itself could be more efficient: there’s a function within the raftmath header that determines the number of elements available on each input port and then vectorizes the adds when possible.
I also made some notes within the code; the explicit execute call at line 73 probably caught your attention. It’s going away, or at least the need for it is. There is a large enough user base, even in alpha form, that we’re reluctant to remove any API-level interfaces. What it is being replaced with is an execute on scope exit. The implicit execute on exit is supplemented with a barrier on specific kernels. This covers cases where sequential code, say in the main function, needs to wait until a kernel interacting with shared memory is complete before reading that memory. Here’s a brief example of how the upcoming barrier function will work:
I’m super excited about RaftLib and its future; it has been quite fun hacking on it. Judging by the number of support requests I get and the number of issues opened on GitHub, I think it’s actually gaining a user base. If you’re interested in using the framework, visit the project page (http://raftlib.io). If you want to contribute, feel free. The project is under the Apache 2.0 License, which is fairly lenient in its restrictions.
Thanks for reading! In my next post I’ll cover the auto-parallelization features of RaftLib.