Beyond OpenCL: More Concurrent than Parallel!

(Rapid Prototyping of Large Scale Hardware Systems Using Tooling)

“Computer Science is not just about coding to get a job done,
 it’s also about broader thinking skills like computational thinking
 and abstraction and modelling and design.” — Simon Peyton Jones

This post is aimed at readers who are already familiar with hardware-level design concepts, “prototyping” and the Communicating Sequential Processes (CSP) programming model, which is also the basis for Go’s concurrency model. For an introduction to concurrency, we’d recommend watching Rob Pike’s talk “Concurrency is not Parallelism” and reading Hoare’s original paper on CSP, particularly the examples [1].

As it stands, the available resources on FPGAs are increasing faster than our ability to design for them using traditional methods. To tackle this, modern scalable solutions have to emerge that exploit those resources efficiently. At the same time, complex systems call for better control over various design parameters (incl. data and control widths, on-/off-chip memory blocks, topology etc.). Rapid prototyping of such systems will not only address the design productivity gap but also allow us to get the desired performance out of them by exploring the design space.

FPGA Prototyping

Fast FPGA prototyping emerged in the late 90s and early 2000s to accelerate and validate the design process. High-level synthesis tools, aka C-to-gates, have also emerged to facilitate this approach by abstracting low-level signalling, protocols, drivers etc. and enabling seamless hardware device targeting (CPU/GPU/FPGA).

High-level synthesis technologies mostly take imperative languages, such as C/C++ or Python, as their input, and extend the control-flow toward parallelism using ‘pragmas’. This requires the designer to rethink the implementation and use design-for-prototyping (DFP) techniques, which usually require a great deal of effort to rework the original design.

Also, to create programs that are composed of concurrent entities, a truly concurrent language is required, rather than several processes running in parallel with a lot of synchronisation between them. The latter may not be ideal for modelling hardware/software co-designed systems: these are usually timing-critical, and a machine built for running parallel threads may produce unrealistic behaviour.

Our Solution

Our solution exploits a non-deterministic, concurrent (and thus hardware-friendly), CSP-based programming model to allow fast prototyping of hardware/software co-designed systems. Our synthesis framework takes a high-level description in Go, generates the corresponding logic kernels, and seamlessly couples them with the memory blocks and communication channels available in the environment, incl. AXI, PCIe, NVMe, NVLink, etc. [2].

We use Go because its concurrent nature enables parallelism. It has features that are well suited to modelling concurrent systems, such as channels, goroutines and select statements. Its fine-grained, lightweight goroutines let users decompose a complicated job into several smaller tasks and spawn a separate goroutine for each, so that they work together to get the job done. Our approach differs from C-to-gates frameworks with high parallelisation capabilities, such as OpenCL, in several ways:

  • Based on concurrent rather than sequential thinking (hardware friendly)
  • Exploits concurrency in semantics of the language (Go channels and goroutines)
  • Enables seamless partitioning of processes (enables multi-FPGA)
  • Enables asynchronous and elastic communication (enables multiple clocking [3])

Conclusion

Our platform is an easy-to-use, cost-effective way for researchers to prototype complicated large-scale hardware systems, deploy them onto FPGAs in the AWS cloud and get performance, power and area results in just a few minutes. We also have a friendly community forum where our engineers provide feedback and optimisation help with software-level designs.

Read more at:

[1]. C.A.R. Hoare, 1978. Communicating sequential processes. In The origin of concurrent programming (pp. 413–443). Springer New York.

[2]. M. Jelodari Mamaghani, R. Taylor, “The Synthesis Path for Transforming ‘Go’ Programs into Hardware Deployable on FPGA-based Cloud Infrastructures”, US Patent app №62412376, 2016

[3]. M. Jelodari Mamaghani, J. Garside, “High-level Synthesis of GALS Systems”, In Proc. of PAnDA Workshop on Designing with Uncertainty — Opportunities & Challenges, York, UK. March 2014