At Reconfigure.io we’re writing Go for FPGAs, providing hardware acceleration to developers working across a wide variety of sectors, including AI, finance, security and healthcare. We’re dealing with fine-grained parallelism: many independently executing processes, potentially thousands, running concurrently. Up to now we’ve been using ARM’s AXI protocol to give our FPGAs memory access. But AXI was designed for CPU-level parallelism, with a few cores accessing memory at the same time, so using it at this level of concurrency has not been straightforward.
A Protocol for Parallelism
Usable, streamlined and scalable memory access is vital to our business, so our engineers have developed a new protocol, SMI (Scalable Multiprotocol Infrastructure), which provides a fit-for-purpose interface with the level of concurrency we require. SMI gives our users an easy way to have 64+ independent processes accessing memory concurrently, while simplifying code and reducing boilerplate. An initial release of the SMI protocol is available for testing from Reconfigure.io v0.17.0 onwards, and will be fully rolled out as our standard method very soon.
How is it different?
Each SMI port consists of a request and response channel pair, and can be used for reading from or writing to memory. Additional read/write ports can easily be added to your project without the need for manual arbitration. Each project also needs a few simple settings in a .yml file: whether to use SMI or AXI, and how many ports you require.
As an example, using SMI, this is how we set up channels to have two memory access ports:
smiPortAReq chan<- smi.Flit64,
smiPortAResp <-chan smi.Flit64,
smiPortBReq chan<- smi.Flit64,
smiPortBResp <-chan smi.Flit64
Any SMI port can be used to either read from or write to memory. We can name the ports for our own reference, so we know which is being used for which function, like this:
readReq chan<- smi.Flit64,
readResp <-chan smi.Flit64,
writeReq chan<- smi.Flit64,
writeResp <-chan smi.Flit64
And if we wanted to add another read port, ``readB``, in order to have two concurrent memory reads, we could just make the following change and increase the number of ports in the project’s ``.yml`` file:
readAReq chan<- smi.Flit64,
readAResp <-chan smi.Flit64,
writeAReq chan<- smi.Flit64,
writeAResp <-chan smi.Flit64,
readBReq chan<- smi.Flit64,
readBResp <-chan smi.Flit64
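To make the request/response pattern concrete, here is a minimal sketch that runs on an ordinary CPU using plain Go channels. It does not use the real smi package: the Flit64 struct is a hypothetical stand-in (the layout of smi.Flit64 is not shown in this post), and toyMemory, copyWord and runCopy are names invented for this illustration. The only thing it is meant to show is the shape of the interaction: every request sent on a port's request channel is answered by one response on the matching response channel.

```go
package main

import "fmt"

// Flit64 is a HYPOTHETICAL stand-in for smi.Flit64, used for illustration only.
type Flit64 struct {
	Addr  int
	Data  uint64
	Write bool
}

// handle applies one request to the toy memory and builds its response.
func handle(mem []uint64, r Flit64) Flit64 {
	if r.Write {
		mem[r.Addr] = r.Data
	}
	return Flit64{Addr: r.Addr, Data: mem[r.Addr]}
}

// toyMemory serves two ports from a single goroutine, so no locking is needed.
// It returns once both request channels have been closed.
func toyMemory(mem []uint64,
	aReq <-chan Flit64, aResp chan<- Flit64,
	bReq <-chan Flit64, bResp chan<- Flit64) {
	for aReq != nil || bReq != nil {
		select {
		case r, ok := <-aReq:
			if !ok {
				aReq = nil // a nil channel is never selected again
				continue
			}
			aResp <- handle(mem, r)
		case r, ok := <-bReq:
			if !ok {
				bReq = nil
				continue
			}
			bResp <- handle(mem, r)
		}
	}
}

// copyWord has the same parameter shape as the read/write example above:
// one request/response pair for reading and one for writing.
func copyWord(src, dst int,
	readReq chan<- Flit64, readResp <-chan Flit64,
	writeReq chan<- Flit64, writeResp <-chan Flit64) {
	readReq <- Flit64{Addr: src}
	v := (<-readResp).Data
	writeReq <- Flit64{Addr: dst, Data: v, Write: true}
	<-writeResp // wait for the write to be acknowledged
}

// runCopy wires the ports to the toy memory and runs one copy.
func runCopy(mem []uint64, src, dst int) {
	readReq, readResp := make(chan Flit64), make(chan Flit64)
	writeReq, writeResp := make(chan Flit64), make(chan Flit64)
	done := make(chan struct{})
	go func() {
		toyMemory(mem, readReq, readResp, writeReq, writeResp)
		close(done)
	}()
	copyWord(src, dst, readReq, readResp, writeReq, writeResp)
	close(readReq)
	close(writeReq)
	<-done
}

func main() {
	mem := []uint64{7, 0}
	runCopy(mem, 0, 1)
	fmt.Println(mem) // prints [7 7]
}
```

In this sketch, adding a third port would mean editing toyMemory by hand. With SMI that wiring is handled for you, so only the function signature and the port count change.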
For comparison, this is how we currently set up our channels to get one read and one write port using AXI:
memReadAddr chan<- axiprotocol.Addr,
memReadData <-chan axiprotocol.ReadData,
memWriteAddr chan<- axiprotocol.Addr,
memWriteData chan<- axiprotocol.WriteData,
memWriteResp <-chan axiprotocol.WriteResp
And if we wanted two goroutines to read from memory concurrently, we would need to manually use AXI arbitration to add another read port, as follows:
memReadAddr0 := make(chan axiprotocol.Addr)
memReadData0 := make(chan axiprotocol.ReadData)
memReadAddr1 := make(chan axiprotocol.Addr)
memReadData1 := make(chan axiprotocol.ReadData)
memReadAddr, memReadData, memReadAddr0, memReadData0,
In the SMI example above, all the arbitration for extra ports is handled automatically up to 64 ports. It is possible to expand this further, beyond 64 ports, using our SMI arbitration goroutines — more on that later.
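For readers unfamiliar with what manual arbitration involves, the following is a generic sketch written with plain Go channels. It is neither the AXI arbitration code nor the SMI internals: Req, Resp, arbitrate2, memory, sumRange and sumConcurrently are all hypothetical names made up for this example. It shows the kind of plumbing needed when two goroutines share one upstream memory port: an extra goroutine forwards one request at a time and routes each response back to the client that asked.

```go
package main

import "fmt"

// Req and Resp are hypothetical message types for this illustration.
type Req struct{ Addr int }
type Resp struct{ Data uint64 }

// memory answers each read request with the word stored at that address.
func memory(mem []uint64, req <-chan Req, resp chan<- Resp) {
	for r := range req {
		resp <- Resp{Data: mem[r.Addr]}
	}
}

// arbitrate2 shares one upstream port between two client ports. It handles
// one request at a time, so each response goes back to the right client.
func arbitrate2(upReq chan<- Req, upResp <-chan Resp,
	req0 <-chan Req, resp0 chan<- Resp,
	req1 <-chan Req, resp1 chan<- Resp) {
	for req0 != nil || req1 != nil {
		select {
		case r, ok := <-req0:
			if !ok {
				req0 = nil // client 0 has finished
				continue
			}
			upReq <- r
			resp0 <- <-upResp
		case r, ok := <-req1:
			if !ok {
				req1 = nil // client 1 has finished
				continue
			}
			upReq <- r
			resp1 <- <-upResp
		}
	}
	close(upReq) // lets the memory goroutine exit
}

// sumRange reads mem[lo:hi] through one client port and returns the sum.
func sumRange(req chan<- Req, resp <-chan Resp, lo, hi int) uint64 {
	var s uint64
	for i := lo; i < hi; i++ {
		req <- Req{Addr: i}
		s += (<-resp).Data
	}
	close(req)
	return s
}

// sumConcurrently splits the reads between two goroutines that share
// a single memory port through the arbiter.
func sumConcurrently(mem []uint64) uint64 {
	upReq, upResp := make(chan Req), make(chan Resp)
	req0, resp0 := make(chan Req), make(chan Resp)
	req1, resp1 := make(chan Req), make(chan Resp)
	go memory(mem, upReq, upResp)
	go arbitrate2(upReq, upResp, req0, resp0, req1, resp1)

	half := len(mem) / 2
	sums := make(chan uint64)
	go func() { sums <- sumRange(req0, resp0, 0, half) }()
	go func() { sums <- sumRange(req1, resp1, half, len(mem)) }()
	return <-sums + <-sums
}

func main() {
	fmt.Println(sumConcurrently([]uint64{1, 2, 3, 4})) // prints 10
}
```

Every additional client in this sketch needs another pair of channels and another case in the arbiter, which is the boilerplate that automatic arbitration removes.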
This initial testing release of our SMI protocol, available as part of our package for accessing SDAccel with Go, gives our users an overview of how the new infrastructure works and an opportunity to experiment with it before the full release. In the coming weeks, once some final testing has been done, further bandwidth optimization will become available: FPGAs will have access to a 512-bit SMI interface, compared to the 64-bit AXI interface. AXI will still be used at the peripheries to talk to the shared memory DDR4 controllers, but SMI provides users with a streamlined interface as well as an aggregate increase in bandwidth across multiple goroutines used within a project.
Our challenge up to now, using just the AXI protocol, has been in scaling up designs to make full use of our available memory bandwidth. The introduction of our SMI protocol will make this possible, while simplifying our code design.