In case you missed the last report, click here
TL;DR: The IOTA Crypto Core FPGA moved from the ARM Cortex M1 to RISC-V. RISC-V is much more open-source friendly (the CPU core is MIT licensed), faster than the Cortex M1 (e.g. hardware divider, I- and D-cache) and very flexible (extendable with plugins, custom instructions, …)
The following is all about the small red IOTA Crypto Core FPGA module, its internal soft-cpu and its hardware accelerators.
Officially, the funded IOTA Crypto Core project ended on 3rd September, but development continues … The focus has shifted from building various base modules to further developing the FPGA module for secure standalone applications. This means you could use this module as the main microcontroller in a standalone application and use the hardware acceleration for algorithms used in IOTA, all on a module measuring 30x26mm.
There is one issue I wasn’t very happy about … Perhaps you remember what I wrote about the ARM-Licensing issues in report 2 and in the last report:
What I would change, though, would definitely be to get rid of the Cortex M1 soft-cpu (ARM’s licensing is sub-optimal) and replace it by a (really) free RISC V. ARM gives the IP for the Cortex M1 cost-free but no part of their IP may be included in opensource code-repositories. So it was quite an effort to work-around the license — e.g. writing instructions how to download from the ARM website and to patch source-files afterwards for the ICCFPGA-project.
In the beginning, I started with the Cortex M1 because I was familiar with ARM microcontrollers, the toolchain, the debugging and so on … Because time was short, it was the logical decision to use ARM on the FPGA module. Changing to RISC-V at the project start would have brought a lot of unknowns which could have led to problems.
What is RISC-V
Wikipedia says about RISC-V:
That means the instruction set is open and free of patents. RISC-V is not an actual implementation of a microcontroller; it is a specification of the requirements an implementation has to fulfill to be RISC-V compatible.
There are already lots of RISC-V implementations out there, most of them soft-cpus for FPGAs (riscv.org lists about 70), such as the PicoRV32 or the VexRiscV (which I used for the FPGA module, more on that later).
One main feature of RISC-V is its extensibility. There are about 15 standard extensions, for instance:
- M — Standard Extension for Integer Multiplication and Division
- A — Standard Extension for Atomic Instructions
- F — Standard Extension for Single-Precision Floating-Point
- D — Standard Extension for Double-Precision Floating-Point
- and so on (click here for full list)
It also allows new extensions with custom instructions to be developed, for instance to implement trinary SIMD operations as they would be used in a software-only implementation of Troika. Such an instruction could do the following:
dst[1..0] = (src1[1..0] + src2[1..0]) % 3
dst[31..30] = (src1[31..30] + src2[31..30]) % 3
This, for instance, would be perfectly usable for parity calculations in Troika, and it would need only a single instruction to add 16 Trits modulo 3.
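To make the behaviour of such an instruction concrete, here is a minimal software model in C. It is only a sketch: the function name `trit_add_mod3` is hypothetical, and it assumes each Trit is stored as one of the values 0, 1 or 2 in a 2-bit lane. A real custom instruction would do all 16 lanes at once in hardware instead of looping.

```c
#include <stdint.h>

/* Hypothetical software model of the custom instruction sketched above.
 * Assumption: 16 Trits per 32-bit word, 2 bits per Trit, values 0, 1, 2. */
uint32_t trit_add_mod3(uint32_t src1, uint32_t src2)
{
    uint32_t dst = 0;

    for (int lane = 0; lane < 16; lane++) {
        /* Extract the 2-bit lane from each source word. */
        uint32_t a = (src1 >> (2 * lane)) & 0x3u;
        uint32_t b = (src2 >> (2 * lane)) & 0x3u;

        /* Add modulo 3 and write the result back into the same lane. */
        dst |= ((a + b) % 3u) << (2 * lane);
    }
    return dst;
}
```

The loop makes the per-lane cost of a software-only version visible: 16 extract, add, reduce and insert steps that a single SIMD instruction could replace.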
The instruction above is just an example. The IOTA Crypto Core FPGA has hardware Troika acceleration, but in addition to highly specialized hardware-accelerator cores, algorithms implemented in software could benefit from SIMD CPU instructions working on packed* trinary data.
Also, there are smaller FPGAs which don’t have enough resources for real hardware acceleration but which are large enough for a RISC-V soft-cpu. Software running on the CPU could be faster and/or more efficient using trinary SIMD instructions.
*: packed means that a machine word (32 bits in this case) contains multiple data items of the same kind. For instance, 2 bits per Trit gives 16 Trits in a 32-bit word. Trinary SIMD operations can perform calculations on all of the Trits in parallel, while the packed Trits stay isolated from each other (nothing carries over from one Trit into the next).
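The packing described in the footnote can be sketched in C as follows. The helper names `pack_trits` and `unpack_trits` are hypothetical, and the sketch assumes Trit values 0, 1 and 2, with Trit i stored in bits 2*i+1 and 2*i of the word.

```c
#include <stdint.h>

#define TRITS_PER_WORD 16

/* Pack 16 Trits (assumed values 0, 1 or 2) into one 32-bit word,
 * 2 bits per Trit. Trit i lands in bits [2*i+1 .. 2*i]. */
uint32_t pack_trits(const uint8_t trits[TRITS_PER_WORD])
{
    uint32_t word = 0;

    for (int i = 0; i < TRITS_PER_WORD; i++) {
        word |= (uint32_t)(trits[i] & 0x3u) << (2 * i);
    }
    return word;
}

/* Reverse operation: split a packed word back into 16 separate Trits. */
void unpack_trits(uint32_t word, uint8_t trits[TRITS_PER_WORD])
{
    for (int i = 0; i < TRITS_PER_WORD; i++) {
        trits[i] = (uint8_t)((word >> (2 * i)) & 0x3u);
    }
}
```

With 2 bits per Trit, one of the four possible bit patterns per lane (binary 11) stays unused. That is the price for keeping every Trit on a fixed bit position.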
VexRiscV was the winner of the Soft-CPU contest 2018. Its concept is incredible: the soft-cpu is generated with a series of Scala scripts and can be modified by inserting or removing plugins in the scripts.
Almost everything is a plugin: the hardware multiplier, the debug interface, the instruction and data buses (with or without cache); even the program counter is a plugin.
It supports different bus systems, for instance AXI4 (directly supported by Xilinx’s Vivado), Avalon (mostly seen on Altera/Intel systems), Wishbone (mostly used by cores from opencores.org) and others.
Also, custom instructions which are scripted with Scala can be plugged into the CPU easily.
VexRiscV also supports simulations of the CPU with QEMU, so everything can be tested without using real hardware.
In my opinion this is an incredible concept and perfectly reflects the flexibility of RISC-V!
It was also important that the migration of the project from Cortex M1 to RISC-V should not change the development environment much — and it did not! 😍
The Cortex-M1 system could be developed in Eclipse with GnuARM-plugin and OpenOCD + GDB for debugging.
The RISC-V system can be developed in Eclipse with GnuARM-plugin (yes — the developers of the GnuARM plugin already added support for RISC-V!) and OpenOCD + GDB for debugging.
Of course, you would need another GCC variant for RISC-V and another OpenOCD-version, but nothing else has changed in the toolchain and the workflow!
How does it perform?
I did a couple of performance comparisons*. The time of each task was measured with 10,000 loops.
In each task the RISC-V was faster. The performance of Bytes to Trits conversion really stands out — the reason is that the RISC-V has a fast hardware-divider whereas the Cortex M1 has to emulate divisions in software. Also, the RISC-V has an instruction- and data-cache which give it an advantage.
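To illustrate why a hardware divider matters for this kind of conversion, here is a deliberately simplified sketch in C that converts a single byte into Trits by repeated division by 3. It is not the conversion routine used in the project; the function name `byte_to_trits` and the unbalanced Trit values 0, 1, 2 are assumptions made for illustration only.

```c
#include <stdint.h>

#define TRITS_PER_BYTE 6  /* 3^6 = 729 is enough to cover 0..255 */

/* Simplified illustration (not the project's routine): convert one byte
 * into 6 base-3 digits, least significant digit first. */
void byte_to_trits(uint8_t value, uint8_t trits[TRITS_PER_BYTE])
{
    for (int i = 0; i < TRITS_PER_BYTE; i++) {
        trits[i] = (uint8_t)(value % 3u);  /* remainder of a division by 3 */
        value    = (uint8_t)(value / 3u);  /* quotient of a division by 3 */
    }
}
```

Every digit costs one quotient and one remainder. Whether the compiler emits a divide instruction, a multiply-based replacement or a call into a software division routine depends on the target and the optimization settings, but a core without a hardware divider has to pay for that work in software one way or another.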
*: please note the logarithmic x-axis
Migrating from ARM’s Cortex M1 to RISC-V, and especially to VexRiscV, was imho a very good decision. The RISC-V is faster, is MIT-licensed (may be used commercially and be included in open-source repositories), is flexible (can be extended with custom instructions) and is platform independent (the Cortex M1 was cost-free only for Xilinx FPGAs).
A RISC-V based on VexRiscV can be scaled down to fit a $5 FPGA and still support custom trinary SIMD instructions, which wouldn’t be possible with the Cortex M1.
Thank you for reading so much text!