IOTA Crypto Core FPGA — 3rd Progress Report
This is my 3rd progress report. In case you missed the 2nd: link
The second milestone has been finished!
With the successful completion of Phase 2 (the FPGA core; green box), I updated the image to reflect the current planning.
- The slower STM32F429 microcontroller planned for Phase 4 will be replaced by a Cortex A5, which can run Linux natively. It’s faster (500MHz instead of 168MHz) and it will have more memory (128MB instead of 32MB). It also has very nice security features like ARM TrustZone, and it can boot signed Linux kernels. It also has an MMU, which means it runs full Linux with dynamic memory management (which makes things much easier!).
- Troika was originally not planned. Shortly after Troika was announced, a hashing core was built and integrated into the FPGA. The core is pretty fast: it needs a single clock cycle per hashing round and is therefore as fast as Curl-P81 and Keccak384. That’s very nice, because (optimized) software implementations of Troika on CPUs are about 10x slower than Keccak384/SHA3. Additionally, a smaller spin-off was developed in the hope that small microcontrollers could make use of it.
- Also not planned from the beginning was the development of a HAT (Hardware Attached on Top) for the Arty S7 FPGA board. It is equipped with a connector for an SWD debugger, a secure element, SPI flash and a W5500 Ethernet controller. The Ethernet controller was nice to play with. It proved that the FPGA can run stand-alone (doing everything itself except getting tips from the Tangle) at a speed of 1TPS, including PoW. Unfortunately, the gTTA (getTransactionsToApprove) API call to IRI is still very slow, but there have recently been major improvements to IRI, and gTTA will certainly be improved in the short term as well.
- The interface to the FPGA was not decided in the proposal. Currently it uses a virtual COM port interface with bindings for a fork of the iota.go library for generating seeds and addresses, signing and PoW, in the hope that the results of the second milestone can be used by others more easily. This will probably change to something faster. One candidate is SPI with a binary protocol, which would make transmitting and receiving data to and from the FPGA more than 100 times faster.
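To give an idea of what a binary protocol over SPI could look like, here is a minimal Python sketch of a framed request. Nothing in it is decided yet: the frame layout (command byte, length, payload, CRC32), the command code `CMD_SIGN_BUNDLE_HASH`, the payload fields and the function names are all assumptions made up for illustration.

```python
import struct
import zlib
from typing import Tuple

# Hypothetical command code; not part of any released iccfpga protocol.
CMD_SIGN_BUNDLE_HASH = 0x03

# Assumed frame layout: [command:1][length:2][payload:length][crc32:4], little-endian.
HEADER = struct.Struct("<BH")
TRAILER = struct.Struct("<I")


def encode_frame(command: int, payload: bytes) -> bytes:
    """Wrap a payload in a header and append a CRC32 over header + payload."""
    header = HEADER.pack(command, len(payload))
    crc = zlib.crc32(header + payload)
    return header + payload + TRAILER.pack(crc)


def decode_frame(frame: bytes) -> Tuple[int, bytes]:
    """Check length and CRC of a frame and return (command, payload)."""
    if len(frame) < HEADER.size + TRAILER.size:
        raise ValueError("frame too short")
    command, length = HEADER.unpack_from(frame, 0)
    body_end = HEADER.size + length
    if len(frame) != body_end + TRAILER.size:
        raise ValueError("length field does not match frame size")
    (crc,) = TRAILER.unpack_from(frame, body_end)
    if crc != zlib.crc32(frame[:body_end]):
        raise ValueError("CRC mismatch")
    return command, frame[HEADER.size:body_end]


# Example payload (assumed fields): seed index, key index, security level,
# followed by an 81-tryte bundle hash as ASCII.
payload = struct.pack("<BIB", 0, 42, 2) + b"9" * 81
frame = encode_frame(CMD_SIGN_BUNDLE_HASH, payload)
command, decoded = decode_frame(frame)
```

Compared to a text protocol on a virtual COM port, a fixed binary layout like this avoids parsing overhead and keeps each request to a few dozen bytes. The real speed-up would mainly come from the SPI clock rate, though.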
There is good and bad news.
The good news: The concept of embedding the ROM code of the Cortex M1 in an encrypted bitstream works very well. The code can’t be changed, and keys embedded in the ROM code (e.g. for decrypting transmissions from the secure element) are protected by bitstream encryption, yet the FPGA core can still be updated very easily by simply providing a new bitstream. The FPGA only accepts bitstreams that are encrypted with the correct AES key. Of course, this only applies if encryption is enabled, which would have to be done when using the FPGA in an unsecured environment.
The bad news: The FPGA provides an API which can be used for signing transactions. A seed index and a key index are sent to the FPGA together with the BundleHash, and depending on the security level, one to three signature fragments are returned. An additional hash, built from the parameters and a secret API key, can be verified by the FPGA to make sure the data hasn’t been altered on the way to the Crypto Core. Unfortunately, this doesn’t make a private key completely safe, because it could be brute-forced by sending different BundleHashes to the Core. This is a known problem of Winternitz OTS (one-time signatures), which expose parts of the private key for a given key index with every signature. An attacker who can send multiple different BundleHashes to the FPGA weakens the security of that private key significantly, and it would become possible to calculate the key from the signatures. The seed itself is pretty safe, though. The consequence is that the system using the FPGA for signing transactions has to be as secure as the FPGA itself. This was one reason for switching to a more secure application controller for Phase 4, because the RDP (read-out protection) of STM32 microcontrollers has been (partly) hacked.
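The key-reuse problem can be illustrated with a small Python sketch. It is heavily simplified and not the code running on the Crypto Core: SHA-256 on bytes stands in for the Kerl hash on trits, the "normalized bundle hash" is just a list of 27 digits in [-13, 13], and all names (`sign`, `Observer`, ...) are made up for this example. The two signed digit vectors are deliberately extreme to make the effect obvious.

```python
import hashlib
import secrets

FRAGMENTS = 27   # key fragments per security level in IOTA's Winternitz scheme
MAX_VALUE = 13   # normalized hash digits are in the range [-13, 13]


def hash_times(data, n):
    """Apply the hash n times (SHA-256 as a stand-in for Kerl)."""
    for _ in range(n):
        data = hashlib.sha256(data).digest()
    return data


def sign(private_key, digits):
    """Signature fragment i is key fragment i hashed (13 - digit) times."""
    return [hash_times(sk, MAX_VALUE - d) for sk, d in zip(private_key, digits)]


def verify_digests(signature, digits):
    """The verifier hashes (13 + digit) more times, 26 hashes in total."""
    return [hash_times(s, MAX_VALUE + d) for s, d in zip(signature, digits)]


def public_digests(private_key):
    return [hash_times(sk, 2 * MAX_VALUE) for sk in private_key]


class Observer:
    """Keeps, per fragment, the revealed value closest to the private key."""

    def __init__(self):
        self.best = [None] * FRAGMENTS  # (digit, value) with the highest digit seen

    def observe(self, signature, digits):
        for i, (value, d) in enumerate(zip(signature, digits)):
            if self.best[i] is None or d > self.best[i][0]:
                self.best[i] = (d, value)

    def can_forge(self, target):
        # Hashing only goes forward, so every target digit must be <= a seen digit.
        return all(b is not None and t <= b[0] for b, t in zip(self.best, target))

    def forge(self, target):
        if not self.can_forge(target):
            raise ValueError("not enough key material revealed")
        return [hash_times(value, d - t) for (d, value), t in zip(self.best, target)]


private_key = [secrets.token_bytes(32) for _ in range(FRAGMENTS)]
digits_a = [13, -13] * 13 + [0]   # first hash signed with this key
digits_b = [-13, 13] * 13 + [0]   # second hash signed with the same key
target = [5, -5] * 13 + [0]       # hash the attacker wants a signature for

attacker = Observer()
attacker.observe(sign(private_key, digits_a), digits_a)
forgeable_after_one = attacker.can_forge(target)   # False
attacker.observe(sign(private_key, digits_b), digits_b)
forgeable_after_two = attacker.can_forge(target)   # True
forged = attacker.forge(target)
```

After a single signature the attacker can't sign the target, but after the second one the forged signature verifies against the same public digests. Real BundleHashes are not that extreme, so an attacker needs more signatures, but every additional one reveals values closer to the private key.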
Some tests have been done with the Cortex M1 building bundles completely on its own (signed value bundles with 3 transactions). PoW wasn’t included in the following numbers; that would have been unfair.
Although the Cortex M1 is only running at 100MHz, with hardware acceleration it is a little bit faster than a Raspberry Pi.
Documentation and Repository
Everything was released and documented here: https://gitlab.com/iccfpga/iccfpga-core/wikis/home
The code repositories are grouped together here: https://gitlab.com/iccfpga
Conclusion of the Second Milestone
It worked out very well to implement all planned features and concepts. There were no real show-stoppers. The secure element (ATECC608A) caused some headaches because the full datasheet wasn’t available (it requires an NDA) and it was not obvious how to set up the config zone. But the full datasheet of the predecessor was available, and there was a migration guide. Combined with a library provided by Microchip, it was possible to figure out how to configure it to securely store 8 seeds.
The Cortex M1 only runs at 100MHz, which is very slow compared to e.g. a Raspberry Pi with >1GHz, but hardware acceleration of type conversions, hashing and, of course, Proof-of-Work gave it an advantage. Not counting PoW, an overall speed-up of about 30x could be achieved.
The next milestone will be all about hardware development. Since 90% of the FPGA resources are used, the FPGA module will use the same FPGA as the Arty S7 (50kLE). Depending on the application (for instance when using a PoW service like powsrv.io), the FPGA core could be shrunk and a cheaper FPGA could be used on the same board later.
Thank you — once again — for reading :)