This is the 2nd report of the Crypto Core FPGA project — (more than) 2 weeks late. Sorry :)
Here is the first part in case you missed it: https://medium.com/@punpck/iota-crypto-core-fpga-1st-progress-report-caebe1dac579
In the last report I included an evaluation of the Troika hashing algorithm and concluded that it doesn’t run very well on binary CPU architectures, but I also wrote that there certainly is room for optimization. Reference implementations have to be clear and understandable, but not necessarily very efficient.
A “silent hero” (he didn’t reveal his identity) managed to optimize Troika significantly: he achieved a speed-up of x11.63 on my Cortex M1, which is very impressive. You can find his Troika optimizations here: https://github.com/c-mnd/troika
Here are some new numbers:
It is interesting to note that a lot of type conversions (Bytes to Trytes and Trytes to Bytes) wouldn’t be needed anymore if Kerl switched to Troika. There is still a factor of 5 between SHA3 and the optimized Troika, but without the type conversions Troika really could be used without much of a performance penalty!
I tried to replace SHA3 with Troika (done sloppily) but I stumbled upon one smaller problem: the Troika implementation doesn’t support “streaming” yet. Here are my observations: https://github.com/Troikahash/reference/issues/2
Because I’m already short of time, I didn’t pursue this topic any further. Perhaps someone wants to try it?
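For readers who haven’t met the term: “streaming” is taken here to mean feeding a message to the hash in several pieces instead of all at once (that reading is an assumption, the linked issue has the details). The small Python sketch below shows the idea with SHA3-384 from the standard library, which is only a stand-in because it already supports this; the message and the 48-byte chunk size are arbitrary illustration values.

```python
import hashlib

# Arbitrary example message (illustration only).
message = b"IOTA" * 100

# One-shot hashing: the whole message is passed in a single call.
one_shot = hashlib.sha3_384(message).hexdigest()

# Streaming hashing: the same message is absorbed chunk by chunk.
streamed = hashlib.sha3_384()
for offset in range(0, len(message), 48):
    streamed.update(message[offset:offset + 48])
streaming_digest = streamed.hexdigest()

# A hash with streaming support gives the same digest either way.
print(one_shot == streaming_digest)
```

A drop-in replacement for SHA3 inside Kerl would need this kind of incremental interface.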
Crypto FPGA Core Optimizations
Some things were improved on the FPGA core. For instance, a x4 speed-up was achieved in the Bytes to Trytes conversion by replacing division with fixed-point multiplication: instead of dividing by 27, the value is multiplied by 1/27. Tests showed that 43 fractional bits are needed to calculate the correct result. Since a multiplication doesn’t yield the remainder of the division, a second multiplier was needed, followed by a subtraction. This sounds complicated but it’s worth it, because the division needed about 42 clock cycles while the two multiplications need only 4 (including the subtraction). The FPGA has dedicated DSP (digital signal processing) blocks which can be used for such calculations, so resource usage didn’t increase much.
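The trick can be sketched in a few lines of Python. This is only a software illustration, not the actual HDL of the core: the names are made up, rounding the reciprocal up is an assumption, and the operand width of the real design isn’t stated here, so the sketch simply compares against Python’s own `divmod`.

```python
FRAC_BITS = 43   # number of fractional bits, as mentioned in the text
DIVISOR = 27

# Fixed-point representation of 1/27, rounded up (assumption of this sketch).
RECIPROCAL = -(-(1 << FRAC_BITS) // DIVISOR)

def divmod27(n):
    """Return (n // 27, n % 27) without using a division."""
    # First multiplication: n * (1/27), then drop the fractional bits.
    quotient = (n * RECIPROCAL) >> FRAC_BITS
    # Second multiplication and a subtraction recover the remainder.
    remainder = n - quotient * DIVISOR
    return quotient, remainder

# Compare against the built-in division for a range of inputs.
mismatches = [n for n in range(100000) if divmod27(n) != divmod(n, DIVISOR)]
print(len(mismatches))
```

The shift by 43 bits costs nothing in hardware (it is just wiring), so what remains is two multiplications and one subtraction.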
I have to admit, I got distracted a bit from the project but I think it was worth it because Troika could become the default hashing algorithm in IOTA.
I tried to squeeze the Troika FPGA core into a small $5 FPGA (the one I used on the PoWChip prototype; the lower black chip).
It took more time than expected and it isn’t very fast: 10k blocks take 11.6s (including SPI transfer times), and 1k blocks with 27 hash loops take 2.6s (including SPI transfer times and auto-padding). The reason is that there were too few resources to calculate a complete hashing round within one clock cycle, so it needs 55 clock cycles, and the SPI data transfer times come on top.
But I integrated some features which could partly compensate for the lower speed:
- The FPGA core supports auto-padding for 243-trit input vectors (almost everything involving addresses and signing works on 243 trits). The core can automatically add an additional block with padding, as is done in the reference implementation of Troika.
- The core can do multiple hashing rounds (also with auto-padding). For instance, a private key has to be hashed several times in a loop for address generation, and the same goes for signing a transaction. This function allows input data to be hashed e.g. 27 times in a loop without having to transfer new data to the core via SPI.
- The core can do nested hashing. There is only one hashing core, but sometimes multiple hashes with different states have to be computed in a nested fashion. For this, a kind of stack is implemented onto which the state can be pushed and from which it can be popped.
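To make the last two features a bit more concrete, here is a toy software model in Python. It is not the interface of the FPGA core: the class and method names are hypothetical, SHA3-256 from the standard library stands in for Troika, and the assumption that a push saves the current state and starts a fresh one is mine.

```python
import hashlib

class HashCoreModel:
    """Toy model of a single hashing core with a state stack (hypothetical API)."""

    def __init__(self):
        self._state = hashlib.sha3_256()  # stand-in for the Troika state
        self._stack = []

    def reset(self):
        self._state = hashlib.sha3_256()

    def absorb(self, data):
        self._state.update(data)

    def digest(self):
        return self._state.digest()

    def push(self):
        # Save the running state and start a fresh, nested hash.
        self._stack.append(self._state)
        self._state = hashlib.sha3_256()

    def pop(self):
        # Resume the hash that was interrupted by push().
        self._state = self._stack.pop()

    def hash_loop(self, data, rounds):
        # Feed each digest back in as the next input, without new input data.
        for _ in range(rounds):
            self.reset()
            self.absorb(data)
            data = self.digest()
        return data

core = HashCoreModel()
core.absorb(b"outer part 1")
core.push()                      # an inner hash is needed in between
core.absorb(b"inner")
inner = core.digest()
core.pop()                       # continue with the outer hash
core.absorb(b"outer part 2")
outer = core.digest()

looped = HashCoreModel().hash_loop(b"seed", 27)
```

In this model the outer hash comes out the same as if it had never been interrupted, which is the point of the stack.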
I released everything (FPGA-Core, PCB-design, software for the STM32 µC) here: https://github.com/shufps/troika_ice40
A secure element was successfully attached to the FPGA. An ATECC608A was used, which is very cheap but secure. It’s quite a new chip and the full datasheet is only available under NDA, but there is an open-source library from Microchip which could be used.
I also built a small PCB which can be plugged on top of the Arty S7 board. It contains not only a secure element but also an 8 Mbit flash storage, a W5500 Ethernet controller and an SWD connector for debugging the Cortex M1. I did this in the hope that the results from my first milestone could be used by others more easily. And I don’t like jumper wires, I don’t like it if something looks like a mess ;-)
License Issue of Cortex M1
This is an interesting topic … I haven’t made my repositories public yet because I first wanted to figure out whether the Cortex M1 IP may be redistributed in GitHub repositories. It was no surprise that it’s prohibited. That’s not a (very) big deal because everyone can download the core for free by registering on the ARM website. But I’m not allowed to include it in the repository, which is sad. Moreover, the package from ARM also contains an example project for the Arty S7 board, but including it in the repo isn’t allowed either. This also covers derivatives, and my project obviously is derived from the example project.
So the next step is to build a new project from scratch without using the original example project.
The next 2 weeks will be a bit tight …
- Software for the secure element has to be developed.
- I have to start with the documentation of the project (installing, debugging, synthesizing the core, …)
- I have to build the Cortex M1 processor system from scratch (see above)
- When my PCBs arrive (the HAT for the Arty S7) I’ll have to assemble it and order parts (I really should order parts before arrival^^)
So this was the 2nd progress report. A lot was done and there is still much to do :) I hope to be able to complete the first milestone on schedule :)
Thank you for reading!