Sarah Osbourne
2 min read · Mar 30, 2019

Everybody knows ALUs are relatively tiny circuits. That is why GPUs have thousands of them sharing more complex and costly resources. What you have done is focus on a tiny part of ProgPoW and pretend that’s all there is to it. The math function is dominated by the multiplier at 20,000 gates. The merge function is not as small as you claim because of the rotate operations. (From a power consumption point of view, the math function includes 4 expensive, 3 moderate, and 3 cheap operations, and the merge function implements either 2 or 4 expensive operations depending on the implementation. So what?) You then discuss the size and power consumption of these small circuits and falsely extrapolate everything else from them.

The first major thing you omitted in your attack on the insignificant ALU is the size of the register file relative to the ALU. A 12 kiB register file will require almost 400,000 gates just for storing the bits. This completely dwarfs the ‘large’ multiplier at only 20,000 gates.
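
As a rough sanity check on those numbers, here is a minimal sketch of the arithmetic. The 4 gate-equivalents per stored bit is an assumption, chosen because it reproduces the "almost 400,000" figure; real storage cells vary with the process and cell type. The name `storage_gates` is just for illustration.

```python
BITS_PER_BYTE = 8
GATES_PER_BIT = 4          # assumption: ~4 gate-equivalents per stored bit
MULTIPLIER_GATES = 20_000  # multiplier size quoted above


def storage_gates(kib, gates_per_bit=GATES_PER_BIT):
    """Gate-equivalents needed just to hold `kib` KiB of state."""
    bits = kib * 1024 * BITS_PER_BYTE
    return bits * gates_per_bit


register_file_gates = storage_gates(12)
print(register_file_gates)                     # 393216
print(register_file_gates / MULTIPLIER_GATES)  # ratio of storage to multiplier
```

Under that assumption the storage alone comes out close to twenty multipliers' worth of gates, before counting any read/write ports or decoding logic.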

The most important thing that you have conveniently omitted is the 12 completely random cache operations per loop. Note that there are only 20 math operations per loop, so the cache operations are extremely significant. The cost of integrating a 16 kiB SRAM would be at least 500,000 gates just for storing the bits. You would need a 12-port SRAM just to service these requests for one pipelined processing element (which would destroy your simple register file idea), and that is completely ignoring address conflicts and many more problems. If you want anything even remotely resembling the heavily banked SRAMs in GPUs, with all their advanced request conflict resolution and so on, then it’s going to take a lot of engineering effort. You might as well become a GPU manufacturer at this point.
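
To give a feel for why address conflicts matter, here is a hedged back-of-the-envelope sketch. It assumes the 12 cache requests are independent, uniformly random, and issued in the same cycle to a single-ported banked SRAM, which is a simplification and not a model of any real GPU. The function name and the bank counts are illustrative only.

```python
from fractions import Fraction

CACHE_OPS_PER_LOOP = 12  # random cache operations per loop, from the text


def conflict_free_probability(banks, requests=CACHE_OPS_PER_LOOP):
    """Chance that `requests` uniform random accesses all hit distinct banks."""
    if requests > banks:
        return Fraction(0)  # pigeonhole: at least two requests share a bank
    p = Fraction(1)
    for already_taken in range(requests):
        p *= Fraction(banks - already_taken, banks)
    return p


# Illustrative bank counts only; none of these come from a real design.
for banks in (12, 16, 32, 64):
    print(banks, float(conflict_free_probability(banks)))
```

Whenever that probability is below 1, some cycles need arbitration or stalls, which is exactly the conflict resolution logic that a simple multi-ported register file does not give you.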

Stop spreading FUD.
