VeriBlock vProgPoW GPU Mining and Economically Gated ASIC Transition Plans

By Maxwell Sanchez

Summary

  • The VeriBlock network will hard-fork to adopt vProgPoW, at block height 1,512,000 (which is expected to occur on Monday, 9/21/2020)
  • vProgPoW will require ~5GB DAG initially, with growth of 1GB per year
  • vProgPoW is a heavily customized version of ProgPoW v0.9.3
  • vProgPoW intentionally leaves in (and tweaks) the ASIC-friendly exploit found in ProgPoW 0.9.3 as an economically gated ASIC transition plan for the network

Introduction

For those unfamiliar with VeriBlock and Proof-of-Proof: The Proof-of-Proof (“PoP”) consensus protocol allows any blockchain to inherit the full Proof-of-Work security of Bitcoin in an entirely Decentralized, Trustless, Transparent, and Permissionless (“DTTP”) manner. PoP introduces a new form of mining wherein PoP miners compete for rewards by performing “security transactions” where they publish fingerprints of a PoP-secured blockchain to another blockchain to inherit its security. The VeriBlock blockchain uses PoP to secure itself to Bitcoin, and acts as a security aggregation layer, allowing other blockchains to inherit Bitcoin’s security in the easiest, most cost-effective, and most secure manner possible. You can learn more at our community website, on our Discord, or by reading our whitepaper. You can also follow us on Twitter and LinkedIn for project updates.

The natural evolution of Proof-of-Work is towards ASIC devices; even a Proof-of-Work algorithm optimized for current-generation commercially available hardware will eventually drift towards ASICs as consumer hardware evolves away from the design originally targeted by the “consumer-oriented” Proof-of-Work algorithm. Furthermore, the development of this ASIC will be incredibly complex, resulting in the centralization of ASIC production to a small number of well-funded teams.

Because of this, VeriBlock is adopting a customized version of ProgPoW (vProgPoW) where GPUs are the only practical mining devices at relatively low total mining power, but where very simple ASICs (with simple optimal designs any ASIC engineering firm could competitively build) eclipse GPUs as the total network mining power continues to grow.

This graceful, economically-powered transition is made possible by carefully modifying ProgPoW and an “exploit” in ProgPoW (≤0.9.3) found by GitHub user @kik which would allow ASICs to bypass the memory-hard (ASIC-resistant) part of ProgPoW in an economically viable way at high network hashrates.

This blog post explores the viability of the original exploit, and how this exploit and the underlying ProgPoW algorithm can be appropriately tuned to optimize towards a particular planned transition from GPUs to ASICs based on economic viability.

ASIC Philosophy

The development and adoption of ASICs is an inevitable part of the evolution of any blockchain Proof-of-Work algorithm.

ASIC-resistant algorithms attempt to make an ideal ASIC as close to commercially available computational hardware (CPUs or GPUs) as possible.

While an algorithm can be designed which (within reason) maximizes use of a particular consumer CPU or GPU, the eventual drift of consumer CPUs or GPUs (and general-purpose computing hardware in general) towards different balances between memory size, memory bandwidth, general compute (INT (integer) vs FP (floating point) ratios and die space, FP32:FP64 performance, etc.), specialized hardware (ex: ray-tracing, tensor cores), specialized instructions (AES-NI, etc.) and other components (graphics pipeline, etc.) will cause the algorithm to have lower overall hardware utilization over time, thus making ASICs capable of offering ever-increasing benefits in die size, power consumption, or both over commercially available CPUs or GPUs. A perfect example of this is Nvidia’s latest consumer Ampere cards, which doubled the number of FP32 units in each Streaming Multiprocessor (SM), without increasing INT32 performance [5,6].

Additionally, designing an algorithm which truly takes advantage of all of the circuitry of even a single particular GPU (architecture) is nearly impossible (particularly in a way which doesn’t couple the algorithm to an exact architecture), and makes the algorithm much more brittle against future architectures.

Finally, algorithms which do take significant advantage of a GPU’s circuitry make the upfront engineering R&D costs for an ASIC extremely high, meaning that when an ASIC for an “ASIC-resistant” algorithm becomes economically viable, it’s highly likely that only a single ASIC manufacturer will crack the case and produce ASICs, leading to extreme centralization of hardware manufacturing.

As a result, the ASIC-resistant algorithm will eventually succumb to the worst-possible version of ASIC mining: manufactured and controlled by a single or a small group of companies, and requiring large dies (further centralizing production efficiencies to foundries with better yield and [along with requiring expensive memory chips] setting a high baseline price which could prevent hobbyist miners from being able to buy a device).

VeriBlock’s vProgPoW will allow the blockchain to get the best of both worlds: GPU-only mining in the early days, and a gradual economics-powered cut-over to a very simple ASIC design that any ASIC engineering firm could build a competitive small-die chip for, and which could be packaged in small entry-level devices for improved mining decentralization and accessibility.

vProgPoW Specification

vProgPoW is a heavily modified version of ProgPoW v0.9.3. In summary, the following modifications were made:

1. Initial seed calculation modified from 1 to 14 keccak_f800 calls
Motivation: Tuning ProgPoW ASIC exploit

2. Initial seed reduced from 64 bits to 55 bits
Motivation: Tuning ProgPoW ASIC exploit

3. Effective nonce size reduced from 64 bits to 40 bits
Motivation: Compatibility with VeriBlock nonce size and tuning the ProgPoW ASIC exploit

4. Modifying the Epoch length from 30000 to 8000
Motivation: Given VeriBlock’s 30 second blocktime, this epoch modification gives a DAG growth rate of ~1GB/year

5. Modifying the starting Epoch to 512 (for Mainnet)
Motivation: This epoch starts the initial DAG size for Mainnet at 5GB

6. Reordered merge function and math function
Motivation: Differentiating between ProgPoW and vProgPoW

7. Minor tweaks to Keccak_f800
Motivation: Resistance against any existing vanilla Keccak ASIC

vProgPoW adds thirteen extra rounds of keccak_f800 to the seed calculation in the ProgPoW search loop.

These extra rounds were implemented as calls to keccak_f800 with 32 bytes of zero padding on both sides (for simple compatibility with the existing keccak_f800 function already used for seed generation and final header calculation).

Specifically, here is the change made to the reference ProgPoW miner:

uint64_t seed = keccak_f800(header, nonce, digest);

// Additional Keccak_f800 rounds for seed
#pragma unroll 1
for (uint32_t i = 0; i < 13; i++) {
    seed = keccak_f800(digest, seed, digest);
}

vProgPoW reduces the effective seed size from 64 bits to 55 bits. The motivation for doing so is explained in detail in later sections on how the ProgPoW ASIC exploit functions and how we tuned it for a certain expected economic cross-over threshold (and to deal with the difference between VeriBlock’s nonce size and the expected nonce size for ProgPoW).

Specifically, the top (most significant) 9 bits of the seed are set to 0. Here is the change made to the reference ProgPoW miner:

seed = seed & 0x007FFFFFFFFFFFFF;
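
As a quick sanity check on the mask above, the following minimal sketch counts the bits it keeps. The helper names (count_bits, truncate_seed) are hypothetical and are not part of the reference miner; only the mask constant comes from the text.

```cpp
#include <cstdint>

// Mask quoted in the text: clears the top 9 bits of the 64-bit seed.
constexpr uint64_t kSeedMask = 0x007FFFFFFFFFFFFFULL;

// Hypothetical helper: portable, C++11-friendly recursive bit count.
constexpr int count_bits(uint64_t v) {
    return v == 0 ? 0 : static_cast<int>(v & 1u) + count_bits(v >> 1);
}

// Hypothetical helper: apply the seed truncation described above.
constexpr uint64_t truncate_seed(uint64_t seed) {
    return seed & kSeedMask;
}

// 64 - 9 = 55 effective seed bits remain after masking.
static_assert(count_bits(kSeedMask) == 55, "mask should keep 55 bits");
static_assert((truncate_seed(~0ULL) >> 55) == 0, "top 9 bits should be zero");
```

Because the checks are static_asserts, a mask that kept a different number of bits would fail at compile time.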

vProgPoW reduces the epoch length from 30000 to 8000. The epoch length is the number of blocks which elapse before DAG regeneration. Every subsequent epoch grows the DAG by 8 MB, so at a block time of 30 seconds an 8000-block epoch results in a growth rate of ~1.0265GB per year.

The epoch of a particular block height is increased by 323 (number to be modified slightly based on final announced hard-fork block height) epochs, such that the Mainnet vProgPoW mining begins with a starting DAG size of 5GB (at epoch 512).

In order to further differentiate the computations done by vProgPoW and vanilla ProgPoW v0.9.3, modifications to the merge() and math() functions were made that preserve the functions’ original compute intensity and cross-platform applicability.

Specifically, the math function was modified to:

std::string ProgPow::math(std::string d,
                          std::string a,
                          std::string b,
                          uint32_t r) {
    switch (r % 11)
    {
    case 0: return d + " = ROTL32(" + a + ", " + b + ");\n";
    case 1: return d + " = " + a + " & " + b + ";\n";
    case 2: return d + " = " + a + " + " + b + ";\n";
    case 3: return d + " = popcount(" + a + ") + popcount(" + b + ");\n";
    case 4: return d + " = clz(" + a + ") + clz(" + b + ");\n";
    case 5: return d + " = ROTR32(" + a + ", " + b + ");\n";
    case 6: return d + " = mul_hi(" + a + ", " + b + ");\n";
    case 7: return d + " = " + a + " | " + b + ";\n";
    case 8: return d + " = " + a + " * " + b + ";\n";
    case 9: return d + " = " + a + " ^ " + b + ";\n";
    case 10: return d + " = min(" + a + ", " + b + ");\n";
    }
    return "#error\n";
}

And the merge function was modified to:

std::string ProgPow::merge(std::string a,
                           std::string b,
                           uint32_t r) {
    switch (r % 4)
    {
    case 0: return a + " = ROTR32(" + a + ", " +
                   std::to_string(((r >> 16) % 31) + 1) +
                   ") ^ " + b + ";\n";
    case 1: return a + " = ROTL32(" + a + ", " +
                   std::to_string(((r >> 16) % 31) + 1) +
                   ") ^ " + b + ";\n";
    case 2: return a + " = (" + a + " * 33) + " + b + ";\n";
    case 3: return a + " = (" + a + " ^ " + b + ") * 33;\n";
    }
    return "#error\n";
}

The version of Keccak used by vProgPoW was also changed enough that existing ASIC chips capable of computing Keccak_f800 cannot be reused for vProgPoW. Specifically, the Theta function has been modified (the indexes used were changed):

// Theta
uint32_t t, bc[5];
bc[0] = st[0] ^ st[6] ^ st[9] ^ st[12] ^ st[17];
bc[1] = st[8] ^ st[11] ^ st[14] ^ st[19] ^ st[23];
bc[2] = st[2] ^ st[7] ^ st[10] ^ st[18] ^ st[22];
bc[3] = st[4] ^ st[5] ^ st[15] ^ st[20] ^ st[24];
bc[4] = st[1] ^ st[3] ^ st[13] ^ st[16] ^ st[21];

Additionally, we added two additional logic lines between the Rho Pi and Chi functions:

st[3] = st[3] ^ 0x79938B61;
st[10] = st[10] ^ (st[19] & 0x000000FF |
                   st[24] & 0x0000FF00 |
                   st[6]  & 0x00FF0000 |
                   st[14] & 0xFF000000);

Original ProgPoW Exploit Summary

In early March of this year, a GitHub user by the name of `kik` highlighted an exploit that could be used against ProgPoW to break its ASIC resistance [1]. This exploit works by bypassing the memory-hard part of ProgPoW (the part which makes implementation on an ASIC difficult).

Normally, ProgPoW operates in the following manner (only arguments relevant to the exploit are shown, things like zero-padding, block numbers, DAG, etc. are ignored for simplicity):

uint64 seed = keccak(header_hash, nonce)
uint256 intermediate_digest = progpowLoop(seed)
uint256 final_hash = keccak(header_hash, seed, intermediate_digest)

The progpowLoop() function is the part of the algorithm which is ASIC-resistant. As a result, if we could bypass running progpowLoop(), then we could easily implement an ASIC which only computes Keccak hashes (Keccak, the winner for the SHA3 competition, is an extremely ASIC-friendly algorithm by design).

There are two design “flaws” of ProgPoW which, together, make bypassing of progpowLoop() possible, and turn the finding of valid ProgPoW solutions into a keccak search problem:

  1. The seed is only 64 bits

2. The final_hash calculation does not directly use the nonce

Specifically, performing the exploit is as simple as:

1. Generating a random 64-bit seed

2. Performing a single progpowLoop() calculation (could be done on a computer controlling the ASIC)

3. Grinding through keccak hashes with altered header_hashes until a final_hash that meets the difficulty target is found

4. Grinding through keccak hashes with altered nonces and the header_hash corresponding to the valid solution to recover the seed

Fixing either of the design “flaws” would patch the exploit, making it impossible to use brute-force to bypass the progpowLoop() for each hash attempt.
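
The four steps above can be illustrated with a toy model. Everything in this sketch is a stand-in chosen so the search finishes instantly: a 64-bit integer mixer replaces Keccak, a trivial function replaces progpowLoop(), and the seed and nonce are shrunk to 16 bits each. It is not the real algorithm or the published exploit code; it only shows why a solution found without re-running the loop still passes honest verification.

```cpp
#include <cstdint>

constexpr uint64_t kToySeedMask   = (1ULL << 16) - 1;  // 16-bit toy seed
constexpr uint64_t kToyNonceSpace = 1ULL << 16;        // 16-bit toy nonce
constexpr uint64_t kToyTarget     = 1ULL << 56;        // ~1 in 256 hashes qualifies

// Stand-in for Keccak: a simple 64-bit mixing function (assumption, not Keccak).
inline uint64_t mix(uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

inline uint64_t toy_seed(uint64_t header, uint64_t nonce) {
    return mix(mix(header) ^ nonce) & kToySeedMask;
}
// Stand-in for the memory-hard progpowLoop().
inline uint64_t toy_loop(uint64_t seed) { return mix(seed ^ 0xA5A5A5A5ULL); }
// Note: like ProgPoW 0.9.3, the final hash does not take the nonce directly.
inline uint64_t toy_final(uint64_t header, uint64_t seed, uint64_t digest) {
    return mix(mix(mix(header) ^ seed) ^ digest);
}

// Honest verification: recompute everything from header and nonce.
inline bool verify(uint64_t header, uint64_t nonce) {
    const uint64_t seed = toy_seed(header, nonce);
    return toy_final(header, seed, toy_loop(seed)) < kToyTarget;
}

struct Solution { bool found; uint64_t header; uint64_t nonce; };

inline Solution exploit(uint64_t seed_guess) {
    const uint64_t seed = seed_guess & kToySeedMask;  // step 1: pick a seed
    const uint64_t digest = toy_loop(seed);           // step 2: run the loop once
    for (uint64_t header = 0; header < (1ULL << 20); ++header) {
        // Step 3: grind headers using only the cheap hash.
        if (toy_final(header, seed, digest) >= kToyTarget) continue;
        // Step 4: grind nonces to find one that reproduces the chosen seed.
        for (uint64_t nonce = 0; nonce < kToyNonceSpace; ++nonce) {
            if (toy_seed(header, nonce) == seed) return {true, header, nonce};
        }
        // No nonce maps to the seed for this header: go back to step 3.
    }
    return {false, 0, 0};
}
```

In this toy, exploit() evaluates toy_loop() exactly once regardless of how many hashes it grinds, which is the property that lets a Keccak-only ASIC sidestep the memory-hard stage.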

ProgPoW Exploit Practicality

In order to use this exploit, an attacker would need to be able to brute-force the recovery of the 64-bit seed. On average, it would take an attacker 2⁶³ attempts (searching half of all possible outputs) to recover the seed when the seed is recoverable. It’s important to note that we only have a 64-bit nonce space and a 64-bit output space, so exhaustively searching all 64-bit nonces would only recover (on average) 1-(1/e) seeds (which is roughly 63.212%). So 63.212% of the time recovering the seed would take 2⁶³ hashes, and the rest of the time 2⁶⁴ hashes would be done and the seed would not be recovered, meaning that the final_hash grind would have to be retried, and another attempt at seed recovery would occur (again with 63.212% recovery chance), and so on and so forth.

This ends up working out to approximately requiring 2^64.114 keccak hashes on average to recover a seed. Additionally since the failure to recover a seed requires regrinding to find a new valid final_hash, the average number of valid final_hashes that meet the requirement is approximately 1.58198 valid final_hashes per true valid ProgPoW solution.

So in summary, for a particular difficulty ‘d’ (number of average normal ProgPoW hashes needed to find a valid ProgPoW solution), using the ASIC exploit as it exists in ProgPoW v0.9.3 requires 2^64.114 + 1.58198d keccak hashes.
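
The two constants above can be re-derived with a short calculation. This sketch follows the text's own model: each seed-recovery attempt succeeds with probability 1-(1/e), a success costs 2⁶³ hashes, and a failure costs a full 2⁶⁴ sweep. The struct and function names are illustrative.

```cpp
#include <cmath>

struct ExploitCost {
    double log2_recovery_hashes;        // log2 of expected seed-recovery hashes
    double final_hashes_per_solution;   // qualifying final_hashes needed on average
};

inline ExploitCost progpow_093_exploit_cost() {
    const double p = 1.0 - std::exp(-1.0);          // recovery chance per attempt
    const double expected_failures = (1.0 - p) / p; // failed sweeps before a success
    // One successful search (2^63) plus the expected failed sweeps (2^64 each).
    const double hashes = std::ldexp(1.0, 63) + expected_failures * std::ldexp(1.0, 64);
    return {std::log2(hashes), 1.0 / p};
}
```

Evaluating it gives a log2 of about 64.114 and about 1.58198 qualifying final_hashes per solution, in line with the figures quoted above.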

Based on existing literature on the ASIC implementations of SHA3 competition finalists, it is reasonable to assume that an optimal Keccak implementation would be approximately 2x faster than SHA256[2]. It’s important to note that Bitcoin actually uses SHA256D (double SHA256), but technologies like AsicBoost mean Bitcoin ASICs perform slightly less than two total SHA256 functions when grinding solutions (and midstate calculation on SHA256 ASICs levels the playing field with Keccak since the expander and compressor functions don’t have to be run twice for the first SHA256 calculation, despite the 80 byte input size).

As a result, we assume that an optimized Keccak ASIC would be able to achieve approximately a 3x higher hashrate than a Bitcoin ASIC at comparable power consumptions.

Modern Bitcoin “SHA256D” ASIC Power Consumption, Hashrate, and Lithography

Based on the above table of commercially available ASIC devices, it’s reasonable to assume that a cutting-edge 7/8nm Keccak ASIC would be capable of ~10–15W/TH, a 14/16nm Keccak ASIC would be capable of ~20–25W/TH, and a 28nm Keccak ASIC would be capable of ~45–60W/TH (28nm ASICs are likely before ASICBoost was implemented in chips, so a 20% reduction to W/TH was given for comparison’s sake).

Targeting a 14/16nm ASIC[4] (since the cost of a 14/16nm ASIC is likely the most practical for an initial run of ASICs), a 2000W Keccak ASIC would be expected to hit ~80–95 TH/s (we’ll use 87.5 TH/s for future calculations).

This means that computing 2^64.114 hashes would require the ASIC to run for ~228155 seconds (63.38 hours), meaning a valid PoW solution at the GPU:ASIC economic viability boundary (we’re ignoring the 1.58198d part since it’s a rounding error at the GPU:ASIC economic boundary as will be illustrated soon) would consume approximately 126.76 kWh on average.

The most efficient GPU commercially available today is the V100, which can compute around 52 MH/s of ProgPoW v0.9.3 at a 300W TDP. While the V100 isn’t a consumer-oriented card (nor viable for mining given its price point), it’s a decent example to extrapolate performance figures for next-gen consumer GPU hardware, since next-gen flagship consumer GPU hardware with GDDR6X is likely to achieve approximately the same performance with slightly higher power consumption (HBM2 consumes slightly less power).

Here are some performance and power usage figures (as reported by nvidia-smi) for popular GPUs running vProgPoW (performance is extremely similar to ProgPoW v0.9.3), as well as an estimate for the next-generation Nvidia consumer GPUs. All testing was done at stock clocks; some variance should be expected due to different ProgPoW kernels from different periods, environmental factors, and the exact model of GPU:

Comparison of GPUs with ProgPoW v0.9.3 Hashrates and Power Consumption (Estimates for Next-Generation Hardware). RTX 3070/3080/3090 Information from [3].

Based on these assumptions, it is reasonable to assume that a next-generation consumer GPU would be able to achieve somewhere around 0.14 MH/W. Given the same budget of 126.76 kWh, the GPU would be able to perform a total of ~63.89 TH.

So, at a block difficulty of 63.89 TH and given our assumptions about ASIC and next-gen GPU performance per watt, the ASIC would break even with the GPU (in terms of electrical consumption cost) around that difficulty. Note that we are ignoring the work the ASIC would have to do in grinding through the final_hash selection because at this difficulty, it would be 1.58198*63.89=101.07 TH, which is incredibly small (2^46.522) compared to the work required to brute-force the seed recovery (2^64.114).
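
The break-even arithmetic in this section can be reproduced with the sketch below. Every input is one of the article's stated assumptions (2^64.114 Keccak hashes, an 87.5 TH/s ASIC at 2000W, a 0.14 MH/W GPU, 30-second blocks); the struct and function names are illustrative only.

```cpp
#include <cmath>

struct BreakEven {
    double asic_seconds;       // ASIC run time for one seed recovery
    double energy_kwh;         // energy the ASIC spends in that time
    double gpu_hashes;         // ProgPoW hashes a GPU gets from the same energy
    double network_hashrate;   // H/s implied at a 30-second block time
};

inline BreakEven progpow_093_break_even() {
    const double keccak_hashes = std::exp2(64.114);  // expected seed-recovery work
    const double asic_rate = 87.5e12;                // H/s (assumed 14/16nm ASIC)
    const double asic_watts = 2000.0;
    const double gpu_hashes_per_joule = 0.14e6;      // 0.14 MH/W
    const double block_time_s = 30.0;
    const double joules_per_kwh = 3.6e6;

    BreakEven r;
    r.asic_seconds = keccak_hashes / asic_rate;
    r.energy_kwh = r.asic_seconds * asic_watts / joules_per_kwh;
    r.gpu_hashes = r.energy_kwh * joules_per_kwh * gpu_hashes_per_joule;
    r.network_hashrate = r.gpu_hashes / block_time_s;
    return r;
}
```

The result lands within rounding of the figures above: roughly 228,000 seconds, about 126.8 kWh, a break-even difficulty near 63.9 TH, and a network hashrate near 2.13 TH/s.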

In order to reach a network difficulty of 63.89 TH, the network hashrate would have to be around 2.130 TH/s (at a 30-second block time), which would be accomplished by:

Number of GPUs Required to Reach 2.130 TH/s Network Hashrate with ProgPoW v0.9.3

Running numbers based off of the current mining ecosystem, a 2080 Ti currently has an opportunity cost of ~$2.15/day (based off of data from NiceHash’s profitability calculator).

So a blockchain which pays out 64000*2.15=$137,600 a day in mining rewards (based on 2080 Ti numbers, the numbers would be slightly higher using opportunity costs for some other cards) would be an economically viable target for an ASIC exploiting ProgPoW v0.9.3. Were they to use ProgPoW v0.9.3 with the exploit unpatched (as a thought exercise), blockchains like ZCash, Ethereum Classic, and Ravencoin would have a large enough economic incentive for the exploit to be used. Of course, fluctuations in the opportunity cost of mining as well as increases in hashrate/W for both future GPUs and potential ASIC designs would influence these calculations.

In summary, the ASIC exploit in ProgPoW v0.9.3 is economically viable at hashrates seen on popular blockchains.

vProgPoW ASIC Transition Viability Tuning

We made several changes to the “exploit” in order to tune the protocol to alter the ASIC viability threshold.

First, it must be noted that VeriBlock currently uses a 32-bit nonce (just like Bitcoin), which changes the ProgPoW exploit math significantly.

Normally, seed recovery is brute forcing a 2⁶⁴ sized search space with 2⁶⁴ unique attempts, meaning 63.212% (on average) of the resulting search space will be covered (and as explained earlier, if a particular seed cannot be recovered, a new final_hash has to be generated).

However, because of VeriBlock’s 32-bit nonce space, seed recovery requires brute forcing a 2⁶⁴-sized search space with only 2³² attempts. As a result, each final_hash calculation that meets the network difficulty only has a 1 in 2³² chance of seed recovery. So on average, 2³² final_hash outputs which meet the network difficulty need to be performed (so if the network difficulty is d, d*2³² Keccak hashes have to be computed in final_hash calculations, and 2⁶⁴ Keccak hashes have to be computed in seed recovery calculations).

Unlike the ProgPoW v0.9.3 exploit with a 64-bit nonce search space, performing the exploit with a 32-bit nonce makes the network difficulty a non-negligible part of the calculation:

(Average) Number of Keccak Hashes Required For ASIC to find ProgPoW Solution at Different Block Difficulties Assuming a 32-bit Nonce

As can be seen from the table above, at a difficulty of 4295 MH, the amount of work required in final_hash calculations matches that of the seed recovery (and then quickly begins to dominate the majority of work required for finding valid ProgPoW solutions on an ASIC).

If we compare GPU to ASIC power consumption to generate a valid solution, we can see that the ASIC will never surpass the GPU in power efficiency:

ASIC versus GPU Power Consumption for Solutions at Different Block Difficulties Assuming a 32-bit Nonce
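
To see why the ASIC can never catch up with an unmodified 64-bit seed and a 32-bit nonce, here is a minimal sketch of the energy ratio. It assumes the same 87.5 TH/s, 2000W ASIC and, as a stand-in for the GPU figures in the table, the 0.14 MH/W next-generation GPU estimate used earlier; the function name is hypothetical.

```cpp
#include <cmath>

// ASIC energy divided by GPU energy for one valid solution at difficulty d
// (d in hashes), with a 64-bit seed, a 32-bit nonce and 1 Keccak hash per seed.
inline double energy_ratio_32bit_nonce(double difficulty) {
    // d * 2^32 final_hash grinding plus 2^64 seed-recovery hashes.
    const double asic_hashes =
        difficulty * std::ldexp(1.0, 32) + std::ldexp(1.0, 64);
    const double asic_hashes_per_kwh = 87.5e12 * 3600.0 / 2.0;  // 2 kW device
    const double gpu_hashes_per_kwh = 0.14e6 * 3.6e6;           // 0.14 MH/W
    const double asic_kwh = asic_hashes / asic_hashes_per_kwh;
    const double gpu_kwh = difficulty / gpu_hashes_per_kwh;
    return asic_kwh / gpu_kwh;
}
```

As the difficulty grows the ratio falls towards a floor of 2³² multiplied by the GPU-to-ASIC hashes-per-kWh ratio, which under these assumptions is still more than 13,000, so the ASIC always spends far more energy per solution than the GPU.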

So, we need to further tweak the “exploit” to make it applicable to VeriBlock. To do so, we need to tweak the seed recovery and ProgPoW hashing function difficulty to make it practical. All ASIC power/performance calculations are based on a 14/16nm Keccak ASIC assumed to produce 87.5 TH/s at 2000W.

There are four different parameters that can be modified to do so:

  1. The seed size can be artificially decreased from 64 bits to a lower number (making the chance of a successful recovery double with each bit removed)
  2. The number of keccak_f800 hashes used for seed generation can be increased from 1 to another number (increasing the cost of each run of the seed recovery algorithm so that final_hash calculation doesn’t dominate the ASIC’s work until a higher difficulty)
  3. The amount of work done by the ProgPoW loop on GPUs can be changed (making GPUs capable of performing fewer hashes per kWh)
  4. The number of bits in the VeriBlock nonce can be increased (making the chance of a successful recovery double with each bit added).

The block time could also be altered (faster block times mean the network has to have more GPUs at a particular difficulty target), but VeriBlock’s 30-second block time was selected for very specific security reasons, and we will not be changing it with the introduction of vProgPoW.

Choosing these parameters is a careful balancing act:

Increasing the number of bits in the VeriBlock nonce requires that the bits be taken from somewhere else in the header or the PoP miner payout information, since the total size of the header plus VeriBlock PoP miner payout information needs to stay at 80 bytes (so that it can fit in a single Bitcoin OP_RETURN rather than requiring larger transactions [which themselves require higher Bitcoin fees] to include more than 80 bytes of VeriBlock proof data).

Increasing the total amount of work done in a single GPU hash increases the validation time for nodes validating VeriBlock block headers. Increasing the number of Keccak hashes used during seed generation too much shifts ProgPoW from a memory-bound algorithm to a compute-bound algorithm on GPUs.

Changing the difficulty of a single GPU hash (by modifying the PROGPOW_CNT_DAG parameter) causes the GPU:ASIC advantage ratio (when it drops below 1, the ASIC has the advantage) curve versus difficulty to be divided (halving the flagship GPU hashrate divides the curve values by 2):


Additionally, increasing the difficulty of a single GPU hash has the added effect of requiring more GPUs on the network for a particular difficulty threshold: doubling the difficulty of a single GPU hash doubles the number of GPUs needed on the network to sustain a particular network block difficulty.

Changing the number of Keccak rounds for seed calculation has a multiplicative effect on the GPU:ASIC advantage ratio initially, but has no effect on the final GPU:ASIC advantage at very high difficulties:


The following charts demonstrate the effect of increasing the Keccak hashes per seed calculation on the performance (compute-versus-memory-utilization and hashrate) on modern GPUs (run with PROGPOW_CNT_DAG=128, which doubles the memory reads required per ProgPoW hash and which results in approximately half the hashrate of standard ProgPoW v0.9.3):


Based on the above, increasing the number of Keccak hashes per seed calculation beyond ~15–20 begins to have an appreciable (10%+) effect on hashrate on Volta-architecture GPUs (Titan V and Tesla V100), a moderate effect (4%+) on Pascal-architecture cards (1080 Ti), and a negligible effect on Turing-architecture GPUs (2080 Ti, 2060 Super).

Increasing the nonce size makes the chance of seed recovery higher, so it lowers the GPU:ASIC advantage ratio. At high network difficulties, increasing the nonce size by one bit cuts the GPU:ASIC advantage ratio in half. At very low difficulties it has a negligible effect, and at medium difficulties it has a moderate effect:


Increasing the seed size by one bit doubles the GPU:ASIC advantage ratio at all difficulties:


In order for ASICs to be viable, their power consumption savings over GPUs have to be significant. Additionally, these savings must hold up against future developments in higher-bandwidth memory for GPUs, and we are working with rough estimates of exactly how fast and efficient a 14/16nm Keccak ASIC would actually be.

As a result, we are looking for a GPU:ASIC advantage ratio around 0.20 (meaning an ASIC has a 5x advantage over a GPU in power consumption) at a reasonable difficulty threshold: not so low that ASICs have an advantage at low hashrates. We’re targeting a difficulty threshold where ASICs begin to be viable when the equivalent of somewhere between 2000 and 10000 next-generation RTX 3080 consumer cards are mining (we assume each will be capable of somewhere around 41 MH/s at 320W for vanilla ProgPoW v0.9.3).

The ideal curve has some GPU:ASIC advantage ratio above or around 0.5 up to a network difficulty corresponding to around 2000 (estimated) RTX 3080 cards, and decreases down to below 0.20 around the 10000 next-generation flagship card threshold.

Here are some example curves roughly around the shape we are looking for (crossing the 1:2 threshold highlighted in blue and crossing the 1:5 threshold highlighted in purple):


And here they are visually (split between 41MH/s and 20.5MH/s in different charts):


We selected the option with a seed size of 55 bits, a nonce size of 40 bits, 14 Keccak rounds for the seed, and a 20.5 MH/s (based on best estimates) RTX 3080 GPU hashrate (achieved by setting PROGPOW_CNT_DAG to 128).

All of the options at 41 MH/s needed too many Keccak rounds, which in benchmarking caused the GPU miner to become compute-bound instead of memory-bound (except for the ones with nonces larger than 40 bits, which would require too many bits to be added given VeriBlock’s header space requirement).

The selected option had the best trade-off between good ASIC-resistance at low network hashrates (staying above 0.5 past 2500 RTX 3080s) and high ASIC compatibility at high network hashrates (crossing a GPU:ASIC ratio of 0.2 at approximately 8000 RTX 3080s on the network).

Profiling of several cards across the Pascal, Volta, and Turing architectures of GPUs showed that 14 Keccak hashes (with PROGPOW_CNT_DAG set to 128) for seed generation still leaves the algorithm memory-bound (as expected from previous profiling explained in the section on tweaking the seed Keccak hashes parameter):

Tesla V100 16GB SXM2 with 1 Keccak hash for seed generation (and PROGPOW_CNT_DAG=128):
Hashrate: 26.45 MH/s
Power: 299W

And with 14 Keccak hashes for seed generation (and PROGPOW_CNT_DAG=128):
Hashrate: 24.97 MH/s
Power: 299W

Titan V 16GB with 1 Keccak hash for seed generation (and PROGPOW_CNT_DAG=128):
Hashrate: 19.30 MH/s
Power: 215W

And with 14 Keccak hashes for seed generation (and PROGPOW_CNT_DAG=128):
Hashrate: 18.08 MH/s
Power: 215W

RTX 2080 Ti 11GB with 1 Keccak hash for seed generation (and PROGPOW_CNT_DAG=128):
Hashrate: 15.44 MH/s
Power: 186W

And with 14 Keccak hashes for seed generation (and PROGPOW_CNT_DAG=128):
Hashrate: 15.25 MH/s
Power: 192W

RTX 2060 Super 6GB with 1 Keccak hash for seed generation (and PROGPOW_CNT_DAG=128):
Hashrate: 10.22 MH/s
Power: 174W

And with 14 Keccak hashes for seed generation (and PROGPOW_CNT_DAG=128):
Hashrate: 9.95 MH/s
Power: 175W

1080 Ti 11GB with 1 Keccak hash for seed generation (and PROGPOW_CNT_DAG=128):
Hashrate: 10.92 MH/s
Power: 196W

And with 14 Keccak hashes for seed generation (and PROGPOW_CNT_DAG=128):
Hashrate: 10.63 MH/s
Power: 202W

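In summary (PROGPOW_CNT_DAG=128, 1 versus 14 Keccak hashes for seed generation):

GPU                  | 1 Keccak hash      | 14 Keccak hashes   | Hashrate change
Tesla V100 16GB SXM2 | 26.45 MH/s at 299W | 24.97 MH/s at 299W | -5.6%
Titan V 16GB         | 19.30 MH/s at 215W | 18.08 MH/s at 215W | -6.3%
RTX 2080 Ti 11GB     | 15.44 MH/s at 186W | 15.25 MH/s at 192W | -1.2%
RTX 2060 Super 6GB   | 10.22 MH/s at 174W | 9.95 MH/s at 175W  | -2.6%
1080 Ti 11GB         | 10.92 MH/s at 196W | 10.63 MH/s at 202W | -2.7%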

With our final ProgPoW parameter selection (seed size = 55 bits, nonce size = 40 bits, Keccak rounds for seed calculation = 14, and estimated RTX 3080 performance = 20.5 MH/s (PROGPOW_CNT_DAG=128)), the seed recovery chance comes out to 2⁴⁰/2⁵⁵=1/32768. So at a given difficulty d, we would expect to need 2¹⁵ final_hashes that meet the network difficulty d, meaning 2¹⁵*d Keccak hashes (on average) in generating final_hashes, 2⁴⁰*14=1.5*10¹³ Keccak hashes per seed recovery attempt, and 2¹⁵*2⁴⁰*14=5.0*10¹⁷ Keccak hashes (on average) for successful seed recovery, meaning the total (average) number of Keccak hashes needed for a valid block solution is 2¹⁵*d+14*2⁵⁵ hashes.

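Our estimated Keccak ASIC running at 87.5 Keccak TH/s at 2000 W is able to do 1.575 * 10¹⁷ Keccak hashes per kWh of electricity. Our estimated RTX 3080 running vanilla ProgPoW at 41.09 MH/s at 320 W is able to do 4.623 * 10¹¹ ProgPoW hashes per kWh of electricity; at the halved vProgPoW rate of ~20.5 MH/s (PROGPOW_CNT_DAG=128) used in the calculations below, that becomes ~2.31 * 10¹¹ vProgPoW hashes per kWh.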

Given our selected parameters, here are the GPU:ASIC advantage ratios under our GPU and ASIC performance assumptions:

At 1000 RTX 3080s:

  • Hashrate = 20.5*1000 = 20500 MH/s
  • Difficulty @ 30-Second Block Time = 20500*30 = 615000 MH
  • Keccak Hashes for ASIC Solution = 5.25 * 10¹⁷ Hashes
  • ASIC Power for Solution = (5.25 * 10¹⁷) / (1.575 * 10¹⁷) = 3.33 kWh
  • GPU Power for Solution = 2.66 kWh
  • GPU:ASIC Ratio: 1.249

At 2000 RTX 3080s:

  • Hashrate = 20.5*2000 = 41000 MH/s
  • Difficulty @ 30-Second Block Time = 41000*30 = 1230000 MH
  • Keccak Hashes for ASIC Solution = 5.45 * 10¹⁷ Hashes
  • ASIC Power for Solution = (5.45 * 10¹⁷) / (1.575 * 10¹⁷) = 3.46 kWh
  • GPU Power for Solution = 5.33 kWh
  • GPU:ASIC Ratio: 0.648

At 5000 RTX 3080s:

  • Hashrate = 20.5*5000 = 102,500 MH/s
  • Difficulty @ 30-Second Block Time = 102,500*30 = 3,075,000 MH
  • Keccak Hashes for ASIC Solution = 6.05 * 10¹⁷ Hashes
  • ASIC Power for Solution = (6.05 * 10¹⁷) / (1.575 * 10¹⁷) = 3.84 kWh
  • GPU Power for Solution = 13.33 kWh
  • GPU:ASIC Ratio: 0.288

At 10000 RTX 3080s:

  • Hashrate = 20.5*10000 = 205,000 MH/s
  • Difficulty @ 30-Second Block Time = 205,000*30 = 6,150,000 MH
  • Keccak Hashes for ASIC Solution = 7.06 * 10¹⁷ Hashes
  • ASIC Power for Solution = (7.06 * 10¹⁷) / (1.575 * 10¹⁷) = 4.48 kWh
  • GPU Power for Solution = 26.67 kWh
  • GPU:ASIC Ratio: 0.168
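
The worked examples above all follow one formula, which the sketch below makes explicit so other card counts or parameter choices can be tried. It uses the article's assumptions (an 87.5 TH/s, 2000W Keccak ASIC; a 320W RTX 3080 at 20.5 MH/s; 30-second blocks) and the total-work expression 2^(seed bits - nonce bits) * d + rounds * 2^(seed bits). The struct and function names are illustrative, not part of any vProgPoW codebase.

```cpp
#include <cmath>

struct Params {
    int seed_bits;       // effective seed size
    int nonce_bits;      // nonce size
    int keccak_rounds;   // Keccak hashes per seed calculation
    double gpu_mhs;      // per-card hashrate in MH/s
    double gpu_watts;    // per-card power draw
};

// Final parameter selection described in the text.
inline Params selected_params() { return {55, 40, 14, 20.5, 320.0}; }

// Average Keccak hashes an exploit ASIC needs per valid solution at difficulty d.
inline double asic_keccak_hashes(const Params& p, double d) {
    return std::ldexp(d, p.seed_bits - p.nonce_bits) +
           p.keccak_rounds * std::ldexp(1.0, p.seed_bits);
}

// GPU:ASIC ratio as used in the text: ASIC energy per solution divided by
// GPU energy per solution (below 1 means the ASIC is more efficient).
inline double gpu_asic_ratio(const Params& p, double cards,
                             double block_time_s = 30.0) {
    const double d = cards * p.gpu_mhs * 1e6 * block_time_s;
    const double asic_hashes_per_kwh = 87.5e12 * 3600.0 / 2.0;
    const double gpu_hashes_per_kwh = p.gpu_mhs * 1e6 * 3600.0 / (p.gpu_watts / 1000.0);
    const double asic_kwh = asic_keccak_hashes(p, d) / asic_hashes_per_kwh;
    const double gpu_kwh = d / gpu_hashes_per_kwh;
    return asic_kwh / gpu_kwh;
}
```

Calling gpu_asic_ratio(selected_params(), 1000) yields about 1.249 and gpu_asic_ratio(selected_params(), 10000) about 0.168; changing seed_bits, nonce_bits or keccak_rounds in Params shows how each knob moves the curve.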

In summary, vProgPoW introduces an economically gated transition plan to ASICs which allows the network to be ASIC-resistant at low network difficulties, but enables simple ASICs to mine the network at high difficulties.

Additionally, the transition period is gradual; rather than cutting over from GPUs to ASICs suddenly, natural economic forces make ASICs increasingly viable (in performance-per-watt terms) against GPUs as the network matures.

References

[1] https://github.com/kik/progpow-exploit

[2] https://eprint.iacr.org/2012/368.pdf

[3] https://www.nvidia.com/en-us/geforce/graphics-cards/30-series/

[4] https://semiwiki.com/semiconductor-manufacturers/intel/6713-14nm-16nm-10nm-and-7nm-what-we-know-now/

[5] https://videocardz.com/newz/nvidia-details-geforce-rtx-30-ampere-architecture

[6] https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf

VeriBlock & Proof-of-Proof: extending Bitcoin’s hashing power/security to any blockchain in a Decentralized, Trustless, Transparent, and Permissionless manner.
