Analysis: ethermine.org US1 server outage

Bitfly
Jun 19, 2017

Last Friday, our ethermine.org US1 server suffered significant performance degradation over a period of 6 hours. The issue caused the server to accept only a small fraction (~10%) of the shares that were submitted. Because work distribution and connectivity were still functioning properly, the issue was not detected by our automatic monitoring and alerting system.

As soon as we identified the root cause of the problem, we developed, tested and deployed a fix, after which the server accepted shares normally again.

What added to the severity of the issue was that most mining software does not consider a pool to be down even when share submission is broken. To guard against such an issue in the future, we recommend that miner developers mark a pool as down if share submission takes longer than a few seconds, and switch automatically to a backup pool.
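As an illustration of this recommendation, here is a minimal Python sketch of how a miner might track share-submission latency and fail over. It is not taken from any particular mining software: the names `PoolHealth` and `choose_pool`, the 5-second timeout and the "three slow submissions in a row" rule are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class PoolHealth:
    """Tracks share-submission latency for one pool (hypothetical helper)."""
    timeout_s: float = 5.0   # assumed value for "a few seconds"
    max_slow: int = 3        # assumed: slow submissions in a row before failover
    slow_count: int = 0

    def record(self, latency_s):
        """Record one submission; latency_s is None if no response arrived."""
        if latency_s is None or latency_s > self.timeout_s:
            self.slow_count += 1
        else:
            # A timely response resets the counter.
            self.slow_count = 0

    @property
    def is_down(self):
        return self.slow_count >= self.max_slow


def choose_pool(pools, health):
    """Return the first pool (in priority order) that is not marked down.

    Falls back to the primary pool if every pool is marked down.
    """
    for name in pools:
        if not health[name].is_down:
            return name
    return pools[0]
```

A miner would call `record()` after every share submission and `choose_pool()` before sending the next one. Requiring several slow submissions in a row is one possible way to avoid switching pools on a single network hiccup.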

On our side, we have taken the following measures to prevent such an issue in the future:

  • Expanded our monitoring and alerting system to detect such issues
  • Added server capacity to our US1 server farm
  • Patched our stratum server implementation to ensure submitted shares are properly accounted for
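To illustrate the kind of check that could catch an incident like this one, where connectivity looks healthy but only ~10% of shares are accepted, here is a short Python sketch. It is not our actual monitoring code: the function names, the 90% threshold and the minimum sample size are assumptions made for the example.

```python
def acceptance_ratio(accepted, submitted):
    """Fraction of submitted shares that were accepted, or None if none were submitted."""
    if submitted == 0:
        return None
    return accepted / submitted


def should_alert(accepted, submitted, min_ratio=0.9, min_samples=100):
    """Decide whether to raise an alert for one monitoring window.

    min_ratio and min_samples are assumed values; small windows are ignored
    so that a handful of rejected shares does not trigger an alert.
    """
    ratio = acceptance_ratio(accepted, submitted)
    if ratio is None or submitted < min_samples:
        return False
    return ratio < min_ratio
```

With these assumed settings, a window in which 100 of 1,000 submitted shares were accepted would raise an alert, while a window with 990 of 1,000 accepted would not.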

Compensation plan

Because the issue was caused by our pool's stratum implementation, we decided to compensate our miners for the downtime. In total, the pool lost 6 hours of mining at a hashrate of 1 TH/s, which corresponds to 124 Ether at that day's difficulty. The Ether will be credited to all miners who were mining on the US1 pool, in proportion to their recorded hashrate immediately before the incident, under the synthetic round 3896442. A full list of eligible miners, their recorded hashrates and allocated compensation amounts is available at https://gist.github.com/anonymous/7f46682ba386972bea1716339c740a38. Compensation will be credited within the next 24 hours.
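The proportional split can be sketched as follows. This is an illustration only, not the script used to produce the list above: the miner names and hashrates are hypothetical, and the choice to work in integer wei and round each share down is an assumption made to avoid floating-point error.

```python
WEI_PER_ETHER = 10**18


def allocate(total_wei, hashrates):
    """Split total_wei among miners in proportion to their recorded hashrate.

    hashrates maps a miner id to an integer hashrate (for example in H/s).
    Each share is rounded down to a whole wei, so the allocations may sum
    to slightly less than total_wei.
    """
    total_hashrate = sum(hashrates.values())
    if total_hashrate <= 0:
        raise ValueError("total hashrate must be positive")
    return {
        miner: total_wei * rate // total_hashrate
        for miner, rate in hashrates.items()
    }


# Hypothetical miners and hashrates, for illustration only.
example = allocate(
    124 * WEI_PER_ETHER,
    {"miner_a": 600, "miner_b": 300, "miner_c": 100},
)
```

In this made-up example, `miner_a` contributes 60% of the recorded hashrate and is therefore allocated 60% of the 124 Ether, which is 74.4 Ether.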

Edit: All compensations have been credited successfully.

We are very sorry for the inconvenience caused by the issue and will do our best to ensure it will not happen again.

Thanks for mining with us.