Bitcoin’s Buggy Sync

A long history of frustrations, deceit and financial losses

Justin Percy
Published in The Dark Side
Aug 22, 2019

“Those who know don’t talk. Those who talk don’t know. Close your mouth, block off your senses, blunt your sharpness, untie your knots, soften your glare, settle your dust. This is the primal identity. Be like the Tao. It can’t be approached or withdrawn from, benefited or harmed, honored or brought into disgrace. It gives itself up continually. That is why it endures.”― Tao Te Ching

[4]

If you have been running a full node or the Qt wallet on your desktop since 2014 or earlier, you probably remember the synchronization dilemmas, and maybe even the tips and tricks used to speed up the process. [1]

Not only was this early flaw an annoyance, it was also responsible for denial-of-service attacks that took nodes completely offline and disrupted exchange and merchant services by forcing operators to manually re-index the entire blockchain. Their wallets and nodes got stuck on orphan chains and refused to download new blocks from the network. [2]

History of the Glitch

Issues in Bitcoin Core 8 (version 0.8) led to the development of a quick solution for new node or wallet installations and for recovering a ‘stuck’ wallet: clearing the local blockchain files and downloading them again from a dependable remote source. This was the Bootstrap, implemented in Core 9 (version 0.9). It was used as an alternative, slightly quicker method than synchronizing from other live nodes on the network.

[4]

It does have its own risks: the transaction records could be altered by the ‘provider’ and accepted by the majority of the network. This is essentially a double-spend attack, but without the mining investment required for a 51% attack.

“The biggest pain point of using Bitcoin-Qt (Bitcoin Core) prior to v 0.10 as your wallet is it takes forever to sync to get the complete block chain.” [3]

Low RAM, a slow CPU or a slow internet connection was often viewed as the culprit for such a painstaking Bitcoin experience. Bootstrapping was a quick fix, but it was simply a centralized solution to a decentralized problem. This is when I began researching resource consumption and data transmission during full network synchronizations, and even while bootstrapping. Unexpectedly, the peers appeared to be fighting each other on the network while downloading orphan blocks. I also speculated that this was later used as an attack vector to further enhance the Sybil (double-spend and 51%) vulnerabilities, when nodes are knocked offline or stuck on undesired block heights [2] [5].

“Mt. Gox plans to resume Bitcoin transfers after fixing ‘phantom’ weakness”[2]

“The malformed records created discrepancies in the effected exchange’s accounting systems that caused them to fall out of sync with the network.” [2]

“Thanks to our friends at Blockchain.info, Mt. Gox now has a workaround that will use a unique identifier created by Blockchain to show whether transactions have been modified or not.” [2]

Unfortunately, this “unique identifier” was not the fix-all solution required. It addressed only a small piece of a few simple flaws that were further compounded by improper handling of orphan blocks in the ProcessBlock function.

If you have checked the network connections and added nodes if needed, but your wallet still won’t sync, then you may need to delete the blockchain along with a few other files and start over.[4]

Satoshi — We have a Problem!

Bitcoin Core developers were out of quick-fix options, and the sync issue never went away. During this opportune period, some heavily funded developers [6] [7] used the chaos and unrest to further their own political motives and become the dominant force within the core team. Bribes, bullying and other marketing tactics eventually forced the Bitcoin community to adopt whatever “solutions” this rogue sub-development group proposed, as long as they appeared to solve the issues of the past. More recently, the same kind of marketing campaign was run with Bitcoin and Bitcoin Cash developers to create the BSV fork. [8]

In 2015, Bitcoin Core 10 (version 0.10.0) was released: the “improved” version with a completely revamped code base, promoting new features called “headers-first synchronization and parallel block download”.

“Blocks will be stored on disk out of order (in the order they are received, really), which makes it incompatible with some tools or other programs. Reindexing using earlier versions will also not work anymore as a result of this.

The block index database will now hold headers for which no block is stored on disk, which earlier versions won’t support.” [9]

This controversial upgrade replaced the original Satoshi peer consensus code with ChainActive (the chainActive object, an instance of the CChain class) [10]. Blocks would be downloaded, stored and accessed in an order determined by a summary of block data, the CBlockHeader class, which carries incomplete block information and is therefore limited in its ability to verify block validity.

In another erroneous attempt to streamline synchronization and prevent stuck-block issues, peers would download this block header summary from only one peer on the network, then request full block information from all peers that shared header summaries identical to those of the single peer initially connected to.

Attackers quickly figured out how to ensure their nodes had connection priority on the network and used this upgrade to increase the success rate of their Sybil attacks even further. Making matters worse, ChainActive was designed to find the most recent chain tip, past and present, without the information needed to reach an informed consensus.

Early critics could see possible issues developing with such an upgrade to the vitally important Satoshi consensus. They expressed their concerns, but most of these were overshadowed by the clever marketing ploy that Core 10 was robust, new and would sync far faster than previous Bitcoin versions, without “stuck wallets” or bootstrapping! Some fell for this gimmick and adopted it blindly, later discovering that their alt-coins could be totally destroyed overnight (past, present and future) if they did not maintain majority hash rate to ensure network security at all times.

“No more orphan blocks. At all. We only ever request a block for which we have verified the headers” [11]

But how could the wallet be sure that verified headers are in full consensus on the network when an internal class was used to find the most recent “Tip” of the chain by downloading it from a SINGLE peer upon wallet startup? How would it know the most valid network-wide chain height and hash to use as a checkpoint for building new blocks, or where a chain split occurred and reorganization was required? It simply couldn’t.

Greg Maxwell was one of the first to discover that this new “solution” had some major issues: the wallet was disconnecting peers and banning them for “misbehaving” at what appeared to be random. [12]

Blocks were now also stored “out-of-order” on disk and depended fully on their header information. ChainActive made deep chain reorganizations possible, sometimes thousands of blocks into the past at a time. [13]

A prime example of blatant ignorance can be seen in the Core 10.3 (version 0.10.3) code, and even in newer versions, in the FindNextBlocksToDownload function, lines 416–420:

if (state->pindexLastCommonBlock == NULL) {
    // Bootstrap quickly by guessing a parent of our best tip is the forking point.
    // Guessing wrong in either direction is not a problem.
    state->pindexLastCommonBlock = chainActive[std::min(state->pindexBestKnownBlock->nHeight, chainActive.Height())];
}

According to this code, the wallet can perform an automatic deep reorganization of the local wallet as soon as a peer reorganizes its own chain (for some odd reason). Lines 422–426:

// If the peer reorganized, our previous pindexLastCommonBlock may not be an ancestor
// of their current tip anymore. Go back enough to fix that.
state->pindexLastCommonBlock = LastCommonAncestor(state->pindexLastCommonBlock, state->pindexBestKnownBlock);
if (state->pindexLastCommonBlock == state->pindexBestKnownBlock)
    return;
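For readers unfamiliar with the helper called in this excerpt, a last-common-ancestor lookup can be sketched roughly as follows. This is a simplified illustration, not the Bitcoin Core implementation: the BlockIndex struct and the FindLastCommonAncestor name are assumptions made for this example, and it walks parent pointers one step at a time.

```cpp
// Hypothetical, minimal stand-in for a block index entry.
struct BlockIndex {
    int height;              // distance from the genesis block
    const BlockIndex* prev;  // parent block, nullptr for genesis
};

// Walk both branches back until they meet. Returns nullptr when the
// two entries do not share any ancestor.
const BlockIndex* FindLastCommonAncestor(const BlockIndex* a, const BlockIndex* b) {
    // First bring both pointers down to the same height.
    while (a && b && a->height > b->height) a = a->prev;
    while (a && b && b->height > a->height) b = b->prev;
    // Then step back in lockstep until the pointers coincide.
    while (a && b && a != b) {
        a = a->prev;
        b = b->prev;
    }
    return (a && b) ? a : nullptr;
}
```

When one argument is itself an ancestor of the other, the function returns that argument, which is the situation the early return in the excerpt checks for.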

All peers that did not follow the newly changed “Tip” as determined by ChainActive were immediately shunted offline from the network, further enhancing the ability of Sybil attacks to decimate the entire network simultaneously, as opposed to the single node seen with Mt. Gox in the past. [14]

The Long Treasure Hunt

After years of frustration, and later digging deeper into the source code, I was referred by some core developers to net.cpp to look for this fatal flaw. Finding other issues there and taking a proactive approach, I was the first developer to create a primitive artificially intelligent firewall [15] to mitigate centralized data transmission by one peer attempting to hijack the network.

Uneducated in all other aspects of the source code, I was unable to fully fix the root cause of the issue. I WAS able to mitigate and prevent its damaging effects on the network by detecting and shutting it down in real time, using the network nodes to determine normal behavior. I simply did not know enough about what was good or bad network traffic, so I had to make the peers learn that from other nodes, then protect against abnormal peers and nodes using primitive artificial intelligence and machine learning.

Working on my own modified Core 8 alt-coin, Profit Hunters Coin (PHC), I was able to dive fully into the Core 8 codebase and tinker, creating some solutions never before seen in the industry and pushing boundaries never possible with Bitcoin [16]. But I kept running into the same issues Bitcoin did: stuck wallets, slow sync, and massive bandwidth and resource usage for unexplained reasons.

After I improved the old debugging code, the development team assisted me in testing and re-testing multiple versions of the PHC wallet software, attempting a full blockchain synchronization more times than we could keep track of, and gathering the data needed to establish normal performance benchmarks on multiple devices.

Waking up one morning and discovering that one of my test nodes was stuck on a block and wouldn’t sync, I was excited to have finally caught the undesired event and captured a clue as to the cause in debug.log. I could see repeating errors from ProcessBlock regarding orphan blocks, which appeared to be related to the issue.

The Root of the Problem

[Code listing not reproduced here: PHC, Main.cpp, 1.0.0.7-base]

Looking at the source code responsible for the buggy glitches of the past reveals a simple yet complex issue. The code above was “shunting off orphan chains” into memory and aggressively asking the peer that sent the orphan block currently being processed to keep sending the root parents and children, filling up memory and stalling other peers from broadcasting the valid chain. This happened regardless of importing, reindexing or initial block download status.

What was truly missing was smart routing: the ability to keep track of past rejected blocks per node. Peercoin attempted to optimize this sync process by asking for the root of the orphan block and all consecutive blocks after it, but this uses excessive bandwidth and does not ask the rest of the network for the valid chain; it only asks the current peer for more invalid blocks.
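The idea of tracking rejected blocks per node can be illustrated with a small sketch. It is not taken from the PHC source: the RejectedBlockTracker class, its method names and the maxRejections limit are all hypothetical, and peers are identified by a plain integer for brevity.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical tracker: remembers which block hashes each peer sent
// that were rejected as orphans or as invalid.
class RejectedBlockTracker {
public:
    // Record that `peerId` sent a block that was rejected.
    void RecordRejection(int peerId, const std::string& blockHash) {
        rejected_[peerId].insert(blockHash);
    }

    // True if this peer already sent this exact rejected block.
    bool WasRejected(int peerId, const std::string& blockHash) const {
        auto it = rejected_.find(peerId);
        return it != rejected_.end() && it->second.count(blockHash) > 0;
    }

    // Number of distinct rejected blocks received from this peer.
    std::size_t RejectionCount(int peerId) const {
        auto it = rejected_.find(peerId);
        return it == rejected_.end() ? 0 : it->second.size();
    }

    // Keep asking this peer only while it stays under the assumed limit;
    // otherwise requests should be routed to other peers.
    bool ShouldAskPeer(int peerId, std::size_t maxRejections) const {
        return RejectionCount(peerId) < maxRejections;
    }

private:
    std::unordered_map<int, std::unordered_set<std::string>> rejected_;
};
```

With a record like this, a node could stop asking the same peer for the parents of an orphan it has already rejected and ask the rest of the network instead.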

This glitch has been exploited in denial-of-service attacks, known as “disk-fill attacks”, even when the node had more than enough free disk space. The cause was memory leak issues and was not related to the hard drive at all.

Orphan chains could exhaust RAM quickly, which in turn caused database errors. Nothing was even written to the disk; it was all held in memory, and it crashed the database libraries with unexpected errors returned to the client software, often corrupting the raw database, LevelDB and sometimes even the wallet.dat file. That’s why people needed to bootstrap in a pinch or re-index their local files, and “don’t forget to backup wallet.dat”.

The Not So Quick Fix

The ProcessBlock function in the new PHC source code has grown significantly in size after optimization and the addition of what we consider to be a stable and secure temporary quick fix.

The original Bitcoin Core 8 contained a flaw where the wallet would receive an orphan block from a node and then ask for the entire chain, from the orphan root up to the chain tip. This was also present in the PHC code base, which included stake mining from Peercoin. The flaw was still unfixed, and we noticed it would result in an infinite loop until all mapped orphans held in memory became duplicates and were ONLY proof-of-stake blocks. All proof-of-work blocks would be requested continually, causing memory buffer overflows, database corruption and unresponsive wallets. Sometimes an orphan block was written to the database in error and permanently locked the wallet client onto an orphan chain. This was the glitch Bitcoin 8 and Mt. Gox suffered from; this was the “phantom bug”.

TO EDIT -> new fix found recently.

PHC was able to work around a series of glitches by modifying sections of the existing code to execute orphan chain requests only during Initial Block Download (full node synchronization). The behavior can be disabled by running the wallet client (daemon or Qt) with the -orphansync=false command-line argument, or by setting it in the configuration file. Without changing a massive amount of the consensus code to implement ChainActive and HFS (headers-first synchronization), this prevents excessive resource usage and, by default, disables any potential for the “disk-fill attacks” experienced in the past. Removing this code entirely would cause the wallet to get stuck on a block, so it had to be called only under certain conditions and in a controlled environment.
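The gating described above can be sketched in a few lines. This is an illustration under stated assumptions, not the PHC code: SyncSettings, ParseSyncArgs and ShouldRequestOrphanChain are hypothetical names, the option is assumed to be enabled by default, and only the -orphansync argument from the text is parsed.

```cpp
#include <string>
#include <vector>

// Hypothetical settings holder; the field mirrors the -orphansync option.
struct SyncSettings {
    bool orphanSync = true;  // assumed default: orphan-chain requests allowed
};

// Parse only the one option discussed here. Treating "false" and "0"
// as "off" is an assumption about how the client reads boolean flags.
SyncSettings ParseSyncArgs(const std::vector<std::string>& args) {
    SyncSettings settings;
    for (const std::string& arg : args) {
        if (arg == "-orphansync=false" || arg == "-orphansync=0") {
            settings.orphanSync = false;
        }
    }
    return settings;
}

// Orphan-chain requests are issued only while the node is still in
// Initial Block Download and the option has not been switched off.
bool ShouldRequestOrphanChain(const SyncSettings& settings, bool inInitialBlockDownload) {
    return settings.orphanSync && inInitialBlockDownload;
}
```

Keeping the decision in one small predicate makes it easy to see that a fully synced node never issues these requests, whatever the option is set to.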

One other critical modification was made to the ProcessBlock function: it ensures that when the wallet client is already synced to the network and receives an orphan block, it does not go into “idle” mode and refuse to ask other peers for alternative block candidates.

By creating a new Dynamic Checkpoint buffer in the CNode class called dOrphanRecv and recording the orphan block hash, timestamp and parent node information, a small secondary section of code could safely provide orphan flooding protection held in memory, sufficient for normal chain splits. This ensures the client can call the ForceSync function to request the valid chain from all other connected peers, starting at the pindexBest->pprev valid block. First, it forces a PushGetBlocks request to the node that sent the orphan, to see whether it holds any valid blocks after the current best block height (-1). This appears to prevent wallets from getting stuck or falling behind on the network while minimizing bandwidth and resource consumption.
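As a rough sketch of the ordering described above (orphan sender first, then every other connected peer, starting one block below the current best), consider the following. The OrphanRecord and ResyncPlan structs and the PlanResync function are hypothetical and only model the decision; they do not reproduce dOrphanRecv, ForceSync or PushGetBlocks from the PHC source.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record of a received orphan: hash, arrival time, sender.
struct OrphanRecord {
    std::string blockHash;
    std::int64_t receivedAt;  // Unix timestamp
    int peerId;               // peer that sent the orphan
};

// Hypothetical plan for the follow-up block requests.
struct ResyncPlan {
    int startHeight;             // best height minus one
    std::vector<int> peerOrder;  // orphan sender first, then the rest
};

ResyncPlan PlanResync(const OrphanRecord& orphan, int bestHeight,
                      const std::vector<int>& connectedPeers) {
    ResyncPlan plan;
    // Restart from the parent of the current best block.
    plan.startHeight = bestHeight > 0 ? bestHeight - 1 : 0;
    // Ask the peer that sent the orphan first.
    plan.peerOrder.push_back(orphan.peerId);
    // Then ask every other connected peer for the valid chain.
    for (int peer : connectedPeers) {
        if (peer != orphan.peerId) plan.peerOrder.push_back(peer);
    }
    return plan;
}
```

For example, with a best height of 100 and peers 3, 7 and 9 connected, an orphan from peer 7 yields a start height of 99 and the request order 7, 3, 9.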

mapOrphanBlocks has been limited to a maximum of 100 entries to prevent excessive CPU and RAM usage.
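A size cap of this kind could look roughly like the sketch below. The BoundedOrphanPool class is hypothetical, block data is reduced to hash strings, and evicting the oldest entry first is an assumption; the text does not say which entry PHC drops when the limit is reached.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_map>

// Hypothetical orphan pool that never holds more than maxEntries blocks.
class BoundedOrphanPool {
public:
    explicit BoundedOrphanPool(std::size_t maxEntries) : maxEntries_(maxEntries) {}

    // Returns false if the orphan is already stored or the capacity is zero.
    bool Add(const std::string& hash, const std::string& parentHash) {
        if (maxEntries_ == 0 || orphans_.count(hash) > 0) return false;
        if (orphans_.size() >= maxEntries_) {
            // Assumed policy: evict the oldest entry (first in, first out).
            orphans_.erase(order_.front());
            order_.pop_front();
        }
        orphans_[hash] = parentHash;
        order_.push_back(hash);
        return true;
    }

    bool Contains(const std::string& hash) const { return orphans_.count(hash) > 0; }
    std::size_t Size() const { return orphans_.size(); }

private:
    std::size_t maxEntries_;
    std::unordered_map<std::string, std::string> orphans_;  // hash -> parent hash
    std::deque<std::string> order_;                          // insertion order
};
```

Constructed with a limit of 100, such a pool keeps memory use bounded no matter how many orphans a peer sends.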

The Bonus Features

“I never did anything worth doing by accident, nor did any of my inventions come by accident; they came by work.” (attributed to Thomas Edison)

HyperSync was born through the debugging process leading up to the research presented above. As the PHC development team attempted to narrow down the root causes by patching in fixes, they noticed that Core 8 could synchronize to the network at record speeds, comparable to Core 10 or above, simply by calling the ForceSync function during the Initial Block Download process after processing an orphan chain from a node. This kick-starts a new cycle of block processing starting at the most recent best block height (-1) from a decentralized source (all connected peers).

This new feature DOES use more bandwidth and more CPU/RAM than the default optimized fixes discussed in the previous sections of this paper, but it can be enabled with the following command-line argument:

phcd -hypersync
phc-qt -hypersync

Low-bandwidth mode: now that the synchronization code has been optimized, an additional feature has been added to the Firewall. It disconnects nodes that try to synchronize blocks beyond a certain threshold from the current best block (default: 1000). This ensures that nodes running on limited bandwidth quotas can contribute to the network without detrimental effects on performance.

phcd -lowbandwidth
phc-qt -lowbandwidth

or edit phc.conf and add the following line:

lowbandwidth=1
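One possible reading of the threshold rule is sketched below. The function name and parameters are hypothetical, and the interpretation itself is an assumption: a peer is treated as “past the threshold” when the height it wants to sync from is more than 1000 blocks below our current best height.

```cpp
// Default taken from the text above: 1000 blocks.
constexpr int kDefaultSyncThreshold = 1000;

// Returns true when low-bandwidth mode is on and the peer would need
// more than `threshold` blocks from us (assumed interpretation).
bool ShouldDisconnectForLowBandwidth(int ourBestHeight, int peerRequestedHeight,
                                     bool lowBandwidth,
                                     int threshold = kDefaultSyncThreshold) {
    if (!lowBandwidth) return false;
    return ourBestHeight - peerRequestedHeight > threshold;
}
```

Under this reading, a node at height 5000 keeps serving a peer that asks from height 4000 and disconnects one that asks from height 3999.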

Staking & Mining is Possible on a Mobile Device

Balanced low usage will ensure unexpected data overcharges do not happen.

The Firewall will automatically prevent nodes from “full-syncing” to mobile nodes and disconnect/ban them for 24 hours.
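A timed ban of this kind can be modeled with a small lookup table, sketched here under assumptions: the BanList class and kBanSeconds constant are hypothetical names, peers are identified by an integer, and the current time is passed in as a Unix timestamp so the logic stays easy to test.

```cpp
#include <cstdint>
#include <unordered_map>

// 24 hours, as stated in the text above.
constexpr std::int64_t kBanSeconds = 24 * 60 * 60;

// Hypothetical ban list keyed by peer id.
class BanList {
public:
    // Ban the peer for 24 hours starting at `now` (Unix time).
    void Ban(int peerId, std::int64_t now) {
        bannedUntil_[peerId] = now + kBanSeconds;
    }

    // A peer is banned until its expiry time has been reached.
    bool IsBanned(int peerId, std::int64_t now) const {
        auto it = bannedUntil_.find(peerId);
        return it != bannedUntil_.end() && now < it->second;
    }

private:
    std::unordered_map<int, std::int64_t> bannedUntil_;
};
```

A mobile node would check IsBanned before accepting a connection and call Ban whenever a peer attempts a full sync against it.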

The battle is far from over

Initial tweaks and tests show positive results, but the buggy sync is far from fixed in PHC at the moment. More troubleshooting and testing are required. Some of the patches above may become obsolete or deprecated…

References:

[1] https://cryptomining-blog.com/230-how-to-speed-up-new-bitcoin-wallet-synchronization/

[2] http://thomist.org/images/sample/BitcoinWallet/bitcoin-wallet-sync-slow.php

[3] http://www.bitcoin-en.com/install-bitcoin-qt-faster.html

[4] https://cryptorials.io/wallet-wont-sync/

[5] https://bitcoin.stackexchange.com/questions/2979/what-can-i-do-when-the-blockchain-synchronization-is-stuck-at-a-specific-block

[6] https://blockstream.com/2014/10/23/en-why-we-are-co-founders-of-blockstream/

[7] https://www.businessinsider.com/craig-steven-wright-rumoured-bitcoin-creator-was-commercialising-blockchain-research-and-reviving-company-hotwire-2015-12

[8] https://medium.com/@jonaldfyookball/an-open-letter-to-unwriter-169af09867b1

[9] https://bitcoin.org/en/release/v0.10.0

[10] https://github.com/bitcoin/bitcoin/pull/5890

[11] https://github.com/bitcoin/bitcoin/pull/4468

[12] https://github.com/bitcoin/bitcoin/pull/4468#issuecomment-48102820

[13] https://github.com/bitcoin/bitcoin/archive/v0.10.3.zip

[14] https://github.com/bitcoin/bitcoin/issues/5851

[15] https://github.com/BiznatchEnterprises/BitcoinFirewall

[16] https://medium.com/altcoin-magazine/the-difference-between-general-and-specific-consensus-in-bitcoin-538ed8dfe696

Additional Research:

[1] https://medium.com/@jonaldfyookball/on-solving-the-51-attack-problem-in-bitcoin-part-1-9de59e34144

[2] https://hackernoon.com/bitcoin-core-bug-cve-2018-17144-an-analysis-f80d9d373362

[3] https://coinguides.org/wallet-not-syncing-fix/

[4] https://bitcoin.stackexchange.com/questions/4920/my-client-stopped-synchronizing-how-can-i-access-my-wallet

[5] https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2015-April/007828.html

[6] http://12580max.com/github_/bitcoin/bitcoin/pull/14711

[7] https://github.com/bitcoin/bips/pull/400

[8] https://en.bitcoin.it/wiki/Bitcoin_Core_0.11_%28ch_6%29:_The_Blockchain

[9] https://bitcointalk.org/index.php?topic=1204592.0

[10] https://en.wikipedia.org/wiki/Bitcoin_scalability_problem

[11] https://bitcoin.stackexchange.com/questions/37475/call-function-for-block-n-on-blockchain

[12] https://bitcointalk.org/index.php?topic=1204592.0

[13] https://github.com/bitcoin/bitcoin/issues/5851

[14] https://sourceforge.net/projects/bitcoin/files/
