Trust-based, server-based “blockchains”

The rationale for a decentralized, scalable, secure digital money system that can be used to implement both the current, scarce digital money games and democratic ones.

[This article is not very well written and is probably not even interesting. This is mostly a rambling self-note. When I write here, for some reason I think better.]

How many copies of a “blockchain” — the entire state of a massively-multiplayer online game that is a simulation of a money economy — does one need? Bitcoin-like blockchain systems have too many, arguably. Can we limit the number of replicas and make digital money MMOs scalable?

One way to attack the “scalable, secure, decentralized” quest is to change our views of what “decentralization” might mean. We seek “decentralized” systems because (1) we don’t want selected servers being pinpointed and taken out, and (2) we don’t want servers colluding to manipulate the economic game (e.g. issuing infinite money).

Blockchain systems address these two decentralization concerns by simply supporting an unbounded, homogeneous set of replicas of the entire state that is open to contribution by anyone. The replicas are also updated by a competitive “block mining” contest that, again, cares not about how many competitors are currently playing it. The chaining of blocks and encoding of the difficulty of finding the next block even secures the entire network against all-but-one-replica outages, which is a ludicrously high level of resilience and security.

In this article I propose that we can have a “less secure” decentralization philosophy. In another design I wrote about, I argued that if we have a sufficient number of public servers — whose owners are publicly, socially known entities — replicate an economy and update it in consensus, then we can be pretty sure that the economy is “decentralized”. Issue (1) is solved because an attack would require all server sites to be taken out simultaneously and permanently, and issue (2) is solved by the low probability that a sufficient number of public, server-providing entities would all collude to violate the protocol.

But the implementation of that previous system was supposed to be a simple replicated ledger, and the protocol some sort of Paxos consensus or another brand of state-machine replication protocol. As I spent time trying to flesh out the details (of which there are very many), I kept checking the costs against the scalability I was projecting. If I wanted something like 10,000 TPS supported by $100/mo servers, the designs I had kept telling me I should probably stick to about 20 replicas (20 servers, 20 different social entities, “stakeholders”) to protect me with respect to decentralization issues (1) and (2) above.

After much meditation, I decided I do not like the two-digit nature of “twenty replicas”. I want a Dunbar’s number of replicas, at the very least. My magic nerd number is support for 256 replicas. It is logically easy for me to see twenty reputable entities providing virtually the same level of decentralization and security as the Bitcoin blockchain. However, it is emotionally easy for me to see two hundred public entities supporting a protocol, knowing that at least the majority of them are completely honest, and thus providing a sufficient level of secure decentralization.

The general approach in the previous article is still valid for attaining 200 replicas. Now I just have to ramble about the implementation details and their philosophical underpinnings, which will determine whether the server cluster can scale from 20 to 200 replicas.

For two hundred replicas, a traditional state-machine replication approach doesn’t quite work. It becomes very, very chatty, which is wasteful. It would probably still work, but the servers would be more expensive to maintain than necessary. Another algorithm, another approach is possible.

The servers can’t simply use a “blockchain” protocol among themselves. For 10,000 TPS, that’s a whole lot of blockchain growth. With a small transaction being around 100 bytes (ballpark guess), that’s 1MB per second, 86.4 GB per day, or about 31.5 TB per year. But we are going to solve that problem, and that is exactly what we will propose here.
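
The growth figures above can be sanity-checked with a few lines of arithmetic. This is just a back-of-the-envelope sketch; the 100-byte transaction size and 10,000 TPS figure are the article's own ballpark assumptions.

```python
# Chain-growth arithmetic, assuming ~100-byte transactions at 10,000 TPS
# (both figures are the article's ballpark assumptions).
TX_BYTES = 100
TPS = 10_000

bytes_per_second = TX_BYTES * TPS           # 1,000,000 bytes/s  = 1 MB/s
bytes_per_day = bytes_per_second * 86_400   # 86.4 GB per day
bytes_per_year = bytes_per_day * 365        # ~31.5 TB per year

print(bytes_per_second, bytes_per_day, bytes_per_year)
```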

The essential reason behind “blockchaining” about 200 servers is that it is a simple way to solve the replica synchronization problem. State-machine replication protocols tend to bundle up replica verification with replica updating. And we already know beforehand that there are probably no malicious servers, and that knowledge is wasted by a traditional Byzantine (fault-tolerant, malicious-collusion tolerant) state-machine replication protocol. We do not rely on all the servers being honest (far from it), but we optimize for (we bet on) them all being honest.

In the end, there is not much difference between a “blockchain” and state-machine replication. In fact, a “blockchain” is just an extreme solution to Byzantine state-machine replication.

But by proceeding to code blindly from the traditional Byzantine state-machine replication protocol view, I end up with too much messaging between the servers, and coding the replica re-synchronization is a pain because I want the replicated state to be a 1TB file (or bigger) — and I have about 10,000 transactions, 100 bytes each, wanting in that ledger, per second. If I have 200 servers, then at each step I have 200 servers trying to upload, say, a 1MB file to everyone else. If a node fails, there is a whole lot of file scanning… huge command buffers from every other server and a ludicrous 1TB file to hash and re-hash because it is updated in-place.

And that is the problem. I want to make the snapshot of the entire economy — a 1TB disk-based hash table using open addressing — be the central data structure, and that is not flying. It looks stupid, needlessly complex.

Instead of struggling with a single snapshot file that I am always second-guessing, I could simply have my 200 servers synchronize through a blockchain protocol, and use something like the mini-chain, chain-snapshotting approach of Cryptonite to compress the blockchain. See: I already have a very high level of trust in a group of nodes which is composed of nodes that also place trust in each other. The probability that they would not all be able to reach a consensus on blockchain-snapshotting is very, very low.

Instead of worrying, at every step of the way, that the other server nodes are or aren’t keeping up with the single, huge replica of the entire state that I have to destroy to keep registering commands in it, so that I can obtain some ordering of these commands and get that ordering behind me … I can simply use the king of lazy-snapshotting replica algorithms, the “blockchain”!

A signed “chain” of event buckets, where each bucket needs the previous bucket to attach itself to the “chain”, is a very clever way, and a general philosophy, for detaching snapshotting from event ordering. As my 200-server nodes progress in spreading commands to each other in the form of some sort of event-list blocks that are “chained” somehow, I am ordering these commands, and committing them to permanent storage. Since they are ordered and persisted in a sequential file, if anybody loses any part of that chain, they can just ask any single other server for it, and the “chaining” will do the job of making sure a server does not need to ask all other servers for every piece of the state it may have missed (malicious forks are still possible, but this can be addressed later since in principle we trust whatever the majority of servers decides is the true chain).
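
A minimal sketch of such a chain of event buckets, assuming SHA-256 hashing over a JSON-encoded body (the real system would presumably use server signatures on top of, or instead of, bare hashes; `make_block` and its field names are illustrative, not part of any specified protocol):

```python
import hashlib
import json

def make_block(prev_hash: str, txs: list) -> dict:
    """Bundle an ordered list of transactions into a bucket chained to
    its predecessor: the block's hash covers both the transactions and
    the previous block's hash, so order is fixed once the chain exists."""
    body = json.dumps({"prev": prev_hash, "txs": txs}, sort_keys=True)
    return {
        "prev": prev_hash,
        "txs": txs,
        "hash": hashlib.sha256(body.encode()).hexdigest(),
    }

# A tiny two-bucket chain (transaction strings are placeholders).
genesis = make_block("0" * 64, ["mint 100 -> alice"])
block1 = make_block(genesis["hash"], ["alice pays bob 10"])
```

Because each bucket commits to its predecessor, a server that missed a stretch of blocks can fetch it from any single peer and check locally that it slots into the chain it already has.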

The chaining will not use proof of work or stake; the resource consumed for block issuance/creation is consensus-based — the servers grant each other the right to issue a block in rounds; that right can then be picked up or not. Something like that; it is easier since the servers have baseline trust for each other.
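
One simple instance of “granting the right in rounds” is plain rotation: server *i* gets round *i*, *i+n*, *i+2n*, and so on, and may decline its turn. This is a sketch of one possible scheme, not a specified protocol:

```python
def issuer_for_round(round_no: int, servers: list) -> str:
    """Round-robin block-issuance right: the designated server for a
    round may issue a block or pass, and the round counter moves on
    either way. No proof of work or stake is consumed."""
    return servers[round_no % len(servers)]

# 200 servers, named server-000 .. server-199 (names are illustrative).
servers = [f"server-{i:03d}" for i in range(200)]
```

In practice the rotation order could itself be agreed on-chain, and timeouts would handle a designated issuer that is offline; those details are left open here, as in the article.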

So, in essence, the servers receive commands from clients, package them in blocks and submit these blocks to all whenever they get a “go” from a collective block-issuance mechanism. Note the servers also don’t need to broadcast individual transactions between them once they receive them. The block-issue mechanism alone is sufficient to clear the RAM buffer of incoming client requests at each node, and each transaction is broadcast just once to all servers — a 100-byte transaction will cost 20,000 bytes of inter-server bandwidth for synchronization, in a best-case scenario. A 10,000 TPS such system with 200 servers would eventually consume 200MB/s (1.6 Gbps) of collective bandwidth for synchronizing these 10,000 transactions between the mirroring servers. Which is fine. (EDIT: Well, this makes all the servers a bit expensive. But if the value of a 10,000 TPS economy doesn’t pay for them, then what is that economy worth?)
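
The bandwidth claim checks out arithmetically, again assuming the article's ballpark figures of 100-byte transactions, 10,000 TPS and 200 servers:

```python
# Best-case synchronization bandwidth: each transaction is broadcast
# exactly once to all servers (figures are the article's assumptions).
TX_BYTES = 100
TPS = 10_000
SERVERS = 200

per_tx_sync_bytes = TX_BYTES * SERVERS                  # 20,000 bytes per tx
collective_bytes_per_second = per_tx_sync_bytes * TPS   # 200 MB/s
collective_bits_per_second = collective_bytes_per_second * 8  # 1.6 Gbps

print(per_tx_sync_bytes, collective_bytes_per_second, collective_bits_per_second)
```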

At 10KTPS, the “chain” as mentioned before would grow about 86.4GB per day. Now, at the end of the day or some time after that, if all servers agree on the chain state of an entire day (a quick exchange of hashes can be done), then, locally, each server can destroy a day-sized portion of the chain and replace it with a snapshot. We can really wait for ALL (or almost all) servers to confirm this. If we have a few terabytes of disk space available, we can buffer several days’ worth of transactions before we actually have to freeze the economy waiting for someone to catch up to the point where a day’s worth of transactions can be destroyed and snapshotted.
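
The prune-into-snapshot step could look something like the sketch below. Everything here is illustrative: `day_hash`, `try_prune_day` and `apply_block` are hypothetical names, and the “wait for ALL servers” threshold is the conservative choice the article mentions.

```python
import hashlib

def day_hash(day_blocks: list) -> str:
    """Hash a day's worth of serialized blocks so servers can compare
    their view of the day with a single short exchange."""
    h = hashlib.sha256()
    for block_bytes in day_blocks:
        h.update(block_bytes)
    return h.hexdigest()

def try_prune_day(day_blocks, peer_hashes, total_servers, apply_block, snapshot):
    """If every server confirms our hash of the day (conservative: wait
    for ALL of them), fold the day's blocks into the snapshot and discard
    them; otherwise keep buffering the day on disk."""
    ours = day_hash(day_blocks)
    agreed = sum(1 for h in peer_hashes if h == ours)
    if agreed < total_servers:
        return snapshot, day_blocks      # not yet safe to destroy the day
    for block_bytes in day_blocks:
        snapshot = apply_block(snapshot, block_bytes)
    return snapshot, []                  # day's chain segment can go
```

Since a day is ~86.4GB at 10KTPS, a few terabytes of disk buys several days of buffering before anyone has to freeze the economy waiting for a straggler.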

Now, the snapshots themselves are NEVER updated in-place. This is a major headache and I now know that is not how the system should play out. Snapshots are read-only once produced, as well as the ordered blocks of transactions. This simplifies everything immensely.

If a server is turned off and then comes back online again, it just has to “download the chain”, just like in other blockchain systems, to catch up. In case its chain and/or its latest snapshot is destroyed, it also just “downloads the chain”. The integrity of the chain can be verified locally since it is a sequence of hashes and/or signatures. A ledger that is updated in-place is much more nightmarish to even determine if it is correct, since every little piece of it has to be hashed and compared individually with the pieces of everyone else.

In yet another previous design for a decentralized, secure, scalable digital money system, I assumed that for DDoS-resistance purposes — availability and resilience, or Issue (1) above in decentralization — we could bet on clusters of about 1,000 “personal server” type nodes to replicate a unique partition of a digital economy (or any other MMO). That cluster of 1,000 “personal servers” also solved Issue (2) in decentralization because nodes chose individually which clusters they would participate in, and having one copy of a token expenditure online would be enough to prevent that token from being double-spent.

In that previous system, we needed many clusters of 1,000 “personal servers” because entire transaction histories were kept — it was in a sense a “blockchain” system, albeit one that could forget each “chain” after a decade or so due to demurrage charges. But in our trusted-servers system, which is primarily about keeping ledgers of transactions and not transaction histories long-term, we only really need one such 1,000-server cluster (or about “200” as I have decided), storage-wise, to provide enough accounts and tokens for a whole lot of clients. And it seems, on the surface, that we can throw about 10,000 transactions per second at that cluster — substantially more than what other existing decentralized and secure digital money systems can handle. Not only do these 10,000 transactions per second fit in bandwidth terms, they also do not generate 30TB per year of data to be stored forever on every participating node.

So I think that solves the problem in all the ways I like. I get my “1,000-server” cluster for availability (more like “200”). I get my “Dunbar’s number” of public, reputable entities auditing each other (about “200”). I get an implementation that doesn’t stupidly attempt to update a master snapshot of a hash table in-place. I get a simplified update protocol between replicas, a simplified snapshotting protocol between replicas, and a simplified recovery protocol between replicas.

I like this one. Now can I design it and code it? And as usual I may change my mind in the process and go after another problem definition to solve :-)