Prove Logs on Blockchain with Go and ProvenDB

Published in ProvenDB · 10 min read · Jul 22, 2019

Photo of my drone over Lake Tyrrell, Victoria, Australia

Logs from a system report what has happened on that system over time. As the importance of a system grows, so does the importance of its logs. For example, if a system handles privileged access to some very important data, its access logs can be used to audit who accessed what and when. Other examples are the logs of a flying system, a life-support system or a delivery robot: these logs are critical records of domain-specific behavior, so they need to be kept in a tamper-proof manner that still allows future examination. Blockchain is a fast-growing distributed technology that enables us to prove the existence of digital assets without a central authority. Public chains, such as Bitcoin and Ethereum, are popular blockchains with nodes widely distributed across the globe. Their transaction histories are highly unlikely to be manipulated by any particular individual or organization, so information posted on them can be viewed as an immutable ground truth. With this in mind, we can leverage Blockchain to store and verify those system logs for us¹ ².

However, the cost and limited throughput of posting information directly on Bitcoin or Ethereum make this infeasible for a continuous stream of logs. This is where ProvenDB comes in: it acts as an adaptive layer between your traditional MongoDB and the Blockchain you want to use. Instead of posting logs directly to a Blockchain, we can post them to a ProvenDB service, which will take care of the rest. In this article, I am going to show you how we can use Golang, MongoDB Go Driver and ProvenDB to build a simple yet performant logging service, called ProvenLogs, that can continuously prove your system logs on a Blockchain (Bitcoin) so that their ownership and existence can be verified. Its full source code is available on GitHub under the MIT license. Please feel free to use it in your application in any way, and leave us a comment about your amazing work, which we are really keen to hear about.

Architecture

ProvenLogs acts as an adaptive layer between your application logs and Bitcoin. It has three components: Parser, Batcher and Verifier (figure 1). The Parser receives logs from stdin and parses each line as a log entry. Each log entry will be stored in ProvenDB as a document, which can be queried and verified independently later on. Upon receiving logs, the Parser immediately echoes the same logs back to stdout and pushes them to the Batcher for further processing. In this way, your app behaves exactly the same before and after adding ProvenLogs, but with its logs continuously being proved to Bitcoin in the background. The Batcher batches log entries up to a predefined size and time interval. At the end of each batch, the Batcher signs the batch with a private key p1 using RSA, and stores the signature in the last log entry of the batch. In this way, the batch’s integrity and ownership info are embedded into the batch data itself. Once that last log entry of the batch is verified against Bitcoin, the whole batch’s integrity and ownership are also verified, because the batch data or ownership cannot be tampered with without invalidating the batch’s signature. Finally, the Batcher loads the batch into ProvenDB and submits a version proof (figure 2), which proves that all log batches belonging to that version existed on Bitcoin’s Blockchain at that version’s timestamp.

Figure 2: The Theory Behind ProvenLogs and ProvenDB

ProvenLogs creates a new version for each batch, and ProvenDB’s version starts at 1. Therefore, in figure 2, the batch m will have version m+1, the batch m+1 will have version m+2, and so forth. The version m+1 verifies that batch m existed at version m+1’s timestamp t_m+1, version m+2 verifies that batch m+1 existed at version m+2’s timestamp t_m+2, and so forth. This can be used to assert that all log entries in batch m were certainly created before t_m+1 by the owner of p1, and so forth.

The Verifier takes a raw log entry that is a line of the app’s logs and parses it using the same Parser logic. Then, it locates the batch that contains the log entry in ProvenDB, verifies the batch’s signature with p1’s paired public key and verifies that the last log entry of the batch has existed on Bitcoin with a document proof, which is derived from the batch’s version proof.

The reason why a version proof in ProvenDB can verify the integrity and prove the existence of all the data in that version is that a Merkle Tree of all that version’s document hashes is constructed, and the root hash of that Merkle Tree is posted to and stored in Bitcoin’s Blockchain as a transaction. Working backward, each document’s existence in that version can be asserted via the Merkle path (document proof) from the root to that document hash.

Implementation

With the architecture in mind, let’s now go through the journey of proving a stream of system logs on Bitcoin’s Blockchain and then demonstrate how we can verify one of those log entries.

Create a Free ProvenDB Service

We have released the early adopter version of ProvenDB, so everyone is welcome to create a free ProvenDB service by following this guide. After completing that step, you should have your own service’s MongoDB URI, similar to this one:

mongodb://admin:mypwd@guiguan.provendb.io/guiguan?ssl=true

The guiguan is the service name which is also used as the MongoDB database name. The admin and mypwd are the service credentials. You can verify that your service is working using the MongoDB Shell:

$ mongo mongodb://admin:mypwd@guiguan.provendb.io/guiguan?ssl=true
MongoDB shell version v4.0.3
connecting to: mongodb://guiguan.provendb.io/guiguan?ssl=true
Implicit session: session { "id" : UUID("47068084-8b85-4793-b014-1a2f69a3880f") }
MongoDB server version: 4.0.10
> show collections
_provendb_collections
_provendb_currentVersion
_provendb_documentProofs
_provendb_forgetRequests
_provendb_info
_provendb_versionProofs
_provendb_versions

If all is good, you should be able to log in and issue the show collections MongoDB Shell command, which lists ProvenDB’s meta-collections.

Prepare a Demo App

In order to test the ProvenLogs that we are going to build, we need a demo app that can generate some test log messages. You are welcome to bring your own. Since the sample implementations in the following sections assume that the demo app uses the Zap production log output format, we can just create one as follows:

$ git clone https://github.com/SouthbankSoftware/provenlogs.git && cd provenlogs
$ make build-zapper

This will build a zapper executable in the provenlogs directory. Later on, we will feed the output of zapper into the provenlogs executable using a Linux pipe like:

$ ./zapper 2>&1 | ./provenlogs ...

Please note that Zap usually writes logs to stderr, so it is better for us to feed both stdout and stderr into the pipe.

Parse Log Stream

The following code shows how we handle the input log stream from stdin. We use a TeeReader to clone the stdin stream to a Pipe, so while we are echoing the stdin direct to the stdout, we are also forwarding the same stream to that Pipe. Our parsing logic is sitting behind that Pipe parsing the log stream line by line. Apart from that, we use a cancellable context to signal the end of the log stream, i.e. EOF (ctrl+D), to the Batcher that is running in the background, so the Batcher can gracefully terminate itself after processing all the log entries it has received.

For each raw log line, we parse the string into an internal data structure called LogEntry. The parsing logic below shows how we parse the Zap production format, which can be easily changed to support other log formats.
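A minimal sketch of such a parser is shown here, assuming a hypothetical `LogEntry` layout (the real struct in ProvenLogs may differ). It relies only on what a Zap production line looks like: a JSON object with a float `ts` in Unix seconds, a `level`, a `msg`, and any number of extra fields.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"math"
	"time"
)

// LogEntry is a hypothetical internal representation of one log line.
type LogEntry struct {
	Timestamp time.Time
	Level     string
	Message   string
	Data      map[string]interface{} // remaining fields, e.g. caller, delay
}

// ParseZapLine converts one Zap production (JSON) log line to a LogEntry.
func ParseZapLine(line string) (LogEntry, error) {
	var fields map[string]interface{}
	if err := json.Unmarshal([]byte(line), &fields); err != nil {
		return LogEntry{}, fmt.Errorf("not a JSON log line: %w", err)
	}
	ts, ok := fields["ts"].(float64)
	if !ok {
		return LogEntry{}, errors.New(`missing numeric "ts" field`)
	}
	level, _ := fields["level"].(string)
	msg, _ := fields["msg"].(string)
	delete(fields, "ts")
	delete(fields, "level")
	delete(fields, "msg")

	// Split the float timestamp into whole seconds and a fractional part.
	sec, frac := math.Modf(ts)
	return LogEntry{
		Timestamp: time.Unix(int64(sec), int64(frac*1e9)).UTC(),
		Level:     level,
		Message:   msg,
		Data:      fields,
	}, nil
}

func main() {
	line := `{"level":"info","ts":1563265469.6518111,"caller":"zapper/zapper.go:31","msg":"test 13","delay":1487}`
	entry, err := ParseZapLine(line)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", entry)
}
```

Supporting another log format would then only mean swapping this one function, since the rest of the pipeline only sees `LogEntry` values.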

Batch Log Entries

Now that we have converted the log stream to LogEntrys, we can start building a Batcher to group them into batches. A batch is the unit on which we carry out the signing and proving operations, because those operations are non-trivial. For example, the RSA signing algorithm with a key size of 2048 bits produces a digital signature of 2048 bits too, so acquiring a signature for each log entry would be costly in terms of storage.

Our Batcher is as simple as a for-loop running in the background that listens for and reacts to three events: the Batcher context gets canceled, a new LogEntry is received, and the current batch reaches its batching time limit. When the context gets canceled, probably due to the end of the log stream, the Batcher flushes all buffered LogEntrys and terminates itself. When a new LogEntry is received, it is appended to the current batch’s buffer, and if the batch exceeds its batching size limit, the batch buffer is flushed. When the current batch reaches its batching time limit, signaled by a Ticker tick, the batch buffer is also flushed.

Sign Batch

When flushing the batch buffer, we first use the user’s private RSA key to sign the batch data and attach the signature to the last LogEntry of the batch. The es.Sort() is just to ensure that we always sign the data in a deterministic way.

Submit Batch

After we have acquired and embedded the batch’s signature, we can submit the batch to ProvenDB. We use MongoDB Go Driver’s collection.InsertMany to insert an array of LogEntrys into our ProvenDB service’s log collection. The driver is smart enough to automatically marshal each of our LogEntry structs to a MongoDB BSON binary that can be loaded into our ProvenDB service. Then, we instruct the service to submit a version proof for the current batch. Each MongoDB CRUD operation in ProvenDB results in a new version of the data because ProvenDB is an immutable DB that preserves all the data history. We first get the latest version number, then we submit a proof for that version.

GetVersion and SubmitProof are just wrappers for MongoDB’s RunCommand. With ProvenDB, the RunCommand supports an extra set of commands that are not native to MongoDB, such as the following submitProof.

You can build a ProvenLogs executable in the provenlogs directory as follows:

$ make

Then, run ProvenLogs with the demo app as follows:

$ ./zapper 2>&1 | ./provenlogs -u mongodb://admin:mypwd@guiguan.provendb.io/guiguan?ssl=true --batch.size 10 --batch.time 10s

At this point, our logs are submitted to Bitcoin as a ProvenDB version proof. If we wait for an hour or so, once the Bitcoin transaction containing the proof info is confirmed, we can start using ProvenLogs to query and verify each log entry.

Verify a Log Entry

We use the same parsing logic to parse the query string of a raw log line into a LogEntry, then we directly feed the struct to MongoDB Go Driver’s collection.FindOne to find the corresponding log document in our ProvenDB service.

By default, when converting a map[string]interface{} to a BSON document, the MongoDB Go Driver (go.mongodb.org/mongo-driver v1.0.2) at the time of writing this article does NOT sort the map entries by key, so the same map can produce different BSON binaries across runs. This non-determinism, in turn, causes the document hash to differ between runs. In order to fix this, we created our own BSON registry, and use it everywhere in ProvenLogs. Line 22 in the following code is an example of usage.
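Separately from the ProvenLogs listing referred to above, the underlying problem can be illustrated with the standard library alone. Go randomizes map iteration order, so any encoder that walks a map directly may emit fields in a different order each time. The sketch below uses a made-up text encoding (not BSON) and a hypothetical `CanonicalHash` helper purely to show the fix: sort the keys before serializing, and the hash becomes stable.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// CanonicalHash serializes the map entries in sorted key order and returns
// the SHA-256 of the result. The "key=value;" encoding is an illustrative
// stand-in for a real document format such as BSON.
func CanonicalHash(m map[string]interface{}) string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys) // the step that removes the non-determinism

	var sb strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&sb, "%q=%v;", k, m[k])
	}
	sum := sha256.Sum256([]byte(sb.String()))
	return hex.EncodeToString(sum[:])
}

func main() {
	data := map[string]interface{}{
		"caller": "zapper/zapper.go:31",
		"delay":  1487,
	}
	// Repeated calls agree even though map iteration order is random.
	fmt.Println(CanonicalHash(data) == CanonicalHash(data))
}
```

A custom BSON registry achieves the same effect inside the driver: the map encoder is replaced by one that emits keys in a fixed order, so the stored document and any later re-encoding hash identically.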

With the retrieved log document, we can then fetch all the log documents that belong to the same batch. In order to verify the batch, we first check whether the last LogEntry’s hash matches the hash h1 stored in its metadata, then we verify the batch signature that is stored in the last LogEntry. If both checks pass, we can be sure that the batch’s data integrity and ownership are intact, provided that the hash h1 is trustworthy. In order to verify h1, we first retrieve the last LogEntry's document proof, then we verify the document proof using the Go version of Chainpoint utilities that we created here at ProvenDB, which can be imported from the open-source ProvenDB provendb-verify CLI repo as github.com/SouthbankSoftware/provendb-verify/pkg/proof.

Now, let’s try the real ProvenLogs Verify CLI:

$ ./provenlogs verify -u mongodb://admin:mypwd@guiguan.provendb.io/guiguan?ssl=true -l '{"level":"info","ts":1563265469.6518111,"caller":"zapper/zapper.go:31","msg":"test 13","delay":1487}'

Please replace the raw log entry accordingly. Then if all is good, you should get the output in the following screenshot.

What’s Next

In this article, we introduced the implementation of a simple yet fully featured and performant logging service, called ProvenLogs, which can continuously store and verify system logs on Bitcoin’s Blockchain. We demonstrated how we can leverage Golang, MongoDB Go Driver and ProvenDB to build such a service in a clean and well-structured way.

However, we should never stop pushing things to their extreme states 😜. There are lots of things we can do to make ProvenLogs perfect.

Optimize log searching performance with MongoDB indexes: we haven’t created indexes for our log collection. When we carry out the log verification, ProvenDB will have to linearly scan all the stored log documents. We can at least create indexes for timestamp and level document fields.
Use the Merkle tree root hash in the RSA signing process: for each batch, we hashed twice. Once for the RSA signature, and once for the Merkle tree construction. It is reasonable to combine those two and use the Merkle tree root hash only for both RSA signature and existence proof.
Analyze performance: we can add Prometheus metrics and pprof trace to visualize and analyze ProvenLogs’ runtime performance in greater detail, so we can understand and improve its bottlenecks.
Use multiple ProvenDB services as replicas: although each ProvenDB service already comes with internal redundancy, it is still good practice to keep replicas at the application level. The ProvenLogs service could write to multiple ProvenDB services at the same time, and carry out a vote among them when a Byzantine fault happens.

[1]: Versteeg S. (June 13 2018). Practical Blockchain: Tamper-Proof System Logs
https://www.ca.com/us/modern-software-factory/content/practical-blockchain-tamper-proof-system-logs.html

[2]: Otander D. Detect Anomalies in System Logs Using Blockchain
https://www.uledger.co/detection-anomalies-in-system-logs-using-blockchain/