Incorruptible Auditing: Exonum-Powered Graph Database Management
In an increasingly interconnected world, more and more information is stored digitally, which has made exchanging, gathering and querying that information much easier. At the same time, the sheer volume of data introduces new challenges in ensuring its consistency and reliability. A blockchain-based information system can provide an incorruptible record of history, enabling better auditing and data management practices.
In this blog, we describe how we combined an Exonum blockchain and a Neo4j graph database into a system that provides a verifiable audit trail of data integrity and its modifications. Our proof of concept combines the strengths of a blockchain and a graph database, delivering an incorruptible audit trail for information stored in a graph database.
This project was carried out by Sjoerd Wels, Indrek Klangberg, and Silver Vapper at the University of Twente as part of the Industrial Software Engineering Project, under the supervision of the Exonum engineering team.
One of the attractive features of blockchain technology is its ability to let multiple parties with varying levels of trust in one another collaborate on a shared version of the truth. In this instance, the parties contribute to a single, shared database. By using a blockchain, they gain assurance against censorship attempts and a tried-and-true consensus mechanism that lets every stakeholder participate in and audit the blockchain. This is especially useful in professional settings where multiple parties need access to, and permission to edit, information in a database.
In our example, we built a peer-to-peer network in which each participant runs their own copy of a Neo4j graph database. Peers collectively review proposed modifications of information stored in Neo4j, decide on the order of the changes, and then record the actual changes in the graph database, while keeping the history of the modifications in an Exonum blockchain for subsequent auditing in case a dispute over data quality arises.
What Are Graph Databases?
A graph database is a type of data management system that represents data as a collection of nodes and the relationships between them. In a graph database, both the nodes and the relationships between them are considered first-class citizens.
For our proof of concept, we decided to use Neo4j, the most popular graph database management system to date. Neo4j has been used successfully both for research purposes and in existing applications. The strengths of graph databases that are useful for our purposes include:
● Explainability: Nodes and their relationships can be displayed visually, making heavily interconnected data easier to review.
● Functionality: Neo4j is extensible, meaning we can define custom procedures, user-defined functions or unmanaged extensions. This allows us to enable communication between our Neo4j database and our Exonum blockchain to record, modify and/or query information.
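To make the "first-class relationships" point concrete, here is a minimal in-memory sketch of the property-graph model. It is not Neo4j's API: the classes `Node`, `Relationship` and `Graph` and the method `outgoing` are hypothetical names for illustration. The usage mirrors the Cypher statement `CREATE (:Person {name: 'Keanu Reeves'})-[:ACTED_IN {roles: ['Neo']}]->(:Movie {title: 'The Matrix'})`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set[str]
    properties: dict[str, object] = field(default_factory=dict)


@dataclass
class Relationship:
    # Relationships are first-class: they have their own id, type and properties.
    rel_id: int
    rel_type: str
    start: int  # node_id of the start node
    end: int  # node_id of the end node
    properties: dict[str, object] = field(default_factory=dict)


class Graph:
    def __init__(self) -> None:
        self.nodes: dict[int, Node] = {}
        self.relationships: dict[int, Relationship] = {}

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_relationship(self, rel: Relationship) -> None:
        # A relationship may only connect nodes that already exist.
        if rel.start not in self.nodes or rel.end not in self.nodes:
            raise KeyError("both endpoints must exist")
        self.relationships[rel.rel_id] = rel

    def outgoing(self, node_id: int, rel_type: str | None = None) -> list[Node]:
        # Follow outgoing relationships, optionally filtered by relationship type.
        return [
            self.nodes[r.end]
            for r in self.relationships.values()
            if r.start == node_id and (rel_type is None or r.rel_type == rel_type)
        ]


g = Graph()
g.add_node(Node(1, {"Person"}, {"name": "Keanu Reeves"}))
g.add_node(Node(2, {"Movie"}, {"title": "The Matrix"}))
g.add_relationship(Relationship(10, "ACTED_IN", 1, 2, {"roles": ["Neo"]}))
print([n.properties["title"] for n in g.outgoing(1, "ACTED_IN")])  # ['The Matrix']
```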
In our proof of concept, we require that every modification to data stored in the graph database be performed via the blockchain. The entire blockchain network must validate the change and reach consensus on it before it is permanently recorded on the blockchain. The committed blockchain state is never altered before this consensus is reached. Instead, each party applies transactions to its own fork, a temporary copy of the blockchain state, and merges that fork into the actual Exonum blockchain once consensus is reached.
However, as transactions are applied to these forks, they are also communicated to the Neo4j database, and the changes are executed there immediately. These changes can only be rolled back before the forks are verified and added to the permanent record of the blockchain. This could lead to a mismatch between what was agreed via consensus on the blockchain and what was recorded in the database, so we needed a better way to keep the blockchain records and the Neo4j data synchronized.
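The mismatch can be illustrated with a toy model of fork semantics. This is a simplified Python analogue of the idea, not Exonum's storage API; `Storage`, `Fork`, `execute_tx` and the `neo4j_writes` list (standing in for Neo4j) are all hypothetical. Discarding a fork leaves the committed state untouched, but a write that was already sent to the external database stays behind.

```python
class Storage:
    """Committed state; a simplified stand-in for blockchain storage."""

    def __init__(self) -> None:
        self.data: dict = {}

    def fork(self) -> "Fork":
        return Fork(self)


_DELETED = object()  # sentinel marking a key removed inside a fork


class Fork:
    """Buffers uncommitted writes on top of the committed Storage."""

    def __init__(self, storage: Storage) -> None:
        self._storage = storage
        self._changes: dict = {}

    def get(self, key: str) -> object:
        # Reads see the fork's own pending changes first, then committed data.
        if key in self._changes:
            value = self._changes[key]
            return None if value is _DELETED else value
        return self._storage.data.get(key)

    def put(self, key: str, value: object) -> None:
        self._changes[key] = value

    def remove(self, key: str) -> None:
        self._changes[key] = _DELETED

    def merge(self) -> None:
        """Called once consensus is reached: make buffered changes permanent."""
        for key, value in self._changes.items():
            if value is _DELETED:
                self._storage.data.pop(key, None)
            else:
                self._storage.data[key] = value
        self._changes.clear()


def execute_tx(fork: Fork, neo4j_writes: list, key: str, value: object) -> None:
    """Hypothetical transaction that also writes to an external database at once."""
    fork.put(key, value)
    neo4j_writes.append((key, value))  # side effect outside the fork


storage = Storage()
neo4j_writes: list = []
fork = storage.fork()
execute_tx(fork, neo4j_writes, "movie:1", "The Matrix")
del fork  # consensus fails: the fork is dropped without merge()
print(storage.data)  # {}
print(neo4j_writes)  # [('movie:1', 'The Matrix')]
```

The blockchain side has nothing to undo, while the external copy now holds a change the network never agreed on.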
We investigated two possible solutions — an “apply and rollback” approach and a “two-step” approach.
In the apply-and-rollback approach, we designed the Neo4j database to execute all queries that modify information (i.e., transactions on an Exonum fork), keep track of every change, and then report all changes back to the Exonum blockchain. This would let the blockchain network reach consensus on the entire list of modifications before the Neo4j database could mark any of them as valid.
Because the blockchain itself has a record of every modification, we could then roll the Neo4j database back to its previous state, before any modifications to the internal blockchain state were made. The Neo4j database would reflect only accurate information, and the blockchain would have a record of all the suggested changes. However, if a third party queried the database before the rollback happened, Neo4j could return data that had not been confirmed via consensus, i.e., information that was not truly verified.
One way to fix this issue is to use the event handler feature of Neo4j. The handler rolls back every transaction it receives (i.e., every suggested modification) and reports the captured changes back to the blockchain for consensus. If the blockchain network agrees to the modification, the Neo4j database then records the final state of the information.
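The capture-then-roll-back idea can be sketched in a few lines. In Neo4j the capturing is done by a transaction event handler; in this simplified model a scratch copy of a key-value state plays that role, and the names `create`, `delete`, `net_changes` and `GraphStore` are hypothetical.

```python
import copy

# Operations are modelled as functions that mutate a dict of {key: value};
# a missing key means "absent", so values must not be None.


def create(key, value):
    return lambda state: state.__setitem__(key, value)


def delete(key):
    return lambda state: state.pop(key, None)


def net_changes(before, after):
    """Net effect of a transaction, as a commit-time event handler would see it."""
    return {
        key: (before.get(key), after.get(key))
        for key in before.keys() | after.keys()
        if before.get(key) != after.get(key)
    }


class GraphStore:
    def __init__(self):
        self.state = {}

    def apply_and_rollback(self, operations):
        """Run operations on a scratch copy, report net changes, discard the copy."""
        scratch = copy.deepcopy(self.state)
        for op in operations:
            op(scratch)
        # self.state is never touched, which plays the role of the rollback.
        return net_changes(self.state, scratch)

    def commit(self, changes):
        """Apply previously reported changes once the network has agreed on them."""
        for key, (_old, new) in changes.items():
            if new is None:
                self.state.pop(key, None)
            else:
                self.state[key] = new


store = GraphStore()
changes = store.apply_and_rollback([create("person:1", "Keanu Reeves")])
print(store.state)  # {}
print(changes)  # {'person:1': (None, 'Keanu Reeves')}
store.commit(changes)  # consensus reached
print(store.state)  # {'person:1': 'Keanu Reeves'}
```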
The event-handler solution cannot, however, adequately handle two transactions in the same block. Once the first transaction has been executed, the second one must be evaluated against a Neo4j state that already includes the first. Yet we cannot record the first transaction on the blockchain or apply its change in the Neo4j database, because the entire block proposal containing both transactions could still fail to reach consensus. We would therefore have to apply-and-rollback the first transaction and keep its changes in memory, and then apply-and-rollback both transactions together. If the grouped transactions fail, the in-memory record of the first transaction alone remains, so that change can still be made. Because each transaction k in a block requires re-running the k transactions up to and including it, a block of n transactions needs 1 + 2 + ... + n = n(n+1)/2 executions, i.e., quadratic complexity. In addition, the event handler only provides the final summary of modifications. If transaction n creates a new value and transaction n+1 deletes it, the two cancel each other out and are not visible in the final version of history.
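Both drawbacks can be checked with a little arithmetic and a toy state. The function names below are hypothetical; `sequential_executions` models running each transaction exactly once in the agreed order, for comparison, and `net_effect` models a handler that only sees the final summary.

```python
def apply_and_rollback_executions(n: int) -> int:
    """Transaction k is re-run together with the k-1 transactions before it."""
    return sum(k for k in range(1, n + 1))  # 1 + 2 + ... + n = n(n+1)/2


def sequential_executions(n: int) -> int:
    """Each transaction runs once, in the agreed order."""
    return n


def net_effect(initial: dict, operations) -> dict:
    """Final summary of changes, the only thing a commit-time handler reports."""
    state = dict(initial)
    for op in operations:
        op(state)
    return {
        key: (initial.get(key), state.get(key))
        for key in initial.keys() | state.keys()
        if initial.get(key) != state.get(key)
    }


def create_movie(state: dict) -> None:  # transaction n
    state["movie:2"] = "The Matrix Reloaded"


def delete_movie(state: dict) -> None:  # transaction n+1
    state.pop("movie:2", None)


for n in (10, 100, 1000):
    print(n, apply_and_rollback_executions(n), sequential_executions(n))
# 10 55 10
# 100 5050 100
# 1000 500500 1000

print(net_effect({}, [create_movie, delete_movie]))  # {}
```

The empty summary is exactly the lost history described above: the create and the delete never appear.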
Instead of agreeing on the resulting changes in the Neo4j database, an alternative approach is to agree only on the modification operations. Each block proposal defines the order in which the queries are to be applied to the Neo4j database; the resulting database modifications are then stored on the blockchain in a second step. This is what we call the two-step approach.
Once the order of the proposed queries is set, each peer instructs its local copy of the Neo4j database to execute them and record the resulting modifications. At the same time, each peer issues a special transaction to the network, which requires the other peers to retrieve the modifications recorded by their own Neo4j instances. The modifications across all local copies of the Neo4j database are expected to match, so consensus can be reached. This assumption relies on the determinism of the system: if one peer applies the modification operations in the same order as another peer, the resulting state should be the same.
Consensus on the modifications ensures that every validator node is in the same state after applying the database modification operations. However, if a node fails to execute a transaction and falls out of sync, the approach does not provide a recovery mechanism. One solution would be to recreate the local database by executing every transaction that ever occurred on the blockchain. For efficiency, nodes could back up their local database between block updates, so that only the transactions after the latest backup need to be replayed.
One implication of the two-step approach is that a smart contract on the blockchain side cannot revert the modified data, only learn about its content. On the bright side, we can compensate for this by accepting or reverting the modifications directly on the Neo4j side before they are used in the smart contract.
Due to the computational complexity of the apply-and-rollback approach and the considerable duration of its consensus process, which could result in time-outs, we selected the two-step approach for our case study.
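A minimal sketch of the two steps, assuming a hypothetical simplified query form `(key, value)` in place of Cypher: every peer executes the agreed, ordered queries and records each modification, then the peers compare SHA-256 digests of those records. Identical starting states and identical order yield identical digests; a peer that was out of sync produces a different one. Unlike a net summary, the record keeps both halves of a create-then-delete pair. `apply_in_order`, `digest` and `peers_agree` are illustrative names only.

```python
import hashlib
import json

# Hypothetical query form: (key, value) sets key to value; value None deletes key.


def apply_in_order(state: dict, queries: list) -> list:
    """Step 1: execute the agreed queries in order and record every modification."""
    modifications = []
    for key, value in queries:
        old = state.get(key)
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
        modifications.append([key, old, value])
    return modifications


def digest(modifications: list) -> str:
    """Step 2: a canonical hash of the recorded modifications for peers to compare."""
    encoded = json.dumps(modifications, separators=(",", ":")).encode()
    return hashlib.sha256(encoded).hexdigest()


def peers_agree(peer_states: list, ordered_queries: list) -> bool:
    digests = {digest(apply_in_order(state, ordered_queries)) for state in peer_states}
    return len(digests) == 1


block = [("movie:2", "The Matrix Reloaded"), ("movie:2", None)]
print(peers_agree([{}, {}, {}], block))  # True
print(peers_agree([{}, {"movie:2": "x"}], block))  # False
```

Determinism is the key assumption here: Cypher functions such as `randomUUID()` or `timestamp()` would give each peer different modifications even with the same query order.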
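The recovery idea (replay everything, or replay only what follows a backup) can be sketched as below. The block format `(height, [(key, value), ...])` and the `(height, state)` snapshot are hypothetical stand-ins for committed Exonum blocks and a Neo4j backup.

```python
from typing import Optional


def replay(blocks: list, snapshot: Optional[tuple] = None) -> tuple:
    """Rebuild a local database from committed blocks.

    blocks:   [(height, [(key, value), ...]), ...] in commit order
    snapshot: optional (height, state) backup covering blocks up to that height
    Returns (height, state, blocks_replayed).
    """
    height, state = snapshot if snapshot is not None else (0, {})
    state = dict(state)  # never mutate the backup itself
    replayed = 0
    for block_height, queries in blocks:
        if block_height <= height:
            continue  # already contained in the snapshot
        for key, value in queries:
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
        height = block_height
        replayed += 1
    return height, state, replayed


chain = [
    (1, [("person:1", "Keanu Reeves")]),
    (2, [("movie:1", "The Matrix")]),
    (3, [("person:1", None)]),
]
print(replay(chain))  # (3, {'movie:1': 'The Matrix'}, 3)
print(replay(chain, (2, {"person:1": "Keanu Reeves", "movie:1": "The Matrix"})))
# (3, {'movie:1': 'The Matrix'}, 1)
```

Both paths reach the same state; the backup only shortens the replay.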
Benefits of Our Proof of Concept
By combining an Exonum blockchain, a Neo4j database and a two-step modification approach, we were able to create a system with the following benefits:
- The complete history of changes to the graph database can be audited. Both the changes to each peer's local database and the full history of transactions (failed and successful) are stored in the blockchain.
- Read requests go directly to Neo4j, so there is no loss in read performance compared to databases without blockchain integration.
- Neo4j graph visualization tools such as Neo4j Browser remain available.
- Every change in the graph database has an explicitly associated intention: a Cypher query submitted to the network.
- Business constraints can be added in Java, one of the most popular programming languages.
- A database state can potentially be fully recreated or recovered, so complete loss of information is not an eventuality you must prepare for.
To show that our proof of concept is usable, even in a minimal production environment, we created a demo application. It is based on the example applications Neo4j created to demonstrate its capabilities, which represent relationships between movies and actors.
Our demo lets you submit Cypher queries to the underlying graph database through the Exonum blockchain service. You can then inspect previously submitted transactions to check their state, browse the committed blockchain blocks to see which transactions they include, and check the immutable history of the nodes in the graph database. A graph visualization interface also displays the nodes and relationships included in the example application.
For interoperability and usability, we created a Docker image that contains all the dependencies required for a fully functioning system node. It also contains the demo application and OS-specific startup scripts for an easy, automatic setup of a multi-node system on a single host machine that has Docker installed.
You can check out the demo application on GitHub.
Other Options: Interaction with External Applications
In our design, we have a peer-to-peer network in which every participant stores a full copy of the blockchain as well as a copy of the Neo4j database. The peers in this network are the only ones who can modify the data. However, you can also design your system to support interaction with external applications (on the client side, for example).
An external application can submit read queries directly to the Neo4j database but cannot make modifications. To apply modifications, the application needs to create a blockchain transaction that includes the suggested graph database modification operations (a list of Cypher queries), sign the transaction, and send it to the network via the REST API exposed by Exonum. The validator nodes then process the transaction, communicate the queries to the database extension via a gRPC channel, and finally apply the modifications to their local copies of the Neo4j graph database. To verify that a transaction has been processed (or to retrieve an error or rejection of the proposed modification), the application can fetch the transaction metadata from the blockchain.
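The client-side steps (package the Cypher queries, sign, POST over REST) might look roughly like the sketch below. Everything here is an assumption for illustration: the JSON layout, the endpoint path and the function names are hypothetical, and a real client would use Exonum's own serialization (Protocol Buffers) and Ed25519 signatures. The signer is injected so the sketch stays self-contained; `insecure_demo_signer` is only a hash and provides no authenticity.

```python
import hashlib
import json
import urllib.request
from typing import Callable, List


def build_transaction(queries: List[str], public_key: str, sign: Callable[[bytes], str]) -> dict:
    """Package Cypher queries into a signed transaction (hypothetical JSON layout)."""
    payload = {"author": public_key, "queries": queries}
    # Sign a canonical encoding so every verifier checks exactly the same bytes.
    message = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return {"payload": payload, "signature": sign(message)}


def submission_request(base_url: str, tx: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to a hypothetical transaction endpoint."""
    return urllib.request.Request(
        base_url + "/transactions",  # hypothetical path
        data=json.dumps(tx).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def insecure_demo_signer(message: bytes) -> str:
    # NOT a signature: a SHA-256 digest standing in for a real Ed25519 signer.
    return hashlib.sha256(message).hexdigest()


tx = build_transaction(
    ["CREATE (:Movie {title: 'The Matrix Reloaded'})"],
    public_key="hypothetical-public-key",
    sign=insecure_demo_signer,
)
req = submission_request("http://localhost:8080/api/hypothetical-service", tx)
# urllib.request.urlopen(req) would submit it to a running node.
```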
The application could also query the Exonum service to retrieve the history of a node. For that, the application only needs the ID of the node.
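Conceptually, a node's history is a filter over the modifications recorded on the blockchain, kept in commit order. In the actual system the Exonum service answers this query; the record layout `(block_height, tx_hash, [modification, ...])` and the function `node_history` below are hypothetical.

```python
def node_history(committed: list, node_id: int) -> list:
    """Return, in commit order, every recorded modification that touched node_id.

    committed: [(block_height, tx_hash, [modification, ...]), ...] where each
    modification is a dict with at least a "node_id" key (hypothetical layout).
    """
    return [
        (height, tx_hash, mod)
        for height, tx_hash, modifications in committed
        for mod in modifications
        if mod["node_id"] == node_id
    ]


committed = [
    (1, "tx-a", [{"node_id": 7, "op": "create", "props": {"title": "The Matrix"}}]),
    (2, "tx-b", [{"node_id": 8, "op": "create", "props": {"name": "Keanu Reeves"}}]),
    (3, "tx-c", [{"node_id": 7, "op": "set", "props": {"released": 1999}}]),
]
for height, tx_hash, mod in node_history(committed, 7):
    print(height, tx_hash, mod["op"])
# 1 tx-a create
# 3 tx-c set
```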