Blockchain as a Data Structure

Horizen Official
Aug 21 · 10 min read
Image for post
Image for post

The first use case for blockchain technology is digital money. To have a monetary system without central control, you must have a special and sophisticated way to handle all the data produced with each transfer. Imagine if every person could access and modify the databases kept by banks. It would be a disaster.

In order to make decentralized money a reality a method of accounting had to be developed — the UTXO model, also referred to as triple-entry accounting. You can compute every account balance at any time by storing all transactions in a digital ledger.

A digital ledger used for digital money requires a set of properties that were not achievable before blockchain came along. In this article, we will look at how the blockchain handles data and why blockchains special properties partly result from it.

Common Data Structures

Let’s develop an understanding of data structures before we look at blockchain itself. Here are some of the most common data structures:

Arrays

Arrays are one of the purest forms to store data. Arrays are useful when you know how many data elements you need to store and how large each data element will be. Your computer will calculate the required storage from those inputs and set it aside, preventing other programs from accessing this partition of your memory. The drawback to partitioning memory is that reserved memory may be too small for future expansion. In this case, the entire array must be moved to a different location.

Each element of an array has an index that starts at 0. You can instantly access and modify an element if you know where you stored it. If you don’t know an element’s location, you must do a sequential lookup. This means you check the elements one by one (starting at index 0) until you find it.

Arrays are useful for their simplicity and instant access property.

Programs that use a linked list to store data don’t have to know how many data elements you want to store beforehand, but the linked list does need to know what each element consists of. The data elements of a linked list are called nodes. Each node can contain several objects of different types.

For example, If you were to store information about cars in a linked list, you could define a node as the set of information about the brand, model, year produced, and license plate.

The first element of a linked list is called the head, and the last one is called the tail.

Each node also contains a pointer to the next node. The pointer tells your computer where the following node is located in memory. This allows you to expand a linked list easily because the data doesn’t have to be in a single, continuous location in memory.

When searching for a piece of data, your computer will check the head of the linked list first. If it’s not there, it will look at the pointer, go to the location in memory where the following node is stored, and continue following pointers until it finds the desired data. This method of finding data is called sequential lookup.

Using a linked list gives you more flexibility in terms of expanding the list later on by adding new nodes, but unlike arrays, it doesn’t give you instant access.

Hash Table

The last data structure we want to look at before moving on to the blockchain is the hash table. The data elements you are storing in a hash table are called keys. To store a key, it is first hashed using a hash function. All you need to know at this point is that a hash function uses an argument of variable length as input and produces an output of fixed length. In the example below, the output is a three-digit number.

The keys are mapped to buckets by their hash value, e.g., if “Alice” hashes to 152, it is stored in this bucket. The buckets can be stored in an array because the output space of the hash function is known. Each bucket can instantly be accessed through its index.

A hash table is useful when you need to store many related data elements, like in a customer database. Initially, you could create a customer ID by hashing the customer’s name. Now there is a dedicated location to store purchases, refunds, or contact information. Whenever you need to access the customer data, your computer would hash the name you are looking for to find the bucket efficiently and add, change, or delete data.

The hash functions used for hash tables are usually not collision-resistant. This means two keys might produce the same hash and would consequently be mapped to the same bucket.

A linked list within the hash table is used to store several keys within a single bucket. In the example below, bucket 152 stores a pointer to Alice’s data in the first node, which points to the second node containing Dave’s data.

If the hash table is well-dimensioned, the cost (or the number of instructions/computations) for each lookup is independent of the total number of elements stored in the table. Hash tables give you instant access without even knowing the location of every element in memory. The location is defined by the data itself, making it convenient for systems that have to store large amounts of data and repeatedly access them.

There are many different data structures; each of them comes with some trade-offs, and depending on the use case, one might choose one over the other. Sophisticated data structures often leverage several more simple concepts in combination to achieve the set of desired properties. We chose the three examples above to show how an array and a linked list can be used to build a hash table.

The blockchain is a rather sophisticated data structure, made up of many sub-structures. It gives us a set of properties that are paramount to building a decentralized ledger for digital money.

The Blockchain

Blockchain organizes data by splitting it into subsets, referred to as blocks. Blocks are similar to the nodes of a linked list. Each block contains several elements. The elements of a block are generally separated into the block header and its transactions. While the transactions in a block account for most of the data, the block header contains essential metadata about each block, such as a timestamp and block height.

The main difference between a blockchain and a linked list is that the references in a blockchain are cryptographically secured, and therefore tamper-evident. In contrast, the pointers in a linked list can be changed at any time without affecting the integrity of the data. The secured references establish order throughout the blocks and effectively make the blockchain an append-only data structure where new data can only be added with new blocks.

The hash value of the previous block header is included in the following block as a reference because the block hash depends on the data of a block, even changing a single character in one of the transactions would invalidate the reference.

The secured links are constantly checked for validity. If you were to insert a malicious block in the middle of a blockchain or change data in an existing block (For example: between Block 1 and 3 in the graphic below), you could include a reference to its predecessor (Block 1). Still, it would be infeasible to make block 3 reference your newly inserted block.

Each new block built on top of an existing block is called a confirmation. The older a block gets, the more confirmations it will have. Each confirmation makes tampering with the data in a block more difficult because you have to recreate additional valid references.. Block 2 in the graphic below has one confirmation. You would have to recreate a single valid reference to tamper with the data of Block 2. You also have to recreate a valid reference with each new confirmation. The older the block, the more confident you can be that no changes to the block will ever occur.

It is important to note that it is not the data structure that makes data on the blockchain immutable. The information alone is tamper-evident only. Changes are easy to detect. There is no immutability if there are no strong consensus rules in place and a sufficiently large number of nodes on the network. The incentives need to be structured so the majority of participants will follow the protocol and reject invalid blocks.

We will come back to this relationship between the data structure, the protocol and the consensus mechanism in later articles. The interworking of these parts is what makes the blockchain a powerful tool for building trustless digital money.

The Properties of Blockchains

Let’s take a look at the properties that a blockchain offers before taking a closer look at the data within a block. We will assume a decentralized setting without a central authority and a robust consensus mechanism for this article.

Current Shortcomings

It is appropriate to issue certain caveats first. The development of a blockchain is stricter and slower compared to traditional databases. A bug that corrupts the integrity of data makes the entire construction useless. Hence development must be done very carefully. In a centralized setting, a bug might be easy to fix, but in a distributed environment without a central authority, this becomes very difficult.

Second, incentive design is an integral part of building a blockchain. There is always a cost associated with adding data to a blockchain. This cost must be high enough to prevent large amounts of useless data being added, but at the same time, it needs to be low enough not to become prohibitive. Once deployed, fixing is not easily done for the same reason as above.

Maintaining a blockchain is also orders of magnitude more expensive than a traditional database. Data is not recorded once but thousands of times. Data is also verified by every full node on the network, thousands of times in parallel. Additionally, the transmission of data is inefficient by design, causing the cost of maintenance to rise. This redundancy in every step of using a blockchain makes it hard to scale. We will look at several concepts to make blockchain scale later on, such as sidechains and payment channels. Most of them are based on the idea of moving data off the blockchain rather than increasing the throughput of the chain.

Benefits of Using Blockchain

All of this overhead can only be justified through utility. Having global money with a predictable inflation schedule and trustless transactions without central control and single points of failure are arguably enough utility to use a blockchain for this purpose. For many other use cases, time will tell if blockchain poses a suitable solution. Just as with the immutability attribute, it’s important to note that the current shortcomings of public blockchains result from being run in a distributed fashion, rather than the data structure. While a high level of redundancy makes the data secure, it is inefficient by definition.

In turn, you can get some unique properties with a blockchain, that if needed for the specific use case, make it invaluable. The main factor distinguishing a blockchain from a normal database is that there are specific rules about how to add data to the database. This set of rules, or protocol, can achieve the following traits:

  • Consistency: Newly added data cannot conflict with data already in the database.
  • Tamper Evidence: Append only data structure that makes it immediately apparent if data has been changed. Coupled with a strong consensus mechanism that incentivizes rejection of invalid blocks this results in immutability.
  • Ownable: Data can be attributed to a sole owner. The data is publicly verifiable, but only the owner can make changes to it. In the context of cryptocurrencies, this means everybody can see the transactions, but only with the owner can spend a UTXO.
  • Distributed: The database is consistent without a central party acting as a gatekeeper. This results from the protocol incentivizing correct behavior. Consensus and fault-tolerance are the holy grail of distributed systems that Bitcoin achieved for the first time in history.

The key takeaway from this first section should be the following: You get the immutability of data only if there is a strong consensus mechanism in place that makes the network participants decline invalid blocks. Otherwise, a blockchain is only tamper-evident.

After looking at the properties that result from the design, let’s take a look at how it is constructed. First, we look at the blocks themselves. Next, we introduce a concept that allows us to create an efficient summary of all transactions — the Merkle tree. Lastly, we look at the transactions themselves that make up the majority of data in a block.

The Building Blocks of Blocks

Blocks consist of a header that contains essential data about the block — a sort of summary. The largest part of a block in terms of storage comprises the transactions.

The block header contains the most important information about a block.

  • The Version indicates which software version the miner of the block used and which set of block validation rules were followed.
  • The previous block headers hash hashPrevBlock serves two purposes. First, it establishes an order throughout the chain of blocks, and second, it ensures no preceding block can be changed without affecting the current and all subsequent blocks.
  • The Merkle Root Hash hashMerkleRoot represents a summary of all transactions included in the block.
  • The Time is the Unix epoch time when the miner started hashing the header for the mining process.
  • The Bits or nBits are an encoded version of the current difficulty of finding a new block.
  • The Nonce — or number used once — is the variable that miners change to modify the block headers hash for its value to meet the difficulty. This process is covered in detail in our article on mining.

Merkle Trees play an important role in ensuring the integrity of data in the blockchain. They are also used in other systems such as IPFS — the InterPlanetary File System and several implementations of NoSQL databases. Let’s take a look at how they work and what they do before we continue with what a transaction looks like from a data perspective.

Read the full article and more on the Horizen Academy

Originally published at https://academy.horizen.io.

Horizen

Horizen is an inclusive ecosystem built on its massively…

Horizen Official

Written by

Horizen is an inclusive ecosystem built on its massively scalable blockchain platform where everyone is empowered and rewarded for their contributions.

Horizen

Horizen

Horizen is an inclusive ecosystem built on its massively scalable blockchain platform where everyone is empowered and rewarded for their contributions.

Horizen Official

Written by

Horizen is an inclusive ecosystem built on its massively scalable blockchain platform where everyone is empowered and rewarded for their contributions.

Horizen

Horizen

Horizen is an inclusive ecosystem built on its massively scalable blockchain platform where everyone is empowered and rewarded for their contributions.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store