Forever Isn’t Free: The Cost of Storage on a Blockchain Database
Cloud storage services work as follows: You pay a monthly fee up front for a fixed amount of storage space. During the paid period, you can use any amount of storage up to that limit. When the period expires, you have two choices: pay for another month, or let your files be deleted. Your cloud provider keeps your files only for as long as you keep paying.
Blockchain databases can’t work on this model. A blockchain database must store data indefinitely, so storage costs must be paid up front, and that payment must cover not just the current month but all the months and years to come.
IPDB has developed a sustainable model for the long-term storage of data: a one-time, up-front payment that covers the cost of indefinite data storage. The payment must be enough to cover both the cost of storage and the IPDB Foundation’s operating expenses.
This blog post is a deep dive into the numbers that led to a single per-GB price point — the cost of storing data indefinitely in a blockchain database.
This kind of analysis has been lacking in the hype around blockchain technology. There are many problems that could be addressed with blockchain technology, but without an understanding of what a blockchain solution will cost, it is impossible to say whether economic efficiencies can be achieved. This post is a first step toward understanding which use cases could truly benefit from the application of blockchains.
Assumptions of the model:
Before we dive into the model, let’s outline some of our underlying assumptions:
Conservative predictions: As a general rule, we have tried to keep estimates and assumptions very conservative. We would rather have happy surprises than unhappy surprises if our numbers turn out to be off.
Replication: We want to have a replication factor of six, meaning at least six copies of each transaction on six distinct nodes. For extra comfort and security, there will be one additional backup of the entire network.
Transaction Volume: We have estimated the rate of adoption for IPDB. In the first few years, we assume adoption will increase exponentially, as with most new technologies.
This curve, giving the number of transactions per second T(t), is modelled by:

$$T(t) = \frac{10^{6}}{1 + e^{16 - 1.2t}}$$
where t is the number of years after 2017, i.e. t = 1 for 2018.
We chose to model transaction growth using an S-curve because the adoption of similar technologies followed that pattern. In our model, the denominator gives the curve its S-shape. We assume IPDB will start at approximately 0.37 transactions/sec in 2018, so we include the 16 to shift the curve to start there. The model exhibits a conservative ramp-up, with the 1.2 setting how quickly the curve rises. These numbers were chosen to model the rapid adoption of successful technologies in the 21st century and the usage we are predicting.
The number of transactions the network can handle can’t grow to infinity, so we cap the number of transactions per second. The numerator sets this cap at 1 million transactions per second, which the model approaches in about 15 years. We also consider caps of 500,000 transactions/sec and 5 million transactions/sec.
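The S-curve can be checked numerically. The sketch below assumes the logistic form T(t) = cap / (1 + e^(16 − 1.2t)), with the 16 and 1.2 taken from the description above; the function name `tx_per_second` is hypothetical, and for the 500,000 and 5 million scenarios it assumes only the cap (the numerator) changes.

```python
import math


def tx_per_second(t: float, cap: float = 1_000_000.0) -> float:
    """Logistic adoption curve: transactions/sec, t years after 2017 (t = 1 is 2018)."""
    # The 16 shifts the curve so that t = 1 gives about 0.37 tx/s;
    # the 1.2 sets how steeply the rate climbs toward the cap.
    return cap / (1.0 + math.exp(16.0 - 1.2 * t))


if __name__ == "__main__":
    for cap in (500_000, 1_000_000, 5_000_000):
        rates = [round(tx_per_second(t, cap), 1) for t in (1, 5, 10, 15, 20)]
        print(f"cap {cap:>9,} tx/s: {rates}")
```

Because the cap only scales the numerator in this sketch, every year's rate, including the 2018 starting point, scales with it.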
The maximum BSON document size in MongoDB is 16 MB. Since a block is a document and can contain up to 1,000 transactions, we assume a soft limit for a single transaction of 16 MB / 1,000 = 16 kB. A single transaction may be any size below this, however, so we consider 1.5 kB, 7 kB and 15 kB in our calculations. Based on usage of the IPDB Test Network so far, we expect transaction sizes to trend smaller, likely in the 1.5 kB to 7 kB range.
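Turning a transaction rate into a yearly data volume is simple arithmetic. This sketch assumes a 365-day year and decimal units (1 GB = 10^9 bytes), and the names are illustrative. With about 1,658.8 tx/s, which is what the S-curve gives for 2025 (t = 8), and 7 kB transactions, it reproduces the 366,184 GB of new data used for 2025 in the intercluster cost example.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds
BYTES_PER_GB = 10**9                   # decimal gigabytes

# MongoDB's 16 MB BSON limit, treated as decimal MB as in the text above
MAX_BLOCK_BYTES = 16 * 10**6
MAX_TX_PER_BLOCK = 1000


def soft_tx_limit_bytes() -> float:
    """Largest transaction size that still lets 1,000 fit in one block."""
    return MAX_BLOCK_BYTES / MAX_TX_PER_BLOCK


def annual_gb(tx_per_second: float, tx_size_bytes: float) -> float:
    """GB of new data written in a year at a constant transaction rate."""
    return tx_per_second * SECONDS_PER_YEAR * tx_size_bytes / BYTES_PER_GB


if __name__ == "__main__":
    print(f"soft limit: {soft_tx_limit_bytes():,.0f} bytes")
    for size in (1_500, 7_000, 15_000):
        print(f"{size:>6} B/tx -> {annual_gb(1658.8, size):,.0f} GB/year")
```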
Time Value of Money: Since we are planning to store data for as long as possible, the majority of the cost of storing that data will be spread out over years. The initial payment will leave the IPDB Foundation with a significant balance that will be invested conservatively. We assume a modest 3% return on that balance, compounded annually.
Inflation: For all our costs, we account for inflation, which has historically been around 2%, compounded annually.
Forever: The IPDB plans to store data indefinitely but we only run our calculations to 50 years. We embrace long-term thinking, but even this timeframe is difficult to work with given the pace of technological change.
Let’s start with the money coming into the IPDB Foundation.
1.1 One-Time Payment
Users will pay per gigabyte to write data to IPDB. In practice, this will be an up-front fee that allows a certain amount of storage, but for simplicity we will use a flat fee per gigabyte of storage used. This is calculated as the amount of data stored in GB, multiplied by the cost per GB in dollars. There will be no ongoing cost for storing data. The initial fee is for indefinite storage.
1.2 Balance from the Previous Year
The Balance is key to the sustainability of the IPDB financial model. The amount not spent each year can be invested and used in following years to cover the costs of indefinite storage.
where X is the per-GB cost in dollars.
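The balance update is not spelled out in words here, so the following is only a sketch of one plausible form: each year, add the fees X · GB for newly written data, subtract that year's total costs, and let whatever remains earn the assumed 3% return. The function `run_balance` and the order of operations (interest applied to the end-of-year balance) are assumptions, not IPDB's published model.

```python
from typing import List, Sequence


def run_balance(new_gb: Sequence[float], total_costs: Sequence[float],
                price_per_gb: float, annual_return: float = 0.03,
                opening_balance: float = 0.0) -> List[float]:
    """Return the end-of-year balance for each year (hypothetical model)."""
    balance = opening_balance
    history = []
    for gb, cost in zip(new_gb, total_costs):
        fees = price_per_gb * gb          # one-time fees for this year's data
        balance = balance + fees - cost   # money left after paying the year's costs
        # The remainder is invested; a negative balance is treated like a loan
        # at the same rate, which is a simplification.
        balance *= 1 + annual_return
        history.append(balance)
    return history


if __name__ == "__main__":
    # Toy inputs only: 1 GB written per year, $50 of costs per year, X = $100.
    print(run_balance([1, 1], [50, 50], price_per_gb=100))
```

Interest compounds on what is left each year, which is why the long-run price must leave enough unspent for investment income to carry ongoing costs.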
2.1 Storage Costs
The cost of storing data has decreased exponentially as technology improves:
- c(t) is the cost to store 1 GB of data in any given year t.
- A is the cost of storing 1 GB for one year at 2017 prices. Even though we are using Microsoft Azure, we look to Amazon Elastic File System (EFS) for pricing here. Amazon EFS is more expensive on a per-GB basis than traditional cloud storage, but offers an ease of scaling that would be desirable if a similar product becomes available on Azure or when we roll out nodes on the Amazon platform. With EFS, storing six copies of 1 GB of data for a year costs $21.60 (equivalent to $0.30 per GB-month per copy), so A = $21.60.
- k controls the rate at which storage costs go down over time. The larger k is, the faster prices drop. Historical data from the past 35 years give k = 0.2502, but predictions for future storage costs suggest this rate of decline will slow. We adopt the lower value from these predictions and set k = 0.173.
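That shows us what a GB of storage will cost in any given year. Now we can calculate our total storage costs. Each year we have to pay for the new data received in that year and continue to store all data from previous years. So for each year the storage cost is:

$$\text{Storage\_cost}(t) = c(t) \sum_{i=1}^{t} \text{Predicted\_GB}(i)$$

where Predicted_GB(i) is the new data (in GB) written in year i.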
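The storage cost calculation can be sketched as follows. The exponential form c(t) = A·e^(−kt) is an assumption consistent with the description of A and k above (under it, k = 0.173 halves the price roughly every four years); each year's bill multiplies that year's per-GB price by all data written so far. Function names are hypothetical, and inflation is not applied here.

```python
import math
from typing import Mapping

A = 21.60  # $ per GB-year for six copies at 2017 EFS prices (t = 0)
K = 0.173  # assumed rate of decline in storage prices


def cost_per_gb_year(t: float, a: float = A, k: float = K) -> float:
    """Assumed price to keep 1 GB (six copies) for one year, t years after 2017."""
    return a * math.exp(-k * t)


def storage_cost(t: int, new_gb_by_year: Mapping[int, float]) -> float:
    """Cost in year t of storing everything written in years 1..t."""
    stored_gb = sum(gb for year, gb in new_gb_by_year.items() if year <= t)
    return cost_per_gb_year(t) * stored_gb


if __name__ == "__main__":
    writes = {1: 10.0, 2: 5.0, 3: 100.0}  # toy GB written per year
    for year in (1, 2, 3):
        print(year, round(storage_cost(year, writes), 2))
```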
2.2 Intercluster Communication Costs
We need to factor in the cost of sending and receiving data. Intracluster communication costs are the costs of transferring data from one node to another within the same cloud network, whereas intercluster costs are for outbound data transfers. During our initial rollout, all nodes will be hosted in the Microsoft Azure cloud for ease of deployment and support. Within Azure, all inbound data transfers are free. From 2019 onward, we aim to have approximately 2/3 of all nodes hosted outside the Azure network. As a reference, we use Azure’s tiered pricing for outbound data transfer.
By 2025 we aim to have 50 nodes, approximately 34 of them outside Azure. We predict 366,184 GB of new data for 2025, all of which must be sent to each of those 34 external nodes: a total of 34 × 366,184 = 12,450,256 GB of outbound data.
In 2025, the first 120,000 GB of data will cost $0.138 per GB, so we’ll pay $16,560. Similarly, we pay $64,800 for the next 480,000 GB ($0.135 per GB), $156,000 for the next 1.2 million GB ($0.13 per GB), $504,000 for the next 4.2 million GB ($0.12 per GB) and $620,613 for the remaining 6,450,256 GB.
In total this works out to $1,361,973 in intercluster costs for 2025.
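The tiered calculation above can be reproduced with a small function. The tier sizes and rates are read off the 2025 example (yearly totals at 2017 prices); the price of the final tier is not given, so `overflow_rate` is a parameter here. The $620,613 quoted for the remaining data (12,450,256 − 6,000,000 = 6,450,256 GB) works out to roughly $0.096 per GB. Names are hypothetical.

```python
from typing import List, Tuple

# (tier size in GB per year, $ per GB), in the order the tiers are consumed
AZURE_TIERS: List[Tuple[float, float]] = [
    (120_000, 0.138),
    (480_000, 0.135),
    (1_200_000, 0.13),
    (4_200_000, 0.12),
]


def tiered_cost(total_gb: float, tiers: List[Tuple[float, float]],
                overflow_rate: float) -> float:
    """Price total_gb under tiered rates; anything past the last tier uses overflow_rate."""
    cost = 0.0
    remaining = total_gb
    for size, rate in tiers:
        used = min(remaining, size)  # fill this tier, or whatever is left
        cost += used * rate
        remaining -= used
    return cost + remaining * overflow_rate


if __name__ == "__main__":
    external_nodes = 34
    outbound_gb = external_nodes * 366_184  # 12,450,256 GB in 2025
    implied_rate = 620_613 / (outbound_gb - 6_000_000)
    print(f"${tiered_cost(outbound_gb, AZURE_TIERS, implied_rate):,.0f}")
```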
To make it easier we will define:
- N(t) as the number of nodes at time t.
- Azure_outbound_data_cost(Predicted_GB(t)) as a function that takes the total outbound data transfer (in GB) as an input and calculates the intercluster costs for a given year.
We must also consider that bandwidth costs have been decreasing rapidly since 1997. The literature shows a decrease of 27% annually. It seems safe to assume this trend will continue, given higher utilisation rates for existing networks and new fibre coming online. If costs continue to decline 27% each year, by 2025 we should have:
This decrease has a significant effect on intercluster costs over time. In general we have:
In reality, given the large volumes of data transfers predicted, IPDB will be in a position to negotiate wholesale data transfer rates. This model provides an upper limit on how high the price for outbound data could be. Once we also factor in inflation, costs will look like:
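The two adjustments described above, a 27% annual fall in bandwidth prices and 2% annual inflation, can both be applied as multiplicative factors to a cost computed at 2017 prices. This sketch assumes t counts years after 2017 (so t = 8 for 2025) and applies both factors to the whole year's tiered bill, which is equivalent to scaling every tier's rate; the function name is hypothetical.

```python
def adjusted_cost(cost_at_2017_prices: float, t: int,
                  annual_decline: float = 0.27,
                  annual_inflation: float = 0.02) -> float:
    """Scale a 2017-price cost by t years of price decline and inflation."""
    decline_factor = (1 - annual_decline) ** t      # bandwidth gets cheaper
    inflation_factor = (1 + annual_inflation) ** t  # nominal prices rise
    return cost_at_2017_prices * decline_factor * inflation_factor


if __name__ == "__main__":
    # 2025 intercluster bill from the worked example, at 2017 prices
    print(f"${adjusted_cost(1_361_973, 8):,.0f}")
```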
2.3 Fixed Costs
So far we’ve only considered the cost of storing and transferring data. What about operational costs like staff, facilities, marketing and outreach, legal and accounting, and other expenses necessary to support IPDB? Unlike physical storage costs, logistical costs (staff, rent, etc.) do not decline but rather increase over time. Staffing costs assume we will grow the team to keep up with the volume of work, and offer wage increases to at least match inflation. Other costs increase to match the needs of the organisation and to account for inflation. We’ve also assumed that some people will not like IPDB or the data stored on it, so we’ve budgeted for legal fees.
At the outset, operational costs are the majority of IPDB’s expenses. These costs become a much smaller share of IPDB’s overall expenses as usage increases, and fixed costs per GB decrease significantly over time. For example, even with the new hires we have budgeted for, the ratio of data stored to staff salaries will increase by a factor of over 100.
The ultimate goal is for IPDB to become self-sustaining. According to this model and our assumptions, this will happen by 2023. Until then, we will work to minimize operational costs. Many costs will be covered by BigchainDB, and further operational costs will be funded by grants and donations during this period. The total costs each year are given by:

$$\text{Total\_cost}(t) = \text{Storage\_cost}(t) + \text{Intercluster\_cost}(t) + F(t)$$
where F(t) refers to fixed costs.
So what’s the final number?
Financial sustainability is the most important piece of the puzzle. If we set the price too low, we might cover the cost of storing data for a long time, but eventually the number of new transactions each year will wane and yearly revenue will fall below yearly costs. We need to set the price so that investment income on the balance, not new fees, can cover ongoing costs.
That final number is a one-time fee of $100 per GB. This allows us to store data indefinitely while covering the cost of operating the IPDB Foundation.
Charging $100 per GB will see us becoming revenue-positive by 2023 with a total shortfall of $3,248,796 that must be recovered through donations or grants. This per-GB price also allows us to break even in the same time scale if our transaction rate is halved and capped at 500,000 transactions per second.
As it scales up, the marginal cost of storing each additional GB falls significantly, allowing IPDB to focus resources on becoming fully decentralized, semi-autonomous internet infrastructure that can store vast amounts of data.
The $100 price point is a maximum because of our conservative estimates. If costs drop faster than expected, we could reduce that price over time. For example, decentralized file storage provided by services like IPFS may prove cheaper than existing cloud options or even self-managed storage, or technological breakthroughs could dramatically reduce costs. But for now, $100 is a safe estimate that provides certainty to people hoping to build on IPDB.
At our $100 price point, if we assume a single transaction is of average size (7 kB), how much will it cost an IPDB user to validate and store it for 50 years in IPDB?
7 kB is 0.000007 GB, so at $100 per GB the cost is $0.0007, or 7/100 of a cent, per transaction.
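The same conversion works for the other transaction sizes considered earlier. A minimal sketch, assuming decimal units (1 GB = 10^9 bytes); the function name is illustrative.

```python
PRICE_PER_GB = 100.0  # one-time fee in dollars
BYTES_PER_GB = 10**9  # decimal gigabytes


def fee_per_transaction(tx_size_bytes: float,
                        price_per_gb: float = PRICE_PER_GB) -> float:
    """One-time fee in dollars for a single transaction of the given size."""
    return tx_size_bytes / BYTES_PER_GB * price_per_gb


if __name__ == "__main__":
    for size in (1_500, 7_000, 15_000):
        print(f"{size:>6} B -> ${fee_per_transaction(size):.5f}")
```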
As a comparison, how much would it cost to store the same amount of data for an indefinite period on the Bitcoin or Ethereum blockchains?
Even though it is possible to store data on the Bitcoin blockchain, the Bitcoin protocol was not designed with data storage in mind. However, as blockchain use cases expanded beyond finance, many companies started using the Bitcoin blockchain as a database all the same.
To store data on the Bitcoin blockchain we would enter the data in the OP_RETURN field of Bitcoin transactions. The OP_RETURN field allows a user to send a transaction that doesn’t actually send money to anyone, but allows a small amount of data to be written to the Bitcoin blockchain. Each OP_RETURN output has a maximum size of 80 bytes, and each transaction can have one OP_RETURN output.
To store the same 7KB transaction we have been working with would require 88 OP_RETURN messages. As long as each one is a valid transaction, with a dust fee of 546 satoshis or more, each message will be propagated through the network and mined into a block.
At the current BTC/USD exchange rate ($2518), the dust fee is 546 satoshis (0.00000546 BTC) × $2,518 ≈ $0.0137.
As of July 2017, the median Bitcoin transaction fee is about $1.82. So the cost to store 7KB would be 88 × ($1.82 + $0.0137) ≈ $161.37.
1GB would need 12,500,000 OP_RETURN messages, so it would cost approximately $22.9 million. This figure is highly dependent on transaction fees, which have increased dramatically over the past year as Bitcoin has not found a scaling solution. In any event, this is a theoretical exercise and not a proposal to use the Bitcoin blockchain for large-scale data storage.
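The OP_RETURN arithmetic can be written out as a short sketch. It assumes, as in the text, that every 80-byte chunk needs its own transaction and that each transaction costs the median fee plus the 546-satoshi dust amount, using $2,518/BTC and a $1.82 fee; these July 2017 figures are inputs, not live data, and the function names are hypothetical.

```python
SATOSHIS_PER_BTC = 100_000_000
OP_RETURN_MAX_BYTES = 80  # one OP_RETURN output per transaction
DUST_SATOSHIS = 546


def op_return_messages(data_bytes: int) -> int:
    """Transactions needed to carry data_bytes in 80-byte chunks (ceiling division)."""
    return (data_bytes + OP_RETURN_MAX_BYTES - 1) // OP_RETURN_MAX_BYTES


def dust_usd(btc_usd: float) -> float:
    """Dollar value of the 546-satoshi dust amount at a given BTC/USD rate."""
    return DUST_SATOSHIS / SATOSHIS_PER_BTC * btc_usd


def bitcoin_storage_usd(data_bytes: int, fee_usd: float = 1.82,
                        btc_usd: float = 2518.0) -> float:
    """Fee plus dust for every message needed to store data_bytes."""
    per_message = fee_usd + dust_usd(btc_usd)
    return op_return_messages(data_bytes) * per_message


if __name__ == "__main__":
    for size in (7_000, 10**9):
        print(f"{size:>13,} B: {op_return_messages(size):>12,} messages, "
              f"${bitcoin_storage_usd(size):,.2f}")
```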
Transactions in Ethereum work completely differently than in Bitcoin, requiring “gas” to have data processed.
Executing a regular transaction with no embedded data on the Ethereum blockchain costs 21,000 gas, the minimum gas limit for any transaction.
If you want to include data in your transaction you can do so in one of two ways: by creating a contract, or by sending a message call. Sending a message call allows the user to interact with other accounts or smart contracts without having to create their own contract. It requires the least gas of the two methods, so we’ll send 7KB via a message call.
The gas cost depends not just on how big your data is but also on its content. The cheapest data we could send in a 7KB message call would consist only of zero bytes; if the data included text, the message would contain non-zero bytes. According to the Ethereum Yellow Paper, each zero byte of transaction data costs 4 gas and each non-zero byte costs 68 gas, so assuming all the bytes are zero gives a minimum gas cost.
At 7,000 bytes, that means an additional 7,000 × 4 = 28,000 gas.
That doesn’t seem like much if that were all you wanted to do, but storing the sent data is an additional operation. Contract storage is written in 32-byte words, and each SSTORE that sets a storage word from zero to non-zero costs 20,000 gas. This is one of the largest gas requirements of any EVM opcode, reflecting that this is not a simple operation but one that is replicated and stored across thousands of nodes. Since 7,000 / 32 ≈ 219 words, storing all 7KB would take 219 × 20,000 = 4,380,000 gas.
In general, the cost to store data on Ethereum works out to approximately 17,500 ETH/GB, or around $4,672,500 at today’s prices.
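The Ethereum figures can be reproduced from the gas schedule in the Yellow Paper as it stood in 2017: 21,000 gas per transaction, 4 gas per zero byte and 68 gas per non-zero byte of transaction data, and 20,000 gas for each SSTORE that sets a 32-byte storage word from zero to non-zero. This sketch counts only those charges and ignores other execution costs. The 28 gwei gas price is an assumption, chosen because it is the price at which 1 GB of SSTORE writes comes to the 17,500 ETH quoted above; function names are hypothetical.

```python
G_TRANSACTION = 21_000  # paid once per transaction
G_TXDATAZERO = 4        # per zero byte of transaction data
G_TXDATANONZERO = 68    # per non-zero byte of transaction data
G_SSET = 20_000         # per SSTORE setting a word from zero to non-zero
WORD_BYTES = 32
GWEI_PER_ETH = 10**9


def calldata_gas(data: bytes) -> int:
    """Gas charged for the bytes carried in a transaction's data field."""
    zeros = data.count(0)
    return zeros * G_TXDATAZERO + (len(data) - zeros) * G_TXDATANONZERO


def storage_gas(n_bytes: int) -> int:
    """Gas to write n_bytes into fresh contract storage, one 32-byte word at a time."""
    words = (n_bytes + WORD_BYTES - 1) // WORD_BYTES
    return words * G_SSET


def gas_to_eth(gas: int, gas_price_gwei: float) -> float:
    """Convert gas used to ETH at a given gas price in gwei."""
    return gas * gas_price_gwei / GWEI_PER_ETH


if __name__ == "__main__":
    payload = bytes(7_000)  # 7 kB of zero bytes: the cheapest possible calldata
    total = G_TRANSACTION + calldata_gas(payload) + storage_gas(len(payload))
    print(f"7 kB: {total:,} gas")
    print(f"1 GB of storage writes: "
          f"{gas_to_eth(storage_gas(10**9), 28):,.0f} ETH at 28 gwei")
```

Storage writes dominate: for 7 kB they cost over 150 times the zero-byte calldata charge, which is why the per-GB figure is driven almost entirely by SSTORE.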
As noted above, we are not suggesting that large quantities of data — images, videos, audio, other datasets — should be written to Bitcoin or Ethereum. We understand this is not the point.
However, it is important to understand which use cases are economical and which are not. Many of the applications that have been proposed for blockchains — energy markets, music streaming services, IoT, and so on — will require storage of vast quantities of transactional information. This is exactly the kind of data that should be stored within the blockchain database.
In future posts we will explore the economic implications of this. What use cases can IPDB unlock that would be uneconomical on other blockchain databases?
Visuals by Wojciech Hupert (unless noted).