Blockchain Data Storage Dilemma: On-chain vs Off-chain

Cedalio
6 min read · Feb 7, 2023


As the blockchain ecosystem continues to mature, the question of how to effectively store and manage data on the blockchain has become increasingly important. One of the key considerations in this debate is the choice between on-chain and off-chain storage strategies. In this post, we will explore the trade-offs involved. We will also discuss the implications of these choices for scalability, security, and overall system design.

On-Chain Storage

On-chain storage refers to storing data directly on the blockchain. The data is recorded in the blockchain’s ledger and is accessible to all participants on the network. This approach has several advantages, including transparency and immutability: because the data is recorded on the blockchain, it is publicly accessible and cannot be tampered with once it has been confirmed. Additionally, on-chain storage allows for decentralized access to the data, which can be beneficial for certain types of applications.

Off-Chain Storage

Off-chain storage refers to storing data outside of the blockchain. This can be done through various methods, such as centralized databases, the InterPlanetary File System (IPFS), or other decentralized storage solutions. Off-chain storage is typically used when the data is too large or too complex to be stored directly on the blockchain. This approach has several advantages, including scalability and cost-efficiency.

What to take into account when choosing your storage strategy: a chart of the considerations, split between on-chain and off-chain topologies.

Key strategies

When choosing what kind of data should live on-chain and what data should live off-chain in a dApp, there are a few key strategies to consider:

Criticality of data: Data that is critical to the functioning of the dApp, such as financial transactions or user identity information, should be stored on-chain to ensure immutability and decentralization.

Volume of data: Large amounts of data (or blob data) can bloat the blockchain, making it slow and expensive to access. This type of data should be stored off-chain and linked to an on-chain record, using a decentralized storage solution such as IPFS. If you want better traceability, you could store the following information on-chain:

  1. Hash of the data: By storing the hash of the data on the blockchain, you can ensure that the data stored on IPFS has not been tampered with, and that it corresponds to the version that was originally stored.
  2. Metadata: By storing metadata, such as the date the data was stored, the author, and other relevant information, you can track the history of the data, allowing for better traceability.
  3. Access control information: By storing access control information, such as a list of authorized users, on the blockchain, you can ensure that only authorized users have access to the data stored on IPFS, using an access control mechanism like the one we developed at Cedalio.
  4. Pointers to off-chain data: By storing pointers to off-chain data, such as IPFS addresses, on the blockchain, you can create a link between the on-chain data and the off-chain data, allowing for better traceability. It’s important to note that if the owner’s off-chain storage is an S3 bucket, the owner could change the content without changing the pointer/link. To make sure that does not happen, you can store a hash of the content on-chain, or use IPFS, which is already a content-addressable solution.
  5. Digital signature: By storing digital signatures on the blockchain, you can ensure that the data stored on an external off-chain storage solution has not been tampered with and that it corresponds to the version that was originally signed.
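As a rough illustration of items 1, 2 and 4 above, the following Python sketch builds the small record that would be kept on-chain (content hash, pointer, and metadata) and later checks an off-chain payload against it. The `OnChainRecord` shape, the function names, and the example bucket path are assumptions made for this example; they are not part of any Cedalio or IPFS API, and a real dApp would persist the record through a smart contract rather than in memory.

```python
import hashlib
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class OnChainRecord:
    """Hypothetical shape of the small record kept on-chain."""
    content_hash: str  # SHA-256 hex digest of the off-chain payload
    pointer: str       # where the payload lives (IPFS address, URL, ...)
    author: str        # metadata: who stored the data
    stored_at: int     # metadata: Unix timestamp of when it was stored


def make_record(payload: bytes, pointer: str, author: str) -> OnChainRecord:
    """Hash the payload and bundle the hash with its pointer and metadata."""
    digest = hashlib.sha256(payload).hexdigest()
    return OnChainRecord(digest, pointer, author, int(time.time()))


def verify_payload(record: OnChainRecord, payload: bytes) -> bool:
    """Return True only if the payload still matches the recorded hash."""
    return hashlib.sha256(payload).hexdigest() == record.content_hash


original = b"quarterly report v1"
# Illustrative pointer to mutable storage; the bucket name is made up.
record = make_record(original, "s3://example-bucket/report.pdf", "alice")

print(verify_payload(record, original))                # True
print(verify_payload(record, b"quarterly report v2"))  # False
```

If the content behind the pointer is silently replaced, the second check fails, which is the tampering scenario described for mutable storage such as an S3 bucket.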

Data privacy: On-chain storage is generally considered less private than off-chain storage, which can be a concern for sensitive information. The concern can be reduced if the data is encrypted and managed with correctly implemented ACLs, so that access is restricted to specific parties or groups. This allows for more control over who can view and access the data. Additionally, the storage can be implemented with privacy-enhancing technologies such as zero-knowledge proofs, which allow data to be shared without revealing sensitive information. However, it’s worth noting that the level of privacy that can be achieved with on-chain storage depends on the specific implementation, and it may still be vulnerable to future innovations and attacks. Therefore, it’s important to carefully consider the specific needs of your application and the level of privacy required when choosing between on-chain and off-chain storage strategies.

Data accessibility: Data that needs to be accessed very frequently should be stored off-chain to ensure fast and efficient retrieval, or be served through a caching mechanism. On the other hand, dApps with high-frequency writes can incur high gas fees and block confirmation latency. That is why you need a balance in your on-chain and off-chain data storage strategy. At Cedalio we are working on providing a much better solution to this problem, which many developers face today while working with on-chain data. Stay tuned for future announcements on this topic.
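One way to picture the caching mechanism mentioned above is a small time-to-live (TTL) cache placed in front of a slow on-chain read. This is only a minimal sketch: `TTLCache`, `slow_chain_read`, and the 30-second TTL are hypothetical names and values chosen for the example, and the read function is a stand-in for whatever node or RPC call your dApp actually uses.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Serve repeated reads from memory until an entry is older than the TTL."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # the slow read, e.g. a call to a node
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the cache is easy to test
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]        # fresh enough: skip the slow read
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value


calls = []


def slow_chain_read(key: str) -> str:
    """Stand-in for an on-chain read; records each call it receives."""
    calls.append(key)
    return f"value-of-{key}"


cache = TTLCache(slow_chain_read, ttl_seconds=30.0)
cache.get("balance:alice")
cache.get("balance:alice")
print(len(calls))  # 1
```

The trade-off is staleness: a reader may see a value that is up to one TTL old, so the TTL should reflect how quickly the underlying on-chain data can change.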

Data governance: Data that is subject to regulatory compliance, such as health records or financial transactions, should be stored on-chain to ensure transparency and accountability. One of the advantages of smart contracts is that they can be programmed to automatically enforce regulatory compliance. For example, a smart contract could be designed to automatically check the identity of a party before allowing a transaction to take place. This can be used to ensure compliance with anti-money laundering (AML) regulations. Similarly, a smart contract could be programmed to automatically check that a trade or transaction is in compliance with know-your-customer (KYC) regulations. Another potential use of smart contracts for regulatory compliance is in the area of supply chain management. A smart contract could be used to track and verify the authenticity of products as they move through the supply chain, ensuring compliance with regulations such as the Food Safety Modernization Act (FSMA) or the Foreign Supplier Verification Program (FSVP).
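To make the identity-check example above more concrete, here is a small Python model of the kind of rule such a contract might enforce: a transfer only goes through if both parties appear in a set of verified identities. It is a simulation of the logic, not contract code; `GatedLedger`, `ComplianceError`, and the account names are hypothetical, and a production version would live in a smart contract language and rely on a real identity verification process.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


class ComplianceError(Exception):
    """Raised when a party has not passed identity verification."""


@dataclass
class GatedLedger:
    """Toy model of a contract that refuses transfers between unverified parties."""
    verified: Set[str] = field(default_factory=set)
    balances: Dict[str, int] = field(default_factory=dict)

    def transfer(self, sender: str, recipient: str, amount: int) -> None:
        # Compliance gate: both sides must be verified before anything moves.
        for party in (sender, recipient):
            if party not in self.verified:
                raise ComplianceError(
                    f"{party} has not passed identity verification")
        if amount <= 0 or self.balances.get(sender, 0) < amount:
            raise ValueError("invalid amount or insufficient balance")
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount


ledger = GatedLedger(verified={"alice", "bob"}, balances={"alice": 100})
ledger.transfer("alice", "bob", 40)
print(ledger.balances)  # {'alice': 60, 'bob': 40}

try:
    ledger.transfer("alice", "mallory", 10)
except ComplianceError as err:
    print(err)  # mallory has not passed identity verification
```

Because the check runs inside the transfer itself, there is no code path that moves funds without it, which is the sense in which the rule is enforced automatically.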

User-owned data: We believe that one of the most important uses of on-chain data, when implemented right, is user-owned data. User-owned data on-chain refers to the idea that users have control and ownership over their personal data, which is stored on a blockchain. This allows users to have full control over who has access to their data and how it is used.

One of the key benefits of user-owned data on-chain is that it allows for increased data privacy and security. Because data is stored on a decentralized and immutable ledger, it is much more difficult for unauthorized parties to access or tamper with the data. Additionally, because users have control over their data, they can choose to share it only with those they trust, rather than having to rely on centralized organizations to protect their data.

Another benefit of user-owned data on-chain is that it can enable new business models based on data sharing, such as data marketplaces. In these marketplaces, users can monetize their data by choosing to sell it to organizations that need it for research or analysis.

One example of user-owned data on-chain is the self-sovereign identity wallet, which allows users to manage and control their personal data, including identity documents, personal information, and financial data, and to share it selectively with service providers for KYC and onboarding purposes in a secure and decentralized way.

Conclusion

The choice between on-chain and off-chain storage strategies is a complex one that depends on the specific needs of your application. On-chain storage is best suited for applications that require transparency, immutability, and decentralized access to the data. Off-chain storage is best suited for applications that require scalability, cost-efficiency, and high performance. Understanding the trade-offs involved in each approach will help you make informed decisions about data storage in your blockchain-based application.

It’s worth mentioning that having data on-chain also comes with some challenges, such as the scalability of the blockchain and the cost of storage, as well as the need for technical expertise to set it up and maintain it. That’s why we are building a super-easy-to-use platform that makes it simple for developers to implement on-chain data in their applications. We should also pay close attention to how L2s and ZK rollups evolve to make blockchains more scalable and affordable. We think that’s going to be a key factor in the flourishing of the on-chain data ecosystem.

Some resources to dig deep into this topic:

These resources can provide you with a deeper understanding of on-chain and off-chain data storage as well as user-owned data, and the challenges and opportunities of these concepts in blockchain technology.


Cedalio

A database that is verifiable and auditable by default