Storing documents in blockchain

Saurabh Gupta
Published in Blockchain Musings
3 min read · Jun 2, 2017


Bitcoin was the pioneer of blockchain technology. So let's first discuss what, if anything, can be stored in Bitcoin.

A Bitcoin node keeps its data files, including the wallet data file, in its data directory. On the blockchain itself, there is a 1 MB size limit per block, which caps how much can be stored in each block.

It was clear that the blockchain concept, and the huge computational power attracted by Bitcoin, would be used for many other purposes. The source code is publicly available, so many non-currency projects inspired by Bitcoin emerged, like Bitmessage or Namecoin.

In the early days of Bitcoin, people tried to use it not only for currency transactions but also as a messaging system and even as a file transfer service, because it allows users to attach some information (a payload) to a transaction.

Sometimes, when you hear about files being stored on Bitcoin's blockchain, what people really mean is that the hashes of the files are being stored as part of a bitcoin transaction. This is like typing into the free-text field you often have when making a bank payment. A file can be any size. The hash of the file is a fingerprint of the file: it has a fixed length for a given algorithm and is created by putting the file through a mathematical function.
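To make the fingerprint idea concrete, here is a minimal sketch in Python. It assumes SHA-256 as the hash algorithm (the choice of algorithm is an assumption for illustration), and `file_fingerprint` is a hypothetical helper name. It shows that inputs of very different sizes produce fingerprints of the same length.

```python
import hashlib


def file_fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string.

    SHA-256 is an assumption here; any cryptographic hash would
    illustrate the same point.
    """
    return hashlib.sha256(data).hexdigest()


# A tiny input and a much larger one both yield a 64-character hex string.
small = file_fingerprint(b"hello")
large = file_fingerprint(b"hello" * 1_000_000)
print(len(small), len(large))

# Changing the content changes the fingerprint.
print(file_fingerprint(b"contract v1") == file_fingerprint(b"contract v2"))
```

Only the short fingerprint would need to go into the transaction's payload; the file itself can live anywhere, and anyone holding the file can recompute the hash to check that it matches.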

In Ethereum, there is theoretically no fixed limit on the block size; in practice, the amount of data per block is bounded by the block gas limit. Even so, a blockchain is inherently not meant for data storage, because storing large documents is computationally very expensive. There are nonetheless instances where people have embedded unexpected, arbitrary data in Bitcoin's blockchain by encoding it inside transactions. To do this with a document, PDF, or audio file, you would have to compress it and store it in hexadecimal format.
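The compress-and-hex-encode step can be sketched as follows. This is only an illustration: zlib is an assumed choice of compressor, the function names are hypothetical, and `max_bytes` is a made-up parameter standing in for whatever payload limit a given chain or transaction type imposes. The sketch does not build or broadcast any transaction.

```python
import zlib
from typing import List


def to_hex_payload(document: bytes) -> str:
    """Compress the document and return it as a hexadecimal string."""
    return zlib.compress(document, 9).hex()


def from_hex_payload(payload: str) -> bytes:
    """Reverse to_hex_payload: decode the hex, then decompress."""
    return zlib.decompress(bytes.fromhex(payload))


def split_payload(payload: str, max_bytes: int) -> List[str]:
    """Split a hex payload into pieces of at most max_bytes bytes each.

    max_bytes is a hypothetical limit; two hex characters encode one byte.
    """
    step = max_bytes * 2
    return [payload[i:i + step] for i in range(0, len(payload), step)]


document = b"sample document text " * 50
payload = to_hex_payload(document)
pieces = split_payload(payload, max_bytes=40)
print(len(document), len(payload) // 2, len(pieces))
```

Even in this toy form, the cost is visible: every byte of the document has to be carried by a transaction, which is why storing only a hash is usually preferred.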

That said, many blockchain-like solutions have been designed recently specifically to store data. Storj is one of the best examples. Storj is a protocol, cryptocurrency, and suite of decentralized applications that allows users to store data in a secure and decentralized manner. It uses a transaction ledger, public/private key encryption, and cryptographic hash functions for security. Storj nodes sell resources to store and transfer information, and earn Storjcoin X in exchange for their services. You can run the software and earn some extra cash by leasing out your hard drive space and bandwidth. Filecoin is another such solution, which is yet to materialize. Enigma is yet another initiative.

In 2014, the IPFS (InterPlanetary File System) protocol was introduced. It is not a blockchain itself, but it draws on ideas from earlier peer-to-peer systems and uses content addressing, where data is identified by its hash, in order to store unalterable data, remove duplicate files across the network, and look up which storage nodes hold a given file. IPFS is general purpose and has few storage limitations. It can serve files that are large or small. It automatically breaks larger files into smaller chunks, allowing IPFS nodes to download (or stream) files not just from one server, as with HTTP, but from hundreds of them simultaneously. In effect, the IPFS network becomes a fine-grained, distributed, easily federated content delivery network (CDN). This is useful for pretty much everything involving data: images, video streaming, distributed databases, entire operating systems and, most importantly, static websites.
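The chunking and de-duplication described above can be illustrated with a toy content-addressed store. This is a simplified sketch, not IPFS's actual data format or API: the `ChunkStore` class, its methods, and the tiny chunk size are all hypothetical, and SHA-256 is an assumed hash.

```python
import hashlib
from typing import Dict, List


class ChunkStore:
    """Toy content-addressed store: each chunk is keyed by its own hash."""

    def __init__(self, chunk_size: int = 4) -> None:
        # A tiny chunk size keeps the demonstration readable.
        self.chunk_size = chunk_size
        self.chunks: Dict[str, bytes] = {}

    def add(self, data: bytes) -> List[str]:
        """Split data into chunks, store each once, return the ordered keys."""
        keys = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            key = hashlib.sha256(chunk).hexdigest()
            self.chunks[key] = chunk  # identical chunks share one entry
            keys.append(key)
        return keys

    def get(self, keys: List[str]) -> bytes:
        """Reassemble the original data from its ordered chunk keys."""
        return b"".join(self.chunks[key] for key in keys)


store = ChunkStore(chunk_size=4)
keys = store.add(b"abcdabcdabcd")
print(len(keys), len(store.chunks))
print(store.get(keys) == b"abcdabcdabcd")
```

Because a chunk's key is derived from its content, identical chunks are stored only once, and any node holding a chunk can serve it, which is what lets a download be spread across many peers.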

Not so long ago, Ethereum, a generalized blockchain platform, was released. Unlike Namecoin, it is not built around a single application: it has its own native currency (Ether), but you can also run as many cryptocurrencies as you wish on top of it. Its capabilities are not limited to cryptocurrencies, as it allows you to create any type of self-executing contract.

Traditional fiat-currency banks and stock exchanges are very interested in adopting blockchain for inter-bank transactions and many other purposes. A major reason is reduced concern about someone gaining access to the information, because it is broken into smaller chunks and encrypted before being distributed.

The major challenge is that blockchain was inherently not meant for data storage: storing large documents becomes computationally very expensive. On the other hand, the IPFS protocol and its implementations are still in heavy development, which means there may be problems in the protocols or mistakes in the implementations.