Enabling an indexed, permissioned, IPFS network with Substrate

Tony Riemer
Published in iridium · 4 min read · Sep 14, 2021

Iris, a minor Greek goddess, submissive to the Olympians and messenger of the gods, was the sole immortal allowed to enter the home of Styx, whose waters she uses to put to sleep all who perjure themselves.

Disclaimer: This article discusses the technical details of the initial iteration of this project and does not represent the full extent of the future functionality. The approach taken here has many known limitations and issues (specifically, the storage of sensitive data in an on-chain context), which will be addressed in an upcoming iteration.

It’s a little dramatic, but a potent metaphor. Here, we the users are the immortals (let’s leave Zeus out of this). Iris, moving between the heavens and the underworld, moves data between us, the immortals. Zeus and the oath-binding waters of Styx symbolize the governance and security that will be built into the system.

Using https://github.com/rs-ipfs/substrate as a basis, Iris aims to act as a layer between the user and an IPFS node, allowing for additional logic and validations that do not exist in IPFS. In general, features that are readily available in a traditional file system or centralized storage application (Google Drive, Dropbox, etc.) are not readily available in IPFS. IPFS delivers decentralized storage, but it does not deliver trustless or incentivized decentralized storage. Specifically, there are four key categories that need to be addressed:

1) Indexability: IPFS is not a search engine and is not intended to act as one. There is not an easy way to identify or track data in IPFS without a CID.

2) Security and Privacy: When data is hosted in a decentralized network it is available to any node in the network. There are no permissions and no encryption.

3) Availability and Replication: Availability of any CID in the network is not guaranteed, and the existing approaches to ensuring availability are not necessarily decentralized.

4) Governance: There are no native governance options and no simple way to purge content from all nodes in the network. IPFS provides a customizable blockList, but “revoking” data from other nodes is fundamentally not possible. If the IPFS network is public and truly decentralized, this could allow malicious content to persist in the network.

My Substrate fork: https://github.com/mystery-team/substrate/tree/offchain_ipfs_v3

Commands sent to IPFS must first go through the Substrate layer, and responses from IPFS must be validated and signed before being bubbled up to the caller. In short, it lets us encode calls to IPFS within the blockchain and use the chain to apply validations, transaction fees, and any other logic (e.g. paying another node to pin a CID for you). Currently, my fork only addresses the first point above, by allowing a user to associate a filename with a CID. In its current state it is very unpolished. It only allows uploads of at most 2MB at once. Additionally, a known issue is that the file upload sometimes fails without the UI notifying the user (it is an issue with the Substrate node, WIP). Another limitation is that you must run a validator node (proof of authority network). This is version 0.0.1.
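To make the "validate first, then associate a filename with a CID" idea concrete, here is a minimal sketch in plain Rust. It is not code from the fork: FileIndex, AddError, the per-owner key, the exact byte value of the 2MB limit, and the rule that a name cannot be reused are all assumptions made for illustration, and the CID is passed in as if IPFS had already returned it.

```rust
use std::collections::BTreeMap;

/// Upload limit from the article (2MB); the exact byte count is an assumption.
const MAX_UPLOAD_BYTES: usize = 2 * 1024 * 1024;

/// Hypothetical reasons for rejecting an upload before it reaches IPFS.
#[derive(Debug, PartialEq)]
enum AddError {
    Empty,
    TooLarge { size: usize },
    NameTaken,
}

/// Hypothetical in-memory stand-in for an on-chain filename -> CID map.
#[derive(Default)]
struct FileIndex {
    // (owner, filename) -> CID
    entries: BTreeMap<(String, String), String>,
}

impl FileIndex {
    /// Validate the upload, then record the filename against the CID.
    fn register(
        &mut self,
        owner: &str,
        filename: &str,
        data: &[u8],
        cid: &str,
    ) -> Result<(), AddError> {
        if data.is_empty() {
            return Err(AddError::Empty);
        }
        if data.len() > MAX_UPLOAD_BYTES {
            return Err(AddError::TooLarge { size: data.len() });
        }
        let key = (owner.to_string(), filename.to_string());
        // Assumption: a name can only be registered once per owner.
        if self.entries.contains_key(&key) {
            return Err(AddError::NameTaken);
        }
        self.entries.insert(key, cid.to_string());
        Ok(())
    }

    /// Find the CID previously registered under this owner and filename.
    fn lookup(&self, owner: &str, filename: &str) -> Option<&str> {
        self.entries
            .get(&(owner.to_string(), filename.to_string()))
            .map(String::as_str)
    }
}
```

With something like this in front of IPFS, a caller can ask for "alice's notes.txt" instead of having to know the CID up front, which is the indexability gap described in point 1.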

First, you will need to run a node. When the node starts, it will start an embedded IPFS node as well, so don’t worry about starting an external IPFS daemon.

Note: Currently, this is technically a fork of the rs-ipfs/substrate fork. However, I have synced it with the master branch of Substrate. I intend to update the repository to reflect this (that is, in the future it will be a fork of the paritytech/substrate repository).

Run the latest docker image:

docker pull driemworks/substrate
docker run -p 9944:9944 \
-p 9933:9933 \
-p 30333:30333 \
-p 9615:9615 \
-it \
--rm \
--name node-template \
driemworks/substrate \
--dev \
--ws-external \
--rpc-external

Alternatively, you could build the fork from source (assuming you have Rust installed) using:

git clone https://github.com/mystery-team/substrate.git
cd substrate
git checkout offchain_ipfs_v3
cargo +nightly build --release
./target/release/node-template \
--base-path /tmp/alice \
--chain local \
--alice \
--port 30333 \
--ws-port 9944 \
--rpc-port 9933 \
--rpc-cors all \
--node-key 0000000000000000000000000000000000000000000000000000000000000001 \
--validator \
--ws-external \
--rpc-external \
--rpc-methods=unsafe

Running multiple local nodes:

Run the “bob” node and use the “alice” node as a bootstrap node. If running multiple nodes on the same machine make sure to map the ports properly. A node’s address will be printed somewhere within the first 15 lines or so of the node logs.
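One way to keep the port mappings straight is to shift every host port by the node's index while leaving the container-side ports alone. The sketch below does that for the four ports used in the docker command above. The function name port_flags and the "shift by index" scheme are assumptions for illustration, not part of the project.

```rust
/// Container-side ports from the docker command above:
/// WebSocket, HTTP RPC, p2p, and Prometheus metrics.
const CONTAINER_PORTS: [u16; 4] = [9944, 9933, 30333, 9615];

/// Build the `-p host:container` flags for the n-th local node
/// (0 = alice, 1 = bob, ...). Hypothetical helper; it only shifts the host
/// side, so every container keeps listening on its default ports.
/// Intended for small indices only (no overflow handling).
fn port_flags(node_index: u16) -> Vec<String> {
    CONTAINER_PORTS
        .iter()
        .map(|port| format!("-p {}:{}", port + node_index, port))
        .collect()
}
```

For node index 1 this yields host ports 9945, 9934, 30334 and 9616, one above the defaults used by the first node.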

Example: if the “alice” node has the address /ip4/127.0.0.1/tcp/30333/p2p/12D3KooWEyoppNCUx8Yx66oV9fJnriXwCcXwDDUA2kj6vnc6iDEp, then run:

docker run -p 9945:9944 \
-p 9934:9933 \
-p 30334:30333 \
-p 9616:9615 \
-it \
--rm \
--name node-template-bob \
driemworks/substrate \
--dev \
--ws-external \
--rpc-external \
--bootnodes /ip4/127.0.0.1/tcp/30333/p2p/12D3KooWEyoppNCUx8Yx66oV9fJnriXwCcXwDDUA2kj6vnc6iDEp
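The value passed to --bootnodes packs three things into one string: the IP address, the TCP (p2p) port, and the peer ID of the node to bootstrap from. As a rough illustration, the sketch below splits an address of exactly that shape into its parts. BootnodeAddr and parse_bootnode are hypothetical names, and the parser deliberately handles only the /ip4/.../tcp/.../p2p/... form shown here, not multiaddrs in general.

```rust
use std::net::Ipv4Addr;

/// The parts of an address of the form /ip4/<ip>/tcp/<port>/p2p/<peer id>.
#[derive(Debug, PartialEq)]
struct BootnodeAddr {
    ip: Ipv4Addr,
    tcp_port: u16,
    peer_id: String,
}

/// Parse the one address shape used in this article; anything else is None.
fn parse_bootnode(addr: &str) -> Option<BootnodeAddr> {
    let parts: Vec<&str> = addr.split('/').collect();
    // The leading '/' produces an empty first element.
    match parts.as_slice() {
        ["", "ip4", ip, "tcp", port, "p2p", peer_id] if !peer_id.is_empty() => {
            Some(BootnodeAddr {
                ip: ip.parse().ok()?,
                tcp_port: port.parse().ok()?,
                // The peer ID is kept as an opaque string, not decoded.
                peer_id: peer_id.to_string(),
            })
        }
        _ => None,
    }
}
```

A quick check like this can catch a truncated copy and paste from the node logs (for example, an address missing its /p2p/ suffix) before the second node is started.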

The UI repo: https://github.com/mystery-team/ui

The UI is available at: https://gateway.pinata.cloud/ipfs/QmWynDkcPFaGq9wsFki4LxTqrpg66nMtfXojYmKQaPdvum/

The user interface lets you connect to your Substrate node via the exposed WebSocket. You can then add data to IPFS and download it again, preserving local file names in both directions. Behind the scenes, Substrate invokes an offchain worker (OCW) that uses rust-ipfs to call IPFS. When the call completes, the OCW submits a signed transaction that calls an on-chain function, which emits the results from the OCW (the CID, the file bytes, etc.).
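The round trip described above (request recorded on chain, offchain worker calls IPFS, result reported back and emitted) can be simulated in plain Rust without any Substrate dependencies. Everything in this sketch is a hypothetical stand-in: Chain, AddRequest, FileAdded, the Ipfs trait and FakeIpfs are illustrative names, the "signed transaction" is just a method call, the CIDs are placeholders and not real content hashes, and the event here carries only the CID and names rather than the file bytes.

```rust
use std::collections::VecDeque;

/// Hypothetical request recorded on chain when a user asks to add data.
#[derive(Debug)]
struct AddRequest {
    owner: String,
    filename: String,
    bytes: Vec<u8>,
}

/// Hypothetical event emitted once the worker has reported back.
#[derive(Debug, PartialEq)]
struct FileAdded {
    owner: String,
    filename: String,
    cid: String,
}

/// Stand-in for the embedded IPFS node: stores bytes and returns a CID.
trait Ipfs {
    fn add(&mut self, bytes: &[u8]) -> String;
}

/// Fake IPFS that hands out predictable placeholder CIDs.
#[derive(Default)]
struct FakeIpfs {
    stored: Vec<Vec<u8>>,
}

impl Ipfs for FakeIpfs {
    fn add(&mut self, bytes: &[u8]) -> String {
        self.stored.push(bytes.to_vec());
        format!("fake-cid-{}", self.stored.len())
    }
}

/// Stand-in for chain state: a queue of pending requests plus emitted events.
#[derive(Default)]
struct Chain {
    pending: VecDeque<AddRequest>,
    events: Vec<FileAdded>,
}

impl Chain {
    /// Step 1: the user's call is recorded on chain.
    fn request_add(&mut self, req: AddRequest) {
        self.pending.push_back(req);
    }

    /// Step 3: the worker's result lands on chain and an event is emitted.
    fn submit_result(&mut self, req: AddRequest, cid: String) {
        self.events.push(FileAdded {
            owner: req.owner,
            filename: req.filename,
            cid,
        });
    }
}

/// Step 2: the offchain worker drains pending requests and calls IPFS for each.
fn run_offchain_worker(chain: &mut Chain, ipfs: &mut impl Ipfs) {
    while let Some(req) = chain.pending.pop_front() {
        let cid = ipfs.add(&req.bytes);
        chain.submit_result(req, cid);
    }
}
```

The point of the split is that the slow, non-deterministic work (talking to IPFS) happens outside block execution, while only its result is recorded and broadcast through the chain.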

You can open multiple windows to connect to several nodes at once. File uploads should show up in the UI in real time across nodes (e.g. when alice adds a file, bob’s UI should automatically reflect it). There are no permissions and no explicit search functionality; this simply associates a filename with a CID. All data is stored within the runtime storage at some point, a major flaw that will be addressed in the next iteration.

Again, this is all very early work toward my ultimate goal. My next step is to modify the Substrate fork to enable ownership of a CID and to implement dynamic conditions for accessing owned data (requesting access, purchasing access, access management, etc.). I’m basically developing in the order of the four categories outlined in the introduction, along with other Substrate-only changes and features.
