INTfs — Protocol for encrypted, private and anonymous distributed file sharing in the INT Network

Grayblock
17 min readJul 9, 2019

--

If you would like to read this in white paper format, click here.

Abstract

Data transmission is one of the core functions of IoT. Decisions being made within the network will rely on data to do so. Existing protocols for the storing and retrieval of data in a distributed network are general in their approach and do not provide in protocol privacy controls. IPFS is the leader in this space but is focusing on decentralized Internet. Using the file hash as the requesting key, building a picture of who-requests-what can lead to a breakdown of anonymity. INTfs is a specialized approach to decentralized storage, building a bridge between INT and IPFS, that incorporates a protocol for non-interactive asymmetric encryption, blinded data requests, and Proof-of-Storage for node compensation with the INT Chain as the transactional foundation.

Introduction

Overview and Goals

IPFS provides a solid foundation for distributed data hosting and transfer. It can be used as a personal cloud, file transfer protocol or as the backend of HTTP for a decentralized internet. It combines the BitTorrent distribution of files with Merkle DAG data structures providing a trustless and robust base for versioned file systems or blockchains. It allows you to upload any file, regardless of format, and share or access it by its hash. Because IPFS does not require the ability to read the file, the user is free to upload encrypted files but offers no framework or key indexing system to do so.

Other projects have sought out to use IPFS as its underlying protocol to produce a distributed cloud service, but again these networks rely on the users own technical ability to obtain the encryption keys and encrypt the file themselves before uploading. In order to incentivize the participation of storage nodes in these networks, they introduce their own proprietary token for fees and rewards, requiring users to obtain this token in order to use the service.

INTfs seeks to solve these issues by providing a framework for indexing encryption keys, encrypting files and messages, and storing data on top of a live asset transaction-based network, INT Chain. Utilizing the existing node network for distributed storage and the consensus mechanism for securing the file system, ensuring synchronicity and distributing fees, we can provide all the benefits of other distributed cloud networks, cheaply and without a secondary token. This creates a private and anonymous distributed file system with minimal trust within an existing network.

INT Chain as the Transactional Foundation

INT Chain is an IoT focused DPoS blockchain with a heterogeneous multichain framework enabling high throughput and ease of scaling with the addition of subchains. The transactional structure is account based like Ethereum with limited scripting ability. Because it is not structured on UTXO based transactions, it allows several transaction types for staking, voting and registering nodes. This creates an open structure for the addition of transaction types and other mechanisms for publishing information onto the blockchain. The DPoS consensus mechanism allows a 10 second block time, rotating through a pool of 13 “Thearchy” nodes acting as validators operating a single chain at 2,000 TPS in its current state. INT’s plan is to increase the number of validation nodes to support the addition of subchains, with each subchain’s logic structure defined to support the use case (e.g. additional transaction throughput, IoT device specific transactions, smart contracts, confidential transactions, etc.).

We assume the current state of INT Chain to be the transactional foundation and that it can accept the transaction formats proposed within this paper as well as the additional subchain for storing the proofs associated with this protocol. The Thearchy nodes would assume the responsibility of the storing and serving of files.

Data Encryption

Encrypting data can either be done with symmetric keys or asymmetric keys. Asymmetric encryption schemes allow you to encrypt data with a public key, removing the need to communicate with the recipient beforehand as you need to do with symmetric encryption schemes. The problems with asymmetric encryption are the computation times required to encrypt/decrypt and the size of the keys required. Symmetric keys are much smaller but require sharing the key. What is more practical is to use a hybrid system of symmetric key encryption with asymmetric encryption for sharing of the key used to encrypt.

Fig. 1 Relative size between EC and RSA key sizes for like security [Mart10]

Key size is the next concern. Smaller keys enable easier data management, lower hardware requirements, reduce the bandwidth needed for transmission and reduced energy consumption in mobile devices. Elliptic-curve cryptography lends significantly shorter keys for the same security level over other cryptosystems like RSA. For a 128-bit security level, which is strong enough for this application, the RSA key length is 3072 bits, while an ECC key is 256 bits.

Elliptic Curve Integrated Encryption Scheme (ECIES)

ECIES is a hybrid encryption system proposed by Victor Shoup in 2001 [Shoup01]. It combines a Key Encapsulation Mechanism (KEM) with a Data Encapsulation Mechanism (DEM) to provide the speed of symmetric encryption and the security of asymmetric encryption with the efficiency of elliptic curve key generation.

It uses the following functions:

  • Key Agreement (KA): Function used by two parties to create a shared secret.
  • Key Derivation Function (KDF): Function used to create asymmetric key pairs.
  • Hash: Digest function.
  • Symmetric Encryption (ENC): Encryption using a single key to both encrypt and decrypt
  • Message Authentication Code (MAC): Information used to authenticate a message.

Being that public-private key pairs will be generated ahead of time and published, this process can be done wholly by the sending party without any communication between them.

Using this protocol, to encryptand communicate a message, m:

Fig. 2 ECIES encryption process [Mart10]
  1. We first generate a new public-private key pair called the ephemeralkey, represented as private part uand public key U = uG.
  2. Using the Key Agreement protocol, we create a shared secret by taking the ephemeral private key and multiply it by the recipient’s public key, V.
  3. Take the shared secret key, uV, as input data for the KDF, with the output being a concatenation of the MAC key, k_mac, and the encryption key, k_ENC.
  4. Encrypt message m, using the symmetric encryption algorithm ENC with key, k_ENC. The result is ciphertext, c.
  5. Use the MAC function with ciphertext, c, and MAC key, k_mac, to produce atag.
  6. Concatenate the ephemeral public key, the tag, and ciphertext, (U||tag||c) and send this cryptogram to the recipient.

For the recipient to decryptmessage m:

Fig. 3 ECIES decryption process [Mart10]
  1. Separate the cryptogram into its constituent parts, U, tag, and c.
  2. Use the retrieved ephemeral public key, U, and the recipients private key, v, to retrieve the shared secret, u: vU=(v·u)G=uV
  3. Produce the encryption and MAC keys, k_ENC’and k_mac’, using the KDF in the same process as step 3 above.
  4. Use the MAC function with ciphertext from the cryptogram, c, and MAC key, k_mac’, to produce tag’. If tag’does not equal tag, produce failure as MAC verification failed. If equal, continue.
  5. Decrypt ciphertext c using the symmetric encryption algorithm ENC with key k_ENC. The result is the intended message m.

Standards

It is found that the following standards produce strong encryption with the fastest computation speeds[Mart15].

For Elliptic curve operations for the ephemeral key generation in the Key Agreement, the elliptic curve x over Binary Extension finite field 𝐺𝐹(2 𝑚) will be used [CPPbench]. This is beneficial for hardware implementations as its operations like addition and multiplication can be done by AND and XOR logic gates.

Key Agreement (KA): Diffie-Hellman (DH) provides benefits over others as it only requires only one scalar multiplication.

Key Derivation Function (KDF): KDF2 is widely used and accepted as standard.

HashAlgorithms: in order to maximize security and minimize memory and bandwidth usage, we will us SHA-256.

Symmetric Encryption Function (ENC): to maximize security and efficiency, we use AES-128 CTR which requires no padding and is, therefore, more efficient. Lack of padding also reduces the decryption load.

Message Authentication Code (MAC): in order to maximize security and minimize memory and bandwidth usage, we will use HMAC-SHA-256.

The INTfs protocol will use ECIES to digest hashes and encrypt/decrypt data and messages.

INTfs — Transaction Protocol

INTfs will operate as an abstraction layer tying together INT Chain and IPFS. The purpose of the protocol is not to change how IPFS works but to provide a framework that interacts with it that provides payment mechanisms for storage and retrieval of data, in a way that enables ease of data encryption with full privacy and anonymity for the receiving of data between parties. It allows you to encrypt your data without needing to know the intended receiver, pay for the retrieval of the data without having to reveal what it is you are requesting, download the file without revealing what it is you are downloading, in a way that leaks no information to onlookers on files downloaded or who requested the file.

The basic structure is as follows:

Registering Encryption Keys

Fig. 4 Register encryption key process flow
  • Two users create INT Chain public keys, download the keystore and transfer INT to them.
  • The users then create INTfs encryption keys, download the keystore and create a register transaction to publish the public part of the key to the blockchain, in effect, binding it to their INT address.

Uploading and Sending a File

Fig. 5 Encrypt, upload and send file process flow
  • User A wishes to send a file to User B. User A inputs User B’s INT address and asks User A to select the file. It then asks for how long they want the INT Network to store the file and any note they wish to include in the ciphertext.
  • The wallet queries the network for the encryption public key for User B, encrypts the selected file using the ECIES protocol specified above with User B’s encryption public key. It then stores the encrypted file locally, named by its IPFS checksum hash.
  • The wallet then encrypts the message musing ECIES with User B’s encryption public key, containing the file hash, file size, and note. It then takes the size of the file as well as how long User A wishes to store the file and determines the fee. It then prepares the upload transaction with User B’s address, cryptogram (containing the file hash), discard height, fee, and signature.
  • Once the network validates this transaction, the wallet initiates the uploading of the encrypted file into IPFS.

Requesting a File

Fig. 6 Requesting a file process flow
  • User B receives a transaction containing a cryptogram c, ECIES then decrypts the cryptogram with User B’s private key to reveal the file hash, size, and note.
  • User B wishes to then download the file. They create a request transaction with the file hash and how many times they wish to download it.
  • User B’s wallet then generates a random nonce nand hashes (n||file hash). Hash(n||file hash) is the request key. It then stores the relationship between n, file hash, and request key, locally in the file library.
  • User B’s wallet then takes its public encryption key V= vG, then creates a shared secret Swith the Thearchy public encryption key (which is a 1-of-n threshold keyas laid out in [Erta07]), T = dG. e.g. S = v•T = v•(dG) = dQ
  • User B’s wallet then uses the shared secret S to encrypt message M, containing nonce n,the number of times to be downloaded x, and requested file hash using the ECIES protocol to be decrypted by the validation nodes.
  • User B’s wallet then uses file size and xto determine the download fee.
  • User B’s wallet prepares the request transaction with the cryptogram (containing n, x, and requested file hash), fee, and signature.
  • Once received by the validation nodes, they decrypt c with validation node private key to retrieve file hash, nonce and how many times the nonce is valid for. This information is stored on a shared database linking nonce to file hash.
  • Once this transaction has been validated, the request key is now valid to be used for download.

Downloading a file

Fig. 7 Downloading a file process flow
  • To execute the download, the user selects file from the file library. The wallet then sends the request key to the nodes.
  • The nodes query the blockchain to see if the ReqKeyhas been “spent”. If not, the node publishes a “spend” transaction containing just the ReqKey.
  • The node then queries the hash key database to return file hash and sends it to IPFS to initiate the download.

Fee Calculation

I haven’t been able to come up with something that satisfies all requirements. It would need to take into account file size and storage duration for file upload, as well as file size and the number of downloads for retrieval. It should be cheap enough to encourage use as this is an add-on to the network, but not too cheap as to just be an additional cost with no real reward for the nodes. Being that there are a limited number of storage nodes in the validation pool, an optimal total network storage size should be determined that would also push Thearchy node minimum requirements. The fees should in the least offset that cost.

The cost of storage for VPS is about $0.10 per GB per Month. So fees should amount to at least $1.30 per GB per Month ($0.043 per GB per Day) with the current Thearchy pool being 13 nodes.

Further work would need to be done to define minimum storage times but could be divisible by 10 seconds (by main chain block). Maximum can be indefinite storage but fee logic would need to be defined.

INTfs — Consensus and Reward

Proof-of-Storage (PoSt)

Since block generation on the main chain is theoretically much faster than any changes to the storage database, it doesn’t make much sense to check for network-wide consensus on the storage state or to payout file storage fees every main chain block. What makes more sense is defining a state check epoch on the order of minutes and storing that consensus check as a block on a secondary chain that can be used for fee payout authorization and independent verification.

The proposed proof-of-storage is a two-step process with the first step being a pruning of the IPFS database and the second step being a random sampling of Merkle tree root hashes combined to create a unique hash in a Proof-of-Work style calculation that will serve as a proof of storage.

The first part of the state machine will be dedicated to storage turnover. At the beginning of each block creation process, the nodes will check the file storage duration database and if any of the files show the current block (or less than) as the discard height, they will delete the links for that file.

The second part of the state machine will be dedicated to proving each node is storing the files over time. As this is an add-on protocol to a larger transactional network, the consensus proof algorithm doesn’t need to be fully complete for the sake of security. The nodes need to be tested randomly in such a way that it would be significantly difficult to fake a valid outcome. The risk associated with not having a more complete storage check would have no impact on asset transfers on the main chain, only on the distribution of storage rewards. So the difficulty needs to be such that it would cost more to fake than the rewards are worth.

The process is as follows, in this example, we assume the optimal PoSt block time is 10 minutes or 60 blocks on the main chain:

Fig. 8 INTfs Proof-of-Storage logic flow
  • Once the trigger block is created on the main chain, all nodes check for files that need to be discarded at (discardHeight ≤ currentHeight).
  • Once that process is complete, deterministically derive a set of challenge vertices from the main chain block hash. Query the local IPFS DAG to find the corresponding roots for all challenge points. Return roots.
  • Begin PoW puzzle with the four roots and a nonce. Difficulty should be such that the average node response time with a solution ~1 minute.
  • If the node is the current block producer, send a complete solution as a proposal to all nodes to validate work.
  • If the node is not the current block producer, send a complete solution to the block producing node to validate work.
  • Once all nodes validate the block producer’s work, sign message and send to the block producer.
  • Block producer validates all proposals, checking the work of all nodes independently by hashing the given nonce with the calculated challenge points. Note valid or invalid state.
  • Once all signatures have been received, create a block with proposer’s proof, challenge roots, and all node proofs with their signature. The result is a block of proofs with the given challenge points, linking to the block hash that created them, that can be independently checked by anyone, giving a list of all nodes that should receive storage reward.

This process will follow the existing DPoS round table iteration through validators.

By not including the independently derived roots in the proposal to other nodes, you are ensured (within the bounds of this test) to identify those that have an incomplete DAG. When you validate their given hash with your points, the hashes will not match.

By requiring a PoW style hash puzzle with non-negligible difficulty and requiring a node solution response within a specified range, it makes it more difficult for nodes to collude with one another to produce false proofs.

Reward Mechanism

There may be a better solution to this but this is the best I could come up with. The fees sent in the transaction on the main chain would go into a smart contract that adds the fees and durations to define a reward calendar by block. It takes the duration, divided by the fee to determine how that fee is split into by block reward over a given period. This information is stored in a separate database.

On the creation of a PoSt block, the main chain would execute a coinbase transaction for all the addresses which provided a valid storage state proof, with the amounts given determined by the (storage reward by block) divided by (the total number of nodes to be rewarded).

On validation of the coinbase transactions, the smart contract containing the storage fee pool would burn an equal amount of INT, thereby maintaining a net zero inflation.

Implications

Creating the transactional structure as described allows us to break the information links between those doing the uploading, requesting and downloading. By encrypting the file, no one but the intended recipient can decrypt it. By sending the file hash encrypted to the recipient, no one can see what file is being sent. By encrypting the request with a nonce, no one can see what file you are requesting. By initiating the download separately by a request key, the download cannot be tied to a request transaction and is separate from a payment source.

To further break connections and strengthen privacy, the client wallet can issue requests from a different address than the ones files were sent to and initiate downloads from any software, through any gateway or VPN.

Specifying the number of times the nonce is good for, you can pay for a download en mass and distribute a link with the ReqKey. The storage fee is calculated based on the size and user decided storage duration will discourage large files and stagnant databases.

It doesn’t make sense to require you to have to be able to send transactions from the device you want to download to. By paying for the download and creating your request key separate from actually downloading it, it gives us the ability to use those keys from a separate device.

Incorporating INTfs on top of the INT network, we can tie the state machine into the consensus mechanism of the network and not have to create a standalone fault tolerant mechanism with a dedicated token. This reduces the overall cost of integration, making the service competitive in cost and simpler it implement.

As this storage and consensus protocol is separate from the main chain consensus, and throughput requirements are much less, participating nodes can include a second set of nodes, maybe a second registration for this work. this would widen the reach of the nodes geographically, increase storage and bandwidth for the whole network.

INTfs will not require the user to encrypt the files stored in the network and may even host whole directories of data.

The ECIES system can be more generally used to create asymmetric encryption for lightweight IoT data applications including MANETs.

Open Issues

Fee Calculation

Further work has to be done on determining the logic and proper pricing to achieve desired participation from both nodes and users. This is work for someone smarter than I.

Validation Node Threshold Decryption

Threshold decryption was glossed over in the Retrieval Transaction section of this paper. In order to encrypt the message in such a way that any one of the validation pool group can decrypt requires a 1-of-n threshold key.

This key requires all key-holders to share secrets in order to create an aggregate public key. The validation pool is, however, not static so there would need to be a system by which the group threshold key is recalculated and distributed upon entry of a new validator.

Attacks

In theory, it would be possible to force data into IPFS without paying the fee for storage by circumventing the upload transaction completely or by passing the software a valid transaction as proof of payment that is not for that file being uploaded. Safeguards need to be put in place to scrub the storage database for files that don’t have discard dates on them. Users will not be able to retrieve these files without payment though as hashes will not be known.

In order to fake a proof for PoSt, all one would need to do is collude with another node to receive the challenge roots in time to find a solution to the PoW within the allowed timeframe. I am sure with more thought and being clever around PoW difficulty, this can be solved or at least, significantly less probable.

IP Anonymity

In order to make this protocol more anonymous, gateways and oblivious routing could be implemented for file downloads that further separate the downloader from the memory of the network.

As transactions are passed through the network, information on who or where transactions are coming from can be leaked by where those transactions are first seen in the network nodes. This can lead to determining who-owns-what-address, simply by monitoring transactions from that address and what nodes pick them up first. Dandelion++ is a protocol that provides formally guaranteed anonymity in peer-2-peer gossiping. Implementing Dandelion++ would complete this protocol, making it a completely information-leak free, file retrieval protocol.

On Trust

The objective is to break up the transaction graph as much as possible. Being that in order for the network to provide you with a file you want, you need some way of communicating what that file is and where you want it sent. That seems to be the smallest common denominator you can break it down to. In this framework, we limit the attack vectors by encrypting every step and breaking up transaction linkage but the Thearchy nodes still hold onto some information. They know what address posted what file, what address paid to receive it and some limited information on who claimed it.

We can minimize this further by regenerating the threshold key in a defined period. This will be needed as new validators enter the pool but if it is done regularly, it would make previous epochs of encrypted request transactions no longer decryptable and therefore unable to link to a specific request key. All that would be stored is a nonce (random number) to file hash relationship.

Notes and References

ECIES general process flow was based on the work of V. Gayoso Martínez, L. Hernández Encinas, and C. Sánchez Ávila and their work presented in “A Survey of the Elliptic Curve Integrated Encryption Scheme” [Mart10]

--

--