Setting up your first distributed private storage network on IPFS: Part 3

vasa
Apr 24, 2018 · 7 min read

Welcome back to the IPFS private network series. If you are wondering why I am welcoming you back, then you should take a look at these previous posts.

Assuming that you have gone through the above posts, which I strongly recommend before diving in further, let's get started.

Composite clusters

Since ipfs-cluster provides an IPFS Proxy (an endpoint that acts like an IPFS daemon), it is also possible to use an ipfs-cluster proxy endpoint as the ipfs_node_multiaddress for a different cluster.

This means that the top cluster will think that it is performing requests to an IPFS daemon, but it is instead using an ipfs-cluster peer which belongs to a sub-cluster.

This makes it possible to scale ipfs-cluster deployments and provides a method for building ipfs-cluster topologies that may be better adapted to certain needs.

Note that this feature has not been extensively tested, but we aim to introduce improvements and fully support it in the mid-term.
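To make the idea concrete, here is a minimal sketch in Python. It assumes a flattened view of each peer's settings using the dotted key names from the Security section below (the real on-disk layout may differ), and it assumes that ipfshttp.node_multiaddress is the same setting referred to above as ipfs_node_multiaddress. The address 10.0.0.2 and the helper stack_on are hypothetical.

```python
# Hypothetical, flattened settings for one peer of each cluster.
# Key names follow the dotted form used in this article.
sub_cluster_peer = {
    "ipfshttp.proxy_listen_multiaddress": "/ip4/10.0.0.2/tcp/9095",
    "ipfshttp.node_multiaddress": "/ip4/127.0.0.1/tcp/5001",
}
top_cluster_peer = {
    "ipfshttp.proxy_listen_multiaddress": "/ip4/127.0.0.1/tcp/9095",
    "ipfshttp.node_multiaddress": "/ip4/127.0.0.1/tcp/5001",
}


def stack_on(top, sub):
    """Return a copy of `top` whose "ipfs daemon" is really `sub`'s IPFS Proxy."""
    composed = dict(top)
    composed["ipfshttp.node_multiaddress"] = sub["ipfshttp.proxy_listen_multiaddress"]
    return composed


composed = stack_on(top_cluster_peer, sub_cluster_peer)
print(composed["ipfshttp.node_multiaddress"])
```

The top cluster peer keeps its own proxy and only changes where it believes the ipfs daemon lives.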

Security

ipfs-cluster peers communicate with each other using libp2p-encrypted streams (secio) and with the ipfs daemon using plain HTTP. They also provide an HTTP API themselves (used by ipfs-cluster-ctl) and an IPFS Proxy. This means that there are four endpoints to be wary of when thinking about security:

  • cluster.listen_multiaddress defaults to /ip4/0.0.0.0/tcp/9096 and is the listening address used to communicate with other peers (mostly via remote RPC calls). This endpoint is protected by the cluster.secret value specified in the configuration. Only peers holding the same secret can communicate with each other. If the secret is empty, nothing prevents anyone from sending RPC commands to the cluster RPC endpoint and thus controlling the cluster and the ipfs daemon (at least when it comes to pin/unpin/pin ls and swarm connect operations). ipfs-cluster administrators should therefore be careful to keep this endpoint inaccessible to third parties when no cluster.secret is set.
  • restapi.listen_multiaddress defaults to /ip4/127.0.0.1/tcp/9094 and is the listening address for the HTTP API that is used by ipfs-cluster-ctl. The considerations for restapi.listen_multiaddress are the same as for cluster.listen_multiaddress, as access to this endpoint makes it possible to control ipfs-cluster and, to an extent, the ipfs daemon. By default, this endpoint listens on localhost, which means it can only be used by ipfs-cluster-ctl running on the same host. The REST API component provides HTTPS support for this endpoint, along with Basic Authentication. These can be used to protect an exposed API endpoint.
  • ipfshttp.proxy_listen_multiaddress defaults to /ip4/127.0.0.1/tcp/9095. As explained before, this endpoint offers control of ipfs-cluster pin/unpin operations and access to the underlying ipfs daemon. This endpoint should be treated with at least the same precautions as the ipfs HTTP API.
  • ipfshttp.node_multiaddress defaults to /ip4/127.0.0.1/tcp/5001 and contains the address of the ipfs daemon HTTP API. The recommendation is to run IPFS on the same host as ipfs-cluster. This way it is not necessary to make the ipfs API listen on anything other than localhost.
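The checklist above can be turned into a quick audit. The following Python sketch is not part of ipfs-cluster. It assumes you have already loaded your settings into a flat dictionary keyed by the dotted names used above, and it only understands plain /ip4/<ip>/tcp/<port> multiaddresses. The audit helper is hypothetical.

```python
import ipaddress

# The four endpoints discussed above, in the dotted form used in this article.
ENDPOINT_KEYS = (
    "cluster.listen_multiaddress",
    "restapi.listen_multiaddress",
    "ipfshttp.proxy_listen_multiaddress",
    "ipfshttp.node_multiaddress",
)


def multiaddr_ip(addr):
    """Extract the IP from a simple /ip4/<ip>/tcp/<port> multiaddress."""
    parts = addr.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "ip4" or parts[2] != "tcp":
        raise ValueError(f"unsupported multiaddress: {addr}")
    return ipaddress.ip_address(parts[1])


def audit(settings):
    """Return human-readable warnings for endpoints that deserve a second look."""
    warnings = []
    for key in ENDPOINT_KEYS:
        if key in settings and not multiaddr_ip(settings[key]).is_loopback:
            warnings.append(f"{key} is not a localhost address: {settings[key]}")
    if not settings.get("cluster.secret"):
        warnings.append("cluster.secret is empty: the RPC endpoint is unprotected")
    return warnings


defaults = {
    "cluster.listen_multiaddress": "/ip4/0.0.0.0/tcp/9096",
    "restapi.listen_multiaddress": "/ip4/127.0.0.1/tcp/9094",
    "ipfshttp.proxy_listen_multiaddress": "/ip4/127.0.0.1/tcp/9095",
    "ipfshttp.node_multiaddress": "/ip4/127.0.0.1/tcp/5001",
    "cluster.secret": "",
}
for warning in audit(defaults):
    print(warning)
```

A warning for cluster.listen_multiaddress is expected in a real cluster, since peers must reach each other. It becomes a problem mainly when it is combined with an empty secret.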

Upgrading

ipfs-cluster persists the shared state to disk. Therefore, any upgrade must make sure that the old on-disk format is still compatible and can be parsed correctly. If it is not, a message will be printed along with instructions on how to upgrade. We offer here a few more details.

The state format has not changed

In this case, upgrading cluster requires stopping all cluster peers, updating the ipfs-cluster-service binary and restarting them.

When the version numbers change, peers running different versions will not be able to communicate as the libp2p protocol that they use is tagged with the version. If you are running untagged releases (like directly from master), then you should be able to run peers built from different commits as long as they share the same x.x.x version number. Version numbers are only updated when an official release happens.
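As a rough illustration of that rule, the sketch below compares only the leading x.x.x part of two version strings. The exact shape of version strings for untagged builds is an assumption here, and can_talk is a hypothetical helper, not something ipfs-cluster provides.

```python
import re


def release_number(version):
    """Return the leading x.x.x part of a version string as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"no x.x.x version found in {version!r}")
    return tuple(int(part) for part in match.groups())


def can_talk(version_a, version_b):
    """Peers are expected to communicate only if their x.x.x numbers match."""
    return release_number(version_a) == release_number(version_b)


print(can_talk("0.3.5", "0.3.5"))
print(can_talk("0.3.4", "0.3.5"))
```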

The state format has changed

In this case, we need to perform a state upgrade. ipfs-cluster-service should refuse to start if the state format is incompatible with the new release. This procedure is a bit experimental, so we recommend saving the list of your pinset (ipfs-cluster-ctl --enc=json pin ls) before attempting it.

In order to perform the upgrade, you need to stop all peers. You can then either remove or rename the ipfs-cluster-data folder on all peers except one and perform the upgrade procedure on that remaining peer, or perform the upgrade procedure on all of them.

To update the state format, run ipfs-cluster-service state upgrade. This:

  • Reads the last Raft snapshot
  • Migrates to the new format
  • Backs up the ipfs-cluster-data folder and creates a new snapshot in the new format.

On the next run, ipfs-cluster-service should start normally. Any peers with a blank state should pick it up from the migrated ones as the Raft Leader sends the new snapshot to them.
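If you prefer to script these steps, a minimal Python sketch could look like the following. It only wraps the two commands quoted above. The function names, the backup file name and the injectable run parameter are my own additions, and stopping the peers is left to you.

```python
import subprocess

# The two commands quoted in this section.
PIN_LS = ["ipfs-cluster-ctl", "--enc=json", "pin", "ls"]
STATE_UPGRADE = ["ipfs-cluster-service", "state", "upgrade"]


def save_pinset(path, run=subprocess.run):
    """Dump the pinset as JSON to `path`. Run this while the peer is still up."""
    result = run(PIN_LS, check=True, capture_output=True, text=True)
    with open(path, "w") as backup:
        backup.write(result.stdout)


def upgrade_state(run=subprocess.run):
    """Migrate the on-disk state. Run this only after all peers are stopped."""
    run(STATE_UPGRADE, check=True)


# Intended order (not executed here):
#   save_pinset("pinset-backup.json")   # 1. while the cluster is running
#   ...stop every peer...               # 2. manually or with your service manager
#   upgrade_state()                     # 3. on the peer(s) that keep their data
```

Passing run as a parameter keeps the functions easy to dry-run with a stand-in before touching a real cluster.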

Debugging

By default, ipfs-cluster-service prints only INFO, WARNING and ERROR messages. Sometimes, it is useful to increase verbosity with the --loglevel debug flag. This will make ipfs-cluster and its components much more verbose. The --debug flag will make ipfs-cluster, its components and its most prominent dependencies (raft, libp2p-raft, libp2p-gorpc) verbose.

ipfs-cluster-ctl offers a --debug flag which will print information about the API endpoints used by the tool. The --enc json option prints the raw JSON responses from the API.

Interpreting debug information can be tricky. For example:

18:21:50.343 ERROR   ipfshttp: error getting:Get http://127.0.0.1:5001/api/v0/repo/stat: dial tcp 127.0.0.1:5001: getsockopt: connection refused ipfshttp.go:695

The above line shows a message of ERROR severity, coming from the ipfshttp facility. This facility corresponds to the ipfshttp module, which implements the IPFS Connector component. This information helps narrow down the context the error comes from. The error message indicates that the component failed to perform a GET request to the ipfs HTTP API. The log entry ends with the file and line number at which the error was logged.
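When you have many such lines, it can help to split them into their parts automatically. The Python sketch below is based only on the single sample line above, so treat the regular expression as an assumption about the log layout rather than a specification of it.

```python
import re

# Layout inferred from the sample line: time, level, facility, message, file:line.
LOG_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<facility>[\w-]+):\s+"
    r"(?P<message>.*)\s+"
    r"(?P<file>\S+\.go):(?P<line>\d+)$"
)

SAMPLE = (
    "18:21:50.343 ERROR   ipfshttp: error getting:Get "
    "http://127.0.0.1:5001/api/v0/repo/stat: dial tcp 127.0.0.1:5001: "
    "getsockopt: connection refused ipfshttp.go:695"
)


def parse_log_line(text):
    """Return the parts of a log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(text.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["line"] = int(entry["line"])
    return entry


entry = parse_log_line(SAMPLE)
print(entry["level"], entry["facility"], entry["file"], entry["line"])
```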

When discovering a problem, it will probably be useful if you can provide some logs when asking for help.

Peer is not starting

When your peer is not starting:

  • Check the logs and look for errors
  • Are all the listen addresses free or are they used by a different process?
  • Are other peers of the cluster reachable?
  • Is the cluster.secret the same for all peers?
  • Double-check that the addresses in cluster.peers and cluster.bootstrap are correct.
  • Double-check that the rest of the cluster is in a healthy state.
  • In some cases, it may help to delete everything in the consensus data folder (especially if the reason for not starting is a mismatch between the raft state and the cluster peers). Assuming that the cluster is healthy, this will allow the non-starting peer to pull a clean state from the cluster Leader when bootstrapping.
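For the second item in the list, a quick way to see whether the listen ports are already taken is to try binding to them yourself. This is a minimal Python sketch that assumes the default ports quoted in the Security section. port_is_free and busy_ports are hypothetical helpers, and the check should be run while your cluster peer is stopped.

```python
import socket

# Default ports quoted earlier in this article.
DEFAULT_PORTS = {9094: "REST API", 9095: "IPFS Proxy", 9096: "cluster RPC"}


def port_is_free(host, port):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


def busy_ports(host="127.0.0.1", ports=DEFAULT_PORTS):
    """Return the subset of `ports` that some other process already holds."""
    return {port: name for port, name in ports.items() if not port_is_free(host, port)}


if __name__ == "__main__":
    for port, name in busy_ports().items():
        print(f"port {port} ({name}) is already in use")
```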

Peer stopped unexpectedly

When a peer stops unexpectedly:

  • Make sure you haven’t simply removed the peer from the cluster or triggered a shutdown
  • Check the logs for any clues that the process died because of an internal fault
  • Check your system logs to find if anything external killed the process
  • Report any application panics, as they should not happen, along with the logs

ipfs-cluster-ctl status <cid> does not report CID information for all peers

This is usually the result of a desync between the shared state and the local state, or between the local state and the ipfs state. If the problem does not autocorrect itself after a couple of minutes (thanks to auto-syncing), try running ipfs-cluster-ctl sync [cid] for the problematic item. You can also restart your node.

libp2p errors

Since cluster is built on top of libp2p, many errors that new users face come from libp2p and have confusing messages which are not obvious at first sight. This list compiles some of them:

  • dial attempt failed: misdial to <peer.ID XXXXXX> through ....: this means that the multiaddress you are contacting has a different peer in it than expected.
  • dial attempt failed: connection refused: the peer is not running or not listening on the expected address/protocol/port.
  • dial attempt failed: context deadline exceeded: this means that the address is not reachable or that the wrong secret is being used.
  • dial backoff: same as above.
  • dial attempt failed: incoming message was too large: this probably means that your cluster peers are not sharing the same secret.
  • version not supported: this means that your nodes are running different versions of raft/cluster.
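The list above works well as a lookup table. Here is a small Python sketch that matches a log message against those fragments and returns the corresponding hint. The wording of the hints is paraphrased from the list, and explain is a hypothetical helper.

```python
# (fragment to look for, likely cause), taken from the list above.
LIBP2P_HINTS = [
    ("misdial to", "the multiaddress contains a different peer than expected"),
    ("connection refused", "the peer is not running or not listening on the expected address/protocol/port"),
    ("context deadline exceeded", "the address is not reachable or the wrong secret is being used"),
    ("dial backoff", "the address is not reachable or the wrong secret is being used"),
    ("incoming message was too large", "the cluster peers are probably not sharing the same secret"),
    ("version not supported", "the nodes are running different versions of raft/cluster"),
]


def explain(message):
    """Return the likely cause for a libp2p error message, or None if unknown."""
    for fragment, hint in LIBP2P_HINTS:
        if fragment in message:
            return hint
    return None


print(explain("dial attempt failed: context deadline exceeded"))
```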

Congratulations! You have just completed IPFS cluster Part 1, Part 2 and Part 3. Stay tuned, because I will soon be publishing:

Setting up a 2 node private storage network on IPFS: Part 4

Setting up a multi-node private storage network on IPFS: Part 5



towardsblockchain

Revolutionizing the business and establishing trust using distributed ledger technology.


Written by

vasa

Entrepreneur | Co-founder @TowardsBlockChain, an MIT CIC incubated startup | Speaker | https://vaibhavsaini.com
