Tutorial: Setting up an IPFS peer, part III
Making sense of the many IPFS configuration options
Welcome to the next installment of our ongoing Tutorial Series on “Setting up an IPFS Peer”! This time around, we’re going to piece together some of the IPFS repository (repo) configuration options to help us customize our IPFS peer node. While the IPFS devs have come up with a smart set of default options, certain configurations will be more useful in certain contexts, so it’s important to figure out what will work best for your particular application. So if you haven’t done so already, take a quick spin through the first post in our series to ensure you have a working understanding of IPFS peer nodes. For today’s tutorial, you won’t need a cloud-based peer, but you are welcome to use that one to experiment with.
Putting the pieces together…
The IPFS config file is a JSON-formatted text file, located in your IPFS repo (/data/ipfs/config if you followed our tutorial). It has a number of options for controlling your IPFS repo and daemon, including how your peer is addressed by other peers, what peers it connects to by default (bootstrap peers), how it stores and represents data (files), how it discovers other peers, and a whole lot more. The main components include Addresses, API, Bootstrap, Datastore, Discovery, Routing, Reprovider, Gateway, Identity, and Swarm. In the remainder of this section, we’ll briefly explain the purpose of each of these entries.
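To get a feel for the file's shape, here is a minimal Python sketch that parses a config document and lists its top-level entries. The embedded JSON is a trimmed, illustrative stand-in (a real config has many more keys), and the helper name top_level_sections is ours, not part of any IPFS tooling.

```python
import json

# Trimmed, illustrative stand-in for an IPFS config file (assumption:
# a real file contains many more entries than shown here).
SAMPLE_CONFIG = """
{
  "Addresses": {
    "Swarm": ["/ip4/0.0.0.0/tcp/4001", "/ip6/::/tcp/4001"],
    "API": "/ip4/127.0.0.1/tcp/5001",
    "Gateway": "/ip4/127.0.0.1/tcp/8080"
  },
  "Bootstrap": [],
  "Datastore": {"StorageMax": "10GB"}
}
"""


def top_level_sections(config_text):
    """Return the sorted names of the top-level entries in a config document."""
    return sorted(json.loads(config_text).keys())


sections = top_level_sections(SAMPLE_CONFIG)
print(sections)
```

To try this against your own peer, you could read the text from your repo's config file instead of the embedded sample.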
The config file stores a few different address types (Swarm, API, and Gateway), each of which uses the multiaddr addressing format. These addresses are commonly modified or tweaked, so make sure you are comfortable with the following concepts.
Swarm addresses are addresses that the local daemon will listen on for connections from other IPFS peers. You should try to ensure that these addresses can be dialed from a separate computer and that there are no firewalls blocking the ports you specify.
API address is the address that the daemon will serve the HTTP API from. This API is used to control the daemon through the command line (or via curl if you’re feeling adventurous). Unlike the
Swarm addresses, you should ensure that the
API address is not dialable from outside of your machine, or potentially malicious parties may be able to send commands to your IPFS daemon.
Gateway address is the address that the daemon will serve the gateway interface from. The gateway may be used to view files through IPFS, and serve static web content. This port may or may not be dialable from outside your machine; that’s entirely up to you. The
Gateway address is optional; if you leave it blank, the gateway server will not start.
The Addresses config option also specifies the Announce and NoAnnounce array options. Both can be empty. The first specifies the
Swarm addresses to announce to the network. If left empty, the daemon will announce inferred swarm addresses (based on your public IP address, open ports, etc). Conversely, the
NoAnnounce option specifies the array of swarm addresses not to announce to the network. You might use these options if you want greater control over how your peer-to-peer connections work.
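As a quick sanity check on the advice above, the following Python sketch tests whether an API address is reachable only from the local machine. It is a deliberately simplified parser that assumes plain /ip4/<host>/tcp/<port> or /ip6/<host>/tcp/<port> strings; it is not a full multiaddr implementation, and the helper names are hypothetical.

```python
import ipaddress


def multiaddr_host(addr):
    """Extract the IP address from a simple /ip4/... or /ip6/... multiaddr string."""
    parts = addr.strip("/").split("/")
    if len(parts) < 2 or parts[0] not in ("ip4", "ip6"):
        raise ValueError(f"unsupported multiaddr: {addr}")
    return ipaddress.ip_address(parts[1])


def is_local_only(addr):
    """True when the address binds to a loopback interface only."""
    return multiaddr_host(addr).is_loopback


for candidate in ("/ip4/127.0.0.1/tcp/5001", "/ip4/0.0.0.0/tcp/5001"):
    print(candidate, "local only:", is_local_only(candidate))
```

An API address that fails this check is listening on an interface other machines may be able to reach, which is exactly what the warning above is about.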
The API config entry is a little bit simpler. It contains information (settings) to be used by the API gateway. Essentially, your daemon is running a lightweight HTTP server that will respond to client (e.g., IPFS commands, curl) requests. The
HTTPHeaders sub-entry (currently the only entry under the
API config option) is a map of HTTP headers to set on responses from your API HTTP server. You might want to edit these settings if you need to allow additional access control methods, or require authorization headers, etc.
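For illustration, here is a small Python sketch that adds one response header to the API section of a config held in memory. It assumes header values are stored as lists of strings, and the origin http://localhost:3000 is a made-up example; check your own config file for the exact shape before editing it.

```python
import json


def with_api_header(config, name, values):
    """Return a copy of config with one API.HTTPHeaders entry set."""
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    headers = updated.setdefault("API", {}).setdefault("HTTPHeaders", {})
    headers[name] = list(values)
    return updated


config = {"API": {"HTTPHeaders": {}}}
# Hypothetical origin, used only as an example value.
updated = with_api_header(
    config, "Access-Control-Allow-Origin", ["http://localhost:3000"]
)
print(json.dumps(updated["API"], indent=2))
```

Returning a copy rather than mutating the input makes it easy to compare the before and after states before writing anything back to disk.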
The Bootstrap config array specifies the list of IPFS peers that your daemon will connect to on startup. The default values for this are the ipfs.io bootstrap nodes, which are a set of virtual private servers distributed around the world. If you want to run your own private IPFS network, you might want to change this to your own set of IPFS peers, or simply add other peers under your control.
The Datastore config option contains a bunch of information related to the construction and operation of the on-disk storage system, that is, how your repo stores data that you’ve
pinned and accessed. Other than the following storage size options, you’re probably going to want to leave this section alone.
The StorageMax option is a soft upper limit on the size of your IPFS repository’s datastore. In other words, it is how much disk space your repo is allowed to consume. Related to this, the
StorageGCWatermark option is the percentage of the
StorageMax value at which a garbage collection will be triggered automatically if the daemon was run with automatic gc enabled (that option currently defaults to false). The default is currently 90%. A third related option is the
GCPeriod, which is the time duration (default is 1 hour) specifying how frequently to run a garbage collection. Again, this is only used if automatic gc is enabled.
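To make the relationship between the size limit and the watermark concrete, here is a rough Python sketch that computes the repo size at which automatic garbage collection would kick in. The size parser assumes decimal units (1 GB = 10^9 bytes) and only handles the suffixes listed; both are simplifying assumptions, and the function names are ours.

```python
# Assumption: decimal size suffixes; the daemon's own parser may accept more forms.
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text):
    """Convert a string such as '10GB' into a number of bytes."""
    text = text.strip().upper()
    for suffix, factor in UNITS.items():
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * factor)
    return int(text)  # no recognised suffix: treat as a plain byte count


def gc_threshold_bytes(storage_max, watermark_percent):
    """Repo size (in bytes) at which automatic gc would be triggered."""
    return parse_size(storage_max) * watermark_percent // 100


threshold = gc_threshold_bytes("10GB", 90)
print(threshold)
```

With a 10GB limit and the 90% watermark described above, the threshold works out to 9,000,000,000 bytes.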
Other options in this section include the
BloomFilterSize, which is a number representing the size (in bytes) of the blockstore’s bloom filter. Leaving this at zero disables this feature. So why might you want to turn on your repo’s bloom filter? First, bloom filters are space-efficient probabilistic data structures used to test whether an element is a member of a set. In the case of IPFS, a bloom filter can be used to speed up blockstore lookups (checking for specific hashes). For now, there’s limited data on what value to specify here, though there are tools to help you calculate the optimal size. Unless you know what you’re doing, you can leave this at zero.
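If you are curious what such a calculation looks like, the sketch below applies the textbook bloom filter sizing formula, m = -n ln(p) / (ln 2)^2 bits for n items and false-positive rate p, and converts the result to bytes. This is the generic formula rather than anything specific to how IPFS sizes its filter, and the block count and error rate used here are made-up example values.

```python
import math


def bloom_filter_bytes(expected_items, false_positive_rate):
    """Textbook optimal bloom filter size, rounded up to whole bytes."""
    if expected_items <= 0 or not 0 < false_positive_rate < 1:
        raise ValueError("need expected_items > 0 and 0 < rate < 1")
    bits = -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)
    return math.ceil(bits / 8)


# Hypothetical repo: one million blocks, 1% false-positive rate.
size = bloom_filter_bytes(1_000_000, 0.01)
print(size)
```

For these example inputs the result is a little under 1.2 MB, and lowering the acceptable false-positive rate increases the size.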
The last option in this section is the
Spec, which defines the structure of the IPFS datastore. It is a composable structure, where each datastore is represented by a JSON object. Unless you really know what you are doing, you should probably leave this one alone! For more information on possible values for this configuration option, see docs/datastores.md.
The Discovery config option is pretty important. After all, you want other peers to be able to discover your peer, right? So it is important to properly configure your node discovery mechanisms. By default, multicast DNS peer discovery (
MDNS) is turned on. This is useful for enabling peer discovery on your local network. If you are running IPFS on machines with public IPv4 addresses, then you should probably just disable this. Relatedly, the
Interval option controls the number of seconds to wait between discovery checks for MDNS.
This brings us to the content
Routing mode. How does your peer node actually find and access content? Today, IPFS uses a Kademlia-based distributed hash table (
dht option), and continues to learn from DHT research. Essentially a DHT is a decentralized distributed system that provides a (key, value) lookup service, and any participating node can efficiently retrieve the value associated with a given key. In the IPFS world, this means a peer uses the DHT to find peers who have a copy of the file (value) they are looking for via its CID hash (key). The peer can then connect to those peers, and download the file from them directly. See our discussion of the distributed web and content addressing for a more complete discussion of these ideas.
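The core idea behind Kademlia-style lookups is that keys and peer IDs live in the same ID space, and "closeness" is measured with XOR. The toy Python sketch below uses tiny integer IDs to show how the peers nearest to a key are selected; real IDs are derived from hashes and are far larger, and the peer names here are invented.

```python
def xor_distance(a, b):
    """Kademlia's distance metric: bitwise XOR of two IDs, read as an integer."""
    return a ^ b


def closest_peers(target, peers, count):
    """Return the names of the `count` peers whose IDs are XOR-closest to target."""
    ranked = sorted(peers, key=lambda name: xor_distance(peers[name], target))
    return ranked[:count]


# Toy 4-bit ID space with invented peers.
peers = {"a": 0b0001, "b": 0b0100, "c": 0b0111}
nearest = closest_peers(0b0101, peers, 2)
print(nearest)
```

In this example the distances from the key 0101 are 4, 1, and 2 for peers a, b, and c respectively, so b and c are the two closest.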
By default, your node will act as a DHT node. This means it will store and serve small bits of data to the network. This is how IPFS distributes content: IPNS records, content provider records (who has what content), peer address records (to map peer IDs to IP addresses), etc. This usually doesn’t take up that much memory. However, constantly answering DHT queries can significantly increase CPU usage. One can set the
Routing mode to
dhtclient, which doesn’t serve requests to the IPFS network, saving bandwidth. Here, you are essentially not participating in the DHT (i.e., your peer is not a DHT node).
Directly related to
Discovery is the
Reprovider entry. Here, we can control the time (
Interval) between rounds of reproviding local content to the routing system. If unset, it defaults to 12 hours. If we set it to “0”, content reproviding is disabled altogether. Disabling content reproviding means other nodes on the network will not be able to discover that you have the objects you are storing. If you want to have this disabled and keep the network aware of what you have, you will have to manually announce your content periodically. If you leave it enabled (a good idea), then you can choose a
Strategy for deciding what should be announced. The
Strategy can be one of:
"all" (default) which announces all stored data,
"pinned", which only announces pinned data, or
"roots", which will only announce directly pinned keys and root keys of recursive pins.
Similarly to the API options, the Gateway options control the HTTP gateway. Again, we can control the
HTTPHeaders to set on gateway responses. By default, an HTTP gateway for IPFS only supports the HTTP GET method. This allows you to fetch a resource by its hash and, if the hash is a directory, by the path from that directory to a named file. If you enable the
Writable flag for a gateway, it gains the ability to understand the HTTP POST, PUT, and DELETE methods. This allows clients to add data to IPFS, but doesn’t trust them with the full daemon API. You can enable this mode by setting Writable to
true in the daemon configuration, or by passing the
--writable flag on the daemon's command line. Additionally, the
Gateway config entry allows you to specify a URL (
RootRedirect) to which requests for
/ will be redirected.
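As a sketch of what "fetch a resource by its hash" looks like from a client's point of view, the Python helper below builds the URL you would request with GET. It assumes the gateway serves content under the /ipfs/ path prefix; the base address, the hash QmExampleHash, and the file path are placeholders rather than real content.

```python
from urllib.parse import quote


def gateway_url(base, content_hash, path=""):
    """Build a gateway URL for a hash, optionally followed by a path inside it."""
    url = f"{base.rstrip('/')}/ipfs/{content_hash}"
    if path:
        # Percent-encode the path but keep the slashes between segments.
        url += "/" + quote(path.strip("/"))
    return url


# Placeholder hash and path, for illustration only.
url = gateway_url("http://127.0.0.1:8080", "QmExampleHash", "docs/read me.txt")
print(url)
```

Requesting the resulting URL with a browser or curl would then return the named file, provided the hash refers to a directory that contains it.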
This one’s easy. When you run something like
ipfs id, you’ll get output that contains the peer’s
Identity information. The two main entries in the config file are
PeerID, which is the unique PKI identity label for this config’s peer. This is set on
init and never read. It is merely stored in the config for convenience. IPFS will always generate the
PeerID from its keypair at runtime. Similarly, the
PrivKey is a base64-encoded protobuf describing (and containing) the node’s private key. This is not something you can change or control.
Finally, we come to the
Swarm entry. Options for configuring the swarm include
AddrFilters, which is an array of address filters (multiaddr netmasks) to which you want to filter dials. What does this mean? Basically, using this config setting (it is empty by default), you can restrict peer connections to certain IP address ranges. For example, one might want to exclude all IPv4 peers, and all IPv6 link-local peers to avoid some connection issues.
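To show how a netmask-style filter behaves, here is a small Python sketch that checks an IP address against a list of filters. It assumes filters are written in the form /ip4/<network>/ipcidr/<prefix-length> (or the /ip6/ equivalent) and that a match means the address is excluded; the helper names and sample ranges are ours.

```python
import ipaddress


def filter_network(addr_filter):
    """Turn a filter such as /ip4/10.0.0.0/ipcidr/8 into an ip_network object."""
    parts = addr_filter.strip("/").split("/")
    if len(parts) != 4 or parts[0] not in ("ip4", "ip6") or parts[2] != "ipcidr":
        raise ValueError(f"unsupported filter: {addr_filter}")
    return ipaddress.ip_network(f"{parts[1]}/{parts[3]}")


def is_filtered(ip, addr_filters):
    """True when ip falls inside any of the filtered ranges."""
    address = ipaddress.ip_address(ip)
    for addr_filter in addr_filters:
        network = filter_network(addr_filter)
        if address.version == network.version and address in network:
            return True
    return False


# Sample ranges: a private IPv4 block and IPv6 link-local addresses.
filters = ["/ip4/10.0.0.0/ipcidr/8", "/ip6/fe80::/ipcidr/10"]
print(is_filtered("10.1.2.3", filters), is_filtered("8.8.8.8", filters))
```

Running candidate addresses through a check like this before editing the config is a cheap way to confirm that a filter covers the range you intended and nothing more.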
NAT traversal techniques are required for many network applications, such as peer-to-peer file sharing. However, in some locations (e.g. data-centers) you don’t need NAT discovery. You can disable NAT discovery by setting DisableNatPortMap to
true. You can also enable
DisableBandwidthMetrics, so that IPFS does not keep track of bandwidth usage. Doing this may lead to a slight performance improvement, as well as a reduction in memory usage. So if you don’t need it, this is a good one to tweak.
Two other Swarm config options that I like to have enabled on all my peer nodes are p2p-circuit relay transport support (set DisableRelay to
false), and hop relaying (set
EnableRelayHop to true). If
EnableRelayHop is enabled, the node will act as an intermediate (Hop Relay) node in relay circuits for connected peers. What does all this mean? Circuit relaying gives peers a way to connect indirectly to other peers that they cannot reach directly, either because of NAT or because of protocol incompatibilities, such as browser-based (
js-ipfs) peers connecting to desktop (
go-ipfs) peers. This is a nice feature to have if you want to be able to relay information for browser-based Dapps.
Other than these Swarm settings, you might also want to tweak your connection manager configuration (
Swarm.ConnMgr). For instance, you can adjust your
LowWater count, which is the minimum number of peer connections to try to maintain. Similarly, you could adjust your
HighWater line, which is the number of connections that, when exceeded, will trigger a connection garbage cleanup operation (i.e., it will drop some connections). Finally, your
GracePeriod is a time duration (default is
"20s") that new connections are immune from being closed by the connection manager. In low power situations, you might want to kick the
GracePeriod up to 1 minute, but drastically reduce the
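To see how the three connection manager settings interact, here is a toy Python model of a single trim pass. It is a sketch under stated assumptions, not the daemon's actual algorithm: it assumes trimming aims to bring the count back down toward LowWater, and it closes the oldest eligible connections first, whereas the real connection manager may rank connections quite differently.

```python
def connections_to_close(connections, now, low_water, high_water, grace_period):
    """Pick peers to disconnect in one trim pass.

    connections: list of (peer_name, opened_at) pairs, times in seconds.
    Nothing is closed unless the count exceeds high_water, and connections
    younger than grace_period are never closed.
    """
    if len(connections) <= high_water:
        return []
    eligible = [c for c in connections if now - c[1] >= grace_period]
    eligible.sort(key=lambda c: c[1])  # assumption: oldest connections go first
    excess = len(connections) - low_water
    return [peer for peer, _ in eligible[:excess]]


# Invented peers: "d" connected 5 seconds ago and is still within its grace period.
conns = [("a", 0), ("b", 10), ("c", 20), ("d", 95)]
closed = connections_to_close(conns, now=100, low_water=2, high_water=3, grace_period=20)
print(closed)
```

Here four connections exceed the high-water mark of three, so the model closes two of them to get back to the low-water mark, and the brand-new connection to peer d is spared.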
There are a number of other entries in the IPFS config file, including
Experimental features, settings to control publishing
Ipns records, and even FUSE
Mount point configuration options. We won’t go over those options here, as they are more complex than your average IPFS user needs, or are changing frequently (particularly the
Experimental features). For those interested in learning more, we highly recommend you check out the Experimental pubsub features.
And there we have it! We’ve covered pretty much all the critical bases. There’s a lot to unpack here, so feel free to jump back up to a particular section, refer back here later, and generally use this post as a guide for tweaking your peer node. In our next tutorial, we’ll be peeking under the hood of IPFS daemon profiles, so you can get a better understanding of how profiles control configuration options behind the scenes.
In the meantime, why not check out some of our other stories, or sign up for our Textile Photos waitlist to see what we’re building with IPFS, or even drop us a line and tell us what cool distributed web projects you’re working on. We’d love to hear about it!