Tutorial: Setting up an IPFS peer, part III
Making sense of the many IPFS configuration options
Welcome to the next installment of our ongoing Tutorial Series on “Setting up an IPFS Peer”! This time around, we’re going to piece together some of the IPFS repository (repo) configuration options to help us customize our IPFS peer node. While the IPFS devs have come up with a smart set of default options, certain configurations will be more useful in certain contexts, so it’s important to figure out what will work best for your particular application. So if you haven’t done so already, take a quick spin through the first post in our series to ensure you have a working understanding of IPFS peer nodes. For today’s tutorial, you won’t need a cloud-based peer, but you are welcome to use that one to experiment with.
Putting the pieces together…
The IPFS config file is a JSON-formatted text file, located in your IPFS repo (/data/ipfs/config if you followed our tutorial). It has a number of options for controlling your IPFS repo and daemon, including how your peer is addressed by other peers, what peers it connects to by default (bootstrap peers), how it stores and represents data (files), how it discovers other peers, and a whole lot more. The main components include Addresses, API, Bootstrap, Datastore, Discovery, Reprovider, Gateway, Identity, and Swarm. In the remainder of this section, we’ll briefly explain the purpose of each of these entries.
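To get a feel for the overall layout, the short Python sketch below parses a config and lists its top-level sections. The sample JSON is a heavily trimmed, illustrative stand-in rather than a real config file, and the helper name top_level_sections is our own.

```python
import json

# A heavily trimmed, illustrative stand-in for an IPFS config file.
# A real config contains many more entries than this.
SAMPLE_CONFIG = """
{
  "Addresses": {"API": "/ip4/127.0.0.1/tcp/5001"},
  "Bootstrap": [],
  "Datastore": {"StorageMax": "10GB"},
  "Swarm": {"AddrFilters": null}
}
"""


def top_level_sections(config_text):
    """Return the sorted names of the top-level entries in a JSON config."""
    return sorted(json.loads(config_text).keys())


print(top_level_sections(SAMPLE_CONFIG))
```

To try this against your own peer, read the text of the config file in your repo (for example /data/ipfs/config if you followed our tutorial) and pass it to the same function.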
Addresses
The config file stores a few different address types (Swarm, API, Gateway), each of which uses the multiaddr addressing format. These addresses are among the most commonly tweaked settings, so make sure you are comfortable with the following concepts.
Swarm addresses are the addresses that the local daemon will listen on for connections from other IPFS peers. You should try to ensure that these addresses can be dialed from a separate computer and that there are no firewalls blocking the ports you specify.
The API address is the address that the daemon will serve the HTTP API from. This API is used to control the daemon through the command line (or via curl if you’re feeling adventurous). Unlike the Swarm addresses, you should ensure that the API address is not dialable from outside of your machine, or potentially malicious parties may be able to send commands to your IPFS daemon.
The Gateway address is the address that the daemon will serve the gateway interface from. The gateway may be used to view files through IPFS, and to serve static web content. This port may or may not be dialable from outside your machine; that’s entirely up to you. The Gateway address is optional: if you leave it blank, the gateway server will not start.
The Addresses config option also specifies Announce and NoAnnounce array options. Both can be empty. The first specifies the Swarm addresses to announce to the network. If left empty, the daemon will announce inferred swarm addresses (based on your public IP address, open ports, etc.). Conversely, the NoAnnounce option specifies the array of swarm addresses not to announce to the network. You might use these options if you want greater control over how your peer-to-peer connections work.
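As a rough illustration, here is what an Addresses entry might look like as a Python dictionary, along with a small check that the API address only listens on the loopback interface. The values resemble the usual defaults but should be treated as assumptions, and is_loopback_only is a hand-rolled helper that only understands simple /ip4 or /ip6 multiaddrs; it is not part of any multiaddr library.

```python
import ipaddress

# Values resembling the usual defaults (an assumption); check your own config.
ADDRESSES = {
    "Swarm": ["/ip4/0.0.0.0/tcp/4001", "/ip6/::/tcp/4001"],
    "API": "/ip4/127.0.0.1/tcp/5001",
    "Gateway": "/ip4/127.0.0.1/tcp/8080",
    "Announce": [],
    "NoAnnounce": [],
}


def is_loopback_only(multiaddr):
    """True if a simple /ip4|ip6/<addr>/... multiaddr binds to loopback."""
    parts = multiaddr.strip("/").split("/")
    if len(parts) < 2 or parts[0] not in ("ip4", "ip6"):
        raise ValueError(f"unsupported multiaddr: {multiaddr}")
    return ipaddress.ip_address(parts[1]).is_loopback


# The API should stay local; Swarm addresses should be reachable by others.
print("API is local only:", is_loopback_only(ADDRESSES["API"]))
print("Swarm is local only:", is_loopback_only(ADDRESSES["Swarm"][0]))
```

A check like this is handy before exposing a peer to the public internet, since an API address bound to 0.0.0.0 would accept commands from anyone who can reach the port.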
API
The API config entry is a little bit simpler. It contains information (settings) to be used by the API gateway. Essentially, your daemon is running a lightweight HTTP server that will respond to client (e.g., IPFS commands, curl) requests. The HTTPHeaders sub-entry (currently the only entry under the API config option) is a map of HTTP headers to set on responses from your API HTTP server. You might want to edit these settings if you need to allow additional access control methods, or require authorization headers, etc.
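For instance, a common reason to touch HTTPHeaders is to allow a web app on another origin to talk to the API. The sketch below builds such a map in Python and prints it as JSON. The origin http://localhost:3000 is a made-up example, cors_headers is a hypothetical helper, and we are assuming that each header maps to a list of values.

```python
import json


def cors_headers(allowed_origins, allowed_methods=("GET", "POST", "PUT")):
    """Build an HTTPHeaders map; each header name maps to a list of values."""
    return {
        "Access-Control-Allow-Origin": list(allowed_origins),
        "Access-Control-Allow-Methods": list(allowed_methods),
    }


# Hypothetical origin for a locally served web app.
api_section = {"HTTPHeaders": cors_headers(["http://localhost:3000"])}
print(json.dumps({"API": api_section}, indent=2))
```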
Bootstrap
The Bootstrap config array specifies the list of IPFS peers that your daemon will connect to on startup. The default values for this are the ipfs.io bootstrap nodes, which are a set of VPS servers distributed around the world. If you want to run your own private IPFS network, you might want to change this to your own set of IPFS peers, or simply add other peers under your control.
Datastore
The Datastore config option contains a bunch of information related to the construction and operation of the on-disk storage system: how your repo stores data that you’ve added, pinned, and accessed. Other than the following storage size options, you’re probably going to want to leave this section alone.
Firstly, the StorageMax option is a soft upper limit on the size of your IPFS repository’s datastore. In other words, it is how much disk space your repo is allowed to consume. Related to this, the StorageGCWatermark option is the percentage of the StorageMax value at which a garbage collection will be triggered automatically if the daemon is run with automatic gc enabled (that option currently defaults to false). The default is currently 90%. A third related option is the GCPeriod, which is the time duration (default is 1 hour) specifying how frequently to run a garbage collection. Again, this is only used if automatic gc is enabled.
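To make the relationship between StorageMax and StorageGCWatermark concrete, the sketch below computes the repo size at which an automatic garbage collection would kick in. It assumes decimal size units (1 GB = 10**9 bytes), which may not match how the daemon parses sizes, and both helper names are our own.

```python
# Assumption: sizes use decimal units (1 GB = 10**9 bytes).
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text):
    """Convert a size string such as "10GB" into a number of bytes."""
    text = text.strip().upper()
    for suffix, factor in UNITS.items():
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * factor)
    return int(text)  # no known suffix: treat as a plain byte count


def gc_threshold_bytes(storage_max, watermark_percent):
    """Repo size (in bytes) at which automatic gc would be triggered."""
    return parse_size(storage_max) * watermark_percent // 100


print(gc_threshold_bytes("10GB", 90))
```

With a StorageMax of 10GB and the 90% watermark, garbage collection would start once the repo reaches roughly 9 GB, provided automatic gc is enabled.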
Other options in this section include the BloomFilterSize, which is a number representing the size (in bytes) of the blockstore’s bloom filter. Leaving this at zero disables this feature. So why might you want to turn on your repo’s bloom filter? First, bloom filters are space-efficient probabilistic data structures used to test whether an element is a member of a set. In the case of IPFS, a bloom filter can be used to speed up blockstore lookups (checking for specific hashes). For now, there’s limited data on what value to specify here, though there are tools to help you calculate the optimal size. Unless you know what you’re doing, you can leave this at zero.
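If you do want to experiment, the textbook sizing formula for a bloom filter is m = -n * ln(p) / (ln 2)^2 bits, where n is the expected number of items and p is the false positive rate you can tolerate. The sketch below applies that formula and converts the result to bytes. Note that this is the generic formula, which assumes an optimal number of hash functions; we are not claiming it matches what the IPFS blockstore does internally, and bloom_filter_bytes is a hypothetical helper.

```python
import math


def bloom_filter_bytes(expected_items, false_positive_rate):
    """Textbook bloom filter size, m = -n * ln(p) / (ln 2)^2 bits, in bytes."""
    if expected_items <= 0 or not 0 < false_positive_rate < 1:
        raise ValueError("need expected_items > 0 and 0 < rate < 1")
    bits = -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)
    return math.ceil(bits / 8)


# Roughly 1.2 MB for one million blocks at a 1% false positive rate.
print(bloom_filter_bytes(1_000_000, 0.01))
```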
The last option in this section is the Spec, which defines the structure of the IPFS datastore. It is a composable structure, where each datastore is represented by a JSON object. Unless you really know what you are doing, you should probably leave this one alone! For more information on possible values for this configuration option, see docs/datastores.md.
Discovery
The Discovery config option is pretty important. After all, you want other peers to be able to discover your peer, right? So it is important to properly configure your node discovery mechanisms. By default, multicast DNS peer discovery (MDNS) is turned on. This is useful for enabling peer discovery on your local network. If you are running IPFS on machines with public IPv4 addresses, then you should probably just disable this. Relatedly, the Interval option controls the number of seconds to wait between discovery checks for MDNS.
This brings us to the content Routing mode. How does your peer node actually find and access content? Today, IPFS uses a Kademlia-based distributed hash table (the dht option), and continues to learn from DHT research. Essentially, a DHT is a decentralized distributed system that provides a (key, value) lookup service, in which any participating node can efficiently retrieve the value associated with a given key. In the IPFS world, this means a peer uses the DHT to find peers who have a copy of the file (value) they are looking for via its CID hash (key). The peer can then connect to those peers, and download the file from them directly. See our discussion of the distributed web and content addressing for a more complete treatment of these ideas.
By default, your node will act as a DHT node. This means it will store and serve small bits of data to the network. This is how IPFS distributes content: IPNS records, content provider records (who has what content), peer address records (to map peer IDs to IP addresses), etc. This usually doesn’t take up that much memory. However, constantly answering DHT queries can significantly increase CPU usage. One can set the Routing mode to dhtclient, in which case your peer doesn’t serve requests to the IPFS network, saving bandwidth and CPU. Here, your peer can still query the DHT to find content, but it is not itself acting as a DHT node.
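In the config file, the routing mode lives under the Routing entry as Routing.Type. The sketch below shows one way to flip it to dhtclient on an in-memory copy of the config. The set_option helper is hypothetical and simply mimics setting a dotted key; the starting value of "dht" is an assumption about your current config.

```python
import copy


def set_option(config, dotted_key, value):
    """Return a copy of config with a nested key such as "Routing.Type" set."""
    updated = copy.deepcopy(config)
    node = updated
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        # Create intermediate sections if they do not exist yet.
        node = node.setdefault(name, {})
    node[leaf] = value
    return updated


config = {"Routing": {"Type": "dht"}}  # assumed starting point
client_config = set_option(config, "Routing.Type", "dhtclient")
print(client_config)
```

Working on a copy keeps the original settings around, which makes it easy to compare before and after or to roll back. On a live peer, the same change can be made with the ipfs config command, followed by a daemon restart.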
Reprovider
Directly related to Discovery is the Reprovider entry. Here, we can control the time (Interval) between rounds of reproviding local content to the routing system. If unset, it defaults to 12 hours. Setting it to the value “0” disables content reproviding altogether. Disabling content reproviding means that other nodes on the network will not be able to discover that you have the objects that you have. If you want to have this disabled and keep the network aware of what you have, you will have to manually announce your content periodically. If you leave it enabled (a good idea), then you can choose a Strategy for deciding what should be announced. The Strategy can be one of: "all" (default), which announces all stored data; "pinned", which only announces pinned data; or "roots", which will only announce directly pinned keys and root keys of recursive pins.
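A small sketch of a Reprovider entry follows, with a guard against mistyped strategy names. The reprovider_section helper is our own invention, and the "12h" duration string is an assumption about how the 12 hour default is written in the config.

```python
# The three strategies described above.
VALID_STRATEGIES = {"all", "pinned", "roots"}


def reprovider_section(interval="12h", strategy="all"):
    """Build a Reprovider entry, rejecting unknown strategy names."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown Reprovider strategy: {strategy!r}")
    return {"Interval": interval, "Strategy": strategy}


# Only announce pinned content, using the assumed default interval.
print(reprovider_section(strategy="pinned"))
```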
Gateway
Similarly to the API options, Gateway options control the HTTP gateway. Again, we can control the HTTPHeaders to set on gateway responses. By default, an HTTP gateway for IPFS only supports the HTTP GET method. This allows you to fetch a resource by its hash and, if the hash is a directory, by the path from that directory to a named file. If you enable the Writable flag for a gateway, it gains the ability to understand the HTTP POST, PUT, and DELETE methods. This allows clients to add data to IPFS, but doesn’t trust them with the full daemon API. You can enable this mode by setting Gateway.Writable to true in the daemon configuration, or by passing the --writable flag on the daemon's command line. Additionally, the Gateway config entry allows you to specify a URL (RootRedirect) to which requests for / will be redirected.
Identity
This one’s easy. When you run something like ipfs id, you’ll get output that contains the peer’s Identity information. The two main entries in the config file are PeerID and PrivKey. PeerID is the unique PKI identity label for this config’s peer. It is set on init and never read; it is merely stored in the config for convenience, and IPFS will always generate the PeerID from its keypair at runtime. Similarly, PrivKey is a base64-encoded protobuf describing (and containing) the node’s private key. This is not something you can change or control.
Swarm
Finally, we come to the Swarm entry. Options for configuring the swarm include AddrFilters, which is an array of address filters (multiaddr netmasks) that your peer will not dial. What does this mean? Basically, using this config setting (it is empty by default), you can keep your peer from connecting to certain IP address ranges. For example, one might want to exclude all IPv4 peers, and all IPv6 link-local peers, to avoid some connection issues.
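To illustrate how a netmask-style filter behaves, the sketch below checks whether an IP address falls inside any filter in a list. It assumes filters are written in the /ip4/<address>/ipcidr/<bits> form (and the /ip6 equivalent), and it uses Python's ipaddress module rather than a real multiaddr parser. Both helper names and the example filters are our own.

```python
import ipaddress


def filter_to_network(addr_filter):
    """Convert a filter like /ip4/10.0.0.0/ipcidr/8 into an ip_network."""
    parts = addr_filter.strip("/").split("/")
    if len(parts) != 4 or parts[0] not in ("ip4", "ip6") or parts[2] != "ipcidr":
        raise ValueError(f"unsupported filter: {addr_filter}")
    return ipaddress.ip_network(f"{parts[1]}/{parts[3]}")


def is_filtered(ip, addr_filters):
    """True if the address falls inside any of the filtered ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in filter_to_network(f) for f in addr_filters)


# Example filters: a private IPv4 range and IPv6 link-local addresses.
filters = ["/ip4/10.0.0.0/ipcidr/8", "/ip6/fe80::/ipcidr/10"]
print(is_filtered("10.1.2.3", filters))
print(is_filtered("8.8.8.8", filters))
```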
NAT traversal techniques are required for many network applications, such as peer-to-peer file sharing. However, in some locations (e.g. data-centers) you don’t need NAT discovery. You can disable NAT discovery by setting DisableNatPortMap to true. You can also DisableBandwidthMetrics, so that IPFS does not keep track of bandwidth usage. Doing this may lead to a slight performance improvement, as well as a reduction in memory usage. So if you don’t need it, this is a good one to tweak.
Another two Swarm config options that I like to have enabled on all my peer nodes are p2p-circuit relay transport support (set DisableRelay to false), and hop relaying (set EnableRelayHop to true). If EnableRelayHop is enabled, the node will act as an intermediate (Hop Relay) node in relay circuits for connected peers. What does all this mean? Circuit relaying provides peers with the means to indirectly connect to other peers that they cannot reach directly, either because of NAT or because of protocol incompatibilities, such as browser-based (js-ipfs) peers connecting to desktop (go-ipfs) peers. This is a nice feature to have if you want to be able to relay information for browser-based Dapps.
Other than these Swarm settings, you might also want to tweak your connection manager configuration (Swarm.ConnMgr). For instance, you can adjust your LowWater count, which is the minimum number of peer connections to try to maintain. Similarly, you could adjust your HighWater line, which is the number of connections that, when exceeded, will trigger a connection garbage cleanup operation (i.e., it will drop some connections). Finally, your GracePeriod is a time duration (default is "20s") that new connections are immune from being closed by the connection manager. In low power situations, you might want to kick the GracePeriod up to 1 minute, but drastically reduce the LowWater and HighWater values.
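The interplay between LowWater, HighWater, and GracePeriod can be sketched with a toy model. The function below is a simplified illustration, not the actual connection manager: it assumes that once the connection count exceeds HighWater, the oldest connections outside the grace period are closed until the count is back down to LowWater. The real implementation may rank connections differently, and connections_to_trim is a hypothetical name.

```python
def connections_to_trim(opened_at, now, low_water, high_water, grace_period):
    """Pick peers to disconnect; opened_at maps peer -> start time (seconds)."""
    if len(opened_at) <= high_water:
        return []  # nothing to do until HighWater is exceeded
    # Connections younger than the grace period are immune from trimming.
    candidates = [p for p, t in opened_at.items() if now - t >= grace_period]
    # Assumption for this sketch: close the oldest connections first.
    candidates.sort(key=lambda p: opened_at[p])
    excess = len(opened_at) - low_water
    return candidates[:excess]


opened = {"a": 0, "b": 10, "c": 50, "d": 95}
print(connections_to_trim(opened, now=100, low_water=2, high_water=3, grace_period=20))
```

In this toy run, four connections exceed a HighWater of 3, so the two oldest are dropped to get back to a LowWater of 2, while peer "d" is protected because it connected only 5 seconds ago.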
Others
There are a number of other entries in the IPFS config file, including Experimental features, settings to control publishing Ipns records, and even FUSE Mount point configuration options. We won’t go over those options here, as they are more complex than your average IPFS user needs, or are changing frequently (particularly the Experimental features). For those interested in learning more, we highly recommend you check out the Experimental pubsub features.
What’s next?
And there we have it! We’ve covered pretty much all the critical bases. There’s a lot to unpack here, so feel free to jump back up to a particular section, refer back here later, and generally use this post as a guide for tweaking your peer node. In our next tutorial, we’ll be peeking under the hood of IPFS daemon profiles, so you can get a better understanding of how profiles control configuration options behind the scenes.
In the meantime, why not check out some of our other stories, or sign up for our Textile Photos waitlist to see what we’re building with IPFS, or even drop us a line and tell us what cool distributed web projects you’re working on? We’d love to hear about it!