Decentralized code distribution for the future of open source

The code we write is one of society’s most valuable outputs. The software we develop is almost never a monolithic, independent structure that can be constructed, admired, and replaced. Instead (largely through open source) we have built an interwoven network of dependencies that links together first releases, new versions, forks, and both public and private libraries. Like the Internet itself, this network can at times be fragile, and in its current form it may be susceptible to censorship, manipulation, and destruction.

Perhaps one of the most valuable tools used to construct this complex network of software is the package manager. Package managers are how developers automate the process of incorporating external code libraries into their own projects. They simplify the network of code dependencies by providing a simple, repeatable way to specify how your code depends on an external library.

Every package manager needs a way to get the external code, and many of them solve this need by linking to an external code registry. The code registry is a service where a developer can publish their code (or metadata about their code) so that the package manager knows how and where to get it. Some registries are used broadly, while others tend to host code specific to a language or family of languages. For example, there is npm for JavaScript, RubyGems for Ruby, and PyPI (the registry behind pip) for Python, to name a few.

A centralizing force in the network of code

Package managers and their code registries have, to date, largely been created and maintained as centralized nodes in the network. Often they are run by centralized organizations and hosted on centralized servers. Your code must be on one of these servers in order to be discovered or managed with the package manager (mirrors are possible, but without users, they exist outside of the network).

While these centralized services have been invaluable in getting the open source community to where it is today, they also represent bottlenecks in the network. Through these bottlenecks, we have to trust that our code, and the code that our code depends on, will always move freely. Placing that trust may turn out to be a mistake.

What happens when the maintainers of these code registries change management or ownership? Change their roadmap or mission? Lose funding or runway? Will we be Google Reader’d? What choice did you have in GitHub’s acquisition? Does the code you want to use please every government whose borders it crosses? In the future of our code, who should stand between you and sharing or accessing code?

Decentralizing our source code

NPM Dependency Graph via Graph Commons blog post.

Now feels like the right time to be thinking about how decentralized package managers and code registries should work for the future of our code. What would that look like? A decentralized package manager would allow any developer to publish versions of their software, self-hosted or through a provider of their choice. The package manager would then allow any other developer to point to that published code and incorporate it into their projects. A number of additional features could be built on top of this basic system, including package discovery and version management. Decentralized package management systems could address a number of potential weaknesses in our current setup. The most significant result is added resistance to censorship, manipulation, and control.

On a grand scale, resistance to control is an important consideration for software developers when publishing and sharing their code. One of the clearest examples of the impact of centralized control came a few years back with the case of the missing left-pad. Many stories at the time focused on the fact that, when a single developer removed his code from npm, he broke many projects that depended on his code being available through the centralized service. What these stories largely missed, however, is why the developer removed his code in the first place: npm had transferred the name of another of his packages to a company that claimed it (also see some of the events leading up to that).

The overnight replacement for 38,000,000 websites on GeoCities. via Flickr.

Decentralizing package management could also be beneficial inside organizations that manage many of their own projects. If you work on a team with many interwoven dependencies, a decentralized package manager would give your team a robust code hosting solution that doesn’t depend on external services being perfectly reliable. As long as your developers are on the local network, there is no need to watch a registry status page (e.g. NPM or RubyGems). There are a few other tangible benefits: new versions are available instantly, your software won’t fail if one developer stops publishing their code, and you add redundancy to the external availability of your code.

Perhaps the ideal future is a mix of centralized and decentralized components, where code registries become less monolithic and are decoupled from the distribution of code.

Challenges that decentralized code doesn’t solve, yet

While moving toward a more decentralized setup may solve some big challenges, it also introduces, or amplifies, others. Mistakes, motivations, security, and ownership are all issues we need to think about in the development of the aforementioned network of code.

Mistakes are common in writing code. Sometimes, those mistakes can have pretty serious consequences, such as committing your private keys to git and then pushing that code to GitHub. A central registry can take some steps to help protect developers against these mistakes, and can support the mechanisms necessary to correct them. Still, that doesn’t stop a mistake from having big consequences when it does happen. In their most basic form, decentralized systems can make it difficult to ever retract information once it is published and shared.

Mistakes aren’t the only reason why a developer may want to retract the code that they’ve published. A developer’s motivations may change. Or perhaps they’ve identified security issues created by published code. Or maybe the code never rightfully belonged to them in the first place. In each of these cases, we currently rely on centralized systems, where removing information is much more straightforward. These aren’t unsolvable challenges in decentralized systems; they are just ones we should think through carefully as we move towards these types of systems.

Rethinking our belief in mutable digital information

The immutability of information common in decentralized systems seems to amplify some weaknesses. While these risks are real, I also wonder if we should rethink our beliefs about the mutability of digital information more broadly. Our relationship with personal data, for example, has arisen almost entirely in a world of centralized platforms (e.g. Google and Facebook), which has shaped the social understanding of digital information. These platforms have created a user experience that includes deletion of information after it is published. This has led to the widely held perception that information is mutable.

A more useful way to think about digital information is that it is always immutable. Thinking about data this way could help us find new ways to plan for mistakes and changes in motivations. Right now it seems that the default belief is that all forms of digital information are mutable to the point of deletion, which is rarely true. Whenever the true nature of digital information suddenly reveals itself to be immutable, it does so through some form of loss, shock, or disaster (e.g. Deloitte hack, Cambridge Analytica, Prism).

GX — a decentralized package manager you can use today

With all of the above issues in mind, I wanted to share a decentralized package manager that we are excited about: GX. GX tackles both package management and the decentralized distribution of code. We started using GX for Textile Photos because we rely on a bunch of other libraries (mostly published by the IPFS and libp2p teams) that use it. But after thinking more deeply about the benefits of decentralized software management, we were inspired to share our thoughts here.

Before going any deeper into GX, I should share that the developers of GX have stated pretty clearly that the project isn’t ready for prime time yet. GX is ready for early adopters. It’s great for managing your internal projects and, in some cases, distributing production-ready code.

GX allows developers to publish and use code shared over IPFS. This decentralized code hosting backend gives GX built-in resistance to centralized system failures, censorship, and tampering. IPFS addresses content by a verifiable hash of the content itself, meaning that when you request a code version, you can check that the code you get is the code you asked for (see our blog post on content addressing). Finally, by using GX and IPFS, anyone can become a host, a mirror, or a contributor. No accounts are needed.
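To make the idea of content addressing a little more concrete, here is a minimal sketch in Python. It is not how IPFS or GX are implemented: IPFS uses its own content identifier format rather than a bare SHA-256 hex digest, and the names `content_address` and `ContentStore` are made up for this illustration. The point is only that the address is derived from the bytes, so a fetched copy can be checked against the address that was requested.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Hex SHA-256 digest, used here as a simplified stand-in for an IPFS hash."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Toy in-memory store (hypothetical) keyed by the hash of what it holds."""

    def __init__(self):
        self._blobs = {}  # maps address (str) -> content (bytes)

    def publish(self, data: bytes) -> str:
        # The publisher does not choose a name; the content determines it.
        address = content_address(data)
        self._blobs[address] = data
        return address

    def fetch(self, address: str) -> bytes:
        data = self._blobs[address]
        # Re-hash before returning: a tampered blob no longer matches its address.
        if content_address(data) != address:
            raise ValueError("content does not match the requested address")
        return data
```

Publishing a new version of a package produces a new address, so a project that depends on a specific address keeps getting exactly the bytes it was built against.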

At Textile, when we are developing our projects locally, we are also running our local IPFS daemon with dependencies pinned locally (adding redundancy to the network). We also run an IPFS peer shared by our whole team that pins all of our dependencies for us, ensuring that as long as that node, or one of our local nodes, exists, the code that our project relies on will also be available. Not only will those projects be available to us, but they will be available to anyone who forks our project or builds another project that needs them. We win, you win, we all win together.
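The redundancy described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the IPFS protocol: each node is modelled as a plain dictionary from address to bytes, the address is a bare SHA-256 hex digest, and `address_of` and `fetch_from_any` are hypothetical names. It shows why a dependency stays available as long as at least one node still holds a valid copy.

```python
import hashlib


def address_of(data: bytes) -> str:
    """Simplified content address: hex SHA-256 of the bytes."""
    return hashlib.sha256(data).hexdigest()


def fetch_from_any(address, nodes):
    """Return the first verified copy of `address` found on any node.

    `nodes` is an iterable of dicts mapping address -> bytes, standing in
    for a developer's local daemon, a shared team peer, and so on.
    """
    for node in nodes:
        data = node.get(address)
        # Skip nodes that lack the content or hold a copy that fails verification.
        if data is not None and address_of(data) == address:
            return data
    raise LookupError("no reachable node holds a valid copy")


# Usage: the laptop has dropped the dependency, but the team node still pins it.
dependency = b"some library source"
addr = address_of(dependency)
laptop_node = {}
team_node = {addr: dependency}
recovered = fetch_from_any(addr, [laptop_node, team_node])
```

Because every copy is verified against the address, it does not matter which node answers; any host that pins the content is as trustworthy as any other for that request.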

The next steps for decentralizing code

It’s simple: we should explore these new models for code distribution. There is still work to be done to make them the perfect fit for the needs of the community, but this is how all great projects are born. The benefits of using decentralized code distribution are real and can already be gained by using GX (e.g. gx-go or gx-js) to link code. The weaknesses of decentralized code still need to (and will) be addressed.

The community of developers already using GX is still small. But there are already a number of great projects on it (including the work of IPFS, libp2p, and of course Textile). There is no single ledger of projects on GX, which is probably a good thing: the technology remains agnostic to issues of code discovery and package naming. There will be user-friendly solutions built soon, I’m sure. For now though, you can get an idea of who’s using it by looking at this GitHub search for projects publishing code to GX.

If you are comfortable with the potential pitfalls of decentralized code distribution and package management, we’d love to help you get started using GX. For that, you can take a look at The pioneer’s guide to GX — decentralized dependency management on IPFS.

We are excited to see where GX goes and how decentralized access to source code can change and empower new development in the future. Let us know what you think here, or on Twitter!