Forked Communities: Whose Property Is It Anyway?
Open source community forks seem to be in the news again. Companies continue to re-license their source code away from the original open source licenses and pundits complain vigorously. But this is also an interesting time to dig more deeply into the discussion. Dr. Dawn Foster gave an excellent presentation recently at an OpenUK event where she is beginning to dig into the data. Stephen O’Grady provided excellent observations (as always). I want to dig in from a different perspective.
Many in the broad community complain that these companies are abandoning the “community that made them successful,” but I think this is a bit naive. These companies stepped into a trap I have explained in the past as a business model design problem. Let’s quickly recap how the trap works in a couple of paragraphs.
In 2018, I wrote a post called “There is still NO Open Source Business Model.” In it I cover some ideas based on Clayton Christensen’s and Geoffrey Moore’s work. I talk about how a company building software essentially builds it in one of three buckets:
- Software that enables the core value proposition to paying customers.
- Software that complements that core value proposition to customers.
- Context software and tools, essentially the tools you build while building other software.
I argued that publishing context tools under an OSI-approved license is an easy decision, and the separate act of building a community has simple costs for a good ROI (e.g., Chaos Monkey from Netflix). Publishing software projects in the complement space around your core value proposition is strategic, requires careful execution, and may require a nonprofit to signal neutrality to partners to enable and encourage their participation (e.g., think Google changing the industry conversation around cloud application deployment with Kubernetes, and then the later creation of the CNCF).
But publishing the software that enables your core value proposition to customers needs to be done extraordinarily carefully. Your engineers, business team, partner channels, and customers may not see the difference clearly between the freely licensed software project and your product’s core value proposition. The distinction is captured in Theodore Levitt’s classic quote about the difference between the drill your engineers build and the holes that the customer wants drilled.
One always needs to maintain the very bright line between project (with a community) and product (with paying customers). Different conversations with different metrics. Very few companies do this well. There is no conversion ratio from community into customer. Early adopting community members are not Moore’s early adopting customers, and you are heading naked into Moore’s chasm if you make this mistake.
We’ve seen several companies re-licensing their OSI-licensed software projects over the past 6+ years. The first time I remember seeing this in a large OSI-licensed project (MongoDB in 2018) it seemed a bit surprising. MongoDB Inc. was trading publicly by then and I assumed they had “figured it out.” They were well past the chasm. But we have seen this re-licensing multiple times since then with companies public and private (CockroachDB in 2019, Couchbase in 2021, elasticsearch in 2021, the HashiCorp collection in 2023, and Redis in 2024). Each situation is a company in control of a codebase making a re-licensing decision to defend its business from competitors.
This shouldn’t be terribly shocking. These companies have customers to serve and employees to pay. And with OSI-licensed codebases enabling their core value proposition to those customers, they have a business model challenge and a messaging problem (and often a branding problem).
In 2020, a U.S. House of Representatives report, “Investigation of Competition in Digital Markets,” mentions AWS’s use of MongoDB and elasticsearch specifically while using words like “knock-off products” in the sentences [Final report published 2022, p. 274–275]. Much of the re-licensing debate revolves around these sorts of challenges that an OSI-approved license allows. Ultimately, however, the companies that are re-licensing their software wrote almost all of it and retained the licensing rights to the contribution flow, so re-licensing becomes the solution. (More on the numbers in a minute.)
Fork!
The response to re-licensing events on several occasions by a community of users has been to fork the project codebase from the last open source licensed version and to attempt to build a new community. This is where things begin to get interesting for me. First, let’s clarify a couple of ideas. We have long celebrated in the broad open source ecosystem that the right to fork a codebase is the ultimate throttle on bad behavior from a community’s leadership. That idea still stands. But if we really look at history, there just aren’t that many forked project communities over the past 30 years. Forking a community is a considerably different challenge than forking a code base on your own and living with the economic consequences of being on a brittle fork.
A group of us sat around the dinner table a few years ago trying to name all the major community forks we knew, and we could only name around eight communities:
- EGCS from gcc happened in the late 1990s. The community knit back together after ~16 months. The GNU C compiler suite was owned by the Free Software Foundation.
- The Adempiere project was forked from the Compiere code base around 2006. Compiere was owned by a company with a very small development team, and a large knowledgeable contributing community. When the company put that community into a state of crisis, they forked. Compiere was an early-stage VC-backed company that failed soon after, but the Adempiere community was still running in 2019.
- LibreOffice forked from OpenOffice, MariaDB from MySQL, Jenkins from Hudson, all caused by Oracle acquiring Sun (2009) and not caring about the original projects.
- Io.js was forked from node.js in December 2014 when community members weren’t happy with Joyent’s governance, but the codebases were merged back together by September 2015.
- Nextcloud forked in 2016 from ownCloud, which was created in 2010. This was particularly interesting as it was ownCloud’s creator (and the company CEO) that left the company and forked his own project.
I would welcome other examples. I have a working theory that community forks don’t work unless:
- The entire community is put into an existential crisis by the IP owner, and
- A respected member of the community puts their hand up to anchor the new community.
Software isn’t easy, and the engineering experience needed to build, manage, and sustain large complex software is difficult to acquire, especially when you don’t understand the nuance in a project’s code base of half a million lines or more. Building a software team is a different set of activities, and if it’s a community rather than an employed team, you can add additional nuanced activities to the team building exercises to ensure a healthy roadmap and strong contribution flow. Rage as fuel burns out quickly.
Elastic and elasticsearch
The re-licensing events causing new forked project communities become interesting as a test of the ideas of what makes a successful community fork. The first time I had to pay attention to a re-license event was the elasticsearch re-licensing. I work at Microsoft. Microsoft was an active user of elasticsearch in multiple places. Amazon announced its intentions to fork.
I reached for some rough numbers to get a sense of the size of the project and its contribution community:
- Clone the repo and run `git summary --line > raw.txt`. (This takes a while on large code bases. As in a couple of hours on an M1 Mac. N.B. `git summary` is a part of the git-extras package.)
- Trim the first few lines out of `raw.txt` and you are left with an ordered list of the number of lines of software change +/- from a contributor for a contributor id and the percentage of the codebase that number represents.
- I also get the overall size of the current code base by running `cloc .` in the repo. (The elasticsearch code base is ~2MLoC. Also, I’ve come to prefer `scc` to `cloc` because it also does cyclomatic complexity and COCOMO calculations.)
- Now you have a rough dataset with which to play.
- I made an enormous assumption that you would need to contribute at least a thousand lines to have a reasonable understanding of the code base. There is no science to that number. But it was interesting because it reduced the list of 1000+ contributors by an order of magnitude down to 120+.
- Then I spent the next half day of my life cleaning the data and digging through the contributor ids to try to identify their employer for the shorter list. This means digging through GitHub, LinkedIn, and your search engine of choice, and making the occasional educated guess. You find developers with more than a single ID while working for the same company. You find the occasional “Jane Doe” that can’t be better identified. Engineering tooling bots show up. I didn’t see any names with two IDs from two different companies and so I made the simplifying assumption that if a developer left Elastic Inc, they were no longer working on elasticsearch.
- The net result was to discover that 98.5% of the code contributed from those 120+ IDs was linked to 109 names that were employees of Elastic Inc. I am happy for more knowledgeable folks to tell me how to do a better analysis.
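The counting steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script I actually used: it assumes each trimmed row of raw.txt looks like `<lines> <contributor id> <percent>%`, and the `employer_of` mapping stands in for the hand-built list of employers. The function names, the sample contributors, and the company names are all hypothetical.

```python
import re
from collections import defaultdict

# Assumed row shape after trimming the header: "<lines> <contributor id> <percent>%".
ROW = re.compile(r"^\s*(\d+)\s+(.+?)\s+(\d+(?:\.\d+)?)%\s*$")


def parse_rows(text):
    """Return (contributor, lines) pairs for every row that matches ROW."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append((match.group(2), int(match.group(1))))
    return rows


def employer_share(rows, employer_of, min_lines=1000):
    """Percentage of lines per employer among contributors with >= min_lines."""
    totals = defaultdict(int)
    for contributor, lines in rows:
        if lines >= min_lines:
            # Contributors missing from the hand-built map are grouped as "unknown".
            totals[employer_of.get(contributor, "unknown")] += lines
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {name: 100.0 * count / grand_total for name, count in totals.items()}


if __name__ == "__main__":
    # Made-up sample data standing in for the trimmed raw.txt.
    sample = """
      9000 alice 58.4%
      5000 bob 32.5%
      1000 carol 6.5%
       400 dave 2.6%
    """
    # Hypothetical employer mapping, normally built by hand.
    employers = {"alice": "Example Inc", "bob": "Example Inc", "carol": "Other Co"}
    for name, share in employer_share(parse_rows(sample), employers).items():
        print(f"{name}: {share:.1f}%")
```

The thousand-line cutoff is kept as a parameter (`min_lines`) because, as noted above, there is no science to that number, and it is worth seeing how the employer share moves as the threshold changes.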
This rough analysis now allowed me to think about the Microsoft partner engagement and the forked community more clearly.
I suspected the AWS fork would fail because the two rules of the community fork theory didn’t hold. First, the entire community wasn’t in a state of existential crisis. There was an enormous user community, and their free use of elasticsearch didn’t change in the re-licensing. Second, it was unclear that Amazon represented that “respected member of the [contribution] community” to anchor a new community around the forked project. It was important to AWS business needs to stand up the fork, but a successful forked community wasn’t necessary.
The thousand elasticsearch contributors were rightly unhappy that their contributions were appropriated, but their contributions for the most part were each less than 1000 lines of software on a base of software that they were still using for free unless they were trying to stand up a competing service. While it is no longer an open source licensed code base, the economic deal between contributor and company is still orders of magnitude in favour of the contributor. Each contributor gave a thousand lines (or less) and received two million lines of functionality in return. They would likely never have contributed to the project if it wasn’t open source licensed, nor would they likely have used it for free in the first place. They have every reason to feel angry, but the economics are still mostly in their favour for existing solutions.
And no customer cares, or maybe the better way to say it is that customers paying for solutions may be unhappy that they can’t go to other providers for identical functionality, but it is unlikely that they start ripping and replacing working solutions unless Elastic Inc begins treating them badly. But customers aren’t community members. Keep the line between project and product bright.
HashiCorp and Redis Labs
The more recent re-licensing events from HashiCorp around their terraform tools and Redis Labs with the Redis database followed a similar pattern for me. They are much newer and smaller code bases comparatively speaking, with far fewer contributors. But in each case, with roughly half a million lines of code and a few hundred developers, the same analysis produced a small list of contributors working for the primary IP holder, and in each case employees wrote 94–95% of the code base. In each case one must ask:
- Is the entire community in an existential state of crisis?
- Is there a respected member of the community outside of the employee base that could anchor the new forked project community?
I don’t believe the terraform community was in crisis. The HashiCorp partner community (writing plugins) was mostly a business partner channel, and their world didn’t change. Customers aren’t likely to care, and again, aren’t likely to rip and replace as long as HashiCorp continues to treat them well. For all the heat in the OpenTofu manifesto signed after the license change, there was no signing member of the manifesto that I could see in the top 10% of the list of terraform contributors. So, I suspect we have a situation where the user community is soldiering on, using things for free, and the rightly angry contributors need to figure out how angry they need to be, and if that anger will translate into a viable solution long term. At the OpenUK event, a member of the OpenTofu community spoke to how hard it is to fork a community.
A similar discussion can happen around the Valkey community fork of Redis. It is unclear whether the community is in a state of crisis. It is interesting that a name that appears in the top dozen contributors of Redis put up their hand to lead the Valkey community. Only time will tell if they have the depth and community support to pull off the heavy lift of building a proper open source contributor community and engineering practices around a complex code base.
I think the involvement of an open source nonprofit (the Linux Foundation in the case of OpenTofu and Valkey) is a red herring in the discussion. Nonprofits solve a couple of problems as an open source licensed project grows. A nonprofit provides a bank account and can hold assets. This solves the liability challenges a project “in the wild” can have, but the project itself still needs to solve the software engineering work and community development activities for users, developers, and contributors that enable a project to reach the next level of growth.
Likewise, a nonprofit gives business partners an anti-trust protected way to cooperate on work to support projects, and a neutral place for a partner to bring a project and signal their desire to collaborate in neutral space. This means we need to look more closely at what the fork partners get as businesses for their customer-facing products. First, the nonprofit can support the project marketing, and each partner then draws on project assets and nonprofit messages to build customer-facing products. Second, the project engineering control may still sit with the project’s primary developers, who may again be predominantly from one company, which means a project’s consumers may be trading one company in control of a project for another company in control in the case of a fork.
Ultimately, the success of the forked project needs to be based on a good understanding of the code base, good engineering practices in the project, and a good flow of contributions from a broad group of user-developers back into the project from a healthy community perspective. These are project-level activities. If a large company or two with competing services create a forked community to support their own business service needs, then the situation becomes analogous to the direct AWS OpenSearch fork. The nonprofit engagement is somewhat irrelevant.
From a pure business perspective, the collection of partners in the new nonprofit effort needs to solve the original product’s customers’ needs better than the original company. If the new fork partners aren’t pulling customers away, then the original company re-licensing the project has lost developer community engagement but not customers, and not necessarily its free users. Creating this new open source licensed fork “solution” takes time. It also creates risks for paying customers and non-paying users, who aren’t going to rip-and-replace working solutions for unproven solutions unless there is real need.
Looking Forward
This is ultimately why I think this is a rich area for investigation. The new community forks feel different. Dawn Foster rightly pointed out several truths in her OpenUK presentation.
- If you use an open source licensed project owned and controlled by a single company, then you need to be aware that the license can change. Consumer beware.
- If you contribute to an open source licensed project owned and controlled by a single company, then that license can change. You likely signed a contribution license agreement (CLA). Many open source nonprofits have contribution license agreements and assignments, and this is a perfectly reasonable practice, but if you sign a company’s CLA, then you likely gave them the right to re-license your contribution.
Dr. Foster has begun looking at the project contribution data in far more insightful detail as a data scientist than my simple experiments with their attendant assumptions. I can’t wait to see where that work goes as she compares projects and their forks and scales the views over time. Is there a contribution percentage under which a fork succeeds or fails? Is there a number of contributors that becomes significant? It would be interesting to see if the data still exists to dig into the older forked communities to see if the “rules” can be improved.
A friend and colleague also suggested an interesting investigation would be to determine how switching costs for downstream participants enter the discussion when an upstream company has re-licensed its software. Clarity in the new license may also be a factor: was the company explicit and clear in the new license? Some of the re-licensing debates have been vague with respect to downstream companies offering any behavior that might look like competition, as opposed to a new license that is crisp. Both these ideas get into nuances about how people articulate risk and how we might measure those risks. We live in interesting times.