Successful Volunteer-based Software Projects
Introducing the “TOFFEE Metric” and community-oriented software architecture
A well-known myth of the open-source world is that in an open-source project, anyone can contribute. While theoretically possible, in reality, most contributions to open-source projects are made by someone who’s being paid to do so — either by a commercial entity or an academic institution.
However, there are some open-source projects that rely solely on contributions from volunteers. Over the past 7 years I've been working on many such projects in varying capacities. Most of that work was under the roof of the Public Knowledge Workshop, a non-profit (of which I'm a proud co-founder) which builds tools and websites to make government data (and other public data) more accessible and understandable to the general public.
During this time I saw many of these projects start, but not all of them reach the finish line. Some were abandoned by their own developers; some were simply neglected until they stopped functioning. Some functioned for a few years before dying out, while others never gained the initial momentum to take off.
It should be obvious that the key ingredient for any volunteer-based project is the volunteers — and in the context of an open-source project, we’re usually talking about having a community of developers maintaining and improving the source code.
One would think that keeping such a community alive and well is a job for community managers or developer advocates, and that developers have little to nothing to do with it. The truth of the matter is that it's quite the opposite: in my experience, the technical aspects of a project tend to have the greatest effect on the viability of the developer community surrounding it.
This post will describe these technical aspects, and what can be done by developers and software architects to help their project’s community to grow and prosper.
Before diving into the nitty-gritty details, it's important to understand one thing about software architecture: good architecture comes from having constraints. Constraints are what guide us whenever there's a trade-off between two implementation routes and we need to choose between them.
Just as an example, imagine that you are required to implement a software module. In one scenario, you have a team of one developer and a deadline of one year from today. In another, you get a team of 12 developers but only one month to complete the project. The different sets of constraints would dictate different architectures, to compensate for the lack of time or to use the larger team effectively. The resulting implementations in these two scenarios would be very different, despite the identical functional requirements.
With volunteer-based projects, developers often get the feeling that there are no constraints — you can do whatever you want, and there are no deadlines and no budget. In some cases, constraints are ‘made-up’ in an attempt to replicate something familiar from a work environment (e.g. by self-imposing ‘fake’ milestones to invigorate the team). However, having no constraints (or fake constraints) is a bad idea, as without them development tends to wander off into the wrong places — or you end up optimizing the wrong parameters.
So, what constraints do volunteer-based projects have? It's definitely not a resource constraint: most of them start with no expectation of budget, no deadlines, and no fixed, dedicated team. The answer is the community, the one constraint that is inherent to this kind of project. Just as a commercial project can't exist without a budget, a volunteer-based software project without a community will simply die.
Therefore, a software architect’s job in a community-based open-source project is to ensure a live and thriving community.
"A live and thriving community" is a rather vague goal to design for. One way to bridge the gap between that goal and concrete engineering decisions is to use a simpler and more tangible metric:
”How long does it take for a new contributor to fix a simple bug in the code and for the fix to be deployed?”
We call this metric the TOFFEE Metric: Time of Onboarding, Finding & Fixing an issue, End to End.
Although this single number might not capture every aspect of community health, in my experience there is a very strong correlation between a happy community and ease of contribution. If you think about it, it makes a lot of sense: nothing makes a contributor happier than seeing her code get used.
So let’s try to break this metric down into pieces:
- As a new contributor, I open the project’s home page for the first time.
How long does it take me to understand what the project does?
- I decide I want to contribute.
Can I quickly find an open task that’s suited for my skills and expertise?
- I found a proper task.
How fast can I understand what exactly needs to be done here?
- I know what needs to be done.
How easy is it to get the code, find the location that needs to be changed, and fix it?
- I made a modification to the code.
How much time do I spend setting up a development environment to try it out?
How simple is it to add verification tests?
- I verified my change and want to submit it.
How long does it take me to figure out how to do it?
How long do I wait for that change to be approved?
- My change was approved.
When will I see it used / released / deployed?
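Measured in practice, the metric is simply a sum of the durations of these steps. A minimal sketch in Python (the step names and example durations are illustrative, not part of any formal definition):

```python
from datetime import timedelta

# Illustrative step names; any breakdown that covers onboarding
# through deployment end to end would do.
TOFFEE_STEPS = [
    "understand_project",
    "find_task",
    "understand_task",
    "make_fix",
    "set_up_environment_and_test",
    "submit_and_get_approved",
    "see_change_deployed",
]

def toffee(durations: dict) -> timedelta:
    """Sum the time a new contributor spends on each step, end to end."""
    return sum((durations[step] for step in TOFFEE_STEPS), timedelta())

# A hypothetical contributor's journey through the steps above.
example = {
    "understand_project": timedelta(minutes=20),
    "find_task": timedelta(minutes=30),
    "understand_task": timedelta(minutes=15),
    "make_fix": timedelta(hours=2),
    "set_up_environment_and_test": timedelta(hours=1),
    "submit_and_get_approved": timedelta(days=2),
    "see_change_deployed": timedelta(days=1),
}
print(toffee(example))  # note that the bulk of it is waiting, not coding
```

Breaking the number down per step like this is what turns it into action items: in the hypothetical journey above, the waiting steps dwarf the coding ones.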
We want to reduce the time it takes to complete each of these individual steps. If you're anything like me, you can already see how the seemingly vague constraint we defined a few paragraphs above is starting to take shape as very concrete action items.
Before we dive into the details, it’s important to notice one recurring theme, common to nearly all of these steps: removing bottlenecks from the process. The inconvenient truth is that the biggest bottleneck of any open-source project is almost always the core developer team. By removing core developer involvement from the steps above (as much as reasonably possible) we’re not only improving the basic contribution experience, but we’re also freeing time for these core developers to do actual coding.
You will see that in order to unclog the ‘core-developer’ bottleneck, we will employ different kinds of solutions. Some are technical solutions, in which we find a tool that automates work for us or reduces complexity. Others are workflow solutions, in which we change the processes we use to adapt better to community needs. Finally, the hardest kind of solution is for expanding the core developer pool — and the processes and means we might employ to make that happen more easily and transparently.
So, let’s dive in —
Step (1) is all about documentation.
- It's important to remember that good documentation does not necessarily mean long documentation. We all know what it's like to open a project's home page and find ourselves wading through a vast corpus of documentation just to find out what the project is actually about.
Make sure you keep it short and to the point. Clearly separate the parts of documentation which are meant for new contributors from the ones that are meant for users or for general reference.
- One important piece of information to state in the documentation is where to ask questions. Although our aim is to create as smooth an experience as possible without the need for human interaction (humans are not always available to help), sometimes asking someone a question is the simplest way out of a problem. A link to a forum, a gitter/slack/.. channel, or a simple support email address can be invaluable at this point.
- Documentation should always be up-to-date. Keep it lean, and require that any code change include the accompanying documentation change in the same change request. Use tools to auto-generate documentation for code and APIs as much as possible.
- Newcomers are the best critics of your documentation. Encourage them to open issues on documentation problems and make sure they are fixed.
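As a small illustration of keeping reference documentation inseparable from the code, Python's standard library can render a function's reference page straight from its docstring, so the docs can never drift away from the code. The function here is a made-up example:

```python
import pydoc

def normalize_name(name: str) -> str:
    """Lowercase a dataset field name and replace spaces with underscores.

    >>> normalize_name("First Name")
    'first_name'
    """
    return name.strip().lower().replace(" ", "_")

# Render the reference page for this function from its docstring alone;
# the same mechanism powers Python's built-in help().
print(pydoc.render_doc(normalize_name, renderer=pydoc.plaintext))
```

The doctest inside the docstring doubles as a verifiable example, which keeps even the documented behaviour honest.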
Step (2) is about issue management.
- Make sure the project’s issues are publicly visible and viewable in a single location. It should be clearly documented where that is.
If you’re using more than one repository on GitHub, you should use any of the numerous services that allow managing issues from many sources in a single place.
- Make sure the priority of issues is clear and understandable. If there are specific milestones to be aware of, make sure they are mentioned in the issue itself. If there’s a CCB / Kanban / Scrum / Other project management system that you’re using to decide what needs to be worked on and when, make sure it’s transparent to everyone.
- Some issues are meant for the core developers and are not suitable for a newcomer. That’s perfectly fine, as long as you mark these issues clearly as such.
- If something is not important or not ready to be implemented, move it to the backlog. You don't want someone putting time and effort into an issue that will never be used.
- The backlog can get very long very quickly. Make sure you clearly mark what’s in the real backlog (“This is something that we’re planning to do sometime in the future”) and what’s not (“This is an interesting hypothetical idea that has no chance of being implemented any time soon”).
Which issues are good for newcomers? Here are a few tips:
- Choose tasks that are simple to explain and self-contained, nothing that requires integration with many different parts of the code.
- Don't choose anything that's on your critical path or is a dependency of other future developments.
- If you find a recurring task that is a good fit for newcomers, consider creating a 'contribution endpoint' for it.
What are Contribution Endpoints?
In many projects you often have sets of code changes which follow a very similar process. The classic example of such a task is localising your project to a new language: the process of adding a new language is pretty much the same, regardless of the actual language. This is why you will often find specific processes and tools for adding translations to a project, different from the regular contribution flow. Other examples of such tasks would be adding a new codec to a video player, adding a new effect to a graphics editor, or adding support for a new file type in a file manager; you get the gist.
You should try to identify whether your project has such tasks. If so, make sure that the contribution process for that specific sort of task is frictionless and requires as little boilerplate as possible. If there's any tooling that might make these specific tasks even easier to accomplish, consider that as well.
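One common shape for a contribution endpoint is a plugin registry: a whole contribution becomes "write one small function in one file and register it", with no need to understand the rest of the codebase. A minimal sketch, using a hypothetical video-player codec registry:

```python
# Hypothetical registry for a video player's codecs: a contributor adds
# support for a new format by writing one decode function and registering it.
CODECS = {}

def codec(extension):
    """Register a decoder for a file extension."""
    def register(decode_func):
        CODECS[extension] = decode_func
        return decode_func
    return register

# An entire contribution is just this: one decorated function in one file.
@codec(".wav")
def decode_wav(raw: bytes) -> bytes:
    return raw  # real decoding elided

def decode(filename: str, raw: bytes) -> bytes:
    """Dispatch to whichever codec claims the file's extension."""
    ext = filename[filename.rfind("."):]
    if ext not in CODECS:
        raise ValueError(f"no codec registered for {ext}")
    return CODECS[ext](raw)
```

The same pattern fits translations, effects, or file-type handlers: the core team maintains the registry and dispatch once, and every subsequent contribution touches only its own module.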
Step (3) is also about issues — and what’s in them.
- Make sure you state clearly what the required outcome of the issue is.
- Provide a few guidance tips on where to start, which module needs modification, etc.
- Use a standard template for this sort of issue. Make sure the template includes a footer with links to relevant documentation, the style guide, tutorials, or sample code.
- Avoid strict checklists (e.g. '[ ] Write tests'). Instead, link in the footer to the project's workflow and conventions documentation.
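For instance, a maintainer-side helper could generate issue bodies from the template, guaranteeing that every newcomer issue carries the same structure and footer. A sketch, with hypothetical file names:

```python
# Hypothetical helper: every newcomer issue gets the same structure and the
# same footer, with no per-issue checklist to maintain.
FOOTER = (
    "\n\n---\n"
    "New to the project? CONTRIBUTING.md describes the workflow and "
    "conventions; docs/style-guide.md covers code style."
)

def newcomer_issue(outcome: str, starting_points: list) -> str:
    """Build an issue body: required outcome, guidance tips, standard footer."""
    tips = "\n".join(f"- {tip}" for tip in starting_points)
    return f"**Required outcome**\n{outcome}\n\n**Where to start**\n{tips}{FOOTER}"

print(newcomer_issue(
    "Dates on the budget page should use ISO format",
    ["See templates/budget.html", "The formatting helper lives in utils/dates.py"],
))
```

Whether this lives in a script or simply as a saved issue template is a matter of taste; the point is that the structure is decided once, not per issue.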
Step (4) is mostly focused on managing complexity.
The time it takes a newcomer to understand your code and find the correct place to make a fix depends greatly on the complexity of the code itself and of its environment.
One of the most important tasks of a software architect in this sort of project is to make sure complexity doesn't rise above a certain limit, and to take corrective action if it does.
Complexity can arise out of many factors:
- There is too much code: sometimes it’s best to split the code into multiple loosely-linked components to manage complexity.
- There are too many tightly-coupled components, with too little functionality. Same as above, only the other way round.
- The code is too tangled, abstract or self-referencing: it could be the neatest trick in the programming book or just messy code; to a newcomer it will look the same. Keep it simple and organised, and refactor the troublesome piece of code.
- The choice of tooling or framework is esoteric, too cutting-edge, or obsolete. Don't reinvent the wheel: use industry standards for your frameworks, build scripts etc., so that newcomers know how to work with your codebase without having to learn irrelevant things too.
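Such limits can even be enforced mechanically. As one illustration (the 50-line threshold is an arbitrary assumption), a small check built on Python's standard `ast` module can flag functions that have grown too long before the complexity accumulates, for example as a CI step:

```python
import ast

def oversized_functions(source: str, max_lines: int = 50) -> list:
    """Return names of functions longer than max_lines: a crude complexity
    guardrail a CI job could run on every change request."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on Python 3.8+
            if node.end_lineno - node.lineno + 1 > max_lines:
                offenders.append(node.name)
    return offenders
```

Line counts are a blunt instrument, of course; the point is that whatever complexity limit you pick, a machine and not a core developer should be the one policing it.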
As an aside, separating the code into stand-alone components has other benefits as well. By giving more experienced contributors permission to review and merge code changes in these components, you start to build trust with them (which might end up expanding the core developer team) and take workload off the core developers.
If something goes wrong, the damage is confined to a single component. The contributors receiving these new permissions will feel empowered, have more autonomy, and will overall be more committed to the project and its goals.
Step (5) is, I think, the most neglected subject — the development environment.
Being able to run the code locally on a developer's machine is a requirement for getting any proper contribution, but facing a custom install of dependencies and tools is often the point where most contributors give up and never come back.
- Plan for multi-platform support. Having a development environment which works well on a wide variety of user machines is not something you get by chance.
The exact solution varies greatly depending on the nature of your product, but there are quite a few technical solutions for managing this today: Vagrant or Docker for 'virtualisation' on the user machine, or online IDEs (such as c9.io and others), just to name a few examples. Avoiding native libraries (or providing alternatives for development purposes) can also help in some cases.
- Managing isolated, constrained environments on the user machine is also a must: npm does it per project, Python has virtual environments, etc.
- Aim for a state where contributors can develop a single component of your product without having to install a fully working system. For example, if you have a website consisting of front-end and back-end components, don't require running the back-end locally to develop the front-end.
Providing good stubs (or pointing at online services instead) can help remove these sorts of requirements.
- If there are any data fixtures that are needed to bootstrap the local development environment, make sure they are always up-to-date and usable.
Step (6) is about automatic verification of code.
You have reached the point where new code is about to be merged into the common code base. This is the first step where core developer involvement might actually be required, so it's important to keep that involvement to the bare minimum.
- Avoid relying on written style guides: in most languages today, linters can make sure that the code conforms to a specific style.
- Rely on unit tests to make sure no functionality was broken. Verify that unit tests were added for new functionality by generating coverage reports and checking that the proposed change maintains or increases code coverage.
Any other conventions that need checking (e.g. commit message format) should also be enforced here.
- Integrate the execution of unit tests, coverage, and linting into the acceptance criteria of any code change. This is very easy to do nowadays (GitHub, travis-ci, and coveralls are some of the services I've been using).
- Code reviews should focus on the modified logic itself and nothing else. Teach the core developers to trust the automation and make it work for you; it saves everybody's time, after all.
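The acceptance policy above boils down to a mechanical predicate. A sketch (the exact inputs would come from your linter, test runner, and coverage service):

```python
def change_is_acceptable(lint_errors: int, tests_pass: bool,
                         coverage_before: float, coverage_after: float) -> bool:
    """Mechanical acceptance check for a proposed change: clean lint, green
    tests, and coverage that is maintained or increased. Only when this
    holds does a human review the logic itself."""
    return (lint_errors == 0
            and tests_pass
            and coverage_after >= coverage_before)
```

Everything this function checks is something a CI service can compute without a core developer in the loop; the human review starts only after it returns True.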
Finally, step (7) is about infrastructure.
- Any code that passes the verification steps so far should be ‘production ready’. This means that at the very least you should be able to deploy it to your staging environment / nightly builds etc.
If possible, it's even better to push it directly to production: push it to the live website, release a version on npm/pypi etc., depending on your use case.
- This deployment should be automatic; nobody should have to wait for someone to push a button.
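The whole decision of where a verified change goes can be encoded in a few lines, leaving no button for anyone to press. A sketch, with illustrative branch names and deployment targets:

```python
from typing import Optional

def deploy_target(branch: str, checks_passed: bool) -> Optional[str]:
    """Decide where a merged change is deployed, with no human in the loop.
    Branch names and targets here are illustrative assumptions."""
    if not checks_passed:
        return None           # failed verification never leaves CI
    if branch == "master":
        return "production"   # e.g. the live website, or an npm/pypi release
    return "staging"          # everything else goes to staging / nightly builds
```

A CI job would call this on every merge and trigger the corresponding deployment, so a contributor's approved change reaches users without anyone's manual intervention.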
Infrastructure should be as visible as possible to everyone:
- Make production databases world-readable (apart from tables containing private information).
- Make production server logs world-visible.
- Allow anyone to see the status of your servers, the load or individual service status.
Providing this transparency will allow contributors to debug problems occurring on the live servers independently, and might provide you with insights you weren’t aware of.
To sum up: the software architecture of a project which relies on voluntary contributions of code is very different from that of a 'regular' commercial project. The set of constraints and considerations is different, and you should optimize your architecture for a different metric, namely the liveliness of your community.
I've introduced the TOFFEE Metric, which we use to measure our ability to onboard new contributors. This simple metric correlates very well with the overall liveliness of your project's contributor community.
Finally, I’ve included some practical implementation tips on how to minimize your TOFFEE Metric and make your code and processes more agreeable to newcomers.
Here’s to many pull requests to come!