Lessons from Creating & Maintaining Open Source Projects at a Startup
On Oct 5, 2014 I made the first commit to concurrent-map (+1K stars on GitHub as of a few weeks ago). Shortly after, the lion's share was coded by Roi Lipman, our then software architect. This was in the first days of StreamRail, my first startup, where I served as CTO and Co-Founder. Since then, open sourcing projects at work has become something I usually highly recommend.
Since its debut, orcaman/concurrent-map has received over a thousand stars on GitHub from coders who work at Zillow, Amazon, OnePlus, Alibaba, Microsoft and more, merged code from 22 contributors, received dozens of pull requests (including ones from companies like Amazon), been forked 188 times, and sparked interesting discussions among some members of the Google Go team concerning additions and changes to the native version that was going to be introduced.
StreamRail had grown pretty quickly: Just a few months after launching our product, our backend had to deal with millions of concurrent connections. And since we were in the video ad-tech space, we had the strictest performance requirements — we had to have amazingly fast frontend web servers replying very quickly.
Back in 2014, Golang was not as mature as it is today, but it already demonstrated an amazing combination of great performance and productivity features. We decided to use Go as the language for all of our backends.
The Need
One issue with being an early adopter of a programming language is that its standard library may not yet support functionality that one would consider basic in high-performance web environments.
Prior to Go 1.9, there was no concurrent map implementation in Go’s stdlib. (In Go 1.9, sync.Map was introduced, yet even today sync.Map has a few key differences from our custom map implementation.)
After looking around for solutions, we found that most projects had naive implementations, “reinvent the wheel” style. Implementing a concurrent-safe map is pretty simple with Go’s locking mechanisms; building a high-performance version, however, is a different story.
So, we decided to code our own implementation. And since StreamRail had a terrific engineering culture, we decided that all projects outside our core technology would be open-sourced by default.
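To make the "pretty simple" part concrete, here is a minimal sketch of the naive approach: a plain map guarded by a single sync.RWMutex. The names SafeMap, NewSafeMap, Set and Get are made up for this illustration and are not taken from any particular project. Every write takes the same lock, which is exactly why this version struggles under heavy concurrent load.

```go
package main

import (
	"fmt"
	"sync"
)

// SafeMap is a hypothetical name for this sketch: a plain map
// protected by one read-write lock shared by all keys.
type SafeMap struct {
	mu sync.RWMutex
	m  map[string]int
}

// NewSafeMap returns an empty, ready-to-use SafeMap.
func NewSafeMap() *SafeMap {
	return &SafeMap{m: make(map[string]int)}
}

// Set stores value under key. Writers take the exclusive lock,
// so all writes are serialized regardless of the key.
func (s *SafeMap) Set(key string, value int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = value
}

// Get returns the value for key and whether it was present.
// Readers share the lock with each other but wait for writers.
func (s *SafeMap) Get(key string) (int, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[key]
	return v, ok
}

func main() {
	sm := NewSafeMap()

	// Write from many goroutines at once; the lock keeps the map consistent.
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			sm.Set(fmt.Sprintf("key-%d", i), i)
		}(i)
	}
	wg.Wait()

	v, ok := sm.Get("key-42")
	fmt.Println(v, ok)
}
```

This is correct and safe, but a single lock becomes a point of contention once many goroutines write at the same time.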
The Process
If we were going to get contributions from the community, we had to obey pretty strict standards — a build process with extensive tests, linting, build badge, etc. are a must. Pull requests must be validated automatically for sanity.
The Subtle Art of Saying No
When people contribute to your open source project, they always do that on a voluntary basis, and most of the time, their work is not being paid for by anyone (even if you are using the source code at your company and you need to change it, no one forces you to open a pull request to merge your contribution back to the source). For this reason, I have found it very important to always be polite and patient with contributions, even if they seem to make no sense or if they do not follow some guidelines stated on the repo’s homepage.
In the case of concurrent-map, very few of the 46 pull requests that were opened ended up being merged to the master branch.
Guidelines for Contributing
In order for a contribution to be merged, we asked contributors to follow these guidelines:
- Open an issue and describe what you are after (fixing a bug, adding an enhancement, etc.).
- According to the core team’s feedback on the above-mentioned issue, submit a pull request, describing the changes and linking to the issue.
- New code must have test coverage.
- If the code is about performance issues, you must include benchmarks in the process (either in the issue or in the PR).
- In general, we would like to keep concurrent-map as simple as possible and as similar to the native map as possible. Please keep this in mind when opening issues.
The last part about keeping concurrent-map as simple as possible is the biggest hurdle. For instance, setting the number of shards for the map’s internal sharding mechanism is a feature that we felt was too complex for the implementation we had in mind. Anyone looking to make such changes is free to apply them on their own fork.
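For readers unfamiliar with sharding, the sketch below illustrates the general idea: keys are hashed to one of a fixed number of independently locked sub-maps, so goroutines touching different shards do not block each other. This is an illustration only and not the library's actual code. The names ShardedMap, shard and shardFor, the choice of FNV-1a hashing, and the int value type are all assumptions made for the example. The shard count is a compile-time constant here, mirroring the decision not to expose it as a setting.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// shardCount is fixed at compile time in this sketch (assumed value),
// rather than being configurable per map.
const shardCount = 32

// shard is one independently locked slice of the key space.
type shard struct {
	mu    sync.RWMutex
	items map[string]int
}

// ShardedMap is a hypothetical name: a fixed-size set of shards.
type ShardedMap []*shard

// NewShardedMap allocates every shard up front.
func NewShardedMap() ShardedMap {
	m := make(ShardedMap, shardCount)
	for i := range m {
		m[i] = &shard{items: make(map[string]int)}
	}
	return m
}

// shardFor hashes the key (FNV-1a, an assumption for this sketch)
// and picks the shard responsible for it.
func (m ShardedMap) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return m[h.Sum32()%shardCount]
}

// Set locks only the shard that owns the key.
func (m ShardedMap) Set(key string, value int) {
	s := m.shardFor(key)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items[key] = value
}

// Get read-locks only the shard that owns the key.
func (m ShardedMap) Get(key string) (int, bool) {
	s := m.shardFor(key)
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.items[key]
	return v, ok
}

func main() {
	m := NewShardedMap()

	// Concurrent writers mostly land on different shards,
	// so they rarely wait on the same lock.
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			m.Set(fmt.Sprintf("key-%d", i), i)
		}(i)
	}
	wg.Wait()

	v, ok := m.Get("key-7")
	fmt.Println(v, ok)
}
```

Making the shard count a parameter would mean threading it through construction and every hash lookup, which is the kind of extra surface area the guideline above is meant to avoid.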
Is it a good idea to open source your company’s stuff?
One of the best things about an open source project is that it’s, well, open :-)
So putting aside the benefits of getting free contributions to your codebase, when you open source something, you will probably put some effort into not embarrassing yourself: the project will have tests, proper documentation, build badges, etc. In general, having your code open for everyone to see is a great tool for improving your code’s quality on your own.