Introducing Open-Source Indexes: Databases, Headless CMSs and Static Site Generators

Bogdan Semenov
Runa Capital
May 3, 2022

At Runa, we love open-source software. We have invested in plenty of OSS companies (Nginx, MariaDB, N8N, etc.), analysed open-source domains (OS Databases), built internal products on top of open-source components (Runa Data Platform) and even contributed back by publishing a repo with an awesome list of open-source alternatives to SaaS on GitHub. Given that this repo has earned 10,000+ stars, we are probably the world’s leading VC by this vanity metric 🙂

One of our most interesting developments in this space is the ROSS Index, which Konstantin Vinogradov created in Q2 2020 and which is updated every quarter with the top 20 fastest-growing open-source startups by GitHub stars. The ROSS Index has turned out to be a decent predictor of future funding rounds: around 40% of the early-stage startups mentioned in the index went on to raise money, more than $750M in total. There are 5 clear fundraising leaders:

  1. Airbyte raised a $150M Series B within 6 months of being featured in ROSS Q2 2021.
  2. Hugging Face raised a $40M Series B within 11 months of first being featured in ROSS Q2 2020.
  3. Pulumi raised $37.5M within 4 months of being featured in ROSS Q2 2020.
  4. Appwrite raised $37M in total within 12 months of first being featured in ROSS Q1 2021.
  5. Streamlit raised a $35M Series B within 10 months of being featured in ROSS Q2 2020.

The ROSS Index highlights which OSS products were top of mind for developers last quarter, but it has some inevitable flaws. GitHub stars are a weak measure of community strength, and manual quarterly updates hardly scale. So we decided to take a step forward and create a few more OSS indexes, which are updated automatically and based on the most important metric for open-source projects: contributors. For every repo in these indexes, we calculate daily:

  • Active contributors (AC): users who made commits in the last 12 months
  • New contributors (NC): users who made their first commit in the last 12 months
  • Total contributors (TC): users who have made commits since the repo’s inception
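
The three metrics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the indexes: the function name `contributor_metrics`, the `(author, commit_date)` input format and the 365-day reading of "the last 12 months" are all hypothetical choices.

```python
from datetime import date, timedelta


def contributor_metrics(commits, today):
    """Count AC, NC and TC for one repo.

    `commits` is an iterable of (author, commit_date) pairs, all dated
    on or before `today`. "Last 12 months" is assumed to mean 365 days.
    """
    window_start = today - timedelta(days=365)
    first_commit = {}
    last_commit = {}
    for author, day in commits:
        # Track the earliest and latest commit date per author.
        if author not in first_commit or day < first_commit[author]:
            first_commit[author] = day
        if author not in last_commit or day > last_commit[author]:
            last_commit[author] = day
    active = {a for a, d in last_commit.items() if d >= window_start}
    new = {a for a, d in first_commit.items() if d >= window_start}
    return {"AC": len(active), "NC": len(new), "TC": len(first_commit)}


sample_commits = [
    ("ann", date(2019, 1, 10)),
    ("ann", date(2022, 3, 1)),
    ("bob", date(2021, 11, 5)),
    ("cy", date(2018, 6, 1)),
]
print(contributor_metrics(sample_commits, date(2022, 5, 3)))
# prints {'AC': 2, 'NC': 1, 'TC': 3}
```

Note that every new contributor is also an active one under these definitions, so NC can never exceed AC.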

Different open-source domains have different barriers to entry: on average, it is easier to add something to a JS-based internal tool than to a C++ database. Therefore, we divide open-source projects into categories and create distinct niche-specific indexes.

Today we present the first three open-source indexes: databases, headless CMSs and static site generators. All of them are gaining momentum in the modern infrastructure space and, for instance, open-source databases have already overtaken their closed-source rivals in popularity. Let’s go!

Open-Source Database Index

We automatically source the full list of open-source databases from dbdb.io and calculate daily updated metrics based on GitHub data. Then we include only noticeable databases with 500+ GitHub stars and 3+ active contributors. The same criteria apply to the indexes of open-source headless CMSs and static site generators.
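
As a rough sketch, the inclusion criteria could be applied like this. The record fields (`name`, `stars`, `active_contributors`) and the function names are assumptions made for illustration, and the sorting step simply mirrors the way leaders are ranked in this post.

```python
MIN_STARS = 500               # threshold stated above
MIN_ACTIVE_CONTRIBUTORS = 3   # threshold stated above


def is_noticeable(repo):
    """Return True if a repo record passes both inclusion thresholds."""
    return (repo["stars"] >= MIN_STARS
            and repo["active_contributors"] >= MIN_ACTIVE_CONTRIBUTORS)


def build_index(repos):
    """Keep noticeable repos and rank them by active contributors."""
    kept = [r for r in repos if is_noticeable(r)]
    return sorted(kept, key=lambda r: r["active_contributors"], reverse=True)


# Made-up records, not real index data.
sample_repos = [
    {"name": "alpha", "stars": 1200, "active_contributors": 40},
    {"name": "beta", "stars": 300, "active_contributors": 25},   # too few stars
    {"name": "gamma", "stars": 900, "active_contributors": 2},   # too few contributors
    {"name": "delta", "stars": 5000, "active_contributors": 120},
]
print([r["name"] for r in build_index(sample_repos)])
# prints ['delta', 'alpha']
```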

We have published the Open-Source Database Index with daily updated metrics, and there are 3 clear leaders with the highest number of active contributors:

  1. ClickHouse (349 active contributors). Originated in 2009 at Yandex and now developed by ClickHouse, Inc. (valued at $2B in 2021).
  2. Spark SQL (302 active contributors). Founded in 2009 at UC Berkeley and commercialized by Databricks (valued at $38B in 2021).
  3. Elasticsearch (257 active contributors). Founded in 2010, developed and commercialized by Elastic (NYSE: ESTC, $7.2B market cap).

According to dbdb.io, around 80% of the total number of open-source database management systems originated from just 4 countries: the USA (Spark SQL, etc.), China (TiDB, etc.), the UK (QuestDB, etc.), and Germany (Prometheus, etc.).

The most widely used languages for database development are Java (30%), C++ (17%), and Go (15%). However, a chart of the total number of commits shows that Java commits have fallen since 2018, while C++ and Go commits have increased.

The number of database contributors keeps increasing, but the growth rate matured in 2014. Since 2016, the average increase has been around 12%.

Open-Source Headless CMS Index

Almost all known headless content management systems (91) are listed on jamstack.org, which can be scraped automatically. After filtering for open-source projects and enriching the data via the GitHub API, we get an excellent dataset of 40 repos.

The index of open-source headless CMSs with all important metrics is updated daily on our website, and it has 4 clear leaders by the number of active contributors:

  1. Directus (161 active contributors). Founded in 2004 in Brooklyn and raised $1M.
  2. Pimcore (134 active contributors). Founded in 2010 in Salzburg and raised $3.5M.
  3. Netlify CMS (101 active contributors). Founded in 2016 in San Francisco and developed by Netlify (valued at $2B in 2021).
  4. Strapi (100 active contributors). Founded in 2016 in Paris and raised $14M.

The most widely used languages for open-source headless CMS are JavaScript (29%), PHP (24%), and TypeScript (12%). However, over the last four years, TypeScript has become the most popular language in terms of commits (38%).

The number of headless CMS contributors is increasing annually, but the most interesting case is Strapi. Despite being the most popular open-source headless CMS in terms of stars, its main repo lost 62% of its contributors in 2021.

Open-Source Static Site Generator Index

Again, we automatically scrape the list of 333 open-source static site generators from jamstack.org and calculate metrics based on GitHub data. The Open-Source Static Site Generator Index with all crucial metrics is updated daily on our website, and there are 3 clear leaders with the highest number of active contributors:

  1. Next.js (495 active contributors). Founded in 2016 and developed by Vercel (valued at $2.5B in 2021).
  2. Gatsby (322 active contributors). Founded in 2015 in Berkeley and raised $47M.
  3. Docusaurus (279 active contributors). Founded in 2017 and developed by Facebook (NASDAQ: FB, $542.5B market cap).

The most popular languages for site generator repos are JavaScript (31%), Python (12%), and PHP (6%). JavaScript and TypeScript gain popularity year by year: in 2021, around 50% of commits in SSG repos were JavaScript commits and 15% were TypeScript commits. Python’s share has almost halved in the last 10 years.

In 2019, the growth of SSG contributors reached maturity. Gatsby contributors accounted for about half of all static site generator contributors that year. But since Gatsby’s Series A, the number of contributors to its repo has been steadily decreasing.

Conclusion

Now that we have metrics for open-source databases, headless CMSs and static site generators, we can compare them and find interesting patterns.

The largest share of open-source database code (48% of commits) is created by contributors with 3+ commits in a target repo. For open-source headless CMSs and static site generators, this type of contributor writes only around 35% of the code.
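
One way to read this metric is as the fraction of all commits in a repo that come from authors with at least three commits there. The sketch below follows that interpretation, which is an assumption, and uses a hypothetical function name and made-up data.

```python
from collections import Counter


def repeat_contributor_share(commit_authors, min_commits=3):
    """Fraction of commits made by authors with >= min_commits commits.

    `commit_authors` holds one author name per commit in a single repo.
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    repeat = sum(n for n in counts.values() if n >= min_commits)
    return repeat / total


# Made-up example: "a" has 4 commits, "b" 2, "c" 1 and "d" 3.
authors = ["a"] * 4 + ["b"] * 2 + ["c"] + ["d"] * 3
print(repeat_contributor_share(authors))
# prints 0.7
```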

Database repos keep contributors engaged better than SSGs and headless CMSs do. For instance, open-source databases retain 21% of contributors one year after their first commit, while headless CMS repos retain 17% and static site generators 9%.
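
The exact retention formula is not spelled out here, so the following is only one plausible sketch. It assumes a contributor counts as retained if they made any commit at least 365 days after their first one, and it only considers contributors whose first commit is at least a year old. The function name and input format are hypothetical.

```python
from datetime import date, timedelta


def one_year_retention(commits, today):
    """Share of eligible contributors still committing a year after starting.

    `commits` is an iterable of (author, commit_date) pairs for one repo.
    """
    first, last = {}, {}
    for author, day in commits:
        first[author] = min(day, first.get(author, day))
        last[author] = max(day, last.get(author, day))
    year = timedelta(days=365)
    # Contributors who joined less than a year ago cannot be judged yet.
    eligible = [a for a in first if first[a] + year <= today]
    if not eligible:
        return 0.0
    retained = [a for a in eligible if last[a] >= first[a] + year]
    return len(retained) / len(eligible)


sample_history = [
    ("ann", date(2019, 1, 10)), ("ann", date(2020, 6, 1)),   # retained
    ("bob", date(2020, 2, 1)),                               # one-off commit
    ("cy", date(2021, 12, 1)), ("cy", date(2022, 1, 1)),     # joined too recently
    ("dee", date(2018, 1, 1)), ("dee", date(2018, 6, 1)),    # left within a year
]
rate = one_year_retention(sample_history, date(2022, 5, 3))
print(round(rate, 2))
# prints 0.33
```

Under a different definition, for example requiring a commit specifically in the second year after joining, the numbers would come out differently.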

All these indexes with daily updated metrics are published on our website. So far, we have covered only three open-source areas: databases, headless CMSs and static site generators.

Which other open-source areas should we cover? Don’t hesitate to share your ideas with the curator, Bogdan Semenov.
