Launching Germany’s Monthly Startup Top 100

Jan Hoekman
GlassDollar
Published Jul 7, 2020

How we use startups’ digital footprint to quantify momentum.

Last month, Fabian published GlassDollar’s first Top 100 list.

The response was overwhelming, so this month I decided to write this short note on what exactly the GlassDollar ranking is, its shortcomings, and what we plan on doing with it next.

In sharing, I hope to enable you to better use the list as well as gather some feedback on how we can improve the next iteration together.

The Idea

Our business is helping corporations find and partner with the right startups to solve their problems. To do this well, and at scale, we spent the last year building a formidable database of startups: our “data engine” (as it ingests data non-stop, every day).

To deliver high-quality scouts, we need to find startups that fit a particular business context and then identify the best ones within that set. To optimize this, we started experimenting with tracking changes in a startup’s online footprint (“signals”) to gauge its momentum and rank startups in our lists accordingly. One example of a signal is the month-on-month increase or decrease in a company’s headcount.

This is by no means novel; however, dealing with so much data is neither easy nor cheap. At this point, all top VCs have a data strategy, with the leading ones spending millions yearly to develop and run their systems, like European EQT with their “Motherbrain”. Other funds pay third-party data vendors tens of thousands for “signal” data, at least those with a fund large enough to be able to afford it.

We think making our Top 100 ranking public can help make German fundraising a little more equitable.
We’d also like to democratize access to this data for angels searching for hidden gems and, more generally, for anyone who normally doesn’t have access to data like this because they don’t have a billion-dollar fund.

How GlassDollar’s Top 100 ranking is built


The base:

We aggregate startups from three sources:
- Startup databases
- German Company registrar (Handelsregister)
- First-party data (startups we contact)

Right now we have slightly more than 11k startups in Germany (pretty close to the estimated total of 12k). Through a combined human/machine QA process, we ensure that the data is up to date and correct.

The signals:

Right now we collect social media profiles, funding, and web traffic, and refresh the data every month. The month-on-month differences in values are the signals. For instance, growth in followers from 500 to 2,000 within a month suggests that something great happened.
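To make the idea of a signal concrete, here is a minimal sketch in Python. It assumes a signal is the relative change between two monthly snapshots; the function name and the follower numbers are hypothetical and only illustrate the 500 to 2,000 example above, not our production code.

```python
def month_on_month_signal(previous: float, current: float) -> float:
    """Relative change between two monthly snapshots (3.0 means +300%).

    Assumption: the signal is a relative change, not an absolute difference.
    """
    if previous <= 0:
        raise ValueError("previous value must be positive to compute relative growth")
    return (current - previous) / previous


# Hypothetical monthly follower snapshots for one startup.
followers = [500, 2000, 2100]

# One signal per pair of consecutive months.
signals = [month_on_month_signal(a, b) for a, b in zip(followers, followers[1:])]
print(signals)
```

The first value is the large jump worth investigating; the second is ordinary month-to-month movement.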

Our four data sources yield 23 signals, 12 of which we have deemed good enough to make up our rank.

Discovering the story:

Equipped with this data, the really interesting bit is figuring out the story behind a particular startup showing tremendous growth. For example:

  • “They doubled their headcount. Ah, OK, they just raised their Series A two months ago.”
  • “A sharp increase in traffic and followers. Ah, I see, they introduced a new product that people loved.”

The rank

More difficult is putting the signals together to create a sensible assessment of a company’s growth, expressed as their rank.

For example, this is an extract of GlassDollar’s own “signal fingerprint” for some selected signals in the last two months, all normalized to be in a range of 0 to 1. One “story” here is how the introduction of our first Top 100 in May resulted in a massive spike in social media following.

Figure: The last two months of selected signals tracking GlassDollar
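How the signals are scaled into the 0 to 1 range is not spelled out here. One common option is min-max scaling, sketched below in Python as an assumption rather than a description of our pipeline; the function name and sample values are hypothetical.

```python
def min_max_normalize(values: list[float]) -> list[float]:
    """Scale values linearly so the smallest maps to 0 and the largest to 1.

    Assumption: min-max scaling; the article does not name the method used.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All observations are identical, so there is no spread to scale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw follower counts over three observations.
print(min_max_normalize([100, 150, 300]))
```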

As we don’t have an easy-to-use target variable to optimize for (our Y), such as a future price in the public markets, we cannot simply feed the signals into a predictive model and let it tell us how to weight the different variables for the most accurate score.

Instead, we employed a highly iterative (and yes, inevitably also biased) approach of tweaking the variables ourselves to create the best ranking we could come up with given our three years of experience as startup analysts.

  • Weights: As an indication, we give employee growth the largest weight, and the number of likes and comments on social media posts the smallest.
  • Considering time: For our ranking, we take 6 months of historical data into account. As we’re looking for exponential growth, we weight each month according to an inverse-power distribution. Thus, over the 6 months, we weight the most recent two months the most and the earliest two months the least.
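The two ideas above can be sketched roughly as follows. This is an illustration under stated assumptions, not our actual scoring code: the exponent of the inverse-power weights, the signal names, and all weight values are hypothetical, and the six months are grouped into three two-month periods.

```python
def time_weights(n_periods: int, power: float = 1.0) -> list[float]:
    """Inverse-power weights, most recent period first, normalized to sum to 1."""
    raw = [1.0 / (age ** power) for age in range(1, n_periods + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def momentum_score(growth_by_signal: dict[str, list[float]],
                   signal_weights: dict[str, float],
                   power: float = 1.0) -> float:
    """Weighted sum over signals of their time-weighted growth values.

    growth_by_signal maps a signal name to its growth per period,
    ordered from most recent to oldest.
    """
    score = 0.0
    for name, growth in growth_by_signal.items():
        weights = time_weights(len(growth), power)
        time_weighted = sum(w * g for w, g in zip(weights, growth))
        score += signal_weights[name] * time_weighted
    return score


# Hypothetical signal weights: employee growth highest, post engagement lowest.
SIGNAL_WEIGHTS = {"employee_growth": 0.5, "follower_growth": 0.3, "post_engagement": 0.2}

# Hypothetical growth per two-month period, most recent first.
example = {
    "employee_growth": [0.20, 0.10, 0.00],
    "follower_growth": [0.50, 0.10, 0.05],
    "post_engagement": [0.30, 0.30, 0.30],
}
print(round(momentum_score(example, SIGNAL_WEIGHTS), 4))
```

A larger `power` concentrates more of the weight on the most recent period.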

Some more details on what we do:

  • Resampling: We transform the data so that every comparison covers the same one-month time frame, even if the time between two “observations” is slightly shorter or longer. We do this because we collect data throughout the month (we can’t collect all sources for every company on the 1st of every month) and still want to maintain accurate comparisons.
  • Square Root Transformation: We think many positive (but smaller) signals are a better indicator of a high-momentum company than one extraordinary signal. Example: Company A has 500% follower growth, while Company B has 150% follower growth, 50% employee growth, 40% traffic growth, and 60% more engagement on their posts. We assume that Company B is more likely to be successful, which is why we apply a square root transformation to each variable to give less weight to extremes.
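Both steps can be illustrated with a short Python sketch. The resampling formula assumes compound growth between observations, the signed square root is one way to handle negative growth, and equal weights are used for simplicity. All three are assumptions made for this example.

```python
import math


def resample_to_month(growth: float, days_between: float, month_days: float = 30.0) -> float:
    """Convert growth observed over `days_between` days into a 30-day equivalent.

    Assumption: growth compounds at a constant rate between the two observations.
    """
    if days_between <= 0:
        raise ValueError("days_between must be positive")
    if growth < -1.0:
        raise ValueError("growth cannot be below -100%")
    return (1.0 + growth) ** (month_days / days_between) - 1.0


def signed_sqrt(x: float) -> float:
    """Square root that keeps the sign, so shrinking signals stay negative."""
    return math.copysign(math.sqrt(abs(x)), x)


# Growth figures from the Company A / Company B example (5.0 means +500%).
company_a = [5.0]
company_b = [1.5, 0.5, 0.4, 0.6]

raw_a, raw_b = sum(company_a), sum(company_b)
sqrt_a = sum(signed_sqrt(g) for g in company_a)
sqrt_b = sum(signed_sqrt(g) for g in company_b)

print(f"raw sums:  A={raw_a:.2f}  B={raw_b:.2f}")
print(f"sqrt sums: A={sqrt_a:.2f}  B={sqrt_b:.2f}")
```

On the raw sums Company A comes out ahead; after the transformation Company B does, which is the behavior described above.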

The ranking’s bias

The signals we collect are only proxies for company success, so we’ll never have a model that’s 100% accurate (we sure try, though).

That being said, our rank is more biased against some types of companies than others.

  • Leading vs. lagging signals: We weight employee growth most heavily, but in most cases it is a lagging indicator of company success and usually only follows a venture round. (Note: You can see when a company last raised funds in the list.)
  • Consumer bias: Both social media engagement and web traffic are much more pronounced for consumer companies than for B2B. This month’s Top 100 represents B2B quite well, with German rocket company Isar Aerospace at rank 4, for example, but media darlings aside, we are definitely biased against B2B companies. As our clients are particularly interested in B2B companies, we have some experimental alpha-stage signals in development to mitigate this, which we can’t wait to release.
  • Hard to detect the very-early: We only get web-traffic data for companies with more than approximately 1,000 views/month, and we have decided to exclude all companies with fewer than 5 employees because their data is very shaky (“breakout” percentage growth is very common at small sizes). Altogether, if you want to detect the best companies among the very early, different types of signals are needed (e.g. a founder with a big-name brand background who just started a new company). We don’t specialize in this and will remain focused on companies that already have a product.

Next steps

Three ways in which we can make the Top 100 better:

  • Geo expansion: 1. Expand to DACH, 2. Probably Benelux, 3. EU ranking
  • Usability: Add filters, such as customer group (consumer, business, enterprise, gov), business models, industry-specific lists, etc.
  • Additional signals: We’re currently testing multiple signals which we will include in the coming months.

Your input

What improvement would help your use case the most? I’d love to hear it in a comment or via email (jan [ at ] glassdollar.com) so that we can build the most valuable features first.

The Top 100 List: Latest list & Archive.

UPDATE 15th September 2020:

Based on feedback from the community, we have further improved our scoring algorithm for the August ranking. Concretely, we have changed three things:

  1. In our activation function, we have replaced the square root transformation with a logistic curve (a generalization of the possibly better-known ‘sigmoid’). This function has the benefit of squashing the output value between -1 and 1 (avoiding overestimates of extreme growth in any signal), while also being less steep around the midpoint (and thus not too sensitive to discrete jumps in signals).
  2. So far, we have used 0 as the midpoint of the activation function. However, this implies that we assume 0% growth as the norm. The subset of fastest-growing startups shows growth rates of well over 20% for some metrics, even over periods as short as two months, which is why we’re now shifting the activation function further to the right, i.e. readjusting the midpoint.
  3. In order to normalize the growth delta over our observation intervals of 2, 4 and 6 months, we divide our six-month horizon into three intervals of two months each and compute the geometric mean of the respective percentage delta ‘plus one’ in each interval. For example, deltas of +25%, then -10% and then +40% have a geometric mean of ((1 + 25%) * (1 - 10%) * (1 + 40%)) ^ (1/3) - 1 = 16.3%. This way, we are able to compare all improvements and use the same parameters in our logistic curve.
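The three changes can be sketched together in Python. The geometric mean follows the worked example in point 3; the logistic curve is rescaled to the range (-1, 1) and shifted to a midpoint. The midpoint of 0.2 and the steepness of 5.0 are hypothetical values chosen for illustration, not the parameters we actually use.

```python
import math


def geometric_mean_growth(deltas: list[float]) -> float:
    """Geometric mean of (1 + delta) over the intervals, minus one."""
    product = 1.0
    for d in deltas:
        if d <= -1.0:
            raise ValueError("a delta of -100% or less has no meaningful geometric mean")
        product *= 1.0 + d
    return product ** (1.0 / len(deltas)) - 1.0


def logistic_activation(growth: float, midpoint: float = 0.2, steepness: float = 5.0) -> float:
    """Logistic curve rescaled to (-1, 1); returns 0 exactly at the midpoint.

    Assumption: midpoint and steepness are illustrative placeholders.
    """
    return 2.0 / (1.0 + math.exp(-steepness * (growth - midpoint))) - 1.0


# The example from point 3: +25%, then -10%, then +40% over three two-month intervals.
mean_growth = geometric_mean_growth([0.25, -0.10, 0.40])
print(f"geometric mean growth: {mean_growth:.1%}")
print(f"activation: {logistic_activation(mean_growth):.3f}")
```

With the midpoint moved to the right, a startup growing at exactly the midpoint rate scores 0, and only growth above that rate yields a positive contribution.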

Updates by the talented Tobias Reinerth

I’m Jan Hoekman,

Co-Founder & CTO of GlassDollar, the startup scouting company.
We are building the most comprehensive and clever startup data engine yet. With it, we help corporations such as Daimler, Porsche, Miele, Rabobank and others integrate promising startups into their value chain.

Invaluable feedback was provided by Fabian Dudek, Diane Salimkhan & Clement Segal.
