How Fivetran CEO George Fraser’s NRR Bro Math Perfectly Encapsulates Modern Data Stack Ponzi Financing

Let’s debunk some more BS

Lauren Balik
14 min read · Oct 13, 2022
At the Data+AI Summit hosted by Databricks this summer, while presenting on dbt and Fivetran in front of hundreds of people, Fivetran CEO George Fraser points out where the slide designer should have included homeless people in the ‘Startup Town’ of San Francisco. https://www.youtube.com/watch?v=bIM_z12XxhI&t=2198s

Welcome to Part IV.

In Part I we went over incentives around ELT incrementalism and rent-seeking behavior. In Part II we went a bit deeper. In Part III Fivetran’s CEO George Fraser takes to Twitter to start making stuff up about uptime and calling me a ‘peanut-gallery’ observer, even though my customers were using his product at the time of his outage and were quite ticked off at the absurdity of the whole situation.

With no ingest to the cloud data warehouse the whole Modern Data Stack party falls apart.

No Fivetran? Then no dbt! No metrics layers! No spinning cloud credit usage! No Headless BI! No Reverse ETL! No six different point solutions to play with inside the warehouse! It all falls apart with no Fivetran, or with sporadic Fivetran.

Thank you to the readers who have pointed out that Fivetran CEO George Fraser is also now running around the internet leaving Simpsons fun facts where links to my posts have been shared.

Thank you to the several people who shared this gem.

It’s all monorails.

Let’s dive into the bro math and Kool-Aid surrounding George’s latest debacle, NRR Doesn’t Matter, which no fewer than eight people shared with me after it was posted.

The conclusion of his piece is, “[Positive NRR is] fundamentally similar to higher average customer value, not to a higher growth rate.”

While this can be true for some pricing models and business models, it is simply not the case with Fivetran, given:

a) Fivetran’s Monthly Active Row pricing model

b) the fact that Fivetran is middleware

c) the negative unit economics and growth incentives of their startup bread & butter customer base

Positive NRR for a product like Fivetran is arguably a function of higher customer value only if new connectors are added to replicate data from new source systems.

However, back in reality, much of the ‘Year Two’ and ‘Year Three’ Fivetran revenue collected from customers comes from the volume and complexity of data an organization produces, completely independent of anything to do with Fivetran, which sits as a middleware layer on the cloud data warehouse.

Let’s say, for example, you are a startup men’s shaving subscription business.

You’ve been pretty much a 1 of 1 in the market for a year and you are using Fivetran to ingest email data from your ESP to Snowflake. You use this ESP for sending marketing campaign emails to your list of 1,000,000 emails (and growing) once a month, plus for automation emails to customers after transactions occur.

You pull in your Facebook and Google data, since you spend on ads there. You also pull in your orders/transactions data through Fivetran as well.

You’re toasting 20M Active Rows via Fivetran per month with some growth, you have some BI set up, you have your SQL joins in place so you know your email and ads conversions to revenue contribution and you have your product catalog in place.

But uh oh, here comes a competitor!

It’s Silicon Valley and any commodity B2C business like men’s shaving gear can be knocked off since there is nothing that is actually proprietary about the business model or product. There is no moat other than your supply chain and your customer loyalty and affinity.

And, oh wait, here’s another competitor!

Both have significant venture capital investments and have started carpet bombing your company on search. Plus, they’ve been able to back into your customer list with some degree of accuracy and they are marketing 3 months free to anyone who switches from your brand to theirs, no strings attached.

Now there are three well-funded players in the category, plus some legacy brands are copying the business model. Someone with an MBA at P&G figures out that they can move a percent of their inventory out of their channel partners and try to sell direct, plus they already own the shelf space in retail where men buy their groceries and deodorant and other goods. Why not diversify and test out?

So what do you do?

Your marketing and growth teams start running more ads.

You cut prices and take a hit on margin. The category is growing, or at least it seems like it is growing.

You get more customers making more orders, but your profitability is taking a hit.

Your unit economics take a hit as CAC rises because you and your two new competitors compete for mindshare and clicks. The CPC has tripled for all the terms you bid on. Your audience is essentially the same audience that your competitors are bidding on. You are all going bananas trying to get urban and suburban men ages 18–34. Your new customers have lower ASPs because you are incentivized to just start getting them into the funnel with deals. Your margins get smoked.

You start sending out more emails through more campaigns per month, so your ESP-via-Fivetran-to-Snowflake connector is seeing double to triple the Monthly Active Rows moving through it.

You cut prices and market heavily to lower affinity customers with much higher CACs. Males 35–49. Males who live in rural areas, where you will lose order contribution margin on a unit basis but screw it, you have to market to more customers.

You make more ads and get more clicks at lower returns per ad and click at scale because you are bidding heavily on limited mindshare for a limited TAM.

Now the Monthly Active Rows that get set into a fixed normalization structure are rising faster than the unit price of a Monthly Active Row decays, so your bill goes up even as each row gets cheaper.

Plus, in aggregate, across all kinds of companies like yours competing in heavily funded categories, this is meaningful cloud credit attribution for Fivetran.

So you have more rows of more customers, more orders, more transactions, more ad data, more ESP served emails and more clicks and more opens, and on a unit level, each nth activity converges to worse unit economics for the business. Meanwhile, Fivetran makes more money off the incremental rows of data it passes through its fixed normalization structure. It captures more NRR not as a function of customer value, but as a function of your business doing increasingly complicated activities.
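To make the mechanics concrete, here is a minimal sketch of a volume-priced bill. The price curve is entirely hypothetical (it is not Fivetran's rate card, and `BASE_PRICE_PER_MILLION` and `DECAY_EXPONENT` are made-up parameters); only the 20M starting volume and the tripling of rows come from the shaving example above.

```python
# Hypothetical volume-discount curve. NOT Fivetran's actual pricing.
BASE_PRICE_PER_MILLION = 500.0   # assumed dollars per million rows at 1M MAR
DECAY_EXPONENT = 0.3             # assumed rate at which unit price falls with volume


def unit_price_per_million(mar_millions: float) -> float:
    """Price per million Monthly Active Rows, decaying as volume grows."""
    return BASE_PRICE_PER_MILLION * mar_millions ** -DECAY_EXPONENT


def monthly_bill(mar_millions: float) -> float:
    """Total monthly bill = volume x (decaying) unit price."""
    return mar_millions * unit_price_per_million(mar_millions)


before = monthly_bill(20.0)   # 20M MAR, from the example above
after = monthly_bill(60.0)    # tripled volume after the ad and email ramp

price_change = unit_price_per_million(60.0) / unit_price_per_million(20.0) - 1
bill_change = after / before - 1
print(f"unit price change: {price_change:+.0%}")
print(f"bill change:       {bill_change:+.0%}")
```

Under these made-up parameters the unit price falls, yet the bill still roughly doubles, because volume grows faster than the discount. Nothing about the customer's margins enters the calculation.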

Who pays for all of this?

Disproportionately, the answer is more venture capital dollars and less so the actual revenues and profit of most of these Fivetran customers.

This NRR is so much more pegged to VC dollars running up categories than it is to ‘customer value’ that it isn’t even funny.

It’s classic land-and-expand, but the expansion is more of a function of how much venture capital is getting thrown at customers to finance activities that result in more rows of data that end up highly normalized through the middleware of Fivetran.

Further, the math and chart gymnastics in George’s piece and the conclusion that customer growth converges to zero is farcical for many reasons.

https://www.fivetran.com/blog/nrr-doesnt-matter

First, the doubling of new customers each year after the first few years is not reflective of any realistic TAM of the cloud data warehouse. This biases his ‘Growth rate’ column and makes his whole thesis moot.

There are not even this many customers at Years 8 or 9 or 10 in the model who could possibly even use Fivetran in the rosiest possible scenarios. There aren’t even this many customers using cloud warehouses in the market. This is pure Ponzi math.
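A quick sketch shows how fast annual doubling runs away. The starting cohort of 100 is a hypothetical placeholder, not a figure from George's model, and the sketch doubles from year 1 for simplicity rather than "after the first few years".

```python
STARTING_NEW_CUSTOMERS = 100   # hypothetical year-1 cohort, chosen for illustration
YEARS = 10


def new_customers_by_year(start: int, years: int) -> list[int]:
    """New customers added each year if the new cohort doubles annually."""
    return [start * 2 ** year for year in range(years)]


cohorts = new_customers_by_year(STARTING_NEW_CUSTOMERS, YEARS)
cumulative = 0
for year, added in enumerate(cohorts, start=1):
    cumulative += added
    print(f"year {year:>2}: {added:>7,} new, {cumulative:>8,} cumulative")
```

Even from a starting cohort of only 100, year 10 alone adds 51,200 new customers. Whatever the real starting number is, a doubling assumption has to collide with a finite TAM within a few years.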

Second, there is no churn in this model.

There are many ways Fivetran sees churn, the primary driver of which is Fivetran’s rent-seeking NRR collection in the first place. Turning off useless tables, switching to competitors that price on server time or fixed costs, and simply turning off connectors altogether are common defenses against Fivetran NRR.

Further, many sophisticated customers simply negotiate better unit costs and threaten to leave Fivetran.

Additionally, as Fivetran serves a lot of the startup market and is available on 2-year cloud starter plans, some customers have simply gone out of business in the past few years, which is churn.
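As a rough illustration of why leaving churn out matters, here is a sketch of revenue from a single cohort with and without customer churn. The $10,000 ACV is the figure the model uses; the cohort size, the 20% expansion rate and the 15% churn rate are assumptions picked only to show the effect.

```python
def cohort_revenue(acv: float, customers: int, expansion: float,
                   churn: float, years: int) -> list[float]:
    """Yearly revenue from one cohort.

    expansion: revenue multiple per surviving customer per year (1.2 = +20%)
    churn:     fraction of customers lost per year
    """
    revenue = []
    for year in range(years):
        surviving = customers * (1 - churn) ** year
        per_customer = acv * expansion ** year
        revenue.append(surviving * per_customer)
    return revenue


# Assumed inputs: 100 customers, 20% yearly expansion, 0% vs 15% yearly churn.
no_churn = cohort_revenue(10_000, 100, 1.2, 0.0, 5)
with_churn = cohort_revenue(10_000, 100, 1.2, 0.15, 5)

for year, (a, b) in enumerate(zip(no_churn, with_churn), start=1):
    print(f"year {year}: no churn ${a:,.0f}   with churn ${b:,.0f}")
```

With these inputs, 20% expansion and 15% churn net out to about 2% growth per year (1.2 x 0.85 = 1.02), so a cohort that looks like it doubles by year 5 in a churn-free model is nearly flat once customers are allowed to leave.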

Third, the ACVs here make no sense. The model uses a straight $10,000 ‘New customer’ ACV. This is very presumptuous and in the case of Fivetran is an outlier.

They’re not making good money off $10k customers, and arguably for any SaaS that utilizes partnerships and account executive teams, $10k is usually not even enough of an ACV to have these teams pay for themselves as TAM becomes exhausted.

This doesn’t even scale with Fivetran’s pricing model. Everything about this NRR article is completely preposterous. But there are charts.

“Oh, there are charts. Why didn’t you tell me there are charts?”

How will Fivetran and dbt move upmarket at scale?

Well, outside of a few customers here and there, they won’t.

When George Fraser got up on stage at the Databricks Conference and began making jokes about homeless people in San Francisco as he laid out some high level plans for Fivetran and dbt to move upmarket, what he neglected to mention is that many upmarket customers simply are not going to put up with the Fivetran + dbt circus.

For one, dbt at any scale quickly devolves into a headcount play. You’re not really hiring a solution, you are hiring a framework and then people to use it.

The only customers who can continuously afford this are the largest enterprises and the small-midsize VC-backed companies that have had cash injection after cash injection to propel them.

Second, and I have been through this several times, many customers in the upmarket are completely turned off by the fact that dbt Labs has written into the terms of its cloud offering the right to essentially free access to your data. The terms for the Cloud product state that dbt Labs has the right to make ‘derived metadata’ or alternative data off of their customers. These are probably the most ridiculous terms I’ve ever seen on a cloud product in my career of working with dozens of vendors in many RFP and procurement cycles.

This is a game ender for many in regulated industries or at publicly traded companies. Just the fact that this alt data stuff is mentioned publicly on their terms in Section 7 is enough to turn off many corporate and enterprise customers, even ones that use other cloud products for various data management practices.

Why would a publicly traded company want to put their data about orders, transactions, customers, and other similar entities into a product that is going to collect data on this and potentially expose them to be traded against?

Why would any company — especially the many larger private tech and ecommerce companies facing cash flow issues — want someone else to have a view inside their data that can be used as an edge to more accurately price them in a down round or M&A situation?

Most adults can see right through this.

On the left, dbt Labs CEO Tristan Handy assures a VC growth influencer that he is safe and loved and that there will be further transparency. On the right, dbt Labs’ Cloud product has almost unlimited rights to give your data to whomever they want — hedge funds, their own investors which are basically hedge funds at this point, etc.

Of course, the kayfabe of all the Modern Data Stack nonsense reaches a point of complete disreality now that this same growth influencer is pitching metrics layers that you can make in dbt.

Soon you can also manifest your financial and business metrics in a cloud product that has essentially free license to do whatever they wish with your data.

It’s very good for hedge funds and traders.

Or, you can use good ol’ dbt Core and run it with Prefect, for example.

That’s also run by a hedge fund guy.

He wants you to simply change your job title on LinkedIn to ‘analytics engineer’ with no training or resources and come play with his selection of toys, dbt and Prefect.

Apparently, you’ll get instant pay.

Thanks, sport.

Ah, The Metrics Layer.

Rocket Ships and Boosters

Let’s get into the final piece here — the raison d’être for the Modern Data Stack.

I’ve yet to see anyone just explicitly call this out by name, so I’m just going to do it.

There are two types of data companies in the world of unlimited, unsecured cloud data warehouse consumption:

  1. Rocket ships (the engines/platforms/stores and compute centers)
  2. Booster rockets (add ons that drive consumption for the rocket ships)

The two obvious rocket ships here are the resellers — Snowflake and Databricks.

You can argue that AWS Redshift and Athena and such and Microsoft Synapse and GCP BigQuery are rocket ships as well, yes, but those are owned by large corporations with very wide product offerings. The cloud warehouse and the services they put around it are pennies and nickels for Amazon and Microsoft and Alphabet.

Snowflake and Databricks are important because a large number of their shares are owned by venture capital firms.

For example, over 10% of Snowflake is currently held by ICONIQ and Altimeter as of Oct 12, 2022.

Sequoia has also held large stakes in Snowflake.

On the Databricks side, Andreessen Horowitz has led and invested in many rounds. They have to get this to an IPO at some point and bring it to a conclusion. While it’s hard to say that a firm with such heavy AUM has to do something, they need to land this Databricks plane at some point soon.

Well, who invests in all the Modern Data Stack compute spinners and consumption eaters?

Sequoia. Altimeter. Andreessen Horowitz. ICONIQ.

For some reason Amplify Partners plays along. I truly have no idea what they are doing. Nothing adds up. They are just hiring growth influencers, playing a bottoms-up social media game and generating paper IRR to keep their own charade going. Whatever.

Here’s how dumb this all is.

These firms own enough shares to equal billions in market cap for these reseller rocket ships, Snowflake and Databricks.

These resellers trade at very high multiples, Snowflake on the public markets and Databricks marked at what is likely about 35x annualized revenue on the private markets.

On the Snowflake side, Altimeter, ICONIQ and Sequoia, for example, could lose every single penny they have invested in companies like Census, dbt Labs, Hightouch, Lightdash and many others. As long as those investments prop up juuuuuuust enough incremental revenue for Snowflake to beat its revenue targets each quarter, the firms still come out ahead so long as they hold the float of Snowflake.

Even incremental consumption from these spinners in the tens of millions per quarter may be juuuust enough to keep the party going for their stakes in Snowflake.
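The logic can be sketched as back-of-envelope arithmetic. Every input below is an assumption for illustration: $30M per quarter stands in for "tens of millions", 20x is a placeholder revenue multiple, and 10% is a round stand-in for a large combined stake. It also crudely assumes the market values each incremental revenue dollar at the full multiple.

```python
def stake_value_supported(quarterly_incremental_revenue: float,
                          revenue_multiple: float,
                          ownership_share: float) -> float:
    """Value of a shareholder's stake attributable to incremental revenue.

    Crude model: annualize the quarterly revenue, apply the revenue
    multiple to get supported market cap, then take the owner's share.
    """
    annualized = quarterly_incremental_revenue * 4
    market_cap_supported = annualized * revenue_multiple
    return market_cap_supported * ownership_share


# All three inputs are hypothetical placeholders.
supported = stake_value_supported(
    quarterly_incremental_revenue=30_000_000,
    revenue_multiple=20.0,
    ownership_share=0.10,
)
print(f"stake value supported: ${supported:,.0f}")
```

Under those placeholder inputs, the incremental consumption supports roughly $240M of stake value. That is the sense in which a firm could write off sizable booster investments and still come out ahead on the rocket ship.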

Too many data engineers and analytics engineers and the like think about this in terms of vendors and vendor revenue.

Revenue of the compute spinners has very little to do with anything — it’s mostly a game of consumption.

Further, these Modern Data Stack booster rockets are all extremely unprofitable and funnel all of the VC dollars into growth gimmicks and hiring more employees.

But wait, it gets better.

If you actually add up all of the employees currently working for all of these standalone booster rocket companies, there is probably over $1B in employee cost per year right now.

As a quick exercise, below are all of the companies that I know to have marketed themselves as part of the Modern Data Stack or are generally included around this.

I took the # of employees on LinkedIn as of October 12, multiplied by an estimate of $200k per fully loaded employee per year (salary + additional cash compensation + benefits + insurance and other costs of an employee) and at this $200k a year rate it works out to over $1B spent on Modern Data Stack employees per year at the current run rate.

Yes, a lot of founders take low salaries. Yes, some of these employees at some companies are in low COL areas of the world. Yes, you should net out the random VCs who say they work at their portfolio companies on LinkedIn (lol).

However, it’s also likely that I am missing some companies others would count among the Modern Data Stack booster rockets, plus in some cases the average fully loaded employee cost may be over $200k.

Regardless, it’s more grounded in actual numbers than George Fraser’s nonsensical NRR blog assumptions.

Data pulled from LinkedIn, 10/12/2022

Some of these companies are spending 8-to-1, 9-to-1, 10-to-1 to ARR just on employees alone.

Some I know are actually under 3-to-1 or 2-to-1.

What’s important here is the aggregate estimate. Even if you wish to debate specifics, or add more companies, or net out non-US employees, the number is going to come out to around a billion on the low end.
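The estimate itself is just headcount times fully loaded cost. The sketch below uses the $200k figure from above; the per-company headcounts are placeholders with made-up names and numbers, not the LinkedIn data.

```python
FULLY_LOADED_COST = 200_000   # dollars per employee per year, from the estimate above

# Placeholder headcounts only. These are NOT the real LinkedIn figures.
headcounts = {"company_a": 1_200, "company_b": 900, "company_c": 450}


def annual_employee_cost(headcount: int,
                         cost_per_head: int = FULLY_LOADED_COST) -> int:
    """Yearly employee cost at a flat fully loaded rate."""
    return headcount * cost_per_head


total_heads = sum(headcounts.values())
print(f"{total_heads:,} employees -> ${annual_employee_cost(total_heads):,} per year")

# Aggregate headcount at which the total crosses $1B at $200k per head.
breakeven_headcount = 1_000_000_000 // FULLY_LOADED_COST
print(f"$1B is reached at {breakeven_headcount:,} employees")
```

At $200k per head, $1B corresponds to 5,000 employees across all of these companies combined, which is the threshold to check any alternative headcount or cost assumption against.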

These salaries and employee costs are not paid for in aggregate by revenue.

They are paid for by VC dollars for equities.

Too many VC dollars create employees, and too many VC dollars create marketing. What is the marketing? Well, in many cases, it is bottoms-up or converging on it.

It gets even better.

As a subscriber to SEMRush, I can see that the two frothiest data buzzwords in Search right now are ‘Reverse ETL’ and ‘Data Observability’, at over $20 Cost-per-Click.

Good heavens. 10/12/2022.

Why?

Well, Census and Hightouch have run up ‘Reverse ETL’ for the past year with VC dollars.

For ‘Data Observability’ Monte Carlo and others have run up that term and similar ones, also with VC dollars.

Is this healthy marketing? Is there any return to good ARR here from all of this hiring of employees and running up search terms and hiring on-staff influencers to close five-figure ACVs, and in most cases low five-figure or even four-figure deals?

In aggregate, no.

None of the Math Works Out

Really, there’s no hope for these compute spinning companies that can’t maintain $25mm ARR with minimal churn for at least one year.

There is hope for some of these companies that don’t have crazy valuations and run at burn multiples under 3. But these are very few.

Everyone else is burning their runway faster than the TAM is growing, spending 5, 6, 20 dollars for every dollar they make in ARR and, frankly, spending 5, 6, 20 dollars for every dollar of incremental cloud credit burn they create for their customers.
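For reference, a burn multiple is commonly computed as net cash burned divided by net new ARR over the same period. Here is a minimal sketch with made-up figures for two hypothetical companies.

```python
def burn_multiple(net_burn: float, net_new_arr: float) -> float:
    """Dollars burned per dollar of net new ARR over the same period."""
    if net_new_arr <= 0:
        raise ValueError("burn multiple is undefined without positive net new ARR")
    return net_burn / net_new_arr


# Hypothetical companies, figures in dollars per year.
healthy = burn_multiple(net_burn=6_000_000, net_new_arr=3_000_000)
frothy = burn_multiple(net_burn=30_000_000, net_new_arr=5_000_000)

print(f"healthy: {healthy:.1f}x")
print(f"frothy:  {frothy:.1f}x")
```

The first company sits under the burn multiple of 3 mentioned above; the second is in the "spending 6 dollars for every dollar of ARR" range.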

This puts them in the Valley of Down Rounds.

We live in an era of chasing down fake numbers and hustling toward invented metrics and KPIs. Some people ‘engineer analytics’ as a profession now.

George’s NRR blog post is ridiculous. Four VC friends who at least live in reality (I don’t hate VCs, I just hate ones who live in disreality) and four data company founders sent me this soon after it was posted.

Tristan’s claim that dbt drives billions back to the cloud is also ridiculous. dbt drives millions in incremental compute resources, yes, but not billions yet. Still, the worse the SQL people write as they simply change their job titles, the more cloud credits he gets to attribute to himself. That is why dbt promotes bad practices on their blog and why ‘just change your job title from analyst to analytics engineer’ is the call to arms right now.

It’s all too ridiculous. I sit back and laugh, honestly.

This may actually end up being worse than Hadoop over-distribution.

To end with some practical advice:

  1. If you’re an employee of one of these compute spinner data companies, consider that you are likely operating on borrowed time. Eventually the VC money that pays for you and many of your coworkers will run out. For some companies, the new calendar year is when layoffs will start happening.
  2. If you’re a person who is an ‘analytics engineer’ only, as in you spend 75% of your time or more moving SQL files around, you are in the middle and you need to get to the business value side (reports and analysis) and/or you need to get to the data engineering and broader systems architecture side.
  3. If you are a customer of dbt Cloud and you agreed to a contract where dbt Labs is creating ‘Derived Metadata’ off of your data, consider switching out. If you are considering putting your financial metrics and key business metrics and lineage in dbt products, and you manage a team of ‘analytics engineers’ who simply switched their job titles for cash, you’ve probably already lost, but maybe it’s time to reevaluate.
  4. If you’re yet another hedge fund reading my posts, my rates for hedge funds are $2500 per hour paid upfront in 4 hour minimums.


Lauren Balik

Owner, Upright Analytics. Data wrangler, advisor, investor. lauren [at] uprightanalytics [dot] com