6 Reality-Based Predictions for Data in 2023

Lauren Balik
Oct 31, 2022

This past week, Tomasz Tunguz of Redpoint shared his 9 Predictions for Data in 2023.

While there are some aspects of Tomasz’ piece I genuinely like and agree with, as the owner of America’s premier, top-tier data consultancy, I offer some reality-based predictions of my own for data in 2023.

  1. The Reverse ETL category will double down on kitchenware-forward GTM tactics.
  2. DuckDB will be heralded as an initial win for WASM (and perhaps rightfully so), and it will attract additional eyeballs and meaningful use cases.
  3. dbt Labs peaks in revenue collection as more and more dbt Cloud customers realize all they are effectively doing is creating hedge fund alternative data for dbt Labs.
  4. CDW-first data observability peaks in revenue collection, as on-prem/hybrid/VPC enterprise-focused data observability companies lap them in revenue.
  5. Workflows shift from centralized data teams to operational teams through flexible, business user-friendly tooling such as Retool and Airtable.
  6. Benn Stancil will re-launch his weekly blog as a media empire, complete with data TikToks and a Twitch stream, and he will promote an Ethereum-based data team bartering system in lieu of FinOps.

1) Reverse ETL doubles down on kitchenware

Earlier this year, the worst thing to ever happen in my life happened: Reverse ETL vendor Census sent me a panini press to bribe me into pledging fealty to Census, and not Hightouch, as my preferred choice when recommending Reverse ETL to customers.

As a distinguished and upstanding member of the data community, and as someone who acts in the best interests of my customers, I was shocked and appalled to be bribed with a panini press that retails at an MSRP of $24.95.

As everyone knows, it takes at least $50 in retail value to bribe me.

Exhibit A. The Reverse ETL bribe of a Cuisinart panini press.

If Reverse ETL vendor Census wishes to use kitchenware-focused bribes to get consultancies and services businesses to promote Census, it should look toward the high end of kitchenware to make these bribes.

Cuisinart or even Hamilton Beach simply will not cut it for most consultancy owners and directors — we have finer tastes.

I believe Census has taken to heart this feedback from me and others, and in 2023 will move toward a more Williams-Sonoma and Sur La Table style strategy of bribing fewer but higher-value consultancies with more expensive kitchenware, instead of taking a numbers-game approach and attempting to mass bribe with cheaper alternatives.

If Census is to continue with sub-$50 kitchenware bribes, I suggest going with more niche or nuanced kitchenware solutions, such as the Bagel Guillotine from Sur La Table.

2) DuckDB will be heralded as an initial win for WASM

Venture capitalists and data professionals are right to be flocking to DuckDB.

However, databases for the sake of databases are not viable, and tangible business outcomes will need to drive adoption. Rill Data’s ‘thick’ Developer Edition, which combines DuckDB with a Svelte front-end, is the most obvious path to gaining initial adoption for lightning-fast, sleek BI applications. Further, adoption among data scientists, analysts, and analytics engineers will be driven through managed DuckDB via MotherDuck, which should see a lot of adoption in early 2023 as they develop go-to-market.

As far as WASM goes, DuckDB’s core users will likely be a mix of data modeler and analyst types who will not necessarily even interact much with many of WASM’s capabilities. Although this will be a good, solid use case for the success of a WASM product, it is not enough to prove out the value of WASM longer-term.

Rust will continue to be the lingua franca among top WASM-adjacent developers and will continue to be used by top applications as a recruiting tool.

WASM will still be early-days in 2023, at least in the data world.

3) dbt Labs’ Cloud product peaks in revenue

As I and others have noted before, dbt Labs’ core revenue driver, dbt Cloud, is likely to peak in 2023 due to its extraction of alternative data and its policy of copying customer data and retaining it for an undefined period of time. As CISOs and CIOs become more involved in purchasing decisions with increased levels of scrutiny, dbt Cloud customers will become aware that they are quite literally paying dbt Labs license fees to steal their data and make hedge fund products out of it.

dbt Labs’ policy of creating ‘Derived Metadata’ off their customers will become worth a second look, as customers become aware that dbt Labs is creating hedge fund data off of literally all of the data these customers run through dbt Cloud. Additionally, this will remain a primary reason why dbt Labs continues to struggle selling upmarket to potential high value accounts that do not want dbt Labs to literally retain all of their data and make data products to sell to hedge funds.

Many savvier buyers will realize their entire BI program is net negative and any insights or value they believe they are creating are negated.

Further, dbt Labs’ newly promoted Semantic Layer (which is not really a semantic layer, but a metrics layer) will fizzle out as customers will be wary of putting key financial and business critical metrics in the hedge fund product of dbt Cloud.

dbt Cloud terms may be reviewed here.

Hedge funds have all of your data when you use dbt Cloud. Do not trust any nonsense from dbt Labs, Brooklyn Data Company, Montreal Analytics, or any other dbt-adjacent services firm that tells you to use dbt Cloud, which is just a way to sacrifice your company’s data to hedge funds. You are exposing your firm to be traded against when using dbt Cloud.

4) CDW-first data observability peaks in revenue collection

Much has been said about the data observability market, with Monte Carlo picking up a unicorn valuation at what is clearly under $20mm in annualized revenue and others in the category picking up nine-figure valuations.

Cloud data warehouse-first observability vendors (e.g., Monte Carlo, Bigeye, and Metaplane, among others) are selling into a limited TAM. I don’t believe this is more than a $30mm-$40mm per annum total market.

I also believe there is probably as much consumption revenue going back to the CDWs from SQL-first solutions as there is contract revenue going back to these vendors.

Awareness is high — in my experience, most potential buyers of CDW-first data observability can name solutions, whether or not they buy. The TAM seems fairly kicked already.

Companies going after enterprise first (e.g., Acceldata, Lightup) that sell more into hybrid/on-prem/VPC-first types of customers have a much larger TAM and much higher ACVs than the CDW-first group. This group should be able to continue to sell into the enterprise and the high end of corporate and continue to grow revenue as the CDW-first companies see potential contraction.

5) Workflows shift from centralized data teams to operational teams

The ‘Modern Data Stack’ continues its slow death, as customers instead move workflows to operational tools like Retool and Airtable, which service the majority of business intelligence needs.

Some of these workflows will use the CDW as the ‘source’ of data, yes, but many of these workflows will rebaseline and simply use operational data tools to create and deliver on workflows.

Up-and-comers in this internal app-building category include Patterns, as well as spreadsheet-forward tools like Canvas and Equals.

2023 will see the BI space move more toward RPA instead of existing as a function of data engineering. This RPA shift will in the long run be the actual application layer that the CDWs have promised, and because most of this market will be internal tools or B2C, the markups on Snowflake and Databricks compute will cause these two vendors to mostly fail to service this market.

6) Benn Stancil relaunches his weekly blog as a media empire

TikToks. Twitch. A channel on Cheddar if Cheddar is still a thing. A global media empire. Enough said.

This past week, Benn proposed the idea that:

“Every quarter, everyone at the company is given a set number of credits that they can use to ‘buy’ work from the data team. Each week, or on whatever sprint cadence makes sense, the data team holds an open auction. People submit bids for work. Each bid includes a project proposal and an offer price. During the auction, the data team asks questions, and bidders answer them to fill in missing details about their proposals. The two sides then negotiate…”

This is essentially a combination of 1) Quarterly Budget Planning and 2) Chargebacks, both of which are well-established concepts at many companies that operate on EBITDA and net operating profit instead of Silicon Valley grow, grow, grow initiatives.

It’s awesome.
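
For anyone who wants to see how little is new here, the credit auction can be sketched as budget planning plus a chargeback in a few lines of Python. This is a minimal sketch under loud assumptions: the `Bid` fields, the `run_auction` helper, the team names, and every number are hypothetical, and the greedy "best offer per point of effort" rule is my stand-in for the negotiation step that Benn's proposal leaves open.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    bidder: str   # team submitting the proposal (hypothetical names below)
    project: str  # short project description
    offer: int    # credits offered for the work
    effort: int   # data team's estimate, in sprint points (must be > 0)


def run_auction(bids, budgets, capacity):
    """Accept bids greedily, highest offer per point of effort first.

    `budgets` maps bidder -> remaining quarterly credits and is updated in
    place when a bid is accepted; that deduction is the chargeback.
    `capacity` is the sprint capacity of the data team, in points.
    """
    accepted = []
    ranked = sorted(bids, key=lambda b: b.offer / b.effort, reverse=True)
    for bid in ranked:
        if bid.effort > capacity:
            continue  # does not fit in what is left of the sprint
        if bid.offer > budgets.get(bid.bidder, 0):
            continue  # bidder cannot cover the offer from its budget
        budgets[bid.bidder] -= bid.offer  # chargeback against the quarter
        capacity -= bid.effort
        accepted.append(bid)
    return accepted


# Quarterly budget planning: each team starts with a credit allocation.
budgets = {"marketing": 100, "finance": 60, "sales": 40}
bids = [
    Bid("marketing", "attribution dashboard", offer=80, effort=8),
    Bid("finance", "revenue reconciliation", offer=60, effort=3),
    Bid("sales", "lead scoring refresh", offer=50, effort=2),
    Bid("marketing", "churn report", offer=30, effort=2),
]
accepted = run_auction(bids, budgets, capacity=10)
for bid in accepted:
    print(f"{bid.bidder}: {bid.project} for {bid.offer} credits")
print(budgets)
```

In this made-up sprint, the sales bid is rejected because it offers more credits than the team has left, which is exactly the conversation a finance department has been having with department heads for decades.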

The entire ‘Modern Data Stack’ world is slowly just re-learning concepts that are well-defined among larger, established companies with years to decades of data management cultures.

I can’t even handle Modern Data Stack “thought leadership” these days.

Conclusion

Additionally, 2023 will likely see a large number of layoffs and contraction among Modern Data Stack companies. Many of these point solutions that solve one problem are overcapitalized and burning $5 or more for every dollar they bring in.

Consolidated platforms like Keboola, Nexla, Rivery, Ascend.io, and similar will continue to rise and win share away from MDS tools like Fivetran and dbt, as well as from Reverse ETL and orchestration vendors.

We are in for a shaky year ahead, and I am excited to see where things go.

Lauren Balik

Owner, Upright Analytics. Data wrangler, advisor, investor. lauren [at] uprightanalytics [dot] com