DaaS: The Value Calculus

James Sharp
7 min read · Sep 11, 2023


Understanding what your data is worth.

Part 3 of a series exploring how “non-data” companies (SaaS, marketplaces, etc.) can build out commercializable data businesses alongside their core offerings.

A special shoutout to Data Republic for a fantastic piece (pdf) that was my North Star reference for this section.

Photo by Antoine Dautry on Unsplash

There’s a fundamental question about data assets that arises from the moment you first begin conceptualizing the product: what is this worth? At this stage, this isn’t a pricing question. It’s much deeper than that.

In the simplest, most abstract form possible, a data product’s worth is just like that of any product: “a product’s value is the benefit a customer receives from it minus the associated cost” (s/o Productboard). Tautologies aside, there’s a lot packed into this, and a few key special considerations in the data realm that I want to unpack here.

Example dataset for this framework: a dataset that shows 100% of a consumer’s spend, down to the item level, for about 1M households (~1% of the US population).

The framework: a data product’s value is Bn (the unique benefit the data provides to a use case) less Cn (the cost of using the data for that use case), where n is an individual use case. A toy sketch of the calculus follows the list below.

  • Marginal use case benefit: to what extent does your data asset create an outsized advantage in fulfilling a use case? In this example, seeing 100% of each sampled consumer’s spend is vital to RoAS calculations (missed spend = missed return).
  • Cost of actionability: how much work is needed to take the data set from delivery to business action? More later — but this can be roughly divided into “processing cost” (cleaning, standardizing) and “action cost” (getting the clean product into decisioning systems). In the example, the data may be clean, but sample logic is tricky for Measurement and, in Targeting/Audience Creation, a larger spine set is needed to turn that sample into a marketable audience, creating additional “cost” for the end-user.
  • Use case applicability: for which use cases can the data be utilized? This is the lens through which the benefits and costs above have to be assessed, since neither stays constant across use cases: the value advantages, and associated costs, shift with every use case as the competitive set changes. In the example, the asset’s “value statement” will vary wildly even within Marketing use cases, let alone when expanding into Investing / Hedge Fund use cases.
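
To make the calculus concrete, here is a minimal sketch that scores a few use cases on Bn and Cn. The use case names and dollar figures are purely illustrative assumptions, not benchmarks:

```python
# Toy version of the value calculus: value accrues per use case n as
# benefit B_n minus cost C_n. All names and figures below are hypothetical.

use_cases = {
    # use case: (B_n: unique benefit, C_n: cost of actionability), $/year
    "marketing_measurement": (500_000, 150_000),
    "audience_targeting":    (300_000, 220_000),  # heavier "action cost" to build audiences
    "hedge_fund_signals":    (400_000, 100_000),
}

for name, (benefit, cost) in use_cases.items():
    print(f"{name}: net value = ${benefit - cost:,}")

total = sum(benefit - cost for benefit, cost in use_cases.values())
print(f"total value across use cases: ${total:,}")
```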

Double-clicking into each element

Each of these three components needs to be considered in relation to the others as, ultimately, they’re interconnected, e.g. it’s hard to articulate the benefits without understanding the use cases. But unpacking each on its own first:

Use Case Identification

In Data Mesh, Zhamak Dehghani describes a central challenge of data products: “unlimited use cases”. Because of its abstract form, it’s easy to think of your data product being used anywhere, for anything.

I like to use an inside-out model, looking at near-in adjacencies to core customer use cases, moving outward:

  • Direct data use at your client: if working with large enterprise clients, it’s likely they have some analytics/BI function internally that could utilize your data. For instance, consider a Supply Chain planning solution for enterprise CPG, where the Finance team may be consuming data for forecasting purposes.
  • Adjacent software solutions at your client: think about the other software solutions your buying center, or adjacent buying centers, may be using. Could those solutions benefit from the data you’re generating? Think product-partnering but, instead of co-developing, licensing your data to the player. In the example dataset, the consumer data is licensed to Marketing Measurement software for use in attribution models.
  • New industries: the most common answer here is hedge funds and the investing community, but there are many more. This can be hard to scope, particularly if your organization is aligned to a specific industry and doesn’t have a broad “market research” muscle. It may be helpful to enlist consultants for a rapid market map at this point.

Data Benefits

For all intents and purposes, the primary vectors on which your data will be evaluated are incisiveness and uniqueness. Outside of a pure cost bake-off, it’s likely that the consumer is looking for some unique element that generates outsized insight in either analysis or action.

Incisiveness is simple: does your data provide signal or just noise? You could have a fascinating, one-of-a-kind dataset on the color composition of US license plates, but I’m not sure what signal that provides. In a prediction model, the question will be whether your data increases the predictive “accuracy”. In marketing measurement, it may be how closely your data aligns to previous studies, adjusting for the uniqueness elements below. Understanding incisiveness, and how clients test for it, is critical to your sales process.
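
As one illustration of how a buyer might test incisiveness, here is a hedged sketch of an incremental-lift check: does adding your fields to a model they already run improve out-of-sample accuracy? The file and column names are assumptions for illustration only:

```python
# Sketch of an "incisiveness" test a prospective buyer might run: compare a
# model's out-of-sample AUC with and without the vendor's fields.
# File and column names are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("evaluation_sample.csv")  # hypothetical trial extract

baseline_features = ["store_visits", "card_spend"]            # what the client already has
enriched_features = baseline_features + ["item_level_spend"]  # plus the vendor's data
target = "churned_next_quarter"

def mean_auc(features):
    """Cross-validated AUC for a model trained on the given feature set."""
    model = GradientBoostingClassifier()
    return cross_val_score(model, df[features], df[target], cv=5, scoring="roc_auc").mean()

lift = mean_auc(enriched_features) - mean_auc(baseline_features)
print(f"incremental AUC from the new data: {lift:.3f}")
```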

Uniqueness often comes down to three things: depth, breadth, and speed.

  • Depth: do you unlock a granularity not found in competitive data sets? For instance, credit card data, while valuable and pervasive, only captures the store-level / basket-level transaction, e.g. Walmart.com $49.59. Being able to drill down to item-level detail unlocks whole new analyses not available at the store-level.
  • Breadth: do you cover a wider segment of the market or capture unique segments that were previously missed? This can mean having more of 1 type (e.g. 100M credit card records vs 50M) or having a whole new category type (e.g. B2B ACH payments data vs personal credit card). Breadth is important because of “stitching costs” — the more data you can provide, ideally from a similar framework, the less the client needs to worry about marrying data sets.
  • Speed: not always relevant, but when it is, it’s vital. How latent/recent is your data and how quickly can you provide it? Think equities data — this is fundamentally a race for speed on both refresh (upstream) and delivery (downstream). Note that speed in particular carries a lot of operating model burdens so be cognizant of how much it really matters.

If you have a data set that is deeper, broader, and faster than your competition’s, congratulations: you have a gold mine. More likely, though, you’ll have to explain trade-offs and why an advantage in one dimension outweighs a disadvantage in the others.

The other critical component I’ve heard discussed a lot recently is consistency. This ties a bit into cost below, but I think it warrants recognition as a benefit. Users want to make sure that there are no crazy spikes in the data (unless a real spike occurs). Take a financial prediction use case: ideally the predictive quality of the data stays relatively constant and doesn’t suddenly spike some weeks or drop off in others. The more smoothly the data flows, the better.
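
For illustration, a minimal sketch of the kind of consistency check a buyer might run on a weekly feed, flagging weeks that deviate sharply from the recent trend (the file and field names are assumptions):

```python
# Sketch of a consistency check: flag weeks where total observed spend
# deviates sharply from the trailing 8-week trend. Names are illustrative.
import pandas as pd

feed = pd.read_csv("weekly_feed.csv", parse_dates=["week"])  # hypothetical delivery

weekly = feed.groupby("week")["spend"].sum().sort_index()
rolling_mean = weekly.rolling(8).mean()
rolling_std = weekly.rolling(8).std()

z_scores = (weekly - rolling_mean) / rolling_std
suspect_weeks = z_scores[z_scores.abs() > 3]  # spikes or drop-offs worth explaining
print(suspect_weeks)
```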

Data Costs

That gold mine above? Bad news: people want that mine really nicely maintained (they would really rather you just hand them the gold). Consuming and acting on third-party data is already a complex act, and you’re going to get stony reactions if you come to the table with an asset that’s difficult to use.

The principal data costs you’ll encounter are cleaning costs and distribution costs:

  • Cleaning costs — It’s common practice to provide data dictionaries & samples up front in a sales process. For one, this helps show what you have. But more than that, your ability to cleanly & effectively describe your data is a signal of how clean the data is going to show up. Clear schema, clean fields, consistent formats: the work you do on your end will clear a lot of hurdles (a minimal example follows this list).
  • Distribution costs — these are both internal and external. There’s the cost to ingest the data: getting systems to communicate (use standard channels like Snowflake, AWS/S3, or API gateways) and setting up onboarding discussions. But there’s also the cost of getting the data into their workflows. The more you can include common keys and identifiers (e.g. ticker symbols) so your data “communicates” with theirs, the less time and energy the client wastes.
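
For illustration, a stripped-down data dictionary of the kind you might share up front, using the item-level spend example. The field names and types are hypothetical, but the point stands: a clear schema plus explicit join keys (household IDs, tickers) lowers both cleaning and distribution costs for the buyer.

```python
# Hypothetical data dictionary for the item-level spend example. Sharing
# something like this early signals that the feed itself will be clean, and
# the explicit join keys lower the client's distribution cost.

DATA_DICTIONARY = {
    "household_id":     {"type": "string",    "description": "stable panelist key, consistent across deliveries"},
    "transaction_ts":   {"type": "timestamp", "description": "time of purchase, UTC, ISO 8601"},
    "merchant_name":    {"type": "string",    "description": "standardized merchant name, e.g. 'Walmart'"},
    "ticker":           {"type": "string",    "description": "parent-company ticker where public, e.g. 'WMT'"},
    "item_description": {"type": "string",    "description": "item-level line from the receipt"},
    "item_amount_usd":  {"type": "decimal",   "description": "item price in USD, two decimal places"},
}
```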

Recap: your data is only as valuable as how people will use it to change their operating flows. The value story has three parts: homing in on the core benefits of your data, clearly understanding (and being deliberate about) the cost to use your data, and clarifying the use cases that you’re targeting and avoiding. This is deliberately simplistic: in the confusing, abstract world of data, being simple and clear about your value proposition is critical.

Data Cost Calculus — reproduced from Data Republic “How Much is Your Data Really Worth”

A note: “Productizing the Value Curve”

Everything above (the benefit and the associated cost of actionability) is contingent on the form in which the product is delivered. If the ultimate use case is “Purchase-Based Audience Targeting”, one answer is to supply seed data to agencies building audiences. Another is to build a whole audience targeting system, internalizing the costs to deliver a closer-at-hand benefit to the client.

How far you go down this level of productization will always be a debate, particularly if you’re an established SaaS company with strong experience taking new software to market.

If your objective is to monetize a data asset, note that you’ll often have to restrain that productization and take rawer data to market. This will likely feel strange (you’re building a product by actively avoiding building a product), but if you’re focused on data products, it’s critical to keep this restraint in mind.
