Beyond Systems of Record

Many years ago, only technical people could enter data into a database. In the 1990s, desktop software such as Microsoft Access let anyone work with databases. Then came cloud-based applications that had everyone adding data from everywhere as part of their daily workflow, without even realizing they were manipulating databases. Today, those cloud applications are effectively Systems of Record for many business functions. Salesforce is the best example: every salesperson knows that if it’s not in Salesforce, it didn’t happen.

“Salesforce is the system of record for client information. Zendesk is the system of record for customer support; ServiceNow for IT Ops; Marketo for marketing; Github for code.”— Tom Tunguz

However, these early cloud applications (the Systems of Record) were built in an era when storing data was expensive, so they tended to store only the data they needed. Three things have changed:

  1. Workers execute their entire workflow online, generating a lot of data about what they’re doing and how they’re making decisions (‘data exhaust’);
  2. Data is cheaper to store; and
  3. The methods for learning from that data are more developed.

This article suggests a way to find startup opportunities in this post-cloud era of intelligent computing: rebuild an existing System of Record, capture the data exhaust and see if you can find predictive features in the data. Let’s unpack that.
System of Record: the ‘single source of truth’ for a business function. For example, sales is a business function measured by revenue, and a customer’s position in the sales pipeline is a leading indicator of that revenue. The single source of truth for pipeline position is the Customer Relationship Management (CRM) system: essentially a table populated with the sales team’s opinion of where each customer stands in the pipeline at the end of set periods.
Data exhaust: the data generated as a by-product of using an application; that is, anything one can record as a user performs operations in it, such as clicks and changes in values.
Predictive features: operations on data that are found to predict something of business value.
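To make the three definitions concrete, here is a minimal Python sketch. It assumes a hypothetical CRM in which the System of Record keeps one hand-entered `stage` field per lead, while an intelligent application also logs every user action as an event. The event names (`email_sent`, `email_replied`) and the three candidate features are illustrative assumptions; whether any of them is actually predictive would have to be tested against an outcome such as closed deals.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CrmRecord:
    """System of Record row: a set field, typed in by a rep after the fact."""
    lead_id: str
    stage: str  # e.g. "prospect" or "negotiation": the rep's opinion


@dataclass
class Event:
    """One unit of data exhaust, captured as the user works (hypothetical schema)."""
    lead_id: str
    action: str  # hypothetical action names: "email_sent", "email_replied", ...
    at: datetime


def exhaust_features(events: list[Event], lead_id: str,
                     now: datetime) -> dict[str, Optional[float]]:
    """Compute candidate predictive features for one lead from its event log."""
    own = [e for e in events if e.lead_id == lead_id]
    if not own:
        return {"event_count": 0, "days_since_last_event": None, "reply_rate": None}
    sent = sum(1 for e in own if e.action == "email_sent")
    replied = sum(1 for e in own if e.action == "email_replied")
    return {
        "event_count": len(own),
        "days_since_last_event": (now - max(e.at for e in own)).days,
        "reply_rate": replied / sent if sent else None,
    }


# The CRM row holds only the rep's opinion; the event log holds what happened.
record = CrmRecord("acme", "negotiation")
log = [
    Event("acme", "email_sent", datetime(2024, 3, 1)),
    Event("acme", "email_sent", datetime(2024, 3, 4)),
    Event("acme", "email_replied", datetime(2024, 3, 5)),
]
print(exhaust_features(log, "acme", now=datetime(2024, 3, 10)))
# {'event_count': 3, 'days_since_last_event': 5, 'reply_rate': 0.5}
```

The single `stage` string is all a traditional System of Record would keep; everything in `exhaust_features` comes from data it would have thrown away.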

How Applications Became Databases

Companies such as Salesforce and SAP started making applications for businesses to track their work many years ago and achieved such high penetration that they now serve as the System of Record for much of the work done in their customers’ businesses. This creates strong lock-in: customers have so much data in these systems that they don’t want to switch away, which earns the vendors steady, recurring revenue. When you have a good business, you double down. These companies thus focused on building simple products that let non-technical users create, read, update and delete (CRUD) records through various interfaces [1].

While these companies focused on building CRUD apps, newer companies built applications on top of them to provide more specific insights, learning something from the data rather than just collecting it. This subordinated the CRUD apps to the database layer [2]. These newer companies can charge higher prices for the added functionality and thus capture more value, in aggregate, from the same customer base. One example built on top of Salesforce’s database is Insidesales, which looks like a phone dialer that automatically queues up leads from Salesforce (or another CRM) for you to call. Sales reps open Insidesales in the morning and call whomever it suggests. Insidesales makes these suggestions using learning algorithms it has trained on more than 100 billion data points. These algorithms predict, for example, when someone is likely to answer a call or an email, whether they answer calls more often than emails, whether they’re more or less likely to respond when it’s raining outside, and which sales rep is most likely to close the lead. This extra layer of intelligence allows Insidesales to charge roughly three times what Salesforce charges per seat.
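Insidesales’s actual models aren’t described here, so the following is only a toy illustration of the kind of question they answer: given a log of past call attempts, which hour of the day gives the best chance of reaching a lead? It is a minimal Python sketch assuming a smoothed frequency estimate (Laplace smoothing via the hypothetical `prior_answers` and `prior_calls` parameters), not a learned model over many signals.

```python
from collections import defaultdict
from typing import Iterable, Optional


def best_hour_to_call(
    calls: Iterable[tuple[int, bool]],
    prior_answers: float = 1.0,
    prior_calls: float = 2.0,
) -> Optional[int]:
    """Return the hour (0-23) with the highest smoothed answer rate.

    The defaults act like one imaginary answered call and one imaginary
    unanswered call per hour (Laplace smoothing), so an hour with a single
    lucky answer does not automatically outrank an hour with a track record.
    """
    attempts: dict[int, int] = defaultdict(int)
    answers: dict[int, int] = defaultdict(int)
    for hour, answered in calls:
        attempts[hour] += 1
        answers[hour] += int(answered)
    if not attempts:
        return None

    def smoothed_rate(hour: int) -> float:
        return (answers[hour] + prior_answers) / (attempts[hour] + prior_calls)

    # Only hours that have actually been tried are candidates.
    return max(attempts, key=smoothed_rate)


history = [(9, False), (9, False), (9, True),
           (14, True), (14, True), (14, True), (14, True), (14, False),
           (17, True)]
print(best_hour_to_call(history))  # 14
```

Without smoothing, hour 17 (one call, one answer) would score a perfect 1.0 and win; with it, hour 14’s four answers out of five calls come out ahead.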

Better Data Collection, Better Products

Salesforce is the System of Record layer and Insidesales is the Intelligence layer. Let’s think a little more about the differences between these layers with respect to how data is input and what that allows.

System of Record vs Intelligent Application

  1. Data Entry: explicit vs implicit
  2. Input User: humans vs machines
  3. Input Timing: post facto vs live
  4. Data Type: structured vs unstructured
  5. Data Categorization: set fields vs exhaust

This allows the following, respectively.

  1. Increase in contextual data gathered from public and ancillary sources, made available for later analysis.
  2. Reduced time required to input data so that you can do real work. No more updating Salesforce every month.
  3. Ability to act on data in real time. Data is uploaded from the System of Record, predictive models are applied and actionable insights are made available to the user.
  4. Deep learning over unstructured data to extract predictive features. Today there just isn’t enough data about most jobs to train (highly parameterized) deep learning models; capturing unstructured data as it is generated helps close that gap.
  5. Higher accuracy by removing reporting bias. A machine summary of the last email from a lead is perhaps more indicative of likelihood to close than a salesperson’s subjective opinion expressed in a single, categorical input.
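Benefits 2 and 3 above can be illustrated together: instead of waiting for a rep to update a field at month-end, the application updates a lead’s score the moment a machine-captured event arrives. The minimal Python sketch below assumes a logistic (sigmoid) score; the event names, per-event weights and baseline are hypothetical placeholders, and a real system would learn them from historical outcomes rather than hard-code them.

```python
import math

# Hypothetical weights for illustration only; in practice these would be
# learned (e.g. by logistic regression) from leads that did or did not close.
EVENT_WEIGHTS = {
    "email_opened": 0.3,
    "email_replied": 1.2,
    "meeting_booked": 2.0,
    "unsubscribed": -3.0,
}
BASELINE_LOGIT = -2.0  # hypothetical prior log-odds of closing


class LiveLeadScorer:
    """Keeps a running log-odds per lead and updates it on every event."""

    def __init__(self) -> None:
        self.logits: dict[str, float] = {}

    def on_event(self, lead_id: str, action: str) -> float:
        """Fold one event into the lead's score; return P(close) in (0, 1)."""
        # Unknown actions contribute nothing to the score.
        z = self.logits.get(lead_id, BASELINE_LOGIT) + EVENT_WEIGHTS.get(action, 0.0)
        self.logits[lead_id] = z
        return 1.0 / (1.0 + math.exp(-z))


scorer = LiveLeadScorer()
for action in ["email_opened", "email_replied", "meeting_booked"]:
    print(action, round(scorer.on_event("acme", action), 3))
```

Because each update is a single addition and a sigmoid, the score can be refreshed on every click or email rather than once per reporting period, and no one has to type anything in.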

Building an application to be intelligent from the ground up gives it a data-input advantage that can yield significant benefits for customers.


Many domains lack truly intelligent workflow applications: applications that run machine learning algorithms over a combination of rich input data, data exhaust and external data to make predictions or, ideally, decisions for their users. Some examples:

  • Vertical CRM: predictive lead management for sales, marketing and fundraising in narrow domains.
  • Inventory management software that tells you when inventory is likely to be out and orders ahead for you.
  • Supply chain tracking that predicts breaks in the supply chain and suggests workarounds.
  • Product management tools that track product builds and bugs by automatically collecting needs from customer service tickets and matching them to the relevant engineering work.
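For the inventory example, a common starting point is a reorder point: reorder when stock on hand falls to the demand expected during the supplier’s lead time plus a safety buffer. The minimal Python sketch below assumes demand is estimated as a plain average of recent daily sales and that safety stock is given; the function names are hypothetical, and an intelligent application could replace the average with a learned demand forecast.

```python
import math
from statistics import mean


def forecast_daily_demand(recent_daily_sales: list[float]) -> float:
    """Naive forecast: the mean of recent daily sales (placeholder for a learned model)."""
    return mean(recent_daily_sales) if recent_daily_sales else 0.0


def days_until_stockout(on_hand: float, daily_demand: float) -> float:
    """How many days current stock lasts at the forecast rate."""
    return math.inf if daily_demand <= 0 else on_hand / daily_demand


def should_reorder(on_hand: float, daily_demand: float,
                   lead_time_days: float, safety_stock: float) -> bool:
    """Reorder once stock falls to demand over the lead time plus safety stock."""
    reorder_point = daily_demand * lead_time_days + safety_stock
    return on_hand <= reorder_point


demand = forecast_daily_demand([10, 12, 8, 10])  # 10 units/day
print(days_until_stockout(45, demand))            # 4.5
print(should_reorder(45, demand, lead_time_days=4, safety_stock=10))  # True
```

Here the reorder point is 10 × 4 + 10 = 50 units, so 45 on hand triggers an order; the better the demand forecast, the smaller the safety stock needs to be.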

We would, of course, be interested in hearing of any startups of this type :-)

We’re at a unique moment: many of the previous generation of cloud products have open APIs. Now is the time to consider how to build an intelligence layer on top of them, bootstrapping your dataset with what’s already in today’s Systems of Record.

Automating More Jobs

“Most of the thousands of mundane tasks that people loathe and waste their time on day after day are relatively specific. Specific tasks require large amounts of task specific data to automate. Most specific, mundane tasks in a person’s day to day life aren’t repeated nearly enough to build a dataset big enough for deep learning to be effective. Data-hungriness prevents deep learning technologies from solving specific, but repetitive problems.” — Will Jack

There is a lot of information in how someone uses a workflow tool that isn’t tracked today because the last generation of cloud companies built simple CRUD apps. This information may yield valuable predictions. We would even suggest that perhaps the only way to automate more jobs is to rebuild more workflow tools so that they collect enough data to build deep learning models.

Thanks to Jared Haleck and Mick Hollison for reading a draft of this article.


[1] Both Microsoft and Salesforce saw the success of their ecosystems and are changing their strategies to be systems of intelligence/engagement.

[2] This is not a criticism. Salesforce allowing companies to build on top of Salesforce data through its APIs entrenched it as the System of Record for sales.