Data Science — What is Alt Data or Alternative Data?

DataScrum
May 8, 2019

Alternative data is data that investors use to evaluate a company or investment but that falls outside their traditional data sources (financial statements, SEC filings, management presentations, press releases, etc.). It can give investors more accurate, faster, or more granular insight into company performance than traditional data sources provide.

Over the last 10 years, increases in computing power and personal device usage created massive growth in data generation. As a direct outcome, a large number of companies emerged to collect, clean, analyse, and interpret data and provide it as a product that could inform investment decisions.

Alternative Data Stack

The figure below presents all the major players in the alternative data space.

After thousands of conversations with investors, vendors, and experts, AlternativeData.org has compiled the stack of top alternative data providers in the institutional investment space. The stack focuses on the top 100 data providers used by fundamental investors. It excludes market data, economic/macro data, and market news/industry publications.

Each provider’s position is intended to reflect the firm’s product positioning relative to institutional investors. Data providers in the clusters toward the top focus on data analysis and on extracting insights from alternative data. Clusters positioned toward the bottom focus more on data collection and quality assurance. Their data tends not to be consumed directly by fundamental analysts and PMs, but rather passes through data brokers, the sell-side, or internal data teams for analysis.

Figure — Alternative Data Stack

The number of alternative data providers serving the buy side has grown exponentially in recent years.

Figure — Growth in Alternative Data Providers

Major Types of Alternative Data

How is alternative data generated?

What are the different categories of alternative data?

  • App Usage — Data on app engagement and reviews. The level of data accuracy and usefulness depends on the app panel size, functions and features collected, and the level of user engagement. Popular use cases: gaming, food delivery, streaming services.
  • Credit/Debit Card — Transaction data generated from credit and debit cards. This data is considered highly accurate when the transaction panel is large and covers a consistent user sample. Usually panels over 3 million consumers are considered large enough to be useful. These panels are some of the more expensive data licenses on the market. Popular use cases: Retail revenue tracking.
  • Email/Consumer Receipts — Transaction data generated from email receipts. This data is accurate, but panels are typically smaller than credit/debit card panels and can be biased depending on the nature of the email receipt collection (often via an opt-in email or rewards app). Popular use cases: Retail revenue tracking.
  • Geo-location — Foot traffic data available from WiFi signals (limited granularity and accuracy) or Bluetooth beacons (higher accuracy, more expensive, less coverage). Popular use cases: Geography-specific retail foot traffic tracking.
  • Public Data — Data from public resources. In its original form, this data is often difficult to access, not clean, and not in a usable format (e.g. PDF). The value add of public data providers is the work of collecting, aggregating, and making the data actionable. Examples include SEC filings, patent data, government contracts, import/export data, etc. Popular use cases: patent data for tech companies; supply chain imports for manufacturing; government contracts for construction companies.
  • Satellite — Data collected from satellites or (increasingly common) low-altitude drones. This data is expensive and of variable quality. Image processing is as important as data collection (raw data is not valuable to most investment teams). Satellite data on parking lots is only useful if a more direct measurement of store activity (geo-location data) or of spend (credit card or email receipt data) is unavailable or beyond the buyer's price range. Popular use cases: supply chain disruption tracking; agriculture yields tracking; construction tracking; oil & gas production/storage.
  • Sell-side — Alternative data teams within large sell-side institutions. These teams combine new data and processing techniques with traditional sell-side research.
  • Social/Sentiment — Data obtained from text processing of social media, news, management communications, and other sources. Sentiment data is more relevant for some companies (think younger, more heavily traded, more volatile) than for large, established corporations. The data is often more relevant to shorter-term traders, as it does not always reflect fundamental business aspects. It sits at the lower end of the cost spectrum. Popular use cases: Event-driven sentiment tracking; Brand Virality/Advertising success.
  • Survey — Data collected from surveys. This requires opt-in, and panel diversity varies with the quality of the provider. Surveys are a direct line into consumer sentiment, rather than inferring it from text processing as with social/sentiment data. Popular use cases: brand preference; consumer behavior.
  • Weather — Data on weather patterns collected from sensors. Popular use cases: agriculture and commodities.
  • Web Data — Data scraped from public websites. This data comes in a wide range, from highly accurate and expensive to extremely raw and relatively inexpensive. It is applicable where KPIs can be tracked by aggregating and analyzing large amounts of public-facing information, such as when companies publicize the quantity sold and price on each item page. This data can be extremely granular. Popular use cases: e-commerce; auto sales; airline bookings; travel bookings; job postings.
  • Web Traffic — Data on quantity, demographics, and history (clickstream) of users visiting a certain website. This is popular for tracking e-commerce efforts. Popular use cases: travel bookings; e-commerce.
  • Other — There are many other popular datasets, including point-of-sale data, ad spend data, pricing data, and much more. These are not yet broad enough to warrant a full section.
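To make the "consistent user sample" point from the Credit/Debit Card category concrete, here is a minimal Python sketch of how a transaction panel might be turned into a revenue-growth signal. Everything in it is an assumption for illustration: the row layout, the merchant name "ShopCo", the sample amounts, and the `panel_growth` helper are hypothetical and do not come from any provider's actual product.

```python
from collections import defaultdict

# Hypothetical panel rows: (user_id, period, merchant, amount_usd).
TRANSACTIONS = [
    ("u1", "2018Q1", "ShopCo", 40.0),
    ("u1", "2019Q1", "ShopCo", 50.0),
    ("u2", "2018Q1", "ShopCo", 60.0),
    ("u2", "2019Q1", "ShopCo", 60.0),
    ("u3", "2019Q1", "ShopCo", 100.0),  # joined the panel late
]


def panel_growth(transactions, merchant, base_period, current_period):
    """Spend growth at `merchant` among users active in both periods."""
    # Record which users appear in the panel in each period.
    active = defaultdict(set)
    for user, period, _merchant, _amount in transactions:
        active[period].add(user)

    # Keep only users seen in both periods (the consistent sample).
    consistent = active[base_period] & active[current_period]

    # Sum the merchant's spend per period over the consistent sample.
    spend = defaultdict(float)
    for user, period, txn_merchant, amount in transactions:
        if txn_merchant == merchant and user in consistent:
            spend[period] += amount

    if spend[base_period] == 0:
        raise ValueError("no base-period spend in the consistent panel")
    return spend[current_period] / spend[base_period] - 1.0


growth = panel_growth(TRANSACTIONS, "ShopCo", "2018Q1", "2019Q1")
print(f"{growth:.1%}")
```

With these sample rows, the consistent panel (u1 and u2) shows 10% growth. Counting the late joiner u3 as well would inflate the figure to 110%, which is the kind of distortion a changing panel can introduce. A real pipeline would also need to handle panel weighting, refunds, and merchant-name matching, none of which is attempted here.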

Which are the most popular datasets for investors?

Major Players in Alternative Data

By Sector:

Autos:

China:

Consumer

Energy

Internet

Transportation

Travel

Alt Data Providers by Data Type

App Usage:

Credit/Debit Card:

Email/Consumer Receipts:

Geo-location:

Public Data:

Satellite:

Social/Sentiment:

Survey:

Weather:

Web Data:

Web Traffic:
