Ad Tech DeFi

Dr. Yana Volkovich
Sep 21 · 16 min read

Co-authored by Yana Volkovich and Moussa Taifi.

There is no question that Ad Tech is facing a new era. With the retirement of third-party cookies and internet users becoming ever more concerned about privacy, Ad Tech businesses have started to look for new solutions to maintain this vibrant ecosystem. Many of these solutions are meant to mimic the old ones with some adjustments to accommodate novel paradigms. In other words, even though the industry is changing, the underlying concepts are staying, in many ways, the same.

In this post, we explore how a new Ad Tech ecosystem might look in the near future, operating in Web 3.0 with technologies such as edge computing, decentralized data networks, and artificial intelligence. We will discuss a vision of Decentralized Finance (DeFi) in which Ad Tech products become available on a decentralized public blockchain network. In contrast to the current Web 2.0, where companies are dictating data monetization models and do not typically share revenue with users, we envision that the future Internet will look different from how it does today, with users being equal participants and having an opportunity to profit alongside the traditional players in the Ad Tech system.

Walled Gardens by Yana Volkovich

Web versions of the Internet

Web 1.0 & 2.0

The Web is 30 years old. In some sense, it is a fully “grown up” Web. It started as Web 1.0 (a mostly anonymous place with few content creators, where the vast majority of users simply acted as consumers of content). This original Web evolved, and is currently operating as Web 2.0, sometimes referred to as the Social Web. However, the next paradigm, the one that will shift us to Web 3.0, seems to already be emerging.

Web 2.0 currently aims to provide a rich user experience and encourage user participation. In return, Internet users are expected to de-anonymize themselves, show good behavior, and continuously generate content. Web 2.0’s business models rely on user participation to create fresh content. These same users generate behavioral data that is often sold to third parties for marketing purposes. By design, Web 2.0 requires lots of personal data to be collected. In the Ad Tech domain, this data collection enables marketers to target users in new ways, such as cross-device graphs and behavioral segments, resulting in highly personalized ads. To achieve their advertising and revenue goals, publishers, advertisers, and too-many-to-count middlemen (e.g., data providers) rushed in to collect as much personal and behavioral data as possible. Data became the new gold.

Future paradigm (Web 3.0)

Wikipedia refers to Web 3.0 as the Semantic Web, which is based on the Resource Description Framework (RDF). RDF data models store information as “subject–predicate–object” triples. For example, “NYC Central Park” in RDF is represented as the triple: a subject “Central Park”, a predicate “is located in”, and an object “New York City”. The collection of these RDF triples can be represented as a labeled directed multi-graph (i.e., knowledge graph). Knowledge graphs are already widely used in many AI applications (e.g., Google Search or question-answering services like Amazon Alexa). Combining all possible graphs could be used to form a Global Brain (potentially the ultimate AI technology) in which all data is connected and understood contextually and conceptually.
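To make the triple idea concrete, here is a minimal sketch in plain Python. It stores facts as (subject, predicate, object) tuples and indexes them as a labeled directed multigraph. The two extra triples and the helper names build_graph and objects_of are our own illustrative assumptions; a real system would use an RDF library and proper URIs rather than bare strings.

```python
from collections import defaultdict

# Each fact is a (subject, predicate, object) triple, as in RDF.
# Only the first triple comes from the text; the others are illustrative.
triples = [
    ("Central Park", "is located in", "New York City"),
    ("New York City", "is located in", "New York State"),
    ("Central Park", "is a", "Public park"),
]

def build_graph(triples):
    """Index triples as a labeled directed multigraph: subject -> [(predicate, object), ...]."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph

def objects_of(graph, subject, predicate):
    """Return every object reached from `subject` by an edge labeled `predicate`."""
    return [obj for pred, obj in graph.get(subject, []) if pred == predicate]

graph = build_graph(triples)
print(objects_of(graph, "Central Park", "is located in"))  # ['New York City']
```

Because edges carry labels and a subject can have several edges to the same object, the structure is a multigraph rather than a simple graph.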

A recent Forbes article claimed that the Semantic Web, in such a form, would not materialize because it’s nearly impossible to implement with the technologies currently available. However, in the same article, an alternative to Web 3.0 is proposed. This alternative utilizes new technologies such as distributed ledgers (a consensus of replicated, shared, and synchronized digital data geographically spread across multiple sites, countries, or institutions) and blockchain storage technologies. In this conceptualization of Web 3.0, the main themes are data decentralization and transparency in a secure environment.

A Web 3.0 design that leverages blockchain technology would by definition be open. In this new paradigm, users will operate in a space of open identity and reputation. This open space enables users to be in charge of their personal data and port their identities across various providers. Tim Berners-Lee (inventor of the Web) is now working on a concept of secure personal web servers for data pods (see the Solid project). These servers and pods would allow people to store their data securely in decentralized data stores and promote interoperability (i.e., different applications can work with the same data). Specifically, pods support storing Linked Data, and that brings us back to the Semantic Web.

Web 3.0 will likely rely on three new layers of technological innovations: edge computing, decentralized data networks, and artificial intelligence. Once adopted, Web 3.0 will enable a whole new wave of previously unimaginable businesses and business models and eventually serve the personalization and information organizing needs of users. This is markedly different from the current Web 2.0 model, where publishers and service providers dictate the information and personalization available to users.

One sign of a shift in the Ad Tech industry towards Web 3.0 is the emergence of solutions such as the Google Privacy Sandbox, which has a stronger focus on AI. In addition, the postponed (but inevitable) death of the third-party cookie catalyzed the search for new technologies to collect, exchange, and act on personal data. However, most of the proposed models still rely on the Web 2.0 paradigm: that is, walled gardens rather than a decentralized global marketplace.

In the next section of this post, we discuss the current state of Ad Tech in terms of both the marketplace and solutions. Then, in the following section, we explore a vision for the future.

Data Marketplace

Today’s Data Marketplace

The image to the left is a simplified depiction of today’s advertising data marketplace. In this model internet users interact with multiple companies (online and offline) on a daily basis. Some examples of this interaction are:

  • Users share their personal information, create content, consume products, etc.
  • Companies collect user data to buy and sell it through an “exchange”.
  • Buyers (advertisers) and sellers (publishers) exchange the user data directly with each other.

In addition, due to the complexity of the advertising marketplace, user data also goes through a chain of agents such as data/service providers, DSPs, and SSPs.

The users themselves are clearly not part of the buying and selling activities. They have little to no control over the sharing of their information between different companies and do not receive any direct cash revenues from these transactions. However, users’ data is leveraged to provide tailored experiences and potentially free access to content. There have been several governmental attempts to help users gain back some control over their data. Initiatives such as GDPR in the EU and CCPA in California are examples of these efforts. Nonetheless, these solutions are no silver bullet due to low transparency within the current Data Marketplace. For example, CCPA’s “do not sell” setting on one website does not require all partners with whom the user’s data has been shared to delete it as well. Often, users are not even aware of their data being shared with other partners and data providers. Companies are also not making it easy for users to revoke their consent. This friction can come in the form of companies making “share all” or “accept all” their default setting, or even making the “opt-out” hurdle as difficult as sending a physical letter. Of course, the more data companies dig, the more gold they collect, so they are not incentivized to make opting out easy.

Transitioning to the future Data Marketplace

While companies are trying to maximize the data they collect about their own users so they can leverage it, they also want to limit what they expose to their partners. This is to retain control of the data and prevent these partners from reselling it. Recently, many new solutions (including Data Cleanrooms and new Ad IDs) have emerged to help companies achieve these goals.

Future Data Marketplace

In this new model, the marketplace flow described in the previous section stays almost the same (see image). However, some “magic” (AI, hashing, encryption, etc.) is added to the data before it hits an exchange. The goal of this “magic” is to (a) hide the true data from other marketplace partners while (b) keeping the ad mechanics (re/targeting, matching, selling, and buying) almost as seamless as before.
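As one small illustration of the hashing part of this “magic”, the sketch below shows two partners matching audiences on hashed emails instead of raw ones. The email lists and the hash_email helper are hypothetical. Note that plain hashing of emails can still be reversed with a dictionary of known addresses, so this only illustrates the mechanics of matching, not a complete privacy solution.

```python
import hashlib

def hash_email(email: str) -> str:
    """Normalize an email address and return its SHA-256 hex digest."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Hypothetical first-party lists held by two marketplace partners.
advertiser_emails = ["Ana@example.com", "bob@example.com"]
publisher_emails = ["ana@example.com ", "carol@example.com"]

advertiser_hashes = {hash_email(e) for e in advertiser_emails}
publisher_hashes = {hash_email(e) for e in publisher_emails}

# Only hashes are exchanged; the overlap is the matched audience.
matched = advertiser_hashes & publisher_hashes
print(len(matched))  # 1
```

Normalizing before hashing matters: without it, "Ana@example.com" and "ana@example.com " would hash to different values and the match would be missed.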

Today, the Ad Tech industry continues to explore different solutions. So far, most of the proposed solutions can be split into two groups:

  • aggregated sharing data solutions
  • ID sharing data solutions.

Aggregated data solutions

These types of data solutions transform a user’s personal data (e.g., browsing history) and combine it into meaningful groups while also preserving the user’s anonymity. One potential candidate for this type of solution is Google FLoC.

In this scenario, a set of selected users provide their browsing histories to train a model (aka clustering algorithm), which produces a set of large groups of users with potentially similar interests. This model then assigns all users to its groups based on their browsing history. Finally, these group IDs are used in online auctions.
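The train-then-assign flow described above can be sketched with a tiny k-means clustering in plain Python. This is not a description of FLoC's actual algorithm: the topic-count vectors, the user names, and the choice of k-means are our own assumptions, picked only to show how a model trained on a few users can hand every user a group ID.

```python
def distance_sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, point):
    """Index of the closest centroid (used here as the user's group ID)."""
    return min(range(len(centroids)), key=lambda i: distance_sq(centroids[i], point))

def train_cohorts(points, k, iterations=10):
    """Plain k-means; the first k points seed the centroids so runs are repeatable."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        for i, members in enumerate(groups):
            if members:  # keep the old centroid if a group ends up empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

# Hypothetical browsing histories as visit counts per topic: (sports, cooking).
training_users = [(9, 1), (1, 8), (8, 0), (0, 9)]
centroids = train_cohorts(training_users, k=2)

# Any user, including ones outside the training set, is reduced to a group ID.
new_users = {"user_a": (7, 2), "user_b": (2, 6)}
cohort_ids = {name: nearest(centroids, vec) for name, vec in new_users.items()}
print(cohort_ids)  # {'user_a': 0, 'user_b': 1}
```

Only the group ID would be exposed in an auction; the underlying visit counts stay with whoever runs the model.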

This solution may seem smarter (and more Web 3.0 compliant) because it uses artificial intelligence techniques. Nevertheless, it is still not ideal as it remains centralized and is owned and operated by only a handful of companies (in the case of FLoC, one very big corporation).

ID sharing solutions

The idea here is to use encryption and decryption. As part of transacting, buyers and sellers collect IDs that are also often PII (personally identifiable information, such as name, address, email, or phone number). Since the buyers and sellers don’t want to share the data they collect with other entities, they often engage data brokers to allow them to transact on this data without compromising its proprietary nature. These brokers naturally end up collecting data from many parties.

Since 2020, the problem of data sharing has become particularly relevant due to the anticipated deprecation of third-party cookies. As a result, there are currently as many as 80 solutions trying to determine how to share data while protecting PII. Below are two of the most widely adopted solutions, for illustrative purposes:

  • LiveRamp’s RampID: LiveRamp serves as the data broker, applies encryption techniques, and provides each client with unique keys and proprietary ID mappings. As a result, none of the participants can use IDs from others, and LiveRamp has access to IDs from all participants.
  • Unified ID 2.0: All IDs are encrypted twice (separate keys are updated monthly and daily) by an independent centralized system serving ‘good players’ only. The updating of daily keys ensures that yesterday’s IDs are not valid (e.g., if a bad actor wants to store and reuse them).
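The effect of rotating keys can be sketched in a few lines. This is not how Unified ID 2.0 is actually implemented: we use keyed hashing (HMAC) from the Python standard library as a stand-in for encryption, and the key values and function names are hypothetical. The point is only to show why a token stored yesterday stops validating once the daily key changes.

```python
import hashlib
import hmac

def stable_id(email: str, monthly_key: bytes) -> str:
    """First layer: keyed hash of the normalized email under the slow-rotating key."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(monthly_key, normalized, hashlib.sha256).hexdigest()

def daily_token(email: str, monthly_key: bytes, daily_key: bytes) -> str:
    """Second layer: re-key the stable ID under the fast-rotating key."""
    inner = stable_id(email, monthly_key).encode("utf-8")
    return hmac.new(daily_key, inner, hashlib.sha256).hexdigest()

def is_valid_today(token: str, email: str, monthly_key: bytes, todays_key: bytes) -> bool:
    """The central operator recomputes today's token and compares in constant time."""
    expected = daily_token(email, monthly_key, todays_key)
    return hmac.compare_digest(token, expected)

# Hypothetical keys held only by the central operator.
monthly_key = b"monthly-key-2021-09"
yesterday_key = b"daily-key-2021-09-20"
today_key = b"daily-key-2021-09-21"

stored_token = daily_token("ana@example.com", monthly_key, yesterday_key)
fresh_token = daily_token("ana@example.com", monthly_key, today_key)

print(is_valid_today(stored_token, "ana@example.com", monthly_key, today_key))  # False
print(is_valid_today(fresh_token, "ana@example.com", monthly_key, today_key))   # True
```

Because only the central operator holds the keys, participants cannot mint or refresh tokens themselves, which is exactly the centralization the next paragraph points out.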

However, since both of these data sharing solutions require centralized systems, they do not allow internet users to participate in the ad exchanges or control their own data.

Blockchain Ad Tech

Search for an ID

While collecting all this data, companies still need to identify their users and match their data with that of other vendors. To identify users of their services, companies leverage user names and passwords (walled gardens). To match this with data outside their services, companies need to map it to standardized IDs. However, there are currently no universal IDs for internet users. As a result, data matching is a big issue and several new businesses have emerged offering various matching solutions.

The user’s identity is even harder to establish when they are not signed in. Nevertheless, these users can still be tracked by their IP address, device ID or location. Even so, there are still some limitations to these tracking solutions. First, IPs were not designed to serve as a personal user name system (e.g., they are not static). Second, even if IPv4 currently allows us to identify users across different devices (cross-device graph), it will eventually be replaced by IPv6 which is more resistant to IP-based tracking. Finally, Apple’s plan to route internet traffic through relays to mask who’s browsing makes IPs even less reliable as an identifier. Device IDs, such as IDFAs and AAIDs, are also under privacy scrutiny and both Apple and Google are making it more difficult to track across apps and websites.

Remember that part of the reason for owning data is to match it with other market participants. The best and easiest way, of course, is to link everything to some universal ID. There have been several attempts to create a universal ID for internet users. Twenty years ago, the W3C proposed the WebID to uniquely identify a person or a company, but this initiative was not successful. Other potential solutions have been Social Sign-On, such as those offered by Google, Facebook, and other third party websites. While it offers a somewhat universal ID solution, Social Sign-On unfortunately also leads to the centralization of all this data in the hands of a few companies. With centralization comes greater security risks since the security of such IDs can be compromised in an instant (e.g., if your Google account is hacked).

“Human NFTs”

It may at first seem strange to propose NFTs as a potential solution for establishing universal IDs. However, there is reasoning behind it. During the first half of 2021, the sale of non-fungible tokens (NFTs) surged to $2.5 billion. NFTs are associated with specific digital or physical assets. The concept of an NFT also includes a specific use of those assets for a given purpose. At its core, an NFT is a unit of data stored on a digital ledger (i.e., blockchain) that certifies a digital asset to be unique and, therefore, not interchangeable. NFTs can be bought and sold on digital markets. Each trade adds a shared value to all participants of the chain. For example, a young artist selling NFTs at the beginning of their career would continue receiving a share of revenue from those NFTs as they are resold later, when the artist and their work may have become more highly valued and recognized.

Just as NFTs can be associated with physical assets, humans can be associated with a Self-Sovereign Identity (SSI). In an SSI system, users generate and control their own unique digital identities via keys. Most SSI systems are decentralized with the credentials verified using public-key cryptography anchored to a distributed ledger. This fulfills one of the requirements of Web 3.0.

Traditionally, governments are the entities that issue and verify identities. For example, the European Union is already building on the vision of eIDAS to support electronic identification for digital transactions in the EU Market. However, Web 3.0 does not even require a central authority since verification is facilitated by users. An example of a decentralized system (outside company and government control) currently available is Ethereum (ETH).

Ad Tech DeFi Marketplace

The Ethereum system offers users the ability to create Ethereum wallets. These wallets can then allow internet users to participate in a decentralized version of the Ad Tech ecosystem. Advertisers and publishers may also have Ethereum wallets. These interactions with Ethereum accounts are an example of Decentralized Finance (DeFi).

In this new Decentralized Ad Tech marketplace, the role of Ad Tech companies like Xandr would be to provide wallet and interface capabilities (sometimes referred to as the third layer of blockchain computers) for all types of participants (advertisers, publishers, and internet users). Ad Tech companies could charge small fees for their services similar to how it is currently done in the Ad Tech and crypto wallet businesses.

Life of an Ad Call with wallets

To illustrate what Decentralized Ad Tech would look like on a daily basis, we can now discuss how Ad calls would function using wallets. If you need a refresher on the life of an Ad call, a simplified version of the current mechanisms and workflow can be found at this reference.

In contrast to traditional Ad calls, the Decentralized Ad Tech ecosystem we envision would use 3 Ethereum-based wallets. For the purposes of this example, we will call these WA (for the advertiser), WP (for the publisher), and WU (for the user). The potential workflow process is illustrated below, then described sequentially.

Online process

Life of an Ad call with wallets
  1. Internet user (wallet ID WU) visits website.com from Publisher (wallet ID WP).
  2. The Publisher’s website.com launches the ad call (via an ad tag) and signs the user wallet in.
  3. Optional: Publisher’s website.com sends a record of the user’s visit to the chain (WU, ad tag, WP). The ad tag is a piece of code on the publisher’s website that launches the ad call.
  4. Publisher sends the bid request to advertiser bidders. Advertisers bid on the bid request.
  5. Winning bid is established and the winning Advertiser (wallet ID WA) sends the creative.
  6. Publisher receives the creative and serves it on website.com.
  7. Publisher records the transaction in the blockchain (WU, WP, WA, ad tag, winning bid). Publisher also transfers a percentage of their revenue to the user and records it (WP, WU, X% winning bid).

Publisher can also record clicks or conversions back to the chain.
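The online steps above can be sketched as a small simulation. The in-memory Ledger class stands in for the chain, and the wallet IDs, the integer micro-unit amounts, and the highest-bid-wins rule are all our own assumptions; the workflow itself does not prescribe auction rules or a record schema.

```python
from dataclasses import dataclass, field

@dataclass
class Ledger:
    """In-memory stand-in for the chain: an append-only list of records."""
    records: list = field(default_factory=list)

    def append(self, record: dict) -> None:
        self.records.append(record)

def run_ad_call(ledger, wu, wp, ad_tag, bids, user_share_pct):
    """Steps 3 to 7: record the visit, pick the highest bid, record the sale and the user's share.

    `bids` maps advertiser wallet IDs to bid amounts in integer micro-units.
    """
    ledger.append({"type": "visit", "WU": wu, "WP": wp, "ad_tag": ad_tag})
    wa, winning_bid = max(bids.items(), key=lambda item: item[1])
    ledger.append({"type": "impression", "WU": wu, "WP": wp, "WA": wa,
                   "ad_tag": ad_tag, "winning_bid": winning_bid})
    user_amount = winning_bid * user_share_pct // 100
    ledger.append({"type": "transfer", "from": wp, "to": wu, "amount": user_amount})
    return wa, winning_bid, user_amount

ledger = Ledger()
result = run_ad_call(ledger, wu="WU1", wp="WP1", ad_tag="tag-1",
                     bids={"WA1": 1500, "WA2": 2100}, user_share_pct=10)
print(result)               # ('WA2', 2100, 210)
print(len(ledger.records))  # 3
```

Amounts are kept as integers to avoid floating-point rounding when splitting revenue; on a real chain they would be denominated in the token's smallest unit.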

Offline Processes (Reconciliation, Analytics)

Instead of publishers/advertisers conducting instant payments, it might make more sense for them to perform revenue transfers on a scheduled basis (daily/monthly) as part of an offline process. This can be particularly relevant to the case of cost-per-click (CPC) and cost-per-action (CPA) pricing models, due to the potential delays of click and conversion records. For example, some of the conversions could take up to a month to complete. Data science and analytics calculations are another way in which offline processes are used within blockchain transactions.

Since Ethereum can be viewed as a database with tables and rows of transactions (each with unique addresses), many traditional table operations can be performed on blockchain records. Another Ad Tech-based example of this is provided below to illustrate how it might work.

Decentralized Ad Tech Offline Processes
  1. Collect served ads on publisher’s website.com from the chain (WU, ad tag, WP, WA, winning bid). If clicks occurred, collect all clicked ads from the chain (WU, WP, WA, ad tag, click).
  2. Calculate revenue for users, publishers and advertisers. Potentially, calculate aggregated analytics about the ads served for publishers and advertisers.
  3. Send revenues to WU (X% of transactions), WP, WA wallets according to the pre-defined schedule.
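As a rough sketch of the offline steps above, the snippet below aggregates impression rows into a net balance change per wallet for one settlement period. The rows, the wallet IDs, and the 10% user share are hypothetical, and we assume the advertiser pays the winning bid while the publisher keeps whatever is not passed on to the user; fees for intermediaries are left out.

```python
from collections import defaultdict

# Hypothetical impression rows already read from the chain (amounts in micro-units).
impressions = [
    {"WU": "WU1", "WP": "WP1", "WA": "WA1", "ad_tag": "tag-1", "winning_bid": 2000},
    {"WU": "WU1", "WP": "WP1", "WA": "WA2", "ad_tag": "tag-1", "winning_bid": 1000},
    {"WU": "WU2", "WP": "WP1", "WA": "WA1", "ad_tag": "tag-2", "winning_bid": 3000},
]

def reconcile(rows, user_share_pct):
    """Return the net balance change per wallet for one settlement period."""
    balances = defaultdict(int)
    for row in rows:
        bid = row["winning_bid"]
        user_cut = bid * user_share_pct // 100
        balances[row["WA"]] -= bid             # advertiser pays the winning bid
        balances[row["WU"]] += user_cut        # user receives X% of it
        balances[row["WP"]] += bid - user_cut  # publisher keeps the remainder
    return dict(balances)

settlement = reconcile(impressions, user_share_pct=10)
print(settlement["WU1"], settlement["WP1"], settlement["WA1"])  # 300 5400 -5000
print(sum(settlement.values()))  # 0
```

The balances summing to zero is a useful sanity check: every unit an advertiser pays lands in either a publisher or a user wallet.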

Some general observations:

Below we summarize some observations we made while researching this topic:

  • This new vision represents a big shift in the privacy paradigm of internet users. All browsing information becomes completely public: any wallet and chain records are accessible at any time by anyone. In the decentralized model, this information should not be traceable to a real person unless a user inputs their PII (and it is later connected with their wallet). For example, users might share their names, emails and addresses as part of a click/conversion event on a publisher or advertiser website. This could potentially be one of the main shortcomings of the proposed Decentralized Ad Tech model.
  • The longer users are browsing the Web, the more valuable they become. Similar to the “cookie-based world”, users “bring” their information with them each time they visit a website. However, in the decentralized model, a user’s browsing value is translated into real revenue in their wallet. This will be the incentive for users to participate in the system.
  • In comparison to current Ad Tech solutions, the proposed model is browser- and device-agnostic. For example, MetaMask, one of the most popular Ethereum-based wallets, offers extensions for different browsers. Another important distinction is that blockchain solutions do not depend on a particular currency and can be converted to any currency as needed.
  • Although Ethereum private keys are super secure user-controlled passwords that don’t require any central service for storage, a good user interface for using these private keys is still needed. This is something that the crypto community is currently working to improve.
  • The Basic Attention Token (BAT) from Brave Browser and the Permission (ASK) browser extension are the only Ethereum-based Ad Tech solutions in the market (to our knowledge). They offer crypto-wallets and charge percentages of the transaction value. In both of these setups, user browsing histories are still stored privately.
  • Although there are several companies offering analytics solutions for the blockchain (including AnyBlock, BitQuery, Dune Analytics, and The Graph to name a few), none of them are tailored to work with Ad Tech.

Lastly, fraudulent activities and bot farms are known issues in Ad Tech. This also applies to Decentralized Ad Tech. The next section discusses some of these problems in greater depth.

Attack and Fraud Vectors, and Remediation methods

Any distributed system will eventually have to deal with malicious actors. The blockchain identity solutions we proposed are not immune to that. Blockchain technology comes with a set of inherent vulnerabilities. However, these vulnerabilities are not usually exploited, since government-level involvement is needed to provide the global scale such attacks require.

The identity solution vulnerabilities we are most concerned with in the Ad Tech space are:

  • Bots: Fraudsters drive traffic by infecting a user’s browser with programs that visit pages, click on ads, play videos, and perform other actions that mimic user activity.
  • Domain spoofing: Fraudsters pretend that their website is more valuable or legitimate than it is.
  • Pixel stuffing: Fraudsters show one or more ads in a tiny part of their website to drive up their ad revenue. The ads are not actually viewable by humans.
  • Ad Stacking: A website shows multiple ads in the same ad slot stacked on top of one another. This increases the apparent number of ad views. The user is unable to see any of the actual ads (except maybe the top layer one).
  • Location Fraud: Compromised browser sends incorrect location data, driving up prices by pretending to be in a valuable segment of the target population.
  • User Agent Spoofing: Compromised browser modifies the information (user agent) it sends to the advertiser to confuse bidders and drive up the value of impressions.

When using blockchain technology as an identity solution, these sorts of attacks can still occur since protecting every single browser/device, globally, is impossible. On the other hand, the same ad protection methodologies that advertisers and exchanges already use to safeguard from these attacks can be applied to minimize these risks. Similar to viewability measurement and click-bot detectors, we expect that a set of tech organizations will find it beneficial to adapt their inventory quality techniques to this new environment.

For example, one way to detect bot users is by noticing those users whose activity never has a lull (since real humans need to sleep). By finding these “never sleeping users”, advertisers have been able to blacklist many bots, and take remediation actions to protect their spend. Similarly, by using a wallet ID as a replacement for the cookie we can determine the users whose wallets never sleep. We expect that the industry will come up with a new set of robust defensive techniques in the future.
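The “never sleeping wallet” heuristic could look roughly like the sketch below. The function name, the four-hour rest threshold, the 24-hour minimum observation span, and the sample wallets are all assumptions for illustration; a production detector would combine this with other signals.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def never_sleeping_wallets(events, min_rest=timedelta(hours=4),
                           min_span=timedelta(hours=24)):
    """Flag wallets observed for at least `min_span` whose longest pause is under `min_rest`.

    `events` is an iterable of (wallet_id, datetime) pairs.
    """
    by_wallet = defaultdict(list)
    for wallet, ts in events:
        by_wallet[wallet].append(ts)

    flagged = set()
    for wallet, stamps in by_wallet.items():
        stamps.sort()
        if len(stamps) < 2 or stamps[-1] - stamps[0] < min_span:
            continue  # not enough history to judge this wallet
        longest_gap = max(b - a for a, b in zip(stamps, stamps[1:]))
        if longest_gap < min_rest:
            flagged.add(wallet)
    return flagged

start = datetime(2021, 9, 20)
# A bot-like wallet is active every hour for two days straight.
bot = [("WU_bot", start + timedelta(hours=h)) for h in range(48)]
# A human-like wallet is active hourly from 08:00 to 22:00 on two days.
human = [("WU_human", start + timedelta(days=d, hours=h))
         for d in range(2) for h in range(8, 23)]

print(never_sleeping_wallets(bot + human))  # {'WU_bot'}
```

The minimum observation span prevents a wallet with only a short burst of activity from being flagged just because it has not yet had time to show a pause.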

To sum up

Ad Tech buyers and sellers want and need to know who their users are, but they do not want to share their revenue with those users. Eventually, when users own their data, this withholding of revenue from the user will no longer be possible. Therefore, users must become part of the new Ad Tech ecosystem. In this future, we see internet users not as products but as equal participants.

We thank Amanda Tan and Joe Roepcke for their thoughtful comments.

*Disclaimer: The views and opinions expressed by the authors should not be considered as financial advice. We do not give advice on financial products.

Xandr-Tech

Our latest thoughts, challenges, triumphs, try-again’s, most snarky and profound commit messages. Our proudest achievements, deepest darkest technical debt regrets (just kidding, maybe). All the humbling yet informative things you learn when you try to do things with computers.

Written by Dr. Yana Volkovich, Data Science @Xandr
