Data Ownership and Privacy Protection Spark New Debates
As cookies crumble and the digital data environment shifts dramatically, what new safeguards and models will win out?
By Paula Klein
Digital data is at the center of a storm.
Online market data, once used primarily by e-commerce sellers and platforms to measure clicks and manage traffic, is now a key analytics tool used to attract individual consumers, personalize content, and boost sales and revenue. It’s also a competitive weapon.
A panel discussion at the recent two-day MIT Conference on Digital Experimentation (CODE@MIT) examined this changing landscape — including Google’s controversial plan to phase out third-party cookies by next year, known as cookie deprecation. Experts also talked about personalization strategies, how to value data, platform regulation, and the impact of sophisticated tools like LLMs and AI on competition, public policy, and privacy. Critical questions were raised, including: Who owns your data? Can you put the genie back in the bottle? And are there economic benefits to personalization?
Taken together, the changes are so dramatic that it may seem like the “golden age of marketing experimentation is waning,” claimed Dean Eckles.
Eckles is the William F. Pounds Associate Professor at MIT Sloan, a research group leader at the MIT IDE, and a CODE@MIT organizer who led the discussion. In its place, a new age of (perhaps?) even better data collection and ad targeting may arise, panelists said.
Optimistically, new methodologies such as AI and LLMs have the potential to boost e-commerce on a large scale while also protecting consumers and the economy, said Eric Benjamin Seufert, General Partner at Heracles Capital. “Lots of smart people are working on this, and cryptographic methods or federated learning might be even better for both sellers and consumers,” he said.
A Post-Cookie World?
Because cookies have long been the standard for online market data collection, cookie deprecation has spurred passionate arguments — for and against its implementation. Originally announced in 2020, Google’s end date for third-party cookies was pushed back several times, most recently to 2025. The current proposal is for this deprecation to be “opt-in.” Google announced plans for a Privacy Sandbox and said it would “introduce a new experience in Chrome that lets people make an informed choice that applies across their web browsing…”
Google said its decisions were made after much consideration of feedback “from a wide variety of stakeholders, including regulators like the UK’s Competition and Markets Authority (CMA).”
At CODE@MIT, Seufert said that complying with privacy regulations like the GDPR in the EU has had “disastrous” effects on small businesses. He fears that finding alternatives to cookies could also have that impact. “The CMA is trying to stop [cookie deprecation] because it will be good for large companies, but regulators don’t understand the tradeoffs” for small businesses. He believes that Google benefits by keeping data on its own search engines.
However, Malika Korganbekova, Assistant Professor at the University of Chicago, presented the economic benefits of personalization algorithms.
“Regulators start from the position that personalization is harmful, but that’s not exactly true.”
She recently conducted research on personalized online ads with the retailer Wayfair. The experiment, which covered nine million users over three years, shows that personalized recommendations benefit consumer search with faster, better matches.
Additionally, privacy restrictions imposed by browsers such as Safari and Chrome limit the quality of individual-level data used in personalization algorithms, according to her research. Large-scale randomized experiments indicate that personalization increases seller and platform revenue and leads to better consumer-product matches. (Read the working paper, Balancing User Privacy and Personalization.)
Seth Neel, Assistant Professor at Harvard, noted that “we are still lacking good ways to measure marginal value” of data; it’s very context-dependent and changing rapidly with new training data sets used for LLMs and AI. “How will accuracy change? The downstream use cases aren’t always clear,” he said.
Challenges of Machine Unlearning
Moreover, Neel spoke about the desire of some users, and the requirement under the EU’s GDPR, to reverse long-standing data collection practices and to delete personal data.
So-called machine unlearning is an “incredibly difficult task” much like “putting the toothpaste back in the tube,” he said.
When users want to opt out or take ownership of their data, as proposed in the EU, it poses serious challenges.
With smaller, simple models, “we understand how the data is used and can modify it without starting from scratch,” he said. Larger models are much more difficult.
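For small models, that claim can be made concrete. As an illustrative sketch (not code discussed at the panel), a plain linear regression can “unlearn” a single record exactly: because the fit depends only on sufficient statistics, a user’s row can be subtracted out and the model re-solved, with no retraining from scratch.

```python
import numpy as np

# Hypothetical example data: 1,000 users, 3 features, a known linear signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=1000)

# The least-squares fit depends only on these sufficient statistics.
XtX = X.T @ X
Xty = X.T @ y
w_full = np.linalg.solve(XtX, Xty)      # model trained on all data

# A user requests deletion of their record (row 42):
x_del, y_del = X[42], y[42]
XtX -= np.outer(x_del, x_del)           # subtract the row's contribution
Xty -= x_del * y_del
w_unlearned = np.linalg.solve(XtX, Xty) # model as if row 42 never existed

# Check against full retraining without that row:
X2, y2 = np.delete(X, 42, axis=0), np.delete(y, 42)
w_retrain = np.linalg.lstsq(X2, y2, rcond=None)[0]
assert np.allclose(w_unlearned, w_retrain)
```

Deep neural networks have no such compact sufficient statistics, which is one reason unlearning in large models is so much harder.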
Neel said that a weaker alternative to removing data from the trained model entirely could still offer some privacy protections. Efforts are under way to limit access to and use of private data, under what is known as the “right to be forgotten” on the web.
Seufert said that developers have to move forward more cautiously to protect personal data and to avoid a surveillance state — and new tools may help advertisers find appropriate audiences without third-party cookies. At the same time, the data “genie is out of the bottle,” and it may be easier to restrict downstream actions than to rework the machine learning model itself.
Do more:
Watch the full panel discussion here.
Check out the 2024 CODE@MIT agenda and speaker bios.
Watch all of the 2024 CODE@MIT sessions on YouTube.
Read more on Medium here.