The Long Tail of Analytics
Using Moore’s Law to Find “The Next Big Thing”
“What’s the next big thing?”
Every panel discussion on political technology that I’ve ever been a part of has featured some variant of this question. In the media’s conception of things, each subsequent election is singularly defined by one technological innovation, and one alone. In 2004, it was blogs. In 2008, social media. In 2012, “Big Data.” The media (and panel moderators!) want to know what this will be in 2016.
This question may be a crowd favorite. It is certainly a favorite of those looking to craft new sales pitches around that Next Big Thing. But it fails to adequately capture the ways technology actually matters in elections.
Real technological progress happens in the natural extensions of what people are already used to. In the political arena, email has been fundamental to fundraising since at least 2002, yet organizations other than Presidential campaigns are only now figuring out how to use it to drive millions or tens of millions in revenue. The story here has been one of slow, patient, iterative progress. Back in 2007, we struggled to get slightly more personal emails from politicians through the approval process. Today, we have match kittens.
In assessing how things might progress on the technical side, we should still bet on Moore’s Law, even if the actual multiplier isn’t exactly a doubling. And in terms of real, practical impact, Moore’s Law is more about doing the things we already know how to do faster and cheaper than it is about finding new things to do.
In hunting for “The Next Big Thing,” we should be looking for these sorts of innovations: hard, expensive things that already drive value at the high end today but that, through increases in computing power and the commoditization of various technologies, will be commonplace tomorrow.
Predictive analytics is following this trajectory.
It is not that analytics was new in 2012. It was just so rare before that it felt new. Five or ten years ago, a modeling project might cost hundreds of thousands of dollars for a single set of predictions that would become obsolete the minute the study was complete.
In 2012, the Obama campaign regularized the discipline of dynamic modeling — refreshing its models on a nightly basis. Democrats were pushing innovation down the stack as early as 2010, when Dan Wagner’s team at the DNC modeled more than 50 Congressional and statewide races. In 2012, the Democratic Congressional Campaign Committee used dynamic modeling in more than 50 House races, understanding how individual voter preferences were shifting throughout the campaign. As Democrats approached the off-year elections in Virginia in 2013, a key challenge was to see if they could “shrink” the Obama model for use in a statewide race. In collaboration with their pollster, Geoff Garin, they pegged their polling samples to actual support and turnout probabilities generated by their analytics team. The result was a much more accurate projection that showed the race much closer than either public or Republican polling. Relying on imprecise polling alone, Republicans all but abandoned a possibly winnable race.
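To make the idea of pegging a poll to modeled probabilities concrete, here is a minimal sketch of one plausible reading: weight each respondent by a modeled turnout probability, so that likely voters count for more than unlikely ones. This is not the actual method used in Virginia, which the account above does not spell out. The `Respondent` type, the function names, and the sample figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    supports_candidate: bool    # answer given in the poll
    turnout_probability: float  # hypothetical modeled chance of voting, 0 to 1


def raw_support(sample):
    """Unweighted share of respondents who support the candidate."""
    return sum(r.supports_candidate for r in sample) / len(sample)


def turnout_weighted_support(sample):
    """Share of support when each respondent counts in proportion
    to their modeled likelihood of actually voting."""
    total_weight = sum(r.turnout_probability for r in sample)
    if total_weight == 0:
        raise ValueError("sample contains no expected voters")
    supporting_weight = sum(
        r.turnout_probability for r in sample if r.supports_candidate
    )
    return supporting_weight / total_weight


# Made-up sample: supporters here are, on average, less likely to vote.
sample = [
    Respondent(True, 0.9),
    Respondent(True, 0.3),
    Respondent(False, 0.8),
    Respondent(False, 0.2),
    Respondent(True, 0.1),
]

raw = raw_support(sample)
weighted = turnout_weighted_support(sample)
print(f"raw: {raw:.2f}  turnout-weighted: {weighted:.2f}")
```

In this toy sample the weighted figure comes out lower than the raw one, because several of the candidate's supporters are unlikely to turn out. That gap is the kind of correction a poll pegged to turnout scores can surface.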
One story here is that the tools used by Presidential campaigns four years ago will (or should) be applied at the Congressional level and below in 2016. This is a version of Moore’s Law in action: the high-end use cases of today serving the long tail tomorrow.
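The broadening of analytics is happening hand-in-hand with the collision of analytics and the cloud. Here, it’s worth reviewing Drew Conway’s Data Science Venn Diagram, which places data science at the intersection of hacking skills, math and statistics knowledge, and substantive expertise.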
“Hacking skills” are already a necessary component of data science. Yet much of this “hacking” is of a different sort than the code that runs your favorite apps and the Internet economy itself. Much of the work of data science happens after the fact, in batch, in snapshots of live data, and often in desktop software like R where datasets (typically) reside in memory locally. This means it takes a while before the lessons of machine learning make their way into production.
Innovations like Apache Spark are now pushing the boundaries of real-time data processing. Analytics can happen in closer to real time on datasets living in the cloud, rather than on local drives (as is now the norm). We are moving closer to a world where data can be re-modeled on the fly with each new signup, each new donation, each new commitment to vote.
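What re-modeling on the fly could look like, in miniature: the sketch below updates a small logistic regression one event at a time with stochastic gradient descent, rather than refitting on a nightly batch. It is plain Python, not Spark, and is only meant to illustrate the idea. The class name, the two features, and the event stream are all hypothetical.

```python
import math


class OnlineSupportModel:
    """Logistic regression updated one observation at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        """Estimated probability of support for one person."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, outcome):
        """Nudge the model toward a newly observed outcome (1 or 0)."""
        error = outcome - self.predict(features)
        self.bias += self.learning_rate * error
        for i, x in enumerate(features):
            self.weights[i] += self.learning_rate * error * x


# Hypothetical features: [signed_up_online, answered_opposing_on_phone]
model = OnlineSupportModel(n_features=2)

# Made-up stream of events: each new signup or ID arrives and is
# folded into the model immediately.
events = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 50
for features, outcome in events:
    model.update(features, outcome)

signup_score = model.predict([1.0, 0.0])
opposed_score = model.predict([0.0, 1.0])
print(f"signup: {signup_score:.2f}  opposed: {opposed_score:.2f}")
```

The design choice worth noticing is that `update` touches only the current event. Nothing is re-read from disk, which is what makes the per-signup, per-donation refresh described above cheap enough to run continuously.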
The state of data in the political world is painfully bifurcated. Modeling techniques have become de rigueur, but mostly only within the offline parts of the campaign. We collect data about how people are likely to vote (often over the phone) and use this to train models about what the rest of the electorate will do. And we use tools which are largely offline, like R, to do it.
Despite this process seeming antiquated in some ways, it represents genuine progress over the world in which we used to live, where the goal was using brute force to obtain “hard IDs” on every voter in a universe. With declining phone response rates, and the difficulty of scaling a door-to-door field program, modeling has helped campaigns fill in the blanks and make better-than-random guesses about how the next voter they talk to might vote.
In the online world, digital practitioners are collecting mountains of data about issue and vote preferences. It is believed that Obama for America built an in-house email list nearly 20 million strong. This was not only used to drive a combined $1.2 billion in revenue between 2008 and 2012, but it also represents nearly one third of the total votes the president would eventually receive.
Digital is proving itself more adept at cost-effectively identifying potential supporters. Yet the digital industry itself is largely operating on the old model of hard IDs, using the data solely as an end unto itself (for fundraising, outreach or other core purposes) rather than modeling initially small lists of supporters to gain insight into the dynamics of the larger electorate. The character of the first 100 or 1,000 donors or supporters, if matched to voter files and consumer files, can tell you a lot about the kinds of people who might support you among the tens of millions who don’t already.
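As a rough illustration of modeling a small supporter list against the larger file, the sketch below compares how common each attribute is among matched supporters versus the file as a whole, then ranks everyone else by how closely they resemble the supporters. It is a simple smoothed log-ratio score, assumed here for illustration. A real campaign would likely use a richer model. The voter IDs, attributes, and function names are all made up.

```python
import math
from collections import Counter


def attribute_lifts(supporter_ids, voter_file, smoothing=1.0):
    """Log ratio of each attribute's (smoothed) rate among supporters
    to its rate in the whole file. Positive means over-represented."""
    supporter_counts = Counter(
        attr for vid in supporter_ids for attr in voter_file[vid]
    )
    file_counts = Counter(
        attr for attrs in voter_file.values() for attr in attrs
    )
    n_supporters, n_file = len(supporter_ids), len(voter_file)
    lifts = {}
    for attr, count in file_counts.items():
        supporter_rate = (supporter_counts[attr] + smoothing) / (
            n_supporters + 2 * smoothing
        )
        file_rate = (count + smoothing) / (n_file + 2 * smoothing)
        lifts[attr] = math.log(supporter_rate / file_rate)
    return lifts


def rank_prospects(supporter_ids, voter_file):
    """Non-supporters, ordered from most to least supporter-like."""
    lifts = attribute_lifts(supporter_ids, voter_file)
    known = set(supporter_ids)
    prospects = [vid for vid in voter_file if vid not in known]
    return sorted(
        prospects,
        key=lambda vid: sum(lifts[attr] for attr in voter_file[vid]),
        reverse=True,
    )


# Hypothetical voter file: ID -> attributes from voter and consumer data.
voter_file = {
    "v1": {"urban", "renter", "age_18_34"},
    "v2": {"urban", "renter", "age_35_54"},
    "v3": {"rural", "homeowner", "age_55_plus"},
    "v4": {"suburban", "homeowner", "age_35_54"},
    "v5": {"urban", "renter", "age_18_34"},
    "v6": {"rural", "homeowner", "age_35_54"},
}
matched_supporters = ["v1", "v2"]  # the first donors, matched to the file

ranked = rank_prospects(matched_supporters, voter_file)
print(ranked)
```

Even with two matched supporters, the ranking puts the voter who shares their profile at the top. With the first 100 or 1,000 donors, the same approach has far more signal to work with.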
Yet, much of the talk of data in the political world is focused on the smallest, most tactical aspects — targeting and segmentation—rather than the strategic work of optimizing, that is, using the data to find the overarching reasons why a voter might support you, and making sure that is reflected in everything from candidate appearances to online advertising.
Right now, a sample use case of data in the digital realm might involve collecting the names of the hundreds or thousands of people who liked your Facebook post about Benjamin Netanyahu, matching them to your email list, tagging them as supporters of Israel or national security voters, and tailoring content to those hundreds of people accordingly (or, more likely, dozens, since not all of your Facebook supporters are on your email list or will match).
These use cases are all the rage in the digital world, and while potentially helpful as a supplemental tactic, I find these examples less than impressive. Micro-segmentation of this sort represents an exponential increase in effort for a marginal benefit, and it risks fragmenting your message so that voters lose the bigger picture of what your candidacy is truly about. In 2008, the Hillary Clinton campaign was guided by Mark Penn, who was obsessed with “Microtrends.” Barack Obama ran on a unified message of “hope” and “change.” Who won?
So, does this mean all messaging decisions get made based on gut? No. Data should still guide us. But data should be used to optimize, not just to target. Ask what the data tells us is the best possible message for the entire electorate, then, in areas where that message doesn’t work as well, supplement it with more targeted messages.
Segmentation has a place, particularly on hot-button issues or in highly divided urban electorates, but it is rarely the strategic driver it gets made out to be. And to the extent segmentation is used, it needs to be used where messaging resources are truly scarce, using machine learning on digital data to prioritize the millions of people outside your universe rather than the tens of thousands within it.
There is an aching need to apply predictive analytics to one of the richest data sets that a campaign will collect, its digital data, not just to optimize the digital side of the campaign, but to optimize everything. In the targeted sharing applications we built in the 2014 cycle, we found that an algorithmic distillation of a person’s Facebook likes correctly predicts who they’ll support 85% of the time, which rivals any voter-file based model and exceeds the predictive power of traditional consumer data on its own. Using these Facebook likes to prioritize voter contact and surveys, users of these tools tagged hundreds of thousands of people as supporters from friends’ lists in targeted states.
Perhaps recognizing the enormous political (and commercial) power of this sort of data, Facebook has pulled back on letting app developers use friends’ data in these applications. Yet this initial experience confirms the extent to which we, as users of the Internet, are willing to self-identify in ways that let political candidates and others understand exactly who we’ll support, and infer much more beyond that. Beyond Facebook, we are likely to see many more examples of digital breadcrumbs being used to inform the predictive models generated by campaigns in 2016.
Innovation isn’t always about The Next Big Thing.
Often, it’s about doing the current thing better, driven by key building blocks that just weren’t there before, like smartphone adoption or the commoditization of cloud computing.
This is just as true in politics as it is everywhere else.
So, let’s stop asking about The Next Big Thing, and start asking what we can build by combining the things we already know how to do in interesting ways.