The Three Vs of Enhanced Interrogation
Did the CIA’s failed torture program have its roots in Big Data thinking?
Between 2001 and 2006, dozens of prisoners held by the United States were subjected to brutal practices of torture and abuse. Waterboarding, rectal feeding, rape threats, psychological trauma: the list of violations is long and terrifying.
The Senate Intelligence Committee report on CIA torture specifically identifies thirty-nine detainees, but the total number of people subjected to torture may have been much higher. The US operates numerous ‘black sites’ around the world, including Bagram and Abu Ghraib, only a small percentage of which have been officially acknowledged; it is estimated that more than 100 locations worldwide have been used for the detention and interrogation of terrorist suspects. More importantly, the report addresses only violations that occurred while prisoners were in US custody, and doesn’t include rendition, in which the CIA handed prisoners over to third countries for interrogation or imprisonment. It is possible that ‘enhanced interrogation techniques’ were used on hundreds of people.
According to the Senate report, released on December 9th, 2014, these long years of violence were ‘in violation of U.S. law, treaty obligations, and our values’. The report goes on to state explicitly that torture was ‘not an effective means of acquiring intelligence’, and furthermore that ‘justification for the use of its enhanced interrogation techniques rested on inaccurate claims of their effectiveness’.
So, why was torture conducted at such a large scale?
One answer can be found in the report, which describes a long chain of misleading information and outright deception, in the service of maintaining careers and supporting generous payments to contractors. For another answer, we might look towards the military’s changing relationship to data in the early part of the 21st century.
After the end of the Gulf War in 1991, the US Army initiated a long-term program to develop better information systems for use on and off the battlefield. The result of these efforts, the Distributed Common Ground System-Army (DCGS-A), is a hardware and software ‘weapons system’ designed to process large amounts of intelligence. The $2.7B DCGS-A was built on the belief that if enough data were collected, patterns and insights could be discovered that wouldn’t be findable through traditional, small-scale intelligence techniques. By the time the CIA’s ‘enhanced interrogation’ program began, this nascent idea of ‘Big Data’ was driving thinking (and investment) from Langley to Silicon Valley. Former CIA director Michael Hayden’s description of data use at Guantanamo could easily have been lifted from a data startup’s pitch deck, circa 2002:
“Nothing that we get from the program, however, is used in isolation. It’s a data point that then has to be rubbed up against all the other data points we have available to us.”
— Michael Hayden
There are few tech aphorisms that have had as much staying power as ‘The Three Vs of Big Data’. The Vs were coined by META Group analyst Doug Laney in a 2001 report, in an attempt to distill the core properties of Big Data down to three words. According to Laney, and the thousands of other pundits who have cribbed from him, Big Data is grand in scale (Volume), fast (Velocity), and intermixed (Variety). These promissory terms do well to explain the justification for programs like DCGS-A, funded to process more data faster, and to piece together intelligence from disparate sources.
The 3Vs paint an adulatory picture of Big Data, but they omit many potential dangers and shortcomings. Where can we look to understand how Big Data goes wrong?
A good start would be S, for Sparsity. One of the great implicit promises of Big Data is that you don’t really need good or accurate data: imperfections and gaps will disappear at scale. Thus, if you are not getting the intelligence you need with traditional information-gathering techniques (i.e. torture), you need not focus your energies on getting better data; you just need more. Big Data’s mythic ability to remedy sparsity is a huge part of its appeal: focusing on the collection of more data is fundamentally easier than thinking about the collection of better data. It’s no surprise that intelligence agencies have been particularly eager to fasten themselves to the promise of Big Data, both in its worship of volume and in its embrace of the incomplete.
Behind sparsity, though, hides the elephant in the room of Big Data: as the size of a dataset grows, so does its capacity for error. If a data point is statistically weak, whether because of the way it is measured, stored, or processed, this weakness doesn’t automatically go away at scale. As machine learning researcher Michael Jordan said in an interview with IEEE Spectrum in October 2014:
“When you have large amounts of data, your appetite for hypotheses tends to get even larger. And if it’s growing faster than the statistical strength of the data, then many of your inferences are likely to be false. They are likely to be white noise.”
— Michael Jordan
Academic researcher Kate Crawford has written extensively about what she calls the ‘epistemic big-data ambition’, a desire for problems to be solvable solely through the collection of a ‘complete’ data set. She points to the failure to apprehend the Boston bombing suspects as an example: authorities collected vast swaths of data, and indeed had evidence pointing to the Tsarnaevs, yet the ‘real data’ was obscured by multitudes of false alarms, blind alleys and roadblocks:
“The bigger the data gets, the more small things can be overlooked. The risk of being seduced by ghost patterns in data increases with the size of the data sets.”
— Kate Crawford
Put simply: more is not necessarily better; indeed it can be worse. In its extensive use of torture to acquire information, the CIA may have seen this first hand. Extensive torture led not to stronger data, but to larger amounts of weak data, along with more faulty hypotheses and time-consuming dead ends.
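The statistical mechanism behind these ‘ghost patterns’ can be sketched in a few lines of Python. This is a purely illustrative simulation (invented numbers, nothing drawn from the report): correlate one random ‘target’ against pools of equally random noise features, and the strongest spurious correlation found grows with the number of features searched.

```python
import random

random.seed(0)

N_SAMPLES = 30  # observations per feature

def corr(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# A random 'target' and 1,000 random features -- all independent noise,
# so any correlation between them is by definition meaningless.
target = [random.gauss(0, 1) for _ in range(N_SAMPLES)]
features = [[random.gauss(0, 1) for _ in range(N_SAMPLES)]
            for _ in range(1000)]

# The best 'discovery' when searching 10 features vs. all 1,000.
best_10 = max(abs(corr(f, target)) for f in features[:10])
best_1000 = max(abs(corr(f, target)) for f in features)

print(f"strongest spurious |r| over 10 features:    {best_10:.2f}")
print(f"strongest spurious |r| over 1,000 features: {best_1000:.2f}")
```

The larger pool always yields an equal-or-stronger ‘pattern’, even though every feature is noise: the appetite for hypotheses grows faster than the statistical strength of the data, exactly as Jordan warns.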
Consider another addition to the Vs: D, for Distance. An aspect of Big Data practice that must hold a great deal of appeal for intelligence operations is the remove it allows between the collection of data and its analysis. This can be seen very clearly in the activities of the NSA, where surveillance is conducted not by agents in remote locales but by computational sorting of e-mails and Twitter messages. A primary benefit here is cost: in a paper published this year in the Yale Law Journal, authors Ashkan Soltani and Kevin Bankston show that new methods of surveillance can be an order of magnitude cheaper than traditional methods. By lowering cost, distance becomes a facilitator of scale, and a driver toward the goal of more.
At Guantanamo, another operational benefit of distance is apparent. Intelligence software systems like DCGS-A are metadata tools, focused not on analyzing data as it was directly collected (i.e. transcripts from an ‘enhanced’ interrogation session) but on extracting additional layers of information from that data (names of people and places, images of faces, etc.). These tools are operated by specially trained individuals, often outside contractors, who may have little or no contact with the ‘on-the-ground’ operations that supply the data.
In these operations, horrific practices were taking place:
“After Abu Zubaydah had been in complete isolation for 47 days, the most aggressive interrogation phase began at approximately 11:50 AM on August 4, 2002. Security personnel entered the cell, shackled and hooded Abu Zubaydah, and removed his towel (Abu Zubaydah was then naked). Without asking any questions, the interrogators placed a rolled towel around his neck as a collar, and backed him up into the cell wall (an interrogator later acknowledged the collar was used to slam Abu Zubaydah against a concrete wall). The interrogators then removed the hood, performed an attention grab, and had Abu Zubaydah watch while a large confinement box was brought into the cell and laid on the floor. A cable states Abu Zubaydah “was unhooded and the large confinement box was carried into the interrogation room and paced [sic] on the floor so as to appear as a coffin.” The interrogators then demanded detailed and verifiable information on terrorist operations planned against the United States, including the names, phone numbers, email addresses, weapon caches, and safe houses of anyone involved. CIA records describe Abu Zubaydah as appearing apprehensive. Each time Abu Zubaydah denied having additional information, the interrogators would perform a facial slap or face grab. At approximately 6:20 PM, Abu Zubaydah was waterboarded for the first time. Over a two-and-a-half-hour period, Abu Zubaydah coughed, vomited, and had “involuntary spasms of the torso and extremities” during waterboarding.” — Excerpt from the Senate Intelligence Committee report on CIA torture
The accounts are sickening; the acts which are recorded even more so. Yet the data analyst, at a convenient physical remove from these actions, is able to do his job without concern about where the data came from or how it may have been gathered. Staff members present for the interrogation described above were “profoundly affected . . . some to the point of tears and choking up”, yet the analyst sees data only in abstraction.
In some ways military data analysts are like drone operators, physically removed from the fields of war. However, where drone operators can and do suffer stark psychological consequences from their activities, the ‘data operators’, working largely in abstraction, are spared the messy consequences of the real world. Data collection through torture offers an extreme example of the ‘moral distance’ inherent in Big Data thinking: the ability to operate at a remove from the real-world systems from which data is collected.
Given the technological climate that existed 15 years ago, it seems probable that Big Data’s nascent promise of ‘more from more’ played a role in the conception of the CIA’s enhanced interrogation program. In its failure, the program might represent one of the earliest and most significant reminders of Big Data’s shortcomings.
However, while the CIA’s torture program ended in November 2007, its enthusiasm for Big Data continues unabated. In a curiously circular arrangement, the CIA both funds and contracts with Palantir (founded in 2003), a Silicon Valley company which makes software to analyze large sets of unstructured data. Recently ‘leaked’ documents confirm that Palantir is deeply linked not only to its surrogate parent, the CIA, but to a laundry list of intelligence agencies, military branches, police forces and regulatory organizations.
It has generally been a weakness in our approach to data and technology that we have been eager to focus on the benefits (the 3 Vs) while conveniently ignoring more problematic properties like sparsity and distance. The moral affordances of this ignorance are visible in some of Big Data’s most questionable endeavours: Facebook’s experiments on 689,003 of its users, or the New England Complex Systems Institute’s erroneous labeling of Hunter High School as ‘the saddest spot in Manhattan’.
It is easier for us to connect Big Data to solutions than to problems; more convenient to attach its philosophies to smart cities than to smart bombs. Palantir’s document boasts that its technologies are used by states to find IEDs, to catch financial fraud, to support ‘the cops on the streets’. Are they also being used to target drone strikes? To surveil activist groups?
Big Data was born in Silicon Valley, but in many ways it’s a military brat, raised by the NSA and the CIA and the DoD as much as by Google, eBay and Amazon. In fourteen years, its technologies and philosophies have carved a wide swath of impact, not only in our social media feeds and monthly bank statements, but on our battlefields and in our prisons. As it grows to adulthood, we’d be wise to balance our optimism about its future with a measured understanding of its shortfalls and a caution about its potential for harm.