The Fallacy of Absolute Awareness
An Introduction to Total Information Awareness
There has been a proposal floating around for many years to build a holistic communication analysis system called Total Information Awareness (TIA). The idea is to develop the capability to track the transactions of every person within a given geographical area during a given time period, enabling the identification, trailing and interception of terrorists. It is an extension of existing intelligence methodology and the logical conclusion of an evolution in intelligence that has embraced technology to facilitate the interception and analysis of communication traffic. Precursors of TIA had existed for decades in projects like Echelon, and the approach entered the public consciousness with the Snowden revelations.
The modern formulation of TIA originated in the USA and became visible in policy circles during the tumultuous period after September 11th 2001, though it likely existed in planning circles for a considerable period before that. The original TIA program was designed to record and data mine all transactions occurring in US territory, on the rationale that doing so could help prevent further coordinated terrorist activity inside the USA. Given recent events in Europe, it is entirely possible that policymakers there will show increased interest in this type of approach.
A vision to cover everything
TIA differed from past communication interception projects in scope. It aimed to give intelligence agencies a complete picture of all the communication networks operating in a defined territory. This holistic picture was intended to support pattern analysis, keyword filtering and other data mining methods for detecting dangerous individuals and groups.
Because of its scope, critics described the TIA program as a dangerous threat to civil liberties. Its advocates countered by insisting that it was a necessary measure for the protection of national security. Its very pervasiveness polarized opinion and created a level of discord that ultimately led the US Congress to defund the TIA program in September 2003. However, the formal closure of the program did not mean its extinction, as several of its component projects were reportedly transferred to other agencies rather than discontinued.
The ethics of the proposed system are not the only problem
The ethics debate is not the only reason to challenge TIA; it is also questionable whether the program could ever reach its goal of creating a complete map of communication transactions. In fact, there is a strong mathematical argument that the goal is unattainable. This point is often lost in the noise of the moral debate, yet it must be addressed before any trade-off between civil liberties and security can be meaningfully weighed.
Why TIA Cannot Work
The problem that hobbles TIA is the base rate fallacy: the tendency to ignore information about a prior probability (the base rate) when it does not appear to have a causal relationship with the problem at hand. When an agency searches for terrorists by data mining a nation’s daily events, it enters a swamp of transactions, interactions and situations in which genuine threats are vanishingly rare. Because the base rate of terrorist activity is so low, even a small error rate applied to billions of innocent transactions produces far more false alarms than genuine detections, and any analysis that ignores this will badly misjudge what an alert actually means.
Bruce Schneier, a cryptographer and security expert, explained that the problem with uncovering terrorist plots through data mining is twofold. First, no well-defined profile exists for terrorists. Second, terrorist attacks are very rare. He stated that “taken together, these facts mean that data mining systems won't uncover any terrorist plots until they are very accurate, and that even very accurate systems will be so flooded with false alarms that they will be useless.” More formally, this means that TIA is likely to fail because of both false negatives and false positives. A false positive occurs when individuals or groups are mistakenly identified as terrorists; a false negative occurs when a genuine terrorist is not identified.
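The effect of the base rate can be made concrete with Bayes' theorem, which gives the probability that an alert is genuine from the system's error rates and the prior share of genuine cases. The sketch below is a minimal illustration, not a model of any real system: the function name `posterior_given_alert` is hypothetical, and the one-in-a-million base rate is an assumed figure chosen only for illustration.

```python
def posterior_given_alert(base_rate: float,
                          true_positive_rate: float,
                          false_positive_rate: float) -> float:
    """Return P(genuine | alert) using Bayes' theorem.

    base_rate           -- P(genuine): prior share of items that are truly suspicious
    true_positive_rate  -- P(alert | genuine), i.e. 1 minus the false negative rate
    false_positive_rate -- P(alert | innocent)
    """
    # Total probability of an alert: genuine items caught plus innocent items flagged.
    p_alert = (true_positive_rate * base_rate
               + false_positive_rate * (1.0 - base_rate))
    return true_positive_rate * base_rate / p_alert


# Hypothetical base rate of one genuine item per million, for a system that is
# 99% accurate on false positives and 99.9% accurate on false negatives.
p = posterior_given_alert(base_rate=1e-6,
                          true_positive_rate=0.999,
                          false_positive_rate=0.01)
print(f"P(genuine | alert) = {p:.2e}")  # roughly 1 in 10,000
```

Even with these generous error rates, only about one alert in ten thousand would be genuine, and a lower base rate makes the ratio worse still.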
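The tension between the two error types can be illustrated with a toy model. It assumes, as a simplification and not as a claim about how TIA actually worked, that each transaction receives a suspicion score and is flagged when that score reaches a threshold. The scores are invented, and `count_errors` is a hypothetical helper. Because no clean profile exists, the innocent and genuine scores overlap, so raising the threshold removes false positives only by adding false negatives.

```python
def count_errors(innocent_scores, genuine_scores, threshold):
    """Return (false_positives, false_negatives) for a flag-if-score>=threshold rule."""
    # Innocent items wrongly flagged as suspicious.
    false_positives = sum(1 for s in innocent_scores if s >= threshold)
    # Genuine items that slip under the threshold and are missed.
    false_negatives = sum(1 for s in genuine_scores if s < threshold)
    return false_positives, false_negatives


# Invented suspicion scores; the two groups overlap because there is
# no well-defined terrorist profile to separate them.
INNOCENT = [0.10, 0.20, 0.30, 0.40, 0.55, 0.65]
GENUINE = [0.50, 0.70, 0.90]

for t in (0.3, 0.6, 0.8):
    fp, fn = count_errors(INNOCENT, GENUINE, t)
    print(f"threshold {t}: {fp} false positives, {fn} false negatives")
```

The three thresholds give 4/0, 1/1 and 0/2 false positives/false negatives respectively. No setting drives both counts to zero.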
Even with few transactions per person and a very accurate system, there would be a massive number of alerts
Exploring the underlying numbers quickly illustrates the scale of the challenge. A TIA system that is 99% accurate regarding false positives (a 1% false positive rate) and 99.9% accurate regarding false negatives (a 0.1% false negative rate) would generate 27 million alerts a day in the USA, even if each citizen created only 10 transactions per day. Of these 27 million alerts, perhaps only one per month would relate to a real terrorist plot. Raising the false-positive accuracy to 99.9999% would cut this to 2,750 alerts per day, but at the cost of overlooking a higher percentage of real plots (false negatives).
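The alert counts above follow from simple multiplication. Genuine plots are so rare that almost every transaction is innocent, so the daily false alarm count is roughly the number of transactions times the false positive rate. The sketch assumes a US population of about 275 million, which is roughly what the figures above imply at ten transactions per person per day. The function name is hypothetical.

```python
def daily_false_alarms(population: int,
                       transactions_per_person: int,
                       false_positive_rate: float) -> float:
    """Approximate innocent transactions flagged per day.

    Genuine plots are so rare that nearly all transactions are innocent,
    so false alarms ~= total transactions x false positive rate.
    """
    return population * transactions_per_person * false_positive_rate


US_POPULATION = 275_000_000   # assumption: roughly what the text's figures imply
TRANSACTIONS_PER_PERSON = 10  # per day, as assumed in the text

for label, fpr in (("99%", 1e-2), ("99.9999%", 1e-6)):
    alerts = daily_false_alarms(US_POPULATION, TRANSACTIONS_PER_PERSON, fpr)
    print(f"{label} accurate: about {alerts:,.0f} alerts per day")
```

This reproduces the text's figures to within rounding (27.5 million and 2,750 alerts per day). Over a month, the 99% setting yields more than 800 million false alarms for each real plot.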
TIA in the European Sphere
The application of TIA to the European sphere presents even greater challenges than its application to the USA. Consider the common digital transactions inside EU nations ten years ago. In 2006 the European Union had a population of over 455 million people spread across 25 countries, all of which possessed functional ICT infrastructure capable of supporting digital communication and monetary transactions.
The false positives and false negatives are mathematically insurmountable
Assuming ten transactions a day per citizen, a TIA program that is 99% accurate would generate almost 46 million alerts per day. If the accuracy were increased to 99.9999%, it would generate 4,550 alerts per day. Each of these alerts would be a potential terrorist plot requiring an investigation to determine its validity, which equates to between 4,550 and 46 million police investigations every day across the EU. There would also have to be secondary analysis of the transactions deemed ‘safe’ to ensure that false negatives were not overlooked. This audit process would need to be sized in proportion to the sensitivity of the TIA system itself. The more aggressively the system is tuned to suppress false positives, the more real plots it will miss, and the more audit effort is needed.
The key problem with Total Information Awareness is the difference between information and knowledge. TIA conceives of a system that intercepts and filters massive amounts of information, which is subsequently sifted for certain key indicators of activity. Because so much data is captured and analyzed, a correspondingly massive number of potential alerts is generated, and each must be reviewed to determine its validity. Finally, the information gathered must be turned into actionable knowledge by human operators.
There is a limit to how much information can usefully be intercepted and filtered, because there is a limit to the number of trained personnel available at any given time to review the output of the automated systems. A dangerous assumption in the formulation of TIA was that machines would constitute the primary intelligence tool. The review and audit process was discounted or marginalised during its design, and this introduced a fatal weakness.
TIA is an aspirational goal but not necessarily a mathematical reality. It is conceivable to build equipment to monitor trillions of communication transactions, but far less so to investigate the alerts generated and to audit the system for alerts missed. Arguably, Artificial Intelligence may soon reach a point where machines could conduct such investigations, but there is no evidence that such capability exists today.
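The EU figures can be reproduced with the same multiplication: transactions per day times the false positive rate, using the population of just over 455 million and ten daily transactions per person given above. As a rough extra illustration, the sketch also divides the daily load evenly across the 25 member states. This is only a simple average, since member-state populations differed widely, and the helper name is hypothetical.

```python
EU_POPULATION = 455_000_000   # "over 455 million" in 2006, per the text
TRANSACTIONS_PER_PERSON = 10  # per day, per the text
MEMBER_STATES = 25            # EU membership in 2006, per the text


def eu_daily_alerts(false_positive_rate: float) -> float:
    """Approximate alerts per day across the EU (nearly all of them false alarms)."""
    return EU_POPULATION * TRANSACTIONS_PER_PERSON * false_positive_rate


for label, fpr in (("99%", 1e-2), ("99.9999%", 1e-6)):
    total = eu_daily_alerts(fpr)
    per_state = total / MEMBER_STATES  # simple average; populations vary widely
    print(f"{label} accurate: {total:,.0f} alerts/day, "
          f"about {per_state:,.0f} per member state")
```

Even at the stricter setting, each member state would on average face about 182 new investigations a day, or roughly 66,000 a year.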
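The personnel bottleneck can be sketched as a capacity calculation. A review team can only examine so many alerts a day, and everything beyond that goes unexamined. The team size of 10,000 analysts and the rate of five alerts resolved per analyst per day are purely hypothetical numbers, not estimates for any real agency. The alert volumes are the EU figures from the text, and `review_coverage` is a hypothetical helper.

```python
def review_coverage(alerts_per_day: float,
                    analysts: int,
                    alerts_per_analyst_per_day: float) -> float:
    """Fraction of daily alerts that a fixed review team can examine."""
    if alerts_per_day <= 0:
        return 1.0  # nothing to review
    capacity = analysts * alerts_per_analyst_per_day
    return min(1.0, capacity / alerts_per_day)


# Hypothetical workforce figures, chosen only for illustration.
ANALYSTS = 10_000
ALERTS_PER_ANALYST_PER_DAY = 5

for alerts in (45_500_000, 4_550):  # EU alert volumes from the text
    share = review_coverage(alerts, ANALYSTS, ALERTS_PER_ANALYST_PER_DAY)
    print(f"{alerts:>12,} alerts/day -> {share:.2%} can be reviewed")
```

Under these made-up numbers, only about 0.1% of the alerts at the 99% setting would ever reach a human. The stricter setting becomes reviewable only by accepting more missed plots, which the audit process must then catch.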
It is better to have some good knowledge than a lot of useless information
It is dangerous to deploy and rely on TIA as a concept for enhancing security. To assume it can generate holistic information awareness is to operate under a false assumption that compromises the security process and introduces new vulnerabilities. Better security comes from designing partial information awareness systems that supplement existing investigative methods. It is therefore such partial systems that policymakers should consider, and that should be debated in the context of balancing civil liberties and security.
This is a revised version of a commentary originally written for the Opendawn website.