We built a model that could have predicted the Paris attacks. Here’s how.

Predata’s threat signal around terrorism in France showed a clear spike in the days leading up to November 13.

Richard Laurent

The intelligence failures that led French and European authorities to miss warnings of the November 13 attacks should open the way for greater use of open-source predictive analytics in threat monitoring.

The week following the Paris attacks has seen much hand-wringing over the so-called “intelligence failures,” in France, across Europe and beyond, that allowed the terrorists to plan and execute their strategy undetected. The fact that the attackers were well known to various national intelligence agencies (as was the case with the Charlie Hebdo attacks of January 7, 2015, and several other terrorist incidents throughout Europe in recent years) has only heightened the shock of the events of November 13. With all the resources and data monitoring tools at France’s disposal, how could this have happened?

While calls for more robust “human” intelligence gathering in the wake of the attacks are welcome, the painful experience of Paris also invites a greater role for open-source predictive analytics in threat monitoring. After an intelligence failure of this magnitude, the temptation might be to reject the recent shift towards more cyber-focused, data-intensive intelligence gathering processes. This would be a mistake. At Predata, we scraped metadata from various web pages relevant to terrorism in France and built a model which successfully warned of the November 13 attacks. The model is retrospective, but that’s not to say it cannot be useful in the months and years ahead. Tools like this can be an important aid for intelligence agencies as they seek to comb through an ever-expanding universe of data relevant to threat detection.

The global intelligence community has, of course, enjoyed significant successes over the last 15 years in preventing terrorist attacks; French officials said Abdelhamid Abaaoud, the purported mastermind of the Paris attacks, was at the centre of four of the six major plots they had foiled since this summer. But despite these successes, the global intelligence apparatus appears to have become less preventive and more investigative, acting after the fact. National security agencies, in France and elsewhere, have shown themselves to be adept at identifying potential attackers, but bad at predicting specific attacks. This is not for want of information. Iraqi officials, for instance, have indicated they warned France a significant attack was imminent in the days leading up to November 13; French officials have countered that such warnings are routine. Rather, the problem may be a surfeit of information, with too few analysts and too few tools to sort through and prioritize the rising flood of data that intelligence agencies face.

Compounding the problem is the fact that terrorist groups are increasingly using secure communications, with end-to-end encryption, to organize mass attacks, making their actions less visible to authorities. We now know the Paris attackers used Telegram, a secure, open-source messaging service, to plan the carnage of November 13. The growing use by terrorists of apps such as Telegram only increases the need to seek better intelligence through analysis of open, unencrypted sources.

Unsurprisingly, there has been no shortage of commentators willing to step forward and offer solutions to the failures that allowed November 13 to happen: Intelligence agencies need more resources and more personnel for threat monitoring; there needs to be a return to old-fashioned person-to-person surveillance; Europe needs an EU-wide central intelligence agency to coordinate data and information sharing between national bodies; telecommunications and technology companies need to introduce “back doors” that allow governments to decrypt and monitor secure communications; the “Five Eyes” agreement must be expanded to include France and other U.S. allies; and so on. These solutions revolve around three central themes: more money, more cooperation, more surveillance power. But money is in short supply, especially in recession-hit Europe; international cooperation takes time; and overly intrusive surveillance puts civil liberties at risk, an especially raw concern given the framing of the Paris attacks as an assault on “western values.”

These obstacles make it important to cast the net wider in the search for tools to assist with threat detection. Terrorist organizations such as the Islamic State are famously adept at manipulating social media; their online footprint is deliberately, brazenly large. The Predata approach is built on the idea that this footprint can be exploited to generate meaningful signals to warn of imminent terror attacks. Much of this information is freely available on the open internet; gathering it requires no intrusion into civil liberties. Secure messaging platforms like Telegram may be where specific attack plans are coordinated, but the open internet is littered with material that can give us important clues about where and when the attackers will strike next.

For this model, we scraped the metadata users leave in strategically important places like Wikipedia, YouTube and mainstream news media sites and used that information to generate a signal around the theme of terrorism in France. We limited the signal to two main topics (the French-language discussion on Wikipedia around ISIS/ISIL, and Opération Chammal, the French military operation in Syria) and measured both the level of activity (pageviews, number of participants, etc.) and the level of contestation (essentially, how argumentative the activity is) on the web pages selected.
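To make the idea concrete, here is a minimal sketch of how activity and contestation could be folded into a single volatility index. It is an illustration under stated assumptions, not the Predata model itself: the function names, the use of daily revert counts as a stand-in for contestation, the equal weighting of the two measures and the three-day rolling window are all hypothetical choices.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a series so activity and contestation share one scale."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def chatter_signal(pageviews, reverts, window=3):
    """Hypothetical index: rolling volatility of activity plus contestation.

    pageviews -- daily pageview counts (activity)
    reverts   -- daily edit-revert counts (assumed proxy for contestation)
    window    -- number of trailing days in each volatility estimate (assumed)
    """
    if len(pageviews) != len(reverts):
        raise ValueError("both series must cover the same days")
    # Equal weighting of the two standardized measures is an assumption.
    combined = [a + c for a, c in zip(zscores(pageviews), zscores(reverts))]
    signal = []
    for i in range(len(combined)):
        recent = combined[max(0, i - window + 1): i + 1]
        # Population standard deviation of the trailing window.
        signal.append(pstdev(recent))
    return signal
```

Fed a run of quiet days followed by a burst of pageviews and reverts, this index sits at zero and then jumps on the day of the burst, which is the kind of spike described here.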

The signal, an index of the volatility of online chatter, showed a clear spike in the days leading up to the November 13 attacks, driven in large part by activity on the French-language Wikipedia page on the Islamic State. That in itself is revealing, but the more important step is what we did next. We back-tested the signal and regressed it over the last 12 major terror incidents in France, a period covering 431 days. We set the prediction window at 30 days and the alarm threshold at 80%; in other words, an alarm would be raised only if the likelihood of an attack within the next 30 days exceeded 80%.
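The alarm rule and the back-test bookkeeping can be sketched in a few lines. This is a simplified illustration, not our production code: it assumes some regression has already produced a daily probability estimate, and the function names and the dictionary of daily probabilities are hypothetical. Only the 30-day window and the 80% threshold come from the set-up described above.

```python
from datetime import date

WINDOW_DAYS = 30        # prediction window
ALARM_THRESHOLD = 0.80  # alarm threshold


def attack_within_window(day, incident_dates, window_days=WINDOW_DAYS):
    """True if any incident falls within `window_days` days after `day`."""
    return any(0 < (d - day).days <= window_days for d in incident_dates)


def raise_alarm(probability, threshold=ALARM_THRESHOLD):
    """Raise an alarm only when the estimated likelihood exceeds the threshold."""
    return probability > threshold


def alarm_hit_rate(daily_probabilities, incident_dates):
    """Share of alarm days that were followed by an incident within the window.

    daily_probabilities -- dict mapping date -> estimated probability
                           (hypothetical output of a fitted model)
    Returns None when no alarm was raised at all.
    """
    alarm_days = [d for d, p in daily_probabilities.items() if raise_alarm(p)]
    if not alarm_days:
        return None
    hits = sum(attack_within_window(d, incident_dates) for d in alarm_days)
    return hits / len(alarm_days)
```

For example, with a single incident on `date(2015, 11, 13)`, an alarm raised on October 14 counts as a hit because the incident falls exactly 30 days later, while an alarm raised on October 1 does not.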

Structured this way, the signal was successful across the 431-day test period in predicting an imminent attack, within a 30-day window, 61% of the time. Retrospectively, the prediction signal warns successfully of the November 13 attacks; that is, on each of the 30 days leading up to November 13, it would have put the likelihood of a terror attack in France at 80% or greater. It also warns successfully of the failed Thalys train attack in late August. It does not retrospectively warn of the Charlie Hebdo attack, though it’s important to remember that Al Qaeda in the Arabian Peninsula, not ISIS, claimed responsibility for that assault. Some of the results can be seen here:

The model we’ve built at Predata is not, of course, the entire solution. It can only ever be one threat detection tool to complement others. But as intelligence agencies intensify their search for better methods to foil attacks, smarter use of predictive analytics can be an important addition to the toolbox — and one that sidesteps, importantly, the complex politics associated with other solutions put forward après Paris.

richard@predata.com | twitter: @predataofficial | www.predata.com