Implications of WannaCry Ransomware on Data Ecosystem

Blue Coat Photos (CC BY-SA 2.0) Source: https://www.flickr.com/photos/111692634@N04/18495846450/in/photostream/

WannaCry is a remarkable attack. It is the first large-scale demonstration, large by number of machines, of the disruptive effect of an attack on data. It was predictable and straightforward. It has established data as a viable attack target, and bitcoin, with its global currency and marketplace, as a way to monetize such attacks. We can only expect more attacks on data going forward. Future attacks might go beyond ransom and be more adversarial.

The cost of data access will increase due to increased tooling, process, and other overheads that will be introduced as a result of this attack. Data scientists and tool providers should expect greater scrutiny on access, integrity and communication of data. The tool providers will face tougher questions on new vulnerabilities introduced by their tools. CDOs will institute organization-wide data discipline to reduce exposure.

Why The Attack Was Successful

The main threat, as articulated by the WannaCry creators themselves, was the loss of information. Many of the impacted organizations had deployed industry best practices, in addition to configuration and backup tools along with appropriate firewalls. Despite this, the attack still succeeded because simple restoration did not solve the problem. This could be due to several reasons:

  1. Data could not be restored: There was too much valuable data on each desktop that was not systematically tracked, such as emails, documents, and PowerPoint files.
  2. The restoration process was not well established: Backup tools are complex, and geographical distribution or lack of automation made the cost of restoration prohibitive.
  3. Data might have been copied: The bits may have left the building, so restoration did not eliminate the potential for data leaks.

There are two sources of vulnerabilities in modern data systems:

  1. High system complexity and data management tool complexity: Organizations don’t invest resources to make the problem manageable. It is seen as a waste of time, effort, or money.
  2. Gap in approaches: The volume, use, and value of data are outpacing our ability to manage the downside. It is not even clear who in an organization will take this on. The discussion about how to cope with vast amounts of data lying all over the enterprise has just begun.

These persistent problems will require a longer discussion at the community level to find suitable approaches.

Implications

The short-term implications will be, as always, driven by fear. Security spending will increase. It will make everybody’s life miserable by adding more complex software that few understand, and the story will repeat itself some time down the line.

We see the following responses to the attack from an analytics perspective:

There will be increased scrutiny of all data players, including data scientists who access critical business data. Though the loss of internal organizational email, document files, and the like results in lost productivity, the bigger threat is that any loss of critical business data will result in legal and reputational challenges whose cost is much higher.

Analytical process discipline will grow. While new techniques and technologies, including elements of AI and ML, will continue to spread in the analytic space, so will the need for analytic discipline. Data scientists will have to address questions regarding organization (where is the data residing?), integrity (is this data consistent?) and lineage (where did it come from?) beyond the usual questions on methodological soundness and value.
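As a rough illustration of what answering these three questions could look like in practice, here is a minimal Python sketch. The `DatasetRecord` class and the `register` and `is_consistent` helpers are hypothetical names made up for this example, not part of any existing tool, and a SHA-256 checksum is only one assumed way to check consistency.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical record answering the three questions for one file."""
    location: str  # organization: where is the data residing?
    checksum: str  # integrity: is this data consistent?
    source: str    # lineage: where did it come from?


def register(path: Path, source: str) -> DatasetRecord:
    """Capture location, checksum, and stated source at registration time."""
    return DatasetRecord(str(path.resolve()), sha256_of(path), source)


def is_consistent(record: DatasetRecord) -> bool:
    """True if the file still exists and its content is unchanged."""
    path = Path(record.location)
    return path.is_file() and sha256_of(path) == record.checksum
```

A data scientist could call `register(path, "crm-export")` when a dataset is first pulled and `is_consistent(record)` before each analysis. A file that has been encrypted, altered, or removed since registration would fail the check.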

Metadata will see growth. To even know the scope or impact of an attack, we first need an account of what is where (location) and what it is (content and context). Tools and processes will be introduced to track critical data throughout its lifecycle. The security cost of metadata is much lower than that of the data itself, and it is usually generated by systems and therefore amenable to automation. Both factors will drive greater adoption of metadata in organizations.
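To make the idea of system-generated metadata concrete, the following Python sketch walks a directory and records, for each file, where it is and roughly what it is. The function names `build_inventory` and `summarize_by_kind` are assumptions for illustration, and guessing content type from the file extension is a deliberately crude stand-in for real content and context metadata.

```python
import mimetypes
import os
from pathlib import Path


def build_inventory(root):
    """Walk root and return one metadata entry per file."""
    inventory = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            stat = path.stat()
            # Content type is guessed from the extension only (an assumption).
            kind, _encoding = mimetypes.guess_type(name)
            inventory.append({
                "path": str(path),            # location: what is where
                "size_bytes": stat.st_size,
                "modified": stat.st_mtime,    # seconds since the epoch
                "kind": kind or "unknown",    # content: what it is
            })
    return inventory


def summarize_by_kind(inventory):
    """Total bytes per content type, a rough view of what is exposed."""
    totals = {}
    for entry in inventory:
        totals[entry["kind"]] = totals.get(entry["kind"], 0) + entry["size_bytes"]
    return totals
```

The inventory holds no file contents, only descriptions of them, which is why it is cheaper to protect than the data. Run on a schedule, a before-and-after comparison of such inventories would give a first estimate of which files an attack touched.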

The CDO role will receive greater attention. Balancing the security concerns and business value of data, and instituting good organization-wide data discipline, is a complex task and slightly different from the roles of CSO, CIO, and CEO.

Conclusion

The full scope and depth of the WannaCry attack will emerge over time. It has warned us about what is possible. This will slow down the democratization of data and analytical work unless we build novel tools and approaches to address the underlying drivers of risk, including complexity and the lack of data discipline.

This post is authored by Venkata Pingali, CEO of Scribble Data, a data automation company for analytics.