In light of the revelations of a massive data collection and snooping effort by the NSA, one response has been to suggest that privacy advocates are overreacting, and that, as a friend put it, “the scale of abuses reported is minimal/nonexistent” so this is not that big of a deal.

That the abuses of this massive data trove that we know of are very few is true—but that should not be a comfort as this hides a huge, uncomfortable problem. We don’t know what we don’t know, and not just in some abstract, philosophical sense that there will always be “unknown unknowns,” but very specifically in that what we know of the NSA’s data management practices strongly suggests that the NSA itself doesn’t really know how the data it is collecting is being used.

In a nutshell, here’s what we’ve learned, or what has been highlighted, as a result of Edward Snowden’s leaks: Almost all major software companies as well as telecommunications giants have created mechanisms by which the NSA has access to traffic and user information that goes through that company. We have also learned that the NSA has been deliberately weakening internet security so that it can eavesdrop on it all more easily. We learned that the NSA also taps into the internet’s physical backbone and listens in on the traffic directly.

In short, the NSA is collecting a massive amount of data from multiple, varied sources. Each of these data surveillance methods produces massive amounts of complex, incongruous data in nonstop fashion. Just managing data storage at this scale is a humongous challenge, let alone categorizing and sorting it all, and then retrieving it on demand.

To manage this data beast, the NSA seems to have relied on highly competent “sysadmins,” in effect super users. The powerful wizards. What it increasingly clearly did not do, however, is find a way to provide effective oversight of these sysadmins, the custodians of it all.

In a dramatic failure of quis custodiet ipsos custodes, the NSA has all but admitted it doesn’t really know what Edward Snowden took, or how exactly. Snowden, as a sysadmin, seems to have had access both to the user identities of more powerful users and to his own logs, which meant he could snoop freely and delete his tracks, all the while sitting pretty in Hawaii. He could pose as a highly authorized person, gather the data and hide his tracks. Forensics on systems that have been manipulated by a competent and powerful sysadmin is very hard. In the physical world, it’s hard to be somewhere without shedding any DNA, hair or some sort of trace. In the digital world, with the right set of permissions and access, it is possible to be traceless for all practical purposes.

The NSA, it seems, is now trying to reconstruct Snowden’s digital footsteps within the agency’s computers through indirect methods such as looking for incongruous logins (a user on vacation logs in; password resets that a user cannot recall), but these methods will only take them so far if logs were erased and overwritten.
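To make the limits of this kind of indirect forensics concrete, here is a minimal sketch of an "incongruous login" check. Everything in it (the `Login` record, the `vacations` table, the sample users and dates) is a made-up illustration, not a description of any real NSA tooling. It flags logins that fall inside a user's recorded vacation, and then shows that the same check finds nothing once the relevant log entries have been erased.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Login:
    """One hypothetical login record: who logged in, and on which day."""
    user: str
    day: date


def incongruous_logins(logins, vacations):
    """Return logins recorded on a day the user was reportedly on vacation.

    `vacations` maps a user name to a list of (start, end) date ranges,
    both ends inclusive. This is an assumed data layout for illustration.
    """
    flagged = []
    for login in logins:
        for start, end in vacations.get(login.user, []):
            if start <= login.day <= end:
                flagged.append(login)
                break  # one matching vacation range is enough to flag it
    return flagged


# Hypothetical sample data.
vacations = {"alice": [(date(2013, 5, 1), date(2013, 5, 10))]}
logins = [
    Login("alice", date(2013, 5, 3)),   # during her vacation: suspicious
    Login("alice", date(2013, 5, 20)),  # after her vacation: ordinary
    Login("bob", date(2013, 5, 3)),     # no vacation on record: ordinary
]

flagged = incongruous_logins(logins, vacations)

# If someone with sufficient access erased that day's entries,
# the very same check has nothing left to find.
surviving = [entry for entry in logins if entry.day != date(2013, 5, 3)]
flagged_after_erasure = incongruous_logins(surviving, vacations)
```

The check is only as good as the records it reads: it catches the careless intruder, not the one who can rewrite the log.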

Other examples of the “scarcity” of abuses are no more comforting. We know, for example, that there have been abuses of this data in a few instances by NSA employees stalking their exes (jokingly called LOVEINT, the way signals intelligence is called SIGINT and human-sourced intelligence is called HUMINT). As far as the NSA knows, these instances were few; but the reality is the NSA doesn’t know, because these violations were not uncovered by NSA tracking but by employees who confessed. In other words, there is no strong oversight that catches every instance of a scorned NSA employee snooping on their object of obsession or affection. Given the security clearance requirements, it seems quite unlikely that these gross violations went undetected because a whole bunch of NSA employees were dating people whose profiles were very easy to confuse with those of potential terrorists. Ordinary people’s data was snooped on and the NSA was none the wiser until the employees ’fessed up.

Other parts of the NSA data management system also seem creaky. When ProPublica filed a FOIA request with the NSA to search for emails between its employees and a TV channel that had aired a puff piece documentary on the NSA, the response it got from the NSA was that they did not have the technical capability to bulk scan their email as their system was “a little antiquated and archaic” and thus they had “no central method to search an email at this time with the way our records are set up.” (Such bulk searches are fairly common in the corporate world.) In case you are wondering how a top-level agency that deals with digital data can have a creaky system, this is often the result of being a very early adopter: they likely had a burgeoning intranet before almost anyone else, and this is probably still run on some old, creaky, antiquated code whose author has long since retired, and which has been patched repeatedly through the years to the point that nobody can really touch it anymore short of scrapping it all and starting over.

Given this reality, can anyone truly rule out the possibility that a malevolent Snowden, or a foreign government that might have placed a sysadmin mole inside the NSA, has scooped up personal information on influential and important politicians and is now blackmailing them (or will in the future)? Can we be sure that there is not already massive “unauthorized” snooping at lower levels? There is already a whistleblower who claims Barack Obama was wiretapped by the NSA along with a whole number of high-level US politicians. The potential for mischief, ranging from the small-potatoes cases of scorned lovers to significant political and personal blackmail and deep privacy violations, is vast. And the scary truth is that nobody really knows for sure what has already happened, nor can anyone claim or guarantee that it won’t happen. Not the pundits, not the NSA itself, and not any individual sysadmin because, as I’ve already argued, digital unknowns can stay buried forever if tracks are covered with expertise and root access.

The NSA has responded to criticisms of its data management practices by claiming that it was going to fire 90% of its 1,000 sysadmins and automate their duties. If their duties were so easily automatable, there wouldn’t have been a need for 1,000 of them. It also said that it will implement “two-person” administrative requirements for the most secret documents, which merely raises the bar but does not prevent silent, significant abuse of very sensitive data. (In any case, it never said that the two-person rule would apply to accessing personal data, just sensitive documents that can be leaked.) The NSA can also take other steps to make the data a little more secure, but given the realities of the size, complexity and nature of the data, they will always need “ad hoc” management of their surveillance system, and all such “ad hoc” systems are “hackable” from the inside.
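As a rough illustration of why the scope of a two-person rule matters, here is a minimal sketch of such a check. The resource categories, the function name and the admin names are all hypothetical, invented for this example; nothing here reflects how the NSA actually implements its controls. The point is simply that a rule scoped to one category of resource says nothing about any other.

```python
# Hypothetical resource categories, invented for this illustration.
SENSITIVE_DOCUMENT = "sensitive_document"
PERSONAL_DATA = "personal_data"


def access_allowed(resource_kind, approvers,
                   two_person_kinds=frozenset({SENSITIVE_DOCUMENT})):
    """Decide whether access goes ahead under a scoped two-person rule.

    Resources whose kind is in `two_person_kinds` need sign-off from at
    least two distinct people. Everything else needs only one.
    """
    distinct = set(approvers)  # the same admin approving twice counts once
    if not distinct:
        return False
    if resource_kind in two_person_kinds:
        return len(distinct) >= 2
    return True


# A lone admin is stopped at a sensitive document...
lone_admin_document = access_allowed(SENSITIVE_DOCUMENT, ["admin1"])
# ...two distinct admins get through...
two_admins_document = access_allowed(SENSITIVE_DOCUMENT, ["admin1", "admin2"])
# ...and the same lone admin is not stopped at personal data,
# because that category was never brought under the rule.
lone_admin_personal = access_allowed(PERSONAL_DATA, ["admin1"])
```

Even in the covered case, the rule only raises the bar from one willing insider to two.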

So, next time someone tells you that there have been very few abuses of the NSA’s massive data trove, ask them how they know, because they are claiming to know something the NSA itself doesn’t. And these days, that’s not much.