Stop Calling Them “Logs”

Dec 20, 2015

I’ve been reading a lot of nonsense from self-proclaimed “database experts” about what the “logs” from the NGP-VAN story indicate. What all these “database experts” don’t seem to realize is that these aren’t logs.

The above screenshot is, at first glance, an audit trail of actions taken by a particular user within the NGP-VAN system. To a layman, this looks very much like what one would expect of a system-produced audit trail. It has all the features we associate with technical mumbo-jumbo, including highly specific timestamps (down to the second), consistent descriptions (searches preceded with “Searched:”), etc.

This sample alone, however, gives us pretty clear indications that this is not an actual log. What you’re seeing in Exhibit A is, at best, a human interpretation of a more technical log and, at worst, a complete fabrication. As with most things, the reality probably lies somewhere in the middle. That is to say, I’m not claiming that these are complete fabrications, but rather that they are likely a reduction of information contained in multiple logs, which are not necessarily as clear about what is going on as the descriptions we see here.

As a professional server administrator for a number of years, I can assure you that no logs, as produced by software, say things like “Attempts to run a search. At this point cannot access page sections.”

It’s also interesting to note the timestamps on the “logs.” Clearly, seconds are being displayed, so why are the majority of entries rounded to the minute? Moreover, why are what should be very distinct, clear, and atomic actions rounded to the minute? For example, at 10:49:00 we see that the user “Searched: HFA Primary Priority 9 - 10.” If this were a single atomic action (like logging into the system), we would not expect to see a similar entry, “Searched: HFA Primary Priority 0 - 1,” exactly two minutes earlier. Why are there, according to the timestamps, 0 seconds between a search and saving the list, yet precisely two minutes between searches?

What this indicates is that an entry like “Searched: HFA Primary Priority 0 - 1” is actually probably represented by a number of entries in an actual log, each with much more specific timestamps. This collection of actions is then summarized by a human interpreter in the information we see in Exhibit A.
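To put a rough number on how unusual that rounding is, here is a minimal sketch. It assumes that a system recording real events would stamp them at effectively random seconds, so each entry has about a 1-in-60 chance of landing exactly on :00, and that entries are independent. The timestamps in `sample` are hypothetical and are not transcribed from the exhibits; the function names are my own.

```python
from datetime import datetime
from math import comb


def on_the_minute(stamps):
    """Count timestamps whose seconds field is exactly 00."""
    parsed = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    return sum(1 for t in parsed if t.second == 0)


def chance_at_least(k, n, p=1 / 60):
    """Binomial tail: probability that at least k of n independent
    entries land on :00, if each does so with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical timestamps, for illustration only.
sample = ["10:47:00", "10:47:00", "10:49:00", "10:49:00", "10:52:00", "12:14:55"]

hits = on_the_minute(sample)
print(f"{hits} of {len(sample)} entries fall exactly on the minute")
print(f"chance of that under the 1-in-60 assumption: {chance_at_least(hits, len(sample)):.2e}")
```

Under those assumptions, even a handful of on-the-minute entries is wildly improbable for machine-recorded events, which fits the reading that a person summarized and rounded them.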

In short, this is not a log.

Exhibit B provides us with yet more evidence of this. In addition to the obvious change in timestamps (seconds are no longer shown at all), we also see the system (supposedly) using the word “innocuous” to describe several actions towards the bottom. Are we supposed to believe the system describes various events in terms of what is harmful vs. what is not?

Exhibit C is the first example we see where timestamps are completely mixed. A single entry contains no timestamp at all, while another one omits the seconds, despite every other entry including them. Again, we see what should be very simple and atomic actions occurring at precise minutes and precisely 1 - 2 minutes apart.
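The mixed formats can be checked mechanically. The sketch below assumes that software-generated logs normally stamp every entry with one format, so more than one "shape" in a single listing suggests hand editing. The entries in `hypothetical` are made up for illustration and are not taken from Exhibit C; the function names are my own.

```python
import re
from collections import Counter


def timestamp_shape(prefix):
    """Classify the leading timestamp of an entry by its format."""
    prefix = prefix.strip()
    if re.fullmatch(r"\d{1,2}:\d{2}:\d{2}", prefix):
        return "HH:MM:SS"
    if re.fullmatch(r"\d{1,2}:\d{2}", prefix):
        return "HH:MM"
    return "missing"


def shapes(prefixes):
    """Tally how many entries use each timestamp format."""
    return Counter(timestamp_shape(p) for p in prefixes)


# Hypothetical timestamp column, for illustration only.
hypothetical = ["12:10:00", "12:12", "", "12:14:55"]

tally = shapes(hypothetical)
print(dict(tally))
print("consistent" if len(tally) == 1 else "mixed formats")
```

A listing that came straight out of a single logging routine would be expected to produce one shape only.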

The system also apparently attempts to make guesses as to what’s going on, citing an “apparent” session timeout. It also aggregates future information at precise times. For example, at 12:14:55 it claims that the user logged into IA, “but did not touch those folders or lists.” Perhaps the user touched them at 12:14:56?

Lastly, Exhibit D presents an inconsistency which I’m not even sure can be explained by mere interpretation. According to this “log,” at 10:59:00 this user was created by Juretsky. Presumably, Juretsky’s “log” is that of Exhibit A, which is the only “log” showing the creation of users. However, there’s a bit of a problem. According to that “log,” the only two users created were created at 11:01 and 11:07, not at 10:59.

In conclusion, while we may be able to glean some information from these screenshots, the primary piece of information that a “database expert” should walk away with is that these aren’t logs. To what extent the information must be interpreted or has been doctored before being released, we cannot know with any level of certainty because we don’t have the logs from which they may have been derived.

Written by Matthew Sahagian