FTP servers are a grab bag but can even expose classified information

Google finds and makes copies of these servers and you can ask Google where they are.

This is excerpted in part from my old WordPress blog, with updated information: new search tactics and the changes Google has made to searching since the 2013 post. I’ve edited and added throughout, so if the voice seems to change psychotically we can all at least pretend it’s just creative growth.

People treat personal servers like home office filing cabinets, with a drawer at the bottom for stuff to deal with later but make sure is kept. I’ve found it is very common for professionals in sensitive industries to upload the entire contents of one or more laptop hard drives, external HDs, USB drives, etc.

I’m not interested in stealing anyone’s password and have only examined system folders in a few cases for forensic purposes (I use the term loosely), but these uploads typically include from the C:\ directory down and I imagine a knowledgeable actor with dubious intentions would be able to extract compromising information.

In folders below Users/ (in the case of computer HDs), I’d find more or less what’s on your computer right now. Sometimes military or law enforcement professionals are the owners or are users of these servers.

The thinking behind this may be that without a recognizable domain or submission to Google, these private servers won’t show up in searches, but Googlebot is opt-out.
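To make "opt-out" concrete, here is a minimal sketch using Python's standard urllib.robotparser. It assumes the server operator relies on a robots.txt file, which is the usual opt-out mechanism for crawlers; the host name and file path are made up for illustration. With no rules at all the crawler is allowed in, and it takes an explicit Disallow to keep it out.

```python
from urllib.robotparser import RobotFileParser


def googlebot_allowed(robots_txt, url):
    """Return True if the given robots.txt text lets Googlebot fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


# What most forgotten servers effectively have: no rules at all.
NO_RULES = ""

# What an operator would need to publish to opt out entirely.
OPT_OUT = "User-agent: *\nDisallow: /\n"

# Hypothetical URL, for illustration only.
EXAMPLE_URL = "ftp://ftp.example.org/pub/report.pdf"

if __name__ == "__main__":
    print("no rules :", googlebot_allowed(NO_RULES, EXAMPLE_URL))
    print("opt-out  :", googlebot_allowed(OPT_OUT, EXAMPLE_URL))
```

This only models the rules file itself. Whether a given crawler requests and honors it on a particular server is outside what the sketch can show.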

Sensitive confidential information including classified documents can be found via this search method, most likely in combination with painstaking browsing through folders and subfolders. Google’s indexing of files on FTP servers seems to be more hit-and-miss than for regular web pages.

*I have never found marked classified documents but have found deliberative documents and operational data with presumably classified content.

My rationale for first attempting this was a theory: many FTP servers allow anonymous log-in, even more are indexed by Google, and they are used mainly for uploading, downloading, and storing files, as opposed to HTTP/S servers, which are meant for viewing pages and content. FTP servers therefore typically house more office-type documents (and, more frequently, software). An FTP server will often serve downloads for an HTTP site.

Limiting your searches to FTP servers also significantly restricts the overall number of results returned, which is a double-edged sword. Choice keywords combined with a query that tells Google to bring back results that have inurl:ftp:// but to leave out web pages (-inurl:http:// -inurl:https://) make a great starting point.

I learned recently that site:ftp.* also works.
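A minimal sketch of assembling these queries in Python so they can be reused and varied. The function names and the example keywords are my own for illustration; the operators are the ones described above, and the search URL is just the ordinary google.com/search?q= form.

```python
from urllib.parse import quote_plus


def ftp_query(keywords, use_site_operator=False):
    """Build a Google query string restricted to FTP results.

    Multi-word keywords are wrapped in straight quotes so they are
    searched as phrases.
    """
    terms = [f'"{k}"' if " " in k else k for k in keywords]
    if use_site_operator:
        scope = ["site:ftp.*"]
    else:
        scope = ["inurl:ftp://", "-inurl:http://", "-inurl:https://"]
    return " ".join(terms + scope)


def search_url(query):
    """Turn a query string into a URL you can paste into a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    q = ftp_query(["license plate reader", "config"])
    print(q)
    print(search_url(q))
    print(ftp_query(["deployment"], use_site_operator=True))
```

This only builds strings for pasting into the search box by hand; it sends nothing to Google.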

CAPTCHAs

A bog one encounters before long using this method is that Google will present you with a “captcha.” Google suspects you of being machine in nature.

Many, many websites use captchas and pretty much everyone who uses the Internet has encountered one. The basic idea behind a captcha is to prevent people from using programs to send automated requests to a webserver. They are a main tool in fighting spam, thwarting bots that mine the internet for email addresses and other data and that register for online accounts and other services en masse. The Google captcha presents the user with a natural language problem: a picture of letters, distorted to complicate computer reading.

Unless you are in fact a machine (sometimes you’re a machine, in which case there are solutions), the captcha is easily solved. However, instead of returning me to my place in the results after I answered it, Google was sending me back to the first results page of my query, forcing me to more or less start browsing again and to encounter another captcha.

I labeled this limitation the Google Governor, as it seemed to throttle searchers’ access to the API for high-powered queries.

UPDATE

There’s a new Captcha and I’ve found, because my searching is a tad excessive, that I get more and more of them until Google shuts me down, and after solving I get an error page at ipv6.google. The solution is to clear my browser history (as a side-note, Google seems to actually see through web-based click-through proxies, can’t prove).

Google’s new all image captcha

On with searching. Because your results are already restricted, instead of adding terms, try operators that omit unwanted noise, whether by site or by subject matter.

Eliminate a recurring irrelevant website with -inurl:"websitenameistupid.com" or -site:domain.random, or eliminate “false positives” with -"privacy" -"foia" and so on.

You might further restrict your results by omitting sites in foreign domains by TLD (especially useful with short acronym-based keyword searches): -site:cz -site:nk.

Or limit it to (site:com OR site:net OR site:info OR site:biz OR site:org OR site:gov OR site:us).

Actually not “or,” rather, try all. Do so systematically with each term and that’s the whole general idea. Combine your research with other techniques I’ve described on my blog, and with your own mnemonic and inference devices, and some fuzz, because Google doesn’t index perfectly.
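"Do so systematically" is easy to sketch in code. The following is one way, under my own assumptions, to generate every keyword and TLD combination with the noise exclusions attached. The function name, the example keywords, and the choice to quote every keyword are illustrative; the operators and the TLD list come from the text above.

```python
from itertools import product

# FTP restriction described earlier in the post.
BASE = "inurl:ftp:// -inurl:http:// -inurl:https://"

# The TLDs listed above, tried one at a time rather than OR'd together.
TLDS = ["com", "net", "info", "biz", "org", "gov", "us"]


def query_matrix(keywords, tlds=TLDS, exclude_terms=(), exclude_sites=()):
    """Return one query per (keyword, TLD) pair, with exclusions appended."""
    noise = [f'-"{t}"' for t in exclude_terms]
    noise += [f"-site:{s}" for s in exclude_sites]
    queries = []
    for keyword, tld in product(keywords, tlds):
        parts = [f'"{keyword}"', BASE, f"site:{tld}", *noise]
        queries.append(" ".join(parts))
    return queries


if __name__ == "__main__":
    for q in query_matrix(["alpr", "watch list"],
                          exclude_terms=["privacy", "foia"],
                          exclude_sites=["domain.random"]):
        print(q)
```

Printing the list and working through it by hand, rather than firing the queries automatically, also keeps you out of the captcha bog for longer.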

The search inurl:wp-content/uploads combined with key terms finds almost exclusively office-type documents hosted on WordPress sites. This is a great way to find whole directories with information of interest, automatically dated by URL according to when they were uploaded.
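The dating works because WordPress, by default, files uploads into year and month folders (/wp-content/uploads/YYYY/MM/). A small sketch for pulling that date out of a result URL follows. The function name and example URLs are made up, and a site owner can switch the dated folders off, in which case this returns None.

```python
import re
from urllib.parse import urlparse

# Matches the default WordPress layout: /wp-content/uploads/2013/05/...
UPLOAD_RE = re.compile(r"/wp-content/uploads/(\d{4})/(\d{2})/")


def upload_month(url):
    """Return (year, month) from a WordPress upload URL, or None."""
    match = UPLOAD_RE.search(urlparse(url).path)
    if not match:
        return None
    year, month = int(match.group(1)), int(match.group(2))
    if not 1 <= month <= 12:
        return None
    return (year, month)


if __name__ == "__main__":
    urls = [
        "https://example.org/wp-content/uploads/2013/05/report.pdf",
        "https://example.org/wp-content/uploads/report.pdf",
    ]
    for u in urls:
        print(upload_month(u), u)
```

The date is when the file was uploaded to the site, not when the document was written, so treat it as a lower bound on how long it has been public.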

intitle:index.of and typically “parent directory” finds index listings and files with “index of” in the name as well as spam and worse with deceptive titles and text. The more description on the results page the more suspicious, is a fair rule.

Create records of FTP servers or otherwise log your searches; this can be an important reference in lots of ways. Save documents in folders with as much of a system as you can. I don’t, usually: I alter my searches too frequently, downloading many documents at once after several pages of “ctrl-clicking” or directly downloading files.
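For those with more discipline than me, a search log can be as simple as a CSV file. This is a minimal sketch using only the standard library; the column names, function names, and file name are my own choices, not any established format.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ["timestamp", "query", "host", "note"]


def log_search(path, query, host="", note=""):
    """Append one row to the log, writing a header if the file is new."""
    path = Path(path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "query": query,
            "host": host,
            "note": note,
        })


def read_log(path):
    """Return the log as a list of dicts, one per search."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    log_search("searches.csv", '"watch list" inurl:ftp://',
               host="ftp.example.org", note="anonymous log-in, large cache")
    for row in read_log("searches.csv"):
        print(row)
```

A spreadsheet opens the result directly, and the host column doubles as the record of FTP servers worth revisiting.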

As a not bad shortcut I index documents using dtSearch, which is expensive and I’m not sorry but won’t pretend I didn’t torrent. It also creates docx files of search results.

Visualize your information somehow. I use mind maps and some software helpers, self-starters like Maltego, or programs like BaseX that visualize XML in a way that is both cool-looking and can reveal information. There’s an image below for an example of BaseX.

I also use network tools like Nmap, sometimes even penetration testing OSs on USB (for years I kept an old Xfce Backbox on my laptop, using Windows 7 until it croaks for video editing software, have yet to find a really stable equivalent to Premiere or Final Cut — almost entirely a proprietary kink).

You’re not trying to break into any houses, just talk to Alexa and get her to bring you some stuff. That’s a rough metaphor. The US DoJ has prosecuted people for less than I do on an effectively routine basis. I perhaps imprudently insist on continuing on the basis of its clear legality.

In tweeted form a few more, for variety and for people who understand this better. I’ve actually tweeted about just the filetype: operator a lot.

Some specific examples of what I have found:

An FTP server for an ALPR system which contained an extract of the terror watch list, classified information.

Much of what I learned from the ALPR server was about the structure and mechanism itself, and the data within the software, which could be extracted from, for example, .inf files (for the mobile gateway application) with a Linux application called MonoDevelop. I double-clicked to expand components over and over, the equivalent of getting lucky smacking a tree with a laser printer.

BaseX, as mentioned above, displaying an XML file exported from my FileZilla queue.

BaseX looks cool and lets me do stuff like pick apart configuration files, and browse the entire contents of an FTP server without having to download it, by adding it to the queue in Filezilla and exporting as xml, and loading it in BaseX as above.
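The same exported queue can also be picked apart with a few lines of Python. This is a rough sketch, not a description of FileZilla's actual schema: the element names (Server, Host, File, RemoteFile, RemotePath, Size) are my assumption about the export layout and may differ by version, and the sample data, including the plain-text RemotePath values, is invented and simplified. Check a real export before relying on it.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import PurePosixPath

# Invented sample in the ASSUMED shape of a FileZilla queue export.
SAMPLE = """<FileZilla3>
  <Queue>
    <Server>
      <Host>ftp.example.org</Host>
      <File>
        <RemoteFile>deployment-plan.pdf</RemoteFile>
        <RemotePath>/projects/city</RemotePath>
        <Size>204800</Size>
      </File>
      <File>
        <RemoteFile>minutes.PDF</RemoteFile>
        <RemotePath>/projects/city/meetings</RemotePath>
        <Size>51200</Size>
      </File>
      <File>
        <RemoteFile>gateway.inf</RemoteFile>
        <RemotePath>/software/mobile</RemotePath>
        <Size>1024</Size>
      </File>
    </Server>
  </Queue>
</FileZilla3>"""


def list_queue(xml_text):
    """Return one dict per queued file: host, path, name, size in bytes."""
    root = ET.fromstring(xml_text)
    entries = []
    for server in root.iter("Server"):
        host = server.findtext("Host", default="")
        for item in server.iter("File"):
            entries.append({
                "host": host,
                "path": item.findtext("RemotePath", default=""),
                "name": item.findtext("RemoteFile", default=""),
                "size": int(item.findtext("Size", default="0") or 0),
            })
    return entries


def count_by_extension(entries):
    """Tally files by lowercased extension, a quick way to see what a server holds."""
    return Counter(PurePosixPath(e["name"]).suffix.lower() for e in entries)


if __name__ == "__main__":
    files = list_queue(SAMPLE)
    print(count_by_extension(files))
    print(sum(f["size"] for f in files), "bytes queued")
```

Like the BaseX view, this reads only the listing, so you can survey a server's contents without downloading any of it.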

The only limit to your use of software is utility; I use many programs that I may not even be capable of employing for their basic intended purpose. When you’re using software to visualize, the power is in arranging and rearranging and generally sifting to see connections, common traits and groups. You need to navigate and manipulate graphic data to make useful inferences.

I found a server used by an IBM computer vision researcher, which held over five years’ worth of documents concerning installation and deployment of smart video surveillance in Boston, Chicago, New York City’s Domain Awareness Center, and elsewhere, along with other information.

There is so much material in this cache, both office documents and software, that I have only carefully examined a small fraction of it since I came across it in late spring 2013.

A Denver security and defense company’s FTP server was wide open (the company, or a subsidiary, perhaps parent, I don’t recall, was implicated some time ago in a TSA exam cheating scandal at SFO).

Apparently this was a shared storage server, with folders named for some employees who did not house any files in them; only one executive totally abandoned caution when storing files there. Under Associates/Guidry/ a Byron of that surname had apparently uploaded everything he could find. One folder, titled “Guidry PW and Signature,” contained dozens of files with passwords, along with .crt, .pem, and .key files and VPN clients; another, titled “admin10,” contained the file “passwords.xls.”

That document contains the log-in credentials for bank accounts, utilities, and government portals. This particular document is of more interest to the penetration tester; for our purposes it serves as a meter for the sensitivity of the gigabytes of files that accompanied it on the server. The recklessness of the uploader exposed internal details of dozens of corporations and their business with government agencies.

Passwords.xls blurred and distorted

Some files from that server that are uniquely informative: a FOIA request’s full lifecycle:

I’m willing and sometimes happy to assist people with undertaking research. I also like food and accoutrements when I can get them, so if anyone wants to hire me for a job or their newsroom, get in touch (kenneth @ networkedinference.com). I hope what’s here is useful to the reader and at least a little entertaining.

UPDATE