Open-Source Intelligence (OSINT) Reconnaissance

*The following is a single chapter contribution (Chapter 3) to the Peerlyst community-sourced eBook titled “The RED Team Guide”*

Conversations in Social Media; image courtesy of Ethority, 2014

Whoa, slow your roll, cowboy! Before we can get to the shell-poppin’, ‘make sexy-time’ (joke, laugh) hacking adventures that Red Teams have come to be known for, there is some homework to be done. A professional pentester never goes into a job without first studying the target. Gathering information about a target within the scope of the operation is a critical first step: it allows an attacker to find potential vulnerabilities and weaknesses in an organization’s defenses that may be exploitable, be that physical, social engineering, logical, or a combination of all three. Information is the new exchange commodity and, as such, there is a plethora of information about almost any subject freely available on the Internet. So what exactly does OSINT mean?

Open-source intelligence (OSINT) is the collection of information (i.e., intelligence) about persons or entities from a wide array of publicly available sources, including the Internet.

OSINT is usually performed during the Reconnaissance phase of hacking and pertinent information collected from this phase is carried over into the network Enumeration phase. Due to the vast amounts of information available to sift through on the Web, attackers must have a clear and defined search framework as well as a wide array of OSINT collection tools to facilitate this task and assist with processing the data; otherwise they risk getting lost in the overwhelming sea of information that has become the Internet. OSINT reconnaissance can be further broken down into the following 5 sub-phases:

Phases of the OSINT Process; image courtesy of Chiheb Chebbi
  • Source Identification: in this initial phase, the attacker identifies potential sources from which information may be gathered. Sources are documented internally throughout the process in detailed notes to come back to later if necessary.
  • Data Harvesting: in this phase, the attacker collects and harvests information from the selected sources and other sources that are discovered throughout this phase.
  • Data Processing and Integration: during this phase, the attacker processes the harvested information for actionable intelligence by searching for information that may assist in enumeration.
  • Data Analysis: in this phase, the attacker performs data analysis of the processed information using OSINT analysis tools.
  • Results Delivery: in the final phase, OSINT analysis is complete and the findings are presented/reported to other members of the Red Team.
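
The note-keeping that runs through the five phases above can be sketched in a few lines of Python. This is only an illustration, not a prescribed tool: the `Phase`, `Note`, and `Engagement` names are hypothetical, and a real engagement would likely use a proper case-management or note-taking system.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    """The five OSINT sub-phases, in the order listed above."""
    SOURCE_IDENTIFICATION = 1
    DATA_HARVESTING = 2
    DATA_PROCESSING = 3
    DATA_ANALYSIS = 4
    RESULTS_DELIVERY = 5


@dataclass
class Note:
    phase: Phase
    source: str   # where the information came from
    detail: str   # what was learned


@dataclass
class Engagement:
    """Hypothetical container for the notes kept during one engagement."""
    target: str
    notes: list = field(default_factory=list)

    def log(self, phase, source, detail):
        self.notes.append(Note(phase, source, detail))

    def by_phase(self, phase):
        return [n for n in self.notes if n.phase is phase]

    def sources(self):
        # Unique sources, in the order they were first recorded.
        return list(dict.fromkeys(n.source for n in self.notes))
```

Logging every finding with its source makes it easy to return to a source later and to assemble the final report for the rest of the Red Team.
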
**DISCLAIMER** Before we go any further, I would be remiss not to mention that, while performing OSINT is legal, the OSINT tools and techniques outlined here are intended to be used in conjunction with sanctioned Red Team activities as part of a contracted service with the permission of the target. When conducting pentesting, always protect yourself with a proper contract, signed by the target company/entity/individual, that defines the scope of the operation and grants you permission to “hack” their organization for the purposes of pentesting/vulnerability assessment. Exercise due caution accordingly. You have been warned!

OSINT Tools

There is a plethora of OSINT tools available; some are free and others can cost a pretty penny. While it is outside the scope of this chapter to cover every single OSINT tool, we will cover a few of the more popular tools that you may find useful for Red Team ops. Performing OSINT is about taking the little bits and pieces of information that you are able to gather about a particular person or entity and pulling the thread by running that information through OSINT tools to see what more can be discovered.

Google Searching & Dorking

For instance, let’s say you’ve been hired to pentest a company that calls itself Explorations Media Group (*for example only) and you perform a Google search whose top result is the website domain name www.explorationsmediagroup.com. You navigate to that site by clicking on the link and discover that, at the bottom of the site, there are a few website links titled “Other Notable Web Properties.” You click on the first option, www.theworldsworstwebsiteever.com, and want to find out more about this site. It is a truly heinous webpage, by the way (1980’s flashbacks). Should you decide to follow this lead further down the Internet rabbit hole, how can you find out more information about this site?

One method is to use what is known as “Google Dorking,” also known as Google Hacking, which means entering advanced search operators into the Google search engine. Essentially, we are using Google’s web crawler and search index to hack with. This is an example of how hackers take technology and turn it upside-down to make it work in ways it wasn’t necessarily designed to. Play around with these Google Dorks a bit to learn what type of results you can get.

List of simple Google Dorks; courtesy of Techworm

We can then enter Google Dork queries directly into the Google search box, such as:

site:www.theworldsworstwebsiteever.com ext:(doc | pdf | xls | txt | ps | rtf | odt | sxw | psw | ppt | pps | xml) (intext:confidential salary | intext:"budget approved") inurl:confidential

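While this specific query will not return any results, we can make it more generic by adding a Boolean search operator such as “OR.” Because OR matches pages that satisfy either of the terms it joins, the search is no longer restricted to the target site, and we can see all of these types of results: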

site:www.theworldsworstwebsiteever.com OR ext:(doc | pdf | xls | txt | ps | rtf | odt | sxw | psw | ppt | pps | xml) (intext:confidential salary | intext:"budget approved") inurl:confidential
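
Dork strings like the ones above get tedious to type by hand, so here is a minimal Python sketch that assembles one from its parts and turns it into a search URL. The `build_dork` and `search_url` helper names are made up for this example, and the builder is a simplification: it only handles the `site:`, `ext:`, `intext:`, and `inurl:` operators used above.

```python
from urllib.parse import urlencode


def build_dork(site=None, extensions=(), intext=(), inurl=None):
    """Assemble a Google dork string from its parts (hypothetical helper)."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    if extensions:
        # File-type alternatives are grouped and separated with "|".
        parts.append("ext:(" + " | ".join(extensions) + ")")
    if intext:
        terms = []
        for term in intext:
            # Quote multi-word phrases so they are matched exactly.
            terms.append(f'intext:"{term}"' if " " in term else f"intext:{term}")
        parts.append("(" + " | ".join(terms) + ")")
    if inurl:
        parts.append(f"inurl:{inurl}")
    return " ".join(parts)


def search_url(query):
    """URL-encode the query into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For example, `build_dork(site="www.theworldsworstwebsiteever.com", extensions=["doc", "pdf"], intext=["confidential", "budget approved"], inurl="confidential")` produces a shortened version of the first query above, which you can paste into the search box or pass to `search_url`.
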

Whois

Given the above example, you could use one of several WHOIS tools to look up the registration record for www.theworldsworstwebsiteever.com. You’ll get information such as the registrar (godaddy.com), the creation date (2008-05-14), and, from the ICANN query, 2 name servers (NS1.EXPMG.NET & NS2.EXPMG.NET). However, you notice that the IP address is missing. Hmmm? Why is that, you wonder? This is because a WHOIS record describes a domain’s registration, not where the site is hosted; the IP address lives in DNS. But you’ve got this, so you keep plugging along. There are plenty of other ways to get the website’s IP address.

Using the WHOIS.net tool for website domain name OSINT
Using the WHOIS.icann.org tool for website domain name OSINT
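
If you would rather script the WHOIS lookup than use a website, the protocol itself is simple: open a TCP connection to port 43, send the domain followed by a line break, and read the reply. The sketch below assumes the `.com` registry server `whois.verisign-grs.com` as a default; the `whois_query` and `parse_whois` names are hypothetical helpers, and real WHOIS output varies a lot between registries, so treat the parser as a starting point only.

```python
import socket


def whois_query(domain, server="whois.verisign-grs.com", port=43, timeout=10):
    """Send a raw WHOIS query: the domain name followed by CRLF."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text):
    """Collect 'Key: value' lines; repeated keys (e.g. Name Server) become lists."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            continue
        if key.startswith(("%", "#", ">>>")):
            continue  # skip comment and footer lines
        record.setdefault(key, []).append(value)
    return record
```

Calling `parse_whois(whois_query("theworldsworstwebsiteever.com"))` would give you a dictionary you can drop straight into your notes. Notice that nothing in it will be an IP address.
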

Command Prompt

Being a hacker, you likely prefer the command line to GUI tools anyway. Using either a terminal (Unix/Linux), a Command Prompt (Windows), or a PowerShell prompt (Windows), you can perform a similar query of the www.theworldsworstwebsiteever.com website using the command: tracert www.theworldsworstwebsiteever.com (on Unix/Linux the equivalent command is traceroute). PowerShell, by the way, is a much more powerful tool for system administration than the simple Command Prompt. If you aren’t proficient in PowerShell, you may want to work on that.

Using the PowerShell tracert command to determine website IP address
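
The reason tracert reveals the IP address is that it has to resolve the hostname before it can trace the route. If the address is all you want, you can ask the resolver directly. Here is a minimal Python sketch using the standard library; the `resolve` function name is just an illustration.

```python
import socket


def resolve(hostname):
    """Return the unique IP addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return list(dict.fromkeys(info[4][0] for info in infos))


# Example (performs a live DNS lookup):
# print(resolve("www.theworldsworstwebsiteever.com"))
```

Keep in mind that a site behind a CDN or load balancer may return several addresses, or addresses that belong to the provider rather than the target.
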

We now have an IP address that we can run Nmap scans against. You could also take that IP address and run it through another OSINT tool that specifically enumerates IP addresses such as Onyphe:

Onyphe IP address scan results

As you can see, the Onyphe search resulted in a lot of useful information that we can use later in the Enumeration phase.

Spokeo

People search engines such as Spokeo and others will crawl through social media sites, whitepages, email addresses, public records, criminal records, school records, and many other types of publicly available information sources. If you have a specific name of a person within the target organization (e.g., Explorations Media Group) such as a fictional CEO named “John Jacob Jinkleheimer Smith,” Spokeo’s search engine will return several leads that you can further narrow down with search parameters (see image below).

Spokeo people search engine

Sites similar to Spokeo are Family Tree Now, Pipl, Thats Them, IntelTechniques, ZoomInfo Directory, Zaba Search, USSearch, Snoop Station, Radaris, to name but a few. There are many, many more to try out. Now you might begin to see why the collection of Personally Identifiable Information (PII) and selling it to interested third-parties is such a lucrative business as well as just how difficult it can be to keep your own private information off the Web. As a Red Team member, you should be performing these same types of queries on yourself to ensure your private info or at least any potentially damaging information is not posted for everyone to see.

Check the OSINT Framework for a more complete listing of OSINT people searching tools as well as other types of OSINT tools. You can also perform basic searches of a person’s name in Internet search engines such as Google, Bing, and Yahoo.

Shodan

Shodan is a popular OSINT tool that is specifically designed for finding Internet-connected devices (e.g., ICS, IoT, video game systems, and more). You can use the Shodan GUI on the website, which offers some added functionality: you can view live camera feeds and see on a map where vulnerabilities are located throughout the world. When you get into the Enumeration phase, you can also pull the information Shodan holds on a target from the command line using the Nmap scanner’s shodan-api script:

nmap -sn -Pn -n --script=shodan-api --script-args 'shodan-api.apikey=XXXXXX' theworldsworstwebsiteever.com

where -sn disables the port scan; -Pn skips host discovery and doesn’t ping the host; and -n skips DNS resolution.
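
If you plan to run the same Shodan lookup against several targets, it helps to build the command in a script. The Python sketch below only assembles the argument list for the Nmap command above; the `shodan_nmap_command` name is hypothetical, and the API key shown is a placeholder that you would replace with your own.

```python
import shlex


def shodan_nmap_command(target, api_key):
    """Build the argument list for Nmap's shodan-api script (not executed here)."""
    return [
        "nmap",
        "-sn",  # disable the port scan
        "-Pn",  # skip host discovery (no ping)
        "-n",   # skip DNS resolution
        "--script=shodan-api",
        "--script-args", f"shodan-api.apikey={api_key}",
        target,
    ]


cmd = shodan_nmap_command("theworldsworstwebsiteever.com", "XXXXXX")
print(shlex.join(cmd))  # the command as you would type it in a shell
```

Passing the list to `subprocess.run(cmd, capture_output=True, text=True)` would execute it. Keeping the arguments as a list rather than one string avoids shell-quoting problems with the API key.
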

Exploring the Shodan search engine

Datasploit

Datasploit is another OSINT tool, found within the Kali or BlackArch Linux distros, that can be used to collect data on a particular domain, email, username, or phone number that you are targeting and then organize the results coherently in HTML and JSON reports or text files. Datasploit will attempt to find credentials, API keys, tokens, subdomains, domain history, legacy portals, and more.

Datasploit OSINT tool; image courtesy of KitPloit

Maltego

Maltego Community Edition (CE) is a free OSINT tool from Paterva with quite a bit of functionality for analysis of real-world publicly available relational information. Maltego can footprint Internet infrastructure used on social networking sites and collect information about the people who use it. Maltego will query DNS records, whois records, search engines, social networks, various online Application Programming Interfaces (APIs) and extract metadata that is used to find correlational relationships between names, email addresses, aliases, groups, companies, organizations, Websites, domains, DNS names, Netblocks, IP addresses, affiliations, documents, and files.

The Maltego OSINT tool; image courtesy of Paterva.com
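
To get a feel for the kind of relationship mapping described above, here is a toy Python sketch of a link graph between entities. It is in no way how Maltego is implemented; the `EntityGraph` class is hypothetical, and entities are simply `(type, value)` tuples that you link whenever one piece of information leads to another.

```python
from collections import defaultdict, deque


class EntityGraph:
    """Toy undirected graph of OSINT entities (illustration only)."""

    def __init__(self):
        self.links = defaultdict(set)

    def link(self, a, b):
        # Record that entity a and entity b are related.
        self.links[a].add(b)
        self.links[b].add(a)

    def connected(self, start):
        """Breadth-first search: everything reachable from start."""
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in sorted(self.links[node]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        seen.discard(start)
        return sorted(seen)
```

Linking the domain to its name servers, the name servers to their parent domain, and so on lets you ask, for any starting point, what else it is connected to.
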

Social Media

Social networking sites like LinkedIn, Facebook, Peerlyst, Twitter, Google+, Instagram, and Snapchat can be a gold mine for information seekers. If you think about the types of personal information that these sites ask users to input and the sometimes very personal content that users often post, social media should be one of the first stops in the OSINT phase of Red Teaming. To collect information on LinkedIn, for example, you may want to check out ScrapedIn. For Facebook there is StalkScan; for Twitter there are GeoChirp and Tweepsmap for location data and Tinfoleak Web for analytics. Dating sites like Match.com, eHarmony, Plenty of Fish, Tinder, OkCupid, and Ashley Madison are other potential gold mines that can be checked for particular target names to gather more information. People searches are really only limited by how far you want to take them. You can pay for information on many of these sites to drill down further, but that often is not necessary if your target is a particular company or organization.

Automater

Automater is a URL/Domain, IP Address, and MD5 Hash OSINT tool aimed at making the analysis process easier for intrusion analysts. Given a target (URL, IP, or HASH) or a file full of targets, Automater will return relevant results from sources like the following: IPvoid.com, Robtex.com, Fortiguard.com, unshorten.me, Urlvoid.com, Labs.alienvault.com, ThreatExpert, VxVault, and VirusTotal.

Automater OSINT tool; image courtesy of SecurityOnline.com
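
A tool that accepts a URL, an IP address, or an MD5 hash first has to work out which of the three it was given. The Python sketch below shows one way that triage step could look. It is not Automater’s actual code; the `classify_target` name and its return labels are made up for this example.

```python
import ipaddress
import re
from urllib.parse import urlparse

MD5_RE = re.compile(r"[0-9a-fA-F]{32}")  # an MD5 digest is 32 hex characters


def classify_target(value):
    """Label a target as 'ip', 'md5', 'url', or 'unknown' (hypothetical helper)."""
    value = value.strip()
    try:
        ipaddress.ip_address(value)  # accepts IPv4 and IPv6 literals
        return "ip"
    except ValueError:
        pass
    if MD5_RE.fullmatch(value):
        return "md5"
    # Bare domains have no scheme, so prefix "//" to make urlparse
    # treat the value as a network location.
    parsed = urlparse(value if "://" in value else "//" + value)
    if parsed.hostname and "." in parsed.hostname:
        return "url"
    return "unknown"
```

With the target classified, a script can decide which lookups make sense for it, for example reputation checks for an IP or a malware search for a hash.
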

For OSINT reconnaissance of the Deep Web, there are a multitude of search engines that can be used such as PubPeer, Google Scholar, Cornell University’s arXiv.org, and Harvard’s Think Tank Search. With Deep Web searches, you’re mainly looking for articles, whitepapers, and studies published in academic journals and professional publications.

Cornell University’s arXiv.org for Deep Web OSINT

For OSINT reconnaissance of the Dark Web, search engines and resources such as DeepDotWeb, Reddit Deep Web, Reddit DarkNetMarkets, Hidden Wiki, Core.onion (from Tor browser), OnionScan and Tor Scan may provide some useful information. With the Dark Web, however, some sites and services are by invitation only, which can make finding them very difficult because they won’t appear in a normal Dark Web search. Network traffic pattern analysis from within the Dark Web is the only real way to find these types of sites. Remember also that Tor is not the only entrance to the Dark Web; there are also Freenet and I2P.

Using the OnionScan OSINT tool to scan the Dark Web; image courtesy of Mascherari.press

OSINT collection is only limited by your imagination. You can take any number of these tools or search examples and tweak them to your needs and get even better results. We have only covered a select few OSINT tools designed to give you a taste of what is out there. There are so many more tools to discover and experiment with, many of which come included in Kali or BlackArch Linux distros. At the end of your OSINT collection, you should have plenty of information to enumerate in the next phase. Happy hunting!