Content Discovery: Automated and Manual

Photo by Agence Olloweb on Unsplash

I recently completed TryHackMe’s Content Discovery room, as well as the third challenge of their Advent of Cyber 3 (2021) event, and here’s what I learned!

What is Content Discovery?

Content, in the context of web applications, refers to files, folders, pictures, pathways, and website features that aren’t immediately presented to visitors and weren’t necessarily intended for public access. For example: backup files, administration panels, and login portals intended for employee use only.

There are three main ways to discover content on a web application:

  • Manual
  • Automated
  • OSINT (Open-Source Intelligence)

In this article I’ll be discussing two of them: how to discover content manually and how to discover it with automation tools.

Manual Content Discovery

A great place to start discovering content is the robots.txt file. robots.txt is a plain-text file that tells search engine crawlers which parts of a site they are and are not allowed to crawl, which in turn affects what shows up in search engine results.

The robots.txt file of a website can be viewed like so:

https://example.com/robots.txt

Reviewing the contents of a robots.txt file will give you a great list of locations the website owners didn’t necessarily want discovered!
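If you’d rather not read the file by eye, here is a minimal Python sketch that pulls the Disallow entries out of a robots.txt body. It is only an illustration, not a tool from the TryHackMe room: the names disallowed_paths and fetch_robots and the sample file contents are made up for this example.

```python
from urllib.request import urlopen


def disallowed_paths(robots_txt: str) -> list[str]:
    """Return the paths named in Disallow rules, in file order, without duplicates."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow":
            path = value.strip()
            # An empty Disallow means "nothing is blocked", so skip it.
            if path and path not in paths:
                paths.append(path)
    return paths


def fetch_robots(base_url: str) -> str:
    """Download robots.txt from a site you are authorised to test."""
    with urlopen(base_url.rstrip("/") + "/robots.txt", timeout=10) as response:
        return response.read().decode("utf-8", "replace")


# Hypothetical robots.txt contents, used only as sample input.
SAMPLE = """\
User-agent: *
Disallow: /staff-portal
Disallow: /backups/  # old archives
Allow: /
"""

print(disallowed_paths(SAMPLE))
```

To run it against a real target, pass the result of fetch_robots("https://example.com") to disallowed_paths instead of SAMPLE.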

Sometimes HTTP headers can reveal useful information like the web server software and even the programming language in use.

Running the following command against your target website will output the headers! (If you don’t have curl installed on your Linux distro, you can learn more about it and how to install it here.)

user@machine$ curl https://example.com -v
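The same check can be scripted. Below is a minimal Python sketch that uses the standard library’s urllib instead of curl and keeps only the headers that tend to give away the server software or language. The helper names revealing_headers and fetch_headers, the shortlist of header names, and the sample values are my own assumptions for illustration.

```python
from urllib.request import urlopen

# Headers that commonly hint at the server software or framework.
# This shortlist is an assumption for illustration, not an exhaustive list.
INTERESTING = ("server", "x-powered-by")


def revealing_headers(headers: dict[str, str]) -> dict[str, str]:
    """Keep only the headers whose names are in INTERESTING (case-insensitive)."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() in INTERESTING
    }


def fetch_headers(url: str) -> dict[str, str]:
    """Request the URL and return its response headers as a plain dict."""
    with urlopen(url, timeout=10) as response:
        return dict(response.headers.items())


# Made-up sample values so the example runs without touching the network.
sample = {
    "Content-Type": "text/html",
    "Server": "nginx/1.18.0",
    "X-Powered-By": "PHP/7.4.3",
}
print(revealing_headers(sample))
```

For a site you are authorised to test, swap the sample dict for fetch_headers("https://example.com").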

Automated Content Discovery

Automated discovery means using tools to discover content instead of doing it by hand. Automation tools let you make thousands, or even millions, of requests to a web server far faster than you could manually. DirBuster is an excellent tool for automating file and directory discovery!

Here’s how it works:

DirBuster takes a wordlist containing the names of the files and directories you’d like to search for, then makes a request to the web server for each entry to check whether it exists on the website.

You can create your own wordlist that contains all the things you’d like to search for. For example, let’s say we create a .txt document titled wordlist with the following contents:

admin/
docs/
config/

If you provide the scanner with the URL of the website and the full path of your wordlist, it will scan your target website for the folders listed within your wordlist! The example below uses dirb, a command-line tool that takes the same wordlist approach:

user@machine$ dirb https://example.com /home/documents/wordlist.txt
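The idea behind wordlist scanning is simple enough to sketch in a few lines of Python. This is a rough illustration of the general technique, not how dirb or DirBuster is actually implemented; the function names candidate_urls, http_status, and discover, and the choice to treat every status other than 404 as a hit, are assumptions of mine.

```python
from typing import Callable, Iterable, Optional
from urllib.error import HTTPError
from urllib.request import urlopen


def candidate_urls(base_url: str, words: Iterable[str]) -> list[str]:
    """Join each non-empty wordlist entry onto the base URL."""
    base = base_url.rstrip("/")
    return [f"{base}/{w.strip().lstrip('/')}" for w in words if w.strip()]


def http_status(url: str) -> Optional[int]:
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.status
    except HTTPError as err:
        return err.code  # 403, 404, 500, ... still carry a status code
    except OSError:
        return None  # DNS failure, refused connection, timeout, ...


def discover(
    base_url: str,
    words: Iterable[str],
    probe: Callable[[str], Optional[int]] = http_status,
) -> list[tuple[str, int]]:
    """Return (url, status) pairs for every candidate that is not a 404."""
    hits = []
    for url in candidate_urls(base_url, words):
        status = probe(url)
        if status is not None and status != 404:
            hits.append((url, status))
    return hits


# Offline demo: a stand-in probe with made-up statuses instead of real requests.
fake_statuses = {
    "https://example.com/admin/": 200,
    "https://example.com/docs/": 404,
    "https://example.com/config/": 403,
}
print(discover("https://example.com", ["admin/", "docs/", "config/"], probe=fake_statuses.get))
```

Leaving out the probe argument makes discover send real requests through http_status, so only do that against sites you have permission to test. Counting a 403 as a hit is deliberate here: a forbidden folder still tells you the folder exists.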

You don’t always have to create your own wordlists! Here is a collection of open-source wordlists, where you can also find lists of default credentials and frequently used usernames and passwords!

This is what I’ve learned so far about Content Discovery! Thank you for reading, and I hope you were able to learn something too! You can get some hands-on practice by following TryHackMe’s Content Discovery room!


Cybersecurity Engineering student and Information Security Content Creator from Alabama.

Kaorrosi
