“black and gray laptop computer turned on” by Markus Spiske on Unsplash

I Feel Like A Total Hacker

Parker Addison
Aug 31, 2018 · 4 min read

Last week I got a taste of how cool it is to be a Hollywood-hacker, and how challenging yet rewarding it is to do what I assume is closer to actual web-hacking.

Hacking on the silver screen

First, I learned how to use Selenium with Python and instantly felt like I was re-watching every single movie scene there is that involves “hacking”. In short, I controlled a browser using nothing but the command line! This meant that as I typed code and pressed return on one half of my screen, a window popped up and interacted with the web on the other half of my screen, and I didn’t so much as move my cursor.

Selenium is a powerful web browser automation and testing tool; you write a set of instructions, and those actions are then carried out in a web browser. However, as is always the case with programming, those instructions need to be very specific. The easiest thing to do after launching the webdriver (literally the driver that runs the browser) is to navigate to a webpage using one command and a full URL (including the “https://www”). But after that, you need to know the source code of the page in order to do much else. Unfortunately, unless you can magically sense the id of an element or rely upon common naming conventions, this means that each time you want to click on a button or enter text into a field, you first need to look at the HTML behind the webpage and examine that element. Oh well.
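As a rough illustration of that workflow, here is a minimal sketch (not the exact script used for this project). The URL and the element id are hypothetical placeholders, and the helper name click_by_id is made up for this example. It assumes Selenium 4 with Firefox and its driver available on the machine.

```python
def click_by_id(driver, url, element_id):
    """Navigate to `url`, look up one element by its id attribute, and click it."""
    # The URL must be complete, scheme included, e.g. "https://www.example.com".
    driver.get(url)
    # "id" is the string value behind selenium's By.ID constant; with selenium
    # imported you would normally write By.ID here instead.
    element = driver.find_element("id", element_id)
    element.click()
    return element


def run_in_firefox():
    """Open a real Firefox window and run the helper against placeholder values."""
    from selenium import webdriver  # third-party package: pip install selenium

    driver = webdriver.Firefox()
    try:
        # Both arguments below are hypothetical; read the real id out of the
        # page's HTML with the browser's inspector first.
        click_by_id(driver, "https://www.example.com", "submit-button")
    finally:
        driver.quit()  # always close the browser, even if the lookup fails
```

Keeping the browser launch separate from the page logic means the same helper works with any driver (Chrome, Firefox, or a stand-in object while testing).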

I learned Selenium to help with scraping data off a page that was the result of some form inputs and dynamically loaded content. I couldn’t access the content through plain HTTP requests, so I had to submit the form and let the resulting page load in order to get to the data I needed. Sure, I had to carefully examine the webpage every step of the way in order to tell the script what to do, down to the characters in the element ids, but I still got a little kick every time my browser magically opened and filled out an entire form.

Needless to say, I thought it would be great to feel like a hacker every time I searched something on Google, so I got really fast at typing the full Google URL, I memorized that the id of the search bar is lst-ib (or #lst-ib as a CSS selector), and I figured out that Python’s lovely escape characters such as \n and \b do in fact work to submit my search or to backspace if I spelled something wrong. After that, all I needed to do was change my terminal theme to that green-and-black Matrix aesthetic, and I was set to be cast in the next Russian espionage movie.
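A sketch of that search trick might look like the following. The helper names are invented for this example, and the id lst-ib is simply the one mentioned above; Google may well have changed it since, so it is passed in as a parameter. The "\n" and "\b" behavior is as described in this post; Selenium also provides Keys.ENTER and Keys.BACK_SPACE in selenium.webdriver.common.keys, which are the more explicit way to send those keys.

```python
def type_search(driver, query, search_box_id="lst-ib"):
    """Open Google and type `query` into the search box without submitting."""
    driver.get("https://www.google.com")
    # "id" is the string value of selenium's By.ID locator strategy.
    box = driver.find_element("id", search_box_id)
    box.send_keys(query)
    return box


def backspace(box, count=1):
    """Delete the last `count` characters by sending "\\b" once per character."""
    box.send_keys("\b" * count)


def submit(box):
    """Submit the search by sending a newline, as if pressing return."""
    box.send_keys("\n")
```

With a live driver, fixing a typo and searching would be: box = type_search(driver, "seleniun"), then backspace(box), then box.send_keys("m"), then submit(box).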

Revealing what can’t be seen

The second feat is much more in line with what I assume a real web-hacker might do: I opened up Firefox’s web debugger and sifted through the source code of a page in order to access a database that was otherwise hidden. Funnily enough, I actually did this in order to avoid the need for Selenium! Loading a page, then scraping the content, then reformatting the scraped data into a usable structure seemed a bit inefficient. However, I figured that if the page was generating content, then it must have a supply of data, and perhaps I could access that data directly. After an initial look at the network requests and then a quick look at the source code of the page, I found just what I was looking for. Following a very helpful comment, <!-- start of dynamic data load case -->, I found an extended URL that displayed all the data I needed in JSON format!
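The same search can be done on a saved copy of the page source. The sketch below is only an illustration: the comment marker is the one quoted above, but the sample HTML, the example.org URL, and the function names are all made up, and a real page may embed its data URL in a different shape.

```python
import json
import re
import urllib.request

MARKER = "<!-- start of dynamic data load case -->"


def find_data_url(page_source, marker=MARKER):
    """Return the first quoted http(s) URL that appears after `marker`, or None."""
    start = page_source.find(marker)
    if start == -1:
        return None
    # Look only at the text after the comment for a quoted absolute URL.
    match = re.search(r"""["'](https?://[^"']+)["']""", page_source[start:])
    return match.group(1) if match else None


def fetch_json(url):
    """Download `url` and parse the response body as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical page source, standing in for what the debugger would show.
SAMPLE_SOURCE = """
<script src="https://example.org/static/app.js"></script>
<!-- start of dynamic data load case -->
<script>
  var dataUrl = "https://example.org/api/records?format=json";
</script>
"""

data_url = find_data_url(SAMPLE_SOURCE)
print(data_url)
```

Once a real URL turns up this way, fetch_json(data_url) returns the data already structured as Python lists and dictionaries, so there is no rendered page to scrape or reformat.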

I ended up getting lucky here. This was the first site I had to scrape, and coincidentally it is the one site whose code I absolutely love: there are comments, there’s whitespace, and there are a bunch of meaningful variable names. It’s beautiful. This was not the case for the other sites I visited; for some of them, I’m still trying to find a similar backdoor somewhere in the code. There was even one site whose core HTML file consisted of a filled-out <head>, an open <body>, and a rather large handful of <script> calls which in turn assembled the entire page.

Even though this version of “hacking” is nowhere near as glamorous as green ASCII running down the screen with windows popping up and disappearing, finding that backdoor to the database still felt like a total “I’m in” moment. Ultimately, this rather simple achievement was much more rewarding than the first — and likewise much more practical, as web security is an increasingly important topic. As I pursue research projects and other forays into data science, I’ve realized that the simple act of collecting and organizing the data is extremely important (and often challenging) in its own right. I hypothesize that my browser’s “developer tools” will become some of my favorite tools for gathering information off the web.

— Parker 2581e73

* Just as a little disclaimer: My data scraping is entirely for academic research purposes and not for anything commercial. I found it a bit ironic that I ended on a note about web security when my ultimate goal is to circumvent it!


Originally published at pgaddison.com.
