Python For Pentesters: Beyond the basics. Part 1

Amine Amhoume
6 min read · Jun 14, 2020
https://www.artstation.com/artwork/qqALn

You feel lost, right?

Because you’ve never thought about what comes next after you learn the basics.

I’ve been there. I felt the same way the day I woke up and realized that I had finished the Python basics course.

Back in 2016, after my first week in college, I realized that I had a lot of free time on my hands, so I decided to learn Python.

I searched for a course on YouTube and spent about ten days learning the basics. I was happy. I had done something that would get me closer to my goal (becoming a pentester).

Then I was like… Now what?

Oh! I should look for small projects to work on.

They were so boring.

Creating a calculator, a password generator, and others. I am sorry. That was so boring. That’s not what I wanted.

I wanted to create pentesting scripts. I didn’t know where to start, so I wasted a lot of time looking for things I knew nothing about.

That’s why I decided to write this article. I know there are many others like I used to be. I want to help. I don’t want people to feel stuck like I did.

I will introduce you to some Python libraries that professional pentesters actually use to create their own tools and solve CTF challenges, plus some resources to learn them.

By the end of this article, you will have the knowledge to create your own tools, such as a directory brute forcer, and to deal with basic web challenges.

Before we proceed, I want to state that this is the first part of a series of articles. There’s so much to learn. If you don’t want to miss the upcoming articles, I invite you to subscribe to my newsletter here:

https://mailchi.mp/f37eb4abdeac/pentesting-thoughts

Don’t worry, I won’t flood your inbox with boring stuff. I hate boring newsletters too.

Let’s go with the first component.

Regular expression.

The re module is a lifesaver when it comes to searching for specific information. To briefly explain what regular expressions are, I will borrow this one-line explanation from W3Schools:

A RegEx, or Regular Expression, is a sequence of characters (/*{}()[]-^) that forms a search pattern.

See these tables:

[Two RegEx reference tables. Source: W3Schools]

RegEx can be used in almost any programming or scripting language.

The problem is that they look very complicated at first, especially to beginners.

Python has a built-in package called re, which is very useful when dealing with regular expressions. Here are some basic functions to work with:

[Table of basic re functions. Source: W3Schools]
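To get a feel for these functions, here is a small sketch that tries four of them (findall, search, split, and sub) on a made-up log line. The sample text and variable names are mine, chosen only for illustration.

```python
import re

# Made-up sample text, used only for illustration
text = "admin@example.com logged in from 10.0.0.5 at 09:15"

# findall() returns a list with every non-overlapping match
numbers = re.findall(r"\d+", text)

# search() returns a match object for the first match, or None
first_time = re.search(r"\d{2}:\d{2}", text)

# split() cuts the string wherever the pattern matches
words = re.split(r"\s+", text)

# sub() replaces every match with the given replacement
masked = re.sub(r"\d", "X", text)

print(numbers)
print(first_time.group() if first_time else "no match")
print(words)
print(masked)
```

Note that search() can return None, so check the result before calling group() on it.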

The code below takes any file and searches it for valuable content, including IP addresses, JWTs, subdomains, and email addresses.

First, we import the ‘re’ module.

In lines 6 and 7, we open a local file and assign its content to the variable ‘Text’.

From lines 13 to 16, we have a variable for each of the assets we are looking for. We use the findall(pattern, data) function to find all the matches for the given pattern.

In the first one (IP), we are looking for IP addresses so we use the pattern:

(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})

It’s easy to understand.

Look at the table above. An IP address is made of four parts separated by dots (\.), so we are just repeating the sequence [\d]{1,3} four times.

The \d matches digits only, while {1,3} specifies the number of occurrences: between one and three.

[] defines a set of characters and () groups part of the pattern. The ?: at the start of each group makes it non-capturing, so findall() returns the whole match instead of the individual groups.
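Here is a quick sketch of that pattern at work. Keep in mind that it only checks the shape of an address, so something like 999.300.1.1 also matches. The extra check with the standard ipaddress module is my own addition, not part of the original script, and the sample text is made up.

```python
import ipaddress
import re

# The pattern from the article, written as a raw string
IP_PATTERN = r"(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})"

# Made-up sample text, used only for illustration
sample = "Gateway 192.168.1.1 answered, 10.0.0.254 timed out, 999.300.1.1 is not a real address"

# The groups are non-capturing, so findall() returns each whole match
candidates = re.findall(IP_PATTERN, sample)


def is_valid_ipv4(candidate):
    """Return True if the standard library accepts the string as an IPv4 address."""
    try:
        ipaddress.IPv4Address(candidate)
        return True
    except ValueError:
        return False


# Optional extra step (my assumption): drop matches that are not real addresses
valid_ips = [ip for ip in candidates if is_valid_ipv4(ip)]

print(candidates)
print(valid_ips)
```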

For emails, we used the pattern:

([\w\.-]+@+[\w\.-]+\.+[\w]{1,5})

An email address has two parts joined by an @. For the part before it, we use \w, which matches any word character (letters, digits, and the underscore), plus \.- to also allow a dot or a hyphen. We do the same for the domain.
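A short sketch of the email pattern in action, using a made-up sentence. Because the whole pattern sits inside one pair of parentheses, findall() returns that single group, which is the full address.

```python
import re

# The email pattern from the article, written as a raw string
EMAIL_PATTERN = r"([\w\.-]+@+[\w\.-]+\.+[\w]{1,5})"

# Made-up sample text, used only for illustration
sample = "Contact john.doe@example.com or sec-team@mail.example.org for access"

emails = re.findall(EMAIL_PATTERN, sample)

for email in emails:
    print(email)
```

One limit to be aware of: {1,5} caps the last part at five characters, so an address ending in a longer top-level domain would be cut short.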

To keep the article from getting longer than it should be, I will let you work out the other two yourself. Use the tables above.

And if you run into any difficulties, here’s a cheat sheet to help you.

The findall() function returns its results as a list.

So from lines 25 to 31, we use a for loop to print each result as a simple string.
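Putting the steps together, a minimal sketch of such a script could look like the one below. It is a reconstruction under my own assumptions, not the original listing, so the line numbers mentioned above will not match it. The names PATTERNS and extract_assets are hypothetical, the sample file is made up, and the JWT and subdomain patterns are left out on purpose since they are your exercise.

```python
import re
import tempfile
from pathlib import Path

# One pattern per asset type (both taken from the article)
PATTERNS = {
    "IP": r"(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})",
    "Email": r"([\w\.-]+@+[\w\.-]+\.+[\w]{1,5})",
}


def extract_assets(path):
    """Read a local file and return every match for each pattern."""
    text = Path(path).read_text(encoding="utf-8")
    return {name: re.findall(pattern, text) for name, pattern in PATTERNS.items()}


# Build a throwaway sample file so the sketch runs on its own
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.txt"
    sample_path.write_text(
        "server 172.16.0.10\nowner: root@corp.example.net\n", encoding="utf-8"
    )
    results = extract_assets(sample_path)

# findall() gives us lists, so loop over them to print plain strings
for name, matches in results.items():
    for match in matches:
        print(f"{name}: {match}")
```

Keeping the patterns in a dictionary means adding a new asset type is a one-line change.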

I gave the script a simple file containing some IPs, emails, JWTs, and subdomains, and the results looked like this:

Tip: Before you use the ‘re’ module, you should first know exactly what you are looking for, then use the tables above to construct your own pattern.

The BeautifulSoup of Requests.

“Requests” is one of the most loved Python libraries ever.

As the official documentation says, it’s an elegant and simple HTTP library for Python, built for human beings.

BeautifulSoup is a Python library for pulling data out of HTML and XML files. It lets you cook a delicious HTML and XML soup.
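To see what "pulling data out of HTML" looks like, here is a small sketch that parses a made-up login page held in a string, so no network is needed. The HTML, the csrf_token field name, and the variable names are all invented for the example. It assumes the beautifulsoup4 package is installed.

```python
from bs4 import BeautifulSoup

# Made-up HTML page, used only for illustration
html = """
<html>
  <head><title>Login</title></head>
  <body>
    <form action="/login" method="post">
      <input type="hidden" name="csrf_token" value="abc123">
      <input type="text" name="username">
    </form>
    <a href="/admin">Admin</a>
    <a href="/backup.zip">Backup</a>
  </body>
</html>
"""

# "html.parser" is the parser that ships with Python
soup = BeautifulSoup(html, "html.parser")

# The text inside the <title> tag
title = soup.title.string

# Every link target on the page
links = [a["href"] for a in soup.find_all("a", href=True)]

# The value of a hidden form field, looked up by its name attribute
token = soup.find("input", {"name": "csrf_token"})["value"]

print(title)
print(links)
print(token)
```

Hidden form values like this one are often exactly what a web challenge asks you to read and send back.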

When combined, the two libraries can create a strong weapon to deal with almost everything on the web.

The Requests library has friendly documentation, which you can access here.

You can also access the BeautifulSoup documentation here.

Now check the code below:

The functions above solve some CTF challenges. I’ve included comments on almost every line so you can understand what’s going on.

You should put each function in its own file.

Note that you need to be logged in on the root-me platform to solve the challenges; otherwise the script won’t work.

If you go through the documentation, you will see that we can do almost everything with requests.
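As one example of what requests alone can do, here is a rough sketch of the directory brute forcer mentioned in the introduction. Everything about it is my own assumption rather than code from this article: the function name brute_force_dirs, its parameters, and the choice to report any status code other than 404 as a hit. A real tool would need smarter filtering.

```python
import requests


def brute_force_dirs(base_url, words, session=None, timeout=5):
    """Request base_url/<word> for each word and collect the non-404 answers."""
    if session is None:
        # A Session reuses the connection and keeps cookies between requests
        session = requests.Session()

    found = []
    for word in words:
        word = word.strip()
        if not word:
            continue  # skip blank wordlist lines
        url = f"{base_url.rstrip('/')}/{word}"
        try:
            response = session.get(url, timeout=timeout, allow_redirects=False)
        except requests.RequestException:
            continue  # unreachable or timed out, move on to the next word
        # Assumption: anything that is not a 404 is worth a closer look
        if response.status_code != 404:
            found.append((url, response.status_code))
    return found


# Hypothetical usage (target URL and wordlist are placeholders):
# hits = brute_force_dirs("http://target.example", ["admin", "backup", "login"])
# for url, status in hits:
#     print(status, url)
```

Passing the session in as a parameter also lets you reuse one that already carries your login cookies.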

But what I am trying to show here is how we managed to combine the three modules to create our own tools.

While requests takes care of making HTTP connections to servers and pulling data from them, BeautifulSoup lets us dig through the HTML, and RegEx lets us search for very specific things.
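That division of labor can be sketched in a few lines. This is an illustration under my own assumptions, not one of the challenge solutions: the function names fetch_html and find_emails_in_page and the sample page are hypothetical, and the email pattern is the one from the RegEx section.

```python
import re

import requests
from bs4 import BeautifulSoup

# The email pattern from the RegEx section
EMAIL_PATTERN = r"([\w\.-]+@+[\w\.-]+\.+[\w]{1,5})"


def fetch_html(url, session=None):
    """requests: connect to the server and pull the page down."""
    if session is None:
        session = requests.Session()
    response = session.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def find_emails_in_page(html):
    """BeautifulSoup digs through the HTML, then re finds the specific strings."""
    soup = BeautifulSoup(html, "html.parser")
    # Look at the page text and at every link target (e.g. mailto: links)
    chunks = [soup.get_text(" ")]
    chunks += [a["href"] for a in soup.find_all("a", href=True)]
    emails = set()
    for chunk in chunks:
        emails.update(re.findall(EMAIL_PATTERN, chunk))
    return sorted(emails)


# Made-up page so the parsing step can be tried without a network
sample_html = (
    "<p>Contact admin@target.test today</p>"
    '<a href="mailto:ops@target.test">Mail ops</a>'
)
print(find_emails_in_page(sample_html))

# Hypothetical usage against a live page:
# print(find_emails_in_page(fetch_html("http://target.example/contact")))
```

Keeping the fetch and the parsing in separate functions makes each piece easy to test and reuse in other scripts.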

The mindset.

When you learn the basics of a programming language, you feel stuck and don’t know what to do with them, because you need to move on and build something. Python is like a toolbox.

So in order to craft things, you first need to know what problem you want to solve. And in order to have a problem to solve, you need to get involved in something, for example, a CTF.

That’s how it works.

See you in the next article.

Amine Amhoume

Penetration tester | security researcher | sometimes I write stuff.