The (Reg) Ex Girlfriend

Hey There!

CSAW CTF 16 Prelims were held around October 2016, and I took part as a member of our team, d4rkc0de. We qualified for the regional finals.

In this post I will walk through how I solved one of the questions from that CTF and explain the pieces involved in simple, layman’s terms. Even if you do not know anything about CTFs or Regular Expressions (often abbreviated to Regex), don’t worry, we’ll cover all of that.

What are CTFs?

Capture The Flag competitions, commonly abbreviated to CTFs, are a special type of information security competition that (usually) poses challenging security problems in sub-domains such as:

  • Web — Various attacks possible on websites and online web services
  • Crypto — Different ways of breaking a cipher or a current state-of-the-art crypto system
  • Binary — Reverse Engineering and finding entry points for exploiting binary files like .exe and Linux based Binary files
  • Forensics — Hunting and scavenging through various forms of digital data to get where the critical information is

… and other such related fields.

CTFs provide a nice platform for understanding the common mistakes in the infosec scene and practising ways to exploit them, so that one doesn’t make the same mistakes oneself.

If you want to read more about what they are, how they are conducted, etc. check out this bad boy on CTFTime.


One of the problems was titled Regexpire and had the following description:

I thought I found a perfect match but she ended up being my regEx girlfriend.
Note: You can't use newlines inside your match.

We were also given this code snippet:

nc 8001

Cool. Okay. So this challenge wants us to connect to the service running at port number 8001 of host

Easy-peasy. We just run the snippet that was given earlier in a terminal to connect.

After we connect, this is the output I get on my terminal (it may be different on your system; you’ll understand why pretty soon):

Can you match these regexes?

Alright, I think we have figured out the problem for now. The system just wants us to match a set of regexes it is going to send our way. There’s also some funny looking string below the statement.

Well, that’s a regular expression string. To get to know more about what they are and how they work, check out my earlier post titled Regex for Dummies.

Setting Up

Okay. To solve this question programmatically, we need to set up a Python script that does exactly that. I’ll be using Python 2.7 throughout unless explicitly stated otherwise.

We’ll be using the telnetlib library to interact with the service.

Cool, this works! For now. Use python 8001 to connect to the service and work on the problem programmatically and write back to the connection.

The read_until method keeps reading from the output stream of the connection until it encounters the string that we passed in as a parameter. This helps us break down the input that is sent to us over the connection.

The write method just writes back to the connection with whatever you provide.

We shall be solving our challenge in the solve method.
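
If you are following along on a newer Python, note that telnetlib was removed from the standard library in Python 3.13. Here is a rough stand-in built on socket that mirrors only the two methods described above. The Connection class and its open helper are hypothetical names of my own, and the host and port are left as parameters rather than hard-coded.

```python
import socket


class Connection(object):
    """Small stand-in offering the read_until/write pair described above."""

    def __init__(self, sock):
        self.sock = sock
        self.buffer = b""

    @classmethod
    def open(cls, host, port):
        # host and port are supplied by the caller; nothing is hard-coded.
        return cls(socket.create_connection((host, port)))

    def read_until(self, marker):
        # Keep receiving until the marker shows up, then hand back everything
        # up to and including it. Anything after it stays buffered.
        while marker not in self.buffer:
            chunk = self.sock.recv(4096)
            if not chunk:
                # Peer closed: return what is left, or signal end of stream.
                if not self.buffer:
                    raise EOFError("connection closed")
                data, self.buffer = self.buffer, b""
                return data
            self.buffer += chunk
        end = self.buffer.index(marker) + len(marker)
        data, self.buffer = self.buffer[:end], self.buffer[end:]
        return data

    def write(self, data):
        # Send the bytes back to the service as they are.
        self.sock.sendall(data)
```

With this in place, reading one line from the service is conn.read_until(b"\n") and answering is conn.write(answer + b"\n").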

The Real Challenge

We haven’t really discussed what exactly the challenge is. The aim of this task is to find a string which matches the regex given to us. This might look tough at first sight but is actually very easy.

Alright, let’s break the problem into various sub-parts and dive in.

The Character Classes Problem

Okay, so when we saw the regex, we found that there were a lot of \w, \d and \D character sequences. Do they represent something? Yes, they do!

After exploring a little on RegExr, we find that they represent specific character classes:

  • \w would match any word character (alphanumeric and underscore) like ‘a’
  • \W would match any non-word character (non-alphanumeric and not underscore) like ‘&’
  • \d would match any digit character (0 to 9) like ‘7’
  • \D would match any non-digit character (not 0 to 9) like ‘m’

Hence our function to reduce or solve the character class problem should look like this:
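
A minimal sketch of such a function, reusing the example characters from the list above. The name solve_char_classes is hypothetical, and the sketch assumes the regex never contains an escaped backslash directly before one of these letters.

```python
def solve_char_classes(regex):
    # Swap each shorthand class for one literal character that belongs to it.
    replacements = [
        ("\\w", "a"),  # word character
        ("\\W", "&"),  # non-word character
        ("\\d", "7"),  # digit
        ("\\D", "m"),  # non-digit
    ]
    for token, literal in replacements:
        # str.replace is case-sensitive, so \w and \W never collide.
        regex = regex.replace(token, literal)
    return regex
```

After this step the regex contains only literals, sets, groups and quantifiers, which the remaining steps deal with.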

The * Quantifier Problem

All we need to do to tackle any element that has the * quantifier after it is to remove that quantifier. This works because * means zero or more times in regex land, so leaving the element in place exactly once is still a valid match. This is achieved as follows:
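
One way to sketch this, assuming the regexes never use * as a literal inside a [] set. The function name drop_star is my own, and the lookbehind only protects a * that is escaped with a backslash.

```python
import re


def drop_star(regex):
    # Delete every '*' that is not preceded by a backslash, so the element
    # it applied to is left in place exactly once.
    return re.sub(r"(?<!\\)\*", "", regex)
```

For example, drop_star("(cat|dog)*a*b") leaves "(cat|dog)ab", which the later steps can reduce further.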

The + Quantifier Problem

Getting past the + quantifier is very easy. The quantifier means that the element must occur at least once, and since the element already occurs once in the string, we can just remove every + and move on:
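
A sketch along the same lines, with a quick sanity check using Python's own re module. The name drop_plus is hypothetical, and a literal + inside a [] set is assumed not to occur.

```python
import re


def drop_plus(regex):
    # Delete every '+' that is not escaped; one occurrence of the element
    # already satisfies "at least once".
    return re.sub(r"(?<!\\)\+", "", regex)


# Sanity check: the reduced string is itself matched by the original regex.
pattern = "a+b+c"
candidate = drop_plus(pattern)
assert re.match(pattern + r"\Z", candidate)
```

Checking the candidate against the original pattern like this is a cheap way to catch mistakes before sending anything to the service.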

The [] Group Problem

The [] group simply means pick one out of all the characters in the set. Thus, wherever we find [] we just replace it with the first element of the group. The approach is quite similar to what we took for the * quantifier:
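
A rough sketch of that replacement. The name pick_first_from_sets is my own, and it assumes plain sets such as [a-z] or [QWE]: a negated set like [^abc] or a set that starts with an escape would need extra handling.

```python
import re


def pick_first_from_sets(regex):
    # Match '[', capture the first character inside, skip the rest up to ']',
    # and keep only the captured character.
    return re.sub(r"\[([^\]])[^\]]*\]", lambda m: m.group(1), regex)
```

For a range such as [a-z] the first element is simply the lower bound, so the set collapses to a. Any quantifier that followed the set now follows that single character.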

The {} Quantifier Problem

When we run the problem a couple of times, we find that the {} quantifier usually expects us to repeat the preceding element exactly x times, where x is the argument given to the quantifier, e.g. x is 3 in a{3}.

To solve this we take the same path as we took with the * quantifier, i.e. search and reduce:
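
Here is one way this could look. The name expand_counts is hypothetical, and the sketch assumes only the exact form {x} appears (no {m,n} ranges) and that () groups are not nested.

```python
import re


def expand_counts(regex):
    # The element before {x} is either a whole (...) group or one character.
    # Replace "element{x}" with the element written out x times.
    pattern = r"(\([^()]*\)|[^\\])\{(\d+)\}"
    return re.sub(pattern, lambda m: m.group(1) * int(m.group(2)), regex)
```

Note that a repeated group stays a group: (cat|dog){2} becomes (cat|dog)(cat|dog), and picking an option out of each copy is left to the () step.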

The () Group Problem

Now we are left with the () group, which requires us to pick one of the complex elements it has as its children. The approach is similar to that of the [] group:
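
A sketch of that reduction, assuming the options inside a group are separated by | and that groups are not nested. The name pick_first_alternative is my own.

```python
import re


def pick_first_alternative(regex):
    # Capture everything between '(' and the first '|' (or ')'), skip the
    # remaining options, and keep only the captured part.
    return re.sub(r"\(([^()|]*)[^()]*\)", lambda m: m.group(1), regex)
```

A group with a single option, such as (fish), simply loses its parentheses.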

Rinse and Repeat People

Plug all of the problem solvers into our solve() function and see the magic unfold. Great! We were able to solve one regex, but now we are challenged with another. Just write an infinite loop that reads each regex and solves it, and let’s see how far this goes.
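
The loop could be sketched roughly like this. Everything about the protocol here is an assumption on my part: that each regex arrives on its own line, that answers are sent back newline-terminated, and that the final message contains the text "flag{". The conn argument is any object with the read_until and write methods described earlier, and solve is passed in as a parameter.

```python
def play(conn, solve, flag_marker="flag{"):
    """Answer regexes until a flag-looking line arrives or the stream ends."""
    while True:
        try:
            line = conn.read_until(b"\n")
        except EOFError:
            # The service hung up without sending anything flag-like.
            return None
        text = line.decode("ascii", "replace").strip()
        print(text)
        if flag_marker in text:  # assumed flag format, adjust as needed
            return text
        if text:
            # Treat the line as a regex and send back one matching string.
            conn.write(solve(text).encode("ascii") + b"\n")
```

Any greeting line that is not a regex should be consumed with read_until before calling play, otherwise it would be "solved" and sent back as well.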

… and we are done!

If any errors are thrown, just take a deep breath, re-evaluate your code and work it out on pen and paper. Then figure out what changes you need to make and you are good to go! The key to solving problems is getting your hands dirty with your code. :D

The entire script which solves the problem is:
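
As a condensed, offline sketch of the solving half of such a script (the networking is left out), the steps from the sections above can be written as an ordered table of substitutions. This is a reconstruction under the same assumptions as before: no nested groups, no negated sets, no {m,n} ranges, and no literal * or + inside sets. The example regex at the bottom is made up for illustration.

```python
import re

# Each step is (pattern, replacement), applied in the order discussed above.
STEPS = [
    (r"\\w", "a"),                       # character classes
    (r"\\W", "&"),
    (r"\\d", "7"),
    (r"\\D", "m"),
    (r"(?<!\\)[*+]", ""),                # * and + quantifiers
    (r"\[([^\]])[^\]]*\]", lambda m: m.group(1)),            # [] sets
    (r"(\([^()]*\)|[^\\])\{(\d+)\}",
     lambda m: m.group(1) * int(m.group(2))),                # {x} counts
    (r"\(([^()|]*)[^()]*\)", lambda m: m.group(1)),          # () groups
]


def solve(regex):
    answer = regex
    for pattern, replacement in STEPS:
        answer = re.sub(pattern, replacement, answer)
    return answer


# Made-up example: check the answer against the regex it came from.
example = r"(cat|dog)*[a-z]{3}\d+\W(fish|bird){2}[xyz]+"
assert re.match("(?:%s)\\Z" % example, solve(example))
```

The order matters: sets are collapsed before counts are expanded, and counts are expanded before groups are reduced, so that a repeated group is copied first and resolved afterwards.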

The flag that you should receive after the problem is solved is:


Leaving Note

Well, this was not an easy problem to solve. It took me a lot of hits and misses to get the code working but it made me brush up on my RegEx which was good. I hope this write-up helps you out in exploring and understanding the core of Regular Expressions. :)

But the question is do I wanna know more about RegEx? ;)

Well, we’ll never find out!
