A quick primer on encoding & decoding for security folks


This blog post is a quick, practical primer on encoding and decoding schemes for security testers. It covers several ways to encode and decode data quickly, which comes in handy during security assessments and CTFs.

What is encoding & decoding?

Encoding is a technique for transforming data into another format so that a different system can consume it. For example, to send binary data (such as an attachment) over email, you encode it with a scheme like Base64 so that it survives transport over a protocol designed for text.

Encoding does not provide confidentiality; it simply ensures that data can be transmitted and consumed properly.

Unlike hashing, encoding is reversible: encoded data can be decoded back to its original form.

Unlike encryption, encoding involves no key, and its goal is not to keep data confidential.
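To make the difference concrete, here is a minimal Python 3 sketch using only the standard library (the sample message is arbitrary). Base64 output decodes straight back to the original bytes without any key, while a SHA-256 digest has no inverse function at all.

```python
import base64
import hashlib

message = b"not a secret"

# Encoding: anyone can reverse it, no key required
encoded = base64.b64encode(message)
decoded = base64.b64decode(encoded)
print(encoded.decode())     # the Base64 text
print(decoded == message)   # True: decoding recovers the original bytes

# Hashing: a fixed-size digest; hashlib offers no way to reverse it
digest = hashlib.sha256(message).hexdigest()
print(digest)               # 64 hex characters, whatever the input length
```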

Why do we care about encoding/decoding?

As security professionals, we deal with systems that exchange and interpret data in different ways during assessments, and encoding is a common technique these systems use to exchange data reliably. For example, when a browser sends parameters in a URL, it has to translate unprintable characters and characters with special meaning (such as &, = or ?) into a representation that browsers and servers interpret unambiguously.

In Capture The Flag (CTF) challenges, it is very common to use different encoding schemes to make data look obscure. Knowing the common schemes, and how to encode and decode data with them quickly, comes in very handy during CTFs.

Common encoding schemes & quick recipes

There are numerous encoding schemes, but only a few that application security testers run into often. I’m going to cover a few of them quickly. The focus is not on the details of each scheme but on how to quickly identify it and encode and decode data with it.

Base64 encoding

Base64 represents binary data as ASCII text by translating it into a base-64 representation: every 3 bytes (24 bits) of input become 4 characters of 6 bits each. This means any bytes, whether ASCII, UTF-8 or UTF-16 text, control characters, or raw binary, can be mapped to the 64 characters A-Z, a-z, 0-9, + and / (with = used for padding), so the result can be read on screen or even printed out.
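The 3-bytes-to-4-characters idea can be sketched by hand. This is an illustrative Python 3 implementation of the standard alphabet, assuming nothing beyond the standard library; `b64encode_manual` is a hypothetical name, and in practice you would simply call `base64.b64encode`, which the last line uses as a cross-check.

```python
import base64
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/'
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64encode_manual(data: bytes) -> str:
    """Hypothetical helper: encode bytes as standard Base64, for illustration."""
    out = []
    # Process 3-byte (24-bit) groups; each yields four 6-bit alphabet indexes
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pad a short final group with zero bytes so it is 24 bits long
        n = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # 1 byte -> 2 real chars + "==", 2 bytes -> 3 chars + "=", 3 bytes -> 4 chars
        used = len(chunk) + 1
        out.append("".join(chars[:used]) + "=" * (4 - used))
    return "".join(out)

sample = "convert this to base64".encode("utf-8")
print(b64encode_manual(sample))  # Y29udmVydCB0aGlzIHRvIGJhc2U2NA==
print(b64encode_manual(sample) == base64.b64encode(sample).decode())  # True
```

Because padding only ever fills out the last 4-character group, Base64 output length is always a multiple of 4, which is what the first identification check below relies on.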

Identifying Base64 data

  1. Check that the length of the string is a multiple of 4.
  2. Check that every character is in the set A-Z, a-z, 0-9, + and /, except for the padding at the end, which is zero, one or two = characters.
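The two checks above can be turned into a quick Python 3 helper. This is a heuristic sketch, not a definitive detector; `looks_like_base64` is a hypothetical name, and the final `b64decode(..., validate=True)` call is an extra sanity check that the string actually decodes.

```python
import base64
import binascii
import re

# Standard alphabet, then zero, one or two '=' padding characters at the end
B64_PATTERN = re.compile(r"[A-Za-z0-9+/]*={0,2}")

def looks_like_base64(s: str) -> bool:
    """Hypothetical helper: apply the two checks, then confirm it decodes."""
    # Check 1: non-empty and the length is a multiple of 4
    if not s or len(s) % 4 != 0:
        return False
    # Check 2: only alphabet characters, with padding only at the very end
    if not B64_PATTERN.fullmatch(s):
        return False
    try:
        # validate=True raises on bad characters instead of silently skipping them
        base64.b64decode(s, validate=True)
    except binascii.Error:
        return False
    return True

for candidate in ["Y29udmVydCB0aGlzIHRvIGJhc2U2NA==", "dGVzdA==", "not base64!", "abc"]:
    print(candidate, looks_like_base64(candidate))
```

Treat a match as a hint rather than proof: any ordinary four-letter word such as test also passes. Variants exist too (URL-safe Base64 uses - and _ instead of + and /, and padding is sometimes omitted), so a negative result does not rule Base64 out either.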

Encoding and decoding Base64

Using the base64 Linux command-line utility

$ echo -n "convert this to base64" | base64
$ echo -n Y29udmVydCB0aGlzIHRvIGJhc2U2NA== | base64 -d
The -n option passed to echo suppresses the new line character at the end of the string. You can use printf instead to achieve the same result.

Using Python

$ echo -n "convert this to base64" | python3 -m base64
$ echo -n Y29udmVydCB0aGlzIHRvIGJhc2U2NA== | python3 -m base64 -d
If you encode and decode often, consider defining Bash aliases to make the process simpler:
$ alias base64-encode='python3 -c "import base64,sys; print(base64.b64encode(sys.argv[1].encode()).decode())"'
$ base64-encode "some test string"
c29tZSB0ZXN0IHN0cmluZw==
$ alias base64-decode='python3 -c "import base64,sys; print(base64.b64decode(sys.argv[1]).decode())"'
$ base64-decode "dGVzdA=="
test

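Using Chrome/Firefox DevTools

In the DevTools console, the built-in btoa() function encodes a string to Base64 and atob() decodes it. Note that btoa() throws an error for characters outside the Latin-1 range.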

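Using Burp Suite Decoder

Paste the data into Burp's Decoder tab and choose Base64 from the "Encode as..." or "Decode as..." menu.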

URL encoding

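URL encoding (also called percent-encoding) translates unprintable or special characters into a format that web servers and browsers universally accept: each such character is replaced by a % followed by the two hexadecimal digits of its byte value, so < becomes %3C.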
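Here is a minimal Python 3 sketch of that rule, assuming the RFC 3986 "unreserved" set (letters, digits, - . _ ~) is left alone and everything else is escaped byte by byte from its UTF-8 form. `percent_encode` is a hypothetical name; the last line checks it against `urllib.parse.quote` with an empty safe set.

```python
import string
from urllib.parse import quote

# RFC 3986 "unreserved" characters never need escaping
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

def percent_encode(text: str) -> str:
    """Hypothetical helper: escape every UTF-8 byte except unreserved chars."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            out.append(f"%{byte:02X}")  # e.g. '<' is byte 0x3C -> %3C
    return "".join(out)

payload = "<script>alert(0)</script> é"
print(percent_encode(payload))  # %3Cscript%3Ealert%280%29%3C%2Fscript%3E%20%C3%A9
print(percent_encode(payload) == quote(payload, safe=""))  # True
```

Non-ASCII characters become several escapes because the encoding works on bytes, not characters: é is two bytes in UTF-8, hence %C3%A9.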

Using Python

$ alias urlencode='python3 -c "import sys, urllib.parse as ul; print(ul.quote_plus(sys.argv[1]))"'
$ alias urldecode='python3 -c "import sys, urllib.parse as ul; print(ul.unquote_plus(sys.argv[1]))"'
$ urlencode "<script>alert(0)</script>"
%3Cscript%3Ealert%280%29%3C%2Fscript%3E
$ urldecode %3Cscript%3Ealert%280%29%3C%2Fscript%3E
<script>alert(0)</script>

Python's urllib.parse module has a handy urlencode() function that URL encodes the parameters you want to pass as part of a URL. Pass your parameters to urlencode() as a Python dictionary (a sequence of key/value pairs also works).

from urllib.parse import urlencode
params = {'lname': 'tester', 'fname': '<script>alert(0)</script>'}
print(urlencode(params))
# lname=tester&fname=%3Cscript%3Ealert%280%29%3C%2Fscript%3E
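Going the other way, the standard library's `urllib.parse.parse_qs` decodes a query string back into its parameters. A short Python 3 sketch reusing the same parameters:

```python
from urllib.parse import parse_qs, urlencode

params = {"lname": "tester", "fname": "<script>alert(0)</script>"}
query = urlencode(params)

# parse_qs splits on '&' and '=' and percent-decodes each name and value.
# Values come back as lists because a name may repeat in a query string.
decoded = parse_qs(query)
print(decoded)  # {'lname': ['tester'], 'fname': ['<script>alert(0)</script>']}
```

By default parse_qs drops parameters with empty values; pass keep_blank_values=True if those matter to your test.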

Using Node.js

$ alias urlencode='node -e "console.log(encodeURIComponent(process.argv[1]))"'
$ urlencode "<script>alert(0)</script>" 
%3Cscript%3Ealert(0)%3C%2Fscript%3E
$ alias urldecode='node -e "console.log(decodeURIComponent(process.argv[1]))"' 
$ urldecode "%3Cscript%3Ealert(0)%3C%2Fscript%3E"                             
<script>alert(0)</script>
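Notice that Node leaves ( and ) unescaped while Python's quote_plus escapes them: JavaScript's encodeURIComponent() does not escape A-Z a-z 0-9 - _ . ! ~ * ' ( ). The Python 3 sketch below approximates encodeURIComponent() by adding those extra characters to quote()'s safe set, and also shows how quote and quote_plus treat spaces differently.

```python
from urllib.parse import quote, quote_plus

payload = "<script>alert(0)</script>"

# quote_plus escapes everything except letters, digits and "_.-~"
print(quote_plus(payload))           # %3Cscript%3Ealert%280%29%3C%2Fscript%3E

# Marking "!*'()" as safe approximates JavaScript's encodeURIComponent()
print(quote(payload, safe="!*'()"))  # %3Cscript%3Ealert(0)%3C%2Fscript%3E

# Spaces differ: quote_plus produces '+', quote produces %20
print(quote_plus("a b"), quote("a b"))  # a+b a%20b
```

These differences matter when a payload has to survive a specific server's decoder: a filter that looks for %28 will miss a literal (.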

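Using Chrome/Firefox DevTools

In the DevTools console, encodeURIComponent() and decodeURIComponent() work the same way as in the Node.js aliases above.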

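Using Burp Suite Decoder

Paste the data into Burp's Decoder tab and choose URL from the "Encode as..." or "Decode as..." menu.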

HTML encoding/escaping

In HTML, a few characters, such as <, >, &, " and ', have special meaning. When your data contains them, the browser needs to know whether to display them literally or to interpret them as markup. HTML escaping replaces these characters with character references (for example, < becomes &lt;) so that the browser displays them instead of interpreting them. This is an important mitigation against cross-site scripting (XSS) attacks.
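A minimal Python 3 sketch of escaping by hand, assuming the same five replacements that the standard library's `html.escape` performs with its default quote=True; `html_escape` is a hypothetical name, and decoding is done with `html.unescape`.

```python
import html

# '&' must be replaced first; otherwise the '&' inside entities produced
# for the other characters would itself be escaped a second time
REPLACEMENTS = [
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
    ("'", "&#x27;"),
]

def html_escape(text: str) -> str:
    """Hypothetical helper: replace HTML special characters with references."""
    for char, entity in REPLACEMENTS:
        text = text.replace(char, entity)
    return text

payload = "<script>alert('x')</script>"
escaped = html_escape(payload)
print(escaped)                            # &lt;script&gt;alert(&#x27;x&#x27;)&lt;/script&gt;
print(escaped == html.escape(payload))    # True: matches the standard library
print(html.unescape(escaped) == payload)  # True: html.unescape reverses it
```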

Using the recode Linux command-line utility

$ echo -n "<script>alert(0)</script>" | recode ascii..html
&lt;script&gt;alert(0)&lt;/script&gt;

Using Python

$ alias htmlescape='python3 -c "import html,sys; print(html.escape(sys.argv[1]))"'
$ htmlescape "<>"
&lt;&gt;

ROT13

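ROT13 is a simple letter substitution cipher that replaces each letter with the letter 13 positions after it in the alphabet, wrapping around at the end. Because the alphabet has 26 letters, applying ROT13 twice returns the original text, so the same operation both encodes and decodes. You are unlikely to encounter it in a security assessment, but I have seen it used in various CTFs.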
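The rotation can be written as a translation table in a few lines of Python 3. This is an illustrative sketch (`rot13` is a hypothetical helper name); the standard library's "rot_13" text codec gives the same result.

```python
import codecs
import string

LOWER = string.ascii_lowercase
UPPER = string.ascii_uppercase

# Map each letter to the one 13 places later, wrapping around the alphabet;
# digits, spaces and punctuation are not in the table and pass through unchanged
ROT13_TABLE = str.maketrans(
    LOWER + UPPER,
    LOWER[13:] + LOWER[:13] + UPPER[13:] + UPPER[:13],
)

def rot13(text: str) -> str:
    """Hypothetical helper: apply ROT13 to ASCII letters."""
    return text.translate(ROT13_TABLE)

print(rot13("test string"))                    # grfg fgevat
print(rot13(rot13("test string")))             # test string (13 + 13 = 26)
print(codecs.encode("test string", "rot_13"))  # grfg fgevat
```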

Using Python

$ python3 -m encodings.rot_13 <<< "test string"
grfg fgevat  

Special mention — pwntools

pwntools is a CTF framework that helps automate and supercharge your CTF tasks. Among its many features, it can encode and decode data in multiple formats.

>>> from pwnlib.util.fiddling import *
>>> urlencode("<script>alert(0)</script>")  # URL encode
'%3c%73%63%72%69%70%74%3e%61%6c%65%72%74%28%30%29%3c%2f%73%63%72%69%70%74%3e'
>>> urldecode('%3c%73%63%72%69%70%74%3e%61%6c%65%72%74%28%30%29%3c%2f%73%63%72%69%70%74%3e')  # URL decode
'<script>alert(0)</script>'
>>> b64e(b'test is a string')  # Base64 encode
'dGVzdCBpcyBhIHN0cmluZw=='
>>> b64d('dGVzdCBpcyBhIHN0cmluZw==')  # Base64 decode (returns bytes on Python 3)
b'test is a string'
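If pwntools is not installed, its "escape every character" style of URL encoding (lowercase hex, as in the output above) is easy to approximate with the Python 3 standard library. `urlencode_all` is a hypothetical name, not part of pwntools, and this sketch escapes the UTF-8 bytes, so its output for non-ASCII input may differ from pwntools.

```python
from urllib.parse import unquote

def urlencode_all(text: str) -> str:
    """Hypothetical helper: percent-encode every byte using lowercase hex."""
    return "".join(f"%{byte:02x}" for byte in text.encode("utf-8"))

encoded = urlencode_all("<script>alert(0)</script>")
print(encoded)
# unquote accepts lowercase and uppercase hex escapes alike
print(unquote(encoded))  # <script>alert(0)</script>
```

Escaping every character, including letters, is a common trick for slipping payloads past naive filters that only look for literal keywords such as script.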

You can use Python's dir() to list the helpers that pwnlib.util.fiddling provides, including its encoding and decoding functions:

>>> from pwnlib.util.fiddling import *
>>> dir()
['StringIO', '__builtins__', '__doc__', '__name__', '__package__', 'absolute_import', 'b64d', 'b64e', 'base64', 'bits', 'bits_str', 'bitswap', 'bitswap_int', 'bnot', 'context', 'cyclic', 'cyclic_find', 'cyclic_pregen', 'de_bruijn', 'de_bruijn_gen', 'default_style', 'enhex', 'getLogger', 'hexdump', 'hexdump_iter', 'hexii', 'isprint', 'lists', 'log', 'naf', 'negate', 'os', 'packing', 'pwnlib', 'random', 'randoms', 're', 'rol', 'ror', 'sequential_lines', 'string', 'text', 'unbits', 'unhex', 'update_cyclic_pregenerated', 'urldecode', 'urlencode', 'xor', 'xor_key', 'xor_pair']
