A quick primer on encoding & decoding for security folks
This post aims to be a quick, practical primer on encoding and decoding schemes for security testers. It covers several ways to encode and decode data quickly, which comes in handy during security assessments and CTFs.
What is encoding & decoding?
Encoding transforms data into another format so that different systems can consume it. For example, to send binary data over email, you first encode it with a scheme (such as Base64) that makes it safe to carry in an email message.
Using encoding does not ensure confidentiality, but rather simply ensures that data can be transmitted and consumed properly.
Unlike hashing, encoded data can be reversed to its original form by decoding it.
Unlike encryption, no key is involved in encoding, and the goal of encoding is not to keep data confidential.
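To make the distinction concrete, here is a minimal Python sketch using only the standard library. The sample input is arbitrary and chosen purely for illustration: anyone can reverse the Base64 output without a key, while the SHA-256 digest has no decode step.

```python
import base64
import hashlib

data = b"attack at dawn"  # arbitrary sample input, not from the post

# Encoding: reversible by anyone, no key involved.
encoded = base64.b64encode(data)
decoded = base64.b64decode(encoded)

# Hashing: one-way; there is no function that recovers the input.
digest = hashlib.sha256(data).hexdigest()

print(decoded == data)  # True
print(len(digest))      # 64 hex characters, regardless of input length
```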
Why do we care about encoding/decoding?
As security professionals, we deal with various systems that exchange and interpret data in different ways during security assessments. Encoding is a common technique these systems use to exchange data properly. For example, when a browser sends parameters through a URL, it has to make sure that unprintable characters and characters with special meaning are translated to a representation that is unambiguous and universally accepted by web browsers and servers.
In Capture The Flag (CTF) challenges, it is very common to use different encoding schemes to make data look obscure. Knowing the common encoding techniques, and knowing how to quickly encode and decode data, comes in very handy during CTFs.
Common encoding schemes & quick recipes
There are numerous encoding schemes out there, but only a few that we as application security testers run into often. I'm going to quickly cover a few of them. The idea is not to focus on the details of each scheme but rather on how to quickly identify it and how to encode and decode data.
Base64 encoding
Base64 schemes represent binary data as an ASCII string by translating it into a base-64 representation. This means that any bytes (ASCII, UTF-8 or UTF-16 text, control characters, raw binary) can be mapped onto the 64 printable characters A-Z, a-z, 0-9, + and /, so you can read them all on the screen or even print them out.
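As a rough illustration of that mapping, the sketch below encodes one 3-byte group by hand: the 24 bits are split into four 6-bit indexes into the Base64 alphabet. The helper name encode_three_bytes is my own, and padding is left out to keep it short.

```python
import base64
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, +, /
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_three_bytes(chunk: bytes) -> str:
    """Encode exactly 3 bytes into 4 Base64 characters (no padding case)."""
    if len(chunk) != 3:
        raise ValueError("this sketch only handles full 3-byte groups")
    n = int.from_bytes(chunk, "big")  # 24-bit integer
    indexes = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indexes)

print(encode_three_bytes(b"Man"))         # TWFu
print(base64.b64encode(b"Man").decode())  # TWFu
```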
Identifying Base64 data
- Check that the length of the string is a multiple of 4
- Check that every character is in the set A-Z, a-z, 0-9, +, / except for the padding at the end, which is 0, 1 or 2 = characters
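The two checks above can be sketched in a few lines of Python. The function name looks_like_base64 is hypothetical, and this is only a heuristic: short plain words such as "test" also pass both checks.

```python
import re

# Alphabet characters followed by at most two '=' padding characters.
_B64_RE = re.compile(r"[A-Za-z0-9+/]*={0,2}")

def looks_like_base64(s: str) -> bool:
    """Heuristic check mirroring the two rules in the list above."""
    if len(s) == 0 or len(s) % 4 != 0:
        return False
    return _B64_RE.fullmatch(s) is not None

print(looks_like_base64("Y29udmVydCB0aGlzIHRvIGJhc2U2NA=="))  # True
print(looks_like_base64("<script>"))                          # False
print(looks_like_base64("test"))                              # True (false positive)
```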
Encoding a string into Base64
Using the base64 Linux command line utility
$ echo -n "convert this to base64" | base64
$ echo -n Y29udmVydCB0aGlzIHRvIGJhc2U2NA== | base64 -d
The -n option passed to echo suppresses the newline character at the end of the string. You can use printf instead to achieve the same result.
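To see why the trailing newline matters, here is a small Python comparison (the sample string "test" is just an example). A single extra byte changes the tail of the encoded output, which is a common source of confusion when comparing values.

```python
import base64

without_newline = base64.b64encode(b"test").decode()
with_newline = base64.b64encode(b"test\n").decode()

print(without_newline)  # dGVzdA==
print(with_newline)     # dGVzdAo=
```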
Using Python
$ echo -n "convert this to base64" | python -m base64
$ echo -n Y29udmVydCB0aGlzIHRvIGJhc2U2NA== | python -m base64 -d
If you plan on encoding/decoding often, consider using bash aliases to make the process simpler.
$ alias base64-encode='python3 -c "import base64,sys; print(base64.b64encode(sys.argv[1].encode()).decode())"'
$ base64-encode "some test string"
c29tZSB0ZXN0IHN0cmluZw==
$ alias base64-decode='python3 -c "import base64,sys; print(base64.b64decode(sys.argv[1]).decode())"'
$ base64-decode "dGVzdA=="
test
Using Chrome/Firefox DevTools
Using Burp Suite Decoder
URL encoding
URL encoding (also called percent-encoding) is a mechanism for translating unprintable or special characters into a format that is universally accepted by web servers and browsers.
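As a rough sketch of what happens under the hood: every byte outside a small "unreserved" set is replaced by % followed by its two-digit hex value. The helper percent_encode below is my own illustration; in practice you would use the library functions shown in the next sections.

```python
import string
from urllib.parse import quote

# Characters that never need encoding (letters, digits, - . _ ~)
UNRESERVED = string.ascii_letters + string.digits + "-._~"

def percent_encode(text: str) -> str:
    """Percent-encode every UTF-8 byte that is not an unreserved character."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        out.append(ch if ch in UNRESERVED else f"%{byte:02X}")
    return "".join(out)

print(percent_encode("<a b>"))   # %3Ca%20b%3E
print(quote("<a b>", safe=""))   # %3Ca%20b%3E
```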
Using Python
$ alias urlencode='python3 -c "import sys, urllib.parse as ul; print(ul.quote_plus(sys.argv[1]))"'
$ alias urldecode='python3 -c "import sys, urllib.parse as ul; print(ul.unquote_plus(sys.argv[1]))"'
$ urlencode "<script>alert(0)</script>"
%3Cscript%3Ealert%280%29%3C%2Fscript%3E
$ urldecode %3Cscript%3Ealert%280%29%3C%2Fscript%3E
<script>alert(0)</script>
Python's urllib.parse module has a handy function, urlencode(), that URL-encodes parameters you want to pass as part of a URL. You pass your parameters to urlencode() as a Python dictionary.
from urllib.parse import urlencode
params = {'lname': 'tester', 'fname': '<script>alert(0)</script>'}
print(urlencode(params))
lname=tester&fname=%3Cscript%3Ealert%280%29%3C%2Fscript%3E
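Going the other way is just as quick. A small sketch, assuming Python 3: parse_qs() from the same module decodes a query string back into a dictionary, where each value is a list because a parameter can repeat.

```python
from urllib.parse import urlencode, parse_qs

params = {"lname": "tester", "fname": "<script>alert(0)</script>"}

query = urlencode(params)   # encode the dictionary into a query string
decoded = parse_qs(query)   # decode it back; values come back as lists

print(decoded["fname"][0])  # <script>alert(0)</script>
```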
Using Node.js
$ alias urlencode='node -e "console.log(encodeURIComponent(process.argv[1]))"'
$ urlencode "<script>alert(0)</script>"
%3Cscript%3Ealert(0)%3C%2Fscript%3E
$ alias urldecode='node -e "console.log(decodeURIComponent(process.argv[1]))"'
$ urldecode "%3Cscript%3Ealert(0)%3C%2Fscript%3E"
<script>alert(0)</script>
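Note that the Node.js output leaves the parentheses untouched, while Python's quote_plus() encodes them as %28 and %29. Both decode to the same string; the encoders simply disagree on which characters are safe to leave alone. The Python sketch below reproduces the encodeURIComponent() behaviour for this payload by marking ! * ' ( ) as safe.

```python
from urllib.parse import quote, quote_plus, unquote

payload = "<script>alert(0)</script>"

python_style = quote_plus(payload)
node_style = quote(payload, safe="!*'()")  # mimic encodeURIComponent's safe set

print(python_style)  # %3Cscript%3Ealert%280%29%3C%2Fscript%3E
print(node_style)    # %3Cscript%3Ealert(0)%3C%2Fscript%3E
print(unquote(python_style) == unquote(node_style))  # True
```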
Using Chrome/Firefox DevTools
Using Burp Suite Decoder
HTML encoding/escaping
In HTML, there are a few special characters such as <, >, &, and ". When using these characters, it's important to let the browser know whether you want to render them as is or process them as markup. HTML escaping encodes these special characters so that the browser renders them instead of processing them. This is an important mitigation against script injection attacks.
Using the recode Linux command line utility
$ echo -n "<script>alert(0)</script>" | recode ascii..html
&lt;script&gt;alert(0)&lt;/script&gt;
Using Python
$ alias htmlescape='python3 -c "import html,sys; print(html.escape(sys.argv[1]))"'
$ htmlescape "<>"
&lt;&gt;
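Decoding is the mirror image. A short sketch, assuming Python 3's html module: html.escape() also escapes quotes by default, and html.unescape() turns the entities back into the original characters.

```python
import html

payload = '<script>alert("0")</script>'

escaped = html.escape(payload)     # quotes are escaped too by default
restored = html.unescape(escaped)  # back to the original string

print(escaped)              # &lt;script&gt;alert(&quot;0&quot;)&lt;/script&gt;
print(restored == payload)  # True
```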
ROT13
ROT13 is a simple letter substitution cipher that replaces each letter with the 13th letter after it in the alphabet, wrapping around at the end. You are unlikely to encounter it in a security assessment, but I have seen it used in various CTFs.
Using Python
$ python -m encodings.rot_13 <<< "test string"
grfg fgevat
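Inside a script, the same codec is available through the codecs module. Because the alphabet has 26 letters, applying ROT13 twice returns the original text, so one function both encodes and decodes. A minimal sketch:

```python
import codecs

secret = codecs.encode("test string", "rot_13")
plain = codecs.encode(secret, "rot_13")  # applying ROT13 again reverses it

print(secret)  # grfg fgevat
print(plain)   # test string
```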
Special mention — pwntools
pwntools is a CTF framework that helps automate and supercharge your CTF tasks. Among the many things pwntools can do is encoding and decoding data in multiple formats.
from pwnlib.util.fiddling import *
urlencode("<script>alert(0)</script>") # URL encode
# '%3c%73%63%72%69%70%74%3e%61%6c%65%72%74%28%30%29%3c%2f%73%63%72%69%70%74%3e'
urldecode('%3c%73%63%72%69%70%74%3e%61%6c%65%72%74%28%30%29%3c%2f%73%63%72%69%70%74%3e') # URL decode
# '<script>alert(0)</script>'
b64e('test is a string') # base64 encode
# 'dGVzdCBpcyBhIHN0cmluZw=='
b64d('dGVzdCBpcyBhIHN0cmluZw==') # base64 decode
# 'test is a string'
You can use Python's dir() to see which encoding helpers pwntools supports:
>>> from pwnlib.util.fiddling import *
>>> dir()
['StringIO', '__builtins__', '__doc__', '__name__', '__package__', 'absolute_import', 'b64d', 'b64e', 'base64', 'bits', 'bits_str', 'bitswap', 'bitswap_int', 'bnot', 'context', 'cyclic', 'cyclic_find', 'cyclic_pregen', 'de_bruijn', 'de_bruijn_gen', 'default_style', 'enhex', 'getLogger', 'hexdump', 'hexdump_iter', 'hexii', 'isprint', 'lists', 'log', 'naf', 'negate', 'os', 'packing', 'pwnlib', 'random', 'randoms', 're', 'rol', 'ror', 'sequential_lines', 'string', 'text', 'unbits', 'unhex', 'update_cyclic_pregenerated', 'urldecode', 'urlencode', 'xor', 'xor_key', 'xor_pair']