Oops, you found a subdomain takeover. Should you be worried?
Dangling DNS records that allow for a subdomain takeover are nothing new. Auditing tools for pen testers like can-i-take-over-xyz have been around for years, while resources like MDN do a solid job trying to educate developers and operations teams about the issue.
Unfortunately, the hundreds of bug bounty reports indicate that this vulnerability class is far from mitigated.
What happened?
We recently discovered a subdomain takeover of a dangling DNS record on johndoe.example.com:
- A GitHub Pages site (e.g. johndoe.github.io) was set up to use a custom domain name.
- A DNS record johndoe.example.com was created as an alias or CNAME to johndoe.github.io.
- The custom domain name configuration was later removed from the GitHub Pages site when its use was retired. The DNS record for johndoe.example.com was however not removed. This left a dangling DNS CNAME to johndoe.github.io.
- A malicious actor set up their own GitHub Pages site and configured it to use the custom domain name johndoe.example.com. All links and traffic to johndoe.example.com were now being served content controlled by this malicious actor.
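The state left behind by the third step (a CNAME that still points at GitHub Pages while nothing claims the custom domain) is what an audit can look for. Below is a minimal sketch of such a check, assuming the DNS lookup and HTTP fetch have already been done elsewhere. The function name and parameters are hypothetical, and the fingerprint string is the one commonly listed for unclaimed GitHub Pages hosts; it may change over time.

```python
GITHUB_PAGES_SUFFIX = ".github.io"
# Assumption: text GitHub Pages serves when no site claims the requested host.
UNCLAIMED_FINGERPRINT = "There isn't a GitHub Pages site here."


def looks_dangling(cname_target, http_status, body):
    """Return True when a CNAME points at GitHub Pages but nothing is served there."""
    target = cname_target.rstrip(".").lower()
    if not target.endswith(GITHUB_PAGES_SUFFIX):
        return False  # not a GitHub Pages alias, out of scope for this check
    return http_status == 404 and UNCLAIMED_FINGERPRINT in body
```

Note that this only flags a record that is still dangling. Once someone else has claimed the custom domain, the host serves a normal page and this check returns False, so detecting a completed takeover needs a different signal (for example, unexpected page content).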
As soon as we became aware of the issue, we advised removing the DNS CNAME record. That stopped the bleeding but didn’t tell us what the malicious actor was up to.
Why did they bother doing a subdomain takeover? Did they potentially read cookies scoped to the parent domain, perform cross-site scripting and circumvent content security policies (CSP), use the subdomain to bypass redirect whitelists, or…?
Did they target example.com specifically?
Given this was a subdomain takeover of a GitHub Pages website, the full contents of the site were easy to find. Using the GitHub search function, we found the repository “tiodiatavo” owned by the GitHub user @stepard.
The repository had ~10,900 HTML files with Arabic content on a range of topics from recipes, computer equipment, visas, affiliate marketing, and Lorem Ipsum.
Each HTML file had a script tag pointing to an unusual JavaScript file hosted on a .ru domain. Most JS files are minified with UglifyJS, Babel Minify, etc., so it is not uncommon for JS to be hard to read, but this JS file was purposefully obfuscated. Across the ~10,900 HTML files in the repository we identified 10 unique domains used to serve this file:
- https://bo.datingsvr.ru/trd
- https://ct.dominikpers.ru/trd
- https://de.datingvr.ru/trd
- https://dr.dietaforlove.ru/trd
- https://ew.dionwars.ru/trd
- https://js.ekb-tv.ru/trd
- https://nnm.eburi.ru/trd
- https://rt.coronafly.ru/trd
- https://td.dzeroki.ru/trd
- https://to.darkandlight.ru/trd
The GitHub user @stepard furthermore had a total of 157 repositories. We pulled them all and found that 87 repositories had similar HTML content. For those 87 repositories, we identified the domain they belong to and found that 77 were still active subdomain takeovers.
We reported the account to GitHub and they swiftly terminated it.
What does the obfuscated JavaScript do though?
The number of subdomain takeovers discovered makes it unlikely that the actor was specifically targeting example.com but we would like to know for sure. We need to figure out what this obfuscated JavaScript does.
There are plenty of tools available that “beautify” scripts. The format feature inside Chrome DevTools is a great start: it makes scripts easier to read by introducing whitespace and newlines, but it doesn’t try to simplify the code. Other tools like javascript-unobfuscator and JS NICE try to go a step further, performing de-obfuscation, statistical renaming and type inference.
Unfortunately none of these tools did a particularly great job on this JS file. Even worse, when we tried to substitute the obfuscated JS script with a locally hosted “beautified” version to make it easier to work with, we ended up with a lovely Aw, Snap! notification from Chrome. Beautifying the script stops it from working and repeatedly crashes our browser.
In order to understand why, we had to resort to manual de-obfuscation. We opened Chrome DevTools and paused JavaScript execution after the page had loaded but before the crash happened.
Before we continue, you can grab the code on my GitHub if you want to follow along. The original obfuscated JS code, intermediate beautified JS output of each de-obfuscation step, and my de-obfuscation Python script are all available.
Stepping through the code we noticed that it appears to get stuck inside an infinite loop as part of the “ySQKnz” function:
- The script will loop while _0x5ac416 is lower than _0x30c102.
- On every iteration, the value _0x5ac416 is incremented by one.
- On every iteration, a random value is also appended to an array and the value of _0x30c102 is reset to the length of that array.
- Effectively both _0x5ac416 and _0x30c102 are incremented by one on every iteration. As a result, the condition to terminate the loop is never achieved.
- This infinite loop keeps growing the array whose length is stored in _0x30c102, and thus increases Chrome’s memory consumption until the array size becomes excessive and Chrome decides to kill the browser tab... Aw, Snap!
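The loop described above can be modelled in a few lines. This is only a sketch of the behaviour as we observed it, not the script's actual code: the starting array contents are an assumption, the variable names are ours, and the iteration cap exists only so the model terminates.

```python
import random


def run_loop(max_iterations):
    """Model the loop in ySQKnz; it stops only because we cap the iterations."""
    values = [random.random()]  # assumed starting array with one element
    counter = 0                 # plays the role of _0x5ac416
    limit = len(values)         # plays the role of _0x30c102
    iterations = 0
    while counter < limit and iterations < max_iterations:
        counter += 1
        values.append(random.random())  # the array grows on every pass
        limit = len(values)             # so the limit moves away as fast as we approach it
        iterations += 1
    return counter, limit
```

However large the cap, the gap between the counter and the limit stays the same, which is why the real loop only ends when the tab runs out of memory.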
If we go back up the call stack, we can find branching logic inside a “jipTwl” function that conditionally calls the “ySQKnz” function (i.e. our infinite loop). We naively tried to short-cut this and avoid calling “ySQKnz” altogether but the code just ends up looping indefinitely elsewhere… at least it no longer crashes the browser?!
We continued along this path for a while and tried to figure out a way to disable the anti-tampering mechanism but alas didn’t get anywhere. There is simply too much array-shifting wizardry going on to make sense of given the overly complex and obfuscated JS code.
De-obfuscating JavaScript the hard way
Let’s rethink our approach... Can we simplify the code sufficiently to make it easier to understand while simultaneously not triggering the anti-tampering mechanism?
De-obfuscation #1: Simplify
To kick things off, we can identify some constant expressions being evaluated in boolean contexts. Replacing those with their resulting boolean values should make the script a bit easier to understand:
- !![] becomes true
- ![] becomes false
We can also see some string concatenation going on that is unnecessary and can easily be removed:
- "a" + "b" becomes "ab"
- 'a' + 'b' becomes 'ab'
The JavaScript language allows for property accessors using either bracket notation or dot notation. The code sample uses bracket notation but we find dot notation easier on the eye:
- obj['property'] becomes obj.property
Finally, all integer literals have been written in hexadecimal (base 16) notation, which for most of us doesn’t come as naturally. Let’s convert them to decimal (base 10) instead.
- 0x10 becomes 16
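The four rewrites in this step can be sketched with Python's re module. This is an illustrative approximation rather than our actual script: it only handles single-quoted strings without escapes, it does not know whether a match sits inside a string literal, and the function name is made up for this example.

```python
import re


def simplify(js):
    """Apply the step #1 rewrites to a string of JavaScript source."""
    # !![] and ![] are constant booleans; replace the longer pattern first.
    js = js.replace("!![]", "true").replace("![]", "false")
    # Merge adjacent single-quoted literals; repeat so 'a' + 'b' + 'c' collapses fully.
    previous = None
    while previous != js:
        previous = js
        js = re.sub(r"'([^'\\]*)'\s*\+\s*'([^'\\]*)'", r"'\1\2'", js)
    # obj['property'] -> obj.property, only after an identifier, ")" or "]"
    # so that array literals such as ['a'] are left alone.
    js = re.sub(r"(?<=[\w$)\]])\['([A-Za-z_$][\w$]*)'\]", r".\1", js)
    # Standalone hexadecimal literals -> decimal. The \b keeps identifiers
    # such as _0x2782e9 intact because "_" is a word character.
    js = re.sub(r"\b0x[0-9a-fA-F]+\b", lambda m: str(int(m.group(0), 16)), js)
    return js


print(simplify("if (!![]) { console['log']('a' + 'b' + 'c', 0x10, ![]); }"))
```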
De-obfuscation #2: Expressions To Numbers
The code has a large number of arithmetic expressions with no variables. Given there are no variables in these expressions, their value can never change, so we can replace them with their constant value:
- -5354+-3878*-2+2402*-1 should become 0
All function and variable identifiers have been replaced with hexadecimal identifiers like _0x2782e9. This makes it harder to perform the above transformation using Regular Expressions without accidentally absorbing part of these identifiers.
- _0x2782e9- -0x362 should not become _0x2782e+875
Rather than complicating our Regular Expressions, we run a preprocessing step in order to append an underscore suffix to all function and variable identifiers. We also remove the 0x prefix from them to avoid misinterpreting them as hexadecimal numbers:
- _0x2782e9 becomes _2782e9_
As a result of this preprocessing we can now more easily evaluate arithmetic expressions and convert hexadecimal integer literals to decimals:
- _2782e9_- -0x362 now correctly becomes _2782e9_+866
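A hedged sketch of this step follows. It renames identifiers as described, then folds runs of integer literals joined by +, - and * using Python's ast module as a small, safe evaluator. The helper names and the exact regular expressions are assumptions made for this example; in particular, the pattern only folds an expression that starts right after an opening bracket, comma, assignment or similar, which avoids changing the meaning of a larger expression at the cost of missing some cases.

```python
import ast
import operator
import re

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}
_NUM = r"(?:0x[0-9a-fA-F]+|\d+)"
# Group 1: what precedes the expression. Group 2: the constant expression itself.
_CONST_EXPR = re.compile(
    rf"(^|[(\[,=:;{{?]\s*)(-?{_NUM}(?:\s*[+\-*]\s*-?{_NUM})+)(?!\s*[*/%\w$.])"
)


def _evaluate(node):
    """Evaluate an AST made only of integers, unary minus, and + - *."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_evaluate(node.operand)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    raise ValueError("unsupported expression")


def rename_identifiers(js):
    # _0x2782e9 -> _2782e9_ so that no identifier looks like a hex literal.
    return re.sub(r"(?<![\w$])_0x([0-9a-fA-F]+)\b", r"_\1_", js)


def fold_constants(js):
    def fold(match):
        tree = ast.parse("(" + match.group(2) + ")", mode="eval")
        return match.group(1) + str(_evaluate(tree.body))

    js = _CONST_EXPR.sub(fold, js)
    # Remaining standalone hex literals -> decimal.
    js = re.sub(r"(?<![\w$])0x[0-9a-fA-F]+\b", lambda m: str(int(m.group(0), 16)), js)
    # Subtracting a negative literal is an addition: a- -866 -> a+866.
    return re.sub(r"-\s*-\s*(\d)", r"+\1", js)
```

For example, fold_constants(rename_identifiers("_0x2782e9- -0x362")) yields "_2782e9_+866", and "x = -5354+-3878*-2+2402*-1;" folds to "x = 0;".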
De-obfuscation #3: String Array Encoding
Malicious JS files often have 3 parts:
- They typically start with an encrypted and/or shifted array of strings
- Followed by a function that initialises the script by decrypting or unshifting this array
- Followed by the actual payload scripts
This or a similar pattern seems to be present here too. We can see an array _2084_ with string values:
There are only 3 references to the array _2084_. One reference is as a parameter to the function below, which appears to re-order the elements in the array (see the push and shift statements). This function does not return or assign a new value, so we know that it mutates the array in place.
The other two references grab values from the array for some additional form of processing, presumably a type of decoding or decryption. Both these references are located inside a function with the _1c40_ identifier.
The _1c40_ function is called from 15 places, all of which appear to be relatively simple wrapper or helper functions.
There are a few interesting things to note about these wrappers:
- They often come in identical pairs, which simply bloat the code and make it harder to comprehend
- These wrappers perform a simple addition or subtraction on one of the arguments.
- The wrapper function takes 4 arguments but only 2 are actually in use.
Note that which argument is used is inconsistent; it’s not always the first argument as per the screenshot above.
We can remove these wrapper functions completely if we replace any calls to the wrapper with an inline invocation of the _1c40_ function. As part of this process:
- We “lift” the addition or subtraction operation out of the wrapper function and adjust the relevant argument in the replaced _1c40_ function call.
- We drop the two unused arguments.
- We rename _2084_ to _strings_ for clarity
- We also rename _1c40_ to _decode_
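The inlining can be sketched as follows, assuming each wrapper has first been summarised by hand (or by another regular expression) into which argument feeds the index, which offset it applies, and which argument is passed through as the second value. The wrapper name _3f2a1c_, its offset and the call below are invented for illustration, and the sketch only rewrites calls whose four arguments are integer literals.

```python
import re
from dataclasses import dataclass


@dataclass
class Wrapper:
    name: str        # identifier of the wrapper function
    index_arg: int   # position of the argument used to compute the string index
    offset: int      # amount the wrapper adds to it (negative for a subtraction)
    key_arg: int     # position of the argument passed through unchanged


def inline_calls(js, wrapper, target="_decode_"):
    """Replace wrapper(a, b, c, d) with target(index, key), lifting the offset."""
    call = re.compile(
        r"(?<![\w$])" + re.escape(wrapper.name)
        + r"\(\s*(-?\d+(?:\s*,\s*-?\d+){3})\s*\)"
    )

    def rewrite(match):
        args = [int(arg) for arg in match.group(1).split(",")]
        index = args[wrapper.index_arg] + wrapper.offset
        return f"{target}({index}, {args[wrapper.key_arg]})"

    return call.sub(rewrite, js)


# Hypothetical wrapper: function _3f2a1c_(a, b, c, d) { return _1c40_(c - 212, a); }
wrapper = Wrapper(name="_3f2a1c_", index_arg=2, offset=-212, key_arg=0)
print(inline_calls("var s = _3f2a1c_(980, 1052, 727, 611);", wrapper))
```

The wrapper definition itself is not touched because its parameters are identifiers rather than literals; it can be deleted once no calls remain.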
De-obfuscation #4: String Array Encoding on steroids
Removing the 15 wrapper functions revealed several new functions with the same signature. Double-wrapped wrapper functions! We inline those as well. And then we find more triple-wrapped and even quadruple-wrapped wrapper functions.
De-obfuscation #5: Simplifying Decode
As a result of our de-obfuscation efforts thus far, we now have direct calls to the _decode_ function all through the code.
We now need to figure out what the two numerical values passed to the _decode_ function represent and how they are used to determine which element of the _strings_ array should be decoded and returned.
The first thing we discover is that the first argument of the _decode_ function immediately has 327 subtracted from it and is then used as the index to the _strings_ array. We will do a similar “lift” operation as we did before. We remove the subtraction inside the _decode_ function and instead adjust the value of the first argument of every call to _decode_.
We also find that the second argument of the _decode_ function is only referred to once. It is being assigned a value inside a nested function that is not and cannot ever be invoked. This is dead code, so we can simply drop the second argument from all calls to _decode_.
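Both changes to the call sites fit in a single substitution. The sketch below assumes the calls have the shape _decode_(<integer>, <anything without commas or parentheses>); the argument values in the example are invented.

```python
import re

INDEX_OFFSET = 327  # the amount _decode_ subtracts from its first argument


def lift_offset(js):
    """Apply the subtraction at each call site and drop the unused second argument."""
    call = re.compile(r"(?<![\w$])_decode_\(\s*(-?\d+)\s*,\s*[^(),]+\)")
    return call.sub(lambda m: f"_decode_({int(m.group(1)) - INDEX_OFFSET})", js)


print(lift_offset("_4c86b3_ += _decode_(415, 980);"))
```

After this rewrite the subtraction and the second parameter have to be removed from the _decode_ definition as well, otherwise every lookup would be off by 327.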
De-obfuscation #6: Removing Decode
All calls to _decode_ are easy to understand now, with a single small numerical value being passed and used to look up an element from the _strings_ array. Wouldn’t it be better to get rid of these calls altogether though and replace them with the actual string value that the call represents?
In de-obfuscation #3, we talked about the shifting function. We previously identified that this was mutating the _strings_ array in place using the push and shift operations. After all our work thus far this shifting function now looks as follows:
We could try to figure out what this does exactly but we don’t have to. We will instead get Chrome’s JavaScript engine to do the heavy lifting and unshift the _strings_ array for us.
We do this by introducing a debugger statement in the code right after the shifting function. When we then load the page with Chrome DevTools open, JS execution will stop at this point. Inside the Chrome DevTools console, we then evaluate the following expression to create a new temporary array with all _strings_ values already decoded.
strings_rotated = _strings_.map(function(_str, _idx) { return _decode_(_idx); })
The _strings_ array and shifting function can now be removed altogether. We can replace every invocation of _decode_ with the correct string literal instead.
- _4c86b3_ += _decode_(88) becomes _4c86b3_ += '; secure'
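Once the decoded array has been copied out of the DevTools console, the replacement itself is a simple substitution. The sketch below assumes the array has been pasted into a Python list called strings_rotated and that no decoded string contains a newline; the function name is made up for this example.

```python
import re


def inline_strings(js, strings_rotated):
    """Replace every _decode_(N) call with the N-th decoded string as a JS literal."""
    def literal(match):
        value = strings_rotated[int(match.group(1))]
        # Escape backslashes and single quotes so the result is a valid literal.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    return re.sub(r"(?<![\w$])_decode_\(\s*(\d+)\s*\)", literal, js)


# Stand-in for the real decoded array: only index 88 is filled in here.
strings_rotated = [""] * 89
strings_rotated[88] = "; secure"
print(inline_strings("_4c86b3_ += _decode_(88)", strings_rotated))
```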
De-obfuscation #7: Simplify, again
After inlining all string literals, the same patterns from de-obfuscation #1 are back again. We can see properties being accessed using bracket notation, unnecessary string concatenation, etc. We already know how to tidy this up…
De-obfuscation #7: Bringing back the console
Now that we can more easily read the code, it becomes easy to identify this section of code, which refers to the various methods available on the JS console object that are used for logging or debugging.
JavaScript, being a dynamic language, allows you to change the properties and methods of most objects, including browser built-in ones. Malicious JS scripts often use this to replace the methods on the console object in order to prevent you from instrumenting the malicious code with console.log statements.
We reviewed the code related to the log, warn, info, etc keywords above and found that the call to _1b8bb4_ appears to do exactly that but with a fair amount of obfuscation thrown in.
The code for _1b8bb4_ is obfuscated using a technique called control-flow flattening. In short, every action that needs to be executed has been moved inside a different case clause of a switch statement, in a randomised order. A variable _GGAKL is then maintained that defines the list of clauses and the order they need to be executed in. The code then iterates over this list and executes each clause as and when needed.
None of this really matters though, as we can just remove this entire chunk of code to bring back the browser’s default console behaviour.
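For readers who have not met control-flow flattening before, the idea can be pictured with a small Python model. It is not the script's code: the three actions and the order string "2|0|1" are invented, and the if/elif chain stands in for the JavaScript switch statement.

```python
def flattened():
    """Three actions hidden behind a dispatcher that runs them in a stored order."""
    log = []
    order = "2|0|1".split("|")  # plays the role of the order kept in _GGAKL
    position = 0
    while True:
        step = order[position] if position < len(order) else None
        position += 1
        if step == "0":
            log.append("second")
            continue
        elif step == "1":
            log.append("third")
            continue
        elif step == "2":
            log.append("first")
            continue
        break  # no clause matched: the order list is exhausted
    return log


def unflattened():
    """The same behaviour once the clauses are put back in execution order."""
    return ["first", "second", "third"]
```

Reading the clauses in the order given by the order string, rather than in the order they appear in the source, is all it takes to recover the original sequence.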
De-obfuscation #8: Disabling anti-tampering (finally!)
Looking at the code again, we spot a control-flow flattening pattern almost identical to the one discussed earlier:
This function is never called though. This is just dead code and we will remove it. But while navigating this code we spotted a few curious strings and some fairly cryptic code referring to those strings.
With some help from Chrome DevTools, we discovered that this is the anti-tampering check! The check calls toString on a function, which returns the actual JS source code of that function. It then runs a Regular Expression to test that this source code is what is expected.
If the JS file has been “beautified”, additional whitespace and newlines are introduced inside the function definition. Calling toString therefore returns different JS code, which no longer matches that Regular Expression.
The anti-tampering check is started by the call to _41a7ac_ so we simply remove it and the related code.
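The principle behind the check can be modelled as a pattern match against a function's source text. The pattern and the function source below are hypothetical stand-ins chosen for illustration; the real script uses its own function and Regular Expression.

```python
import re

# Hypothetical expectation: the source is a single compact line, exactly as shipped.
EXPECTED_SOURCE = re.compile(r"function \w*\(\)\{return 1;\}")


def passes_check(function_source):
    """Mimic the check: does the text returned by toString still look untouched?"""
    return EXPECTED_SOURCE.fullmatch(function_source) is not None


original = "function check(){return 1;}"
beautified = "function check() {\n    return 1;\n}"

print(passes_check(original))    # the shipped source matches
print(passes_check(beautified))  # the reformatted source does not
```

This also explains why a formatting-only change was enough to trip the mechanism: the behaviour of the function is identical, but its text is not.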
De-obfuscation #9: Rename
We are now left with only a single class definition in our file. All the other code has been simplified and removed. We can see this is the entry point to the actual behaviour. A class Task2 is defined. It has a static method called com2 (which is slightly hard to read again due to the bracket notation).
The actual logic itself is yet again obfuscated using control-flow flattening. We can see the IZOXR property with the order of clauses to be executed. We can identify other properties hiding various string literals and functions wrapping basic arithmetic and operations.
We inline the string literals and give some functions proper names, but beyond that Regular Expressions can’t help us simplify this script much further, at least not for a reasonable cost and effort.
De-obfuscation #10: DIY
The final step is just manual code refactoring using any half-decent IDE.
- We remove any unreachable code blocks.
- We replace any controlFlow.invokeFnWithXArgs calls with the actual function invocations
- We do the same for the controlFlow.or, controlFlow.plus_a, etc. functions and replace them with the equivalent operator or expression
- We unwrap the switch statement from the control-flow flattening structure
- We perform some basic tidying up and formatting
Well that was un-fun. What does the unobfuscated JavaScript do?
After all the effort we now have the clean JavaScript source and can easily review its functionality… and it’s a disappointment to say the least.
The script grabs the first part of the document.title based on two delimiters (or the entire title if the delimiters are not found within it).
The script searches the document.referrer to find out whether the visitor landed on the current page from a click-through on one of these search engines:
- google.*
- search.yahoo.*
- bing.com
- search.aol.*
- ask.com
- altavista.*
- search.lycos.*
- alltheweb.*
- yandex.*
- nova.rambler.* and search.rambler.*
- gogo.*
- go.mail.*
- nigma.*
If the visitor came from one of these search engines, then a cookie “opos” is set with the value “1”. The user is then immediately redirected to https://js.ekb-tv.ru/trds?q={document_title_prefix}.
If the visitor has an existing cookie “opos” with the value “1”, i.e. if they have landed here from a search engine previously, then the cookie is refreshed and the user is similarly redirected immediately.
If a visitor lands on this page organically, the script does nothing.
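The behaviour described above can be restated as a short Python model. It reflects our reading of the cleaned-up script rather than its actual code: the host patterns are our translation of the list above, the title delimiters are a parameter because they are not named here, URL-encoding of the title is an assumption, and the cookie jar is a plain dict.

```python
import re
from urllib.parse import quote, urlsplit

# Our translation of the search engine list into a single host pattern.
SEARCH_ENGINE_HOSTS = re.compile(
    r"(^|\.)(google\.[a-z.]+|search\.yahoo\.[a-z.]+|bing\.com|search\.aol\.[a-z.]+"
    r"|ask\.com|altavista\.[a-z.]+|search\.lycos\.[a-z.]+|alltheweb\.[a-z.]+"
    r"|yandex\.[a-z.]+|(nova|search)\.rambler\.[a-z.]+|gogo\.[a-z.]+"
    r"|go\.mail\.[a-z.]+|nigma\.[a-z.]+)$"
)


def came_from_search(referrer):
    host = urlsplit(referrer).hostname or ""
    return SEARCH_ENGINE_HOSTS.search(host) is not None


def redirect_target(title, referrer, cookies, delimiters):
    """Return the URL the visitor is sent to, or None when the script does nothing."""
    if not (came_from_search(referrer) or cookies.get("opos") == "1"):
        return None  # organic visit without the marker cookie
    cookies["opos"] = "1"  # set or refresh the marker cookie
    prefix = title
    for delimiter in delimiters:  # keep the text before the first delimiter found
        prefix = prefix.split(delimiter, 1)[0]
    return "https://js.ekb-tv.ru/trds?q=" + quote(prefix)
```

For instance, with the made-up delimiters (" - ", " | "), a visitor arriving from https://www.google.com/search?q=visa on a page titled "Cheap visas - Example" would be redirected with q=Cheap%20visas, while a visitor with no referrer and no cookie gets None.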
One curious part is that the code seems to have Regular Expressions in place to be able to parse what a user searched for on the mentioned search engines. That sounds like a potentially interesting data point to capture but we cannot see this being done anywhere. Is this a bug? Is this a partially implemented or deleted feature? We will never know.
What are they redirecting to?
In order to find out more, we will need to check out the contents and behaviour of https://js.ekb-tv.ru/trds but that is another long story for another day and my next blog post: Would you like a free iPhone with that?
Meanwhile, beware: if you don’t have sufficient auditing tools in place to detect dangling DNS records or actual subdomain takeovers (yes, they are different and you should have capabilities to detect both!), you will likely get stung eventually.
Postscriptum
During this investigation we discovered that the JS file was obfuscated with the https://obfuscator.io/ tool created by Timofey Kachalov which supports several more obfuscation techniques and variations not encountered in this analysis.
Update #1
Since performing this analysis we have discovered a tool, shift-refactor, aimed at reverse engineering obfuscated JavaScript using ASTs, which may be more suitable than the Regular Expressions approach we adopted here.