Oops, you found a subdomain takeover. Should you be worried?

Andy Vermeulen
14 min read · Feb 22, 2022

Dangling DNS records that allow for a subdomain takeover are nothing new. Auditing tools for pen testers like can-i-take-over-xyz have been around for years, while resources like MDN do a solid job trying to educate developers and operations teams about the issue.

Unfortunately, the hundreds of bug bounty reports on the subject indicate that this vulnerability class is far from mitigated.

What happened?

We recently discovered a subdomain takeover of a dangling DNS record on johndoe.example.com:

  1. A GitHub Pages site (e.g. johndoe.github.io) was set up to use a custom domain name.
  2. A DNS record johndoe.example.com was created as an alias or CNAME to johndoe.github.io.
  3. The custom domain name configuration was later removed from the GitHub Pages site when its use was retired. The DNS record for johndoe.example.com was however not removed. This left a dangling DNS CNAME to johndoe.github.io.
  4. A malicious actor set up their own GitHub Pages site and configured it to use the custom domain name johndoe.example.com. All links and traffic to johndoe.example.com were now being served content controlled by this malicious actor.
Content controlled by a malicious actor being served from our subdomain.

As soon as we became aware of the issue, we advised removing the DNS CNAME record. That stopped the bleeding but didn’t tell us what the malicious actor was up to.

Why did they bother doing a subdomain takeover? Did they potentially read cookies set from the top-level domain, perform cross-site scripting and circumvent content security policies (CSP), use the subdomain to bypass redirect whitelists, or…?

Did they target example.com specifically?

Given this was a subdomain takeover of a GitHub Pages website, the full contents of the site were easy to find. Using the GitHub search function, we found the repository “tiodiatavo” owned by the GitHub user @stepard.

The repository had ~10,900 HTML files with Arabic content on a range of topics from recipes, computer equipment, visas, affiliate marketing, and Lorem Ipsum.

Multiple Arabic HTML pages with an unusual script tag.

Each HTML file had a script tag pointing to an unusual JavaScript file hosted on a .ru domain. Most JS files are minified with UglifyJS, Babel Minify, etc., so it is not uncommon for JS to be hard to read, but this JS file was purposefully obfuscated. Across the ~10,900 HTML files in the repository we identified 10 unique domains used to serve this file:

  • https://bo.datingsvr.ru/trd
  • https://ct.dominikpers.ru/trd
  • https://de.datingvr.ru/trd
  • https://dr.dietaforlove.ru/trd
  • https://ew.dionwars.ru/trd
  • https://js.ekb-tv.ru/trd
  • https://nnm.eburi.ru/trd
  • https://rt.coronafly.ru/trd
  • https://td.dzeroki.ru/trd
  • https://to.darkandlight.ru/trd
Obfuscated JavaScript file served by the malicious actor

The GitHub user @stepard furthermore had a total of 157 repositories. We pulled them all and found that 87 repositories had similar HTML content. For those 87 repositories, we identified the domain they belong to and found that 77 were still active subdomain takeovers.

We reported the account to GitHub and they swiftly terminated it.

What does the obfuscated JavaScript do though?

The number of subdomain takeovers discovered makes it unlikely that the actor was specifically targeting example.com but we would like to know for sure. We need to figure out what this obfuscated JavaScript does.

There are plenty of tools available that “beautify” scripts. The format feature inside Chrome DevTools is a great start: it makes scripts easier to read by introducing whitespace and newlines, but it doesn’t try to simplify the code. Other tools like javascript-unobfuscator and JS NICE try to go a step further, performing de-obfuscation, statistical renaming and type inference.

Unfortunately none of these tools did a particularly great job on this JS file. Even worse, when we tried to substitute the obfuscated JS script with a locally hosted “beautified” version to make it easier to work with, we ended up with a lovely Aw, Snap! notification from Chrome. Beautifying the script stops it from working and repeatedly crashes our browser.

Beautifying the obfuscated JavaScript crashes our Chrome browser.

In order to understand why, we had to resort to manual de-obfuscation. We opened the Chrome DevTools and paused JavaScript execution after the page had loaded but before the crash happened.

Before we continue, you can grab the code on my GitHub if you want to follow along. The original obfuscated JS code, intermediate beautified JS output of each de-obfuscation step, and my de-obfuscation Python script are all available.

Stepping through the code we noticed that it appears to get stuck inside an infinite loop as part of the “ySQKnz” function:

  • The script will loop while _0x5ac416 is lower than _0x30c102.
  • On every iteration, the value _0x5ac416 is incremented by one.
  • On every iteration, a random value is also appended to an array and the value of _0x30c102 is reset to the length of that array.
  • Effectively both _0x5ac416 and _0x30c102 are incremented by one on every iteration. As a result, the condition to terminate the loop is never achieved.
  • This infinite loop keeps growing the array whose length is stored in _0x30c102, so Chrome’s memory consumption climbs until the array becomes excessively large and Chrome decides to kill the browser tab... Aw, Snap!
The infinite loop inside the “ySQKnz” function.
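
To make the reasoning above concrete, here is a minimal Python sketch of the loop shape we observed. It is not the original code: the names counter, limit and junk are ours, the starting state is an assumption, and we added a safety cap so the sketch terminates instead of exhausting memory.

```python
import random


def simulate_guard_loop(max_iterations=1000):
    """Mimic the loop shape described above, with a safety cap added.

    `counter` plays the role of _0x5ac416 and `limit` the role of _0x30c102.
    Returns (counter, limit, terminated_naturally).
    """
    junk = [random.random()]  # the array that keeps growing (assumed start state)
    counter = 0
    limit = len(junk)
    iterations = 0
    while counter < limit:
        if iterations >= max_iterations:
            # Safety cap: the real script has none and runs until the tab dies.
            return counter, limit, False
        junk.append(random.random())  # a random value is appended...
        limit = len(junk)             # ...and the bound is reset to the new length
        counter += 1                  # the counter advances by one as well
        iterations += 1
    return counter, limit, True
```

Because both values grow by one per iteration, the gap between them never closes and only the cap ends the loop.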

If we go back up the call stack, we can find branching logic inside a “jipTwl” function that conditionally calls the “ySQKnz” function (i.e. our infinite loop). We naively tried to short-cut this and avoid calling “ySQKnz” altogether but the code just ends up looping indefinitely elsewhere… at least it no longer crashes the browser?!

Attempting to bypass the call to the “ySQKnz” function that causes the infinite loop

We continued along this path for a while and tried to figure out a way to disable the anti-tampering mechanism but alas didn’t get anywhere. There is simply too much array-shifting wizardry going on to make sense of given the overly complex and obfuscated JS code.

De-obfuscating JavaScript the hard way

Let’s rethink our approach... Can we simplify the code sufficiently to make it easier to understand while simultaneously not triggering the anti-tampering mechanism?

De-obfuscation #1: Simplify

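To kick things off, we can identify some array-literal expressions that are only ever evaluated in boolean contexts. An empty array is truthy, so negating it once or twice always yields the same constant. Replacing those expressions with their resulting boolean values should make the script a bit easier to understand: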

  • !![] becomes true
  • ![] becomes false

We can also see some string concatenation going on that is unnecessary and can easily be removed:

  • “a” + “b” becomes “ab”
  • ‘a’ + ‘b’ becomes ‘ab’

The JavaScript language allows for property accessors using either bracket notation or dot notation. The code sample uses bracket notation but we find dot notation easier on the eye:

  • obj[‘property’] becomes obj.property

Finally, all integer literals have been written in hexadecimal (base 16) notation, which most of us read less naturally than decimal. Let’s convert them to decimal (base 10) instead.

  • 0x10 becomes 16
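
The four rewrites above can be sketched with a handful of regular expressions. This is a simplified illustration rather than our actual script: the function name simplify is made up for this example, and the patterns assume string literals without escape sequences or embedded quotes. Note that it will also rewrite text inside string literals that happens to look like a hex number.

```python
import re

# Adjacent string literals with the same quote style and no escapes inside.
CONCAT = re.compile(r"""(['"])([^'"\\\n]*)\1\s*\+\s*\1([^'"\\\n]*)\1""")
# obj['name'] where name is a valid identifier and something precedes the bracket.
BRACKET = re.compile(r"""(?<=[\w$\])])\[(['"])([A-Za-z_$][\w$]*)\1\]""")
# Hex literals; \b keeps this from matching inside identifiers like _0x2782e9.
HEX = re.compile(r"\b0x([0-9a-fA-F]+)\b")


def simplify(js: str) -> str:
    """Apply the four textual rewrites described above to a JS snippet."""
    # Boolean idioms: replace the longer form first so it is not split up.
    js = js.replace("!![]", "true").replace("![]", "false")

    # Merge adjacent string literals until nothing changes ('a'+'b'+'c').
    previous = None
    while previous != js:
        previous = js
        js = CONCAT.sub(r"\1\2\3\1", js)

    # Bracket notation to dot notation (after merging, so that
    # obj['pro' + 'perty'] is handled too).
    js = BRACKET.sub(r".\2", js)

    # Hexadecimal integer literals to decimal.
    return HEX.sub(lambda m: str(int(m.group(1), 16)), js)
```

For example, simplify("obj['pro' + 'perty'] = 0x10;") yields obj.property = 16;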

De-obfuscation #2: Expressions To Numbers

The code has a large number of arithmetic expressions with no variables. Given there are no variables in these expressions, their value can never change, so we can replace them with their constant value:

  • -5354+-3878*-2+2402*-1 should become 0

All function and variable identifiers have been replaced with hexadecimal identifiers like _0x2782e9. This makes it harder to perform the above transformation using Regular Expressions without accidentally absorbing part of these identifiers.

  • _0x2782e9- -0x362 should not become _0x2782e+875

Rather than complicating our Regular Expressions, we run a preprocessing step that appends an underscore suffix to all function and variable identifiers. We also remove the 0x prefix from them to avoid misinterpreting them as hexadecimal numbers:

  • _0x2782e9 becomes _2782e9_

As a result of this preprocessing we can now more easily evaluate algebraic expressions and convert hexadecimal Integer literals to decimals:

  • _2782e9_- -0x362 now correctly becomes _2782e9_+866
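
One possible way to implement this step is sketched below. It is an approximation of the idea, not our exact script: the function name fold_constants is hypothetical, only +, - and * are supported, and the folding is deliberately conservative, skipping any numeric run that is an operand of a larger expression. The arithmetic is evaluated through Python’s ast module rather than eval so that nothing other than integer arithmetic can run.

```python
import ast
import re

IDENT = re.compile(r"\b_0x([0-9a-fA-F]+)\b")
HEX = re.compile(r"\b0x([0-9a-fA-F]+)\b")
# A run of integers joined by +, - or *, e.g. -5354+-3878*-2+2402*-1
CONST_EXPR = re.compile(r"[-+]?\d+(?:\s*[-+*]\s*[-+]?\d+)+")


def _evaluate(node):
    """Evaluate an AST made only of integers, unary +/- and binary + - *."""
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.USub, ast.UAdd)):
        value = _evaluate(node.operand)
        return -value if isinstance(node.op, ast.USub) else value
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Sub, ast.Mult)):
        left, right = _evaluate(node.left), _evaluate(node.right)
        if isinstance(node.op, ast.Add):
            return left + right
        if isinstance(node.op, ast.Sub):
            return left - right
        return left * right
    raise ValueError("not a constant integer expression")


def _fold(match):
    before = match.string[:match.start()].rstrip()
    after = match.string[match.end():].lstrip()
    # Leave the run alone when it is an operand of a larger expression.
    if before and (before[-1].isalnum() or before[-1] in "_$)].+-*/%"):
        return match.group(0)
    if after and after[0] in "*/%.":
        return match.group(0)
    try:
        return str(_evaluate(ast.parse(match.group(0), mode="eval").body))
    except (SyntaxError, ValueError):
        return match.group(0)


def fold_constants(js: str) -> str:
    js = IDENT.sub(r"_\1_", js)                            # _0x2782e9 -> _2782e9_
    js = HEX.sub(lambda m: str(int(m.group(1), 16)), js)   # 0x362 -> 866
    js = CONST_EXPR.sub(_fold, js)                         # constant runs -> value
    return re.sub(r"(?<![-+])-\s*-(?=\d)", "+", js)        # - -866 -> +866
```

Running fold_constants on the two examples above gives 0 for the constant expression and _2782e9_+866 for the identifier case.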

De-obfuscation #3: String Array Encoding

Malicious JS files often have 3 parts:

  • They typically start with an encrypted and/or shifted array of strings
  • Followed by a function that initialises the script by decrypting or unshifting this array
  • Followed by the actual payload scripts

This or a similar pattern seems to be present here too. We can see an array _2084_ (originally _0x2084, before the renaming in the previous step) with string values:

An array of obfuscated string values and property names.

There are only 3 references to the array _2084_. One reference passes it as a parameter to the function below, which appears to re-order the elements in the array (see the push and shift statements). This function does not return a new value to be assigned, so we know that it mutates the array in place.

A shifting function that prepares the array of obfuscated string values for use.
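
The push/shift idiom is easy to mimic in isolation. The Python sketch below only illustrates the in-place rotation; the rotation count is a free parameter here because we have not worked out how the original function decides when to stop, and the name rotate_in_place is ours.

```python
def rotate_in_place(strings, count):
    """Move the first element to the end `count` times.

    Equivalent in spirit to the JS idiom arr.push(arr.shift()).
    Returns nothing: the caller's list is mutated directly.
    """
    for _ in range(count):
        strings.append(strings.pop(0))
```

Since nothing is returned, any other code holding a reference to the same array sees the re-ordered contents, which is why the function never needs to assign a result.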

The other two references grab values from the array for some additional form of processing, presumably a type of decoding or decryption. Both these references are located inside a function with the _1c40_ identifier.

The signature of the _1c40_ function.

There are 15 calls to the _1c40_ function, all of them made from relatively simple wrapper or helper functions.

Wrappers for the _1c40_ function call.

There are a few interesting things to note about these wrappers:

  • They often come in identical pairs, which simply bloats the code and makes it harder to comprehend.
  • Each wrapper performs a simple addition or subtraction on one of its arguments.
  • Each wrapper takes 4 arguments but only 2 are actually in use.
    Note that which arguments are used is inconsistent; it’s not always the first argument as per the screenshot above.

We can remove these wrapper functions completely if we replace any calls to the wrapper with an inline invocation of the _1c40_ function. As part of this process:

  • We “lift” the addition or subtraction operation out of the wrapper function and adjust the relevant argument in the replaced _1c40_ function call.
  • We drop the two unused arguments.
  • We rename _2084_ to _strings_ for clarity
  • We also rename _1c40_ to _decode_
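
As a rough illustration of the inlining, the sketch below rewrites calls to a single wrapper. Everything specific in it is hypothetical: the wrapper name _3f1a2b_, the argument positions and the offset are invented for the example, and it only handles calls whose four arguments are plain integer literals.

```python
import re
from dataclasses import dataclass


@dataclass
class Wrapper:
    name: str       # wrapper identifier (hypothetical in this example)
    index_pos: int  # which of the 4 arguments carries the index
    offset: int     # amount the wrapper adds to it before forwarding
    key_pos: int    # which argument is forwarded as the second parameter


def inline_wrapper(js: str, w: Wrapper, target: str = "_decode_") -> str:
    """Replace calls to wrapper `w` with direct calls to `target`."""
    number = r"\s*(-?\d+)\s*"
    call = re.compile(
        r"(?<![\w$])" + re.escape(w.name)
        + r"\(" + ",".join([number] * 4) + r"\)"
    )

    def replace(match):
        args = [int(a) for a in match.groups()]
        # "Lift" the wrapper's arithmetic into the argument itself and
        # drop the two arguments the wrapper never used.
        return f"{target}({args[w.index_pos] + w.offset}, {args[w.key_pos]})"

    return call.sub(replace, js)
```

With Wrapper("_3f1a2b_", index_pos=2, offset=-191, key_pos=0), the call _3f1a2b_(700, 12, 518, 3) becomes _decode_(327, 700). The wrapper definitions themselves are left untouched by this sketch and still need to be deleted afterwards.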

De-obfuscation #4: String Array Encoding on steroids

Removing the 15 wrapper functions revealed several new functions with the same signature. Double-wrapped wrapper functions! We inline those as well. And then we find more triple-wrapped and even quadruple-wrapped wrapper functions.

De-obfuscation #5: Simplifying Decode

As a result of our de-obfuscation efforts thus far, we now have direct calls to the _decode_ function all through the code.

Calls to the _decode_ function.

We now need to figure out what the two numerical values passed to the _decode_ function represent and how they are used to determine which element of the _strings_ array should be decoded and returned.

The first thing we discover is that the first argument of the _decode_ function immediately has 327 subtracted from it and is then used as the index to the _strings_ array. We will do a similar “lift” operation as we did before. We remove the subtraction inside the _decode_ function and instead adjust the value of the first argument of every call to _decode_.

The _decode_ function subtracting 327.

We also find that the second argument of the _decode_ function is only referred to once. It is being assigned a value inside a nested function that is not and cannot ever be invoked. This is dead code, so we can simply drop the second argument from all calls to _decode_.

The only reference to the second argument _1c6082_ of the _decode_ function.
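
Both simplifications can be applied to the call sites in one pass. The sketch below assumes the subtraction has already been removed by hand from inside _decode_ and that every call passes two integer literals; the function name lift_offset and the sample argument values are ours.

```python
import re

# _decode_(<int>, <int>) that is not part of a longer identifier.
CALL = re.compile(r"(?<![\w$])_decode_\(\s*(-?\d+)\s*,\s*-?\d+\s*\)")


def lift_offset(js: str, offset: int = 327) -> str:
    """Pre-subtract `offset` from the first argument and drop the second."""
    return CALL.sub(lambda m: f"_decode_({int(m.group(1)) - offset})", js)
```

For instance, a call written as _decode_(415, 700) would be rewritten to _decode_(88).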

De-obfuscation #6: Removing Decode

All calls to _decode_ are easy to understand now, with a single small numerical value being passed and used to look up an element in the _strings_ array. Wouldn’t it be better to get rid of these calls altogether though and replace them with the actual string value that the call represents?

In de-obfuscation #3, we talked about the shifting function. We previously identified that this was mutating the _strings_ array in place using the push and shift operations. After all our work thus far this shifting function now looks as follows:

We could try to figure out what this does exactly but we don’t have to. We will instead get Chrome’s JavaScript engine to do the heavy lifting and unshift the _strings_ array for us.

We do this by introducing a debugger statement in the code right after the shifting function. When we then load the page with Chrome DevTools open, JS execution will stop at this point. Inside the Chrome DevTools console, we then evaluate the following expression to create a new temporary array with all _strings_ values already decoded.

strings_rotated = _strings_.map(function(_str, _idx) { return _decode_(_idx); })

The _strings_ array and shifting function can now be removed altogether. We can replace every invocation of _decode_ with the correct string literal instead.

  • _4c86b3_ += _decode_(88) becomes _4c86b3_ += ’; secure’
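
Once the decoded array has been exported from the DevTools console (for example as JSON), the replacement itself is mechanical. A minimal sketch, assuming the decoded values are available as a Python list; the function name inline_strings is ours, and json.dumps is used so that quotes inside the strings are escaped, which means the output uses double quotes rather than the single quotes shown above.

```python
import json
import re

# _decode_(<non-negative int>) that is not part of a longer identifier.
CALL = re.compile(r"(?<![\w$])_decode_\(\s*(\d+)\s*\)")


def inline_strings(js: str, decoded: list) -> str:
    """Replace every _decode_(n) call with the n-th decoded string literal."""

    def replace(match):
        index = int(match.group(1))
        if index >= len(decoded):
            return match.group(0)  # unknown index: leave the call untouched
        return json.dumps(decoded[index], ensure_ascii=False)

    return CALL.sub(replace, js)
```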

De-obfuscation #7: Simplify, again

After inlining all string literals, the same patterns from de-obfuscation #1 are back again. We can see properties being accessed using bracket notation, unnecessary string concatenation, etc. We already know how to tidy this up…

Property accessors using array notation.
Unnecessary string concatenation.

De-obfuscation #7: Bringing back the console

Now that we can more easily read the code, it becomes easy to identify the section of code that refers to the various methods available on the JS console object that are used for logging or debugging.

References to keywords used for console logging.

Because JavaScript is a dynamic language, you can change the properties and methods of almost any object, including the browser’s built-in ones. Malicious JS scripts often use this to replace the methods on the console object in order to prevent you from instrumenting the malicious code with console.log statements.

We reviewed the code related to the log, warn, info, etc keywords above and found that the call to _1b8bb4_ appears to do exactly that but with a fair amount of obfuscation thrown in.

The function call that initiates the hijack of the JS console.

The code for _1b8bb4_ is obfuscated using a technique called control-flow flattening. In short, every action that needs to be executed has been moved into a separate case clause of a switch statement, in a randomised order. A variable _GGAKL then holds the list of clauses and the order in which they need to be executed. The code iterates over this list and executes each clause in turn.

The order in which the clauses of the switch statement need to be executed.
The switch statement containing all actions subjected to control-flow flattening.
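
For readers who have not met control-flow flattening before, here is a toy example of the technique, written in Python rather than JS. It is our own illustration and unrelated to the actual contents of _1b8bb4_: the same three statements are shown once flattened and once in their natural order.

```python
def flattened(name):
    """Three sequential statements hidden behind an order string."""
    order = "2|0|3|1".split("|")  # execution order, kept apart from the clauses
    position = 0
    out = []
    while position < len(order):
        clause = order[position]
        position += 1
        # The clauses appear in a shuffled order in the source.
        if clause == "0":
            out.append(name.strip())
        elif clause == "1":
            return "".join(out)
        elif clause == "2":
            out.append("Hello, ")
        elif clause == "3":
            out.append("!")
    return None


def plain(name):
    """The same logic with the control flow restored."""
    out = ["Hello, ", name.strip(), "!"]
    return "".join(out)
```

Un-flattening amounts to reading the order string and writing the clause bodies back down in that sequence.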

None of this really matters though as we can just remove this entire chunk of code to bring back the browser’s default console behaviour.

De-obfuscation #8: Disabling anti-tampering (finally!)

Looking at the code again, we spot a control-flow flattening pattern almost identical to the one discussed earlier:

Dead code using control-flow flattening.

This function is never called though. This is just dead code and we will remove it. But while navigating this code we spotted a few curious strings and some fairly cryptic code referring to those strings.

Interesting strings discovered.
A regular expression test using the strings discovered earlier.

With some help of the Chrome DevTools, we discovered that this is the anti-tampering check! The check calls toString on a function which returns the actual JS code definition for that function. It then runs a Regular Expression to test that this code definition is what is expected.

If the JS file has been “beautified”, additional whitespace and newlines are introduced inside the function definition. Calling toString therefore returns different JS code, which no longer matches that Regular Expression.

Example of calling toString() on a function and how whitespace is treated.
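
The principle can be simulated without a browser. In the Python sketch below, two strings stand in for what toString would return before and after beautifying, and the guard pattern is a hypothetical one we made up; the real script uses its own Regular Expression.

```python
import re

# Stand-ins for the text that toString() would return for the same function.
MINIFIED = "function check(){return'intact';}"
BEAUTIFIED = "function check() {\n    return 'intact';\n}"

# Hypothetical guard: the body must follow the signature with no whitespace
# and the whole definition must sit on a single line.
GUARD = re.compile(r"^function \w+\(\)\{\S.*\S\}$")


def looks_untampered(function_source: str) -> bool:
    """Return True when the source text still has its original compact shape."""
    return GUARD.search(function_source) is not None
```

The minified text passes the check while the beautified text fails it, even though both define exactly the same behaviour.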

The anti-tampering check is started by the call to _41a7ac_ so we simply remove it and the related code.

The function call that runs the anti-tampering check.

De-obfuscation #9: Rename

We are now left with only a single class definition in our file. All the other code has been simplified and removed. We can see this is the entry point to the actual behaviour. A class Task2 is defined. It has a static method called com2 (which is slightly hard to read again due to the bracket notation).

The actual logic itself is yet again obfuscated using control-flow flattening. We can see the IZOXR property with the order of clauses to be executed. We can identify other properties hiding various string literals and functions wrapping basic arithmetic and operations.

We inline the string literals and give some functions proper names, but beyond that Regular Expressions can’t help us simplify this script much further, at least not for a reasonable cost and effort.

De-obfuscation #10: DIY

The final step is just manual code refactoring using any half-decent IDE.

  • We remove any unreachable code blocks.
  • We replace any controlFlow.invokeFnWithXArgs calls with the actual function invocations
  • We do the same for the controlFlow.or, controlFlow.plus_a, etc functions and replace them with the equivalent operand or expression
  • We unwrap the switch statement from the control-flow flattening structure
  • We perform some basic tidying up and formatting

Well that was un-fun. What does the unobfuscated JavaScript do?

After all the effort we now have the clean JavaScript source and can easily review its functionality… and it’s a disappointment to say the least.

The script grabs the first part of the document.title based on two delimiters (or the entire title if the delimiters are not found within it).

Parsing the document.title

The script searches the document.referrer to find out whether the visitor landed on the current page from a click-through on one of these search engines:

  • google.*
  • search.yahoo.*
  • bing.com
  • search.aol.*
  • ask.com
  • altavista.*
  • search.lycos.*
  • alltheweb.*
  • yandex.*
  • nova.rambler.* and search.rambler.*
  • gogo.*
  • go.mail.*
  • nigma.*

If the visitor came from one of these search engines, then a cookie “opos” is set with the value “1”. The user is then immediately redirected to https://js.ekb-tv.ru/trds?q={document_title_prefix}.

Checking the referrer or cookie before doing an automatic redirect

If the visitor has an existing cookie “opos” with the value “1”, i.e. if they have landed here from a search engine previously, then the cookie is refreshed and the user is similarly redirected immediately.

If a visitor lands on this page organically, the script does nothing.
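
Putting the pieces together, the decision logic described above can be summarised in a short Python sketch. Several details are assumptions on our part and not taken from the script: the two title delimiters, the way the search engine patterns are matched against the referrer host name, and the URL-encoding of the query value. The function names are ours as well.

```python
import re
from urllib.parse import quote, urlparse

REDIRECT = "https://js.ekb-tv.ru/trds?q="
ENGINES = [
    "google.*", "search.yahoo.*", "bing.com", "search.aol.*", "ask.com",
    "altavista.*", "search.lycos.*", "alltheweb.*", "yandex.*",
    "nova.rambler.*", "search.rambler.*", "gogo.*", "go.mail.*", "nigma.*",
]


def _compile(pattern):
    # "name.*" matches any TLD; anything else must match the host suffix exactly.
    if pattern.endswith(".*"):
        return re.compile(r"(^|\.)" + re.escape(pattern[:-1]))
    return re.compile(r"(^|\.)" + re.escape(pattern) + "$")


SEARCH_HOSTS = [_compile(p) for p in ENGINES]


def title_prefix(title, delimiters=(" - ", " | ")):
    """First part of the title; the delimiters here are hypothetical."""
    cut = len(title)
    for delimiter in delimiters:
        index = title.find(delimiter)
        if index != -1:
            cut = min(cut, index)
    return title[:cut]


def next_url(title, referrer, cookies):
    """Return (redirect URL or None, updated cookies) for one page view."""
    host = urlparse(referrer).hostname or ""
    from_search = any(p.search(host) for p in SEARCH_HOSTS)
    if from_search or cookies.get("opos") == "1":
        # Set or refresh the marker cookie, then redirect.
        updated = dict(cookies, opos="1")
        return REDIRECT + quote(title_prefix(title), safe=""), updated
    return None, dict(cookies)  # organic visit: do nothing
```

A visitor arriving from a Google results page is redirected on the first visit, and on every later visit as well because of the cookie, while a direct visitor without the cookie never sees the redirect.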

One curious part is that the code seems to have Regular Expressions in place to be able to parse what a user searched for on the mentioned search engines. That sounds like a potentially interesting data point to capture but we cannot see this being done anywhere. Is this a bug? Is this a partially implemented or deleted feature? We will never know.

Evidence of code to capture search queries that does not appear to be in use

What are they redirecting to?

In order to find out more, we will need to check out the contents and behaviour of https://js.ekb-tv.ru/trds but that is another long story for another day and my next blog post: Would you like a free iPhone with that?

Meanwhile, beware: if you don’t have sufficient auditing tools in place to detect dangling DNS records or actual subdomain takeovers (yes, they are different and you should have capabilities to detect both!), you will likely get stung eventually.

Postscriptum

During this investigation we discovered that the JS file was obfuscated with the https://obfuscator.io/ tool created by Timofey Kachalov which supports several more obfuscation techniques and variations not encountered in this analysis.

Update #1

Since performing this analysis we have discovered a tool, shift-reactor, aimed at reverse engineering obfuscated JavaScript using ASTs, which may be more suitable than the Regular Expressions approach we adopted here.
