A Look into SUNBURST’s DGA

asuna amawaka
Published in insomniacs
Dec 20, 2020 · 6 min read

Many fellow researchers have written very good analyses of the SUNBURST malware, so I shall not do a walkthrough on reverse engineering it. But I’ve been intrigued by the domain generation algorithm. The RedDrip Team wrote a nice decoder [1]. Folks at NETRESEC have an improved version [2]. However, there are still some generated DGA strings that cannot be decoded. I want to know what these are! *Edit: Just moments before I clicked Publish on this post, I saw that Kaspersky’s analysts wrote their code in C [3], with a nice writeup [4], and that it also handles decoding of both types of DGA string. But hey, I got Python :)

Before I go on to explain what I did to decode these DGA strings, I would like to summarize the related functions and variable names so you won’t be lost.

Victims of SUNBURST are uniquely identified with a GUID that is created within OrionImprovementBusinessLayer.GetOrCreateUserID(). This GUID is an 8-byte value derived from the victim machine’s MAC address, the MachineGuid value read from HKLM\Software\Microsoft\Cryptography\MachineGuid and the victim machine’s domain name. These three pieces of information are concatenated and MD5-hashed. The MD5 value is then “cut” into two halves that are XORed together (the 1st byte is XORed with the 9th byte, and so on until the 8th byte is XORed with the 16th byte), hence we end up with an irreversible 8-byte unique identifier.
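To make the “cut and XOR” step concrete, here is a minimal Python sketch of the folding. The function name, the concatenation order and the ASCII encoding are assumptions made for illustration; only the MD5-then-fold idea comes from the description above.

```python
import hashlib


def make_user_id(mac: str, machine_guid: str, domain: str) -> bytes:
    """Fold the MD5 digest of the three inputs into an 8-byte identifier."""
    # Assumption: plain concatenation in this order, encoded as ASCII.
    digest = hashlib.md5((mac + machine_guid + domain).encode("ascii")).digest()
    # XOR byte i of the first half with byte i of the second half (1st with 9th, ...).
    return bytes(a ^ b for a, b in zip(digest[:8], digest[8:]))


# Hypothetical inputs, for illustration only.
uid = make_user_id("001122334455",
                   "00000000-0000-0000-0000-000000000000",
                   "example.local")
print(uid.hex())
```

Because the inputs pass through a hash and half of the digest is folded away, the identifier cannot be turned back into the MAC address or domain name; it can only be matched against other DGA strings carrying the same value.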

OrionImprovementBusinessLayer.Update() is the function that calls the respective functions to generate the DGA strings, and it is responsible for handling whatever happens after a DNS response is received.

OrionImprovementBusinessLayer.CryptoHelper.GetStatus() returns the concatenation of “.appsync-api.<one of four below>.avsvmcloud.com”.

  • eu-west-1
  • us-west-2
  • us-east-1
  • us-east-2
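When working from a list of observed hostnames, the fixed suffix has to be stripped before anything can be decoded. A small sketch of that step follows; the helper name and the regular expression are my own, built only from the suffix format listed above.

```python
import re

# Matches "<dga>.appsync-api.<region>.avsvmcloud.com" for the four regions above.
HOSTNAME_RE = re.compile(
    r"^(?P<dga>[^.]+)\.appsync-api\."
    r"(?P<region>eu-west-1|us-west-2|us-east-1|us-east-2)"
    r"\.avsvmcloud\.com\.?$",
    re.IGNORECASE,
)


def split_hostname(hostname: str):
    """Return (dga_string, region) or None if the hostname does not match."""
    match = HOSTNAME_RE.match(hostname.strip())
    if match is None:
        return None
    return match.group("dga").lower(), match.group("region").lower()
```

Anything that does not match the pattern is returned as None, so unrelated hostnames in a DNS log can simply be skipped.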

Four functions within OrionImprovementBusinessLayer.CryptoHelper are the ones that are called upon to generate the DGA strings:

GetNextStringEx(), GetNextString(), GetPreviousString() and GetCurrentString().

Within them, the functions CreateSecureString(), CreateString(), DecryptShort(), Base64Decode() and Base64Encode() are responsible for the encoding process.

(At this point, I would like to comment on the names of the functions and variables — geez most of them are not in any way descriptive of the actual meaning of the task they perform or the value they hold. If anything, they are there to mislead the analyst.)

CreateSecureString() does nothing to create secure strings. What it really does is XOR-encode some data with a random byte as XOR key, prepend this byte to the data and then return a Base32-like encoded string of the data. The Base32-like encoding is performed by Base64Encode(). Look at that nasty confusing name!
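The two steps can be sketched in Python as follows. The alphabet and the least-significant-bits-first packing are what I understand the public decoders to use, so treat them as assumptions; the function names are mine. The malware can also fill the unused padding bits with random data, which this sketch leaves as zeros.

```python
# Assumed custom alphabet (32 symbols), as used by the public decoders.
BASE32_ALPHABET = "ph2eifo3n5utg1j8d94qrvbmk0sal76c"


def xor_with_prepended_key(data: bytes, key: int) -> bytes:
    """XOR every byte with a one-byte key and put the key in front."""
    return bytes([key]) + bytes(b ^ key for b in data)


def strip_xor_key(blob: bytes) -> bytes:
    """Inverse of xor_with_prepended_key()."""
    return bytes(b ^ blob[0] for b in blob[1:])


def base32_like_encode(data: bytes) -> str:
    """Emit 5 bits per character, lowest bits first."""
    out, acc, bits = [], 0, 0
    for byte in data:
        acc |= byte << bits
        bits += 8
        while bits >= 5:
            out.append(BASE32_ALPHABET[acc & 0x1F])
            acc >>= 5
            bits -= 5
    if bits:
        out.append(BASE32_ALPHABET[acc & 0x1F])
    return "".join(out)


def base32_like_decode(text: str) -> bytes:
    """Inverse of base32_like_encode(); leftover padding bits are dropped."""
    out, acc, bits = bytearray(), 0, 0
    for ch in text:
        acc |= BASE32_ALPHABET.index(ch) << bits
        bits += 5
        while bits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            bits -= 8
    return bytes(out)
```

Decoding a string is then `strip_xor_key(base32_like_decode(text))`. Since the XOR key travels with the data, the “encryption” only obscures the content and adds no secrecy.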

The other related function, Base64Decode(), is a sort of substitution cipher in which, by default, characters are shifted 4 positions to the right within a custom alphabet. If a special character (“0”, “.”, “-” or “_”) is encountered, then a random shift value is selected instead. DecryptShort() calls either Base64Decode() or Base64Encode().
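Here is a rough Python sketch of this substitution cipher and its inverse. The 35-character alphabet and the way special characters are escaped (a “0” marker followed by a randomly chosen character whose position modulo 4 identifies the special character) reflect my understanding of the public decoders and should be read as assumptions; the function names are mine, and input is expected in lowercase.

```python
import random

# Assumed custom alphabet (35 symbols, no "0") and the special characters.
SUBST_ALPHABET = "rq3gsalt6u1iyfzop572d49bnx8cvmkewhj"
SPECIAL_CHARS = "0_-."


def subst_encode(text: str) -> str:
    out = []
    for ch in text:
        pos = SPECIAL_CHARS.find(ch)
        if pos >= 0:
            # Escape marker, then a random character with index % 4 == pos.
            step = random.randrange(len(SUBST_ALPHABET) // 4)
            out.append("0" + SUBST_ALPHABET[pos + 4 * step])
        else:
            idx = SUBST_ALPHABET.index(ch)
            out.append(SUBST_ALPHABET[(idx + 4) % len(SUBST_ALPHABET)])
    return "".join(out)


def subst_decode(text: str) -> str:
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "0":
            nxt = next(chars, None)
            if nxt is None:  # truncated escape sequence
                break
            out.append(SPECIAL_CHARS[SUBST_ALPHABET.index(nxt) % 4])
        else:
            idx = SUBST_ALPHABET.index(ch)
            out.append(SUBST_ALPHABET[(idx - 4) % len(SUBST_ALPHABET)])
    return "".join(out)
```

The random choice means the same domain can encode to different strings, but all of them decode to the same value because only the index modulo 4 matters.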

CreateString() creates a 1-byte value that encodes the index of the DGA string. In the event that multiple DGA strings are required to fully represent the victim’s domain, this index numbering (I call it the “chunk index”, and it ranges from 0 to 35) helps the receiving end piece the domain back together. This is because the DGA string is capped at 32 bytes (excluding the fixed “.appsync-api…avsvmcloud.com”), so an encoded victim domain that is longer than 16 bytes has to be spread across multiple DGA strings (the first 16 bytes of each such DGA string are taken up by the victim’s GUID and the chunk index). A chunk index of 35 means that this DGA string is the last piece.
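As a sketch of how such an index character could be produced and recovered: I am assuming here, based on my reading of the public decoders, that the index is written as a base-36 digit (0-9, a-z) offset by the code point of the first character of the DGA string, and that it sits at the 16th position. The function names are my own.

```python
import string

BASE36_DIGITS = string.digits + string.ascii_lowercase  # "0".."9", "a".."z"
LAST_CHUNK = 35


def encode_chunk_index(index: int, first_char: str) -> str:
    """Assumed scheme: offset the index by the first character, modulo 36."""
    if not 0 <= index < 36:
        index = LAST_CHUNK
    return BASE36_DIGITS[(index + ord(first_char)) % 36]


def decode_chunk_index(dga: str) -> int:
    """Recover the index from the 16th character and the 1st character."""
    return (BASE36_DIGITS.index(dga[15]) - ord(dga[0])) % 36


# Hypothetical 16-character prefix: 15 filler characters plus the index character.
prefix = "p" + "x" * 14 + encode_chunk_index(35, "p")
print(decode_chunk_index(prefix))
```

Tying the index to the first character means the same chunk index shows up as a different character in different strings, which is why it is not obvious when just eyeballing the hostnames.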

To explain this concept of “index numbering”, let’s look at the output of my decoder. After decoding the victim GUID and the index, I was able to link up two DGA strings (with a bit of manual searching) that together make up one victim’s domain. A Chunk Index of “0” means it is the first piece, and a Chunk Index of “35” means it is the last. If there had been another piece in the middle, it would have index “1”.
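The manual searching can be automated once the GUID and chunk index of every string are known. Below is a minimal sketch that groups fragments by GUID and orders them by chunk index, with 35 placed last. The input format and the sample values are hypothetical, and it assumes the encoded pieces should be joined first and decoded afterwards.

```python
from collections import defaultdict

LAST_CHUNK = 35


def reassemble(fragments):
    """Join encoded domain pieces per victim GUID.

    fragments: iterable of (guid_hex, chunk_index, encoded_piece) tuples.
    Returns {guid_hex: joined_encoded_domain}. Completeness is not checked.
    """
    by_guid = defaultdict(dict)
    for guid, index, piece in fragments:
        by_guid[guid][index] = piece
    joined = {}
    for guid, pieces in by_guid.items():
        # Ascending chunk index, except that index 35 always goes last.
        order = sorted(pieces, key=lambda i: (i == LAST_CHUNK, i))
        joined[guid] = "".join(pieces[i] for i in order)
    return joined


# Hypothetical fragments, for illustration only.
sample = [
    ("aabbccddeeff0011", 35, "tail"),
    ("aabbccddeeff0011", 0, "head"),
    ("aabbccddeeff0011", 1, "middle"),
]
print(reassemble(sample))
```

A real tool would also want to flag victims whose first or last piece was never observed, since the joined string is then only a partial domain.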

Decoding the victim’s domain is possible because the malware generates this DGA string via GetPreviousString() and GetCurrentString(). The victim’s domain is encoded and included in the DGA string through the variables dnStr and dnStrLower. The decoders out there reverse the DGA string and decode dnStrLower to retrieve the victim’s domain.

This is what the DGA string looks like. I call this a “Type 1” DGA string.

Notice that in GetNextString() and GetNextStringEx(), dnStr and dnStrLower are not used. Instead, another function UpdateBuffer() is called.

As such, a different kind of DGA string is generated. I call this “Type 2” DGA string.

Within UpdateBuffer(), a 3-byte time value is calculated through GetStringHash(). The last two bytes of this time value are used as the XOR key to encode the 8-byte victim GUID. UpdateBuffer() returns an 11-byte value made up of the 8-byte encoded GUID and the 3-byte time/XOR key. If data is provided to the function (data that somehow describes the number of security tools’ processes present), then UpdateBuffer() returns a 13-byte value, with the additional 2 bytes appended after the time/XOR key.
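The layout described above can be sketched as a pair of Python helpers. Which of the two key bytes is applied to even and odd GUID positions is my assumption (last time byte for even positions, middle time byte for odd ones); the function names are mine.

```python
def build_type2_payload(guid: bytes, time_bytes: bytes, extra: bytes = b"") -> bytes:
    """Lay out: 8-byte XOR-encoded GUID | 3-byte time value | optional 2 bytes."""
    if len(guid) != 8 or len(time_bytes) != 3 or len(extra) not in (0, 2):
        raise ValueError("unexpected field length")
    # Assumed alternation: time_bytes[2] for even positions, time_bytes[1] for odd.
    encoded = bytes(b ^ time_bytes[2 - (i % 2)] for i, b in enumerate(guid))
    return encoded + time_bytes + extra


def split_type2_payload(payload: bytes):
    """Return (guid, time_bytes, extra) from an 11- or 13-byte payload."""
    if len(payload) not in (11, 13):
        raise ValueError("expected 11 or 13 bytes")
    time_bytes = payload[8:11]
    guid = bytes(b ^ time_bytes[2 - (i % 2)] for i, b in enumerate(payload[:8]))
    return guid, time_bytes, payload[11:]
```

Because the key is stored right next to the data it protects, recovering the GUID needs nothing beyond the payload itself.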

The value is then encoded through CreateSecureString(), which applies the Base32-like encoding. The resulting DGA value is a 20- or 23-byte string. Interesting. I can use this as a condition to identify this form of DGA string.
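The two lengths follow from simple arithmetic: one XOR key byte is prepended, and every output character carries 5 bits. A quick check in Python (the helper name is mine):

```python
def encoded_length(payload_bytes: int) -> int:
    """Characters produced for a payload once the 1-byte XOR key is prepended."""
    bits = (payload_bytes + 1) * 8
    return -(-bits // 5)  # ceiling division: 5 bits per character


print(encoded_length(11))  # 20
print(encoded_length(13))  # 23
print(encoded_length(8))   # 15
```

The last line is the Type 1 case: the 8-byte GUID encodes to 15 characters, and together with the chunk index character that gives the 16-byte prefix mentioned earlier.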

Alright, stop talking. Let’s decode!

I made 2 assumptions to differentiate between the two types of DGA strings.

- If the decoded chunk index is 0, the DGA string should not be shorter than 32 bytes, because if data is “overflowing” into another DGA string, then the first one should be filled up to the maximum length. If it is shorter, it could be a Type 2 DGA string.

- If the chunk index is successfully decoded to 35, then it is a Type 1 DGA string. (What are the chances of getting this exact value using the 16th byte and the 1st byte? I think low enough for this assumption to work.)

Along with the expected length for Type 2 DGA strings, I’m able to come up with the following if-else checks:
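A rough sketch of what such checks might look like is given here. It is my reading of the two assumptions plus the 20/23-byte length condition, and not necessarily identical to the checks in the script; the function name and the “unknown” bucket are my own additions.

```python
TYPE2_LENGTHS = (20, 23)
MAX_DGA_LENGTH = 32
LAST_CHUNK = 35


def classify_dga(dga: str, chunk_index: int) -> str:
    """Guess the DGA string type from its length and decoded chunk index."""
    if chunk_index == LAST_CHUNK:
        # Assumption 2: an index of 35 marks the last piece of a Type 1 string.
        return "type1"
    if len(dga) in TYPE2_LENGTHS:
        # Expected lengths of an encoded 11- or 13-byte Type 2 payload.
        return "type2"
    if chunk_index == 0 and len(dga) < MAX_DGA_LENGTH:
        # Assumption 1 is violated and the length does not fit Type 2 either.
        return "unknown"
    return "type1"
```

Strings that land in “unknown” are the ones worth a manual look, since neither assumption explains them.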

With the decoded Victim GUIDs from Type 1 and Type 2 DGA strings, we can identify the related DNS queries, and how many different machines are infected within the same domain.

Let’s see some examples. I worked with data from here: https://github.com/bambenek/research/blob/main/sunburst/uniq-hostnames.txt

If anyone has a list of all such DNS queries from within their network, I guess it would also be helpful to be able to decode Type 2 DGA strings to identify all the different affected machines.

Here’s the link to the script on Github:

https://github.com/asuna-amawaka/SUNBURST-Analysis

If anyone is keen to discuss (or point out where I can improve in the code), DM me on Twitter!

References:

[1] https://github.com/RedDrip7/SunBurst_DGA_Decode

[2] https://www.netresec.com/?page=Blog&month=2020-12&post=Reassembling-Victim-Domain-Fragments-from-SUNBURST-DNS

[3] https://github.com/2igosha/sunburst_dga

[4] https://securelist.com/sunburst-connecting-the-dots-in-the-dns-requests/99862/
