[QuickNote] Technical Analysis of recent Pikabot Core Module
In early February 2023, cybersecurity experts on Twitter issued a warning about a new malware variant/family being distributed by the #TA577 botnet (associated with the same group from #Qakbot). This malware shares similarities with the Qakbot Trojan, including distribution methods, campaigns, and behaviors. It was quickly nicknamed Pikabot.
Pikabot consists of two components: a loader/injector and a core module. The loader/injector decrypts and injects the core module. The core module then carries out the malicious behaviors: gathering information about the victim machine, connecting to the command and control (C2) server to receive and execute arbitrary commands, and downloading and injecting other malware.
Pikabot is continuously upgraded, employing various anti-analysis techniques and obfuscation methods to make it difficult for analysts to understand its behavior. In the rest of this article, I focus on analyzing the Pikabot core module, including:
- How Pikabot obfuscates and decrypts strings.
- How Pikabot retrieves API addresses.
- How Pikabot slows down the analysis process.
- How Pikabot generates victim uuid.
- Collecting information from the victim’s machine.
- How Pikabot decrypts C2 addresses.
- How Pikabot utilizes Syscall.
Sample hash: ce742b7cc94a5c668116d343b6a9677523dc13b358294bba3cd248fba8b880da
2. Decrypt string
In some older versions, to decode strings, Pikabot utilizes a XOR loop to decode encrypted data stored on the stack:
In recent versions of Pikabot, the process of decrypting strings has become more sophisticated.
- RC4 is used to decrypt encrypted data stored on the stack. Each encrypted blob has its own RC4 key.
- The RC4-decrypted string is converted to a valid Base64 string (by replacing the character ‘_’ with ‘=’) and then Base64-decoded.
- Finally, AES-CBC is used to decrypt the decoded data and recover the original string.
The AES key and AES IV used in this sample are also decrypted using RC4:
- Decrypted AES key: “dVOEz=/e/Xf=0WMiz6uR9cZKe+tyb+VJhSu+tfi0HzT2COoz25r4+8osEx4”
- Decrypted AES IV: “nsdA1ANUAH+K1XhVjnsg92tGMNQG=fsgrqJQ8AtZIacqaYg”
However, Pikabot only uses the first 32 bytes of the decrypted AES key and the first 16 bytes of the decrypted AES IV. Therefore, the final AES key and IV used for string decryption are:
- AES key: “dVOEz=/e/Xf=0WMiz6uR9cZKe+tyb+VJ”
- AES IV: “nsdA1ANUAH+K1XhV”
The entire process was simulated using CyberChef as follows:
Here is the CyberChef recipe:
https://gchq.github.io/CyberChef/#recipe=RC4(%7B'option':'Latin1','string':'currentContextId'%7D,'Hex','Latin1')Find_/_Replace(%7B'option':'Simple%20string','string':'_'%7D,'%3D',true,false,true,false)From_Base64('A-Za-z0-9%2B/%3D',true,false)To_Hex('Space',0)AES_Decrypt(%7B'option':'Latin1','string':'dVOEz%3D/e/Xf%3D0WMiz6uR9cZKe%2Btyb%2BVJ'%7D,%7B'option':'Latin1','string':'nsdA1ANUAH%2BK1XhV'%7D,'CBC','Hex','Raw',%7B'option':'Hex','string':''%7D,%7B'option':'Hex','string':''%7D)&input=NjAgOEUgRkUgMUIgQTYgNTkgRUUgNUEgIDA4IDkzIDc2IEY0IEEyIDVEIDFDIDI2IDc1IEQyIDMwIEFBIEM2IDM4IDdEIEEz
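The same chain (RC4, then ‘_’ to ‘=’, then Base64, then AES-CBC) can be sketched in Python. This is a minimal sketch, not the sample's code: the function names are my own, the AES step assumes the third-party PyCryptodome package is installed, and PKCS#7 padding is an assumption. The key, IV and encrypted bytes are the ones from the CyberChef recipe above.

```python
import base64


def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4 (KSA + PRGA); encryption and decryption are the same operation."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def to_valid_base64(text: str) -> str:
    """Undo the padding substitution: '_' stands in for '='."""
    return text.replace("_", "=")


def decrypt_string(blob: bytes, rc4_key: bytes, aes_key: bytes, aes_iv: bytes) -> bytes:
    """RC4 -> '_' to '=' -> Base64 decode -> AES-CBC decrypt."""
    # Assumption: PyCryptodome is available (pip install pycryptodome).
    from Crypto.Cipher import AES
    from Crypto.Util.Padding import unpad

    b64_text = to_valid_base64(rc4(rc4_key, blob).decode("latin-1"))
    ciphertext = base64.b64decode(b64_text)
    plaintext = AES.new(aes_key, AES.MODE_CBC, iv=aes_iv).decrypt(ciphertext)
    return unpad(plaintext, AES.block_size)  # assumes PKCS#7 padding


# Values taken from the CyberChef recipe above.
RC4_KEY = b"currentContextId"
AES_KEY = b"dVOEz=/e/Xf=0WMiz6uR9cZKe+tyb+VJ"  # first 32 bytes of the decrypted key
AES_IV = b"nsdA1ANUAH+K1XhV"                   # first 16 bytes of the decrypted IV
ENCRYPTED = bytes.fromhex("608EFE1BA659EE5A089376F4A25D1C2675D230AAC6387DA3")
```

Calling `decrypt_string(ENCRYPTED, RC4_KEY, AES_KEY, AES_IV)` should reproduce what the recipe shows, provided the padding assumption holds.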
3. Retrieve API address
To get the address of API functions, Pikabot does the following:
- It gets the base address of the corresponding DLL based on the decrypted input string.
- It decrypts the API function name, then uses GetProcAddress to obtain the real address of the API.
The function pkb_load_dll_based_on_input_str (0x41E657) has the following code graph:
In this function, Pikabot decrypts relevant strings and compares them to the string passed to the function. If the strings match, Pikabot decrypts the name of the corresponding DLL and loads it using LoadLibraryA. First, Pikabot finds the addresses of the GetProcAddress and LoadLibraryA functions using pre-calculated hash values.
The pseudo-code for calculating the hash of API functions is as follows:
Based on the pseudo-code above, we can rewrite it in Python and perform a brute-force to find the API function name corresponding to the pre-calculated hash values:
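A brute-force of this kind could be structured as in the sketch below. Note that the hash routine here (ror13_add) is a stand-in, the classic ROR13-add hash, and is NOT Pikabot's algorithm; to use the harness against the sample, the real routine from the pseudo-code has to be plugged in as hash_fn. The seed parameter, the helper names and the candidate list are also my own assumptions.

```python
from typing import Callable, Dict, Iterable


def ror13_add(name: str, seed: int = 0) -> int:
    """Stand-in hash (classic ROR13-add). NOT Pikabot's algorithm."""
    h = seed & 0xFFFFFFFF
    for ch in name.encode("ascii"):
        h = ((h >> 13) | (h << 19)) & 0xFFFFFFFF  # rotate right by 13 bits
        h = (h + ch) & 0xFFFFFFFF
    return h


def resolve_hashes(
    targets: Iterable[int],
    candidates: Iterable[str],
    hash_fn: Callable[[str], int],
) -> Dict[int, str]:
    """Map each target hash to the first candidate export name that produces it."""
    wanted = set(targets)
    found: Dict[int, str] = {}
    for name in candidates:
        h = hash_fn(name)
        if h in wanted and h not in found:
            found[h] = name
    return found


# Hypothetical candidate list; in practice this would be every export of the DLL.
exports = ["CreateFileW", "GetProcAddress", "LoadLibraryA", "VirtualAlloc"]
# Hypothetical targets, computed here with the stand-in hash for demonstration.
targets = [ror13_add("GetProcAddress"), ror13_add("LoadLibraryA")]
print(resolve_hashes(targets, exports, ror13_add))
```

Keeping the hash as a parameter means the same loop can be reused for other hash-based lookups in the sample once the real routine is known.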
With the API function addresses obtained above, Pikabot will load the corresponding DLL:
Here is the list of DLLs that Pikabot will load during execution:
The function pkb_get_api_addr_by_name_using_GetProcAddress (0x41E636) decrypts the API function name and calls GetProcAddress to retrieve the function address:
4. Slowing down the analysis process
In order to slow down the code analysis, Pikabot inserts a large number of meaningless junk functions into the execution flow. These functions typically do nothing. This can make it much more time-consuming for analysts to understand the code and identify its malicious behavior.
5. System language check
Pikabot checks the system language code of the victim’s machine before executing its main task by calling the API function GetUserDefaultLangID. In the previous version, if the result was a region code for a country such as Russia or Ukraine, the malware would exit immediately without any further activity.
However, in the version I am analyzing, Pikabot simply checks the return code: if it is different from 0x1, the function pkb_check_default_lang (0x0042F7A0) returns 0x0:
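For comparison, the older exit-on-language behavior could look roughly like the sketch below. It is an illustration only: I am assuming the check compares the primary language part of the LANGID returned by GetUserDefaultLangID, and the function names are hypothetical. The constants are the standard Windows primary language identifiers.

```python
LANG_RUSSIAN = 0x19    # primary language identifiers as defined in winnt.h
LANG_UKRAINIAN = 0x22


def primary_lang_id(lang_id: int) -> int:
    """Equivalent of the PRIMARYLANGID macro: the low 10 bits of a LANGID."""
    return lang_id & 0x3FF


def should_exit(lang_id: int) -> bool:
    """Assumed older behavior: stop when the user language is Russian or Ukrainian."""
    return primary_lang_id(lang_id) in (LANG_RUSSIAN, LANG_UKRAINIAN)
```

For example, 0x0419 (ru-RU) and 0x0422 (uk-UA) would trigger the exit, while 0x0409 (en-US) would not.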
6. Create Mutex
When the function pkb_check_default_lang (0x42F7A0) returns 0x0, Pikabot continues executing. The sample I am analyzing uses a hardcoded mutex name (after decryption), “{F0B9756B-5D50-4696-A969-4C9AF7B69188}”, to prevent reinfection of the victim’s machine.
7. Create victim uuid
After creating the mutex as described above, Pikabot creates the victim uuid using the function pkb_collect_victim_info_n_gen_victim_uuid (0x42E233). The code graph for this function is as follows:
The string is generated based on the information collected from the victim machine, including:
- Volume serial number, using the API function GetVolumeInformationW. This is a unique identifier assigned to each physical volume on a computer.
- Computer name, using the API function GetComputerNameW. This is the name of the computer that the malware is running on.
- User name, using the API function GetUserNameW. This is the name of the user who is currently logged on to the computer.
- OS product type, using the API function GetProductInfo.
The information collected above is formatted as “<computer_name>\<user_name>|<os_type>”. This string is then hashed using the algorithm mentioned in 3. Retrieve API address, with the hash value initialized to VolumeSerialNumber.
The hash value calculated for the collected information, along with VolumeSerialNumber, is then processed further by the function pkb_calc_hash_2 (0x42E123) below:
Finally, Pikabot uses the API function wsprintfW to format the uuid string with the format %07lX%09lX%lu:
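To make the two formats concrete, here is a small Python sketch. Python's printf-style formatting accepts the same specifiers (the l length modifier is ignored), so the wsprintfW format string can be reused as-is. Which value feeds which field is not stated above, so the three arguments and both function names are placeholders of mine, and passing os_type as text is an assumption.

```python
def build_info_string(computer_name: str, user_name: str, os_type: str) -> str:
    """Build "<computer_name>\\<user_name>|<os_type>" as described above."""
    return f"{computer_name}\\{user_name}|{os_type}"


def format_uuid(value_a: int, value_b: int, value_c: int) -> str:
    """Apply the %07lX%09lX%lu format to three 32-bit values.

    The arguments are placeholders: the mapping of hash values and
    VolumeSerialNumber to the three fields is not specified here.
    """
    mask = 0xFFFFFFFF  # wsprintfW treats 'l' arguments as 32-bit values
    return "%07lX%09lX%lu" % (value_a & mask, value_b & mask, value_c & mask)


# Hypothetical input values, for illustration only.
print(build_info_string("DESKTOP-01", "alice", "48"))
print(format_uuid(0x1A2B, 0xDEADBEEF, 12345))
```

The widths are minimums: a value with more hex digits than the field width is printed in full, so the uuid length is not strictly fixed.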
8. Collecting victim machine information
Before connecting to the C2 server, Pikabot collects some information about the victim machine. The function pkb_collect_victim_system_info (0x410E37) performs the following collection tasks:
- Retrieves the PEB and gathers operating system information (OSMajorVersion, OSMinorVersion, OSBuildNumber), and determines whether it is running on a 64-bit operating system through the API function IsWow64Process.
- Collects the operating system type using GetProductInfo.
- Gathers the computer name and username by calling GetComputerNameW and GetUserNameW.
- Collects CPU information using cpuid with the initial value EAX = 0x80000000.
- Obtains information about display devices on the machine through the API EnumDisplayDevicesW.
- Retrieves the RAM capacity of the victim’s machine using GlobalMemoryStatusEx.
- Gets the system uptime using the API function GetTickCount.
- Checks whether its process is running with admin privileges through GetCurrentProcess, OpenProcessToken, and GetTokenInformation.
- Retrieves the screen resolution using GetDesktopWindow and GetWindowRect.
- Collects the domain name using the API GetComputerNameExW with NameType set to ComputerNameDnsDomain.
- Gathers DomainControllerName and DomainControllerAddress using DsGetDcNameW. If no information is available, Pikabot sets the value to “unknown”.
Next, Pikabot decrypts the Pikabot version and stream values; in my sample these are “1.1.17-ghost” and “GG13TH@T@f0adda360d2b4ccda11468e026526576” respectively. Then, the victim information collected above is assembled into a JSON string with the following format:
{
"Xtt2VRnA": "%s",
"qleNiC": "%s",
"LPLLXuTl2": " Win %d.%d %d ",
"0RbIhQuDq": %s,
"6bw35n": "%s",
"FQkA0G": "%s",
"bFFqxURzx": "%s",
"a0xIcXZI": %d,
"LkLMKwP1": "%s",
"R8N3ujt": %d,
"2sIw0rUG": "%s",
"UTrXReY": "%s",
"YoViBQC": "%s",
"QeMM8": "%s",
"VLsFyV4d": "%s",
"EcZbr": %d,
"XKb5WP": %d
}
All information after being formatted into a JSON string will be encrypted. The encryption process is as follows:
- Call the function pkb_gen_random_chars (0x41BC4A) to generate the session key: aes_key (32 bytes) and aes_iv (16 bytes).
- Call the function pkb_gen_random_chars (0x41BC4A) again to generate 3 random characters, which are used as a marker. I will temporarily call it marker.
- Call the function pkb_aes_crypt_data (0x40A97A) to encrypt the JSON string with the generated aes_key and aes_iv.
- Call the function pkb_base64_encode (0x0040B4DD) to encode the encrypted data above.
- Then all information is stored in the following format: <marker (rand_3_chars)><aes_key (first 16 bytes)><aes_iv><encoded data><aes_key (last 16 bytes)>.
- Finally, use a loop to iterate through the entire buffer and replace the character ‘=’ with ‘_’.
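The packing steps listed above can be sketched as follows. This is a minimal illustration rather than the sample's code: the AES encryption itself is left out (the function takes already-encrypted bytes), and the alphabet used for the random characters is an assumption, since the character set of pkb_gen_random_chars is not described here.

```python
import base64
import secrets
import string

# Assumption: alphanumeric characters; the real character set is not stated.
ALPHABET = string.ascii_letters + string.digits


def random_chars(count: int) -> str:
    """Hypothetical stand-in for pkb_gen_random_chars."""
    return "".join(secrets.choice(ALPHABET) for _ in range(count))


def pack_message(ciphertext: bytes, aes_key: str, aes_iv: str, marker: str) -> str:
    """Lay out <marker><key[:16]><iv><base64 data><key[16:]> and swap '=' for '_'."""
    if len(marker) != 3 or len(aes_key) != 32 or len(aes_iv) != 16:
        raise ValueError("expected a 3-char marker, 32-char key and 16-char IV")
    encoded = base64.b64encode(ciphertext).decode("ascii")
    packed = marker + aes_key[:16] + aes_iv + encoded + aes_key[16:]
    return packed.replace("=", "_")


# Example with placeholder ciphertext (16 zero bytes instead of real AES output).
key, iv, marker = random_chars(32), random_chars(16), random_chars(3)
print(pack_message(bytes(16), key, iv, marker))
```

Splitting the key around the payload means a receiver must know the layout to recover the key, which is what the C2 decryption routine described later relies on.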
Here is the code flow:
9. Information gathering with other commands
In addition to the information collected as mentioned above, Pikabot also executes the following commands to gather additional information from the victim’s machine:
netstat.exe -aon
ipconfig.exe /all
whoami.exe /all
The results of these commands are also encrypted and stored in the same way as above. However, in the sample I am analyzing, this feature is configured as DISABLED.
10. Collect running processes
Pikabot calls the function pkb_enum_n_collect_all_running_processes (0x415BAF) to gather information about running processes on the victim’s machine using the API functions CreateToolhelp32Snapshot, Process32FirstW and Process32NextW. The code graph of this function is as follows:
The information collected will be compiled in the following format:
Then, the information will also be encrypted and encoded in the same way as described above:
11. Decrypt C2 configuration
The C2 addresses (IP and port) are decrypted by Pikabot during execution. First, Pikabot decrypts the encrypted C2 data using RC4; the decryption key in this sample is “threadId”:
Here is the result with CyberChef:
Then, Pikabot decrypts the character “&” and uses it as a delimiter to split the decrypted string above into Base64 substrings:
Result of the above process when debugged with x32dbg:
Next, Pikabot calls the function pkb_decrypt_data (0x41D07B) to decrypt the C2 address. The code graph of this function is as follows:
The entire decrypting process is as follows:
- Allocate buffers to store the AES key and IV.
- Convert the string to a valid Base64 string by replacing the character “_” with “=”.
- Discard the first 3 characters of the string, then take the next 16 characters (bytes) and store them in the buffer as the first part of the AES key.
- Take the next 16 characters (bytes) and store them in the buffer for use as the AES IV.
- Take the last 16 characters (bytes) as the second part of the AES key, and combine them with the first part to form the complete AES key.
- With the AES key and IV extracted, take the remaining string as the data to be decoded.
- Perform Base64 decoding.
- Use AES-CBC with the AES key and IV above to decrypt the final C2 data.
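The steps above can be sketched in Python as follows. This is a hedged sketch, not the sample's code: the function names are my own, the AES step assumes the third-party PyCryptodome package, and the decrypted bytes are returned without padding removal because the padding scheme is not stated here.

```python
import base64
from typing import List, Tuple


def unpack_blob(blob: str) -> Tuple[bytes, bytes, bytes]:
    """Split one packed entry into (aes_key, aes_iv, ciphertext)."""
    text = blob.replace("_", "=")  # restore valid Base64 padding
    body = text[3:]                # discard the 3-character marker
    key_head = body[:16]           # first part of the AES key
    iv = body[16:32]               # AES IV
    key_tail = body[-16:]          # second part of the AES key
    encoded = body[32:-16]         # Base64 data between IV and key tail
    key = (key_head + key_tail).encode("latin-1")
    return key, iv.encode("latin-1"), base64.b64decode(encoded)


def decrypt_entry(blob: str) -> bytes:
    """AES-CBC decrypt one entry; returns raw bytes (padding left in place)."""
    # Assumption: PyCryptodome is available (pip install pycryptodome).
    from Crypto.Cipher import AES

    key, iv, ciphertext = unpack_blob(blob)
    return AES.new(key, AES.MODE_CBC, iv=iv).decrypt(ciphertext)


def decrypt_all(rc4_decrypted_config: str) -> List[bytes]:
    """Split the RC4-decrypted configuration on '&' and decrypt each entry."""
    return [decrypt_entry(part) for part in rc4_decrypted_config.split("&") if part]
```

Only unpack_blob is needed to inspect the key material; decrypt_all expects the string obtained after the RC4 step with the “threadId” key.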
Pseudocode of the entire process is as follows:
Using CyberChef, we get the following results:
We can write a Python script to decrypt all the C2 addresses that Pikabot will use:
12. Pikabot uses Syscall
During the analysis, we will encounter the following functions:
The above function will perform the following tasks:
- Walk the loaded modules through the PEB and check whether the loaded DLL is ntdll.dll.
- If yes, find the API functions starting with “Zw” exported by ntdll.dll.
- Hash the functions found and store the results in the format: <calced_hash><api_func_RVA>.
- Sort the resulting table by function RVA in ascending order:
- Finally, compare the pre-calculated hash value against the table of calculated hash values above; if they are equal, return the function ID. This ID value is stored in the EAX register:
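The lookup described above can be modeled in a few lines of Python. This is a sketch under stated assumptions: I am assuming the returned ID is the entry's position in the RVA-sorted table, the export names and RVAs below are made up, and zlib.crc32 is only a stand-in for the sample's own hash routine.

```python
import zlib
from typing import Callable, Dict, List, Optional, Tuple


def stand_in_hash(name: str) -> int:
    """Stand-in hash (CRC32). NOT the algorithm used by Pikabot."""
    return zlib.crc32(name.encode("ascii"))


def build_syscall_table(
    exports: Dict[str, int], hash_fn: Callable[[str], int]
) -> List[Tuple[int, int]]:
    """Collect (hash, rva) for every Zw* export, sorted by RVA ascending."""
    entries = [(hash_fn(name), rva) for name, rva in exports.items()
               if name.startswith("Zw")]
    entries.sort(key=lambda entry: entry[1])
    return entries


def find_function_id(table: List[Tuple[int, int]], target_hash: int) -> Optional[int]:
    """Return the position of the matching hash in the sorted table, if any."""
    for index, (entry_hash, _rva) in enumerate(table):
        if entry_hash == target_hash:
            return index
    return None


# Hypothetical exports with made-up RVAs, for illustration only.
fake_exports = {
    "ZwClose": 0x3000,
    "ZwOpenProcess": 0x1000,
    "ZwAllocateVirtualMemory": 0x2000,
    "RtlInitUnicodeString": 0x0500,  # ignored: does not start with "Zw"
}
table = build_syscall_table(fake_exports, stand_in_hash)
print(find_function_id(table, stand_in_hash("ZwClose")))
```

The design works because the Zw* stubs are laid out in ntdll.dll in order of their system call numbers, so sorting by address lets the position be recovered without reading the stub bytes.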
Based on the hash algorithm, we can find out the API functions that Pikabot will use as follows:
End.
m4n0w4r