Defeating Windows ASLR via low-entropy shared libraries in 2 hours

Maksim Shudrak
May 7, 2020


Disclaimer: Opinions expressed are solely my own and do not express the views or opinions of my current employer or any former employers.

Introduction

Address Space Layout Randomization (ASLR) is one of the most important security techniques in all modern mobile and desktop operating systems. Introduced in 2001, ASLR is a memory-protection technique that hinders the exploitation of memory-corruption issues by randomizing the base addresses of executables, shared libraries, the stack, and the heap.

ASLR makes sure that even if an attacker manages to hijack the control flow of a program by exploiting a memory-corruption issue, he or she does not know the correct address of the first ROP-gadget, system function, or shellcode. This protects against ROP-chains, return-to-libc, and other memory-corruption exploitation techniques.

ASLR, like any other security technique, has its own weaknesses, and the security community has invented a number of attack vectors against it over the last decade (e.g. heap spray, offset2libc, Jump Over ASLR, and others). Of course, even a single memory disclosure can completely defeat ASLR and give an attacker a significant advantage.

Moreover, ASLR is only effective when all executables and shared libraries loaded in the address space of a process are randomized. One shared library or executable loaded at a predictable address is enough for an attacker to write a successful exploit.

Continuing with this problem, I want to discuss one issue in the ASLR implementation on modern Windows that is often not well understood. In particular, we will see that even one weak shared library linked with a 64-bit application can significantly reduce the effectiveness of ASLR. We will discuss how ASLR is implemented on Windows, determine which bits are randomized, and experimentally compare how fast an attacker can break 32-bit and 64-bit ASLR on modern Windows.

Problem Description

This research started with an article from FireEye called Six Facts About Address Space Layout Randomization on Windows. The author did a great job describing and summarizing various facts about ASLR implementation on this OS. More specifically:

Fact 3: Recompiling a 32-bit program to a 64-bit one makes ASLR more effective

Fact 2: Windows loads multiple instances of images at the same location across processes and even across users; only rebooting can guarantee a fresh random base address for all images (Figure 1).

Figure 1. ntdll.dll has the same base address in two different instances of Notepad

These two facts resonated strongly with my experience of exploiting Windows buffer overflows and with an internal vulnerability research project in which I found a 0-day bug and achieved RCE in a product linked with a shared library compiled with weak ASLR settings (a story for another article). The product was built with ASLR enabled for the main executable and all shared libraries. However, low entropy and on-boot randomization made it possible to brute-force the base address of the shared library relatively quickly and successfully execute my ROP-chain. Let’s try to understand why this is possible.

First of all, we have to discuss conditions that a target application should meet for this type of exploitation approach:

  1. Application is developed for Windows OS (we are talking about 64-bit modern Windows 10 OS).
  2. Application is a 32-bit process OR there is at least one shared library compiled without the /HIGHENTROPYVA and /LARGEADDRESSAWARE flags loaded by a target 64-bit process (yes, such a library will have 32-bit low-entropy ASLR). The application or library should be suitable for exploit development (e.g. large enough to craft a ROP-chain).
  3. The application is automatically restarted after the crash. While it might sound like a serious limitation, in practice we can find a lot of such types of products. Especially if we are talking about server-side applications. Another example would be an application that spawns a child process to handle certain tasks and we are exploiting a vulnerability in the child process. In this case, a crash in the child does not lead to a crash in the parent.

If all 3 aforementioned conditions are met, ASLR can be bypassed by brute-forcing the base address of a low-entropy DLL (either a 32-bit DLL or a 64-bit DLL compiled without /HIGHENTROPYVA and /LARGEADDRESSAWARE). You can check whether those flags are enabled by running the dumpbin /HEADERS command and searching for the High Entropy Virtual Addresses string in the DLL characteristics. If it is not present, the DLL has low-entropy ASLR.

Of course, this was known before and Microsoft does not hide it, but I believe additional clarification and examples are needed here so that people can better understand the potential risks and the time frame required to achieve a bypass.
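The same check can be scripted without dumpbin. The following is a minimal sketch, not the tooling used for this article: it reads the DllCharacteristics field directly from the PE optional header and tests the IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA (0x0020) and IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE (0x0040) bits defined in the PE/COFF format. The function names are hypothetical, and the code assumes a well-formed PE image.

```python
import struct

# Bit values of the DllCharacteristics field (PE/COFF format).
IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA = 0x0020
IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040


def read_dll_characteristics(pe_bytes):
    """Return the raw DllCharacteristics value of a PE image given as bytes."""
    if pe_bytes[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (offset 0x3C) holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", pe_bytes, 0x3C)
    if pe_bytes[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # Skip the 4-byte signature and the 20-byte COFF file header.
    optional_header = pe_offset + 4 + 20
    # DllCharacteristics sits at offset 70 in both PE32 and PE32+ headers.
    (value,) = struct.unpack_from("<H", pe_bytes, optional_header + 70)
    return value


def aslr_flags(pe_bytes):
    """Report whether the image opts in to ASLR and to high-entropy ASLR."""
    value = read_dll_characteristics(pe_bytes)
    return {
        "dynamic_base": bool(value & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE),
        "high_entropy_va": bool(value & IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA),
    }
```

A DLL could be checked by passing the contents of the file, for example aslr_flags(open(path, "rb").read()), where path is whichever library you want to inspect. A result with dynamic_base set and high_entropy_va clear corresponds to the low-entropy case discussed here.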

Exploring Randomized Bits

Based on Microsoft’s article, a 32-bit DLL has only 14 bits of entropy (that is, 14 bits of its base address are randomized) and a 32-bit EXE has 8 bits. In contrast, a 64-bit executable and a 64-bit DLL (compiled with /HIGHENTROPYVA and /LARGEADDRESSAWARE) have 17 and 19 bits of entropy respectively.

Thus, the base address of a low-entropy DLL can be correctly predicted in no more than 16,384 (2^14) attempts and that of an EXE in just 256 (2^8) attempts, which makes a brute-force attack more than feasible. Let’s try to verify this and find out exactly which bits are randomized on the latest Windows 10 with all updates as of 4/29/2020.

We can develop a simple program that writes its own base address and the base addresses of all loaded libraries to a file (Listing 1). We need to compile the application with /DYNAMICBASE.
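The arithmetic behind these numbers is simply 2 raised to the number of randomized bits. The short sketch below tabulates the upper bound for each image type using the entropy figures quoted above; the function and dictionary names are illustrative, and the bound assumes that every candidate base address is tried exactly once.

```python
def worst_case_attempts(entropy_bits):
    """Upper bound on guesses when `entropy_bits` bits of the base are random."""
    return 2 ** entropy_bits


# Entropy figures quoted above from Microsoft's article.
ENTROPY_BITS = {
    "32-bit EXE": 8,
    "32-bit DLL": 14,
    "64-bit EXE (high entropy)": 17,
    "64-bit DLL (high entropy)": 19,
}

if __name__ == "__main__":
    for image, bits in ENTROPY_BITS.items():
        print(f"{image}: {bits} bits -> up to {worst_case_attempts(bits):,} attempts")
```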

Listing 1. Source code of a program that prints the base address of all modules and shared libraries

Then, we can restart the OS, run our program again, and compare the outputs. We can do this hundreds of times to have more data for analysis (Listing 2).

Moreover, we can build both a 32-bit version (to test low-entropy ASLR) and a 64-bit version (with /HIGHENTROPYVA and /LARGEADDRESSAWARE enabled, to test high-entropy ASLR) and compare the results. Of course, there is another (better) way to understand Windows ASLR: reverse-engineering the code of the PE loader. However, I decided to choose the easiest approach.

Listing 2. A Python script to execute both versions of TestASLR.exe, save results, and restart the OS. The script was placed in Computer\HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Run to execute it automatically after restart.

In 15 hours that I gave the script, it managed to obtain results from 783 executions. After that, I implemented another script to find and print the position of all bits that were volatile (see the project’s GitHub for more details). Figure 2 shows the final results for each module.

Figure 2. Position of bits (in square brackets) and number of bits that were randomized after 783 reboots
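The bit analysis itself is straightforward. The sketch below is an assumption about how such a script can work, not the script from the project’s GitHub: it XORs every sampled base address against the first sample and ORs the differences together, so any bit that ever changed ends up set in the mask. The function name and the first sample address are made up for illustration; 0x1180000 is the base address mentioned in this article.

```python
def volatile_bits(base_addresses):
    """Return sorted positions of bits that differ between any two samples."""
    if not base_addresses:
        return []
    reference = base_addresses[0]
    changed_mask = 0
    for address in base_addresses:
        # A bit differs somewhere in the set iff it differs from the reference.
        changed_mask |= address ^ reference
    return [bit for bit in range(changed_mask.bit_length())
            if (changed_mask >> bit) & 1]


if __name__ == "__main__":
    # 0x00A80000 is a hypothetical sample; 0x01180000 appears in the raw data.
    samples = [0x00A80000, 0x01180000]
    print(volatile_bits(samples))  # [20, 21, 23, 24]
```

With hundreds of samples per module, the length of the returned list is the number of randomized bits observed for that module.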

As you can see, the 64-bit experimental results match what Microsoft describes. However, the results of the 32-bit experiment are different. In particular, 9 bits are randomized in TestASLR.exe instead of the expected 8 bits, and 9–11 bits are randomized in shared libraries instead of the expected 14 bits.

As for the main executable, there were 4 cases in which bit 24 was randomized as well. All 4 times, the binary was loaded at the base address 0x1180000 (see raw data on GitHub for details). This is quite an interesting finding for me, and additional research is required to understand why it happened.

However, in terms of brute-forcing, it doesn’t significantly change our strategy. We have to start with a low base address and move continuously toward higher addresses. As for low-entropy DLLs, it seems that 14 bits is the maximum number of bits that can be randomized. Additionally, the number of randomized bits seems to depend on when the DLL was loaded during binary execution.

Brute-Forcing Base Address

Now, armed with this knowledge, we can try to understand how long it actually takes to correctly predict some address in a shared library. Let’s try to predict the address of the function CreateProcessA from kernel32.dll assuming an attacker wants to execute a return-to-libc style attack.

To simulate this, let’s write a simple TCP/IP server (Listing 3) that receives an address from a client script (Listing 6), compares it with the actual address of CreateProcessA, and exits. If the address is correct, the program returns 2; otherwise it returns 0. The source code of the TCP/IP server was taken from here.

Listing 3. Receive buffer from a client and compare provided value with the actual address of CreateProcessA

The code to obtain an address of CreateProcessA in kernel32.dll is just two lines of code (Listing 4).

Listing 4. Obtaining an address of CreateProcessA using GetProcAddress
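To make the server’s role concrete, here is a minimal Python sketch of the same idea. It is not the article’s C server: the wire format (one 8-byte little-endian address), the port number, and all function names are assumptions, and the real address is passed in as a parameter instead of being resolved with GetProcAddress.

```python
import socket
import struct

EXIT_HIT = 2   # return code used in this article for a correct guess
EXIT_MISS = 0  # return code for a wrong guess


def receive_exact(conn, size):
    """Read exactly `size` bytes from a connected socket."""
    data = b""
    while len(data) < size:
        chunk = conn.recv(size - len(data))
        if not chunk:
            raise ConnectionError("client closed the connection early")
        data += chunk
    return data


def check_guess(conn, actual_address):
    """Read one 8-byte little-endian address and compare it with the real one."""
    (guess,) = struct.unpack("<Q", receive_exact(conn, 8))
    return EXIT_HIT if guess == actual_address else EXIT_MISS


def serve_once(actual_address, host="127.0.0.1", port=27015):
    """Accept a single client, check its guess, and return the exit code."""
    with socket.create_server((host, port)) as listener:
        conn, _ = listener.accept()
        with conn:
            return check_guess(conn, actual_address)
```

A wrapper would call sys.exit(serve_once(address)) so that the process terminates after every guess, which is what emulates the crash.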

Since our server exits after receiving the address (emulating a crash), we have to restart it in a loop. The script in Listing 5 restarts it in an infinite loop and stops only if ASLRServer returns 2, which means the address was correctly predicted.

Listing 5. Start 32-bit or 64-bit version of ASLRServer and check results. Stop execution and print run time duration if return code equals 2
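A restart loop of this kind can be sketched as follows. This is an illustration under stated assumptions, not the script from the repository: the helper names are hypothetical, and the process launcher is passed in as a callable so that the loop can be exercised without a real server binary.

```python
import subprocess
import time


def make_runner(command):
    """Return a callable that runs `command` once and yields its exit code."""
    return lambda: subprocess.run(command).returncode


def restart_until_hit(run_once, success_code=2, max_restarts=None):
    """Call `run_once` repeatedly until it returns `success_code`.

    Returns (number_of_runs, elapsed_seconds); number_of_runs is None if
    `max_restarts` was reached without a hit.
    """
    started = time.monotonic()
    restarts = 0
    while max_restarts is None or restarts < max_restarts:
        restarts += 1
        if run_once() == success_code:
            return restarts, time.monotonic() - started
    return None, time.monotonic() - started
```

For example, restart_until_hit(make_runner(["ASLRServer.exe"])) would keep relaunching the server until it exits with code 2; the executable path depends on where the server was built.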

Now, we need to implement a client that will brute-force an address. As mentioned above, we will start with lower addresses and increase them continuously until we predict the right address of CreateProcessA.

First, we have to find the offset of CreateProcessA within kernel32.dll for both versions, located at C:\Windows\SysWOW64 (32-bit) and C:\Windows\System32 (64-bit) respectively. This can easily be done using any available disassembler (e.g. IDA). Then we need to generate a new base address and add the CreateProcessA offset (Listing 6). We need to increase the base address in a loop until we reach the maximum base address that can be assigned to a DLL. The full 32-bit and 64-bit versions of this script can be found here.

Listing 6. Brute-forcing the address of CreateProcessA in a 32-bit DLL. Final_addr is what we send to the server

Now we are ready to launch all 4 programs and evaluate the time required to brute-force the address of CreateProcessA for both versions of kernel32.dll (Figure 4).
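The candidate generation can be sketched as a simple generator. The code below is a hedged illustration, not the repository script: it assumes that image bases are aligned to the 64 KB Windows allocation granularity, and the range bounds and function offset in the test values are placeholders, not the real kernel32.dll values.

```python
ALLOCATION_GRANULARITY = 0x10000  # assumed 64 KB alignment of image bases


def candidate_addresses(lowest_base, highest_base, function_offset,
                        step=ALLOCATION_GRANULARITY):
    """Yield base + offset for every aligned base from lowest to highest."""
    for base in range(lowest_base, highest_base + 1, step):
        yield base + function_offset
```

Each yielded value plays the role of Final_addr: the client sends it to the server, waits for the server to be restarted, and moves on to the next candidate.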

Figure 4. The process of brute-forcing both versions of ASLRServer running on a Windows VM. The scripts are running on a separate machine and send address via the network.

After 1h 56m 14s and 13,862 attempts, the 32-bit script managed to correctly predict the address of CreateProcessA. After a long 2d 8h 54m 4s and 407,399 attempts, the 64-bit script successfully finished brute-forcing as well (Figure 5).

Figure 5. The time difference for predicting the right address in 32 and 64 bit DLL

Of course, brute-forcing a 32-bit executable is much faster than brute-forcing a DLL; as discussed above, it can be done in no more than 256 attempts (in most cases). However, despite taking roughly 30 times longer, brute-forcing a 64-bit DLL is also feasible if the application is not specifically monitored for crashes by a system administrator or security team. Also, the brute-force time for a particular application depends heavily on how fast it can be restarted.
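The measured numbers make the last point concrete. The sketch below only redoes the arithmetic on the durations and attempt counts reported above (the helper names are illustrative): both runs cost about half a second per attempt, so the gap of roughly 30 times comes almost entirely from the number of attempts, and a faster restart would shrink both totals proportionally.

```python
def to_seconds(days=0, hours=0, minutes=0, seconds=0):
    """Convert a d/h/m/s duration to seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Durations and attempt counts reported in this article.
RUNS = {
    "32-bit": {"seconds": to_seconds(hours=1, minutes=56, seconds=14),
               "attempts": 13862},
    "64-bit": {"seconds": to_seconds(days=2, hours=8, minutes=54, seconds=4),
               "attempts": 407399},
}


def seconds_per_attempt(run):
    """Average wall-clock cost of one guess, including the server restart."""
    return run["seconds"] / run["attempts"]


def slowdown(slow, fast):
    """How many times longer the slow run took than the fast one."""
    return slow["seconds"] / fast["seconds"]


if __name__ == "__main__":
    for name, run in RUNS.items():
        print(f"{name}: {seconds_per_attempt(run):.3f} s per attempt")
    print(f"slowdown: {slowdown(RUNS['64-bit'], RUNS['32-bit']):.1f}x")
```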

Conclusion

As demonstrated in this article, the ASLR implementation on Windows has important nuances and in some situations can introduce additional risk for an application, especially if the target is a 32-bit program or is linked with a library compiled without the /HIGHENTROPYVA and /LARGEADDRESSAWARE flags. While the best solution would be per-execution randomization, as is done in Linux and modern macOS, a good step is to move away from 32-bit to 64-bit applications and to avoid linking with shared libraries compiled without the /HIGHENTROPYVA and /LARGEADDRESSAWARE flags. This would significantly increase the complexity of an attack.

I hope this was a helpful article. If you see any inaccuracies, please do not hesitate to ping me at https://twitter.com/MShudrak. All source code and raw results can be found at https://github.com/mxmssh/ASLR_bruteforce.

Update: The article was updated on 05/08/2020 after Joseph Bialek’s tweet https://twitter.com/JosephBialek/status/1258765599120912384. The original article incorrectly stated that a 64-bit application can load 32-bit DLLs, which is not true. Instead, the risk arises when a 64-bit application loads a 64-bit library compiled without the /HIGHENTROPYVA flag set.
