Analyze Binaries in Ghidra to Write Shell Payload for Windows Systems

Dennis Chow
May 29 · 13 min read

In this article, we’ll go over some example C code that is Windows x86 compatible and analyze the compiled binaries in Ghidra, so you can build or improve your shell code creation skills by crafting the payload first. Practicing malware analysis and reverse engineering can help penetration testers improve their evasion techniques and achieve command execution on Windows systems without relying on Linux (or ported) tools. We’ll examine samples using native Windows libraries and compilers, a C-based shell payload, and Metasploit (MSFvenom) payloads for Windows. Are you ready? Let’s dive right in!

Disclaimer: The methods, code examples, and techniques mentioned throughout this article are for educational purposes only. All code or compiled binaries are provided ‘as is’ with no express warranty.

Feel free to download and install these tools and follow along in the article to practice your win32-ninja payload coding skills with us.

Tools in Use:

Writing your first Win32 Compatible Shell Payload in C

Many cyber security professionals (including myself) aren’t experts in shell code creation or the ancient C language. So when we do pen testing engagements, our go-to tool for shell payloads is almost always Metasploit (specifically MSFvenom), Veil, or some other C2 framework (in a post-Empire world) that generates the desired shell code for you. But these solutions, like any pre-made template, aren’t always perfect, and many vanilla payloads are caught by endpoint security solutions.

So why not write our own? Many tutorials you see focus on compiling or writing payloads for Linux. If we’re compromising and pivoting between Windows systems, we need to step it up. So let’s get our first C-code template ready to go down below:
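
The original listing did not survive in this copy of the article. As a minimal sketch, assuming the template did little more than declare char command[100] = "calc.exe"; and pass it to system() (both are mentioned later in the article), it might look something like the following. The helper name run_command and the length check are illustrative additions, not part of the original.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies the command into a fixed 100-byte buffer
 * (mirroring `char command[100]`) and hands it to system().
 * Returns -1 without running anything if the input is missing or too long. */
int run_command(const char *cmd)
{
    char command[100] = "";                 /* remainder is zero-filled */

    if (cmd == NULL || strlen(cmd) >= sizeof command)
        return -1;                          /* would not fit in the buffer */

    memcpy(command, cmd, strlen(cmd) + 1);  /* copy including the '\0' */
    return system(command);                 /* run via the command interpreter */
}

/* Usage, e.g. from main():
 *     return run_command("calc.exe");
 */
```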

In the above you see one of the simplest ‘cmd only’ forms of shell payload. It’s not a full shell, but it’s a starter template that uses only the standard C library to execute an external command. Because system() hands the string to the command interpreter, a bare name such as calc.exe is resolved through the normal search path, which includes the Windows System32 directory. It’s quite obvious what happens in the above snippet.

Note: This payload is detected as ‘malware’ by Chrome and Google Drive services. At the time of this writing, Windows Defender on Windows 10 does not flag the compiled binary.

Image for post

You might be wondering “who cares?” about the above template. It serves as a base that we can compile to a binary, giving us a very simple way to begin reverse engineering a standard portable executable and to get you comfortable navigating Ghidra: finding functions and tracing references and variables in the decompiler window, as we’ll see coming up.

Compile Your C Shell Payload in Windows

In our example template, we’ll use Visual Studio (VS) since it’s got nice colors and a GUI to make it easy to showcase. You can also use the common ‘MSBuild’ method by including the C file in an XML template and compiling it with MSBuild, which is natively available on most workstations. But let’s use VS because I have screenshots.

Image for post

Create a new project for a Win32 Console application. Ensure you’ve got the C++ extension installed. After the default solution files are generated, right click in Solution Explorer and add a new source file. Instead of giving it the C++ extension (.cpp), give it the C extension (.c). Visual Studio will compile the file as C or C++ based on its extension.

Now paste your C source code. If you try to build the solution, you may receive an error about the main function already being defined in another file. You’ll have to remove (disable) the original default C/C++ file from the project solution so the compiler knows you won’t use it, as shown below:

Image for post

Now that we’ve prepped the environment, you can compile and build the solution. If you hit ‘Ctrl+F5’ it’ll execute the binary as well and you’ll see calc.exe pop up. If you’re like me, you may have started ‘modifying’ the shell code by adding different enhancements to try to make it more useful. Note that classic insecure functions with potential buffer overflow conditions will show up as errors or warnings and can prevent the build. You’ll need to define preprocessor macros that tell the compiler to ignore these, as in the image below:

Image for post

To do so, right click on your C file in Solution Explorer, open its properties, and go to the ‘Preprocessor’ section of the configuration properties. Edit the preprocessor definitions and add the (case sensitive) warning bypass macros to the definitions field. Copy and paste the snippet below if you are running into this issue:
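
The exact macros from the original snippet are not preserved in this copy. One macro that Microsoft documents for this purpose is _CRT_SECURE_NO_WARNINGS; whether it matches the original snippet is an assumption. As a rough sketch, it can also be defined at the top of the source file instead of in the project properties, as long as it appears before any C runtime header is included. The helper build_command below is hypothetical and only exists to exercise the kind of ‘classic’ functions that trigger the warning.

```c
/* Must come before any CRT header. Harmless on non-MSVC compilers. */
#define _CRT_SECURE_NO_WARNINGS

#include <stddef.h>
#include <string.h>

/* Hypothetical helper: joins a program name and its arguments with a space
 * using strcpy/strcat, which MSVC would otherwise flag as deprecated.
 * Returns the resulting length, or 0 if the output buffer is too small. */
size_t build_command(char *out, size_t out_size,
                     const char *program, const char *args)
{
    /* program + space + args + terminating '\0' must fit */
    if (strlen(program) + 1 + strlen(args) + 1 > out_size)
        return 0;

    strcpy(out, program);
    strcat(out, " ");
    strcat(out, args);
    return strlen(out);
}
```

The explicit size check is what actually keeps the copy safe here; the macro only silences the diagnostic.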

Congratulations, you’ve compiled your first Win32 C-source shell code. Let’s analyze the binary in Ghidra so you can get a feel for what the decompiled code looks like when you DON’T have the source code. This is useful, for example, if you managed to isolate a sweet piece of malware that did not get detected and you want to mimic its TTPs.

Using Ghidra to Search and Decompile

In this section, we’re going to import the binary into Ghidra and start exploring the structure of our original C shell code so we can identify the main() function, where variables are loaded, and the system() execution, and compare it all to our source code.

When you first load the binary into Ghidra, you’ll want to use the default ‘automatic analysis’ settings (select YES) before you get to the main screen. From the main view, find the functions/symbols panel on the left and locate the ‘entry’ point, since you often won’t see the main() function properly parsed.

Image for post

Find the ‘entry’ icon in the symbol tree (functions) menu in the bottom left pane and click on it. Your main code viewer window will jump to the entry point, where the program starts executing on its way to your main function.

Image for post

Also note that the decompiler pane to the right shows a familiar looking main() structure, labeled as a function with an auto-generated name based on its memory address and ending in the return. We’ll rename this to main() by right clicking on the function label. This denotes the main structure of our code.

Image for post

You can further explore other functions potentially called from DLL imports, as well as non-obfuscated strings, by examining the symbol tree (if the binary wasn’t already stripped). Since we already have our source code, let’s look for ‘calc.exe’, since we know that’s what we executed in the payload. *Yes, I know: You don’t have that luxury when examining other pieces of malware or binary shell payloads; we’ll examine how to trace and map functions more effectively in our upcoming examples. Hold Tight!*

Image for post

Double clicking on the location address of ‘calc’ in the string search window will jump us to the data segment (DS) section. Highlighted in yellow on the right, we also see local variables in the decompiler being listed and pushed onto the stack. If you look carefully, you can confirm the binary targets the Win32 x86 Intel architecture, as the bytes are stored in little-endian order. Each value is also 4 bytes (32 bits) wide, which Windows calls a DWORD, as validated by converting the hex below:

Image for post
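
To make the byte order concrete, here is a small sketch (not taken from the original article) that packs four characters into a 32-bit value the way a little-endian x86 CPU reads them from memory. The function name dword_from_le_bytes is made up for illustration.

```c
#include <stdint.h>

/* Packs four bytes as a little-endian DWORD: the first byte in memory
 * becomes the least significant byte of the 32-bit value. */
uint32_t dword_from_le_bytes(const unsigned char b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

With this, the bytes of "calc" (63 61 6c 63) become 0x636c6163 and the bytes of ".exe" (2e 65 78 65) become 0x6578652e. Read from the most significant byte down, the first constant spells "clac", which is why strings loaded as DWORD constants look reversed in a listing.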

The careful analyst will also observe the various system call related functions surrounding our string pushed onto the stack. Let’s take a closer look:

Image for post

What you see above are the subroutines called from the main function. We see the data ‘calc.exe’ being pushed onto the stack frame and set in memory by ‘memset()’, followed by our famous ‘system()’ call. Since this is an imperfect decompile of the C code, we reference the library documentation to see what the actual structure of the source might look like (if we didn’t already have the source code):

In the above example from the GeeksforGeeks.org site, we see that memset was explicitly called, but in our code it was only implied by declaring and initializing the variable: char command[100] = “calc.exe”;. According to the documentation, memset fills a block of memory with a given byte value, which fits the compiler zero-filling the unused remainder of our 100-byte buffer. Now let’s get to the actual ‘evil’ execution of our intended payload via system().
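
C guarantees that the elements of a partially initialized array that the initializer does not cover are set to zero. A compiler may implement that with a memset-style fill, which is a plausible explanation for the memset() in the decompiler output (an assumption, since the generated code depends on the compiler and its optimization settings). The sketch below, with a made-up function name, compares the implicit form against an explicit memset plus copy.

```c
#include <string.h>

/* Returns 1 if the implicitly initialized buffer is byte-for-byte identical
 * to one built with an explicit memset() followed by a copy. */
int implicit_matches_explicit(void)
{
    char implicit_buf[100] = "calc.exe";   /* rest of the array is zeroed */
    char explicit_buf[100];

    memset(explicit_buf, 0, sizeof explicit_buf);         /* zero all 100 bytes */
    memcpy(explicit_buf, "calc.exe", sizeof "calc.exe");  /* 9 bytes incl. '\0' */

    return memcmp(implicit_buf, explicit_buf, sizeof implicit_buf) == 0;
}
```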

In the above, the external reference also shows how system() is used: it simply takes a string argument, and we see our standard return statement from the routine.

Examining a more complex sample (reverse_tcp_shell)

In this next example, we will examine a slightly modified version of a reverse TCP shell payload written in C and designed to compile and run natively for Win32. With vanilla msfvenom and other non-tuned payloads, the endpoint (antivirus) may detect such shell code. The original shell.c is found here, where there are a few typos that needed adjusting. The author is “Yahav N. Hoffmann”; the code was written in 2016, yet the compiled binary still didn’t trigger an alert from my Windows Defender in May of 2020 on Windows 10! Amazing what custom shell coding can do. Some of his methods are similarly demonstrated in another piece of shell code authored by ‘paranoidninja’, located here (if you wish to read more on the evasion techniques).

Image for post

And yes, the code works as shown in my PCAP runtime below:

Image for post

For now, use the one I have hosted on my GitHub, which has the syntax corrected so it compiles easily for the purposes of this demonstration. Here’s our code template to reference from:

That looks exciting: we have attempted ‘cmd.exe’ evasion by splitting the string into an array of fragments and concatenating them later with format strings; we also have if/else branching conditions and CLI arguments we can process. Now that we’ve examined the code, what does it look like under a decompiler, assuming we don’t have the source?
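
The referenced listing is not reproduced here, so as a rough sketch of just the string-joining idea: the fragments are stored separately and assembled at run time with a "%s%s%s%s" format string. The particular split into "cm", "d.", "ex", "e" and the function name join_fragments are assumptions for illustration, not the original author’s code.

```c
#include <stddef.h>
#include <stdio.h>

/* Assembles a string from four fragments with a "%s%s%s%s" format.
 * Returns the length written, or -1 if the result did not fit in `out`. */
int join_fragments(char *out, size_t out_size)
{
    const char *parts[4] = { "cm", "d.", "ex", "e" };  /* hypothetical split */
    int n = snprintf(out, out_size, "%s%s%s%s",
                     parts[0], parts[1], parts[2], parts[3]);

    /* snprintf reports the length it wanted to write; detect truncation */
    return (n >= 0 && (size_t)n < out_size) ? n : -1;
}
```

For the analyst, the consequence is that a plain string search will not find the assembled name; what shows up instead is the format string and the short fragments, which is a clue worth following.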

Open up Ghidra again and let’s hit up the entry point, identify the main() function, and begin tracing our functions down the rabbit hole. *Don’t worry, we won’t be ‘cheating’ with specific string searches. We will examine the symbols and strings for any interesting keywords though.*

Image for post

Wow, that’s a lot of unique strings and good information about the variables and function uses in the data segment (DS) of our binary. Do you recognize the famous “%s%s%s%s”, laid out in sets of 4 bytes (32 bits)? I hope you do! But let’s get more realistic and start thinking about how we can examine the decompiled pseudo code in Ghidra. Let’s open up the function graph window, similar to what you get with the space bar in IDA Pro or the graph view in x64dbg.

Image for post

If you do a side-by-side comparison, you can see the conditional statements and potential loops from our source code in the graph view. This will help you determine where subroutines might be used and lets you focus on the true/false conditions you’ll want to investigate when creating a fork or a patch of some C code. Another tool we can use is ‘references’, commonly known as xrefs (the ‘x’ hot key in IDA).

This lets you map functions or variables that have been called or mentioned in other functions or portions of the code. As you can imagine, jumping in and out of various functions can get very complex fast! So for a compiled, pure shell code payload, it’s best to start top down from the original entry point (OEP) and main() function and dig into what would likely be used, such as system calls and socket creation. The great thing about Ghidra is that it visualizes this for you if you right click, scroll to the references sub-menu, and select the open call tree option.

Image for post

In the above, I’ve highlighted the function call tree windows for incoming and outgoing calls relative to the main() function that we renamed earlier. You’ll also notice lots of ‘XREF’ entries, or references to the same function (main) address, along with their access type in memory (Read/Write, shown in colors). What’s also very interesting are all the identified native Windows C functions and their outgoing calls. Given that we have a reverse shell, I might be inclined to start investigating the WSASocketW calls first.

But, before we do that, remember when we discussed format strings? Let’s revisit that code comparison and how it also looks in the C decompiler once more with rigor:

Image for post

In the above, we compare our original source side by side with the decompiler window, and we see that Ghidra couldn’t parse the snprintf() function as easily. Even so, this gives you clues to which functions might be used based on the data arguments. Before leaving the screen, take note of the CreateProcess function, which is not C specific but Windows specific, and note how it wasn’t decoded. Other solutions such as IDA Pro might already have this decoded for you, but if you’re learning to write shell code from scratch in C, this is great practice for your API research skills.

Image for post

In the last portion of this example, we examine another part of our shell code construct. According to our source code, it takes two arguments: an IP and a port. Notice that in the decompiler, Ghidra gets very close to showing you the useful syntax and the number of arguments. We also know that this is part of the main function, as it sits very close to, and is loaded right after, the entry point of the application.
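
The original argument handling isn’t shown in this copy, so the following is only a sketch of how a program taking an IP and a port on the command line might validate them. It is the kind of argc comparison and numeric conversion you would expect to see near the top of main() in the decompiler. The helper name parse_port and the exact checks are assumptions.

```c
#include <stdlib.h>

/* Hypothetical helper: expects argv = { program, ip, port }.
 * Returns the port as an int, or -1 if the arguments are not usable. */
int parse_port(int argc, char **argv)
{
    char *end;
    long port;

    if (argc != 3)                 /* program name + IP + port */
        return -1;

    port = strtol(argv[2], &end, 10);
    if (end == argv[2] || *end != '\0')   /* no digits, or trailing junk */
        return -1;
    if (port < 1 || port > 65535)         /* outside the valid TCP range */
        return -1;

    return (int)port;
}
```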

What about the Metasploit payloads from msfvenom?

We don’t have any screenshots of our examination of the basic Windows (non-staged) bind shell and reverse TCP shells. However, when we examined them under Ghidra, we really got a sense of just how much work the teams at Rapid7 and the security community behind the Metasploit framework have put into making them difficult to detect and robust in their error and condition handling.

After exporting (using -f exe from msfvenom), we weren’t able to easily jump between sections and showcase easy C structure and functions to reference (which is, honestly, kind of good for avoiding AV detection anyway). What this means is that you, as a seasoned pen tester, need to practice DFIR and REM skills. I’ve personally enjoyed my GREM certification, and it complements the GXPN very well in exercising the skills needed to develop shell code on your own.

Closing

I hope you’ve enjoyed a little preview of how powerful Ghidra is and how you can exercise malware analysis and reverse engineering skills to take your shell code writing skills to the next level for Windows systems. There’s so much more reading available for those wanting to extend their knowledge beyond this article. I encourage you to visit the links below in your spare time.

If your organization is in need of an MSSP or other security subject matter expertise; find us online at www.scissecurity.com

Additional Resources and Examples

There’s more reading if you wish to learn more and have more templates to choose your initial reversing from. You aren’t limited to analyzing C-based payloads; there are many other payloads and solutions that you can gather more ideas from.

Using Slack as a C2 Channel

Analyzing meterpreter payload with Ghidra

MultiOS Reverse Shell made in .NET (you can always use dotPeek to decompile the code if you only have a binary, because .NET compiles to an intermediate language)

MSbuild XML Template for Shellcode (C source ready)

Compile C code entirely from Windows CLI
