Windows CE SuperH3 Exploit Development Part 4: RISC Shellcoding Philosophy and Examples

Author’s Note: This shellcode was produced as part of a PoC exploit for the buffer overflow found in this article:

This is a long article, but I figured some people may only be interested in the principles behind analyzing parameters to create shellcode, the shellcode itself, or the difference between this shellcode and other Windows CE shellcode. For this reason, I broke up the article into sections with bolded headers. I hope you enjoy it!

I decided to include the word “philosophy” in the title because, like the vast majority of shellcode examples for Windows CE (three out of the four in existence, now four out of five), this example is not actionable in the majority of cases. The reason is that my example is not in Unicode (Windows CE 2.0–3.0) or ASCII (Windows CE 3.0 .NET to present) string format, so it will only work in cases where the input is copied into memory without passing through a string filter. However, it can easily be transformed into actionable shellcode. In a future article, I’ll be adapting this shellcode for Unicode filtered buffers.

This is, however, the first Windows CE shellcode ever produced for the SH3 processor. My hope is that I’ve opened the door to those who wish to write fun exploits for all of those barcode readers and voting machines and other odds and ends that use Windows CE and the SH3 processor. In many ways, Windows CE machines were the predecessors of modern IoT devices. Many are still in use today.

Process Improvements

Before I move on to the shellcode explanation or show how to produce Windows CE 2.11 SH3 shellcode, I want to start off with some process improvements.

Unlike the other two individuals who produced shellcode aimed at Windows CE versions earlier than 5.0, I was able to:

  • Produce shellcode without use of the Windows CE source code, analysis of the header files, or static analysis of coredll.dll
  • Produce very short raw shellcode (28 bytes before venetian formatting or null removal)

This is mainly because I used an unconventional method of accessing the function offset for MessageBoxW in coredll.dll, the kernel DLL for Windows CE. Like the other two individuals, I realized that coredll.dll was always loaded into the same base address relative to every program that used it. I also realized that every single program ever written for the system would have to use coredll in some way, so it would always be loaded. Unlike the other two individuals, however, I decided to use this knowledge to skip the process of locating the base of coredll and hashing the DLL name. I figured there was no reason to prepare for cases that would never come up in practice.

This is the reason that I left malloc in the shellcode tester despite not using the segmented egghunter technique. Like every standard C function in Windows CE, malloc is contained within coredll, and I wanted to ensure that the DLL was loaded when I tested the shellcode.

It took me a while to figure out how to capture the function offset for MessageBoxW. After realizing that coredll could not be extracted from user mode, I decided to create a simple Visual C++ program that called the function.

I compiled it with Embedded Visual C++ in release mode so that I wouldn’t have to deal with any debugging information.

I opened up the compiled executable in IDA Pro and figured out where the jump to MessageBoxW was.

I then set a breakpoint at that address in the Embedded Visual Tools debugger and captured the address of the function in coredll: 01FB87D8
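As a small illustration of how a captured address like this ends up inside a payload, here is a minimal Python sketch that packs it as a little-endian dword. The helper name is hypothetical, and the address only applies to the device and coredll build it was captured on.

```python
import struct

# Address captured in the debugger (from the article). It is specific to that
# device and coredll.dll build, so treat it as an example value only.
MESSAGEBOXW_ADDR = 0x01FB87D8


def pack_dword_le(value: int) -> bytes:
    """Pack a 32-bit value in little-endian byte order."""
    return struct.pack("<I", value)


packed = pack_dword_le(MESSAGEBOXW_ADDR)
print(" ".join(f"{b:02x}" for b in packed))  # d8 87 fb 01
```

The byte order is reversed relative to how the debugger displays the address, which is easy to get wrong when copying bytes by hand.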

Now that I’ve demonstrated the method of finding the offset, I’ll move on to explaining the shellcode itself.

Shellcode Example and Explanation

Here is the shellcode in hex format. I’ve uploaded this shellcode example to my GitHub repository along with all of the other source code produced with this article. The link is at the bottom of the page.

Here’s the disassembled view of the shellcode. Note that I used 09 00 as nop. I actually got that wrong in the previous article. I misread an opcode in IDA Pro.

In Windows CE, many functions take their parameters in registers instead of having them pushed onto the stack. Some of the registers hold the parameter values themselves, but others hold the addresses of parameters in the data section.

I explored several ways of keeping this position independent, but eventually settled on the SH3 equivalent of LEA.

According to the documentation, it is only able to load an offset from the program counter into r0. This is possible because of how the data is stored when the program is assembled, but I will get into that in the section on reproducing my results.
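To make the PC-relative load concrete, here is a minimal Python sketch of how MOVA @(disp,PC),R0 is encoded and which address it produces, as I read the SH-3 software manual. The function names are hypothetical and the addresses in the usage lines are made-up examples.

```python
def mova_encode(disp: int) -> int:
    """16-bit instruction word for MOVA @(disp,PC),R0.

    disp is an 8-bit count of dwords, not bytes.
    """
    if not 0 <= disp <= 0xFF:
        raise ValueError("disp must fit in 8 bits")
    return 0xC700 | disp


def mova_target(insn_addr: int, disp: int) -> int:
    """Address that ends up in r0.

    PC is the instruction address plus 4, its low two bits are masked off,
    and disp is scaled by 4. This is why the data must be dword aligned.
    """
    pc = insn_addr + 4
    return (pc & 0xFFFFFFFC) + disp * 4


# Made-up addresses for illustration only.
print(hex(mova_encode(1)))          # 0xc701
print(hex(mova_target(0x1000, 1)))  # 0x1008
print(hex(mova_target(0x1002, 1)))  # 0x1008 (PC is rounded down to a dword)
```

Because the result is computed from the program counter, the same bytes work wherever the shellcode lands in memory.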

I sent the address to the stack so that I could send it to the title and caption registers later. I chose to use the same character for both the title and the caption to reduce the size of the shellcode.

After sending the MessageBoxW offset to r0, I set up all of the parameter values. General registers r7 and r4 store the two number parameters, both of which are null because I used the default values for message box type and message box owner. General registers r5 and r6 store the caption and title addresses, so I sent the address of the “W” that I put on the stack earlier to them.
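The register assignment described above can be summarized in a short Python sketch. The parameter names come from the Win32 MessageBoxW prototype (in those terms, lpText is the message text and lpCaption is the window title), and the placeholder for the string address is hypothetical.

```python
# Win32 prototype order for MessageBoxW. The register order r4..r7 matches
# the article's description of which registers hold which values.
MESSAGEBOXW_PARAMS = ("hWnd", "lpText", "lpCaption", "uType")
ARG_REGISTERS = ("r4", "r5", "r6", "r7")


def assign_registers(values):
    """Pair each argument value with its parameter name and register."""
    if len(values) != len(ARG_REGISTERS):
        raise ValueError("expected exactly four arguments")
    return {
        reg: (name, val)
        for reg, name, val in zip(ARG_REGISTERS, MESSAGEBOXW_PARAMS, values)
    }


ADDR_OF_W = "addr_of_W"  # hypothetical placeholder for the string's address

# Zero owner window, same string for both text and caption, zero type.
layout = assign_registers([0, ADDR_OF_W, ADDR_OF_W, 0])
for reg, (name, val) in layout.items():
    print(f"{reg} = {val!r:12} ; {name}")
```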

When I disassembled my MessageBoxW test program, I realized that it was actually taking two jumps.

One was a delayed jump to a routine, which is why the last parameter is set up after the jump instruction.

The next was the immediate jump to the function in the DLL, followed by a nop.

I used a plain “JMP” because I didn’t have a separate routine to handle the final jump to MessageBoxW.
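For reference, here is a minimal Python sketch of how the two register-indirect jumps differ at the opcode level, based on the SH-3 instruction encodings. It is an illustration rather than the exact byte sequence of the shellcode, and the helper names are hypothetical.

```python
import struct

NOP = 0x0009  # SH-3 NOP instruction word


def jsr(rn: int) -> int:
    """JSR @Rn: delayed jump that also saves a return address in PR."""
    return 0x400B | (rn << 8)


def jmp(rn: int) -> int:
    """JMP @Rn: delayed jump with no return address."""
    return 0x402B | (rn << 8)


def assemble(words) -> bytes:
    """Emit 16-bit instruction words in little-endian byte order."""
    return b"".join(struct.pack("<H", w) for w in words)


# Both jumps execute the following instruction (the delay slot) before the
# jump takes effect, so something must occupy that slot. Here it is a nop.
tail = assemble([jmp(0), NOP])
print(" ".join(f"{b:02x}" for b in tail))  # 2b 40 09 00
```

The delay slot is also why a compiler can place the last parameter setup after the jump instruction: that instruction still runs before control reaches the target.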

The rest of the shellcode just sets up the data in little endian format.
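As a sketch of what that string data looks like in memory, the following Python snippet builds the wide string for “W”. The function name is hypothetical; the layout is plain UTF-16LE with a two-byte null terminator.

```python
def wide_string(text: str) -> bytes:
    """UTF-16LE bytes followed by a two-byte null terminator."""
    return text.encode("utf-16-le") + b"\x00\x00"


data = wide_string("W")
print(" ".join(f"{b:02x}" for b in data))  # 57 00 00 00
print(len(data))                           # 4
```

A single-character string plus its terminator fits in exactly one dword, which helps keep the data section small.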

This shellcode will spawn a message box with the title “W” and the caption “W”.

Now I’ll move on to the shellcode development process.

From Parameter Analysis to Shellcode Assembly

I included this section because turning a disassembled function into shellcode was a bit of a challenge. This was due to a few factors, including unfamiliarity with the processor and calling conventions for Windows CE 2.11 functions. It took me a while to figure out how to do it, but I was able to do every step with ease once I got the hang of it. What took me a few days before now only takes a few minutes, and I’m hoping to save you some time.

Most Helpful Sources (Linked in the “Sources” section):

  • Azeria’s guide to ARM Shellcoding
  • The Corelan Team’s intro to Win32 Shellcoding
  • The “Help” menu in Embedded Visual Tools (seriously, the microprocessor and API references are so useful; that’s how I found out that WinCE 2.11 had its own lax version of SEH)
  • The Hitachi SuperH RISC Assembler Reference (the reference manual for the Renesas Assembler)
  • The Renesas SH-3/SH-3E/SH-3 DSP Software Manual (great for finding obscure instructions like MOVA that aren’t really used by any SH3 compilers)

Tools required for this section:

  • IDA Pro if you have it, Radare2 if you don’t
  • Embedded Visual Tools 3.0 (Abandonware, official product key required)
  • Renesas C/C++ Compiler for the SuperH Family (Account required, 60 day free trial)
  • ActiveSync 3.5–3.8 (I think this link is safe)
  • A Windows XP SP3 Virtual Machine to run all of this software in (Except for IDA Pro or Radare, those only run on the host)

I covered parameter analysis in the section prior to this one, so I’ll briefly go over that again before I move on to shellcode assembly.

The basic process is:

  • Create a new project targeting only the SH3 processor.
  • Create a source file with nothing but your function and put it into release mode.
  • Connect the handheld PC via ActiveSync 3.5 (3.8 works too).
  • Build the project.
  • Transfer the executable from your device or the project folder to IDA Pro for analysis.

Sometimes the parameters will be stack parameters, and sometimes they’ll be stored in the registers. Relative addresses are shown as named strings or “unk_”; numbers are shown as themselves.

Finding the function offset is just a matter of going to the jump named after the function and setting a breakpoint at the address given in the Embedded Visual Tools debugger.

Once you’ve captured the offset in r0, you’re ready to move on to writing the shellcode in assembly.

If you’re familiar with early versions of Visual Studio, then the Renesas tool suite should look familiar.

When you open it up for the first time, it will ask you to create a new project. Select “Empty Application”, name the project, and click “OK”.

In the next menu, you want to select “SH3-DSP” as the CPU series and click “Next”.

In the next menu, change the Endian to “Little” and select “Position Independent Code”.

Don’t worry about the simulator, just click “Finish” and “OK”.

In the “File” menu, click “New”. A blank document should come up.

Save the document as “Name.src” to make it an assembly file.

To add the document to your project, right click on the second project icon and click “Add Files”.

Select your document and click “Add”.

Now you need to set up your environment.

In the “Build” menu, select “Build Phases”.

Remove every phase except for “SH Assembler” and click “OK”.

In “Build Configurations”, change the configuration to “Release” and click “OK”.

To write the code itself, you need a section that’s aligned to 4, a start label, and an end declaration.

General parameter philosophy:

  • The instruction directly after the “JSR” is usually another parameter you need to set up
  • Every register can store a dword
  • Remember that r15 is the stack pointer
  • Moving a dword into a register or a stack address requires MOV.L
  • Moving a null value to a register requires MOV
  • Any constant declarations require #H’[Value]
  • Dword values will be stored in a mini data section by the assembler, smaller values will not
  • Any register value stored as a relative address needs the address loaded by MOVA into r0
  • Because r0 should always be used for the final jump, you need to transfer each parameter to a register or the stack before transferring it to its final destination; do this as many times as necessary
  • Unicode strings will be stored in the little endian format; they must be double null terminated and null separated; keep this in mind
  • To transfer a constant to the stack, you must first transfer it to a register like so:

Once you’re done translating your function, build it using the same button you use in Embedded Visual Tools.

Keep this assembler warning in mind for later.

Next, you need to take the object file and transfer it to the host.

Open it up in IDA Pro. You may need to clean up the code a bit with the “C” key, as IDA sometimes thinks it’s data. Do not be alarmed if the mini data section is 8-byte aligned; the shellcode will function the same.

Once you know the opcodes you’re looking for, you can open the object file up in a hex editor and paste the shellcode into the first character array in the shellcode tester on my GitHub (link below article).

Notice how this shellcode has a 09 00 where the 04 A0 used to be. The 04 A0 is the branch instruction the assembler warned us about. I usually replace the branch instruction with a nop, but it’s not strictly necessary. Whether you choose to do that or not is up to you. Separate the shellcode with “\x”, build, and execute.
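The two manual steps above (swapping the branch for a nop and adding the “\x” separators) can be sketched in Python as follows. The function names and the sample bytes are hypothetical; only the 04 A0 and 09 00 byte pairs come from the article.

```python
def replace_branch_with_nop(code: bytes,
                            branch: bytes = b"\x04\xa0",
                            nop: bytes = b"\x09\x00") -> bytes:
    """Replace the branch with a nop, checking 16-bit aligned positions only.

    Caveat: a data dword that happens to contain the same two bytes at an
    aligned offset would also be replaced, so check the result by eye.
    """
    out = bytearray(code)
    for i in range(0, len(out) - 1, 2):
        if out[i:i + 2] == branch:
            out[i:i + 2] = nop
    return bytes(out)


def to_c_escapes(code: bytes) -> str:
    """Format raw bytes as \\xNN escapes for a C character array."""
    return "".join(f"\\x{b:02x}" for b in code)


# Made-up sample: a branch followed by jmp @r0 and a nop.
sample = bytes.fromhex("04a02b400900")
patched = replace_branch_with_nop(sample)
print(to_c_escapes(patched))  # \x09\x00\x2b\x40\x09\x00
```

The printed string can be pasted between the quotes of the character array in a tester program.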

If you did everything right, you should see the desired result on your device.

Next Steps

I still need to adapt the shellcode to Unicode, but there are plenty of alternative instructions that I can use to make it compatible with the venetian method.

I chose not to go the decoder route because I don’t possess the level of skill required to create self-modifying code for a processor that’s based on the Harvard architecture. San and Tim Hurman, the other two individuals who produced shellcode for Windows CE 4.2 and 3.0 .NET respectively, both attempted to develop decoders. Tim Hurman was successful, but his decoder produces ASCII payloads. It is hard to judge how successful San was from his Phrack article. Both articles are linked at the top of the “Sources” section.

Update: While researching Platform Builder, I found a Chinese source that has succeeded at producing a Unicode shellcode decoder for ARM WinCE 4.2, found here:

I also still plan to create that egghunter. I may not need it in this instance, but it would be good practice.

Author’s Note: Adapting this shellcode for an ASCII string filter is very simple. Check out the Corelan Team’s guide to Win32 shellcoding to get an idea of how to do that.

Update: I decided that showing was better than telling, so I’m currently working to produce an ASCII string filter compatible version of my shellcode. So far I’ve been able to reduce the number of nulls down to three using tricks like “xor rn,rn” instead of “mov #0,rn” and replacing standard nops with “mov r1,r1”. I know for a fact that I can safely reduce the number of nulls down to two, but getting down to zero is going to be a challenge because of how parameters are stored as relative addresses in registers instead of being pushed to the stack. Current attempts are on my GitHub.
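To show why those substitutions help, here is a minimal Python sketch that encodes both variants from the SH-3 instruction formats and counts the null bytes in each. The helper names are hypothetical, and r4 is just an example register.

```python
import struct

NOP = 0x0009  # assembles to 09 00, which contains a null byte


def mov_imm(imm: int, rn: int) -> int:
    """MOV #imm,Rn"""
    return 0xE000 | (rn << 8) | (imm & 0xFF)


def xor_reg(rm: int, rn: int) -> int:
    """XOR Rm,Rn"""
    return 0x200A | (rn << 8) | (rm << 4)


def mov_reg(rm: int, rn: int) -> int:
    """MOV Rm,Rn"""
    return 0x6003 | (rn << 8) | (rm << 4)


def assemble(words) -> bytes:
    """Emit 16-bit instruction words in little-endian byte order."""
    return b"".join(struct.pack("<H", w) for w in words)


def count_nulls(words) -> int:
    return assemble(words).count(b"\x00")


original = [mov_imm(0, 4), NOP]         # mov #0,r4 ; nop
replaced = [xor_reg(4, 4), mov_reg(1, 1)]  # xor r4,r4 ; mov r1,r1

print(assemble(original).hex(), count_nulls(original))  # 00e40900 2
print(assemble(replaced).hex(), count_nulls(replaced))  # 4a241361 0
```

Both sequences leave r4 at zero and otherwise do nothing, but only the second one survives a copy that stops at the first null byte.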

Update #2: Azeria’s guide to ARM shellcoding lent a helping hand once again. I’ll publish the null-free shellcode tomorrow. Not only is this a good exercise, it’s also a way to prepare for writing Unicode-compliant shellcode.



Written by

Enjoys edev, cyber forensics, hardware hacking, and RE, former CACI BIT Systems intern, GREM, Security+
