Windows CE SuperH3 Exploit Development Part 3: Unicode Blues and an Unfortunate Conclusion (For Now)
Author’s note: This article is part of an ongoing series on Windows CE SH3 exploit development, and is focused on the continuation of research and clearing up mistakes from this article:
I planned for part 3 of this series to be a post where I cleared up some of the mistakes and misinformation I spread last time due to my limited knowledge of the SH3 architecture and RISC exploitation in general, then triumphantly presented my skeleton (pre-shellcode) exploit and talked next steps. Instead, I’m here to explain why the particular buffer overflow I was attempting to exploit was not viable, and to lay out my plans for where this series will go next.
I’m going to break this into three sections: Mistakes I Made, Challenges I Faced, and Tool Troubles. The first section isn’t necessarily all relevant to exploiting this particular vulnerability; I just wanted to clear the air. The second explains why this exploit was not viable, and the third covers the additional challenges presented by the Embedded Visual Tools 3.0 debugger. The criticisms of this tool are more complaining than anything else, as the tool was not meant for reverse engineering; it was just sort of capable of it, and it was the only comprehensive tool available for CE 2.11 debugging. Let’s get started.
Mistakes I Made:
Looking back at the last post I’m almost embarrassed by how much I misstated about RISC exploit development and the SH3 processor in general. Here are a few of the things I wanted to clear up:
1. The SH3 has no hardware exception handling - This statement was based on a not-so-thorough skimming of the processor manual. I learned the truth, however, from a thesis published by Bei Wang on the design philosophy of the SuperH processor line. Hardware exception handling is not strictly enforced by Windows CE 2.11, but it is present. I’d recommend checking the paper out if you’re into RISC processors in general or looking to research the SH2, SH3, or SH4. It’s much easier to understand than the manual, and gives a more comprehensive look at certain relevant aspects of the processor.
2. Misleading imagery of the initial overflow - This one is just inexcusable. I didn’t have my payload on the machine when I took the first image, so I just typed as many A’s as I could into the input box and ended up overflowing every vulnerable register. To be clear, 50 A’s were not enough to overwrite the program counter, procedure register, frame pointer, and general purpose registers 8 and 9. The program raised an exception at offset 50, but that was not the offset required to overwrite every register: to replace the values in the registers, I needed 52 characters. The reason for this will be discussed in more detail later, but for now just know that every character was actually two bytes.
3. Registers, just a lot of stuff about registers - I’m going to be brief on this one because a lot of it is covered in the Bei Wang thesis. I misstated the purpose of a lot of registers, called the frame pointer an FPU register, and glossed over the procedure register in a big way. The procedure register is supposed to be a one-element stack that keeps the return address when the program counter goes into a subroutine. The frame pointer just keeps the stack base. I don’t know a lot about the purpose of r8 and r9 in this case, but I know that r15 is the stack pointer, which is mainly important for reverse engineering.
4. The frame pointer being overflowed means stack overflows are viable - Wrong. In the case of this exploit it was a very bad idea to smash the stack. In fact, I specifically tried to avoid doing so, so that I could use the stack to store shellcode.
That’s pretty much it for mistakes. I may have made a few other minor ones, but I think I covered everything important. Now it’s time to move on to what went wrong.
Challenges I Faced:
I think I spent well over 20 hours analyzing an overflow that experienced exploit developers would immediately recognize as not worth the effort. In my defense, I was very green and very determined.
The very first problem appeared when I noticed that every character I overflowed the buffer with was followed by two zeroes. After some research, I was able to find out that this was a Unicode overflow, which limited the places I could store shellcode to addresses in the range of 00xx00yy. Fortunately, however, I soon figured out that I had a bit more freedom when I looked into the Unicode tables.
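To make that constraint concrete, here is a minimal Python sketch of what the widening does to the 32-bit words that end up in registers. It assumes the input is stored as UTF-16LE and read as little-endian words, which matches the zero padding I saw; the helper name `words_from_input` is made up for illustration, and the real program may convert input through a code page first.

```python
import struct

def words_from_input(text):
    """Widen text as UTF-16LE and split the result into little-endian 32-bit words."""
    raw = text.encode("utf-16-le")
    raw += b"\x00" * (-len(raw) % 4)  # pad to a whole number of words
    return [struct.unpack_from("<I", raw, i)[0] for i in range(0, len(raw), 4)]

# Plain ASCII: every other byte is zero, so words look like 00xx00yy.
print([f"0x{w:08X}" for w in words_from_input("AABB")])
# -> ['0x00410041', '0x00420042']

# Characters above U+00FF (here U+0141 and U+0142) fill in the high bytes,
# which is where the extra freedom from the Unicode tables comes from.
print([f"0x{w:08X}" for w in words_from_input("\u0141\u0142")])
# -> ['0x01420141']
```

Two characters make up one 32-bit word, so any address I wanted to land on had to be expressible as a pair of characters the program would accept.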
Less fortunately, I discovered that my program only supported Unicode 2.0, which shrank the list of usable characters by quite a bit. It still gave me more wiggle room than previously anticipated, but I was still confused about where to start. I was familiar with Windows x86 buffer overflows, but not very familiar with RISC overflows, so I decided to consult a friend who was very familiar with ARM and MIPS for some guidance. Here’s a link to his blog; he was a great help throughout the process:
He told me to try every single input field, file input, search box, anything I could find to get the shellcode in program memory. I took his advice and decided to check out possible locations where I could store shellcode using IDA Pro.
I was originally excited by another input, the “Current Shell” dialogue box. I targeted it because it was not vulnerable to buffer overflows. I did not know at the time that this was because it did not actually accept any input.
I then pivoted to another possible location, the registry! I realized that the program was querying specific key values and comparing them to hardcoded values to determine if features were active.
I was elated to find out that the registry key the program was reading was an SZ key, meaning no relevant length limit.
I replaced the existing key value with my own tag, made sure that the relevant function was called, started looking through the stack and…couldn’t find it. I searched the entire program memory. Maybe RegQueryValue just doesn’t work the same way in CE?
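When hunting for a tag like this, it can help to search for both the single-byte form and the UTF-16LE form, since Windows CE APIs work with wide strings. The sketch below assumes you can get a raw memory dump into a Python `bytes` object, which is not something the debugger offers directly; `find_tag` and the sample dump are hypothetical.

```python
def find_tag(dump, tag):
    """Return {encoding: [offsets]} for every place tag appears in dump."""
    hits = {}
    for name in ("ascii", "utf-16-le"):
        needle = tag.encode(name)
        offsets = []
        pos = dump.find(needle)
        while pos != -1:
            offsets.append(pos)
            pos = dump.find(needle, pos + 1)
        hits[name] = offsets
    return hits

# Hypothetical dump: the tag is present only as a wide string at offset 4.
dump = b"\xff\xff\xff\xff" + "TAG1".encode("utf-16-le") + b"\x00\x00"
print(find_tag(dump, "TAG1"))
# -> {'ascii': [], 'utf-16-le': [4]}
```

A search for the single-byte tag alone would report nothing on this dump, even though the string is there.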
Either way, that left one avenue: the vulnerable buffer itself. It really wasn’t much of an avenue, since only 36 characters in the test payload of D’s I sent remained unmangled. Using Unicode shellcoding techniques such as the Venetian padding technique, I would have had 18 or fewer bytes to work with. Not to mention the difficulty of pointing to that spot with the limited address space I could work with. I did manage to find the test payload on the stack though!
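The space estimate can be checked with a short sketch that finds the longest unmangled run of a filler character in a dump and works out the byte budget. It assumes UTF-16LE storage and a raw dump in a `bytes` object; `longest_intact_run` and the sample dump are hypothetical stand-ins, not what the debugger showed me.

```python
def longest_intact_run(dump, char="D"):
    """Return (offset, length_in_chars) of the longest UTF-16LE run of char."""
    unit = char.encode("utf-16-le")
    best_off, best_len = -1, 0
    i = 0
    while i + 2 <= len(dump):
        if dump[i:i + 2] == unit:
            start = i
            while dump[i:i + 2] == unit:  # walk the run two bytes at a time
                i += 2
            run = (i - start) // 2
            if run > best_len:
                best_off, best_len = start, run
        else:
            i += 1  # step one byte so unaligned runs are still found
    return best_off, best_len

# Hypothetical dump: 3 junk bytes, 36 intact D's, 2 mangled bytes, 5 more D's.
dump = (b"\x01\x02\x03" + ("D" * 36).encode("utf-16-le")
        + b"\xde\xad" + ("D" * 5).encode("utf-16-le"))
offset, chars = longest_intact_run(dump)
in_memory = chars * 2   # bytes the run occupies once widened
controllable = chars    # one attacker-chosen byte per character (chars <= U+00FF)
print(offset, chars, in_memory, controllable)
# -> 3 36 72 36
```

Halving the 36 controllable bytes again is one way to arrive at the 18-byte ceiling mentioned above.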
After I’d exhausted all reasonable avenues, I decided to just give up on exploiting the program. It just didn’t seem viable, and there are more 3rd party Windows CE 2.11 programs out there to test. More on that later, it’s time to complain about my debugger!
Tool Troubles:
I wanted to make this clear before I started: I don’t hate Microsoft Embedded Visual Tools 3.0. It’s an amazing set of utilities. It sports communications monitoring tools, remote process managers, dependency walkers, and some dynamic recompilation capabilities. I even used the remote registry editor on this project!
Many find its remote debugging capabilities unreliable, but I never had connection issues.
My real problems were the tool’s bugginess, the lack of a search tool, the disassembly window, and everything about the memory view.
Here’s a little list of grievances I came up with while using this tool:
1. Bugs - I’m not one of those people who complains about the speed. I recognize that this thing is going over an RS-232 connection and that the speed isn’t going to be great no matter the circumstances. What I don’t enjoy is the entire program crashing because I decided to stop debugging, or just broke somewhere it didn’t want me to.
2. Lack of a search tool in disassembly - There is no tool to search for a particular offset or series of opcodes in the disassembly. This is compounded by the fact that the window shows every address from 0x00000000 to 0xFFFFFFFF regardless of whether opcodes or even null characters are present there, and by how slowly the disassembly window scrolls. This made setting breakpoints hell. I’m not even sure breakpoints fully worked; I had to just run to cursor a lot of the time.
3. Memory - I was proud of finding that payload for a reason. The memory window holds the code, the stack, unused space, and everything else. If you want to see the stack, you just have to find the base in FP and start scrolling. You can’t search for hex strings in the memory window, and because of a fun infinite scrolling bug that happens when you dock it, you have to keep it a tiny size to keep it usable.
A lot of my complaints stem from the fact that this tool was clearly intended for developers with intimate knowledge of their programs, not exploit developers. Some of them, however, are just kind of based off of the design philosophy of the era, what was appropriate to put out, how friendly certain features had to be, and how much effort had to be put into certain “extra” tools that were included just in case they were needed. The most helpful guide I had during this process was the O’Reilly chapter I included in a previous article. I think it’s currently the only online guide to this tool.
I want to finish up by saying that failing to develop an exploit for this program does not mean that this is the end of the series. There will always be another program, and at some point I may even develop a vulnerable program just to demonstrate RISC shellcoding. None of this is happening until I finish Practical Malware Analysis. I spent a bit too much time on this project and need to make up for lost time. That should be done by the end of next week, so until then I’m putting this series on pause. Until next time!
Additional Sources: