KatWalk C2: p.5: overclocking and bugfixing or how to use Ghidra to analyse ARM firmware

Anton Fedorov
Apr 5, 2024


As a reminder from the last part (where I described how to patch the firmware): I’m describing what the KatWalk C2 treadmill is, how to integrate with it and how to connect to its sensors directly.

The original receiver refreshes sensor data at about 86 Hz, while the technical limit is 133 Hz. Running at the limit significantly lowers the latency, but the connection was unstable.

Let’s dive deep into the sensors: learn about the game ghidra_11.0_PUBLIC, which I have installed into `C:\Games` on my PC, then peek into the sensors’ firmware and poke it: make some patches, fix some race conditions, fix some bugs… Be ready for the deep dive.

This time — it’s serious.

We need to go deeper into Ghidra

Ghidra: how to feed a dragon with raw ARM binary

Since buying the full-featured IDA doesn’t fit into my fun budget, and IDA Freeware doesn’t support ARM, I had to fall back to the alternatives. The obvious choice was Ghidra, an open-source reverse-engineering tool that has become quite famous over the last few years: powerful, feature-rich, with scripting support and knowledge of multiple architectures.

Basically, I finally have an excuse to learn it!

So, I downloaded it into `C:\Games`, unpacked it and ran `C:\Games\ghidra_11.0_PUBLIC\ghidraRun.bat`. Create a new project… Go to “File=>Import File” to import the raw firmware binary. Ghidra supports many formats, but somehow HEX is not one of them; luckily, I had already converted it to BIN.

It also looks like Ghidra doesn’t detect the architecture on its own. Well, again, I already know it: ARM, Cortex-M3, little endian:

Ghidra language selection

Now double-click on the firmware in the project to open the disassembler itself.

Ghidra asks whether to analyze the file; let’s agree. But there is almost nothing useful, just a clean slate:

Clean slate…

Memory map

Well, RAW binaries typically need some extra work, so let’s get our hands dirty! First of all, let’s apply the memory map found somewhere in the TI manuals:

Memory map

Go to “Window => Memory Map” to fix it. The existing region should be renamed to “flash” and have its “W” checkbox cleared (it’s non-writable memory). Using the green plus button in the top-right tool panel, add one more region: name it “rom”, set the start address to 10000000 and the length to 0x20000 (more precisely 0x1CC00, but that’s not important), with access modes R and X (and uncheck W). Then add the third region, “ram”, starting at 20000000 with size 0x5000 and R/W (but no X).
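For reference, here are the same three regions as a small C table, handy later when checking where a pointer lands. The flash length of 0x20000 (128 KB) is my assumption for the CC2640R2F, since the imported region simply takes the size of the binary; `find_region` is a hypothetical helper, not part of any tool.

```c
#include <stddef.h>
#include <stdint.h>

/* The memory regions configured in Ghidra's Memory Map window. */
typedef struct {
    const char *name;
    uint32_t start;
    uint32_t length;
    const char *access;   /* R/W/X flags as ticked in Ghidra */
} region_t;

static const region_t cc2640r2_regions[] = {
    { "flash", 0x00000000u, 0x00020000u, "R-X" },  /* assumed 128 KB */
    { "rom",   0x10000000u, 0x00020000u, "R-X" },  /* only 0x1CC00 really used */
    { "ram",   0x20000000u, 0x00005000u, "RW-" },  /* 20 KB SRAM */
};

/* Return the region containing addr, or NULL (e.g. peripherals at 0x4xxxxxxx). */
static const region_t *find_region(uint32_t addr) {
    size_t n = sizeof cc2640r2_regions / sizeof cc2640r2_regions[0];
    for (size_t i = 0; i < n; i++) {
        const region_t *r = &cc2640r2_regions[i];
        /* Unsigned wrap-around makes addr < start fail this test too. */
        if (addr - r->start < r->length)
            return r;
    }
    return NULL;
}
```

For example, the initial stack pointer 0x20004000 falls into “ram”, while the exception handler at 0x1001c901 falls into “rom”.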

With the memory map in place, we see that the first word is the initial stack pointer, pointing into RAM at 0x20004000 as it should. Then comes the vector to the entry point of the compiler-generated prologue, then the interrupt vectors, all pointing into the ROM region, which we don’t have, and the last two vectors don’t look like vectors at all.
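A minimal sketch of how the first two words of a Cortex-M vector table are interpreted. The test bytes are made up (only the 0x20004000 stack value matches this firmware), and the Thumb-bit handling reflects the Cortex-M rule that vector entries must have bit 0 set.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian 32-bit read from a raw image buffer. */
static uint32_t rd32le(const uint8_t *img, size_t off) {
    return (uint32_t)img[off]
         | (uint32_t)img[off + 1] << 8
         | (uint32_t)img[off + 2] << 16
         | (uint32_t)img[off + 3] << 24;
}

typedef struct {
    uint32_t initial_sp;   /* word 0: initial main stack pointer */
    uint32_t reset_entry;  /* word 1 with the Thumb bit stripped */
    int reset_is_thumb;    /* Cortex-M only runs Thumb code, so this should be 1 */
} vectors_t;

static vectors_t read_vectors(const uint8_t *img) {
    vectors_t v;
    uint32_t reset = rd32le(img, 4);
    v.initial_sp = rd32le(img, 0);
    v.reset_is_thumb = (int)(reset & 1u);
    v.reset_entry = reset & ~1u;   /* the address where the function really starts */
    return v;
}
```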

We’re not done with the memory map yet, though. ARM processors also have “Peripherals” regions: reads from and writes to them access the on-chip hardware. The most convenient way to deal with them is to download the CMSIS-SVD package, which collects hardware descriptions for many ARM chips in a machine-readable format. To use it, download SVD-Loader-Ghidra, then go to “Window=>Script Manager”, click the third icon from the right on the toolbar (the one that looks like a burger menu, to the left of the X and the red cross; its hover text is “Manage Script Directories”) and press the green plus inside the script directories manager to add the path to the downloaded plugin (`$USER_HOME/Documents/GitHub/SVD-Loader-Ghidra` in my case). Then close the directories manager and enter “SVD” into the filter field to find the added plugin:

Ghidra: script manager, filter “svd”

Run the script by double-clicking it; it’ll ask for the SVD file. Let’s choose `…GitHub\cmsis-svd-data\data\TexasInstruments\CC26x0.svd`.

The script processes it and adds the hardware regions in the 0x40000000 address space for us.

Add symbols into unavailable ROM

When working with a raw binary, every bit of knowledge is precious. So go download the SDK, unpack it and start digging. We already know it supports debugging, so there should be symbols somewhere. But where?

Let’s start by searching for the interrupt address 1001c901. Switch back to WSL and:

$ cd /mnt/c/ti/simplelink_cc2640r2_sdk_5_30_00_03
$ find . -name '*map' | xargs grep -i '1001c90'
./kernel/tirtos/packages/ti/sysbios/rom/cortexm/cc26xx/r2/golden/CC26xx/rtos_rom.map: 1001c900 00000020 arm_m3_Hwi_asm_rom.obj (.text:ti_sysbios_family_arm_m3_Hwi_excHandlerAsm__I)
./kernel/tirtos/packages/ti/sysbios/rom/cortexm/cc26xx/r2/golden/CC26xx/rtos_rom.map:1001c901 ti_sysbios_family_arm_m3_Hwi_excHandlerAsm__I
./kernel/tirtos/packages/ti/sysbios/rom/cortexm/cc26xx/r2/golden/CC26xx/rtos_rom.map:1001c901 ti_sysbios_family_arm_m3_Hwi_excHandlerAsm__I

Great, looks like what we need! Now, how to load them? Ghidra already has a script, `ImportSymbolsScript.py`, but after trying it out it wasn’t good enough. So I made an updated version, ArmImportSymbolsScript.py. The main point is to make function symbols start at the even address (dropping the Thumb bit) and, if an address points into an unknown region, to create a word there, so that jump tables point to the same symbol. That makes the symbols more convenient for reading the code and more useful for the auto-analysis afterwards.

However, we still need to convert the symbols into the format the script expects: `<name> <address> <type: f or not>`.

Also, while the addresses of the ROM part are correct, the entries outside the 1xxxxxxx range don’t look valid at all: for example, `00001a25 main` is not main at all!

So let’s not bother digging deeper into these discrepancies of the map file and just extract the symbols from the binary itself:

$ cd /mnt/c/ti/simplelink_cc2640r2_sdk_5_30_00_03
$ mkdir symbols
$ objdump -t ./kernel/tirtos/packages/ti/sysbios/rom/cortexm/cc26xx/r2/golden/CC26xx/CC2640R2F_rtos_rom_syms.out | perl -ne '@a=split;print "$a[-1] $a[0] $a[2]\n" if $a[1] eq "g"' > symbols/rom.txt
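The perl filter above, spelled out in C to make the field positions explicit: split on whitespace, keep lines whose second field is `g` (global), and print the last, first and third fields (name, address, type flag). The sample line in the test is only shaped like `objdump -t` output; it is not copied from the SDK.

```c
#include <stdio.h>
#include <string.h>

/* Mirrors: @a=split; print "$a[-1] $a[0] $a[2]\n" if $a[1] eq "g"
   Returns 1 and fills `out` when the line is a global symbol, 0 otherwise. */
static int objdump_line_to_symbol(const char *line, char *out, size_t out_len) {
    char buf[512];
    char *fields[16];
    int n = 0;

    strncpy(buf, line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    for (char *tok = strtok(buf, " \t\r\n"); tok != NULL && n < 16;
         tok = strtok(NULL, " \t\r\n"))
        fields[n++] = tok;

    if (n < 3 || strcmp(fields[1], "g") != 0)
        return 0;   /* local symbols, headers, blank lines */
    snprintf(out, out_len, "%s %s %s", fields[n - 1], fields[0], fields[2]);
    return 1;
}
```

For a function the third field is `F`, which is how the “type” column ends up telling functions from data.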

Now open the Script Manager, double-click `ArmImportSymbolsScript.py` and choose the just generated `C:\ti\simplelink…\symbols\rom.txt`:

It starts to make sense!

It’s not the whole ROM yet, though. We also need the Bluetooth stack, whose symbols can be found in `source/ti/ble5stack/rom/ble_rom_releases/cc26xx_r2/Final_Release`.

$ for k in source/ti/ble5stack/rom/ble_rom_releases/cc26xx_r2/Final_Release/*symbols; do
objdump -t $k | perl -ne '@a=split;print "$a[-1] $a[0] $a[2]\n" if $a[1] eq "g"' > symbols/${k##*/}.txt
done

Now also import `ble_r2.symbols.txt` and `common_r2.symbols.txt`.

Digging for API SDK symbols

As the memory map states, the “TI RTOS ROM Jump Table” is located at address 0x1000. Indeed, there are many pointers at this address (not jmp instructions, though), but nothing of real interest. However, the SDK sources mention another address table, “ROM_Flash_JT”, which apparently is not located at 0x1000 but is easy to find: just follow the back-references of an already labelled ROM function. I used HCI_bm_alloc:

ROM_Flash_JT — found by back-reference

That’s the table we want, so we can mark its beginning as “ROM_Flash_JT” and compare it with the contents of `rom_init.c` (on my system the full path is `C:\ti\simplelink_cc2640r2_sdk_5_30_00_03\source\ti\blestack\rom\r2\rom_init.c`) to name a lot more symbols properly. I’ve created another script, ROM_Flash_JT, which does exactly that: run it, select rom_init.c, enjoy:

Even more knowledge!
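The idea behind such a script, as a hedged C sketch (this is not the ROM_Flash_JT script itself): the i-th name taken from rom_init.c labels the i-th 32-bit entry of the table and the address it points to. Parsing rom_init.c is left out, so names are passed in already extracted; `label_jump_table` and all values in the test are made up.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;   /* entry name, in rom_init.c order */
    uint32_t slot;      /* address of the table entry itself */
    uint32_t target;    /* what the entry points to, Thumb bit cleared */
} jt_label_t;

static uint32_t rd32le(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Pair names[i] with the i-th pointer of the table at table_addr.
   img holds the firmware as loaded at img_base. Returns the entries labelled. */
static size_t label_jump_table(const uint8_t *img, size_t img_len, uint32_t img_base,
                               uint32_t table_addr, const char *const *names,
                               size_t count, jt_label_t *out) {
    size_t done = 0;
    for (size_t i = 0; i < count; i++) {
        uint32_t slot = table_addr + 4u * (uint32_t)i;
        size_t off = (size_t)(slot - img_base);
        if (slot < img_base || off + 4 > img_len)
            break;                    /* table runs past the image */
        out[i].name = names[i];
        out[i].slot = slot;
        /* Assumes code pointers; real data entries would keep bit 0. */
        out[i].target = rd32le(img + off) & ~1u;
        done++;
    }
    return done;
}
```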

Unpack ROM2RAM

The next important pool of information is the initial RAM contents. Whenever a program has mutable structures or arrays that are statically initialized, they have to be initialized in RAM. So whenever the code says something like:

const char str[] = "MyString";
char str2[] = "OtherString";

then `str` can stay in ROM, but `str2` is a writable array in RAM, which means the compiler has to generate code that copies the string into RAM. The same happens to various structures, static variables and so on. To do so, the compiler inserts prologue code that runs before main() and unpacks the initial data from ROM into RAM. I believe different compilers do this differently; the toolchain used in the TI SDK (as part of the XDC Tools) does it as the very first step, using a table of unpacker functions and a table of (ROM source, RAM target) address pairs. To unpack it, get the arm-romtoram script, then go to the address pointed to by the Reset vector:

Reset handler

Create a function there by pressing “F”, rename it to “resetHandler” (hotkey “L”) and look at the several calls in a row. The first of them is the one that does the unpacking. Let’s rename it to “unpackRomToRam”:

unpackRomToRam decompile

Now, from this function we can trivially create the three labels the script requires:

  • “ROMtoRAMtable”: the initial loop value,
  • “ROMtoRAMtableEnd”: the loop end condition value (Ghidra doesn’t allow creating a label there from the loop itself: double-click it, create a pointer there by pressing “P” and then create the label with “L”),
  • “ROMtoRAM_Processors”: at the pointer to the array of handlers used inside the loop.

The function should look like this now:

void unpackRomToRam(void)
{
byte **ppbVar1;
for (ppbVar1 = (byte **)&ROMtoRAMtable; ppbVar1 < &ROMtoRAMtableEnd; ppbVar1 = ppbVar1 + 2) {
(*(code *)(&ROMtoRAM_Processors)[**ppbVar1])(*ppbVar1 + 1,ppbVar1[1]);
}
xdc_runtime_Startup_exec__E();
return;
}

Check the processors table: it should contain only three functions: one that does something LZ-style (copy a byte, or copy N bytes starting M bytes back), one that calls memcpy and one that calls memzero. The ROMtoRAM table itself just contains pairs of addresses; the region lengths etc. are stored in the data the ROM source address points to.
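To make the table-driven dispatch concrete, here is a hedged C sketch of the same loop with two of the three handlers. The record layouts (a 4-byte length followed by data) and the handler order are my assumptions for illustration, not the firmware’s real format, and the LZ-style decompressor is left out.

```c
#include <stdint.h>
#include <string.h>

/* Each handler gets the bytes after the index byte and the RAM destination,
   like the (*ROMtoRAM_Processors[**p])(*p + 1, p[1]) call in unpackRomToRam. */
typedef void (*unpack_fn)(const uint8_t *src, uint8_t *dst);

static uint32_t rd32le(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical record: 4-byte length, then the raw bytes (the memcpy handler). */
static void unpack_copy(const uint8_t *src, uint8_t *dst) {
    memcpy(dst, src + 4, rd32le(src));
}

/* Hypothetical record: 4-byte length; the region is zero-filled. */
static void unpack_zero(const uint8_t *src, uint8_t *dst) {
    memset(dst, 0, rd32le(src));
}

/* The firmware's table also has an LZ-style handler, omitted here. */
static const unpack_fn rom_to_ram_processors[] = { unpack_copy, unpack_zero };

typedef struct {
    const uint8_t *src;   /* ROM: handler index byte, then handler data */
    uint8_t *dst;         /* RAM target */
} rom_to_ram_entry_t;

static void unpack_rom_to_ram(const rom_to_ram_entry_t *table,
                              const rom_to_ram_entry_t *end) {
    for (const rom_to_ram_entry_t *e = table; e < end; e++)
        rom_to_ram_processors[e->src[0]](e->src + 1, e->dst);
}
```

The first byte of each ROM record selects the handler, which is why the script only needs the three labels to reconstruct every initialized RAM region.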

Once the three ROMtoRAM* labels are created, just run the “arm-romtoram” script and, whoosh, RAM is split into initialized regions following the rules.

Code analysis

Now, once we’ve got all the supporting information, it’s time to dive into the code. First of all, the two pointers “SysTick” and “IRQ” don’t look like pointers at all. That’s definitely code, so let’s reset their status with “C”, rename “SysTick” to “Begin”, make it code with F12 and then a function with “F”.

That’s something that looks like an initialization sequence:

  FUN_0000c9d0(&DAT_20001130,&DAT_20001150);
ti_sysbios_knl_Queue_construct(&DAT_2000118c,0);
DAT_20001154 = &DAT_2000118c;
FUN_0000cdfc(&DAT_2000120c,&LAB_000100e2+1,0,8);
FUN_0000cdfc(&DAT_20001230,&LAB_000100e2+1,3,4);
FUN_0000cdfc(&DAT_20001254,&LAB_000100e2+1,0,0xb);
FUN_0000cdfc(&DAT_200011e8,&LAB_000100e2+1,0,3);
DAT_20002038 = FUN_00009730(&DAT_20002040,&DAT_20002064);
if (DAT_20002038 == 0) {
do {
/* WARNING: Do nothing block with infinite loop */
} while( true );
}
FUN_00005bf4(0x10,0xe165,0x1f,6);
uVar20 = 0;
local_5c = 9;
local_5e = 8;
local_53 = 0;
local_56 = 0;
local_5a = 100;
local_58 = 1000;
FUN_0000363c(0x306,2,&local_56);
FUN_0000363c(0x308,0x10,&DAT_200011a4);
FUN_0000363c(0x307,7,&DAT_20001174);
FUN_0000363c(0x310,1,&local_53);
FUN_0000363c(0x311,2,&local_5e);
FUN_0000363c(0x312,2,&local_5c);
FUN_0000363c(0x313,2,&local_5a);
FUN_0000363c(0x314,2,&local_58);
FUN_00005bf4(0x10,0xe165,6,0xa0);
FUN_00005bf4(0x10,0xe165,7,0xa0);
FUN_00005bf4(0x10,0xe165,8,0xa0);
FUN_00005bf4(0x10,0xe165,9,0xa0);
local_64 = 0;
local_50 = 0;
local_52 = 1;
local_51 = 1;
local_4f = 1;
FUN_00005bf4(0x10,&DAT_00003fb5,0x408,4,&local_64);
FUN_00005bf4(0x10,&DAT_00003fb5,0x400,1,&local_52);
FUN_00005bf4(0x10,&DAT_00003fb5,0x402,1,&local_51);
FUN_00005bf4(0x10,&DAT_00003fb5,0x403,1,&local_50);
FUN_00005bf4(0x10,&DAT_00003fb5,0x406,1,&local_4f);
FUN_00005bf4(0x10,&LAB_0000f3f8+1,&DAT_2000117c);

It would be good to understand it. Let’s search through the constants in the SDK (using Notepad++ or good ol’ grep):

datacompboy@NUUBOX:/mnt/c/ti/simplelink_cc2640r2_sdk_5_30_00_03$ find . -name '*.h' | xargs grep 0x306
./kernel/tirtos/packages/gnu/targets/arm/libs/install-native/arm-none-eabi/include/elf.h:#define NT_S390_LAST_BREAK 0x306
./source/ti/blestack/profiles/roles/cc26xx/broadcaster.h:#define GAPROLE_ADV_EVENT_TYPE 0x306 //!< Advertisement Type. Read/Write. Size is uint8_t. Default is GAP_ADTYPE_ADV_IND (defined in GAP.h).
./source/ti/blestack/profiles/roles/cc26xx/multi.h:#define GAPROLE_ADVERT_OFF_TIME 0x306
./source/ti/blestack/profiles/roles/cc26xx/peripheral.h:#define GAPROLE_ADVERT_OFF_TIME 0x306
./source/ti/blestack/profiles/roles/peripheral_broadcaster.h:#define GAPROLE_ADVERT_OFF_TIME 0x306 //!< Advertising Off Time for Limited advertisements (in milliseconds). Read/Write. Size is uint16. Default is 30 seconds.

Oh, cool! We’re looking at the firmware of a sensor, which is most probably a peripheral. Let’s load the constants into Ghidra from there:

File=>Parse C source=>green plus=>”c:\ti\simplelink_cc2640r2_sdk_5_30_00_03\source\ti\blestack\profiles\roles\cc26xx\peripheral.h”=>Parse to program.

It’ll complain about something, but the constants will be there now. Point at “0x306”, press “E” and look for GAPROLE_ADVERT_OFF_TIME in the list of possibilities; apply it with a double-click. Wait for the re-decompile. Repeat for the others. Further down we see other constants (408/400/402…). We could repeat the same with them, but first let’s look at why the function calls are not labelled.

Let’s grep through the samples for GAPROLE_SCAN_RSP_DATA:

$ find . -name '*.c' | xargs grep GAPROLE_SCAN_RSP_DATA
./examples/rtos/CC2640R2_LAUNCHXL/blestack/multi_role/src/app/multi_role.c: GAPRole_SetParameter(GAPROLE_SCAN_RSP_DATA, sizeof(scanRspData),
./examples/rtos/CC2640R2_LAUNCHXL/blestack/project_zero/src/app/project_zero.c: GAPRole_SetParameter(GAPROLE_SCAN_RSP_DATA, sizeof(scanRspData), scanRspData);
./examples/rtos/CC2640R2_LAUNCHXL/blestack/simple_broadcaster/src/app/simple_broadcaster.c: GAPRole_SetParameter(GAPROLE_SCAN_RSP_DATA, sizeof (scanRspData),
./examples/rtos/CC2640R2_LAUNCHXL/blestack/simple_np/src/app/simple_np_gap.c: GAPRole_SetParameter(GAPROLE_SCAN_RSP_DATA, sizeof(scanRspData),
./examples/rtos/CC2640R2_LAUNCHXL/blestack/simple_np/src/app/simple_np_gap.c: status = GAPRole_SetParameter(GAPROLE_SCAN_RSP_DATA, len, pDataPtr);
./examples/rtos/CC2640R2_LAUNCHXL/blestack/simple_peripheral/src/app/simple_peripheral.c: GAPRole_SetParameter(GAPROLE_SCAN_RSP_DATA, sizeof(scanRspData),
./examples/rtos/CC2640R2_LAUNCHXL/blestack/simple_peripheral/src/app/simple_peripheral_dbg.c: GAPRole_SetParameter(GAPROLE_SCAN_RSP_DATA, sizeof(scanRspData),
./examples/rtos/CC2640R2_LAUNCHXL/blestack/simple_peripheral_oad_offchip/src/app/simple_peripheral_oad_offchip.c: GAPRole_SetParameter(GAPROLE_SCAN_RSP_DATA, sizeof(scanRspData),
./examples/rtos/CC2640R2_LAUNCHXL/blestack/simple_peripheral_oad_onchip/src/app/simple_peripheral_oad_onchip.c: GAPRole_SetParameter(GAPROLE_SCAN_RSP_DATA, sizeof(scanRspData),
./examples/rtos/CC2640R2_LAUNCHXL/blestack/simple_peripheral_oad_onchip/src/persistent_app/oad_persistent_app.c: GAPRole_SetParameter(GAPROLE_SCAN_RSP_DATA, sizeof(scanRspData),
./examples/rtos/CC2640R2_LAUNCHXL/blestack/simple_peripheral_secure_fw/src/app/simple_peripheral_dbg.c: GAPRole_SetParameter(GAPROLE_SCAN_RSP_DATA, sizeof(scanRspData),

Okay, promising. What’s inside simple_peripheral.c:

  // Setup the Peripheral GAPRole Profile. For more information see the User's
// Guide:
// http://software-dl.ti.com/lprf/sdg-latest/html/
{
// By setting this to zero, the device will go into the waiting state after
// being discoverable for 30.72 second, and will not being advertising again
// until re-enabled by the application
uint16_t advertOffTime = 0;

uint8_t enableUpdateRequest = DEFAULT_ENABLE_UPDATE_REQUEST;
uint16_t desiredMinInterval = DEFAULT_DESIRED_MIN_CONN_INTERVAL;
uint16_t desiredMaxInterval = DEFAULT_DESIRED_MAX_CONN_INTERVAL;
uint16_t desiredSlaveLatency = DEFAULT_DESIRED_SLAVE_LATENCY;
uint16_t desiredConnTimeout = DEFAULT_DESIRED_CONN_TIMEOUT;

GAPRole_SetParameter(GAPROLE_ADVERT_OFF_TIME, sizeof(uint16_t),
&advertOffTime);

GAPRole_SetParameter(GAPROLE_SCAN_RSP_DATA, sizeof(scanRspData),
scanRspData);
GAPRole_SetParameter(GAPROLE_ADVERT_DATA, sizeof(advertData), advertData);

GAPRole_SetParameter(GAPROLE_PARAM_UPDATE_ENABLE, sizeof(uint8_t),
&enableUpdateRequest);
GAPRole_SetParameter(GAPROLE_MIN_CONN_INTERVAL, sizeof(uint16_t),
&desiredMinInterval);
GAPRole_SetParameter(GAPROLE_MAX_CONN_INTERVAL, sizeof(uint16_t),
&desiredMaxInterval);
GAPRole_SetParameter(GAPROLE_SLAVE_LATENCY, sizeof(uint16_t),
&desiredSlaveLatency);
GAPRole_SetParameter(GAPROLE_TIMEOUT_MULTIPLIER, sizeof(uint16_t),
&desiredConnTimeout);
}

Wow, sequence is 1-to-1: GAPROLE_ADVERT_OFF_TIME, GAPROLE_SCAN_RSP_DATA, GAPROLE_ADVERT_DATA, GAPROLE_PARAM_UPDATE_ENABLE…

So, we can rename FUN_0000363c => GAPRole_SetParameter, and look further:

  // Set the Device Name characteristic in the GAP GATT Service
// For more information, see the section in the User's Guide:
// http://software-dl.ti.com/lprf/sdg-latest/html
GGS_SetParameter(GGS_DEVICE_NAME_ATT, GAP_DEVICE_NAME_LEN, attDeviceName);

// Set GAP Parameters to set the advertising interval
// For more information, see the GAP section of the User's Guide:
// http://software-dl.ti.com/lprf/sdg-latest/html
{
// Use the same interval for general and limited advertising.
// Note that only general advertising will occur based on the above configuration
uint16_t advInt = DEFAULT_ADVERTISING_INTERVAL;

GAP_SetParamValue(TGAP_LIM_DISC_ADV_INT_MIN, advInt);
GAP_SetParamValue(TGAP_LIM_DISC_ADV_INT_MAX, advInt);
GAP_SetParamValue(TGAP_GEN_DISC_ADV_INT_MIN, advInt);
GAP_SetParamValue(TGAP_GEN_DISC_ADV_INT_MAX, advInt);
}

So right after the batch of GAPRole_SetParameter calls there should be a call to GGS_SetParameter and then a batch of GAP_SetParamValue calls. But the decompilation has no such call, nothing that resembles GGS_SetParameter. So… the GATT service calls were removed or commented out, and that’s why we can’t use the sensors directly.

Another oddity: GAP_SetParamValue should take only two arguments, but we see calls with four:

  FUN_00005bf4(0x10,0xe165,6,0xa0);
FUN_00005bf4(0x10,0xe165,7,0xa0);
FUN_00005bf4(0x10,0xe165,8,0xa0);
FUN_00005bf4(0x10,0xe165,9,0xa0);

Let’s check whether these are indeed the right calls:

$ find . -name '*.h' | xargs grep TGAP_LIM_DISC_ADV_INT_MAX
./source/ti/blestack/inc/gap.h:#define TGAP_LIM_DISC_ADV_INT_MAX 7

Yup, indeed they are. So, perhaps, GAP_SetParamValue is not a function but a macro?

$ find . -name '*.h' | xargs grep GAP_SetParamValue
./source/ti/ble5stack/icall/inc/ble_dispatch_lite_idx.h:#define IDX_GAP_SetParamValue JT_INDEX(152)
./source/ti/ble5stack/icall/inc/icall_api_idx.h:#define IDX_GAP_SetParamValue GAP_SetParamValue
./source/ti/ble5stack/icall/inc/icall_ble_api.h:#define GAP_SetParamValue(...) (icall_directAPI(ICALL_SERVICE_CLASS_BLE, (uint32_t) IDX_GAP_SetParamValue , ##__VA_ARGS__))
./source/ti/ble5stack/icall/inc/icall_ble_apimsg.h: * @see GAP_SetParamValue()
./source/ti/ble5stack/inc/gap.h: * Parameters set via @ref GAP_SetParamValue
./source/ti/ble5stack/inc/gap.h:extern bStatus_t GAP_SetParamValue(uint16_t paramID, uint16_t paramValue);
./source/ti/ble5stack/rom/map_direct.h:#define MAP_GAP_SetParamValue GAP_SetParamValue

That’s why! Depending on the compilation settings, it’s either a direct call, an indirect call via the jump table (with index 152), or an indirect call with the direct function address. Let’s check that ICALL_SERVICE_CLASS_BLE == 0x10:

$ find . -name '*.h' | xargs grep ICALL_SERVICE_CLASS_BLE
...
./source/ti/blestack/icall/src/inc/icall.h:#define ICALL_SERVICE_CLASS_BLE 0x0010
./source/ti/blestack/icall/src/inc/icall.h:#define ICALL_SERVICE_CLASS_BLE_MSG 0x0050
./source/ti/blestack/icall/src/inc/icall.h:#define ICALL_SERVICE_CLASS_BLE_BOARD 0x0088
...
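To see why the decompiler shows four arguments, here is a small C model of that macro. `icall_directAPI` is a hypothetical recording stub with a simplified signature (not the SDK function), and `IDX_GAP_SetParamValue` is set to 0xe165, the value seen in the listing; in the SDK it is the function address itself.

```c
#include <stdarg.h>
#include <stdint.h>

#define ICALL_SERVICE_CLASS_BLE    0x0010   /* from icall.h */
#define TGAP_LIM_DISC_ADV_INT_MAX  7        /* from gap.h */
#define IDX_GAP_SetParamValue      0xe165u  /* stand-in: value shown by the decompiler */

static uint32_t last_call[4];   /* records what the dispatched call looked like */

/* Hypothetical stub: records the service, the API and the two forwarded arguments. */
static int icall_directAPI(uint32_t service, uint32_t api, ...) {
    va_list ap;
    va_start(ap, api);
    last_call[0] = service;
    last_call[1] = api;
    last_call[2] = va_arg(ap, unsigned);   /* paramID */
    last_call[3] = va_arg(ap, unsigned);   /* paramValue */
    va_end(ap);
    return 0;
}

/* Same shape as icall_ble_api.h, minus the GNU ##__VA_ARGS__ extension. */
#define GAP_SetParamValue(...) \
    (icall_directAPI(ICALL_SERVICE_CLASS_BLE, (uint32_t)IDX_GAP_SetParamValue, __VA_ARGS__))
```

A two-argument `GAP_SetParamValue(TGAP_LIM_DISC_ADV_INT_MAX, 0xa0)` thus becomes a four-argument call with 0x10 and the API address prepended, matching the `FUN_00005bf4(0x10,0xe165,7,0xa0)` line.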

So, rename FUN_00005bf4 to icall_directAPI and label address 0xe165 as GAP_SetParamValue. Then read simple_peripheral.c further:

    GAPBondMgr_SetParameter(GAPBOND_PAIRING_MODE, sizeof(uint8_t), &pairMode);
GAPBondMgr_SetParameter(GAPBOND_MITM_PROTECTION, sizeof(uint8_t), &mitm);
GAPBondMgr_SetParameter(GAPBOND_IO_CAPABILITIES, sizeof(uint8_t), &ioCap);
GAPBondMgr_SetParameter(GAPBOND_BONDING_ENABLED, sizeof(uint8_t), &bonding);
GAPBondMgr_SetParameter(GAPBOND_LRU_BOND_REPLACEMENT, sizeof(uint8_t), &replaceBonds);

Yes, that’s our 408, 400, etc. Rename 0x3fb5 => GAPBondMgr_SetParameter.

The decompiled code that follows doesn’t look like the parameter table initialization either. So either it’s the wrong sample, or it’s one more proof that the GATT services were commented out.

Side note: I’ll reiterate how useful extra bits and pieces of data are! As I said in the previous article, searching by strings sometimes turns up interesting things. For example, searching GitHub for the “inputGyroRv” and “inputNormal” strings leads to an interesting library, which is indeed exactly the one used in the direction sensor. Unfortunately, there is nothing of interest for the feet sensors…

Anyway, we can drill down as long as we like, but let’s focus on changing logic of the firmware.

Playing with the firmware

As an experiment, we already patched the sensor by changing “KATVR” to “KAT-F” (“Feet”). But that’s no fun: each foot has its own type (left/right), so let’s try to make it dynamic and announce KAT-R or KAT-L. To do that, we need to find where the type is stored.

We know that this setting is configured via USB, and that packets start with the 0x55/0xAA sequence. So, by simply scrolling down, I found this piece of code:

    case 0xc:
cVar6 = *(char *)(puVar23 + 1);
pcVar26 = *(char **)(puVar23 + 2);
FUN_0000af16(&DAT_200011c8,0,0x1f);
if ((*pcVar26 == 'U') && (pcVar26[1] == -0x56)) {
DAT_200011cb = 0;
DAT_200011c8 = 0x55;
DAT_200011c9 = 0xaa;
cVar4 = (char)local_48;
cVar7 = DAT_200011c0;
if (cVar6 == '\x01') {
....

This smells like the USB packet processing event handler. WriteDeviceId is command 0x04, so:

        if (cVar6 == '\x03') {
DAT_200011cc = '\x03';
cVar4 = DAT_200011cc;
cVar7 = DAT_200011c1;
goto LAB_000009b8;
}
if (cVar6 == '\x04') {
DAT_200011c1 = pcVar26[5];
PostMsg(7,0,0);
DAT_200011cd = 0;
DAT_200011cc = '\x04';
FUN_0000fa20();
FUN_0000f8ec();
break;
}

Given that ReadDeviceId is command 0x03, we may conclude that DAT_200011c1 is the device ID parameter. Right-click it and search for references to it. There are a few references, but this one looks promising:

void FUN_0000f8ec(void)
{
if (ParamDeviceId == '\x03') {
DAT_20001132 = 5;
}
else {
DAT_20001132 = 4;
}
return;
}

This function is called from Begin() twice: once early in the initialization phase (obviously after the parameters are loaded) and once right after WriteDeviceId is handled. An ideal injection point!

Let’s look at its assembly code:

                             *************************************************************
* FUNCTION
*************************************************************
undefined FUN_0000f8ec ()
undefined r0:1 <RETURN>
FUN_0000f8ec XREF[2]: 0000038c (c) , 000009a4 (c)
0000f8ec 04 49 ldr r1,[DAT_0000f900 ] = 20001130h
0000f8ee 91 f8 91 00 ldrb.w r0,[r1,#0x91 ]=>ParamDeviceId
0000f8f2 03 28 cmp r0,#0x3
0000f8f4 14 bf ite ne
0000f8f6 04 20 mov.ne r0,#0x4
0000f8f8 05 20 mov.eq r0,#0x5
0000f8fa 88 70 strb r0,[r1,#0x2 ]=>DAT_20001132
0000f8fc 70 47 bx lr
0000f8fe c0 ?? C0h
0000f8ff 46 ?? 46h F
DAT_0000f900 XREF[1]: FUN_0000f8ec:0000f8ec (R)
0000f900 30 11 00 20 undefine 20001130h ? -> 20001130

The function loads the address of some object into r1 from a nearby literal constant, then loads the device ID parameter into r0 from an offset inside that object. Then it compares the value to 0x03 (left) and, depending on the result, stores 4 or 5 using “ite ne”.

That’s a very nice mechanism for branchless conditional execution. In full ARM mode, the “.ne”, “.eq” and other condition suffixes are encoded in each instruction, but in Thumb mode “ite” is a real instruction: (I)f, (T)hen, (E)lse. It can also be “IT” (if-then), “ITT” (if-then-then) and so on: it marks each of the next (up to 4) instructions as “then” or “else”, and the first one after it is always “then”.
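The T/E pattern is packed into the low nibble of the IT instruction itself. A minimal decoder sketch, following the ARMv7-M encoding as I understand it (1011 1111 firstcond mask, where the lowest set bit of mask terminates the list); `decode_it` is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

/* Decode a Thumb IT instruction into its shape ("IT", "ITE", "ITTE", ...).
   shape must hold at least 6 bytes. Returns firstcond (e.g. 1 = NE),
   or -1 if hw is not an IT instruction. */
static int decode_it(uint16_t hw, char *shape) {
    unsigned firstcond = (hw >> 4) & 0xFu;
    unsigned mask = hw & 0xFu;
    if ((hw & 0xFF00u) != 0xBF00u || mask == 0)
        return -1;   /* mask == 0 encodes hints such as NOP (0xBF00) instead */

    int lowest = 0;
    while (!(mask & (1u << lowest)))
        lowest++;                 /* trailing 1 marks the end of the list */
    int count = 4 - lowest;       /* instructions covered by the block: 1..4 */

    strcpy(shape, "IT");
    for (int k = 2; k <= count; k++) {
        unsigned bit = (mask >> (5 - k)) & 1u;   /* k=2 -> mask[3], k=3 -> mask[2], ... */
        strcat(shape, bit == (firstcond & 1u) ? "T" : "E");
    }
    return (int)firstcond;
}
```

For the `14 bf` bytes in the listing (halfword 0xBF14) this yields “ITE” with condition NE, exactly the “ite ne” Ghidra shows.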

So r0 gets either 4 or 5 depending on the leg, that value is stored into another field of the object, and then the function branches back to lr (“return”). Since the function only touches r0 and r1 (scratch registers) and calls nothing, there are no push/pop instructions.

Since the return is followed by two unused alignment bytes (`c0 46` is `mov r8, r8`, the traditional Thumb NOP filler), we can directly replace bx lr plus c0 46 with a 4-byte jump to wherever we store our code.
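Four bytes are exactly what a 32-bit Thumb-2 `b.w` (encoding T4) needs. Here is a hedged sketch of computing those bytes by hand, following the ARMv7-M encoding (offset relative to the instruction address + 4, with J1/J2 derived from S, I1, I2). The `encode_b_w` helper is my own; the first test vector is checked against the `f0 f7 0b bc` bytes Ghidra shows for the later tail jump to GAPRole_SetParameter.

```c
#include <stdint.h>

/* Encode B.W (T4) from `from` to `to` as 4 bytes in flash order.
   Returns 0 on success, -1 if misaligned or out of the +/-16 MB range. */
static int encode_b_w(uint32_t from, uint32_t to, uint8_t out[4]) {
    int64_t off = (int64_t)to - ((int64_t)from + 4);   /* PC reads as address + 4 */
    if ((off & 1) || off < -(1LL << 24) || off >= (1LL << 24))
        return -1;

    uint32_t imm = (uint32_t)off & 0x1FFFFFFu;   /* S:I1:I2:imm10:imm11:0 */
    uint32_t s = (imm >> 24) & 1u, i1 = (imm >> 23) & 1u, i2 = (imm >> 22) & 1u;
    uint32_t imm10 = (imm >> 12) & 0x3FFu, imm11 = (imm >> 1) & 0x7FFu;
    uint32_t j1 = ~(i1 ^ s) & 1u, j2 = ~(i2 ^ s) & 1u;

    uint16_t hw1 = (uint16_t)(0xF000u | s << 10 | imm10);
    uint16_t hw2 = (uint16_t)(0x9000u | j1 << 13 | j2 << 11 | imm11);
    out[0] = (uint8_t)(hw1 & 0xFF); out[1] = (uint8_t)(hw1 >> 8);   /* little-endian halfwords */
    out[2] = (uint8_t)(hw2 & 0xFF); out[3] = (uint8_t)(hw2 >> 8);
    return 0;
}
```

With the same helper, a jump from 0xf8fc to 0x12e10 comes out as `03 f0 88 ba`.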

Right after the ROMtoRAM table there is some free, all-zero space. Let’s move a little further down from it, to a round address (0x12e10).

What should we write there? We want “KATVR” to become “KAT-L” or “KAT-R” depending on the setting. So we can load “L” or “R” into R0 and then store it, but where? “KATVR” appears twice in the image now: once in ROM and once in RAM:

                             DAT_200011a4                                    XREF[1]:     000000ee (*)   
200011a4 06 ?? 06h
200011a5 09 ?? 09h
200011a6 4b ?? 4Bh K
200011a7 41 ?? 41h A
200011a8 54 ?? 54h T
200011a9 56 ?? 56h V
200011aa 52 ?? 52h R
200011ab 05 ?? 05h

So we want the equivalent of:

  if(left) {
scanRsp[6] = 'L';
} else {
scanRsp[6] = 'R';
}
scanRsp[5] = '-'; // scratch that: it can be a plain ROM patch, since it's a fixed value

At the function end, R0 already holds either 4 or 5 and R1 points to 20001130. The targets, 200011A9 and 200011AA, are only 0x79 and 0x7A away from R1, and the flags are still set from the last comparison, so we can do something like this:

  ite ne
mov.ne r0,#'L'
mov.eq r0,#'R'
strb r0,[R1,#0x7A]
mov r0,#'-'
strb r0,[R1,#0x79]
bx lr

To edit code in Ghidra, clear the instructions (“C”), then press Ctrl+Shift+G to run the assembler for the current line (it’ll complain about being beta). Enter the instruction and arguments; sometimes it understands what I want, sometimes not. E.g. it refuses to assemble “mov.ne r0,#'L'” (or even plain mov). So whenever that happened, I fell back to an online assembler: enter the instructions there and just copy the translated bytes. “mov r0, #'L'” => “4f f0 4c 00”… No, no, give me a 16-bit instruction. “movs r0, #'L'” => “4c 20”, yup, that one. “movs r0, #'R'” => “52 20”. And so on…
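Short 16-bit Thumb encodings are simple enough to double-check without any assembler. A minimal sketch of three of them (MOVS immediate, STRB immediate and BX, encoding T1 in the ARMv7-M manual); the helper names are mine.

```c
#include <stdint.h>

/* MOVS Rd, #imm8 (Rd = r0..r7). Inside an IT block the same bits mean MOV<c>,
   which is why Ghidra shows these bytes as mov.ne / mov.eq. */
static uint16_t t16_movs_imm(unsigned rd, unsigned imm8) {
    return (uint16_t)(0x2000u | (rd & 7u) << 8 | (imm8 & 0xFFu));
}

/* STRB Rt, [Rn, #imm5]: the offset is limited to 0..31. */
static uint16_t t16_strb_imm(unsigned rt, unsigned rn, unsigned imm5) {
    return (uint16_t)(0x7000u | (imm5 & 0x1Fu) << 6 | (rn & 7u) << 3 | (rt & 7u));
}

/* BX Rm (lr = r14). */
static uint16_t t16_bx(unsigned rm) {
    return (uint16_t)(0x4700u | (rm & 0xFu) << 3);
}

/* Bytes as they appear in a hex dump or Ghidra listing (little-endian). */
static void t16_bytes(uint16_t hw, uint8_t out[2]) {
    out[0] = (uint8_t)(hw & 0xFF);
    out[1] = (uint8_t)(hw >> 8);
}
```

The imm5 limit explains why offsets like 0x7A from r1 force the 4-byte `strb.w`, and why first computing `r2 = r1 + 0x74` lets a 2-byte `strb r0,[r2,#6]` (`90 71`) do the same job.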

So, it’s done, but Ghidra is clearly out of its mind:

                             MoveFeetNumNew
00012e10 14 bf ite ne
00012e12 4c 20 mov.ne r0,#0x4c
00012e14 52 20 mov.eq r0,#0x52
00012e16 81 f8 7a 00 strb.eq.w r0,[r1,#0x7a ]
00012e1a 81 f8 79 00 strb.eq.w r0,[r1,#0x79 ]
00012e1e 70 47 bx.eq lr

I have no idea why it got stuck, but the flags/state tracking definitely broke. Never mind, we can ignore it for now: export the binary, make a patch, flash it…

Whoops. Not working. :(

Ah, right! Just changing the structure is not enough: we also need to inform the BLE stack about it, so let’s add a call to GAPRole_SetParameter with GAPROLE_SCAN_RSP_DATA.

To do so, we need the pointer to the structure in R2 (with the parameter ID in R0 and the length in R1), and since we just return afterwards, we can use a tail jump to it:

  ite ne
mov.ne r0,#'L'
mov.eq r0,#'R'
adds.w r2,r1,#0x74
strb r0,[R2,#6]
movw r0,#0x308
movs r1,#0x10
b.w GAPRole_SetParameter

Okay, the mistracking of the code flow has reached a new level of uselessness, so let’s fix it. First, right-click the “b.w” instruction and use “Modify Instruction Flow” to change it to “CALL_RETURN”. That makes it better. To fix the “.eq”, right-click the first broken instruction (adds.eq.w), select “Clear Flow and Repair”, then press F12 to recreate the code there. Now the code looks much better:

                             MoveFeetNumNew                                  XREF[1]:     MoveFeetNumSmt:0000f8fc (j)   
00012e10 14 bf ite ne
00012e12 4c 20 mov.ne r0,#0x4c
00012e14 52 20 mov.eq r0,#0x52
00012e16 11 f1 74 02 adds.w r2,r1,#0x74
00012e1a 90 71 strb r0,[r2,#0x6 ]
00012e1c 40 f2 08 30 movw r0,#0x308
00012e20 10 21 movs r1,#0x10
00012e22 f0 f7 0b bc b.w GAPRole_SetParameter undefined GAPRole_SetParameter()
-- Flow Override: CALL_RETURN (CALL_TERMINATOR)

And the decompilation looks good now too:

void MoveFeetNumSmt(void)
{
if (DeviceId == '\x03') {
DAT_20001132 = 5;
UNK_200011aa = 0x52;
}
else {
DAT_20001132 = 4;
UNK_200011aa = 0x4c;
}
GAPRole_SetParameter(0x308,0x10);
return;
}

Well, semi-good. Let’s right-click GAPRole_SetParameter, choose “Edit Function” and give it three arguments:

Fix function types

Now the code looks exactly as intended:

void MoveFeetNumSmt(void)
{
if (DeviceId == '\x03') {
DAT_20001132 = 5;
UNK_200011aa = 0x52;
}
else {
DAT_20001132 = 4;
UNK_200011aa = 0x4c;
}
GAPRole_SetParameter(0x308,0x10,&scanRspData);
return;
}

Repeat the procedure (File=>Export Program; “fc.exe /b katvr_foot_orig.bin katvr_foot.bin”, etc.). Flash it. Scan the surroundings: excellent! We see both “KAT-R” and “KAT-L” in range, perfect.

Okay, let’s stop child’s play and go mad scientist mode.

Overclock the KatWalk C2 sensors to 133 Hz

To understand the sensor’s problems, one should think like a sensor. We know that sensors send data as Notification packets, so let’s look for those. On the left side there is a “Symbol Tree” with a search field under it; enter “Notifi” there and double-click “GATT_Notification”. Hm. No references? Perhaps it’s an indirect call.

Let’s search for its address, 0x10010045 (Thumb mode, so address + 1). There is only one place where some preparation happens, followed by a call to it:

  _DAT_20001186 = 0x14;
_DAT_20001188 = (undefined *)thunk_EXT_FUN_10018404(DAT_20001142,0x1b,0x14,0,in_r3);
cVar1 = DAT_20000521;
if (_DAT_20001188 != (undefined *)0x0) {
_DAT_20001184 = 0x2e;
if (_DAT_20001146 == 0) {
...
cVar1 = icall_directAPI(0x10,(int)&GATT_Notification + 1,DAT_20001142,&DAT_20001184,0);

0x2E is indeed the handle number used, so we found our place.

Simplified, the code of the function looks like this:

void KatSendNotification() {
    out = malloc(...);
    if (out) {
        if (packetNo == 0) {
            out->_type = 0; // status packet
            fill_charge_levels(out);
        } else {
            out->_type = 1; // data packet
            if (!DATA_READY || !DATA_OK) {
                out->_x = 0;
                out->_y = 0;
            } else {
                DATA_READY = false;
                out->_x = DATA_X;
                out->_y = DATA_Y;
            }
            out->status = STATUS;
        }
        if (something) {
            out[0] = 0; out[1] = 1; out[2] = 0; out[3] = 1; out[4] = 0; out[5] = 1;
        }
        attHandleValueNoti_t notification = { 0x2e, 0x14, out };
        // first argument is the connection handle (DAT_20001142), not the attribute handle
        if (GATT_Notification(connHandle, &notification, 0) != SUCCESS) {
            free(out); // the stack takes ownership of the buffer only on success
        }
        if (++packetNo == 500) {
            packetNo = 0;
        }
    }
}

Once every 500 packets the sensor sends its charge level (plus firmware version and its ID); all other packets contain the movement data. If the movement data is not ready or invalid, it sends zeros. If some flag is set, the sequence 0–1–0–1–0–1 is written over the beginning of the packet.

Now let’s inspect how often the notification is sent. Only two references to the function (which I named “KatSendNotification”) were found, both from the main thread, inside the event processing loop:

while (!QueueEmpty(...)) {
    Entry* event = QueueGet(...);
    switch (event->id)
    {
    ...
    case 4:
        KatSendNotification();
        ClockStop(...);
        break;
    case 6:
        if (Flag1 == 1 || Flag2 == 1) {
            KatSendNotification();
        }
        break;
    ...
    }
}

Event #4 should be posted by a timer, but the timer is not started anywhere, and even if it were, it is stopped right after the first event is processed. No other references to the timer were found, so that’s not our entry point.

Event #6 is generated inside a callback fired by the BLE stack, and data is only sent when two flags are up. One of the flags is set right after the connection is established, and the second one is set as part of another BLE processing sequence. That is consistent with the earlier observation that the data update flow starts only after the connection is established and the connection parameters have been changed.

It’s not certain, but it looks like a packet is generated every time the previous one has been sent, i.e. basically at the refresh request rate. Okay, let’s see how the data gets updated. The cross-reference to the DATA_OK flag leads us to another thread:

void ReadSensorData(...)
{
    do {
        Semaphore_Pend(SensorSemaphore, -1);
        Task_sleep(700);
        GPIO_SET(..., 0);
        SPI_Send(0x50);
        Task_sleep(10);
        char* out = SensorData;
        for (int i = 0x0C; i; --i) {
            *(out++) = SPI_Recv();
        }
        GPIO_SET(..., 1);
        Task_sleep(0.1);
        DATA_OK = 0;
        // parentheses matter: != binds tighter than &
        if (((SensorData[0] & 0x80) != 0) && ((SensorData[0] & 0x20) != 0)) {
            DATA_OK = 1;
        }
        DATA_READY = 1;
        SensorReads++;
        if (SensorReads > 99) {
            // refresh something
            SensorReads = 0;
        }
        if (SomeFlag == 0) {
            Semaphore_Post(SensorSemaphore);
        }
    } while (true);
}

So the thread keeps refreshing the sensor data until SomeFlag is set. A deeper look showed that “SomeFlag” is actually the sleep flag (SleepState in the listings below): it is set when the sensor decides to go to sleep. The semaphore is initialized and signaled at the start of the main thread, so the firmware basically reads the optical sensor continuously from wake-up until it goes to sleep, with a delay of 700 + 10 sleep ticks plus a 13-byte SPI transfer per read, and a refresh of some other data every 100 reads. SPI is set up at 4 Mbaud, and Task_sleep counts in 10-microsecond ticks. Altogether, the sensor data is refreshed roughly every 7.1 milliseconds, i.e. at about 140 Hz. So there should be enough fresh data to send updates at 133 Hz, yet experiments have already shown that we frequently get zero packets.
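A back-of-the-envelope version of that estimate as a small C helper. It assumes only what the paragraph states (10 µs per Task_sleep tick, SPI at 4 Mbit/s, 13 bytes per burst) and ignores the `Task_sleep(0.1)`, code overhead and the extra work every 100th read, so it slightly underestimates the real period. The function names are made up for this sketch.

```c
/* Rough duration of one ReadSensorData iteration, in microseconds.
 * Assumptions (not measured): 1 Task_sleep tick = 10 us,
 * SPI at 4 Mbit/s, 8 bits per byte, loop overhead ignored. */
double read_period_us(unsigned sleep_ticks, unsigned spi_bytes)
{
    const double tick_us = 10.0;        /* Task_sleep resolution */
    const double spi_bits_per_us = 4.0; /* 4 Mbit/s */
    return sleep_ticks * tick_us + spi_bytes * 8u / spi_bits_per_us;
}

/* Loop frequency in Hz for a period given in microseconds. */
double read_rate_hz(double period_us)
{
    return 1e6 / period_us;
}
```

With `read_period_us(700 + 10, 13)` this gives 7100 µs of sleeping plus 26 µs of SPI, 7126 µs in total, and `read_rate_hz` of that is about 140.3 Hz.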

Observed behavior: zero packets show up during continuous movement (in the Gateway, the dot suddenly jumps to the center).

This behavior is understandable: the sensor refreshes at ~140 Hz and the reads happen at 133 Hz. The two rates are close, so there is a high probability of hitting the race where the packet is being formed at the very moment the data is being refreshed. This is the typical race condition between two threads that share data without any synchronization.
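Why are close rates the bad case? A rough sketch of how the send moment drifts across the read loop, assuming one notification per 7.5 ms BLE connection interval (133 Hz) and a 7126 µs read loop (700 + 10 ticks of 10 µs plus a 13-byte SPI transfer at 4 Mbit/s), with jitter ignored. The struct and function names are hypothetical.

```c
/* Drift of the send moment relative to the sensor-read loop.
 * Valid when read_us < send_us < 2 * read_us (the case here). */
typedef struct {
    double slip_us;  /* phase shift of the send per notification */
    double sweep_ms; /* time for the send moment to cover the whole read cycle */
} phase_drift;

phase_drift compute_drift(double send_us, double read_us)
{
    phase_drift d;
    d.slip_us = send_us - read_us;
    /* read_us / slip_us notifications are needed to slide one full cycle */
    d.sweep_ms = read_us / d.slip_us * send_us / 1000.0;
    return d;
}
```

`compute_drift(7500.0, 7126.0)` gives a slip of 374 µs per notification and a full sweep every ~143 ms, i.e. about seven times a second (the same 7 Hz as 140.3 Hz minus 133.3 Hz). So, without synchronization, the send moment regularly passes over any window where the shared data is inconsistent; hitting it is a routine event rather than rare bad luck.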

The solution is obvious: synchronize them. We only need new data when we are about to send it, which means only while there is a connection. The packets are requested at regular intervals, so what if we move `Semaphore_Post` from `ReadSensorData` to `KatSendNotification`? That should work great: ReadSensorData reads fresh data and goes to sleep, and once KatSendNotification has fetched that data, it wakes the ReadSensorData thread to prepare the next packet. Ideal.
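The same handoff can be sketched on a desktop with POSIX threads and semaphores. This is an illustration under assumptions, not the firmware code: `request_sem` plays the role of SensorSemaphore and `sample` stands in for SensorData. The firmware has no second semaphore; it relies on the (shortened) delay finishing before the next connection event. The sketch therefore adds a hypothetical `ready_sem` so the result is deterministic. All names are made up.

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t request_sem; /* like SensorSemaphore: "read the next sample" */
static sem_t ready_sem;   /* hypothetical: stands in for the firmware's delay */
static int sample;        /* stands in for SensorData */
static int n_packets;

/* Like ReadSensorData: waits for a request, reads once, goes back to sleep. */
static void *reader_thread(void *arg)
{
    (void)arg;
    for (int i = 1; i <= n_packets; ++i) {
        sem_wait(&request_sem); /* Semaphore_Pend */
        sample = i;             /* the SPI burst read would happen here */
        sem_post(&ready_sem);   /* note: the reader no longer re-posts itself */
    }
    return NULL;
}

/* Like KatSendNotification: copies the data, then re-arms the reader.
 * Stores the sample carried by packet k in sent[k]; returns 0 on success. */
int run_handoff(int n, int sent[])
{
    pthread_t tid;
    if (n < 1)
        return -1;
    n_packets = n;
    sem_init(&request_sem, 0, 1); /* signaled once at start-up */
    sem_init(&ready_sem, 0, 0);
    if (pthread_create(&tid, NULL, reader_thread, NULL) != 0) {
        sem_destroy(&request_sem);
        sem_destroy(&ready_sem);
        return -1;
    }
    for (int k = 0; k < n; ++k) {
        sem_wait(&ready_sem);       /* fresh data is available */
        sent[k] = sample;           /* build the notification */
        if (k + 1 < n)
            sem_post(&request_sem); /* the moved Semaphore_Post */
    }
    pthread_join(tid, NULL);
    sem_destroy(&request_sem);
    sem_destroy(&ready_sem);
    return 0;
}
```

Every packet carries exactly one new sample (`sent[k] == k + 1`), and the reader never overwrites data the sender has not consumed yet, which is the property the patch is after.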

One more thing: KatSendNotification runs once the previous packet has been sent out, which means the delay inside ReadSensorData is still needed, but I would reduce it a little. Since data is requested at 86 to 133 Hz (every ~11.6 to 7.5 ms), the exact sampling moment is not that critical as long as the delay stays the same; a few milliseconds either way won’t matter.

So, let’s open Notepad and plan the patch.

First step: reduce the delay:

000079d0 41 f6 58 31    movw    r1,#7000    # The delay
=>
000079d0 41 f2 88 31    movw    r1,#5000
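To check a patched constant by hand instead of trusting the assembler, here is a sketch of the Thumb-2 `MOVW` layout (encoding T3): the 16-bit immediate is split into imm4:i:imm3:imm8 across two halfwords, which are stored little-endian. It reproduces both byte sequences above; the function name is made up.

```c
#include <stdint.h>

/* Encode Thumb-2 MOVW Rd, #imm16 (T3) into the 4 bytes Ghidra shows.
 * Rd must not be r13 (sp) or r15 (pc). */
void thumb_movw(unsigned rd, uint16_t imm16, uint8_t out[4])
{
    uint16_t imm4 = (imm16 >> 12) & 0xF;
    uint16_t i    = (imm16 >> 11) & 0x1;
    uint16_t imm3 = (imm16 >> 8) & 0x7;
    uint16_t imm8 = imm16 & 0xFF;
    uint16_t hw1 = 0xF240 | (i << 10) | imm4;                    /* 11110 i 100100 imm4 */
    uint16_t hw2 = (imm3 << 12) | ((rd & 0xF) << 8) | imm8;      /* 0 imm3 Rd imm8 */
    out[0] = hw1 & 0xFF; out[1] = hw1 >> 8;                      /* little-endian */
    out[2] = hw2 & 0xFF; out[3] = hw2 >> 8;
}
```

This also shows why 7000 to 5000 changes exactly two bytes: the `i` bit (bit 11 of the immediate) flips from 1 to 0 (`f6` becomes `f2`), and the low byte goes from 0x58 to 0x88.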

Second step: short-circuit the loop before it posts SensorSemaphore. There are many ways to do so: we could turn the conditional jump of the `if` into an unconditional one, NOP-fill the call, or replace the whole condition with an unconditional branch back to the top of the loop:

LAB_00007ab0                            XREF[1]: 00007a54 (j)
00007ab0 28 78          ldrb    r0,[r5,#0x0]=>SleepState         = 52h
00007ab2 00 28          cmp     r0,#0x0
00007ab4 98 d1          bne     LAB_000079e8
00007ab6 68 68          ldr     r0,[r5,#0x4]=>SensorReadSem
00007ab8 f9 f7 7e fa    bl      Semaphore_post                   undefined Semaphore_post()
00007abc 94 e7          b       LAB_000079e8

=>

LAB_00007ab0                            XREF[1]: 00007a54 (j)
00007ab0 9a e7          b       LAB_000079e8
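These short branches are PC-relative: the target is the instruction address + 4 plus a sign-extended offset times 2, with an 8-bit offset for the conditional `bne` (about ±256 bytes) and an 11-bit one for the unconditional `b` (about ±2 KB). That is why the new `b` at 00007ab0 has different bytes (`9a e7`) from the old one at 00007abc (`94 e7`) despite sharing the same target. A small sketch with hypothetical function names:

```c
#include <stdint.h>

/* Target of a 16-bit conditional branch B<cond> (T1, 0xDxxx). */
uint32_t thumb_bcond_target(uint32_t pc, uint16_t insn)
{
    int32_t imm8 = insn & 0xFF;
    if (imm8 & 0x80)
        imm8 -= 0x100;                        /* sign-extend 8 bits */
    return pc + 4 + (uint32_t)(imm8 * 2);     /* wraps modulo 2^32 */
}

/* Target of a 16-bit unconditional branch B (T2, 0xE000-0xE7FF). */
uint32_t thumb_b_target(uint32_t pc, uint16_t insn)
{
    int32_t imm11 = insn & 0x7FF;
    if (imm11 & 0x400)
        imm11 -= 0x800;                       /* sign-extend 11 bits */
    return pc + 4 + (uint32_t)(imm11 * 2);
}

/* Encode B (T2) at pc; returns 0 if the target is odd or out of range. */
int thumb_b_encode(uint32_t pc, uint32_t target, uint16_t *out)
{
    int32_t off = (int32_t)(target - (pc + 4));
    if ((off & 1) || off < -2048 || off > 2046)
        return 0;
    *out = (uint16_t)(0xE000 | (((uint32_t)off >> 1) & 0x7FF));
    return 1;
}
```

Decoding `98 d1` (halfword 0xD198) at 00007ab4 and `94 e7` (0xE794) at 00007abc both gives 000079e8, and encoding a branch from 00007ab0 to 000079e8 gives 0xE79A, i.e. the `9a e7` in the patch.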

Third step: extend the `KatSendNotification` function. Luckily, we’ve just learned how to extend a function, so let’s do it the same way: find some free space (right after the previous patch, at 00012e40) and replace the return (implemented here as a `pop` into `pc`) with a branch to the extension. The 4-byte `b.w` fits because the 2-byte `pop` is followed by two bytes of padding (`c0 46`, which is `mov r8,r8`, the classic Thumb no-op):

LAB_00006b14                            XREF[1]: 00006a08 (j)
00006b14 f8 bd          pop     {r3,r4,r5,r6,r7,pc}
00006b16 c0             ??      C0h
00006b17 46             ??      46h    F
=>
LAB_00006b14                            XREF[1]: 00006a08 (j)
00006b14 0c f0 94 b9    b.w     KatSendNotificationTail
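The 32-bit `b.w` and `bl` use a wider, scrambled offset: S:I1:I2:imm10:imm11:0, where I1 = NOT(J1 XOR S) and I2 = NOT(J2 XOR S), giving about ±16 MB. Because these branches are PC-relative too, the `bl Semaphore_post` moved from 00007ab8 into the new tail can’t be copied byte-for-byte: `f9 f7 7e fa` has to become `ee f7 b5 f8` to reach the same function (both resolve to 00000fb8). A sketch of the encoder, with a hypothetical name, that reproduces the bytes in the listings:

```c
#include <stdint.h>

/* Encode Thumb-2 B.W (T4, link = 0) or BL (T1, link = 1) located at pc.
 * Writes the two halfwords little-endian, as Ghidra shows them.
 * Returns 0 if the offset is odd or outside +-16 MB. */
int thumb_b32_encode(uint32_t pc, uint32_t target, int link, uint8_t out[4])
{
    int64_t off = (int64_t)target - ((int64_t)pc + 4);
    if ((off & 1) || off < -16777216 || off > 16777214)
        return 0;
    uint32_t u  = (uint32_t)off;          /* 25-bit two's complement in the low bits */
    uint32_t s  = (u >> 24) & 1;
    uint32_t i1 = (u >> 23) & 1;
    uint32_t i2 = (u >> 22) & 1;
    uint32_t j1 = (~i1 ^ s) & 1;          /* from I1 = NOT(J1 XOR S) */
    uint32_t j2 = (~i2 ^ s) & 1;
    uint16_t hw1 = (uint16_t)(0xF000u | (s << 10) | ((u >> 12) & 0x3FF));
    uint16_t hw2 = (uint16_t)((link ? 0xD000u : 0x9000u) | (j1 << 13) |
                              (j2 << 11) | ((u >> 1) & 0x7FF));
    out[0] = hw1 & 0xFF; out[1] = hw1 >> 8;
    out[2] = hw2 & 0xFF; out[3] = hw2 >> 8;
    return 1;
}
```

`thumb_b32_encode(0x6b14, 0x12e40, 0, b)` yields `0c f0 94 b9`, and the only difference between B.W and BL is bit 14 of the second halfword.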

Then write the continuation, which does the same thing as the code we cut out of `ReadSensorData`. Compared to the L/R patch, we need to leave room for the code before the constant from which we load the pointer (to SleepState; the semaphore handle sits right after it, at `[r5,#0x4]`). As before, we may need to run Repair Flow and/or adjust jump tracking, but once all the code is assembled and written in, it should look like this, and the decompiler shows it beautifully:

=>
KatSendNotificationTail                 XREF[1]: KatSendNotification:00006b14 (j)
00012e40 03 4d          ldr     r5,[->SleepState]                = 20001584
00012e42 28 78          ldrb    r0,[r5,#0x0]
00012e44 00 28          cmp     r0,#0x0
00012e46 02 d1          bne     LAB_00012e4e
00012e48 68 68          ldr     r0,[r5,#0x4]
00012e4a ee f7 b5 f8    bl      Semaphore_post                   undefined Semaphore_post()
LAB_00012e4e                            XREF[1]: 00012e46 (j)
00012e4e f8 bd          pop     {r3,r4,r5,r6,r7,pc}
PTR_SleepState_00012e50                 XREF[1]: 00012e40 (R)
00012e50 84 15 00 20    addr    SleepState                       = 52h
Fixed notification decompilation
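Why the gap matters: the 16-bit `ldr r5,[pc,#imm]` (LDR literal, encoding T1) can only reach forward, in multiples of 4, up to 1020 bytes from Align(PC + 4, 4), and only into r0 to r7. So the pointer has to sit on a word boundary after the code, which is where 00012e50 lands, right after the final `pop`. A sketch with a hypothetical function name:

```c
#include <stdint.h>

/* Encode Thumb LDR Rt, [PC, #imm] (literal, T1): forward only, word-aligned,
 * 0..1020 bytes from Align(pc + 4, 4), Rt in r0-r7. Returns 0 if impossible. */
int thumb_ldr_lit_encode(uint32_t pc, unsigned rt, uint32_t literal, uint16_t *out)
{
    uint32_t base = (pc + 4) & ~3u;  /* PC value as seen by the instruction, aligned */
    if (rt > 7 || literal < base)
        return 0;
    uint32_t delta = literal - base;
    if ((delta & 3) || delta > 1020)
        return 0;
    *out = (uint16_t)(0x4800u | (rt << 8) | (delta >> 2));
    return 1;
}
```

For the tail above, `thumb_ldr_lit_encode(0x12e40, 5, 0x12e50, &w)` gives 0x4D03, the `03 4d` at 00012e40; a literal at an unaligned address such as 00012e4e would be rejected.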

Generate the patch, upload… Wow, it works, on the first try! The 133 Hz receiver works great, and slow continuous movement no longer has any breaks. Hooray!

Fix the bug (throw away the forgotten debug appendix)

Just as I was almost happy with the result, Utopia Machina (yes, once again many thanks for the thorough testing!) reported that his direction sensor had to be reset frequently to stay usable. It got stuck showing some angle, changed only minimally as you turned it, and stopped reporting its charge level. Another time, the data from one of the feet disappeared… And none of this reproduced (at least not easily) with the original receiver.

I wasn’t able to reproduce the issue either, until we realized that I keep my sensors plugged into power while he doesn’t. Okay, I unscrewed the sensor from the backplate and left it alone for a couple of hours, periodically touching it to check whether it was still alive. And yes, the problem reproduced!

Once it happened, Wireshark confirmed my suspicion: the direction packets had the 0–1–0–1–0–1 sequence at the beginning. The rest of the packet was normal, so only part of the quaternion kept changing, which produced the small visible change in the angle as you turned the sensor.
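In a capture, the corruption is easy to spot, because the leftover code overwrites exactly the first six bytes of the 20-byte (0x14) payload. A hypothetical helper for flagging such packets; it is only a heuristic, since a legitimate payload could in principle start with the same bytes:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Heuristic: does a notification payload start with the 0,1,0,1,0,1
 * pattern written by the forgotten debug code? */
int has_debug_prefix(const uint8_t *payload, size_t len)
{
    static const uint8_t pattern[6] = {0, 1, 0, 1, 0, 1};
    return len >= sizeof pattern &&
           memcmp(payload, pattern, sizeof pattern) == 0;
}
```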

So yes, this time bomb had gone off:

if (something) {
    out[0] = 0; out[1] = 1; out[2] = 0; out[3] = 1; out[4] = 0; out[5] = 1;
}

It also explains the loss of signal from one of the feet: when the same code triggers in a foot sensor, the coordinates are not corrupted (they sit much further into the packet than the direction data), but the battery level and sensor number are.

Great, so we know what happened, but why? Luckily, the “something” flag has only one write reference, inside a timer that fires every second: the flag is set when a counter reaches 1800 without being reset along the way, and the reset happens on some battery-level-related condition. So… that’s leftover debugging code for optimizing the sensor’s battery consumption. I also found USB commands to reset or adjust the constants for these events.
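A hypothetical reconstruction of that logic, just to make the timing concrete. Only the 1-second tick and the 1800 threshold come from the firmware; the names, the exact comparison and the battery condition are placeholders, and the flag is never cleared here (matching the need to reset the sensor). Assuming one increment per tick, 1800 ticks is 30 minutes, which fits the bug showing up only after the sensor had been unplugged for a while.

```c
#include <stdbool.h>

/* Placeholder names; only the 1 s tick and the 1800 threshold are from the firmware. */
typedef struct {
    unsigned counter; /* incremented by the 1-second timer */
    bool corrupt;     /* the "something" flag checked in KatSendNotification */
} debug_timer;

/* Called once per second; battery_event stands for the battery-related
 * condition that resets the counter. */
void debug_timer_tick(debug_timer *t, bool battery_event)
{
    if (battery_event) {
        t->counter = 0;
        return;
    }
    if (++t->counter == 1800) /* 1800 s = 30 min without a reset */
        t->corrupt = true;    /* never cleared in this sketch */
}
```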

Since this code is not production-critical in any way, we can just throw it away. Again, there are many ways to do it: change the `if`, wipe it with NOPs… I simply NOP’ed out the instruction that sets the flag.
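NOP-ing out an instruction in the raw image just means overwriting its bytes with the Thumb `nop` (0xBF00, stored as `00 bf`); a 32-bit instruction takes two of them. A sketch with a hypothetical name; the file offset of the flag write isn’t given here, so it is a parameter:

```c
#include <stddef.h>
#include <stdint.h>

/* Overwrite len bytes at offset in a raw Thumb image with 16-bit NOPs
 * (0xBF00, little-endian 00 bf). Offset and length must be even.
 * Returns 0, changing nothing, on invalid arguments. */
int thumb_nop_fill(uint8_t *image, size_t image_len, size_t offset, size_t len)
{
    if ((offset & 1) || (len & 1) || offset > image_len || len > image_len - offset)
        return 0;
    for (size_t i = 0; i < len; i += 2) {
        image[offset + i] = 0x00;
        image[offset + i + 1] = 0xBF;
    }
    return 1;
}
```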

The fixed firmware no longer has any of these issues.

What about other issues

Yes, the fixed sensor firmware is compatible with the original receiver… almost. Before, the optical sensors were read at a fixed ~140 Hz; now they are read once per receiver update (86 to 133 Hz).

The optical sensors return the distance moved (in some magic units) since the last read.

The original receiver polls the foot sensor at 86 Hz, but the foot sensor reads its optics at ~140 Hz. So the Gateway treats the data not as a distance but as a sample of speed. This way some of the sensor’s reads are never delivered, but since we poll often enough, we can still recover the speed with good enough precision.

The patched sensor reads its optics at the same rate as it is polled, which gives more accurate distance data, but the absolute values differ from those produced with the fixed refresh rate: at 86 Hz they are ~163% of the original values, and at 133 Hz only ~105%.
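The ratio follows from the sensor reporting distance since the last read: with speed $v$ and read rate $f$, each report is $v/f$. At the same polling rate $f_{\text{poll}}$, the patched values relate to the old ones as the ratio of read rates, taking ~140.3 Hz for the old free-running loop $f_{\text{loop}}$:

$$
\frac{\Delta_{\text{patched}}}{\Delta_{\text{orig}}} = \frac{v / f_{\text{poll}}}{v / f_{\text{loop}}} = \frac{f_{\text{loop}}}{f_{\text{poll}}},
\qquad \frac{140.3}{86} \approx 1.63,
\qquad \frac{140.3}{133.3} \approx 1.05
$$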

Is this an issue? It depends on how the data is used. If it is used directly as a speed estimate, ignoring the refresh rate (which, unfortunately, is how the Gateway does it), then it’s yes and no. No, because at a 133 Hz refresh rate the data is almost the same, so you don’t feel any difference in speed, while the lower latency makes reactions to steps much better (especially at 120 fps). Yes, because using the patched sensor with the original receiver leads to significantly higher speeds, so the Gateway settings have to be adjusted for each game.

Can it be fixed? Yes, of course, in many ways: patch the Gateway, patch the original receiver, write a more elaborate sensor patch… But that’s a topic for another time.

What’s next

The next step is to get rid of the Gateway, at least for native games. Right now one can’t make a standalone game with the KAT SDK, since it is Windows-only and requires the Gateway to run in the background. Since I already know how to talk to the receiver directly and have made a receiver that can be attached to the headset… I have everything required to build a real standalone SDK, and that’s a perfect fit for the game Utopia Machina is making.

That means there is ongoing work on a UE SDK with direct access to KatWalk C2 data, both on Windows and natively on Quest 2/3. :) Stay tuned!

Links


Anton Fedorov

Multitool: Sr. SWE-SRE. I have tendency to cause all sort of problems, so learned how to solve them.