DSP hacking — Reversing G-Pay’s ultrasound signal (part 3)

9 min readApr 16, 2020

A code sequence with good auto-correlation properties yields a Gaussian-distribution of cross-correlation scores.

In the last part, we were able to recover the data-signal by un-mixing (i.e. de-spreading) the base-band signal..

We then proceeded to demodulation and discovered that the signal is MFSK modulated (i.e. Multiple Frequency Shift Keyed).

To be more precise, we are fairly confident or rather have enough information to assume it’s MFSK-16 i.e. uses 16 different tones for data transfer.

So, lets pick up where we left Part2.

Symbol Extraction*
Symbol decoding*
Token recovery*

7. Symbol Extraction

So, how do we demodulate an MFSK-modulated signal?

At first, I thought it shouldn't be any different from demodulating something like Binary FSK or 2-FSK but it turns out the same methods don’t really apply.

You can read more about my experience reversing a basic 2-FSK signal here.

Binary FSK uses 2 tones for data transfer and it’s a fairly simple modulation scheme. At any given point in time, you’re either transmitting a ‘1’ or ‘0’ i.e one or the other frequency/tone. The traditional way to demodulate BFSK is via one of these methods

‘Quadrature demodulation’ (also called quad demod) — simply calculate the instantaneous phase change by extracting angle information.
An offline signal-analysis tool like inspectrum (also that was just 2 frequencies, here we have 16). Easiest way to demodulate B-FSK.
Or a more traditional route — 2 matched filters and threshold

Some of these techniques have to be followed up with ‘clock-recovery’. Something, I’d rather avoid if I can.

Methods 1 and 2 just don’t work with MFSK (i.e. not scalable). Method 3 is the traditional way to demodulate FSK or any M-ary FSK but just gets too complicated for 16 tones. In short, the above methods are either not usable or too complicated.

We’re kind of stuck, again -

My number 1 rule for when you’re stuck — is to go back to basics. What do we know so far?

Our data signal is MFSK modulated.
We know we have 16 frequencies/tones and
We also know where each symbol starts and ends (i.e. our correlation peaks)
This is the moment where a light-bulb should go off.

Sometimes, we get so lost trying to fit an existing solution to a problem, we don’t see the answer that was right in front of us, the whole time.

If we already know where each symbol starts and ends (i.e. at our correlation peaks). In essence, it means we have already extracted our symbols (i.e. consecutive sequences of 2032 data samples). Each such sequence is a symbol.

In terms of real physics, you can think of a symbol as a single burst of a particular frequency and we have 16 different frequencies which can be represented via 16 different symbols (or in other words, 16 different sequences of 2032 samples).

Given the above, if we were to perform a Fourier transform (frequency domain plot) on each of the sequence of 2032 samples (or symbols), we should get the tones/frequencies that fired for each corresponding symbol period.

Big Money and Cutting Edge Technology: How Investment in AI/ML Will Revolutionize The Healthcare…

Among remarkable developments in Artificial Intelligence ( AI) and Machine Learning (ML) over the last couple of years…

www.datadriveninvestor.com

If that doesn’t make immediate sense, that’s OK. It takes a while to sync in.

Put simply, let’s just take an FFT of the real and imaginary components separately for each sequence of 2032 data samples (or symbol) and look for the frequency/tone with the highest energy content. The tone with the most energy is our symbol.

The code-snippet below does exactly that.

This snippet does not produce any output but stores all extracted symbols as a list in {symbols_real[] and symbols_imag[]}. We’ll get our symbols in the next step.

8. Symbol decoding

Finally, we have our symbols.

But, we’re not done yet. In MFSK-16 each symbol represent 4 bits (not 1). That’s kind of why you use these complex modulations schemes. You can pack more bits in a limited amount spectral bandwidth.

So, we need to map /decode our symbols. In other words, translate received symbols to actual data bits. In G-pay’s case, we need to translate symbols to an 8-digit token or 64 bits and if you remember, we’re looking for token number — ‘04755658’.

Note: Actually, G-pay only uses the last 7 digits without the 0.

You’d think this is the easy part. I mean how hard can it be? We’ve come this far. But …. nope

The process of decoding symbols can range from trivial to hilariously obscure, considering it all custom.

Nevertheless always start with some OS-INT i.e. try to find some documentation or nuggets of information that give you some clues to the type of encoding in-use. I managed to find a patent in google’s patent repository that lays out the basic protocol for transmitting the data.

It goes like this — You always start with a spacer (think of it as a start of token) symbol which is the last tone/frequency (in our case its ‘15’ and not 16. Computers count from ‘0’) followed by a payload of 6 symbols and 1 parity symbol for error detection.
Each transmitted token looks something like this

[spacer + payload + parity] — this sequence is called a token.

Payload symbols are hex decoded. In the example (see below), the correct sequence of symbols are as follows- [15, 4, 8, 9, 0, 12, 10, 0]. We remove the first and the last symbol from the sequence and hex decode the remaining (i.e. 4, 8, 9, 0, 12, 10,) which gives us 4755658 in decimal.
G-pay includes a parity symbol or checksum of sorts for error detection. The parity symbol is the one that satisfies the following condition i.e. sum of all symbols modulo 16 must equal zero.

∑(𝑠𝑝𝑎𝑐𝑒𝑟+𝑝𝑎𝑦𝑙𝑜𝑎𝑑+𝑝𝑎𝑟𝑖𝑡𝑦)%16==0

My face lit up like a Christmas tree when I found this document.

But my excitement was short-lived. Upon evaluation, the parity symbols never seem to satisfy the above condition even when I had the right tokens i.e. the token was correct, but the checksum was wrong. After working through a ton of permutations and combinations, I’m now of the opinion that google may have changed its checksum.

To illustrate this in action, here’s what a decoded sequence of symbols looks like. In the example below, the first 4 sequences of list 1 are slightly off by one (or two) symbols (Ex: observe that 3rd and 4th sequence have an ‘11’ instead of an 12 at ‘position 6’). The last 2 sequences are the correct ones. You might be wondering how do I know that? Lets unravel that mystery in the final section.

Symbol decoding

The above code snippet cycles through the the list of symbols retrieved from step7, looking for the spacer (symbol 15) and then prints the next 7 symbols.

9. Token generation

We’re almost done.

As I couldn’t figure out the the checksum. I went for the next best thing

Approximation :)
Instead of taking the sum % 16 == 0 as stated in the patent document, I approximated with (sum % 16 <=4 or >=13) and it works for the most part.
But now — the token we’re looking for is one of a few ‘candidate tokens’, as you can see below. The token we’re looking for is the one in the demo video in Part2 — 04755658

Note -
Some android phones, especially older phones — we may need to change sum(tokens) expression to add 1 extra symbol (i.e. instead of [i : i+8] symbols, we will need [i : i+9]). So, there may be a few cases that may not fall into our approximation

Token generation after removing duplicates.

The token we’re looking for is one a few ‘candidate tokens’.

That’s it, we’re done.

We just pulled a G-pay token out of thin air!

A few comments on google’s threat model:

There is no obvious vulnerability in the underlying app-implementation or usage workflow i.e.

When you activate cash mode on the app (android/iOS version of the app), it calls the underlying cross-platform API called ‘google nearby’ to allocate a short-lived pairing token.
The API then uses one of many options (such as Bluetooth classic, Bluetooth low energy or ultrasound) to get that code across the air-gap to other nearby devices.
Any device that detects the same code can exchange messages.

But we have confirmation that ‘cash mode’ does exactly what it says it does and G-pay only uses this mode of ultrasonic communication to discover and pair parties, while exchanging payee information uses a different communication channel.

However, Google makes a few claims about it being difficult to spoof these ultrasonic pairing codes which we just proved is NOT true. Having reverse engineered google’s ultrasonic modem, it is possible to spoof and transmit valid pairing codes. Though in this case, it’s important to mention that the pairing codes are short-lived which makes for a stronger threat model.

2. Another point to note is google’s use of a 127 bit DSSS spreading code is a good trade-off but it also means that updating its implementation with a longer code or a different sequence is a non-trivial trivial endeavor.

Longer spreading codes increase computational overheads and we don’t have too many 127-bit maximal length sequences

3. Lastly, I haven’t explored the nearby API fully but I see it offers an option to send arbitrary data over ultrasound. It would be interesting to see which apps are using it and for what.

Correction: In Part1, I said pairing tokens cannot be replayed (i.e. they were protected against replay attacks). Not really, my initial tests were a false positive. The tokens are re-playable but short-lived.

Learning:

Asymmetry: The amount of work involved in reverse engineering signals Vs designing and implementing protocols is asymmetric. Its very much in favor of an attacker/reverse-engineer as he/she is not constrained by the use-case or implementation requirements.

For example: All I had to do to synchronize the code-signal was write a few lines of python code i.e. I didn’t care about time taken to compute correlation scores or amount of memory available to perform these operations unlike the the protocol designer/developer who is constrained by the trifecta of (compute/performance/energy) requirements.

2. DSSS usage: DSSS and other spread spectrum technologies are the go-to option in a huge number of applications that require noise, interference and multi-path resiliency. Ex: Drone and satellite communications. Unfortunately, it also assumed to be secure as long as the spreading code is a secret. This is NOT true. The spreading is called ‘pseudo-random’ for a reason. We just brute forced it without a sweat.

3. Learning digital signal processing: Most of the DSP literature (including lectures/tutorials) on the internet is free. That’s great but a lot of it is pretty abstract, filled with pages of equations and no intuitive explanations. It doesn’t have to be this way.

In the last 2 weeks, I managed to find content from many independent bloggers and educators who focus on the intuition part (and have nailed it, mostly).

I personally learnt more about the Fourier transform, watching a 20 min video by [3blue1brown], than 3 days of flipping through multiple e-books or
From a practical introduction of Z-transforms by [audioprogrammer] than a dozen theoretical lectures.

Conclusions:

In conclusion, this was one journey I’m happy I stuck with, despite thinking of throwing in the towel a hundred times. In the end, I think I learned a thing or two about DSP and no longer think its groot’s language.

A few suggestions to anyone starting out,

Stick with it. Take a break if you need it (a really long one if required)
Test every theory by experimenting. This isn’t a race. Assume you have all the time in the world. Experimentation just means, write some code and see if it works. If no, come up with something new and repeat.
Build Intuition. There is no better advice that this. At the end of the day, signal processing is about physics, if you can visualize it, you can understand it.

The code for this project is up on my GitHub account.