I have one of those personal projects that I just can’t finish. I think most software developers can relate to this. You have some awesome idea but due to work and other responsibilities, it ends up being forgotten about or buried under other tasks.
Now this isn’t a success story of one of said projects being completed, but it is the story of a long-forgotten project being revived and given some new life. In fact, I’m hoping writing this forces me to eventually finish the project. But enough with introductions, let’s get into it.
The goal of the project is to build a Spotify client that learns from my listening habits and skips songs I would normally skip. Like many programs, the desire for this comes from laziness. I don’t want to have to create or find a playlist when I’m in the mood for certain music. I want to select a song in my library, shuffle the rest, and have the songs that don’t “flow” deleted from the queue.
To accomplish this, I will need to train some sort of model that can perform this task (maybe more on that in a future post). But to train a model, I first need data to train it with.
I basically want my entire listening history, including what songs I skip. It is fairly straightforward to get my extended history. Even though the Spotify API only allows you to get the last 50 played songs, we can set up a cron job to repeatedly poll this endpoint, effectively keeping track of every listen. Code for this is posted here: https://gist.github.com/SamL98/c1200a30cdb19103138308f72de8d198
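Because consecutive polls overlap, each batch has to be filtered against what is already stored. Here is a minimal sketch of that step in C. It is not the code from the gist above: the `play` struct, the `merge_batch` helper, and the assumption that each play has already been parsed into a track ID and a millisecond timestamp are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one play; real API responses carry more fields. */
typedef struct {
    char    track_id[23];   /* assumed: 22-character track ID plus NUL */
    int64_t played_at_ms;   /* timestamp of the play, ms since the epoch */
} play;

/* Append the plays in `batch` that are newer than anything already stored.
 * `batch` is assumed to be sorted oldest-first. Returns the new history
 * length; plays that would overflow `cap` are dropped. */
size_t merge_batch(play *history, size_t len, size_t cap,
                   const play *batch, size_t batch_len)
{
    int64_t newest = (len > 0) ? history[len - 1].played_at_ms : INT64_MIN;

    for (size_t i = 0; i < batch_len && len < cap; i++) {
        if (batch[i].played_at_ms > newest) {
            history[len++] = batch[i];
            newest = batch[i].played_at_ms;
        }
    }
    return len;
}
```

As long as the job runs more often than it takes to play 50 songs, no listen falls out of the window between two polls.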
The hard part is tracking skips. The Spotify web API provides no endpoint for this. Now, in the past I created some services to control playback using the Spotify AppleScript API (the rest of this post will pertain to the macOS Spotify client). I could just use these services to keep track of what is skipped, but that felt like shying away from the challenge. How else could I accomplish it?
I recently learned about a process called hooking, where you can “intercept” function calls made from a target binary. I thought this would be the perfect way to track skips.
The most common type of hook is the interpose hook. This type of hook overwrites a relocation in the PLT, but what exactly does this mean?
The PLT, or Procedure Linkage Table, allows your code to reference external functions (think libc) without knowing where those functions are in memory; you just reference an entry in the PLT. The dynamic linker performs the “relocation” for each function or symbol in the PLT at runtime. One benefit of this approach is that if the external functions are loaded at different addresses, only the relocation in the PLT needs to be changed, not every reference to that function in your code.
So when we create an interpose hook for, say, printf, whenever the process we are hooking calls printf, our implementation of printf will be called instead of libc’s (oftentimes our custom library will also call the standard implementation).
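To make the idea concrete, here is a toy model in plain C. It is only an analogy, not how the real PLT is laid out: a single function pointer (`print_slot`) stands in for a PLT entry, and every name in the sketch is made up for illustration. Installing the hook overwrites the pointer, and the hook forwards to the original, just as an interpose library usually does.

```c
#include <stdio.h>

/* Toy stand-in for a PLT slot: callers reach the function through a pointer. */
typedef int (*print_fn)(const char *msg);

/* Plays the role of the library's real implementation. */
static int real_print(const char *msg)
{
    return printf("%s\n", msg);
}

static print_fn print_slot = real_print;  /* the "relocation" */
static print_fn original_print;           /* saved so the hook can forward */
static int hook_calls;

static int hooked_print(const char *msg)
{
    hook_calls++;                 /* our custom behaviour */
    return original_print(msg);   /* then call the real implementation */
}

/* Overwrite the slot, much as an interpose hook overwrites the relocation. */
void install_hook(void)
{
    if (print_slot == hooked_print)
        return;                   /* already installed */
    original_print = print_slot;
    print_slot = hooked_print;
}

/* Every caller goes through the slot, so no call site has to change. */
int call_print(const char *msg)
{
    return print_slot(msg);
}

int hook_call_count(void)
{
    return hook_calls;
}
```

Calling `call_print` before `install_hook` reaches `real_print` directly; afterwards the same call site lands in `hooked_print` first.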
Now that we’ve got a little bit of background on hooking under our belts, we’re almost ready to try inserting a hook into Spotify. But first we need to figure out what we want to hook.
Finding Where to Hook
As stated earlier, an interpose hook can only be created for an external function, so we’ll look for a function in libc or in the Objective-C runtime.
While researching where to hook, I thought a good place to start looking would be where Spotify handles the “media control keys”, or F7-F9 on my MacBook. It seemed safe to assume that the handlers for these keys would call the same functions that are called when, say, the next button is clicked in the Spotify app. I eventually came across the SPMediaKeyTap library: https://github.com/nevyn/SPMediaKeyTap. I figured I’d give it a shot and see whether or not Spotify copied and pasted the code from this library, as that would make my life a whole lot easier.
In the SPMediaKeyTap library, there is a method startWatchingMediaKeys. I ran the strings command on the Spotify binary to see if they have this method, and sure enough:
Bingo! If we load the Spotify binary into IDA (free version of course) and search for this string, we find the corresponding method:
If we look at the corresponding source code for this function, we find the interesting parameter tapEventCallback to the
If we look back at the disassembly, we can see that the sub_10010C230 subroutine is passed as the tapEventCallback parameter. If we look at either the source code or disassembly of this function, we see that only one library function, CGEventTapEnable, is called:
Let’s try to hook this function.
The first thing we will need to do is create a library to define our custom CGEventTapEnable. This is fairly simple with the following code:
#include <stdio.h>
#include <stdbool.h>
#include <dlfcn.h>
#include <CoreFoundation/CoreFoundation.h>

void CGEventTapEnable(CFMachPortRef tap, bool enable)
{
    void (*old_tap_enable)(CFMachPortRef, bool);
    printf("I'm hooked!\n");
    /* Look up the real CGEventTapEnable in the next loaded image. */
    old_tap_enable = dlsym(RTLD_NEXT, "CGEventTapEnable");
    /* Forward the call so we don't break anything. */
    old_tap_enable(tap, enable);
}
The dlsym function call gets the address of the actual library CGEventTapEnable function. We then call the old implementation so that we don’t accidentally break anything. Let’s compile our library like so (credit to https://ntvalk.blogspot.com/2013/11/hooking-explained-detouring-library.html):
gcc -fno-common -c <filename>.c
gcc -dynamiclib -o <library name> <filename>.o
Great. Now let’s try running Spotify while inserting our hook:
DYLD_FORCE_FLAT_NAMESPACE=1 DYLD_INSERT_LIBRARIES=<library name> /Applications/Spotify.app/Contents/MacOS/Spotify
Hit Enter, and uh oh:
Spotify opened fine but Apple’s System Integrity Protection (SIP) didn’t let us load our unsigned library :(.
There seems to be some sort of complaint about codesigning. Luckily, I am a member of Apple’s very reasonably priced developer program, so I can codesign the library. Crisis averted. Let’s sign our library with our $100 certificate, run the previous command, and… fail.
Not surprisingly, Apple won’t let you insert a library signed with any old identity, only the one used when signing the original binary. Looks like we’ll have to find another way to dig our hooks into Spotify.
As a side note, the astute reader might notice that the function we are hooking, CGEventTapEnable, is only called when the media key event times out. So even if we could insert our hook, it would only be triggered on an edge case and we likely wouldn’t have seen any output. This section’s main purpose was to detail my initial failure (and oversight) and serve as a learning experience.
After some digging, I came across the awesome library HookCase: https://github.com/steven-michaud/HookCase. HookCase lets us implement a much more powerful type of hook than the interpose hook, the patch hook.
Patch hooks are inserted by modifying the function you wish to hook so that it triggers an interrupt. The kernel then handles this interrupt and transfers execution to our own code. For those interested, I highly recommend reading the HookCase documentation, which is much more detailed.
Patch hooks allow us to hook not only calls to external functions, but any function within the target binary (since they don’t rely on the PLT). HookCase provides us with a framework to insert patch and/or interpose hooks, as well as a kernel extension to handle the interrupts generated by patch hooks and run our custom code. It is truly a great framework and invaluable for this project.
Now that we have a way to hook into any function within the Spotify binary, there’s only one question remaining… Where?
Let’s revisit the SPMediaKeyTap source code to see how the media control keys are handled. In the callback function, we see that if F7, F8, or F9 (NX_KEYTYPE_PREVIOUS, NX_KEYTYPE_PLAY, etc.) is pressed, we execute the handleAndReleaseMediaKeyEvent selector:
And then the delegate is notified in said selector:
Let’s look at this delegate method in the repo:
Turns out it just lays out a template for handling the keys. Let’s search for the function receivedMediaKeyEvent in IDA and look at the graph view for the corresponding function:
Looks pretty similar, doesn’t it! We can see that one common function, sub_10006FE10, is called for each type of key, only an integer parameter is set to distinguish them. Let’s hook it and see if we can log what key is pressed.
We can see from the disassembly that sub_10006FE10 gets two parameters: 1) a pointer to the playerDelegate property of the SPTClientAppDelegate singleton, and 2) an integer specifying what type of event occurred (0 for pause/play, 3 for next, and 4 for previous).
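For logging from a hook, it helps to turn that integer into something readable. A small sketch, using only the three values read from the disassembly above; the enum and function names are my own and do not appear in the Spotify binary.

```c
/* Event codes as read from the disassembly. Only these three are described
 * in the text, so every other value is reported as unknown. */
enum media_event {
    EVENT_PLAY_PAUSE = 0,
    EVENT_NEXT       = 3,
    EVENT_PREVIOUS   = 4
};

/* Map an event code to a human-readable name for log output. */
const char *media_event_name(int code)
{
    switch (code) {
    case EVENT_PLAY_PAUSE: return "pause/play";
    case EVENT_NEXT:       return "next";
    case EVENT_PREVIOUS:   return "previous";
    default:               return "unknown";
    }
}
```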
Looking at sub_10006FE10 (I won’t include it here but I highly recommend you check it out for yourself), we can see that it is actually a wrapper for sub_10006DE40, which contains most of the meat:
Woah! That looks complex. Let’s try to break it down just a bit.
Looking at the structure of this graph, there is a single node towards the top that has many outgoing edges:
As IDA helpfully suggests, this is a switch statement on esi (the second integer parameter described earlier). It looks like Spotify’s handling a little more than just Previous, Pause/Play, and Next here though. Let’s focus on the block that handles Next, or 3:
Now admittedly, this took me some time to decipher, but I want to draw your attention to the call r12 line, fourth from the bottom. If you look at some of the other cases, you will find a very similar pattern of calling a register. This seems like a good function to look into, but how do we know where it is?
Let’s crack open a new tool for this: the debugger. I had a lot of trouble when I was initially trying to debug Spotify. Now this could be due to me not being too masterful with the debugger, but I think I came up with a pretty clever solution.
We’ll first set a hook on sub_10006DE40 and then we will trigger a breakpoint from within our code. We can do this by executing the assembly instruction int 3, which is what debuggers like GDB and LLDB use to trigger breakpoints.
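Stripped of any hooking framework, triggering a breakpoint from C can be sketched as follows. This is a standalone illustration, not the HookCase hook itself: the function names are hypothetical, the inline assembly assumes a GCC- or Clang-compatible compiler on x86, and other architectures fall back to raise(SIGTRAP). With no debugger attached, an unhandled SIGTRAP kills the process, so the sketch also includes an optional handler.

```c
#include <signal.h>

static volatile sig_atomic_t trap_count;

/* Counts traps so the program can keep running without a debugger. */
static void on_trap(int signo)
{
    (void)signo;
    trap_count++;
}

/* Optional: install a SIGTRAP handler. Returns 0 on success, -1 on error. */
int install_trap_handler(void)
{
    return signal(SIGTRAP, on_trap) == SIG_ERR ? -1 : 0;
}

/* Stop in an attached debugger, or deliver SIGTRAP if none is attached. */
void trigger_breakpoint(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("int $3");   /* x86 breakpoint instruction */
#else
    raise(SIGTRAP);               /* fallback on other architectures */
#endif
}

int trap_hits(void)
{
    return (int)trap_count;
}
```

When LLDB is attached, it stops on the trap before any handler runs, which is exactly the behaviour we want inside the hook.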
Here’s what that hook looks like in the HookCase framework:
After adding this to the HookCase template library, you also have to add it to the
We can then compile this using the template Makefile HookCase provides (substituting filenames of course). The library can then be inserted into Spotify with the following command:
HC_INSERT_LIBRARY=<full path to hook dylib> /Applications/Spotify.app/Contents/MacOS/Spotify
We can then run LLDB and attach it to the running Spotify process like so:
After continuing past the initial trap, try pressing F9 (if Spotify is not the active window for the first press, it may open iTunes). The int $3 line in our hook should have triggered the debugger.
We can now step until we reach the entry point of sub_10006DE40. Notice that the PC will be at an offset from the address shown in IDA (I assume this is due to where the process is loaded into memory; the loader places the binary at a shifted base address, commonly called the ASLR slide). In my current process, the push r15 instruction is located at 0x10718ee44:
In IDA, the address of this instruction is 0x10006DE44, which gives us an offset of 0x7121000. In IDA, the call r12 instruction is at 0x10006E234. We can then add our offset to this address and set a breakpoint accordingly, b -a 0x10718f234, and then continue.
When we hit our target instruction, we can print out the contents of register r12:
All we have to do is subtract the offset from this address, and voila, we get our titular address: 0x100CC2E20.
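The address bookkeeping is just addition and subtraction, but it is easy to get wrong by hand. A small sketch of the conversions (the helper names are my own):

```c
#include <stdint.h>

/* Offset between where the debugger sees an instruction and where IDA
 * lists it, computed from one instruction located in both. */
uint64_t compute_slide(uint64_t runtime_addr, uint64_t static_addr)
{
    return runtime_addr - static_addr;
}

/* IDA address -> address in the running process (for breakpoints). */
uint64_t to_runtime(uint64_t static_addr, uint64_t slide)
{
    return static_addr + slide;
}

/* Address in the running process -> IDA address (for register contents). */
uint64_t to_static(uint64_t runtime_addr, uint64_t slide)
{
    return runtime_addr - slide;
}
```

With the values from this session, compute_slide(0x10718ee44, 0x10006DE44) gives 0x7121000, and to_runtime(0x10006E234, 0x7121000) gives 0x10718f234, the breakpoint address used above. The offset changes between launches, so it has to be recomputed for each new process.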
Let’s now hook this function to confirm our suspicions:
Add this to the user_hooks array, compile, run, and lo and behold: any time we press F9 or click the next button in the Spotify app, our message is logged.
Now that we’ve hooked the skip function, I’ll post the rest of the code, but I won’t go through the process of reverse engineering the rest because this post is already pretty long.
In short, I also hooked the previous function (a good exercise if you’re following along). Then in either of these hooks, I first check if I am past halfway in the current song. If I am, I don’t do anything, assuming I’ve just gotten bored with the song, not that it doesn’t fit. Then on backs (F7), I pop the last skip.
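The decision made inside the two hooks can be sketched as a single function. This is my reading of the rules described above rather than the code from the gist; the enum and function names are hypothetical, and position and duration are assumed to be in seconds.

```c
typedef enum { KEY_NEXT, KEY_BACK } key_event;

typedef enum {
    ACTION_IGNORE,        /* past halfway: treat as boredom, not a bad fit */
    ACTION_RECORD_SKIP,   /* next pressed early in the song */
    ACTION_POP_SKIP       /* back pressed: undo the last recorded skip */
} skip_action;

/* Decide what to do when next or back is pressed at `position_s` seconds
 * into a song that is `duration_s` seconds long. */
skip_action decide_action(key_event key, double position_s, double duration_s)
{
    if (duration_s > 0.0 && position_s > duration_s / 2.0)
        return ACTION_IGNORE;

    return key == KEY_NEXT ? ACTION_RECORD_SKIP : ACTION_POP_SKIP;
}
```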
Where am I?
The way that I found out how to check if the current song is past halfway deserves a few words. My initial approach was to actually call popen and then run the corresponding AppleScript commands, but that just didn’t feel right.
I ran class-dump on the Spotify binary and found two classes: SPAppleScriptObjectModel and SPAppleScriptTrack. These classes expose the properties we need for playback position, duration, and track id. I then hooked the getters for these properties and called them from within the next and back hooks (I think it makes more sense to swizzle, but I was having trouble getting that to work).
I use a file to keep track of skips. The first line holds the number of skips. On a skip, we increment this counter and write the track id and timestamp to the file on the line specified by the counter. On a back press, we simply decrement this counter. This way, when we press the back button, new skips are written over the ones that were backtracked. Anyways, here’s the code: https://gist.github.com/SamL98/0cd20b00951b9a5cca6b5c9380ec5642
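For readers who want to see the counter scheme without opening the gist, here is a rough sketch of it in C. It is not the gist’s code: the function names are hypothetical, and I am assuming fixed-width lines (a 10-digit counter, a 22-character track id, a 10-digit timestamp in seconds) so that the line for a given counter value can be overwritten in place.

```c
#include <stdio.h>

#define HEADER_LEN 11   /* "%010d\n": 10 digits plus newline */
#define RECORD_LEN 34   /* 22-char id, space, 10 digits, newline */

/* Read the skip counter from the first line (0 for a new, empty file). */
static int read_count(FILE *f)
{
    int count = 0;
    rewind(f);
    if (fscanf(f, "%d", &count) != 1)
        count = 0;
    return count;
}

/* Rewrite the counter on the first line. Returns 0 on success. */
static int write_count(FILE *f, int count)
{
    rewind(f);
    return fprintf(f, "%010d\n", count) == HEADER_LEN ? 0 : -1;
}

/* Open the log for update, creating it if it does not exist yet. */
static FILE *open_log(const char *path)
{
    FILE *f = fopen(path, "r+");
    if (!f)
        f = fopen(path, "w+");
    return f;
}

/* On a skip: write the record on the line given by the counter, then
 * increment the counter. Returns the new count, or -1 on error. */
int record_skip(const char *path, const char *track_id, long timestamp)
{
    FILE *f = open_log(path);
    if (!f)
        return -1;

    int count = read_count(f);
    int ok = fseek(f, HEADER_LEN + (long)count * RECORD_LEN, SEEK_SET) == 0
          && fprintf(f, "%-22.22s %010ld\n", track_id, timestamp) == RECORD_LEN
          && write_count(f, count + 1) == 0;

    fclose(f);
    return ok ? count + 1 : -1;
}

/* On a back press: decrement the counter so the next skip overwrites the
 * backtracked one. Returns the new count, or -1 on error. */
int undo_skip(const char *path)
{
    FILE *f = open_log(path);
    if (!f)
        return -1;

    int count = read_count(f);
    if (count > 0)
        count--;
    int ok = write_count(f, count) == 0;

    fclose(f);
    return ok ? count : -1;
}
```

Note that a backtracked record stays in the file until a later skip overwrites it, so a reader of the file should only trust the first N records, where N is the counter on the first line.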
I hope you enjoyed this post and learned something. I know I learned a hell of a lot throughout the process. Let me know what you think and whether I could’ve done anything better or differently. Thanks!