The Yak is a Hack

Kyle Stevenson
9 min read · Nov 27, 2014


After roughly a month of using Yik Yak, I finally decided to dig deep and see what information it saves and shares with other users on the service. I was surprised right off the bat: there were a number of sketchy details that privacy-conscious users may want (or not want!) to know about. Let’s dive in!

Yik Yak API Response Breakdown

Let’s take a quick look at the way Yaks are returned from the Yik Yak API. Since this post is about promoting anonymity, I won’t be using existing Yaks by real people. Instead, I’ll create fake Yak data and inject it into the app using a handy tool called mitmproxy (a man-in-the-middle proxy that lets me view and edit the traffic between a device and any routable server on the Internet). If you’d like to see the source code for the in-line script used for this post, take a look at this gist.

Here’s the JSON object we’re going to inject into the response body every time the app makes a request to the API route /api/getMessages. The data corresponds to a fake Yak whose faked location is a Chipotle in Beaverton, Oregon.

{
"comments": 0,
"deliveryID": 102,
"handle": null,
"hidePin": "1",
"latitude": 45.4953056,
"liked": 1,
"longitude": -122.8090428,
"message": "This Chipotle is delicious!",
"messageID": "R/5476c66b5542a75500257f30813b4",
"numberOfLikes": 1337,
"posterID": "9cb4afde731e",
"reyaked": 0,
"score": "2.1417070187",
"time": "2014-11-27 05:34:27",
"type": "0"
}

Some readers out there may have had a reaction similar to the one in this gif when they noticed the numeric and boolean values stored as strings. If you’re not that technical, just know that that’s widely accepted as not a great practice to follow.

Not surprisingly, setting non-numeric values in some of those strings can brick the app (I’ll get to that later in the post). For now, let’s break down the JSON object key-by-key. A few keys are omitted for the sake of brevity.

  • comments : Number of comments on the Yak.
  • handle : When posting a Yak, the submitter can enter in a couple of words, or a string, to be displayed along with their Yak. The same handle can be used by many people, at any point in time, which means this is not a way of uniquely identifying a user.
  • hidePin : Basically, this just tells the app if it should display the Poster’s exact* location when it draws the map behind the Yak.
  • latitude : See Latitude on Wikipedia. This is the unmodified Latitude value of where the Yak was submitted from. Strangely, these values appear to be stored as Strings inside the app itself.
  • liked : The current user’s vote on the Yak. An “upvote” is liked = 1 and a “downvote” is liked = -1. If the current user has not voted on the Yak, then liked = 0.
  • longitude : See Longitude on Wikipedia. This is the unmodified Longitude value of where the Yak was submitted from. Strangely, these values appear to be stored as Strings inside the app itself.
  • message : The “Yak”/message.
  • messageID : The unique identifier of the Yak.
  • numberOfLikes : The net upvotes/downvotes. Example: if a Yak has 5 upvotes and 3 downvotes, the numberOfLikes would be calculated by subtracting downvotes from upvotes: 5-3 = 2
  • posterID : This is a bit of an odd key, and needs more than a sentence to explain — see below.
  • score : Appears to be a scalar value likely based on velocity: ~ (net upvotes) / (time passed). This value is likely used in the app for sorting Yaks based on how active/“hot” they are and is not displayed to users.
The intercepted and rewritten response in mitmproxy’s interface.
The red rectangle shows the approximate map view as displayed on the Android App. The red pin is where the GPS coordinates we injected actually are. If hidePin is not set to 1, then the area displayed behind the Yak text is much closer to the real GPS coordinates, though not necessarily centered on the exact coordinates.

Every Yakker (user) has to be able to be uniquely identified by the API servers in order to authenticate actions such as posting, editing, or deleting one’s Yaks. In API responses, the user IDs are denoted as posterID.

When I first looked at the API responses, I saw that the values were in hexadecimal, which I hoped meant they could possibly be used to identify a user across Yaks. However, I noticed that some hexadecimal strings had different lengths and were capitalized while others were not.

After a bit of research, and a simple MapReduce method, I figured out that none of the posterIDs for the previous 101 Yaks collided (meaning every posterID was unique). That was intriguing, so I dug a little deeper.

Simple Python MapReduce method to count the number of unique posterID fields
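
The original snippet is not reproduced here, so what follows is only a minimal sketch of the idea: map every Yak to its posterID, then reduce to a count per ID. The sample Yaks, including their messageID and posterID values, are made up for illustration.

```python
from collections import Counter


def count_poster_ids(yaks):
    """Map each Yak to its posterID, then reduce to a count per ID."""
    return Counter(yak["posterID"] for yak in yaks)


# Hypothetical sample shaped like the getMessages response (values invented).
sample_yaks = [
    {"messageID": "R/1", "posterID": "548225f80a1b2"},
    {"messageID": "R/2", "posterID": "548225f80c3d4"},
    {"messageID": "R/3", "posterID": "548225f80e5f6"},
]

counts = count_poster_ids(sample_yaks)
unique_ids = len(counts)
# Any posterID seen more than once would be a collision.
collisions = {pid: n for pid, n in counts.items() if n > 1}
print(unique_ids, collisions)
```

If the number of unique IDs equals the number of Yaks, as it did for my 101 Yaks, no posterID was reused.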

It turns out that, at least on Android, those longer (32 hex digits) and uppercase hexadecimal strings I mentioned are the full user ID of the current user of the app, while every Yak that was not posted by the current user carries a shorter (13 hex digits) hexadecimal string.

This leads me to believe that Yik Yak’s API servers do some sort of filter before returning API results similar to the following:
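
I can’t see the server code, so this is purely a guess at the shape of such a filter, written as a small Python sketch. The function name, the way the replacement ID is generated, and the sample IDs are all assumptions on my part.

```python
import secrets


def anonymize_poster_ids(yaks, requesting_user_id):
    """Return copies of the Yaks with other users' posterIDs replaced."""
    filtered = []
    for yak in yaks:
        yak = dict(yak)  # copy, so the stored record is left untouched
        if yak["posterID"] != requesting_user_id:
            # Assumption: a throwaway 13-hex-digit value generated per Yak.
            yak["posterID"] = format(secrets.randbits(52), "013x")
        filtered.append(yak)
    return filtered


# Hypothetical full user IDs (32 uppercase hex digits).
my_id = "0123456789ABCDEF0123456789ABCDEF"
other_id = "FEDCBA9876543210FEDCBA9876543210"
stored = [
    {"message": "mine", "posterID": my_id},
    {"message": "someone else's", "posterID": other_id},
]
response = anonymize_poster_ids(stored, my_id)
print(response)
```

The requesting user still sees their own full ID, while everyone else’s is swapped for a short value that cannot be linked back to the real one.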

Additionally, I noticed that the posterIDs seemed to be in an incrementing sequence that would differ by the user’s location. For example, in Corvallis, Oregon the first few hexadecimal digits might be “548225f80” while in Portland, Oregon they might start with “548225d8a”.

Fortunately, this appears to imply that identifying individual users based on the posterID field alone is nearly impossible.

However, the GPS coordinates in API responses appear to be completely unmodified and not anonymized at all. I took the latitude and longitude of a previous Yak I posted and plugged them into Google Maps, which led to me looking at a map centered on where my bed is located inside my apartment… Creepy!

Since I’ve already established that they need to comb over the messages posted in order to anonymize the posterIDs, why can’t they take it a step further and slightly modify the GPS coordinates, too?

Note (December 8th 2014): Yik Yak updated their API over the weekend to return randomized coordinates on each refresh. I believe they do the following calculation for each GPS coordinate returned: coordinates = [x + random.uniform(-1.5mi, 1.5mi), y + random.uniform(-1.5mi, 1.5mi)]. If a heatmap of the Yaks in an area is generated, the randomized points eventually converge to a square.
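
To make the square concrete, here is a small Python sketch of the calculation I suspect they perform. It is my guess, not their code: the 1.5 mile bound comes from my observations, and the miles-to-degrees conversion (about 69 miles per degree of latitude) is a rough approximation I picked for the example.

```python
import math
import random

MILES_PER_DEGREE_LAT = 69.0  # rough approximation
MAX_OFFSET_MILES = 1.5       # bound I observed, not a confirmed value


def jitter(lat, lon, rng=random):
    """Shift a coordinate by an independent uniform offset on each axis."""
    dlat_miles = rng.uniform(-MAX_OFFSET_MILES, MAX_OFFSET_MILES)
    dlon_miles = rng.uniform(-MAX_OFFSET_MILES, MAX_OFFSET_MILES)
    # A degree of longitude shrinks with the cosine of the latitude.
    miles_per_degree_lon = MILES_PER_DEGREE_LAT * math.cos(math.radians(lat))
    return (lat + dlat_miles / MILES_PER_DEGREE_LAT,
            lon + dlon_miles / miles_per_degree_lon)


rng = random.Random(0)  # seeded so the run is repeatable
true_lat, true_lon = 45.4953056, -122.8090428
samples = [jitter(true_lat, true_lon, rng) for _ in range(10_000)]

# Refreshing many times fills in a square centered on the true point.
mean_lat = sum(s[0] for s in samples) / len(samples)
mean_lon = sum(s[1] for s in samples) / len(samples)
print(mean_lat, mean_lon)
```

Because the two offsets are drawn independently, the cloud of points is a square rather than a circle, and under this model the center of that square is the original coordinate.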

Additional Analysis

The rest of this post is focused on some of the interesting traits/quirks I discovered while poking around with the Yik Yak API and the Android APK. It’s mostly a brain dump for anyone else interested in getting their feet wet. Or for the laughs at some of the odder “functionality” of the API and app.

Hacky Code and Features

Yik Yak URL blocking “algorithm”

Having users or bots spam URLs/websites on any platform is never a good thing. I came across this portion of code in the Yik Yak app which surprised and disappointed me.

This code essentially says: if the Yak about to be sent has “.com”, “.org”, “.me”, “.net”, or “.ly” in the body, do not send it and let the user know they can’t send URLs.

This code should work because there are only like, what, five TLDs?

That got me thinking: what happens if I try one of the couple hundred official TLDs that they aren’t searching for?

Not surprisingly, it sends just fine!
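
For anyone who wants to play with the logic, the check described above boils down to something like this Python sketch. The function name and the sample messages are mine, and I’m assuming a plain case-sensitive substring match; the real app code is Java.

```python
# The five substrings the app looks for, per the code described above.
BLOCKED_SUBSTRINGS = (".com", ".org", ".me", ".net", ".ly")


def looks_like_url(message):
    """Return True if the message contains any of the blocked substrings."""
    return any(substring in message for substring in BLOCKED_SUBSTRINGS)


print(looks_like_url("check out example.com"))       # blocked
print(looks_like_url("check out example.io"))        # sails right through
print(looks_like_url("so tired.meeting ran late"))   # blocked, yet not a URL
```

A substring match like this fails in both directions: any TLD outside the list gets through, and ordinary text with a missing space after a period can get rejected.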

How to Brick Yik Yak on Android

Please parse (and store) data correctly, folks.

Earlier I mentioned how some of the number values in the JSON responses were string values, when they really should have been just numbers. Well, I decided to set a value returned from the server named yakarma (net upvotes) to a non-numeric value.

This is what it originally looked like:

{
...
"yakarma": "1234"
}

To brick the app, just set it to any string that isn’t completely filled with digits:

{
...
"yakarma": "onetwothreefour"
}
Don’t store numbers as strings.

This caused the app to never be able to start up again. Instead, it displays a crash error and allows you to submit a report via Google Play’s reporting interface. I ended up having to reinstall the Yik Yak app on my phone before I could even open the app again.

Since IDs are generated based on your phone’s hardware, reinstalling the app should not change your ID. However, if you decide to swap out your SIM card, your ID may change. Luckily I still had all my hard earned yakarma after the reinstall ☺

Quick pro-tips for developers:

  1. Don’t store numbers as string values.
  2. Use proper error handling.
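
As a minimal sketch of the second tip, here is what defensive parsing of a value like yakarma could look like. It is in Python rather than the app’s Java, and the function name and the fallback value of 0 are my own choices.

```python
def parse_yakarma(raw, default=0):
    """Convert the server's yakarma value to an int, falling back on bad input."""
    try:
        return int(raw)
    except (TypeError, ValueError):
        # Malformed or missing value: use the default instead of crashing.
        return default


print(parse_yakarma("1234"))
print(parse_yakarma("onetwothreefour"))
print(parse_yakarma(None))
```

Whatever the fallback is, the point is that one bad field from the server should never stop the app from starting.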

Yak Creation

While observing the traffic sent through mitmproxy during the creation of a new Yak, I found something quite… peculiar. Along with the expected 200 OK response, the response body had a single character: “1”.

Now, considering almost all of the responses I had observed prior to this had been JSON responses, I looked a little closer and found a cookie being set with a JSON object as the value. Decoded, it looked roughly like this:

{
"recordCreatedTimeStamp": 1417069197.187,
"yakID": "R/5476c66b654b766fe27f8d20a065",
"textContent": "the contents of the Yak I posted",
"handle": null,
"yakType": "Yak",
"relatedYakID": null
}

I know Yik Yak probably didn’t set out to follow proper REST practices, but seriously? A cookie is truly a strange place to report information regarding the success of an API call.
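
If you want to pull the same data out yourself, a sketch like the following should do it. I’m assuming here that the cookie value is simply URL-encoded JSON; the function name is mine, and the sample cookie is built from a few of the fields shown above rather than copied from real traffic.

```python
import json
from urllib.parse import quote, unquote


def decode_cookie_json(cookie_value):
    """URL-decode a cookie value and parse it as JSON (assumed encoding)."""
    return json.loads(unquote(cookie_value))


# Build a sample cookie value the way I assume the server encodes it.
raw_cookie = quote(json.dumps({
    "yakID": "R/5476c66b654b766fe27f8d20a065",
    "handle": None,
    "yakType": "Yak",
}))

decoded = decode_cookie_json(raw_cookie)
print(decoded["yakID"], decoded["yakType"])
```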

Tools

If you’re interested in the tools I used, they are as follows:

  • reJ — “reJ is a graphical tool for manipulation and inspection of .class files of the Java platform.”
  • dex2jar suite — used for transforming the APK (Dalvik virtual machine format) to a Jar (Java Archive of JVM class files) to be able to use reJ. Due to control flow obfuscation, method and field name obfuscation, along with other obfuscation and translation side-effects (going from a register based format to a stack based format), this code couldn’t be easily decompiled and edited. However, the Java bytecode is still readable enough to analyze the business logic of the application.

I would have opted for JEB, but why throw down $1000 when I can have similar enough functionality from open source software? ☺

FYI, the md5 checksum of the APK I worked with is: dc2ffcf1e0ff499b2d3cec8f7a59bf01.

Authentication

Generation of User IDs

The code used to generate a user’s ID is roughly the following (in Pythonic pseudocode):

md5('%s.%s.%s.%s' % (deviceID, simSerialNumber, getSystemProperty("ro.serialno"), wifiMacAddress))
Excerpt from the (refactored) Java bytecode method that generates the seed for the userID prior to being md5'd.
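
Here is a runnable Python version of that pseudocode, as a sketch. The device values are made up, and uppercasing the digest is my assumption based on the uppercase 32-digit IDs I saw in API responses.

```python
import hashlib


def generate_user_id(device_id, sim_serial, serial_no, wifi_mac):
    """Join the four device identifiers with dots and MD5 the result."""
    seed = "%s.%s.%s.%s" % (device_id, sim_serial, serial_no, wifi_mac)
    # Assumption: the app uppercases the hex digest.
    return hashlib.md5(seed.encode("utf-8")).hexdigest().upper()


# Hypothetical device values, for illustration only.
user_id = generate_user_id(
    "358240051111110", "8901260000000000000", "0123456789ABCDEF", "00:11:22:33:44:55"
)
after_sim_swap = generate_user_id(
    "358240051111110", "8901260999999999999", "0123456789ABCDEF", "00:11:22:33:44:55"
)
print(user_id)
print(user_id == after_sim_swap)
```

Since the SIM serial number is part of the seed, swapping SIM cards produces a different ID, while reinstalling the app on the same device does not.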

This means that, unless a malicious actor somehow acquires this information about a user’s device, it will be incredibly difficult to impersonate another user of Yik Yak. Since Yik Yak uses API endpoints with TLS v1.2 enabled, it would be very difficult to obtain the full userID used for generating the valid HMAC signatures which allow a client to authenticate and perform actions such as:

  • Creating a new Yak/Comment
  • Viewing one’s Top Yaks
  • Deleting one’s Yak/Comment

API Authorization via HMAC

Method for creating an HMAC of the passed-in string (the “IV”), returning the base64-encoded result.

I am fairly certain that the input string, p1 (which I’ve been calling the Initialization Vector, though for an HMAC it is really the message being signed), is created by combining the API method name (e.g. getMessages) with a number of parameters from the query string, including the current timestamp. The server is able to verify the HMAC passed via the query string since it knows all the values used to create it.
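
A rough Python sketch of how such a scheme could work on both ends follows. Several things here are assumptions: that the key is the full user ID, that the hash is SHA-1, and the exact layout of the signed string, which I made up for the example.

```python
import base64
import hashlib
import hmac


def sign_request(user_id, message):
    """HMAC the message with the user ID as key and base64-encode the digest."""
    # Assumption: SHA-1 as the underlying hash function.
    digest = hmac.new(user_id.encode("utf-8"), message.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_request(user_id, message, signature):
    """Server side: recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign_request(user_id, message), signature)


# Hypothetical user ID and signed string (method name plus query parameters).
user_id = "0123456789ABCDEF0123456789ABCDEF"
message = "getMessages?lat=45.4953056&long=-122.8090428&salt=1417069197"
signature = sign_request(user_id, message)

print(verify_request(user_id, message, signature))
print(verify_request(user_id, message.replace("45.49", "45.50"), signature))
```

Including the timestamp in the signed string means a captured signature only matches the one request it was created for.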

I hope this blog post helped you learn something about security or designing sane APIs or, at the very least, made you laugh.

If you’d like to reach out to me, you can find my contact information here or feel free to leave a comment here on Medium. ☺


Kyle Stevenson

Software Engineer. 23 years old. PDX. Previous places of employment include Rackspace and Intel.