How to detect, block and manage DataDome

campo
4 min read · Jul 27, 2020


In this article I will show you what my research into DataDome, a new antibot system, led me to. This tutorial is for informational purposes only. If you add the code to your own scripts, go ahead, but please credit me.

What is DataDome?

DataDome is a new antibot solution focused on sneaker bots (and much more). I suggest reading their article; it's pretty good and I enjoyed it.

An extract from the article

Where can I find DataDome?

Lots of sneaker sites, like SlamJam, Starcow, Courir, etc., use DataDome as bot protection, and for many bot developers this is a problem.

When you visit one of these sites and are asked to verify your identity through the challenge, you are redirected to a different URL (we will need it later).

DataDome URL of SlamJam
This is a classic DataDome challenge: in this case we are on SlamJam, and a simple Google reCAPTCHA is required. Important note: this page is inside an iframe.
The “You have been blocked” page

How to solve DataDome?

All we need to access the site is a DataDome cookie, which you get by solving the challenge. Cookies are unique to each IP and session, so a cookie will not work with another IP.
Honestly, I don't remember how long a cookie lasts, but you can start harvesting them before a sneaker drop.

We need a small script to solve it. Keep in mind that DataDome checks IPs, so unbanned proxies are required. I won't name good providers, to avoid getting them banned. I don't recommend running it from localhost.

Let's start with SlamJam. (Keep in mind that SlamJam sometimes has Cloudflare enabled as well; this script only helps with DataDome.)

import json
import re

import requests

headers = {"user-agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"}
url = "https://www.slamjam.com/en_IT/cart"
# The DataDome url of SlamJam
datadome_url = 'https://www.slamjam.com/on/demandware.store/Sites-slamjam-Site/en_IT/DDUser-Challenge'
s = requests.Session()
proxies = your_proxies_in_json_format  # placeholder: a dict like {"http": "...", "https": "..."}
s.proxies.update(proxies)

r = s.get(url, headers=headers)
if r.url != datadome_url:
    print("No need to solve DataDome")
else:
    print("Found DataDome challenge")

Even if you get a 200 status code, you need to check whether you have been redirected to the DataDome challenge page.
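A minimal sketch of that check as a reusable function, assuming the challenge URL can be compared without its query string and that the `var dd=` marker (the same one the regex below relies on) is a good enough second signal. `is_datadome_challenge` is a hypothetical helper name, not part of any library:

```python
def is_datadome_challenge(final_url: str, challenge_url: str, body: str = "") -> bool:
    """Heuristic check for a DataDome challenge page.

    A 200 status code alone proves nothing: requests follows redirects by
    default, so the response may be the challenge page itself. Flag it if the
    final URL (ignoring any query string) equals the site's challenge URL, or
    if the body contains the `var dd=` object the challenge page embeds.
    """
    if final_url.split("?", 1)[0] == challenge_url:
        return True
    return "var dd=" in body


challenge = ("https://www.slamjam.com/on/demandware.store/"
             "Sites-slamjam-Site/en_IT/DDUser-Challenge")
print(is_datadome_challenge(challenge, challenge))
print(is_datadome_challenge("https://www.slamjam.com/en_IT/cart", challenge))
```

With the session from the script above, it would be called as `is_datadome_challenge(r.url, datadome_url, r.text)`.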

This is the source code of the main challenge page; we need to extract the JSON data highlighted in red to make the first request.

# Let's load it as json and pick the data
dd = json.loads(re.search('var dd=([^"]+)</script>', r.text).group(1).replace("'",'"'))
initialCid = dd['cid']
hsh = dd['hsh']
t = dd['t']
host = dd['host']
cid = s.cookies['datadome'] # The request will set this cookie
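The same extraction can be wrapped in a small function that fails with a clear error when the marker is missing, instead of an `AttributeError` on `None`. The sample page below is made up for illustration (real `dd` values look different), and `parse_dd` is a hypothetical name:

```python
import json
import re


def parse_dd(page: str) -> dict:
    """Extract the `dd` object from the challenge page source.

    Same approach as the article: capture everything between `var dd=` and
    `</script>`, then swap single quotes for double quotes so the JavaScript
    object literal parses as JSON.
    """
    match = re.search('var dd=([^"]+)</script>', page)
    if match is None:
        raise ValueError("no `var dd=` object found; is this a DataDome challenge page?")
    return json.loads(match.group(1).replace("'", '"'))


# Made-up snippet for illustration only; real values are different.
sample = ("<script>var dd={'cid':'AHrlqAAAAAMA','hsh':'ABC123',"
          "'t':'fe','host':'c.captcha&#x2d;delivery.com'}</script>")
dd = parse_dd(sample)
print(dd["cid"], dd["t"])
```

Like the original one-liner, this breaks if a value ever contains a quote character, since the quote swap is purely textual.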

Now we are ready to make the first request!

post_url = 'https://'+host.replace('&#x2d;','-')+'/captcha/?initialCid={}&hash={}&cid={}&t={}'.format(initialCid, hsh, cid, t)
first_post = s.get(post_url)
match = re.search('getRequest([^"]+)ddCaptcha', first_post.text)
if match is None:
    # You got the "You have been blocked" page: you'll see it if your proxy is banned
    raise SystemExit('Proxy banned')
# We have extracted the second JavaScript snippet to parse
data = match.group(1)
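As an alternative to hand-replacing `&#x2d;`, the standard library can decode any HTML entity in `host` and percent-encode the query values. This is a sketch assuming the server accepts the same parameters when they are URL-encoded; `build_captcha_url` is a hypothetical helper, and the host in the example is only illustrative:

```python
import html
from urllib.parse import urlencode


def build_captcha_url(host: str, initial_cid: str, hsh: str, cid: str, t: str) -> str:
    """Build the first-request URL from the values taken out of `dd`.

    html.unescape decodes every HTML entity in `host` (such as `&#x2d;` -> `-`),
    not just that one, and urlencode percent-encodes the query values.
    Parameter names and order follow the article's post_url.
    """
    query = urlencode({"initialCid": initial_cid, "hash": hsh, "cid": cid, "t": t})
    return "https://{}/captcha/?{}".format(html.unescape(host), query)


# Illustrative values only.
print(build_captcha_url("c.captcha&#x2d;delivery.com", "AHrlqAAAAAMA", "ABC123", "abc", "fe"))
```

The result can be passed to `s.get(...)` exactly like `post_url`.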

We are now on the first page (pic 1), inside the iframe.
We only need to scrape:

  • The redirect url
  • The captcha sitekey
  • The user agent and IP of your request

The page script is very long, so I'll upload it to my GitHub repo (see the end of the article).

useragent = re.search('&ua=([^"]+)&referer=', data).group(1).replace("' + encodeURIComponent('",'').replace("');",'').replace("\n    getRequest += '",'')
ip = re.search('&x-forwarded-for=([^"]+);', data).group(1).replace("' + encodeURIComponent('",'').replace("')",'')
sitekey = first_post.text.split("'sitekey' : '")[1].split("'")[0]
challenge_link = first_post.url
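The `split` on `'sitekey' : '` breaks as soon as the spacing or quote style changes, which matters for sites with a different captcha layout. A slightly more tolerant sketch, assuming the sitekey still appears as a `sitekey: value` pair in the page source (`extract_sitekey` is a hypothetical helper):

```python
import re
from typing import Optional

# Matches 'sitekey' : 'VALUE' with either quote style and any spacing.
SITEKEY_RE = re.compile(r"""['"]sitekey['"]\s*:\s*['"]([^'"]+)['"]""")


def extract_sitekey(page: str) -> Optional[str]:
    """Return the captcha sitekey found in the page source, or None.

    Unlike split(...)[1], a missing key (for example on the
    "You have been blocked" page) gives None instead of an IndexError.
    """
    match = SITEKEY_RE.search(page)
    return match.group(1) if match else None


print(extract_sitekey("{ 'sitekey' : 'TEST_SITEKEY' }"))
print(extract_sitekey('{"sitekey":"TEST_SITEKEY"}'))
print(extract_sitekey("<html>no captcha here</html>"))
```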

We are almost done. Now we need the magic number for the final request. It is calculated by a script from the cid (one of the first variables we extracted) and the user agent of the request.
To make this easier, I deployed the script on Heroku: just send a request with these two parameters to get the magic number.

# The solver replies with JSON such as {"id": 162047291}; the final URL below uses magic_number
magic_number = json.loads(requests.get('https://datadome-magic-number-solver.herokuapp.com/datadome?id={}&ua={}'.format(cid, useragent)).text)['id']

The answer is JSON like {"id": 162047291}; if the number has 9 digits, everything is all right.
Now we need to solve the captcha, for example with an online service like 2Captcha, and then we can make the final request. Its URL is built from all the data stored so far.
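The 9-digit sanity check on the magic number can be made explicit before building the final URL. A sketch, assuming the reply always has the `{"id": ...}` shape described above with an integer value; `magic_number_from_response` is a hypothetical helper:

```python
import json


def magic_number_from_response(text: str) -> int:
    """Parse the solver's JSON reply and apply the article's 9-digit check."""
    number = json.loads(text).get("id")
    # bool is a subclass of int in Python, so rule it out explicitly
    if isinstance(number, bool) or not isinstance(number, int):
        raise ValueError("no integer 'id' in reply: {!r}".format(text))
    if not 100_000_000 <= number <= 999_999_999:
        raise ValueError("magic number is not 9 digits: {}".format(number))
    return number


print(magic_number_from_response('{"id": 162047291}'))
```

It would take the `.text` of the solver response in place of the bare `json.loads(...)['id']`, so a bad reply stops the script before a doomed final request.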

# sitekey and challenge_link were already extracted from first_post above
response = solvecaptcha(challenge_link, sitekey)  # your own 2Captcha helper (not shown)
final_url = 'https://c.captcha-delivery.com/captcha/check?cid={}&icid={}&ccid=null&g-recaptcha-response={}&hash={}&ua={}&referer={}&parent_url={}&x-forwarded-for={}&captchaChallenge={}'.format(cid, initialCid, response, hsh, useragent, datadome_url, datadome_url, ip, magic_number)
second_post = s.get(final_url)
if second_post.status_code == 200:
    print('DataDome passed')
else:
    # Maybe you have done something wrong
    print('DataDome check failed with status', second_post.status_code)

If everything went well, the server responds with status code 200 and a header like this:

set-cookie: datadome=Tdx_AVi.VpcPns7JD7n9~EedCazO2jmhdrv_5Hhxmg3ZnUB4iHxn1OE0pum84C2RrSAm_Tnbf7VfF-6.Kfy_XQGeYZBFPwQkbn2~xSmO0J; Max-Age=31536000; Domain=.captcha-delivery.com; Path=/; SameSite=Lax

This is what we need: the DataDome cookie. With it, you can access the site.
Every site protected by DataDome uses the same challenge; the only thing to remember is to update “datadome_url”.
Some sites, like Starcow, ban proxies very frequently and use a different type of captcha (also supported by 2Captcha), so you may have to change the string used to extract the sitekey.
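To read the cookie value out of a Set-Cookie header like the one shown above, `http.cookies.SimpleCookie` from the standard library is enough. A sketch using that header as sample input; `datadome_cookie_from_header` is a hypothetical helper:

```python
from http.cookies import SimpleCookie
from typing import Optional


def datadome_cookie_from_header(set_cookie: str) -> Optional[str]:
    """Return the value of the `datadome` cookie in a Set-Cookie header value, or None."""
    jar = SimpleCookie()
    jar.load(set_cookie)  # attributes like Max-Age, Domain and Path are parsed separately
    morsel = jar.get("datadome")
    return morsel.value if morsel is not None else None


header = ("datadome=Tdx_AVi.VpcPns7JD7n9~EedCazO2jmhdrv_5Hhxmg3ZnUB4iHxn1OE0pum84C2RrSAm_"
          "Tnbf7VfF-6.Kfy_XQGeYZBFPwQkbn2~xSmO0J; Max-Age=31536000; "
          "Domain=.captcha-delivery.com; Path=/; SameSite=Lax")
print(datadome_cookie_from_header(header))
```

With requests, the raw header of the final response is available as `second_post.headers.get('set-cookie')`.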

I hope this helped. Take a look at my repo on GitHub too!
If you need more help or info, or want DataDome-related work done, here's my email: 13campo12@gmail.com
