How I gained access to revenue and traffic data of thousands of Shopify stores
- Almost vulnerable
- Getting da wordlist
- A Fail
- The new approach
- The exploit
Permission to write a blog post was given by Shopify prior to this public disclosure
About a year ago, while hacking on the Shopify program, I set up a few alerts to get notified whenever a new API endpoint appeared on a list of subdomains and URLs (a custom script I put together to keep track of new APIs).
A few months later, I received a notification about a new endpoint I hadn't seen before, something similar to:
I honestly did not bother to check those endpoints, simply because I was already getting tons of new alerts for various targets. Another reason is that I was not actively hunting on any program; I was only spending a few hours during weekends, mostly relying on recon automation to score quick wins whenever I got an interesting alert.
Back to the story. Fast-forward a few months, and I received another alert related to the same endpoints:
That meant the last endpoint had been removed from the subdomain source, which was a red flag for me, so I dug into what was really happening and investigated why it was removed.
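The alerting logic behind such a tracker boils down to a set diff between two crawls. Here is a minimal sketch of that idea (the URLs and function name are hypothetical, not the author's actual script):

```python
def diff_endpoints(previous: set, current: set) -> dict:
    """Return the endpoints that appeared or disappeared between two crawls."""
    return {"added": current - previous, "removed": previous - current}

# Example with placeholder URLs:
old = {"https://example.com/api/a", "https://example.com/api/b"}
new = {"https://example.com/api/b", "https://example.com/api/c"}
alerts = diff_endpoints(old, new)
# alerts["added"]   -> {"https://example.com/api/c"}   (alert: new endpoint)
# alerts["removed"] -> {"https://example.com/api/a"}   (alert: endpoint removed)
```

Anything in `removed` is exactly the kind of signal described above: an endpoint quietly disappearing from the subdomain source.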
2. Almost Vulnerable
Upon checking, REDACTED was basically the name of a Shopify store (REDACTED.myshopify.com is the link to the store). It was actively listed for sale on https://exchangemarketplace.com/shops/ (an alias of https://exchange.shopify.com).
So the first thing was to check the following endpoint (which I'll refer to from here on as the almost vulnerable endpoint, for the sake of simplicity [and fun]):
It was apparently leaking the revenue data of the REDACTED store. The API endpoint is supposed to be used internally to fetch sales data and present it as a public graph. In that case, exposing the data was expected behaviour: the store was listed for sale on the Exchange marketplace, which publicly displays the same data in a graph:
I then identified another occurrence of the same API leaking the revenue data of another store. That store had been sold a while ago and removed from the marketplace, but its data was still being returned for some reason.
At this point, I was almost sure the endpoint was vulnerable to an Insecure Direct Object Reference (IDOR), where iterating over $storeName would leak other stores' data.
Next, I set up a new store and used its $storeName on the same almost vulnerable API endpoint, to check whether we would get the sales data of our new store, verify the legitimacy of the vulnerability, and report it to Shopify:
date: Fri, 29 Mar 2019 20:28:18 GMT
OK. This is not going to be as easy as expected!
At this point, it was clear that we did not have a working proof of concept yet, so there wasn't anything to report to Shopify. I had to carefully think through all the possible scenarios and thoroughly investigate this behaviour.
The first idea that came to mind was to perform a mass check on essentially all existing stores and see whether any of them leaked customer data. First challenge: we need a wordlist of store names.
The attack process will be as follows:
- Build a wordlist of store names (from $storeName.myshopify.com);
- Iterate the wordlist against the almost vulnerable endpoint;
- Filter out the vulnerable domains;
- Analyze the affected stores to figure out the root cause of the observed behaviour or eventual vulnerability.
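The iteration step can be sketched as follows. Note that the real endpoint path is redacted in this post, so the URL template below is a placeholder, and the "vulnerable" heuristic (HTTP 200 with non-empty JSON) is an assumption for illustration:

```python
import json
import urllib.request

# Placeholder, NOT the real (redacted) endpoint path.
EXCHANGE_ENDPOINT = "https://exchange.example.com/shops/{store}/revenue_data.json"

def build_url(store_name: str) -> str:
    """Substitute a candidate $storeName into the almost vulnerable endpoint."""
    return EXCHANGE_ENDPOINT.format(store=store_name)

def looks_vulnerable(status: int, body: str) -> bool:
    """Count a store as exposed if the endpoint returns 200 with non-empty JSON."""
    if status != 200:
        return False
    try:
        data = json.loads(body)
    except ValueError:
        return False
    return bool(data)

def check(store_name: str) -> bool:
    """Fetch one candidate store and classify the response."""
    try:
        with urllib.request.urlopen(build_url(store_name), timeout=10) as resp:
            return looks_vulnerable(resp.status, resp.read().decode())
    except OSError:
        return False
```

Feeding the wordlist through `check` line by line and printing the names that return `True` gives exactly the filtered list of vulnerable domains described above.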
3. Getting da wordlist
The first method is a reverse IP lookup, finding all DNS records of 'A' type associated with the same IP.
The store is live on $storeName.myshopify.com and a quick DNS query results in the following:
; <<>> DiG 9.10.6 <<>> REDACTED.myshopify.com
REDACTED.myshopify.com. 3352 IN CNAME shops.myshopify.com.
shops.myshopify.com. 1091 IN A 18.104.22.168
So REDACTED.myshopify.com has a CNAME pointing to shops.myshopify.com, which itself points to 18.104.22.168. Luckily, there is no reverse proxy WAF in front of it, so we can use that IP to fetch valid 'A' DNS records for domains hosted on the same IP space.
For this purpose, I used one of my scripts (Find it at https://gist.github.com/ayoubfathi/57c3fef7d4eada575a8b080cc3c4a562)
We are going to run it against shops.myshopify.com to see how many records we can obtain. Running it produced the following output:
So we were able to get almost 1,000 store URLs using reverse IP.
Now we need to get the store link associated with each domain (i.e. gadgetstore.com → $storeName.myshopify.com), strip out the $storeName, and build a wordlist of store names to feed into our test request.
4. A Fail
So I built a script to execute the aforementioned attack flow and exploit our (what I call) almost vulnerable endpoint. What it basically does is the following:
- Use the output of the last script (revIP.py) and feed it to the current script.
- Scrape the HTML source of each domain, specifically looking for .myshopify.com links.
- Extract the store name from the scraped links.
- Automatically apply the store name in the almost vulnerable request as in:
- Filter out the vulnerable domains and print them to stdout.
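The scraping and extraction steps can be sketched with a simple regex (this assumes the storefront HTML references its $storeName.myshopify.com origin somewhere in the page source; fetching the pages themselves is elided):

```python
import re

# Matches "<storeName>.myshopify.com" and captures the store name.
MYSHOPIFY_RE = re.compile(r"([a-zA-Z0-9][a-zA-Z0-9-]*)\.myshopify\.com")

def extract_store_names(html: str) -> set:
    """Pull every distinct $storeName referenced in a page's HTML source."""
    return set(MYSHOPIFY_RE.findall(html))

html = '<script src="https://gadget-store.myshopify.com/cdn/x.js"></script>'
extract_store_names(html)  # -> {"gadget-store"}
```

Each extracted name is then substituted into the almost vulnerable request, exactly as the steps above describe.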
We are all set to run the script. Running it produced the following results!
Only four results?
So out of 1,000 stores, I was only able to identify four vulnerable ones, of which three were listed on the Exchange marketplace (so they were expected to have their sales data publicly listed) and one was deactivated (lucky, huh?).
As a result, I didn't see any security impact yet for this to be considered a security vulnerability. So I stopped testing for a few weeks (busy life?) and decided to come back later to explore further possibilities and keep digging.
A few weeks later, I came back to the same API request and started poking around it. I was not able to obtain any useful information out of it, so I decided to take a different approach.
To obtain more data to analyze, we will switch from testing 1,000 stores to a much bigger sample (thousands or millions, depending on the results). How would we pull that off? The next section goes through the details of the new approach.
5. The new approach
What's the best way to find all existing Shopify stores without missing any? 🤔
The first thing I would think of is scanning the internet, but why would we do that when we have someone else's data?
For this particular research, I will be using public data: the Forward DNS (FDNS) dataset. Using this approach, we don't need to generate store names from a given domain list. Instead, we will use the FDNS data to find, in reverse, all CNAME records pointing to shops.myshopify.com (which all the stores point to).
I used an instance with great specifications and cloned the data needed for this research.
Now, we will be looking for CNAME records that match shops.myshopify.com where Shopify merchants are hosting their stores.
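The FDNS dataset ships as gzipped files with one JSON record per line, each carrying `name`, `type`, and `value` fields. Filtering it for stores might be sketched like this (a minimal version; the actual processing the author used is not shown in the post):

```python
import gzip
import json

TARGET = "shops.myshopify.com"

def stores_from_fdns(lines) -> set:
    """Collect every hostname whose CNAME value is shops.myshopify.com."""
    stores = set()
    for line in lines:
        try:
            rec = json.loads(line)
        except ValueError:
            continue  # skip malformed lines
        if rec.get("type") == "cname" and rec.get("value") == TARGET:
            stores.add(rec["name"])
    return stores

def scan_file(path: str) -> set:
    """Stream a gzipped FDNS dump without loading it fully into memory."""
    with gzip.open(path, "rt") as fh:
        return stores_from_fdns(fh)
```

Streaming the file line by line keeps memory flat even though the dump itself is tens of gigabytes.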
Upon checking how many stores are available, I found:
Yay! Finally, we have a considerable number of stores to test, instead of the limited sample of 1,000 stores.
Next, let's build our new wordlist of store names based on the new results. This produced 813,684 entries.
At this point, we are done with the wordlist and can move on to the exploit part.
6. The exploit
It was very late at night when I edited the aforementioned exploit.py script to use the new wordlist of 813K store names.
Then I SSH'd into my box. Since the wordlist was huge, waiting around for the results was not an option, so I ran the script in the background and went to get some sleep.
After about an hour of trying to sleep …
I abandoned the idea of sleeping, opened my machine immediately, logged in to my remote box, and all I saw was thousands of 403 errors.
I quickly made a request to the same subdomain to double-check whether this was caused by reaching the rate limit or whether they had changed the page to be forbidden, and …
Not permitted? Yup, I call this “ You’ve got WAFfed ”
To hell with it, it’s been a long day, I need a decent sleep for today …
Later on, I came back to find a way around the WAF. I realized my bash script was issuing almost one request per second, which was a decent rate for me to test with, so I quickly made another script to check whether we would pass this time (hoping they hadn't implemented an average threshold tightened to that number).
It basically takes the 800K store names as input (stores-exchange.txt), sends a curl request to retrieve the sales data, inserts the store name into the same JSON response entry using the DAP library (shoutout to Rapid7 for this), and prints the data to stdout.
This time our script would be very slow; bash is single-threaded, and this was the only way we might be able to bypass the rate limit policy. I ran the script and logged out of my instance …
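The same pipeline can be re-sketched in Python: throttle to roughly one request per second and tag each JSON response with the store name it belongs to. (The original used bash + curl + Rapid7's DAP for the tagging step; here that step is done by hand, and `fetch` stands in for the actual, redacted HTTP request.)

```python
import json
import time

def annotate(store_name: str, body: str) -> str:
    """Insert the store name into the JSON response so the output stays greppable."""
    data = json.loads(body)
    data["store"] = store_name
    return json.dumps(data)

def run(stores, fetch, delay: float = 1.0):
    """Call fetch(store) -> (status, body) for each store, ~one request per `delay` seconds."""
    results = []
    for store in stores:
        status, body = fetch(store)
        if status == 200:
            results.append(annotate(store, body))
        time.sleep(delay)  # stay under the WAF's observed rate limit
    return results
```

Keeping the loop strictly sequential is the whole point here: parallelism would immediately trip the same rate limit that produced the wall of 403s.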
A few days later, I logged back into my instance to check the results, and?
We are getting the sales data of Shopify merchants: a monthly breakdown of revenue in USD for thousands of stores, from 2015 until today.
We have a list of vulnerable stores, so if we query any of them, we get a breakdown of that store's monthly revenue in USD over its lifetime:
That’s an example of sales data of a Shopify merchant from 2015 to date.
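To give a feel for the data, here is what consuming such a leak might look like, assuming (hypothetically, since the real response shape is redacted) the endpoint returns a `{"year": {"month": revenue_usd, ...}, ...}` breakdown:

```python
def lifetime_revenue(breakdown: dict) -> float:
    """Sum an assumed per-year, per-month USD breakdown into one lifetime total."""
    return sum(v for months in breakdown.values() for v in months.values())

# Hypothetical sample shaped like the assumed response:
sample = {"2015": {"01": 1200.0, "02": 900.0}, "2016": {"01": 1500.0}}
lifetime_revenue(sample)  # -> 3600.0
```

One number like this per store, across thousands of stores, is precisely why the exposure is so severe.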
As per CVSS 3.0, the score of this particular finding is 7.5 (High), which reflects the significance of the vulnerability: customer traffic and revenue data were exposed, and neither privileges nor user interaction were required to gain access to the information.
This was tested on 800K merchant stores. To summarize:
- Tested on +800K stores
- +12,100 of them were exposed
- +8,700 were vulnerable stores whose sales and traffic data we were able to obtain, even though it was set to private and should not have been public
- Only +3,400 stores were expected to have their sales data public
Analysis of root cause
Based on the above data and a few more days of research, I came to the conclusion that this was caused by the Shopify Exchange app (actively used by merchants now), which was introduced only a few months before this vulnerability. Any merchant who had the Exchange app installed would be vulnerable.
Afterwards, I quickly put together all the information and data in a report to submit to Shopify bug bounty program.
- Oct 13, 2018: First disclosure to Shopify
- Oct 16, 2018: Triaged
- Oct 16, 2018: Fixed (one hour after triage)
- Oct 17, 2018: Need more information
- Nov 1, 2018: Report is not eligible for a bounty (Policy Violation)
My own opinion on Shopify's decision
While I disagree with the third violation held against me, it is true that I had not yet been able to properly confirm the existence and legitimacy of the said vulnerability. Furthermore, I wasn't able to work on researching the behaviour consistently; I only had a few hours weekly.
However, regarding the second violation, I would agree with them, despite a few points:
- I was testing with the best intention to demonstrate an impact and avoid sending a theoretical report without any working proof of concept.
- I believe I had no other way to demonstrate the existence of this particular security vulnerability had I not proceeded the way I did.
Quite frankly, even though the outcome of this report was not what I expected, it's my fault in the end. They mention on the policy page that “Interacting with shops other than those created by you” is prohibited.
Hence, I unintentionally violated the said policy regardless of my good intentions. As a result, I fully accept the consequences, respect their decision, and extend my apologies to the Shopify team for the lack of awareness.
Always read the policy carefully, reach out to the relevant team as soon as you have something even if you are not confident it is a vulnerability.
Finally, regardless of the outcome, I'm proud I saved a company from a potential breach. In the end, that's one of the missions that drives my work and fulfils me, beyond the monetary compensation that comes along with it. A safer Internet for everyone!
Many thanks to the Shopify team, particularly Peter Yaworski, for being very helpful and supportive. I still highly recommend hacking on their program, as they are fast at handling incoming reports.