Mapping out Bank of America’s Credit Card Portals
As a preface, I’d recommend getting acquainted with churning (reddit.com/r/churning) and with web crawlers. Neither is necessary to understand this article, but familiarity will help with some of the acronyms and terms I use.
If you’re familiar with the travel hacking world, you’ll know that the best way to amass large amounts of points is through credit card sign-up bonuses. Recently the hobby gained a much higher profile with the release of the Chase Sapphire Reserve (CSR), which had an unprecedented sign-up bonus. These bonuses can be elusive, however: they can disappear from one day to the next, or hide behind links to a higher version of the offer (see the Amex Platinum 100,000-point links, where the regular bonus is just 40,000 Membership Rewards points; the difference can be worth upwards of $1,000). Banking sites can be quite difficult to map, as they try to obfuscate their structure or otherwise prevent a crawl, not simply to stop people from reaching these supposedly “targeted” yet public offers, but also because of the security requirements inherent to banking.
I first had the idea of mapping out sites for churning purposes a couple of months ago, when I found every credit card image Amex has ever offered. After playing around with americanexpress.com, I noticed that the card images in the “My Account” view were public and followed a predictable naming scheme. I wrote a quick script to generate every possible combination, then used wget to download every URL in the resulting txt file. This worked well: if a URL did not exist, the server returned a 404 and wget saved nothing, so only the valid images ended up in my directory. It was a neat proof of concept for crawling a website for churning-related purposes, and I created an album on imgur. Notable finds include the Amex Centurion Card and the old Amex Delta design.
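In Python, that brute-force step looks roughly like this. The base URL and the token lists below are placeholders of my own; I’m not reproducing the actual Amex naming scheme here, only the shape of the technique.

```python
import itertools

# Placeholder pattern -- the real Amex card-art naming scheme
# isn't reproduced here.
PATTERN = "https://www.americanexpress.com/cardart/{name}_{size}.png"

def candidate_urls(names, sizes):
    """Every combination of the assumed naming scheme."""
    return [PATTERN.format(name=n, size=s)
            for n, s in itertools.product(names, sizes)]

if __name__ == "__main__":
    # Equivalent of feeding a txt file to wget: fetch each candidate
    # and keep only the ones that don't 404.
    import requests  # third-party; only needed for the live fetch
    for url in candidate_urls(["platinum", "centurion"], ["160", "320"]):
        resp = requests.get(url)
        if resp.status_code == 200:
            with open(url.rsplit("/", 1)[-1], "wb") as f:
                f.write(resp.content)
```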
The result was fruitful, and we were able to get high quality images of a lot of cards that previously only had blurry pictures from FlyerTalk.
I wanted to take it further, however, starting with an easy goal — Bank of America. They had a simple application page, with a unique 7-digit ID. I used itertools in Python to write out a list of 40XXXXX URLs (100,000 unique combinations). I started off by attempting to repeat the same process I used with the Amex card images.
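Generating the candidate list takes only a few lines. The application path below is a placeholder rather than the real Bank of America URL format:

```python
import itertools

# Placeholder path: the real application URL format isn't reproduced here.
BASE = "https://www.bankofamerica.com/credit-cards/apply?id={}"

def application_ids():
    """Every ID of the form 40XXXXX -- 100,000 unique 7-digit combinations."""
    return ["40" + "".join(digits)
            for digits in itertools.product("0123456789", repeat=5)]

if __name__ == "__main__":
    # One URL per line; this txt file is what the crawler consumes.
    with open("urls.txt", "w") as f:
        for app_id in application_ids():
            f.write(BASE.format(app_id) + "\n")
```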
However, using wget did not work: invalid IDs redirected me to an error page, which wget downloaded anyway. That meant it would save a page for every one of the 100,000 links, all of which would then need to be parsed manually. It was also fairly slow, at nearly 2 seconds per application page; at 200,000 seconds total, that’s 55 hours, and that assumes I don’t get rate-limited by Bank of America, or otherwise blacklisted. I needed a faster way of telling whether a URL linked to a valid application page, and what the bonus was.
It wasn’t until I noticed the HTTP response status codes that I made any progress. A quick glance showed that any invalid URL led to a 302 redirect, a valid page returned a 200, and a rate limit produced a 429. This was great: I didn’t have to download the entire site for each application page, I could just send a simple HTTP HEAD request, which brought my throughput up to 20 requests per second. That’s roughly 40 times faster, and I was able to work through the entire list in about an hour.
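A minimal sketch of the HEAD-request approach, assuming the requests library (not what Screaming Frog does internally, but the same idea). The status-code mapping is exactly the one described above:

```python
def classify(status_code):
    """Map Bank of America's response codes to what they meant for the crawl."""
    if status_code == 200:
        return "valid"         # a real application page
    if status_code == 302:
        return "invalid"       # redirect to the error page
    if status_code == 429:
        return "rate-limited"  # time to back off
    return "unknown"

if __name__ == "__main__":
    import requests  # third-party; only needed for the live crawl
    with open("urls.txt") as f:
        for url in (line.strip() for line in f):
            # HEAD fetches only the headers, so no page body is downloaded
            resp = requests.head(url, allow_redirects=False)
            print(url, classify(resp.status_code))
```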
The software I used to accomplish this was called Screaming Frog SEO Spider. It was able to take a txt file with URLs and provide nearly any information on the webpages — response codes, redirect URI, analytics, etc.
I started getting rate-limited, though: after about 750 links parsed, I would begin to get 429 errors, meaning Bank of America would no longer serve me webpages because my IP was requesting too many.
To overcome the rate limiting, I set up Tor. This worked, but at first I had to manually change Tor nodes and circuits every time I got 429'd. I then wrote a Python script to interface with Tor directly and forced a circuit switch every 60 seconds: randomize my IP, pull roughly 700 URLs, then switch. Some nodes were faster than others, but the end result was the same: a list of all the valid URLs.
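One way such a script can look is with the stem library’s NEWNYM signal (an assumed choice of library; the request loop through Tor’s local SOCKS proxy is likewise a hypothetical sketch):

```python
import time

def batches(urls, size=700):
    """Chunk the URL list so each batch stays under the ~750-request limit."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def crawl(urls):
    """Hypothetical request loop, routed through Tor's local SOCKS proxy."""
    import requests  # needs requests[socks] for the socks5h scheme
    proxies = {"http": "socks5h://127.0.0.1:9050",
               "https": "socks5h://127.0.0.1:9050"}
    for url in urls:
        resp = requests.head(url, proxies=proxies, allow_redirects=False)
        print(url, resp.status_code)

if __name__ == "__main__":
    from stem import Signal             # third-party: pip install stem
    from stem.control import Controller

    with Controller.from_port(port=9051) as controller:
        controller.authenticate()
        for batch in batches(open("urls.txt").read().split()):
            crawl(batch)
            controller.signal(Signal.NEWNYM)  # ask Tor for a fresh circuit / exit IP
            time.sleep(60)
```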
There were a few more hiccups along the way, including the fact that Bank of America has multiple formats for its application links. One format always returned a 302, since a redirect was necessary to reach the actual application page. To get around this, I noticed that valid URLs took 2 redirect hops, while URLs leading to the error page took only one. A few hacky tricks like this and I had a fairly complete picture of their backend application portals.
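Counting hops is straightforward with requests, which records one response per redirect in `response.history` (the URL below is a placeholder, not a real application link):

```python
def looks_valid(num_hops):
    """Valid applications in this link format redirect twice; the
    error page is reached after a single hop."""
    return num_hops >= 2

if __name__ == "__main__":
    import requests  # third-party; only needed for the live check
    url = "https://example.com/redirecting-application-link"  # placeholder
    resp = requests.get(url, allow_redirects=True)
    # requests keeps one response per redirect hop in resp.history
    print(url, "valid" if looks_valid(len(resp.history)) else "error page")
```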
From here, a user on /r/churning recommended httpscreenshot, which uses a headless browser to capture screenshots of every page, so we could quickly browse through all the offers and find the best ones.
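A rough equivalent of that step using Selenium with headless Chrome (not the actual httpscreenshot invocation; the input filename is an assumption):

```python
import re

def screenshot_name(url):
    """Turn a URL into a filesystem-safe screenshot filename."""
    return re.sub(r"[^A-Za-z0-9]+", "_", url).strip("_") + ".png"

if __name__ == "__main__":
    from selenium import webdriver  # third-party: pip install selenium
    opts = webdriver.ChromeOptions()
    opts.add_argument("--headless=new")
    driver = webdriver.Chrome(options=opts)
    for url in open("valid_urls.txt").read().split():
        driver.get(url)
        driver.save_screenshot(screenshot_name(url))
    driver.quit()
```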
Through this method, we were able to find a Merrill+ Visa application (normally reserved for those with high balances in Merrill Lynch brokerage accounts), public Amtrak 30k/15k offers, and a few older $200 bonus on $500 spend offers.
Future endeavours will include trying to map out the Citi and Chase application portals. Amex looks like it will be much more difficult, as each application is tied to a session ID and a referrer, so it will need a lot more research (or at the very minimum a full headless browser to do the crawl).
Overall a fairly successful venture — web crawling will be really useful for churning purposes, especially since it can make the difference between a $300 bonus and a $500 bonus.
If anyone has any questions feel free to reach out on reddit at /u/JonLuca or by email, email@example.com.