One of the most common issues when it comes to web scraping and automation is how to best handle HTTP errors when a request fails. Naturally, this can be as easy as wrapping the request function call inside a try-catch block, but when I started my automation project to purchase shoes from a certain vendor, I needed to have a more graceful way of handling errors—the program would be running unattended and with the full power of my credit card information, so it needed to be as solid as possible whenever a bad status code was thrown.
“To every rule there is an exception — and an idiot ready to demonstrate it”
- Vera Nazarian
I approached this project with some caution for a few reasons. First of all, this was my first fully functional purchasing software written in Node.js. I’m naturally drawn to Python 3 and its popular Requests package when it comes to automating a scraper, so this would be an adventure. After some research, I decided to go with Axios as my request workhorse in Node. The most popular approach was to use the Request library (not to be confused with its plural Python counterpart), but I selected Axios for its baked-in promise functionality and its interceptors, which let you run handler functions on a request before it is sent and on a response before it is returned to the caller.
The interception functionality is powerful because it allows the user to perform a wide array of operations on the data before returning it to the caller. For example, it could be used to run validity checks on both outgoing requests and incoming responses. For my purposes, I needed Axios to handle a set of common error codes that I knew I would encounter.
404 naturally indicated ‘not found’ whenever I was scraping a particular product or series of products.
401 indicated that the session credentials I was using were no longer valid or had expired.
403 indicated that my program had encountered a ‘bot detection page’ and would need to submit a reCAPTCHA token in order to renew the session and continue.
5XX was quite common during high-traffic times (the particular API that I was scraping was not particularly well scaled for peak load). These 5XX errors needed a fast-paced retry sequence in order to get the request through.
The first step was to configure the Axios instance that would handle my session for a series of requests. This is similar to how Requests works in Python: we establish the set of behaviors and headers for the object.
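The original options object is not reproduced here, so the following is only a minimal sketch of what one might look like. The URL, timeout, and header values are placeholders, not the vendor's real settings; `baseURL`, `timeout`, and `headers` are standard Axios request config keys.

```javascript
// All values below are placeholders: the real URL and headers are not shown in the article.
const options = {
  baseURL: 'https://api.example.com/v1', // hypothetical base API URL
  timeout: 10000,                        // milliseconds to wait before aborting a request
  headers: {
    'Accept': 'application/json',
    'User-Agent': 'Mozilla/5.0 (placeholder)',
  },
};
```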
Essentially, this is just establishing a base API URL that we’ll be sending our requests to, as well as some headers and options that keep our session consistent. Of course, we could just drop this whole object right into the instantiation call of the Axios instance, but since this was my first project, it felt better to break out the options object.
I also implemented proxies, both to avoid exposing my IP and to allow different instances of my worker class to send requests over different connections.
Once these options were established, I instantiated my Axios instance, naming it transport, which I would use to send my outbound requests.
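As a rough sketch of this step, the helper below merges a proxy into the options and creates the instance. The name `createTransport` is hypothetical, and the Axios module is passed in as `httpLib` purely so the snippet has no hard dependency; `create()` and the `proxy` config shape (`protocol`, `host`, `port`) are part of Axios itself.

```javascript
// `httpLib` is expected to be the axios module, e.g. require('axios').
function createTransport(httpLib, options, proxy) {
  // axios.create() returns an instance whose defaults apply to every call made through it.
  // `proxy` follows the Axios config shape: { protocol, host, port, auth }.
  return httpLib.create({ ...options, proxy });
}

// Usage inside a worker class (assumes axios is installed; host and port are placeholders):
// const axios = require('axios');
// this.transport = createTransport(axios, options, { protocol: 'http', host: '127.0.0.1', port: 8080 });
```

Giving each worker its own instance keeps proxies and headers isolated from one worker to the next.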
After this point, I could make calls to this.transport within my class functions, and they would use the same proxies, options, and headers as all the others. Great! Here’s an example of how I implemented this in a function, guestLogin(), that establishes a session and, upon success, sets the Authorization HTTP header with a fresh login token.
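A minimal sketch of such a function follows. The class name `PurchaseWorker`, the `/auth/guest` path, the `token` response field, and the `Bearer` scheme are all assumptions made for illustration; only `post()` and `defaults.headers.common` come from Axios.

```javascript
class PurchaseWorker {
  constructor(transport) {
    // An Axios instance, e.g. the result of axios.create(options).
    this.transport = transport;
  }

  // Hypothetical endpoint and response shape; the vendor's real API is not shown.
  async guestLogin() {
    const response = await this.transport.post('/auth/guest', {});
    const token = response.data.token;
    // Every later request made through this instance will now carry the token.
    this.transport.defaults.headers.common['Authorization'] = `Bearer ${token}`;
    return token;
  }
}
```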
Note how in this function, I directly modify my headers by accessing the defaults object, which was set when we instantiated Axios.
We haven’t written the interceptors just yet! Let’s rewind to when we were instantiating Axios and write our first interceptor, just in case the guestLogin() function throws a 500 error because (maybe) the token server on the other end is swamped.
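A bare-bones version of that first interceptor might look like the sketch below. The wrapper name `attachErrorInterceptor` is hypothetical; `interceptors.response.use(onFulfilled, onRejected)` is the Axios call doing the work.

```javascript
// `transport` is expected to be an Axios instance.
function attachErrorInterceptor(transport) {
  // First argument: success handler (null lets successful responses pass through untouched).
  // Second argument: error handler, called for failed requests.
  return transport.interceptors.response.use(null, (error) => {
    // error.response is missing when no response arrived at all (e.g. a network failure).
    const status = error.response ? error.response.status : undefined;
    console.warn(`Request failed with status ${status}`);
    // Re-reject so the calling function still sees the failure.
    return Promise.reject(error);
  });
}
```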
This part is pretty self-explanatory, but the null argument replaces what would be a success handler (called when the response is in the 2XX range). I don’t need to do anything more at this layer with a successful response, so I’ve just nullified that function for simplicity.
Note: just as the response can have an interceptor, so can the request! This means we can manipulate or validate outbound data and configuration objects before they are sent across the network.
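To illustrate the request side, here is a small sketch. The URL check and the `X-Request-Started` header are invented for the example; `interceptors.request.use()` receives the request config and must return it.

```javascript
// `transport` is expected to be an Axios instance.
function attachRequestInterceptor(transport) {
  return transport.interceptors.request.use((config) => {
    // Hypothetical validation: refuse to send anything without a URL.
    if (!config.url) {
      throw new Error('Refusing to send a request without a URL');
    }
    // Hypothetical header stamped on every outbound request.
    config.headers = config.headers || {};
    config.headers['X-Request-Started'] = String(Date.now());
    // The (possibly modified) config returned here is what gets sent.
    return config;
  });
}
```

Throwing inside the handler rejects the request's promise, so an invalid request never reaches the network.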
Furthermore, it’s totally possible for us to set the handler functions at the global level as well:
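A sketch of the global variant, with the Axios module passed in as a parameter so the snippet stays dependency-free (the function name is hypothetical):

```javascript
// `axios` is the module itself, e.g. require('axios').
function attachGlobalHandler(axios) {
  // Registers on the default instance, the one behind axios.get(), axios.post(), and so on.
  return axios.interceptors.response.use(null, (error) => {
    console.warn('Global handler saw a failed request');
    return Promise.reject(error);
  });
}
```

To my understanding, instances made with axios.create() keep their own interceptor lists, so a handler registered this way would not automatically apply to them.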
But since we might want to handle errors from different APIs in different ways, it’s best to keep your handlers as specific as possible while still covering as many of your requests as you can. Conceptually, we’re able to do this because all of the calls made through our Axios instance go to the same API, meaning that erroneous responses will have a similar structure and can be handled in similar ways. If this program were intended to talk to 10 different APIs with different responses, we’d likely write error handling at the function call level, such as inside guestLogin(). Or, we’d make a parent class of handlers and subclass it for each individual API!
Once we’ve started writing the function to handle errors, we need to access the contents of the error parameter so that we can determine how to handle the response code. Here’s the block that I used to handle 5XX errors and retry them. (I didn’t find a cleaner or simpler way to recursively make the same call without specifying the request options again, but this works just fine.)
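The original block is not reproduced here. As a stand-in, the sketch below takes a slightly different route: it replays `error.config`, the request config Axios attaches to the error, instead of spelling the options out again. The retry cap and the custom `retryCount` field are assumptions, and whether a custom field survives between attempts can depend on the Axios version in use.

```javascript
const MAX_RETRIES = 3; // assumed cap; the article does not state one

// `transport` is expected to be an Axios instance.
function attachRetryInterceptor(transport) {
  return transport.interceptors.response.use(null, (error) => {
    const status = error.response ? error.response.status : 0;
    const config = error.config; // the original request config, attached by Axios
    if (config && status >= 500 && status <= 599) {
      // `retryCount` is a custom field added here to track attempts.
      config.retryCount = (config.retryCount || 0) + 1;
      if (config.retryCount <= MAX_RETRIES) {
        // Replay the same request through the same instance.
        return transport.request(config);
      }
    }
    // Not a 5XX, or out of retries: pass the failure on to the caller.
    return Promise.reject(error);
  });
}
```

Because the retried request goes through the same instance, a second failure lands back in this handler with the counter already incremented.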
Note: make sure you reject your Promise if an error doesn’t match any of your criteria. This code needs to be explicit, so following an if-elseif-elseif-...-else pattern works well because it guarantees that some branch of the handler will execute, even for an error you don’t expect or haven’t seen before.
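The shape of such a handler can be sketched as follows. The three recovery functions (`renewSession`, `solveCaptcha`, `retry`) are hypothetical stand-ins supplied by the caller; the point is the final else branch, which always rejects.

```javascript
// Builds an error handler suitable for interceptors.response.use(null, handler).
function makeErrorHandler({ renewSession, solveCaptcha, retry }) {
  return function handleError(error) {
    // No response object means the request never got an HTTP answer.
    const status = error.response ? error.response.status : undefined;
    if (status === 401) {
      return renewSession(error);   // session expired
    } else if (status === 403) {
      return solveCaptcha(error);   // bot detection page
    } else if (status >= 500 && status <= 599) {
      return retry(error);          // overloaded server
    } else {
      // 404s, network failures, and anything unexpected end up here.
      return Promise.reject(error);
    }
  };
}
```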
And that’s all there is to it! We’ve now written in retrying functionality by using Axios interceptors, and we can expand from here: writing in functions to handle a wider array of errors is my recommended starting point, or returning more detailed error data back to your calling functions.
Interceptors make a great space to implement exponential backoff/retry patterns when working with less-than-reliable APIs, and the cost to integrate solutions like this is low because it doesn’t require refactoring existing code inside your functions.
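As one possible shape for that pattern, the sketch below doubles the wait after each failed attempt, up to a ceiling, before replaying the request. The base delay, ceiling, retry limit, and the custom `retryCount` field are illustrative assumptions, not values from the project.

```javascript
// Delay doubles with each attempt: base, 2x base, 4x base, ... capped at maxMs.
function backoffDelay(attempt, baseMs = 200, maxMs = 5000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Intended to be called from a response error handler; `transport` is an Axios instance.
async function retryWithBackoff(transport, error, maxRetries = 4) {
  const config = error.config;
  if (!config) throw error; // nothing to replay
  // `retryCount` is a custom field added here to track attempts.
  config.retryCount = (config.retryCount || 0) + 1;
  if (config.retryCount > maxRetries) throw error;
  // Wait 200 ms before the first retry, 400 ms before the second, and so on.
  await sleep(backoffDelay(config.retryCount - 1));
  return transport.request(config);
}
```

Since an async function that throws returns a rejected promise, the caller still receives the original error once the retries are used up.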
Hopefully this crash course has given you a few tools to start using this pattern in your own HTTP layers.