Check for broken links on your website using a Postman Collection

We are in the process of refreshing the online documentation for the Postman app. As we introduce new documentation pages, we could manually test every link to make sure it’s working, but that’s boring and time-consuming. If you can throw together a few lines of code, then you too can build a handy dandy link checker using a Postman Collection.

Let’s create a collection to automatically crawl all the pages on our website and check every link for a healthy HTTP status code.

We can do this with 2 simple requests, run together as a collection.

1. Initialize: The first request kicks things off. Under the Tests tab, we use the setEnvironmentVariable() method to set the environment variables that the second request relies on:
     – “links” – an array to contain all the links to be checked
     – “index” – an index to iterate through the “links” array
     – “url” – an initial URL to start our page crawler

2. Check URL: When you send a GET request to a web page, the response contains the page’s HTML, including the anchor tags for every hyperlink on the page. We can collect, or scrape, these links and store them in the “links” array to be checked. The request then runs again for the next link in the array, and each page it visits on our site contributes its own links, until we’ve crawled the entire site and checked every link.

So we have 2 requests: the first sets the environment variables, and the second crawls our pages, scraping and checking every link. The second request does the heavy lifting and keeps looping until every link has been checked.
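As a rough sketch of what the Initialize step sets up, the snippet below seeds the three variables described above. The `env` object and the local `setEnvironmentVariable()` function are stand-ins so the code also runs in plain Node.js; inside Postman you would call `postman.setEnvironmentVariable(name, value)` instead. The `initialize()` helper and the starting index of 0 are assumptions made for illustration, not necessarily what the sample collection does.

```javascript
// Stand-in for Postman's environment so this sketch runs in plain Node.js.
// Inside Postman, call postman.setEnvironmentVariable(name, value) instead.
const env = {};
function setEnvironmentVariable(name, value) {
    // Environment values are kept as strings in this sketch.
    env[name] = String(value);
}

// Hypothetical helper: seed the three variables the crawler needs.
function initialize(startUrl) {
    // The array is serialized as JSON because values are stored as strings.
    setEnvironmentVariable("links", JSON.stringify([]));
    // Assumption: index is the position of the next link to check.
    setEnvironmentVariable("index", 0);
    // The first page to crawl.
    setEnvironmentVariable("url", startUrl);
}

initialize("https://www.getpostman.com/docs/");
console.log(env);
```

Storing the array as a JSON string means the second request has to parse it with `JSON.parse()` before adding links, and serialize it again afterwards.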


This process illustrates 2 important capabilities of the Postman app: scraping HTML, and branching and looping between requests.

HTML scraping

Finding all the links on a page requires scraping HTML. The Postman Sandbox supports Cheerio as a library for scraping HTML elements. Read more about using the Postman Sandbox and other libraries and utilities supported in the pre-request and test scripts sections.
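Here is a minimal sketch of the scraping step. When a `cheerio` global is available, as it is in the Postman Sandbox, the sketch uses `cheerio.load()` to select the anchor tags; otherwise it falls back to a simple regular expression so that it can run in plain Node.js. The helper names `extractHrefs()` and `resolveLinks()` are hypothetical, and the use of the global `URL` constructor to resolve relative links is an assumption that holds in Node.js and browsers.

```javascript
// Hypothetical helper: collect the href of every anchor tag in an HTML string.
function extractHrefs(html) {
    if (typeof cheerio !== "undefined") {
        // In the Postman Sandbox, Cheerio parses the response body.
        const $ = cheerio.load(html);
        return $("a").map((i, el) => $(el).attr("href")).get();
    }
    // Fallback for plain Node.js: a simplistic regex, not a real HTML parser.
    const anchor = /<a\b[^>]*?\bhref\s*=\s*["']([^"']*)["']/gi;
    return Array.from(html.matchAll(anchor), (match) => match[1]);
}

// Hypothetical helper: turn hrefs into unique absolute http(s) URLs.
function resolveLinks(hrefs, pageUrl) {
    const seen = new Set();
    for (const href of hrefs) {
        let absolute;
        try {
            absolute = new URL(href, pageUrl); // resolves relative links
        } catch (err) {
            continue; // skip hrefs that cannot be parsed
        }
        if (absolute.protocol !== "http:" && absolute.protocol !== "https:") {
            continue; // skip mailto:, tel:, javascript: and similar
        }
        absolute.hash = ""; // the same page with a different fragment counts once
        seen.add(absolute.href);
    }
    return Array.from(seen);
}

const sampleHtml = `
  <a href="/docs/">Docs</a>
  <a href="https://www.getpostman.com/pricing">Pricing</a>
  <a href="mailto:help@example.com">Email</a>
  <a href="/docs/">Docs again</a>`;

const found = resolveLinks(extractHrefs(sampleHtml), "https://www.getpostman.com/");
console.log(found);
```

In a Tests script, the HTML would come from the response body of the Check URL request rather than from a sample string.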

Branching and looping

The setNextRequest() method accepts the name or id of a request in the same collection as a parameter. Use this method to establish a workflow sequence and designate which request to run next, instead of defaulting to linear execution. It only takes effect when the collection is run as a whole, for example in the collection runner, not when you send a single request. Read more about building workflows.

In this example, we will call the same request, again and again, until all the links have been checked.
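The loop can be sketched roughly as follows. The request name "Check URL" comes from the list of requests above; the `advance()` helper and the `state` object are hypothetical, standing in for the “links”, “index”, and “url” environment variables. The `runner` object is a stand-in that records calls so the sketch runs in plain Node.js; inside a collection run, the real `postman.setNextRequest()` is used.

```javascript
// Stand-in so the sketch runs outside Postman; inside a collection run,
// the real postman object is picked up instead.
const runner = typeof postman !== "undefined"
    ? postman
    : { next: undefined, setNextRequest(name) { this.next = name; } };

// Hypothetical helper: decide what to run once the current URL has been checked.
// In Postman, links and index would be read from environment variables.
function advance(state) {
    if (state.index < state.links.length) {
        state.url = state.links[state.index]; // next URL to request
        state.index += 1;
        runner.setNextRequest("Check URL");   // run the same request again
        return true;
    }
    runner.setNextRequest(null);              // null ends the collection run
    return false;
}

const state = {
    links: ["https://www.getpostman.com/docs/", "https://www.getpostman.com/pricing"],
    index: 0,
    url: ""
};

while (advance(state)) {
    console.log("would check", state.url);
}
```

Passing `null` to `setNextRequest()` is what stops the run once the index reaches the end of the array; without it, the runner would fall back to its default order.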

Quickstart

Click the Run in Postman button to import the sample collection and environment template into your Postman app, and check out the collection documentation for more details. You should now see the collection in the sidebar to the left and the environment selected in the dropdown in the top right.

1. Update environment: Click the Quick Look icon in the top right to view and edit the environment variables. This is where you can update the values to check links on your own website. In many cases, your root_url will be the same as your start_url. However, in this example, we will use https://www.getpostman.com/ as our root_url, and start checking links on https://www.getpostman.com/docs/.

2. Open Postman Console: This step is optional. If you want to see a stream of requests and view any logged statements, go to the application menu, and select View > Show Postman Console to open the console in a separate window. Do this before you send any requests or run the collection.

3. Run collection: Click the right angle bracket (>) to expand the collection details view. Click the Run button to open the collection runner in a separate window.

Verify that your collection and environment are selected in the respective dropdowns, and click Start Run to begin running your collection.

You should now see your tests run and pass as the collection crawls each link, until there are no more links to check.


This example of traversing links on a page is similar to how you can use Postman with hypermedia APIs. Rather than requiring you to know the specification up front, a hypermedia API response tells you which links to follow next, and you can use environment variables and conditional logic to loop through the data in a nonlinear fashion.


Originally published at blog.getpostman.com on June 22, 2017.