Efficiently Test Your Sitemap URLs with Cypress

Boris Selivanov
SSENSE-TECH


Sitemaps: A Brief Introduction

A sitemap is an XML file that provides a structured list of a website’s URLs. It serves as a roadmap for search engines like Google and Bing, making it easier for them to crawl and index the site’s content. Sitemaps help improve a website’s visibility and search engine optimization (SEO) by making search engines aware of all the pages on the site, including those the normal crawling process might not discover.

For each URL, a sitemap entry can include:

- The URL itself (<loc>)

- The last modification date of the URL (<lastmod>, optional)

- The priority of the URL relative to other URLs on the site (<priority>, optional)

- The change frequency of the URL, e.g., daily, weekly, or monthly (<changefreq>, optional)

By submitting sitemaps through platforms such as Google Search Console or Bing Webmaster Tools, you help these search engines discover and index your website’s content efficiently.
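To make the entry structure concrete, here is a small sample sitemap in the sitemaps.org format, together with a deliberately simple reader for its entries. This is an illustrative sketch, not production code. The example.com URLs and the function names are hypothetical. The regex-based reading also stands in for a real XML parser, which you should use in practice because regular expressions ignore details such as CDATA sections, comments, and escaped characters like &amp;.

```javascript
// A hypothetical two-entry sitemap that follows the sitemaps.org protocol.
// The first entry sets every optional field; the second has only <loc>.
const sampleSitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2023-05-01</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://www.example.com/about</loc>
  </url>
</urlset>`;

// Returns the trimmed text inside the first <tag>...</tag> in `block`,
// or undefined when the (optional) element is absent.
function readTag(block, tag) {
  const match = block.match(new RegExp(`<${tag}>\\s*([^<]*?)\\s*</${tag}>`));
  return match ? match[1] : undefined;
}

// Deliberately simplified reader: splits the document into <url>...</url>
// blocks and reads the four standard child elements from each one.
// Unlike a real XML parser, it ignores CDATA, comments, and entity escaping.
function parseSitemapEntries(xml) {
  const blocks = xml.match(/<url>[\s\S]*?<\/url>/g) || [];
  return blocks.map((block) => ({
    loc: readTag(block, "loc"),
    lastmod: readTag(block, "lastmod"),
    changefreq: readTag(block, "changefreq"),
    priority: readTag(block, "priority"),
  }));
}

const entries = parseSitemapEntries(sampleSitemap);
console.log(entries.length); // 2
console.log(entries[1].lastmod); // undefined
```

Every value comes back as a string, priority included, so convert it with Number() before comparing priorities.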

Why Test Your Sitemap?

Testing your sitemap is a crucial step in maintaining a healthy and high-performing website. Just as you would regularly check your car’s oil or tire pressure, regular sitemap testing helps to ensure that your site continues to function optimally.

Errors in a sitemap, such as incorrect or non-existent URLs, can cause search engines to crawl and index your website incorrectly. This can lower your visibility in search engine results, hurting your website’s organic traffic and overall SEO ranking.

By running regular sitemap tests, you can detect and resolve errors before they become larger issues. This proactive approach not only helps you maintain your site’s SEO performance but also gives users a better experience, since they can navigate your website without running into broken or non-existent pages.

Hence, testing your sitemap is not just a recommended practice but a vital part of maintaining your website’s overall health and performance. By utilizing testing tools, developers can easily validate each URL in a sitemap, ensuring that all pages are accessible and function as intended. This, in turn, contributes to a more reliable and user-friendly web application.

While there are many tools available for this purpose, Cypress stands out because of its versatility and comprehensive testing capabilities. Let’s dive into how Cypress can be effectively used to enhance the efficiency of testing your sitemap URLs.

Leveraging Cypress to Validate Sitemap URLs

A sitemap is not just a navigation guide for search engines; it is also an excellent resource for developers who want to validate every URL it contains. Let’s look at how Cypress can be used for this purpose.

To start, you need to access the sitemap, either by opening the sitemap URL directly or by sending a request to it and checking that the response has a ‘body’ property. Once the sitemap file is accessible, the next step is to parse the XML and extract each URL it references. Several JavaScript libraries can help with this. One of them is x2js, an npm package that converts XML content into a JavaScript object, which is much easier to manipulate than raw XML. Once parsing succeeds, you have a list of the URLs included in the sitemap.

Let’s break down the actions that the script performs.

  • Import the x2js library: The script starts by importing the x2js library, a handy tool that converts XML content into a JavaScript object.
  • Fetch the sitemap: Using the cy.request method, the script sends a request to fetch the sitemap.
  • Extract the body from the response: The its('body') command retrieves the body of the response, which contains the sitemap’s XML.
  • Convert the XML: The x2js.xml2js(body) call converts the XML content into a plain JavaScript object, stored as json, for easier manipulation.
  • Log the result: For debugging purposes, the script logs the resulting object to the Cypress Command Log using cy.log(json).
  • Assert the presence of URLs: The script confirms that the list of URLs exists and contains at least one URL, using expect(json.urlset.url.length).to.be.greaterThan(0). It also confirms that json.urlset.url is an array.
  • Iterate through the URLs: For each URL in the list, the script performs several actions:
  1. Parse the URL: The script creates a new URL object and extracts its pathname.
  2. Log the pathname: It then logs the pathname for debugging purposes.
  3. Request the URL: The script sends a HEAD request to the URL using cy.request('HEAD', parsedUrl.pathname). By default, cy.request fails the test when the response status is not 2xx or 3xx, so a URL that no longer exists (for example, one returning 404) fails the test at this point.
  4. Visit the URL: Finally, the script visits the URL using cy.visit(parsedUrl.pathname). This goes further than the HEAD request. It checks that the page can be fetched, that it is served with the text/html content type, that it loads successfully, and that it doesn’t throw any uncaught JavaScript errors.

By following these steps, the script verifies every URL in the sitemap. It covers the website’s entire sitemap and quickly surfaces any issue that might affect the user experience or search engine indexing, so you can fix it before it spreads.
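The assertion and URL-parsing steps described above don’t depend on Cypress, so they can be sketched and checked in plain JavaScript. The function below is a hypothetical helper, not part of the original script. It takes an object shaped like the parsed sitemap (json.urlset.url holding entries with a loc field), fails if there are no entries, and returns each entry’s pathname. There is one deliberate difference. Instead of requiring json.urlset.url to be an array, it also accepts a single entry object, because XML-to-JavaScript converters commonly represent a lone repeated element that way. The sample input is hand-written, not real converter output.

```javascript
// Hypothetical helper mirroring the "assert the presence of URLs" and
// "parse the URL" steps, runnable in plain Node.js without Cypress.
function sitemapPathnames(json) {
  const urlset = json && json.urlset;
  if (!urlset || urlset.url === undefined) {
    throw new Error("Sitemap has no <url> entries");
  }
  // A lone <url> may arrive as an object instead of a one-element array.
  const entries = Array.isArray(urlset.url) ? urlset.url : [urlset.url];
  if (entries.length === 0) {
    throw new Error("Sitemap has no <url> entries");
  }
  // new URL() throws on malformed <loc> values, which also flags bad entries.
  return entries.map((entry) => new URL(entry.loc).pathname);
}

// Hand-written stand-in for the parsed sitemap object.
const parsed = {
  urlset: {
    url: [
      { loc: "https://www.example.com/" },
      { loc: "https://www.example.com/en-ca/men?page=2" },
    ],
  },
};

console.log(sitemapPathnames(parsed)); // [ '/', '/en-ca/men' ]
```

Keep in mind that pathname drops the query string and hash: https://www.example.com/en-ca/men?page=2 becomes /en-ca/men. If your sitemap lists URLs that differ only by query parameters, visiting parsedUrl.pathname would test the same page more than once. In that case, parsedUrl.pathname + parsedUrl.search may be the better target.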

Testing Each Sitemap URL in Separate Cypress Tests

Now, let’s turn to our main focus: using Cypress to test each URL from the sitemap in its own separate test.

The spec imports the cypress-each plug-in, reads the sitemap URLs from the Cypress environment, and then uses the it.each method to create a separate test for each URL.

Let’s break it down:

  1. Import the cypress-each plug-in: The cypress-each plug-in extends Cypress’s global it function with an each method. This method allows us to create a separate test for each item in an array. In our case, we will use it to create a separate test for each URL in our sitemap.
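  2. Define the test suite: Our test suite is called ‘Sitemap’ and contains the test cases for the sitemap URLs. Inside the suite, we read the sitemap URLs, which were fetched before the spec ran and passed in through the Cypress environment, and parse each one to keep only its pathname. The list has to be available this early because it.each needs it when the spec file is loaded and the tests are defined, before any cy command runs.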
  3. Run a setup check before the tests: We use the before hook to run a setup check before executing the tests. This ensures that the sitemap URLs were fetched correctly and are in the expected format.
  4. Create a separate test for each URL: We use the it.each method provided by the cypress-each plug-in to create a separate test for each URL in the urls array. The cy.visit command is used to navigate to each URL, verifying that the page loads successfully.

With this approach, each page of our web application is tested independently. A failing test points straight at the URL that needs fixing. Because each URL has its own test, one failure does not stop the remaining tests from running, so a single run still covers the whole sitemap.
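The spec assumes the sitemap URLs are already in the Cypress environment when the file loads. One way to put them there, sketched here under assumptions, is to download and parse the sitemap in the Node.js process before the tests run and store the list on config.env. You could do this inside setupNodeEvents in the Cypress configuration file. The function below covers only the pure part of that step. The name withSitemapUrls and the sitemapUrls environment key are hypothetical, and the regex extraction is a simplification of real XML parsing.

```javascript
// Hypothetical helper for the Node.js side of a Cypress run. It pulls every
// <loc> value out of already-downloaded sitemap XML with a simple regex and
// returns a copy of the config with the list stored under config.env[envKey].
function withSitemapUrls(config, sitemapXml, envKey = "sitemapUrls") {
  const urls = [...sitemapXml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)].map(
    (match) => match[1]
  );
  // Spread existing env values so other environment variables survive.
  return { ...config, env: { ...config.env, [envKey]: urls } };
}

// Hypothetical sitemap text, standing in for a downloaded file.
const xml = `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/about</loc></url>
</urlset>`;

const updated = withSitemapUrls({ baseUrl: "https://www.example.com", env: {} }, xml);
console.log(updated.env.sitemapUrls);
```

Cypress lets setupNodeEvents return a modified config, or a promise that resolves to one. Inside it, you might fetch the sitemap text (for example with the fetch built into Node.js 18 and later), pass it through a function like this one, and return the result. The spec would then read the list with Cypress.env('sitemapUrls').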

Exploring Error Scenarios: A Real-World Example

One of the most common issues web developers encounter is dead or broken links. These links lead to a ‘404 Not Found’ page, frustrating visitors and potentially hurting the site’s SEO ranking.

Let’s illustrate how the above script aids developers in catching and resolving such issues proactively.

When the script makes a HEAD request to each URL in the sitemap, it verifies the existence of the corresponding page. If a URL responds with a ‘404 Not Found’ status, the test fails. This failure acts as an alert for developers, flagging the problem to be fixed promptly.

Cypress’s built-in error handling helps here. If cy.request or cy.visit encounters a ‘404 Not Found’ response, Cypress automatically fails the test and logs a detailed error message that describes the issue and pinpoints where it occurred.

For instance, a typical error message from Cypress might appear as follows:

cy.visit() failed trying to load: https://www.ssense.com/404-not-found

The response we received from your web server was:

> 404: Not Found

This message indicates what went wrong and where, making it easier for developers to swiftly diagnose and rectify the issue.

The integration of Cypress into the testing workflow effectively automates the process of checking for broken links. It not only saves significant time and effort for developers but also ensures an enhanced experience for site visitors, while maintaining optimal SEO performance.
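The rule behind failures like the one above is simple. By default, cy.request and cy.visit follow redirects and then fail the test if the final status code is outside the 2xx and 3xx ranges, because their failOnStatusCode option defaults to true. The plain-JavaScript sketch below mirrors that rule and turns a list of { url, status } records into a short broken-link report. The helper names and sample records are hypothetical. In Cypress, you could gather records like these by calling cy.request with failOnStatusCode: false, so that one bad URL doesn’t stop the collection.

```javascript
// Mirrors the default pass/fail rule of cy.request and cy.visit
// (failOnStatusCode: true): once redirects have been followed,
// any final status outside 200-399 fails the test.
function isPassingStatus(status) {
  return status >= 200 && status < 400;
}

// Hypothetical helper: keeps only the failing records and formats each
// one as "<status> <url>" so broken links are easy to scan in a report.
function findBrokenLinks(results) {
  return results
    .filter(({ status }) => !isPassingStatus(status))
    .map(({ url, status }) => `${status} ${url}`);
}

// Hypothetical sample records, e.g. collected with
// cy.request({ method: "HEAD", url, failOnStatusCode: false }).
const results = [
  { url: "/", status: 200 },
  { url: "/404-not-found", status: 404 },
  { url: "/old-page", status: 301 },
  { url: "/checkout", status: 500 },
];

console.log(findBrokenLinks(results));
```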

Conclusion

In this article, we’ve discussed how to use Cypress to check and verify every URL in a sitemap, and how to utilize the cypress-each plug-in to test each URL in a separate Cypress test. For a more interactive learning experience, consider watching Gleb Bahmutov’s video on Using cypress-each to Create Separate Tests, which provides additional insights into this plug-in.

By following these steps, you can ensure that your web application’s sitemap is accurate and that each page is functioning as intended. This will ultimately lead to a more robust and reliable web application.

Editorial reviews by Catherine Heim & Mario Bittencourt

Published in SSENSE-TECH
