Handling Broken Links in Selenium WebDriver

Archana Gore
4 min read · Dec 7, 2022


What are broken links?

Broken links on a web page are links whose target cannot be reached or no longer works.

Broken links are also called dead links.

When we click on such links, we may see error messages such as "404 Not Found", "400 Bad Request", or "500 Internal Server Error".

These are only a few examples, and sometimes no message is shown at all. In HTTP terms, such requests fail with a 4xx (client error) or 5xx (server error) status code.
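The first digit of a status code tells us which class of response we received. Below is a small sketch in plain Java (no Selenium) of how a script might classify codes and apply the "4xx or 5xx means broken" rule; the class and method names are hypothetical, chosen only for illustration.

```java
public class StatusCategory {

    // Maps an HTTP status code to its class as defined by the HTTP spec:
    // 1xx informational, 2xx success, 3xx redirection,
    // 4xx client error, 5xx server error.
    public static String categoryOf(int status) {
        if (status >= 100 && status < 200) return "informational";
        if (status >= 200 && status < 300) return "success";
        if (status >= 300 && status < 400) return "redirection";
        if (status >= 400 && status < 500) return "client error";
        if (status >= 500 && status < 600) return "server error";
        return "unknown";
    }

    // The rule used in this article: any status code of 400 or above is a broken link.
    public static boolean isBroken(int status) {
        return status >= 400;
    }

    public static void main(String[] args) {
        int[] samples = {200, 301, 404, 500, 503};
        for (int code : samples) {
            System.out.println(code + " -> " + categoryOf(code)
                    + (isBroken(code) ? " (broken)" : ""));
        }
    }
}
```

For example, 404 is reported as a client error and counted as broken, while 301 is a redirection and is not.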

Reasons for broken links on a web page:

A link can break for several reasons:

1. The web page was moved without adding a redirect

2. The target page no longer exists, has moved, or its server is down

3. The URL was misspelled when the link was added

4. The target page has been removed from the website

Why do we need to identify broken links?

We should always check for broken links on a website so that users won’t land on an error page.

We cannot tell whether a link is broken until we click on it. Finding such links manually is a tedious task, especially when a site contains many links.

Finding them with an automation script is a better solution.

How to identify broken links?

Let’s understand the generic concept of finding all the links on a web page.

Links on a web page are implemented with HTML anchor (<a>) tags, and the target URL is the value of the ‘href’ attribute.

Here, we need to understand what happens when we click on a link.

When we click on a link, the browser sends an HTTP request to the server. The server processes the request and sends back a response, and every response carries a status code.
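To see this request/response cycle outside the browser, here is a minimal sketch. It uses the java.net.http.HttpClient API (Java 11+) rather than the HttpURLConnection class used later in this article, and it talks to a throwaway local server built with the JDK's com.sun.net.httpserver.HttpServer. The paths /home and /missing are made up for the demo.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class RequestResponseDemo {

    // Sends one GET request (what the browser does when we click a link)
    // and returns the status code from the server's response.
    public static int fetchStatus(HttpClient client, String url)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        // discarding() ignores the response body; only the status code matters here
        HttpResponse<Void> response =
                client.send(request, HttpResponse.BodyHandlers.discarding());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        // Local demo server on a free port (port 0); it knows only "/home",
        // every other path gets a 404.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            int status = exchange.getRequestURI().getPath().equals("/home") ? 200 : 404;
            exchange.sendResponseHeaders(status, -1); // -1 means no response body
            exchange.close();
        });
        server.start();
        try {
            String base = "http://127.0.0.1:" + server.getAddress().getPort();
            // plain HTTP/1.1 keeps the exchange simple for this demo
            HttpClient client = HttpClient.newBuilder()
                    .version(HttpClient.Version.HTTP_1_1)
                    .build();
            System.out.println("/home    -> " + fetchStatus(client, base + "/home"));
            System.out.println("/missing -> " + fetchStatus(client, base + "/missing"));
        } finally {
            server.stop(0);
        }
    }
}
```

Running main should print 200 for /home and 404 for /missing, so only the second link would count as broken.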

To check for broken links, we first need to check that the ‘href’ attribute has a value; only then can we send a request to that URL. If the status code in the server's response is greater than or equal to 400, the link is broken.

Note: If the href attribute is null or empty, we simply skip it, because there is no URL to check.
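Besides empty values, an href can also hold a non-HTTP scheme such as mailto:, tel: or javascript:, which no web server answers, so an HTTP status check does not apply to it either. Here is a hedged sketch of such a filter in plain Java; the method name isCheckable and the sample href values are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class HrefFilter {

    // Returns true only for href values that can be checked over HTTP.
    // null/empty values have no URL, and schemes such as mailto:,
    // tel: or javascript: are not served by a web server.
    public static boolean isCheckable(String href) {
        if (href == null || href.isEmpty()) {
            return false;
        }
        String lower = href.toLowerCase(Locale.ROOT);
        return lower.startsWith("http://") || lower.startsWith("https://");
    }

    public static void main(String[] args) {
        // Sample values like those getAttribute("href") might return
        List<String> hrefs = Arrays.asList(
                "https://example.com/about",
                "",
                null,
                "mailto:support@example.com",
                "javascript:void(0)",
                "http://example.com/old-page");
        for (String href : hrefs) {
            System.out.println(href + " -> " + (isCheckable(href) ? "check" : "skip"));
        }
    }
}
```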

How to find broken links on a web page in Selenium?

To check for broken links, we need to follow these steps:

1. Instantiate the WebDriver.

2. Navigate to the application.

3. Capture all elements with the <a> tag using the findElements() method and store them in a List.

4. Using an enhanced for loop, read each element's href attribute and check whether it is null or empty.

5. Convert the href value from a String to a URL using the URL class.

6. Open a connection to the server using the HttpURLConnection class.

7. Connect to the server using the connect() method.

8. Get the server's response code using the getResponseCode() method.

9. If the response code is 400 or higher, it is a broken link; otherwise it is not.

Code to find broken links on a web page:

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.time.Duration;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class HandlingBrokenLinks {

    public static void main(String[] args) {

        WebDriver driver = new ChromeDriver();
        driver.manage().timeouts().implicitlyWait(Duration.ofSeconds(10));
        driver.get("https://opensource-demo.orangehrmlive.com/web/index.php/auth/login");
        driver.manage().window().maximize();

        // capturing all the elements with tag 'a' and storing them in a list
        List<WebElement> links = driver.findElements(By.tagName("a"));
        System.out.println("Total number of links: " + links.size());

        int brokenLinks = 0;

        // traversing through all the links to get the href attribute
        for (WebElement li : links) {

            // storing the href attribute in a String variable
            String hrefAttb = li.getAttribute("href");

            // skipping the link if the href value is null or empty
            if (hrefAttb == null || hrefAttb.isEmpty()) {
                System.out.println("href attribute value is empty");
                continue;
            }

            // HttpURLConnection only handles http(s) links, so mailto:, tel:,
            // javascript: and similar values are skipped
            if (!hrefAttb.startsWith("http://") && !hrefAttb.startsWith("https://")) {
                System.out.println(hrefAttb + " - not an HTTP link, skipped");
                continue;
            }

            HttpURLConnection conn = null;
            try {
                // converting the href value from String to URL
                URL url = new URL(hrefAttb);

                // opening a URL connection with the server, with timeouts
                // so a slow server cannot hang the run
                conn = (HttpURLConnection) url.openConnection();
                conn.setConnectTimeout(5000);
                conn.setReadTimeout(5000);

                // connecting to the server
                conn.connect();

                // getting the response code from the server
                int status = conn.getResponseCode();
                if (status >= 400) {
                    System.out.println(hrefAttb + " - " + status + " " + conn.getResponseMessage() + " - It is a broken link");
                    brokenLinks++;
                } else {
                    System.out.println(hrefAttb + " - " + status + " " + conn.getResponseMessage() + " - Not a broken link");
                }
            } catch (MalformedURLException e) {
                System.out.println(hrefAttb + " - malformed URL, skipped");
            } catch (IOException e) {
                // the server could not be reached at all (DNS error, timeout, ...)
                System.out.println(hrefAttb + " - " + e.getMessage() + " - It is a broken link");
                brokenLinks++;
            } finally {
                if (conn != null) {
                    conn.disconnect();
                }
            }
        }

        System.out.println("Total number of broken links: " + brokenLinks);
        driver.quit();
    }
}
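Only the link collection needs Selenium; the HTTP check itself can live in a small helper with no browser dependency, which makes it easy to test on its own. The sketch below is one possible variant, not the method used above: it sends a HEAD request (headers only, so no page body is downloaded) and falls back to GET when a server answers 405 Method Not Allowed, since some servers do not support HEAD. Note that a few servers may also return a different status for HEAD than for GET. LinkChecker, statusOf and isBroken are hypothetical names, and the helper assumes it is given an http(s) URL that has already passed the null/empty check.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class LinkChecker {

    // Returns the HTTP status code for an http(s) link, or -1 if the
    // URL is malformed or the server could not be reached.
    public static int statusOf(String href) {
        try {
            int status = request(href, "HEAD");
            // 405 Method Not Allowed: the server does not support HEAD, retry with GET
            if (status == HttpURLConnection.HTTP_BAD_METHOD) {
                status = request(href, "GET");
            }
            return status;
        } catch (IOException e) {
            return -1; // malformed URL, DNS failure, refused connection, timeout, ...
        }
    }

    // Opens a connection with the given method and returns the response code.
    private static int request(String href, String method) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(href).openConnection();
        try {
            conn.setRequestMethod(method);
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            return conn.getResponseCode(); // connects implicitly
        } finally {
            conn.disconnect();
        }
    }

    // Same rule as the article: unreachable, 4xx or 5xx means broken.
    public static boolean isBroken(String href) {
        int status = statusOf(href);
        return status == -1 || status >= 400;
    }
}
```

Inside the loop of the program above, the try block could then be reduced to a call such as LinkChecker.isBroken(hrefAttb).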

The output of the above program:

Console output: the script found a total of 5 links on the web page. 1 of them is a broken link with an error message, and the remaining 4 are not broken.

Every link on a web page should work properly to avoid a bad user experience. By checking the status code of each URL on a page in this way, we can find out whether a particular link is broken or not.

Happy Learning!!!
