How to Make a Web Crawler in Swift 🕷
Federico Zanetello

Deeper Swift Web Crawler Script Explanation

The script can be divided into four steps:

  1. Visit a web page
  2. Search for the word we’re interested in
  3. Collect all the links
  4. Repeat

1. Visiting a Web Page

In this case I’m using Foundation’s URLSession: we define a URLSession task that points at our web page’s URL.

After defining the task, we start it (a.k.a. we request the download of the web page) by calling .resume().

Once the page has been downloaded, the task invokes its completionHandler, where we verify that no errors have occurred and finally start parsing the page.
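Here is a minimal sketch of this step, assuming a helper split that is not in the original script: `visit(page:completion:)` and `pageText(data:response:error:)` are hypothetical names, and the 2xx status check is an extra assumption beyond the article’s error check. Keeping the validation in a separate function lets it be tested without touching the network.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLSession lives here on Linux
#endif

/// Hypothetical helper: checks the values URLSession passes to a completion
/// handler and returns the page body as a String, or nil if anything failed.
func pageText(data: Data?, response: URLResponse?, error: Error?) -> String? {
    if let error = error {
        print("Error while downloading: \(error)")
        return nil
    }
    // Assumption: only HTTP responses with a 2xx status code count as success.
    guard let http = response as? HTTPURLResponse,
          (200..<300).contains(http.statusCode),
          let data = data else {
        return nil
    }
    // Returns nil if the page is not valid UTF-8.
    return String(data: data, encoding: .utf8)
}

/// Hypothetical helper: downloads `url` and passes the decoded page (or nil on
/// failure) to `completion`, so the caller can keep crawling either way.
func visit(page url: URL, completion: @escaping @Sendable (String?) -> Void) {
    let task = URLSession.shared.dataTask(with: url) { data, response, error in
        // Called on a background queue once the download finishes or fails.
        completion(pageText(data: data, response: response, error: error))
    }
    // Tasks are created suspended; the request is only sent after resume().
    task.resume()
}
```

In a command-line script the process has to stay alive until the callback fires (for example by waiting on a DispatchSemaphore); otherwise it can exit before the page arrives.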

2. Searching for the Word in the Document

This is a small trick: since we can treat the whole webpage as a String, we’re using Foundation’s contains(_:) to check whether the word is present or not.
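A short sketch of the check, with a hypothetical case-insensitive variant added for comparison. The function names are illustrative, not from the original script. Note that a plain substring search over raw HTML also matches text inside tags, attributes and URLs.

```swift
import Foundation

/// Case-sensitive check, like the article's use of contains(_:).
func page(_ text: String, contains word: String) -> Bool {
    return text.contains(word)
}

/// Hypothetical variant: ignores case, so searching for "swift" also finds "Swift".
func page(_ text: String, containsIgnoringCase word: String) -> Bool {
    return text.range(of: word, options: .caseInsensitive) != nil
}
```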

3. Collecting URLs

One more trick. There are better ways to analyze a webpage but I wanted to keep the script simple and 100% independent from any third-party libraries.

What we’re doing here is using NSRegularExpression to find all the URLs in the document. Please note that my regular expression fails to detect relative URL paths and URLs that don’t start with http: feel free to submit a PR!

Once we have all the URLs, we return the whole collection (which is later added to the set of web pages to visit).

If you want to know more about NSRegularExpression in Swift, I recommend this great article by Nate Cook.
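The article doesn’t reproduce its regular expression here, so the pattern below is an assumption: `http` or `https`, then `://`, then everything up to whitespace, a quote or an angle bracket. Like the original, this sketch ignores relative paths. `collectLinks(in:)` is a hypothetical name.

```swift
import Foundation

/// Hypothetical helper: returns every absolute http/https URL found in `document`.
func collectLinks(in document: String) -> [URL] {
    // Assumed pattern: stop at characters that typically end a URL inside HTML.
    let pattern = "https?://[^\\s\"'<>]+"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive]) else {
        return []
    }
    // NSRegularExpression works on NSRange, so convert the String's range first.
    let range = NSRange(document.startIndex..<document.endIndex, in: document)
    return regex.matches(in: document, options: [], range: range).compactMap { match in
        // Convert each match back to a String range, then to a URL.
        guard let swiftRange = Range(match.range, in: document) else { return nil }
        return URL(string: String(document[swiftRange]))
    }
}
```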
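A minimal sketch of merging the returned links into the pages to visit, assuming both collections are a `Set<URL>` (consistent with the `popFirst()` call mentioned below). The `enqueue(_:into:visited:)` helper is hypothetical, and filtering out visited pages at insertion time is an addition: the article’s crawl() instead skips them when it pops the next page.

```swift
import Foundation

/// Hypothetical helper: adds collected links to the pages still to visit,
/// skipping pages already visited. Sets collapse duplicates automatically.
func enqueue(_ links: [URL], into pagesToVisit: inout Set<URL>, visited visitedPages: Set<URL>) {
    for link in links where !visitedPages.contains(link) {
        pagesToVisit.insert(link)
    }
}
```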

4. Repeat

All of the steps above are repeated every time we call crawl().

This function first checks that we haven’t gone past the limit of pages to visit (visitedPages.count <= maximumPagesToVisit) and that there are still web pages left to visit (guard let pageToVisit = pagesToVisit.popFirst()).

If both checks pass, the function then checks whether we have already visited this pageToVisit: if we haven’t, we jump to step 1; otherwise, we call crawl() again.
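Putting the four steps together, here is a synchronous sketch of the crawl() recursion described above. It is not the original script: the `fetch` closure is a hypothetical stand-in for the URLSession download, injected so the loop runs without a network, and the link regex is an assumed pattern. The sketch uses a strict `<` so that at most `maximumPagesToVisit` pages are fetched (with `<=`, one extra page gets through once the count equals the limit).

```swift
import Foundation

/// A minimal, synchronous sketch of the crawl loop. `fetch` is hypothetical.
final class Crawler {
    let wordToSearch: String
    let maximumPagesToVisit: Int
    let fetch: (URL) -> String?

    var pagesToVisit: Set<URL>
    var visitedPages: Set<URL> = []
    var pagesWithWord: Set<URL> = []

    init(startURL: URL, wordToSearch: String, maximumPagesToVisit: Int,
         fetch: @escaping (URL) -> String?) {
        self.pagesToVisit = [startURL]
        self.wordToSearch = wordToSearch
        self.maximumPagesToVisit = maximumPagesToVisit
        self.fetch = fetch
    }

    // Step 4: keep going while under the limit and pages remain.
    func crawl() {
        guard visitedPages.count < maximumPagesToVisit else { return }
        // Set.popFirst() removes an arbitrary element, so crawl order is unspecified.
        guard let pageToVisit = pagesToVisit.popFirst() else { return }
        if visitedPages.contains(pageToVisit) {
            crawl()
        } else {
            visit(page: pageToVisit)
        }
    }

    // Step 1: "download" the page, then run steps 2 and 3 on it.
    private func visit(page url: URL) {
        visitedPages.insert(url)
        if let text = fetch(url) {
            // Step 2: naive substring search over the raw page.
            if text.contains(wordToSearch) {
                pagesWithWord.insert(url)
            }
            // Step 3: queue every absolute http(s) link found on the page.
            pagesToVisit.formUnion(Crawler.links(in: text))
        }
        crawl()
    }

    private static func links(in text: String) -> [URL] {
        guard let regex = try? NSRegularExpression(pattern: "https?://[^\\s\"'<>]+") else { return [] }
        let range = NSRange(text.startIndex..<text.endIndex, in: text)
        return regex.matches(in: text, options: [], range: range).compactMap { match in
            Range(match.range, in: text).flatMap { URL(string: String(text[$0])) }
        }
    }
}
```

Because this version is synchronous, every page adds a level of recursion; that is fine for small limits, but a loop would be safer for large crawls. In the asynchronous original, each call to crawl() comes from a completion handler instead.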

That’s all! Happy scripting! 😊
