Deeper Swift Web Crawler Script Explanation

Federico Zanetello
Apr 4, 2017 · 2 min read

The script can be separated into four steps:

  1. Visit a web page
  2. Search for the word we’re interested in
  3. Collect all the links
  4. Repeat

1. Visiting a Web Page

In this case I’m using Foundation’s URLSession: we define a URLSession task and give it the URL of our web page.

After defining the task, we start it (that is, we request the download of the web page) by calling .resume().

Once the download finishes, the task invokes its completionHandler, where we verify that no errors occurred and then start parsing the page.
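As a rough sketch of this step (not the script's exact code), the download could look like the block below. The names pageContents and visit(page:then:) are hypothetical, and decoding the page as UTF-8 is an assumption.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // URLSession lives here on Linux
#endif

// Hypothetical helper: turns the completion handler's arguments into the
// page text, or nil if the download failed or the data is not valid UTF-8.
func pageContents(data: Data?, error: Error?) -> String? {
    guard error == nil, let data = data else { return nil }
    return String(data: data, encoding: .utf8) // UTF-8 is an assumption
}

// Hypothetical wrapper: defines the task, then starts it with resume().
func visit(page url: URL, then parse: @escaping @Sendable (String) -> Void) {
    let task = URLSession.shared.dataTask(with: url) { data, _, error in
        // The completion handler runs once the download has finished.
        guard let document = pageContents(data: data, error: error) else { return }
        parse(document)
    }
    task.resume() // nothing is downloaded until the task is resumed
}
```

Because the download is asynchronous, a command-line script has to stay alive until the completion handler fires, for example by waiting on a DispatchSemaphore.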

2. Searching for the Word in the Document

This is a small trick: since we can treat the whole web page as a String, we’re using Foundation’s contains(_:) to check whether the word is present.
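A minimal sketch of that check follows. The function name wordIsPresent(_:in:) is hypothetical; the body is just the contains(_:) call described above.

```swift
import Foundation

// Hypothetical helper: treats the whole page as one String and performs a
// plain, case-sensitive substring search.
func wordIsPresent(_ word: String, in document: String) -> Bool {
    return document.contains(word)
}
```

Note that this is a substring match on the raw HTML, so it also matches the word inside tags or attributes, and "Swift" and "swift" count as different words.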

3. Collecting URLs

One more trick. There are better ways to analyze a webpage but I wanted to keep the script simple and 100% independent from any third-party libraries.

What we’re doing here is using NSRegularExpression to find all the URLs in the document. Please note that my regular expression fails to detect relative URL paths and URLs that don’t start with http: feel free to submit a PR!

Once we have all the URLs, we return the whole collection, which is later added to the web pages to visit.
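To illustrate the idea, here is a minimal sketch of link collection with NSRegularExpression. The function name collectLinks(in:) and the pattern are assumptions made for this example, not the script's actual regular expression; like the original, it only finds absolute URLs that start with http.

```swift
import Foundation

// Hypothetical helper: returns every absolute http(s) URL found in the page.
func collectLinks(in document: String) -> [URL] {
    // Assumed pattern: "http" or "https", then "://", then everything up to
    // the next whitespace, quote, or angle bracket.
    let pattern = "https?://[^\\s\"'<>]+"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }

    // NSRegularExpression works on NSRange, so convert the String range.
    let wholeDocument = NSRange(document.startIndex..<document.endIndex, in: document)
    return regex.matches(in: document, options: [], range: wholeDocument).compactMap { match in
        guard let range = Range(match.range, in: document) else { return nil }
        return URL(string: String(document[range]))
    }
}
```

A relative link such as href="/about" is skipped by this pattern, which is the same limitation mentioned above.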

If you want to know more about NSRegularExpression in Swift, I suggest this great article by Nate Cook.

4. Repeat

All of the steps above are repeated every time we call crawl().

This function first checks whether we have visited enough pages (visitedPages.count <= maximumPagesToVisit) and if we have any other web pages to visit (guard let pageToVisit = pagesToVisit.popFirst()).

If both checks pass, the function then checks whether we have already visited this new pageToVisit: if we haven’t, we jump to step 1; otherwise we call crawl() again.
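The control flow described above can be sketched as follows. The Crawler class and the linksOnPage closure are hypothetical: the closure stands in for steps 1 to 3 and runs synchronously so the sketch works without a network, whereas the real script downloads pages asynchronously.

```swift
import Foundation

// Hypothetical wrapper around the state that crawl() works with.
final class Crawler {
    let maximumPagesToVisit: Int
    var pagesToVisit: Set<URL>
    var visitedPages: Set<URL> = []
    // Stand-in for steps 1 to 3: returns the links found on a page.
    let linksOnPage: (URL) -> [URL]

    init(start: URL, maximumPagesToVisit: Int, linksOnPage: @escaping (URL) -> [URL]) {
        self.pagesToVisit = [start]
        self.maximumPagesToVisit = maximumPagesToVisit
        self.linksOnPage = linksOnPage
    }

    func crawl() {
        // Check 1: stop once enough pages have been visited. With "<=", this
        // sketch can visit up to maximumPagesToVisit + 1 pages.
        guard visitedPages.count <= maximumPagesToVisit else { return }
        // Check 2: stop when no pages are left. Set.popFirst() removes an
        // arbitrary element, so the visiting order is not defined.
        guard let pageToVisit = pagesToVisit.popFirst() else { return }

        if visitedPages.contains(pageToVisit) {
            crawl() // already seen: move on to the next page
        } else {
            visitedPages.insert(pageToVisit)
            // Newly found links join the pages still to visit.
            pagesToVisit.formUnion(linksOnPage(pageToVisit))
            crawl()
        }
    }
}
```

Keeping both collections as sets makes the "have we visited this page already?" check a cheap lookup, at the cost of an unpredictable crawl order.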

That’s all! Happy scripting! 😊
