Crawling a Website Using a Laravel Dusk Spider

Tushar Gugnani
Feb 27

I have created a simple web spider using Laravel Dusk. The spider goes through all the links on a website, records each page’s title, content and HTTP status code, and stores them in a local database.

What is Laravel Dusk?

Laravel Dusk is an end-to-end browser automation tool provided by Laravel. This official package can visit your web application, or any other website, in a real browser, much like an actual user browsing your site.

Although Dusk’s primary purpose is automated testing, it can also be used for web scraping.

What’s the use of this spider tool?

As an example to get started with scraping, I have created this simple tool that goes through all the pages of a website. You can use it to:

  • Check broken links on your website.

That is only a starting point; the possible uses are many.

Installation of Laravel Dusk

Installing the Dusk package is fairly simple. First, add it to your Laravel project as a Composer dev dependency:

composer require --dev laravel/dusk

Once the dependency is installed, run Dusk’s install command, which generates the default scaffolding in your project, including the tests/Browser directory and the tests/DuskTestCase.php base class:

php artisan dusk:install

Preparing Migration and Database Table

Make sure your project is connected to a database. I am using MySQL for this project.

To store the crawled data we need just a single table named pages. Let’s generate a model and a migration file for it:

php artisan make:model Page -m
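The make:model command creates the Page model, in app/Page.php on the Laravel 5.x layout of the time (newer Laravel versions put models in app/Models). If the spider fills rows with create() or update(), those columns have to be mass assignable. A minimal sketch, assuming the columns url, title, content, status and isCrawled (url is my assumption; the others come from this post):

```php
<?php

namespace App;

use Illuminate\Database\Eloquent\Model;

class Page extends Model
{
    // Columns the spider writes via create()/update(); 'url' is an assumed column name.
    protected $fillable = ['url', 'title', 'content', 'status', 'isCrawled'];

    // Return isCrawled as a real boolean instead of MySQL's 0/1.
    protected $casts = ['isCrawled' => 'boolean'];
}
```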

Let’s modify the migration file to include the required columns.
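A sketch of such a migration, assuming a url column alongside the title, content, status and isCrawled columns described in this post; the column types are my own choice, not necessarily those of the original code:

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

class CreatePagesTable extends Migration
{
    public function up()
    {
        Schema::create('pages', function (Blueprint $table) {
            $table->increments('id');
            // Assumed column. Not unique-indexed, to avoid MySQL key-length limits
            // on long URLs; duplicates are avoided in code instead.
            $table->string('url');
            $table->string('title')->nullable();
            $table->longText('content')->nullable();    // full page source can be large
            $table->integer('status')->nullable();      // HTTP status code, null if unknown
            $table->boolean('isCrawled')->default(false); // new rows start uncrawled
            $table->timestamps();
        });
    }

    public function down()
    {
        Schema::dropIfExists('pages');
    }
}
```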

Apart from the self-explanatory columns, status stores the HTTP status code returned by the page URL, and isCrawled tracks whether the page has been crawled yet.
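Dusk drives Chrome through WebDriver, and WebDriver does not expose the HTTP status code of the page it loads, so the status column has to be filled some other way. One option, an assumption here rather than necessarily what the original code does, is PHP’s built-in get_headers(), whose first element is the status line of the first response. A small sketch with the hypothetical helpers parseStatusLine() and fetchStatus():

```php
<?php

/**
 * Extract the numeric code from an HTTP status line such as "HTTP/1.1 404 Not Found".
 * Returns null when the line does not look like a status line.
 */
function parseStatusLine(string $line): ?int
{
    return preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m) ? (int) $m[1] : null;
}

/**
 * Fetch the status code of $url using get_headers().
 * get_headers() follows redirects, but element 0 is the first response's
 * status line, so a redirecting URL reports 301/302 rather than its target's code.
 */
function fetchStatus(string $url): ?int
{
    $headers = @get_headers($url); // false on DNS or connection failure
    return $headers ? parseStatusLine($headers[0]) : null;
}
```

For example, parseStatusLine('HTTP/1.1 404 Not Found') gives 404, and fetchStatus() returns null when the host cannot be reached at all.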

Dusk Spider Test

Let’s start writing the spider by generating a new Dusk test:

php artisan dusk:make duskSpiderTest

The spider code goes in the generated tests/Browser/duskSpiderTest.php file.

Let’s go over the code briefly:

  • Set $startUrl and $domain to match the website you want to crawl.
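The test code itself is not reproduced in this post, so here is a minimal sketch of how such a spider can be written, not the author’s exact implementation. It assumes the App\Page model with mass-assignable url, title, content, status and isCrawled columns, a migration where isCrawled defaults to false, and hypothetical $startUrl/$domain values pointing at a local site. The pages table doubles as the crawl queue: the loop keeps visiting the first row with isCrawled = false, saves its details, and adds every same-domain link it finds.

```php
<?php

namespace Tests\Browser;

use App\Page;
use Laravel\Dusk\Browser;
use Tests\DuskTestCase;

class duskSpiderTest extends DuskTestCase
{
    // Hypothetical values: set these for the site you want to crawl.
    private $startUrl = 'http://localhost:8000/';
    private $domain = 'localhost';

    public function testCrawlSite()
    {
        // Seed the queue: the start page becomes the first uncrawled row.
        Page::firstOrCreate(['url' => $this->startUrl]);

        $this->browse(function (Browser $browser) {
            // Take one uncrawled row at a time until none are left.
            while ($page = Page::where('isCrawled', false)->first()) {
                $browser->visit($page->url);

                // WebDriver has no API for HTTP status codes, so ask get_headers() separately.
                $headers = @get_headers($page->url);
                $status = ($headers && preg_match('#^HTTP/\S+\s+(\d{3})#', $headers[0], $m))
                    ? (int) $m[1]
                    : null;

                $page->update([
                    'title' => $browser->driver->getTitle(),
                    'content' => $browser->driver->getPageSource(),
                    'status' => $status,
                    'isCrawled' => true,
                ]);

                // The browser resolves a.href to an absolute URL, so relative links work too.
                $links = $browser->driver->executeScript(
                    'return Array.from(document.links).map(function (a) { return a.href; });'
                );

                foreach ($links as $href) {
                    // Drop #fragments so /post and /post#comments count as one page.
                    $url = explode('#', $href, 2)[0];

                    // Follow only links on the configured domain (mailto:, javascript: have no host).
                    if (parse_url($url, PHP_URL_HOST) === $this->domain) {
                        Page::firstOrCreate(['url' => $url]);
                    }
                }
            }
        });

        // Every queued page should now be marked as crawled.
        $this->assertSame(0, Page::where('isCrawled', false)->count());
    }
}
```

Using the table as the queue also means an interrupted run can simply be started again: rows already marked as crawled are skipped.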

That’s about it!

You can run the Dusk test from the command line:

php artisan dusk --filter duskSpiderTest

If you want to watch the test run in a browser window, comment out the headless mode (the '--headless' argument) in the driver() method of the DuskTestCase.php class.

Here is a short video of the spider crawling my blog in a local environment.
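For reference, this is roughly the tests/DuskTestCase.php that dusk:install generated in the Dusk 5.x era (the defaults differ between Dusk versions, so compare with your own file), with the '--headless' argument commented out so that Chrome opens a visible window:

```php
<?php

namespace Tests;

use Facebook\WebDriver\Chrome\ChromeOptions;
use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\Remote\RemoteWebDriver;
use Laravel\Dusk\TestCase as BaseTestCase;

abstract class DuskTestCase extends BaseTestCase
{
    use CreatesApplication;

    /**
     * Start ChromeDriver before the test class runs.
     *
     * @beforeClass
     */
    public static function prepare()
    {
        static::startChromeDriver();
    }

    protected function driver()
    {
        $options = (new ChromeOptions)->addArguments([
            '--disable-gpu',
            // '--headless', // commented out: Chrome now opens a visible window
            '--window-size=1920,1080',
        ]);

        return RemoteWebDriver::create(
            'http://localhost:9515', DesiredCapabilities::chrome()->setCapability(
                ChromeOptions::CAPABILITY, $options
            )
        );
    }
}
```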

Displaying Results

Details of the crawled pages are available in the pages table of our database.

You can use Eloquent pagination to display this data in an HTML table.
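Because every row carries its status code, checking for broken links comes down to a query. A sketch that can be pasted into php artisan tinker, assuming the App\Page model; counting 4xx/5xx responses as broken is my own choice:

```php
<?php

use App\Page;

// Pages that answered with a client or server error (4xx/5xx): the broken links.
$broken = Page::where('status', '>=', 400)->get(['url', 'status']);

$broken->each(function ($page) {
    echo $page->status, ' ', $page->url, PHP_EOL;
});

// Crawled pages with no status at all, e.g. the status request failed to connect.
$unreachable = Page::where('isCrawled', true)->whereNull('status')->pluck('url');
```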
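A minimal controller sketch for that, with the hypothetical names PageController and pages.index; it leaves out the large content column and shows 20 rows per page:

```php
<?php

namespace App\Http\Controllers;

use App\Page;

class PageController extends Controller
{
    // Lists crawled pages, 20 per page (an arbitrary choice).
    public function index()
    {
        // paginate() reads the current page number from the ?page= query string.
        $pages = Page::select('id', 'url', 'title', 'status')
            ->orderBy('id')
            ->paginate(20);

        return view('pages.index', ['pages' => $pages]);
    }
}
```

Register it with a route such as Route::get('/pages', 'PageController@index'); in routes/web.php, loop over $pages in the Blade view to build the table rows, and call {{ $pages->links() }} below the table to render the page links.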

Crawled Results via Dusk Spider.

The code is available at

If you are looking to learn more about Laravel Dusk, you can check out the tutorial series I have written on my blog:

https://www.5balloons.info/introduction-to-laravel-dusk/

Feel free to ask me your questions/bug reports in the comment section 🙂

Written by Tushar Gugnani, Web Developer and Blogger at https://www.5balloons.info