Automatically generate a sitemap in Laravel

Today my company released a package called laravel-sitemap. There are already a lot of excellent sitemap packages out there. They all have in common that you have to manually add links that must appear in the sitemap. With our new package that isn’t required. It can automatically build up a sitemap by crawling a site. In this post I’d like to explain why we built it and how it works.

Is a sitemap really needed?

In theory, sitemaps help web crawlers from search engines discover all pages of your site. Google’s own documentation has this to say about them:

If the pages of your site are properly linked, web crawlers can usually discover all links. Even so, a sitemap can improve the crawling of your site, particularly if your site meets one of the following criteria:
- Your site is really large. As a result, it’s more likely Google web crawlers might overlook crawling some of your new or recently updated pages.
- Your site has a large archive of content pages that are isolated or not well linked to each other. If your site pages do not naturally reference each other, you can list them in a sitemap to ensure that Google does not overlook some of your pages.
- Your site is new and has few external links to it. Googlebot and other web crawlers crawl the web by following links from one page to another. As a result, Google might not discover your pages if no other sites link to them.
- Your site uses rich media content, is shown in Google News, or uses other sitemaps-compatible annotations. Google can take additional information from sitemaps into account for search, where appropriate.

For bigger sites where not every page is linked (for example a webshop where not all products are linked in the DOM), a sitemap is definitely needed. But for small to medium-sized sites where all URLs are properly linked, I’d conclude from Google’s documentation that a sitemap is not needed per se. Asking peers and Googling around makes it clear that there is no consensus on whether a sitemap is really needed for such sites. If you have an opinion on this or a link to a good blog post on the subject, let me know in the comments below.

What often gets mentioned, however, is that a site will be crawled a bit faster if it has a sitemap and you submit it to the various search engines. Another frequently cited advantage is that Google’s Search Console lets you compare the number of pages in your sitemap with the number of pages Google has crawled. That way you can detect whether Google is somehow failing to crawl sections of your site that you expect to be crawled.

There seem to be no disadvantages to having a sitemap, and you might, although it’s not guaranteed, enjoy at least some of its advantages. That’s why your site should probably have one. Google has this to say about it in their docs:

Using a sitemap doesn’t guarantee that all the items in your sitemap will be crawled and indexed, as Google processes rely on complex algorithms to schedule crawling. However, in most cases, your site will benefit from having a sitemap, and you’ll never be penalized for having one.

Creating a sitemap

Imagine you have a Laravel app, running at example.com, where every page is properly linked (that is, every page is linked from somewhere in the DOM). The app has a homepage, a contact page, some project pages and some news items. Using our package, this is how you could generate a sitemap manually:

use Spatie\Sitemap\Sitemap;
use Spatie\Sitemap\Tags\Url;

// Start with the loose pages.
$sitemap = Sitemap::create()
    ->add(Url::create('/home'))
    ->add(Url::create('/contact'));

// Add one link per news item.
NewsItem::all()->each(function (NewsItem $newsItem) use ($sitemap) {
    $sitemap->add(Url::create("/news/{$newsItem->slug}"));
});

// Add one link per project.
Project::all()->each(function (Project $project) use ($sitemap) {
    $sitemap->add(Url::create("/project/{$project->slug}"));
});

$sitemap->writeToFile(public_path('sitemap.xml'));

That’ll work, but it’s quite verbose. If you add another content type or another loose page like /contact, you mustn’t forget to add it to the sitemap.
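For context, a sitemap is a plain XML file in the sitemaps.org format: a urlset element with one url entry per page. The following is a minimal sketch in plain PHP of that shape. It is not how the package builds its output, and the function name buildSitemapXml and the example URLs are made up for illustration.

```php
<?php

// Hypothetical helper, not part of laravel-sitemap: builds a minimal
// sitemap document by hand from a list of absolute URLs.
function buildSitemapXml(array $urls): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

    foreach ($urls as $url) {
        // Escape characters such as & that may not appear raw in XML.
        $loc = htmlspecialchars($url, ENT_XML1 | ENT_QUOTES, 'UTF-8');
        $xml .= "    <url><loc>{$loc}</loc></url>\n";
    }

    return $xml . '</urlset>' . "\n";
}

// Example usage with made-up URLs.
echo buildSitemapXml([
    'https://example.com/',
    'https://example.com/contact',
]);
```

Every page you want listed needs its own entry, which is exactly the bookkeeping that becomes tedious when done by hand.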

Generating a sitemap

To avoid having to manually add links to a sitemap, the package includes a SitemapGenerator. This class can automatically crawl your site and put all the links it discovers in a sitemap.

Using a SitemapGenerator all of the code from the previous example can be replaced by this:

use Spatie\Sitemap\SitemapGenerator;

SitemapGenerator::create('https://example.com')->writeToFile(public_path('sitemap.xml'));
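Conceptually, a crawler fetches a page, collects the links it contains, and repeats that for every internal link it hasn’t seen yet. The sketch below shows only the link-collection step in plain PHP. It is an illustration of the idea, not the package’s actual implementation; the function name extractInternalLinks is hypothetical, and it assumes the base URL is the site root without a path.

```php
<?php

// Hypothetical helper, not part of laravel-sitemap: returns the unique
// internal links found in one HTML page. Assumes $baseUrl is the site
// root, for example "https://example.com".
function extractInternalLinks(string $html, string $baseUrl): array
{
    if (trim($html) === '') {
        return [];
    }

    $host = parse_url($baseUrl, PHP_URL_HOST);

    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));

        // Turn root-relative links ("/contact") into absolute ones.
        if (strncmp($href, '/', 1) === 0 && strncmp($href, '//', 2) !== 0) {
            $href = rtrim($baseUrl, '/') . $href;
        }

        // Keep only absolute links on the same host. Fragments, mailto:
        // links and document-relative links have no host and are skipped.
        if (parse_url($href, PHP_URL_HOST) === $host) {
            $links[$href] = true;
        }
    }

    return array_keys($links);
}

// Example usage with made-up markup.
print_r(extractInternalLinks(
    '<a href="/contact">Contact</a> <a href="https://other.test/">Elsewhere</a>',
    'https://example.com'
));
```

This also shows why the approach only works when every page is linked from somewhere: a page that no other page links to is never discovered.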

You can easily create an artisan command that generates the sitemap and schedule it to run frequently. This ensures that new pages and content types are picked up automatically. Here’s what such a command could look like:

namespace App\Console\Commands;

use Illuminate\Console\Command;
use Spatie\Sitemap\SitemapGenerator;

class GenerateSitemap extends Command
{
    /**
     * The console command name.
     *
     * @var string
     */
    protected $signature = 'sitemap:generate';

    /**
     * The console command description.
     *
     * @var string
     */
    protected $description = 'Generate the sitemap.';

    /**
     * Execute the console command.
     *
     * @return mixed
     */
    public function handle()
    {
        // modify this to your own needs
        SitemapGenerator::create(config('app.url'))
            ->writeToFile(public_path('sitemap.xml'));
    }
}

It can be scheduled in the console kernel to be run daily.

// app/Console/Kernel.php
protected function schedule(Schedule $schedule)
{
    // ...
    $schedule->command('sitemap:generate')->daily();
    // ...
}

The best of both worlds

You can also combine the two approaches by manually adding links to a generated sitemap. Here’s an example of how to do that:

SitemapGenerator::create('https://example.com')
    ->getSitemap()
    ->add(Url::create('/extra-page'))
    ->add(...)
    ...
    ->writeToFile($path);

Limitations

Our package is targeted at small to medium-sized apps. According to the specification a sitemap can hold up to 50 000 items (if you have more links you’ll need a sitemap index). There are also specific link types for videos, images, etc. The package currently does not support sitemap indexes or these other link types because we haven’t needed them in any of our projects. I’d accept a PR that adds these things to our package.
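To give an idea of what a sitemap index involves, here is a minimal sketch in plain PHP that splits a large list of URLs into chunks of at most 50 000 and builds an index pointing at one sitemap file per chunk. This is not something the package does; the helper names, the made-up product URLs and the sitemap-1.xml file naming are all assumptions for the sake of the example.

```php
<?php

// Limit taken from the sitemap protocol: at most 50,000 URLs per file.
const MAX_URLS_PER_SITEMAP = 50000;

// Hypothetical helper: splits a list of URLs into sitemap-sized chunks.
function chunkUrls(array $urls, int $size = MAX_URLS_PER_SITEMAP): array
{
    return array_chunk($urls, $size);
}

// Hypothetical helper: builds the index document that points at each
// individual sitemap file.
function buildSitemapIndexXml(array $sitemapUrls): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

    foreach ($sitemapUrls as $url) {
        $loc = htmlspecialchars($url, ENT_XML1 | ENT_QUOTES, 'UTF-8');
        $xml .= "    <sitemap><loc>{$loc}</loc></sitemap>\n";
    }

    return $xml . '</sitemapindex>' . "\n";
}

// Example: 120,000 made-up product URLs are spread over several files.
$urls = array_map(
    fn (int $i) => "https://example.com/product/{$i}",
    range(1, 120000)
);
$chunks = chunkUrls($urls);

$sitemapUrls = [];
foreach (array_keys($chunks) as $index) {
    // The file naming scheme is an assumption for this example.
    $sitemapUrls[] = 'https://example.com/sitemap-' . ($index + 1) . '.xml';
}

echo buildSitemapIndexXml($sitemapUrls);
```

Each chunk would then be written to its own file as a regular sitemap, and the index is what you would submit to the search engines.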

Here are some alternatives that already support these features (but they don’t include the crawler from our package)

Further reading

If you want to know more about sitemaps in general, take a look at these posts (provided by my colleague Jef)

In conclusion

If you need a sitemap for your small to medium-sized app, laravel-sitemap can probably help you. Take a look at the package on GitHub to learn about the features not covered in this blog post:
- customizing the properties of a link in the sitemap
- leaving out some links
- preventing the crawler from crawling parts of your site

Be sure to also take a look at the list of Laravel packages we’ve previously made.