JavaScript Crawling — How Enterprise Search Makes Dynamic Sites Like PWAs and SPAs Searchable

SearchBlox Team
Published in SearchBlox · 3 min read · Aug 14, 2020

In the past decade, advances in browser technologies have brought about a new kind of website: the dynamic site.

From small applications to large ones, websites are moving toward dynamic rendering with modern frameworks such as ReactJS, VueJS, and a host of others, as opposed to the static sites that had been the norm since the dawn of the internet.

First, let’s look at the difference between a static site and a dynamically generated one. On a static site, the server sends out all of the website’s content in its final form. On a dynamic site, the server instead sends out JavaScript code that runs in the user’s browser to “dynamically” generate the page. Such sites are often referred to as Single Page Applications, or SPAs.
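To make that difference concrete, here is a minimal sketch (the HTML snippets and page content are invented for illustration) comparing what a server sends for a static page versus an SPA shell, and what a naive text extractor can pull out of each:

```python
from html.parser import HTMLParser

# What a static server sends: the page text is already in the HTML.
STATIC_RESPONSE = """
<html><body>
  <h1>Our Pricing</h1>
  <p>The starter plan costs $10 per month.</p>
</body></html>
"""

# What an SPA server sends: an empty shell plus a script that will
# build the page later, in the browser.
SPA_RESPONSE = """
<html><body>
  <div id="root"></div>
  <script src="/static/app.bundle.js"></script>
</body></html>
"""

class TextExtractor(HTMLParser):
    """Collects the visible text a crawler would want to index."""
    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if not self.in_script and data.strip():
            self.chunks.append(data.strip())

def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)

print(extract_text(STATIC_RESPONSE))  # the pricing copy is all here
print(extract_text(SPA_RESPONSE))     # nothing indexable at all
```

The static response yields the full pricing copy; the SPA shell yields an empty string, because its content only exists after the script runs in a browser.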

These dynamically generated websites present search engines with a serious problem: the bots that search engines use to gather their data are largely ineffective on dynamic sites.

Search engines run their own web scraping bots that crawl a website for data and index it on their servers, making it searchable. These bots generally operate on a strict request-response model: the bot makes a request to a website and expects the server to return the complete content in response.

This method of gathering data does not work on modern dynamically generated sites because, as we have seen, the server sends out JavaScript code that must be run in a browser to produce the final content. The response received by the web scraper is therefore nearly useless to the search engine.
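The strict request-response cycle can be sketched end to end. Here a throwaway local HTTP server (all names and the page content are invented) stands in for an SPA backend, and a one-shot fetch plays the part of the bot:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# A stand-in SPA backend: every page is the same near-empty shell,
# because the real content is assembled by JavaScript in the browser.
SHELL = b"""<html><body>
  <div id="app"></div>
  <script src="/app.js"></script>
</body></html>"""

class SpaHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(SHELL)

    def log_message(self, *args):  # keep the demo quiet
        pass

def crawl_once(url):
    """One request-response cycle: exactly what a classic bot does."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode()

server = HTTPServer(("127.0.0.1", 0), SpaHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

page = crawl_once(f"http://127.0.0.1:{server.server_address[1]}/pricing")
server.shutdown()

# The bot got a valid 200 response, yet there is no content to index:
print("pricing info present?", "plan" in page.lower())
```

The bot did everything right by its own rules, but all it can ever see is the shell; the `/pricing` content it came for never appears in the response.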

In all fairness, most websites are dynamic in some form or other. For example, a site might load a nice font from Google Fonts asynchronously and render it later. In that case, however, all of the site’s information was already present in the server’s response. And search engines don’t really care how a website looks; they are interested in its content.

SearchBlox Dynamic Content Indexing

SearchBlox, an enterprise search server, uses web crawling to index data from your website and make it searchable. With the release of SearchBlox 9.2, it supports indexing dynamic websites. The approach is simple: SearchBlox makes its requests through a Chromium driver, lets the driver build out the webpage, and then indexes the dynamically generated content into its collections, making it searchable. For those wondering, a Chromium driver controls Chromium, the open-source engine behind the world’s most popular web browser, Chrome.

This means that with SearchBlox 9.2 you can index your website, whether static or dynamic (including both Single Page Applications and Progressive Web Apps, or PWAs), import data from a database, or index files from Azure/AWS — satisfying all your search needs with one powerful product!

If you find any of this interesting, we highly recommend grabbing the free version of SearchBlox 9.2, which can index up to 10,000 documents — dynamic websites, databases, and many other sources like those discussed above. Try it out!

We have complete details on indexing dynamic sites in our developer documentation.

