Adding Cloudant search to your static Jekyll blog
Making a static HTML website have dynamic search
I’m a big fan of Jekyll for building static websites. If you’re not familiar with Jekyll, it takes a collection of configuration, templates and source files (I write my posts in Markdown) and transforms them into static HTML files that can be delivered to the world by any web server. Jekyll is built into GitHub Pages so that you can host the source files for your website or blog in a Git repository and have the resultant static website served out by GitHub Pages without having to manage any server infrastructure yourself. As of May 2018, GitHub Pages also supports HTTPS on your custom domains.
Static sites are fast and easy to manage, but without any dynamic server-side components they may leave your users without features they expect, such as search. In a WordPress-style blog, the content is served out from a MySQL database and site search is powered by querying that data set.
On a static site, how can you allow your users to search the titles, tags and content of your blog if the data doesn’t reside in a database and there’s no server-side layer that can render dynamic pages? Here’s how it can be done:
- Create a static website with Jekyll and serve it out on GitHub Pages, or another static site hosting service.
- Write some code to poll the site’s Atom feed. A serverless platform like IBM Cloud Functions can be used to run the code periodically.
- Write the Atom feed metadata into an IBM Cloudant database that has a free-text search index configured.
- Query the Cloudant database directly from the web page whenever a search is to be performed.
Let’s dive into the detail.
Building a blog with Jekyll
There are plenty of guides that show you how to build a Jekyll-powered blog on GitHub Pages, or you can follow the Jekyll documentation’s Quick Start Guide.
Once your blog is set up, make sure it has an Atom feed published at the /feed.xml endpoint. This is powered by the jekyll-feed plugin.
In order to add a search tool to your static blog we’re first going to need a database of blog post meta data. Using Cloudant as the database, we can store one JSON document per blog post like this:
All of this data can be gleaned from the blog’s feed.xml Atom feed with two exceptions:
- the _id field needs to be unique — we can use a hash of the URL of the blog post.
- the _rev field is generated by the database and indicates the revision of the document.
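As a rough sketch of how such a document could be assembled, the helper below derives the _id from a hash of the post URL. The entry shape, the field names other than _id, and the choice of SHA-1 are assumptions made for illustration; the original script may well differ.

```javascript
const crypto = require('crypto');

// Hypothetical helper: turn one parsed Atom entry into a Cloudant document.
// The entry shape ({ url, title, description, tags, date }) is an assumption.
function entryToDoc(entry) {
  // No _rev is set here: the database assigns it when the document is written.
  return {
    // A SHA-1 hex digest of the URL gives a stable, unique _id per post.
    _id: crypto.createHash('sha1').update(entry.url).digest('hex'),
    url: entry.url,
    title: entry.title,
    description: entry.description,
    tags: entry.tags || [],
    date: entry.date
  };
}

const doc = entryToDoc({
  url: 'https://example.com/2018/05/01/red-apples.html',
  title: 'Red apples',
  description: 'A post about red apples',
  tags: ['fruit'],
  date: '2018-05-01T00:00:00Z'
});
console.log(doc._id.length); // 40 hex characters
```

Because the _id depends only on the URL, running the helper twice on the same post always produces the same document id.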
First sign up for a Cloudant service and log into the dashboard. Create a new database called
In that database we need to define a Cloudant Search index to answer free-text queries. Choose New Search Index from the menu next to “Design Documents”:
index for every value that is to be searchable:
The index function takes three parameters:
- The name of the field to be stored in the index e.g.
- The value to be indexed e.g.
- An options object. When store is set to true, a copy of the value is stored unaltered in the index for retrieval at query-time. When index is set to false, the value is not indexed for search, but is reproduced in the search results.
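To make the three parameters concrete, here is a minimal sketch of what an index function could look like. The document fields (title, description, tags, url) are assumptions about the blog post documents, and joining the tags into a single string is just one possible choice. The function is assigned to a constant only so that the snippet parses on its own; in the Cloudant dashboard you would paste the function itself, and Cloudant supplies index at indexing time.

```javascript
// `index` is provided by Cloudant when the function runs inside the database.
const indexFunction = function (doc) {
  if (doc.title) {
    // Searchable, and returned with each search result.
    index('title', doc.title, { store: true });
  }
  if (doc.description) {
    index('description', doc.description, { store: true });
  }
  if (Array.isArray(doc.tags)) {
    // Assumption: tags are an array of strings, indexed as one string.
    index('tags', doc.tags.join(' '), { store: true });
  }
  if (doc.url) {
    // Stored for display in the results, but not searchable.
    index('url', doc.url, { store: true, index: false });
  }
};
```

Guarding each call with a check on the field avoids indexing errors for documents that lack that field.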
CORS and effect
If we want to be able to query our Cloudant database directly from a web page, we need to make two further tweaks to the Cloudant configuration.
Firstly we must enable CORS (Cross-Origin Resource Sharing) in the Cloudant dashboard:
Enabling CORS instructs Cloudant to output the HTTP headers that will allow an in-page web request (sometimes called an AJAX request) to proceed without an error. By default, the rules-of-the-road for the web wouldn’t allow a web page to fetch JSON from a different domain name, and CORS is the work-around.
Secondly, we need to make the database readable. You can either make the database world-readable (grant _reader access to everyone) or create an API key that grants _reader access to our database of blog post metadata. Both options are accessible from the "Permissions" panel in the Cloudant dashboard:
Now our database is created and set up, we need a script to poll the blog’s Atom feed, convert it to JSON and write it to the Cloudant database.
Atom feed poller
We can write a simple Node.js script to fetch the Atom feed using a handful of npm modules:
The code itself then becomes pretty simple:
The main function is passed an object with the following attributes:
- BLOGURL: the URL of the blog’s Atom feed.
- ACCOUNT: the admin username of the Cloudant service.
- PASSWORD: the admin password of the Cloudant service.
- DBNAME: the name of the Cloudant database to write to.
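The sketch below shows one way these attributes could be turned into a single bulk write against Cloudant’s _bulk_docs endpoint. It is not the original poller: it assumes the usual https://ACCOUNT.cloudant.com host name, HTTP Basic authentication, and a runtime with a global fetch (Node.js 18 or later). Feed parsing is left to a hypothetical fetchDocs helper, which is passed in as a second argument purely for illustration.

```javascript
// Build the URL and fetch options for one _bulk_docs write.
// `params` carries the attributes listed above; `docs` is an array of documents.
function buildBulkRequest(params, docs) {
  const auth = Buffer.from(`${params.ACCOUNT}:${params.PASSWORD}`).toString('base64');
  return {
    // Assumption: the account is served from <account>.cloudant.com.
    url: `https://${params.ACCOUNT}.cloudant.com/${encodeURIComponent(params.DBNAME)}/_bulk_docs`,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Basic ${auth}`
      },
      body: JSON.stringify({ docs })
    }
  };
}

// Sketch of a main function. fetchDocs is a hypothetical helper that
// downloads params.BLOGURL and resolves to an array of documents.
async function main(params, fetchDocs) {
  const docs = await fetchDocs(params.BLOGURL);
  const { url, options } = buildBulkRequest(params, docs);
  const response = await fetch(url, options);
  return { status: response.status, count: docs.length };
}
```

Keeping the request construction in its own function makes it easy to check the URL and headers without touching the network.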
We can deploy this code to IBM Cloud Functions using the bx wsk tool (substituting your Cloudant account, password and blog URL):
IBM Cloud Functions now has your polling code and invokes it every 15 minutes. The script fetches your blog’s Atom feed, turns it into JSON ready to be inserted into the Cloudant database, and then writes all the records in a single bulk request. It deduplicates the listings by using a hash of each post’s URL as the document id: Cloudant won’t accept two documents with the same _id, so duplicates are rejected.
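A bulk write returns one result per submitted document, and a rejected duplicate shows up with an error of "conflict" rather than failing the whole request. The helper below is a small sketch, not part of the original script, that tallies those outcomes; the sample results array is made up for illustration.

```javascript
// Summarise the array returned by a _bulk_docs request.
function summariseBulkResponse(results) {
  const summary = { written: 0, duplicates: 0, otherErrors: 0 };
  for (const result of results) {
    if (result.ok) {
      summary.written += 1;
    } else if (result.error === 'conflict') {
      // Same _id already exists: the post was stored on an earlier run.
      summary.duplicates += 1;
    } else {
      summary.otherErrors += 1;
    }
  }
  return summary;
}

// Illustrative data only: one new post and one already-known post.
const sampleResults = [
  { ok: true, id: 'aaa', rev: '1-abc' },
  { id: 'bbb', error: 'conflict', reason: 'Document update conflict.' }
];
console.log(summariseBulkResponse(sampleResults));
```

On a typical run most entries in the feed will already be stored, so a high duplicate count is expected rather than a sign of trouble.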
Our database should contain some documents. Let’s see them by querying our database’s _all_docs endpoint:
You should see a handful of documents.
Now we can query the search index we created earlier:
q=*:* matches every indexed record. Each item in the array of rows returned contains a fields object holding every value that was indexed with store: true during the indexing process.
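For illustration, the snippet below flattens a search response into a plain list of posts by lifting each row’s fields object. The sample response is invented, and the stored field names (title, url) are assumptions that depend on how the index was defined.

```javascript
// Turn a Cloudant Search response into a flat array of post objects.
function rowsToPosts(response) {
  return (response.rows || []).map((row) => ({ id: row.id, ...row.fields }));
}

// Illustrative response only; real bookmarks and ids will differ.
const sampleResponse = {
  total_rows: 1,
  bookmark: 'g1AAAA',
  rows: [
    {
      id: 'abc123',
      order: [1.0, 0],
      fields: { title: 'Red apples', url: 'https://example.com/red-apples' }
    }
  ]
};
console.log(rowsToPosts(sampleResponse));
```

Only values indexed with store: true appear in fields, so anything the results page needs to display has to be stored at indexing time.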
Imagine we wish to answer a user query for documents matching the search phrase “red apples”. We can construct a Cloudant Search query to look for “red apples” in the description field:
A better search for this use-case is this:
The above query matches the title, tags and description fields against the query string, but attaches greater weight to title matches than to tags or description matches. This use of the ^ operator weights the search results to bring more relevant documents to the top of the results.
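A weighted query of this kind can be assembled from the user’s phrase with a small helper. The sketch below is one possible approach: the boost values (100, 10 and 1) are arbitrary assumptions chosen for illustration, not values taken from the original query.

```javascript
// Build a Lucene-style query that searches several fields for the same
// phrase, boosting each field with the ^ operator.
function buildQuery(phrase, boosts = { title: 100, tags: 10, description: 1 }) {
  // Escape backslashes and double quotes so the phrase stays one quoted term.
  const quoted = '"' + phrase.replace(/(["\\])/g, '\\$1') + '"';
  return Object.entries(boosts)
    .map(([field, boost]) => `${field}:${quoted}^${boost}`)
    .join(' OR ');
}

console.log(buildQuery('red apples'));
// title:"red apples"^100 OR tags:"red apples"^10 OR description:"red apples"^1
```

The larger the gap between the boosts, the more strongly a title match outranks a match that only occurs in the tags or description.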
We can send this query to Cloudant using
Querying from the front end
The final piece of the puzzle is making the search request from inside a web page. Here there are myriad options:
fetch because it's new and shiny and it couldn't be easier:
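As a hedged sketch of what a fetch-based search call could look like: the base URL below, including the account, database, design document and index names, is a placeholder and not the real configuration. The code assumes CORS is enabled and that the database is readable without credentials.

```javascript
// Placeholder endpoint: substitute your own account, database,
// design document and search index names.
const SEARCH_BASE = 'https://example.cloudant.com/blog/_design/search/_search/posts';

// Build the full search URL, URL-encoding the query string.
function buildSearchUrl(base, queryString, limit = 10) {
  const params = new URLSearchParams({ q: queryString, limit: String(limit) });
  return `${base}?${params.toString()}`;
}

// Run the search and resolve to the array of matching rows.
async function search(queryString) {
  const response = await fetch(buildSearchUrl(SEARCH_BASE, queryString));
  if (!response.ok) {
    throw new Error(`Search failed: HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.rows;
}
```

A call such as search('description:"red apples"') then resolves to the matching rows, each carrying the stored fields for display.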
All that remains is to loop over the returned JSON’s
The Cloudant blog is an example of this technology in action. It is a Jekyll-powered static website whose Atom feed is being polled by an IBM Cloud Functions action and whose search facility is powered by a Cloudant database containing the posts’ metadata.