More about search as a service

Sam Dutton
Feb 9, 2018


This article is a companion to How to add full text search to your website.

It’s an overview from the perspective of a front-end developer, and is in no way comprehensive!

If you’re running a busy site with limited server resources (or not enough sys admins) you might want to use a managed search engine service rather than install and maintain your own.

Many application platforms and database services, such as Firebase and Cloudant, also provide add-on search functionality.

Algolia and Amazon CloudSearch are described below.

Algolia

Algolia is a closed-source search engine available as a managed service, implemented in C++, running as an nginx module. Algolia can also be used to add search capability to Firebase and other databases, CMS and e-commerce platforms.

Algolia is currently available as a free trial (without requiring a credit card!). The getting-started guides, API documentation and tutorials are excellent.

The company claims their engine makes successful ranking easier to configure, and returns results much faster than Lucene-based search engines. (One commentator suggests an overall 10–20x improvement.) In terms of general functionality, Algolia provides features similar to other engines, such as spellcheck and result highlighting. Algolia point out that their engine ‘is designed to index structured data, not large blocks of text’, and that other search engines are better suited to unstructured data.

API clients are provided for JavaScript and other languages and platforms, including Android and iOS, along with UI/UX components and ready-made widgets. Other types of integration are provided for a number of frameworks and libraries, as well as CMS, e-commerce and database platforms.

As with other search engines, either a REST API or a GUI dashboard can be used to upload data and search.

Data records (described as ‘objects’) are uploaded in JSON format, without requiring any extra boilerplate. Algolia give the following example:

[
{
"name": "Monica Bellucci",
"alternative_name": "Monica Anna Maria Bellucci",
"rating": 3956,
"image_path": "/z3sLuRKP7hQVr.jpg"
},
{
"name": "Sean Connery",
"alternative_name": "Sir Sean Connery",
"rating": 746,
"image_path": "/ce84udJZ9QRSR44jxwK2apM3DM8.jpg"
},
{
"name": "Will Smith",
"alternative_name": null,
"rating": 492,
"image_path": "/2iYXDlCvLyVO49louRyDDXagZ0G.jpg"
},
{…}
]

Search results are returned in the same format, but with an objectID value:

{
"name": "Monica Bellucci",
"alternative_name": "Monica Anna Maria Bellucci",
"rating": 3956,
"image_path": "/z3sLuRKP7hQVrvSTsqdLjGSldwG.jpg",
"objectID": "5"
}

‘Searchable attributes’ can be defined and prioritised via the dashboard or REST API. For example, it probably makes sense in this example to prioritise name over alternative_name and not to search image_path. It’s also possible to specify fields to influence ranking — in this example, the rating field can be used to prioritise results (rather than, for example, alphabetical name order).
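As a rough sketch, the prioritisation described above could be expressed as a settings object like the one below. The `searchableAttributes` and `customRanking` setting names are the ones passed to `index.setSettings()` later in this article, and the attribute names come from the actor example. The helper function name is hypothetical.

```javascript
// Hypothetical helper: builds index settings for the actor records above.
function buildActorIndexSettings() {
  return {
    // Listed in priority order: matches in `name` rank above matches in
    // `alternative_name`. `image_path` is left out, so it is not searched.
    searchableAttributes: ['name', 'alternative_name'],
    // Tie-breaker between otherwise equal matches: higher rating first,
    // rather than (for example) alphabetical name order.
    customRanking: ['desc(rating)'],
  };
}

console.log(JSON.stringify(buildActorIndexSettings(), null, 2));
```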

If required, Algolia can return results in HTML format rather than raw JSON, using a variety of templates. For example, a ‘detailed’ template response could look like this:

<li class="results">
  <img src="https://image.tmdb.org/t/p/w154/{{ hit.image_path }}" />
  <h3>{{{ hit._highlightResult.name.value }}}</h3>
  {{ hit.alternative_name }}
</li>
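The template uses Mustache-style placeholders: double braces HTML-escape a value, whereas triple braces insert it as raw HTML, which is why the highlighted name keeps its markup. To make that difference concrete, here is a minimal sketch of the same rendering in plain JavaScript. The `renderHit` and `escapeHtml` helpers are hypothetical, and the sample hit (including the `<em>` highlight tags) is assumed data for illustration.

```javascript
const HTML_ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };

// Equivalent of {{ value }}: escape HTML special characters.
// Missing or null values render as an empty string.
function escapeHtml(value) {
  return String(value ?? '').replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Hypothetical renderer mirroring the 'detailed' template above.
function renderHit(hit) {
  return [
    '<li class="results">',
    `  <img src="https://image.tmdb.org/t/p/w154/${escapeHtml(hit.image_path)}" />`,
    // Equivalent of {{{ value }}}: inserted unescaped so highlight tags survive.
    `  <h3>${hit._highlightResult.name.value}</h3>`,
    `  ${escapeHtml(hit.alternative_name)}`,
    '</li>',
  ].join('\n');
}

// Assumed sample data, shaped like a search hit with highlighting.
const sampleHit = {
  name: 'Monica Bellucci',
  alternative_name: 'Monica Anna Maria Bellucci',
  image_path: '/z3sLuRKP7hQVrvSTsqdLjGSldwG.jpg',
  _highlightResult: { name: { value: '<em>Monica</em> Bellucci' } },
};

console.log(renderHit(sampleHit));
```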

There is a well-written tutorial explaining how to use the Algolia APIs with JavaScript and other languages and platforms.

You can run the JavaScript port from Node:

npm install algoliasearch --save

Or from a client app:

<script src="https://cdn.jsdelivr.net/algoliasearch/3/algoliasearch.min.js"></script>

A search-only client is also available, which weighs in at around 50KB:

<script src="https://cdn.jsdelivr.net/algoliasearch/3/algoliasearchLite.min.js"></script>

Initialize the client:

var client = algoliasearch('YourApplicationID', 'YourAPIKey');
var index = client.initIndex('your_index_name');

Push data (this example uses a callback):

var index = client.initIndex('contacts');
var contactsJSON = require('./contacts.json');
index.addObjects(contactsJSON, function(err, content) {
  if (err) {
    console.error(err);
  }
});

Make configuration changes:

index.setSettings({
  'customRanking': ['desc(followers)']
}, function(err, content) {
  console.log(content);
});
index.setSettings({
  'searchableAttributes': [
    'lastname',
    'firstname',
    'company',
    'email',
    'city',
    'address'
  ]
}, function(err, content) {
  console.log(content);
});

And search!

index.search('jimmie', function(err, content) {
  console.log(content.hits);
});
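The calls above all use Node-style callbacks that receive `(err, content)`. If you prefer Promises, a generic wrapper is one option. This is a minimal sketch: `searchAsync` and `fakeIndex` are hypothetical names, and the stub stands in for a real index so the snippet runs without an Algolia account.

```javascript
// Wrap a callback-style search(query, callback) method in a Promise.
function searchAsync(index, query) {
  return new Promise((resolve, reject) => {
    index.search(query, (err, content) => {
      if (err) {
        reject(err);
      } else {
        resolve(content.hits);
      }
    });
  });
}

// Stub with the same method shape as the `index` object used above.
// The record is made-up sample data.
const fakeIndex = {
  search(query, callback) {
    const records = [{ firstname: 'Jimmie', lastname: 'Barninger' }];
    const hits = records.filter((record) =>
      record.firstname.toLowerCase().includes(query.toLowerCase())
    );
    callback(null, { hits });
  },
};

searchAsync(fakeIndex, 'jimmie')
  .then((hits) => console.log(hits))
  .catch((err) => console.error(err));
```

Unlike the callback version, errors here surface through `.catch()` instead of being silently ignored.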

InstantSearch.js provides a library of UI widgets including components to provide a search bar, display results, sort, select filters and do pagination. React, Vue and other frameworks and libraries are also supported.

Find out more

Amazon CloudSearch

Amazon CloudSearch is based on Solr, whereas the separate Amazon Elasticsearch Service is built on Elasticsearch. A free tier is available, but you will need to sign up to Amazon Web Services with a credit card.

Create and configure a search domain

An Amazon search domain is a collection of related documents: for example, customer data or a product catalog. This corresponds to an index in Elasticsearch or a collection in Solr. Each domain has a unique URL endpoint.

The CloudSearch Developer Guide provides default configuration settings, along with sample data (5000 listings from IMDb). Setting up a search domain is a four-step process:

  1. Configure indexing options and access policies using an online wizard GUI.
  2. Initialize resources for a domain — this takes around ten minutes.
  3. Upload documents.
  4. Test search from the Amazon CloudSearch console.

You can then add Suggesters for tasks such as auto-completion and fuzzy matching.

Making changes to the index (such as adding a Suggester) can take several minutes, but users can continue to make searches using the old configuration while changes are being processed.

Send search requests to your domain

Each search domain corresponds to an endpoint:

search-foo-uvlaoh4rkf7gxj6maogmqmce2i.eu-west-1.cloudsearch.amazonaws.com

Searches are done via HTTP request to the endpoint, adding an API version and query string. By default results are returned as JSON:

search-foo-uvlaoh4rkf7gxj6maogmqmce2i.eu-west-1.cloudsearch.amazonaws.com/2013-01-01/search?q=foo
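From JavaScript, the same request could be assembled with the standard `URL` API. This is a sketch under a few assumptions: the `https` scheme, a runtime with a global `fetch` (such as Node 18+), and an access policy that allows the request. The function names are hypothetical, and the `hits.hit` path follows the response format shown later in this article.

```javascript
const API_VERSION = '2013-01-01';

// Build a search URL: endpoint + API version + query string.
function buildSearchUrl(searchEndpoint, query) {
  const url = new URL(`https://${searchEndpoint}/${API_VERSION}/search`);
  url.searchParams.set('q', query); // percent-encodes the query for us
  return url.toString();
}

// Send the request and return the array of hits from the JSON response.
async function runSearch(searchEndpoint, query) {
  const response = await fetch(buildSearchUrl(searchEndpoint, query));
  if (!response.ok) {
    throw new Error(`Search failed with HTTP ${response.status}`);
  }
  const body = await response.json();
  return body.hits.hit;
}

const endpoint = 'search-foo-uvlaoh4rkf7gxj6maogmqmce2i.eu-west-1.cloudsearch.amazonaws.com';
console.log(buildSearchUrl(endpoint, 'foo'));
// https://search-foo-uvlaoh4rkf7gxj6maogmqmce2i.eu-west-1.cloudsearch.amazonaws.com/2013-01-01/search?q=foo
```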

Amazon provides a structured query syntax for more complex requests, for example to combine sorting with numeric ranges:

/search?q=(and genres:'Sci-Fi' year:{,2000])&q.parser=structured&return=title,year&sort=title asc
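Written by hand, a query like this is easy to get wrong, because the spaces, quotes and brackets all need URL encoding. The sketch below builds the same expression from small helpers and lets `URLSearchParams` do the encoding. All helper names are hypothetical, and the escaping of quotes and backslashes inside string values is my reading of the structured syntax rather than something stated in this article.

```javascript
// field:'value' term, with backslashes and single quotes escaped (assumed rule).
function term(field, value) {
  const escaped = String(value).replace(/\\/g, '\\\\').replace(/'/g, "\\'");
  return `${field}:'${escaped}'`;
}

// Range with no lower bound and an inclusive upper bound, e.g. year:{,2000]
function atMost(field, max) {
  return `${field}:{,${max}]`;
}

// Combine expressions so that all of them must match.
function and(...expressions) {
  return `(and ${expressions.join(' ')})`;
}

// Build the path and query string for a structured search.
function buildStructuredSearchPath({ query, returnFields, sort }) {
  const params = new URLSearchParams({
    q: query,
    'q.parser': 'structured',
    return: returnFields.join(','),
    sort,
  });
  return `/search?${params}`;
}

const query = and(term('genres', 'Sci-Fi'), atMost('year', 2000));
console.log(query); // (and genres:'Sci-Fi' year:{,2000])

console.log(
  buildStructuredSearchPath({ query, returnFields: ['title', 'year'], sort: 'title asc' })
);
```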

Upload data

You create document batch files, with a maximum size of 5MB, then send them to your endpoint. As described in the CloudSearch documentation, each batch file looks like the following example, which adds one document and deletes another:

[
{"type": "add",
"id": "tt0484562",
"fields": {
"title": "The Seeker: The Dark Is Rising",
"directors": "Cunningham, David L.",
"genres": ["Adventure","Drama","Fantasy","Thriller"],
"actors": ["McShane, Ian","Eccleston, Christopher" ...]
}
},
{"type": "delete",
"id": "tt0484575"
}
]

Note the constraints here:

  • Batch files can be no more than 5MB in size.
  • You can’t just upload ‘vanilla’ JSON data: you need to follow the batch file format.
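Both constraints can be handled with a small conversion step before upload. The following is a minimal sketch, not an official tool: it wraps plain records in the `add` operation format shown above and splits the result into batches that stay under the size limit. The function names are hypothetical, and treating 5MB as 5 × 1024 × 1024 bytes is an assumption.

```javascript
const MAX_BATCH_BYTES = 5 * 1024 * 1024; // assumed interpretation of "5MB"
const encoder = new TextEncoder();
const byteLength = (text) => encoder.encode(text).length;

// Turn a plain record into an "add" operation, using one field as the ID.
function toAddOperation(record, idField) {
  const { [idField]: id, ...fields } = record;
  return { type: 'add', id: String(id), fields };
}

// Greedily pack operations into batches whose serialized JSON size
// (as produced by JSON.stringify) does not exceed maxBytes.
function splitIntoBatches(operations, maxBytes = MAX_BATCH_BYTES) {
  const batches = [];
  let current = [];
  let currentBytes = 2; // the enclosing "[" and "]"
  for (const operation of operations) {
    const operationBytes = byteLength(JSON.stringify(operation));
    if (operationBytes + 2 > maxBytes) {
      throw new Error(`Operation ${operation.id} is too large for a single batch`);
    }
    // One extra byte for the comma when the batch already has entries.
    if (current.length > 0 && currentBytes + 1 + operationBytes > maxBytes) {
      batches.push(current);
      current = [];
      currentBytes = 2;
    }
    currentBytes += (current.length > 0 ? 1 : 0) + operationBytes;
    current.push(operation);
  }
  if (current.length > 0) {
    batches.push(current);
  }
  return batches;
}

// Sample records; only the IDs and titles are taken from this article.
const records = [
  { id: 'tt0484562', title: 'The Seeker: The Dark Is Rising' },
  { id: 'tt0369226', title: 'Alone in the Dark' },
];
const operations = records.map((record) => toAddOperation(record, 'id'));
const batches = splitIntoBatches(operations);
console.log(batches.length, JSON.stringify(batches[0]));
```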

You can submit a document batch via the Amazon CloudSearch console, by making a POST request to the endpoint, or by using the AWS command line interface (CLI).

Use AWS from the command line

First you need to install the AWS CLI — the easiest way to do that is to follow the Bundled Install instructions.

You can then upload document batches using the aws tool:

aws cloudsearchdomain --endpoint-url \
  http://doc-movies-y6gelr4lv3jeu4rvoelunxsl2e.us-east-1.cloudsearch.amazonaws.com \
  upload-documents --content-type application/json \
  --documents movie-data-2013.json

Alternatively, you can simply post batches to documents/batch:

curl -X POST --upload-file movie-data-2013.json doc-movies-123456789012.us-east-1.cloudsearch.amazonaws.com/2013-01-01/documents/batch --header "Content-Type:application/json"
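The same POST can be made from JavaScript. This is a sketch only, assuming a runtime with a global `fetch` (such as Node 18+), the `https` scheme, and an access policy that permits the upload. The function names are hypothetical, and the shape of the response body is not covered here.

```javascript
// Build the URL and fetch options for a documents/batch upload.
function buildUploadRequest(docEndpoint, batch) {
  return {
    url: `https://${docEndpoint}/2013-01-01/documents/batch`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(batch),
    },
  };
}

// Send one batch and return the parsed JSON response.
async function uploadBatch(docEndpoint, batch) {
  const { url, options } = buildUploadRequest(docEndpoint, batch);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Upload failed with HTTP ${response.status}`);
  }
  return response.json();
}

// Inspect the request without sending it.
const request = buildUploadRequest(
  'doc-movies-123456789012.us-east-1.cloudsearch.amazonaws.com',
  [{ type: 'delete', id: 'tt0484575' }]
);
console.log(request.url);
console.log(request.options.body);
```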

Search

You can try out a test search here.

Search results look like this — note that query term highlighting is available by default:

{
"status": {
"rid": "+9HBsIws+sYBClMmzQ==",
"time-ms": 36
},
"hits": {
"found": 90,
"start": 0,
"hit": [{
"id": "tt0369226",
"fields": {
"rating": "2.3",
"genres": ["Action", "Horror"],
"plot": "Based on the video game, Alone in the Dark focuses on Edward Carnby, a detective of the paranormal, who slowly unravels a mysterious events with deadly results.",
"release_date": "2005-01-28T00:00:00Z",
"title": "Alone in the Dark",
"rank": "4404",
"running_time_secs": "5760",
"directors": ["Uwe Boll"],
"image_url": "http://ia.media-imdb.com/images/M/MV5BMTIxNDI5ODY2MF5BMl5BanBnXkFtZTcwNzQ1NzcyMQ@@._V1_SX400_.jpg",
"year": "2005",
"actors": ["Christian Slater", "Tara Reid", "Stephen Dorff"],
"_score": "8.61123"
},
"highlights": {
"actors": "Christian Slater Tara Reid Stephen Dorff",
"directors": "Uwe Boll",
"plot": "Based on the video game, Alone in the *#*Dark*%* focuses on Edward Carnby, a detective of the paranormal, who slowly unravels a mysterious events with deadly results.",
"title": "Alone in the *#*Dark*%*"
}
}
...
]
}
}
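In the response above, highlighted terms are wrapped in `*#*` and `*%*` markers, so a client needs to convert them before display. A minimal sketch of one way to do that is shown below: it escapes the text first and then swaps the markers for `<mark>` tags. The helper names and the choice of `<mark>` are my own, and the sample object is a cut-down copy of the response above.

```javascript
const HTML_ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };

function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Escape first, then replace the markers. The markers contain no HTML
// special characters, so escaping leaves them intact.
function highlightToHtml(text) {
  return escapeHtml(text)
    .replace(/\*#\*/g, '<mark>')
    .replace(/\*%\*/g, '</mark>');
}

// Pull the highlighted title out of every hit in a response.
function renderHighlightedTitles(response) {
  return response.hits.hit.map((hit) => highlightToHtml(hit.highlights.title));
}

const sample = {
  hits: {
    found: 90,
    start: 0,
    hit: [{ id: 'tt0369226', highlights: { title: 'Alone in the *#*Dark*%*' } }],
  },
};

console.log(renderHighlightedTitles(sample));
// [ 'Alone in the <mark>Dark</mark>' ]
```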

Find out more


Sam Dutton

I am a Developer Advocate for Google Chrome. I maintain simpl.info: simplest possible examples of HTML, CSS and JavaScript. South Australian, living in London.