Elasticsurf is not a Made Up Word

Sam Tarakajian
Published in Barnes Foundation
5 min read · May 11, 2017

I want you to imagine something. Imagine you’re out on the ocean, surfing. You catch a big, beautiful wave, and then, riding it to shore, with the sunlight glinting gold off the water and the taste of sea salt in the air, you wonder, “What would the exact opposite of this experience be?”

Here’s a hint

Remember, we’re working with the Barnes to bring the experience of visiting their collection in person to the browser. To do that, we want the collection to feel less like a spreadsheet and more like a big, smooth ocean. We want to help users to get into a state of art flow—picking a direction, surfing through the collection, finding new things without having to think too much about how the data is organized. In other words, a painless search. A fun search. Elasticsearch.

In a previous post, I talked quite a bit about why TMS wasn’t that great for our purposes. But even assuming we’re all in agreement about that, what makes Elasticsearch so great? We think there are four main reasons.

I’m in Love with the Shape of our Data

Maybe you’ve worked with a Relational Database Management System (RDBMS) before, something like MySQL. One of the challenges with a system like this is adding new data, especially when the shape of the data changes from one iteration to the next. Suppose we’d already created a collection database in which every item has an id, a name, and a description.

But now we decide that we want to add a new item to the database, and this one has not only an id, name, and description, but also a maker.

<sarcasm>Great.</sarcasm>

Now we need to add a new column (what’s the syntax for that again?), pick a default value for that column (should it be null? an empty string? how many characters should it be?) and then finally we can insert our new entry into the collection.
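To make that concrete, here is a minimal sketch of the whole dance. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name "collection" and the sample rows are made up for illustration. Only the column names (id, name, description, maker) come from the scenario above.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the table name "collection"
# and the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collection ("
    "id INTEGER PRIMARY KEY, name VARCHAR(255), description TEXT)"
)
conn.execute(
    "INSERT INTO collection (id, name, description) VALUES (?, ?, ?)",
    (1, "Untitled", "A still life."),
)

# The new item has a maker, so the schema has to change before the insert.
conn.execute("ALTER TABLE collection ADD COLUMN maker VARCHAR(255) DEFAULT NULL")
conn.execute(
    "INSERT INTO collection (id, name, description, maker) VALUES (?, ?, ?, ?)",
    (2, "Portrait", "A bearded man.", "Unknown maker"),
)

# Older rows get the default (NULL) for the new column.
rows = conn.execute("SELECT id, name, maker FROM collection ORDER BY id").fetchall()
print(rows)
```

Three statements and two decisions (column type, default value) just to record one extra fact about one object.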

The way I see it, the problem is that we have to take our data, which is already nice, structured JSON, and translate it into SQL queries in order to update the database. Allow me to contrast this with how you add new data to an Elasticsearch index (“index” means “database” in Elasticsearch).
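Here is a rough sketch of what that looks like from Python, using only the standard library. The host, the index name "collection", and the document itself are assumptions for illustration. The `/_doc/` path is what recent Elasticsearch versions use; older versions put a type name in that position instead. The request is built but not sent, so the snippet runs without a server.

```python
import json
import urllib.request

# Hypothetical values: the host, the index name "collection", and this
# document are illustrative, not taken from the Barnes data.
ES_URL = "http://localhost:9200"
doc = {
    "id": 2,
    "name": "Portrait",
    "description": "A bearded man.",
    "maker": "Unknown maker",  # a new field needs no schema change first
}


def build_index_request(index, doc_id, document):
    """Build (but do not send) a PUT request that stores one JSON document."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_index_request("collection", doc["id"], doc)
print(request.get_method(), request.full_url)
# With a server running, urllib.request.urlopen(request) would send it.
```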

That’s it. No worrying about types or varchar lengths, primary keys or indexes, or whether anything is in the store already. Like many people, we may not know right from the start exactly how our data should be structured. As we work, typically we make a guess, find a bug in our prototype and then update our model. Having to organize a data migration every single time would be at best a headache and at worst a massive time-suck.

It’s Fast

We’re interested in two basic things here: how fast can we build a new index from our CSV data files, and how fast can we execute queries? Well, we’ve got about 5000 rows of data, and the whole thing takes about 5 seconds to import into Elasticsearch. Mind you, that includes tokenizing and stemming all the text (more on that in a second) and building an inverted index on all our words.
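For the curious, an import like this can be sketched as a conversion from CSV rows to the newline-delimited JSON that Elasticsearch's `_bulk` endpoint accepts. This is not our actual import script: the sample CSV, the column names, and the index name "collection" are all assumptions.

```python
import csv
import io
import json

# Made-up sample standing in for the real CSV export.
CSV_TEXT = """id,name,description
1,Untitled,A still life.
2,Portrait,A bearded man.
"""


def csv_to_bulk_body(csv_text, index):
    """Turn CSV rows into the newline-delimited JSON the _bulk endpoint expects."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Each document is preceded by an action line naming its target.
        lines.append(json.dumps({"index": {"_index": index, "_id": row["id"]}}))
        lines.append(json.dumps(row))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = csv_to_bulk_body(CSV_TEXT, "collection")
print(bulk_body)
```

The resulting string would be sent in a single POST to `/_bulk` with the `application/x-ndjson` content type, which is a large part of why importing a few thousand rows takes seconds rather than minutes.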

As for queries, anecdotally I can tell you they’re really fast too. Of course, 5000 rows is quite small for a database, so perhaps much larger collections would have to employ some optimization.

For more on how fast Elasticsearch is, check out more writing on the subject.

It Scales

Elasticsearch has been, in my personal experience, almost too eager to scale up. The first time I started Elasticsearch I accidentally split my data across 5 shards (Elasticsearch distributes its data across stores called “shards”), each of which had its own backup shard, or “replica.” Not particularly useful when I’m just trying to run search on my laptop, but it does mean that wherever we choose to deploy Elasticsearch, it’s going to be able to scale up to whatever size we need.
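If you would rather not shard a laptop-sized index, the shard and replica counts can be set when the index is created. The sketch below only builds the request body; the values are my guess at a sensible single-machine setup, not something we have tuned.

```python
import json

# Body for creating an index (sent with PUT /<index-name>).
# number_of_shards can only be chosen at creation time;
# number_of_replicas can be changed later.
index_settings = {
    "settings": {
        "number_of_shards": 1,    # one primary shard is plenty for ~5000 docs
        "number_of_replicas": 0,  # no backup copies on a single machine
    }
}

print(json.dumps(index_settings, indent=2))
```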

Full-Text Search

I like beards. Who doesn’t?

Portrait of a Man with Beard and Ruff Collar. Judith Leyster

Say you wanted to find all of the paintings in the Barnes collection that contained a beard. Luckily for you, each object in the collection has a description. So you could quite easily fire off a query like this one:
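A minimal version of such a query, written as the JSON body you would POST to the index's `_search` endpoint, might look like this. The field name "description" is an assumption about how the objects are stored.

```python
import json

# A basic full-text query: find documents whose description
# contains the term "beard".
search_body = {
    "query": {
        "match": {
            "description": "beard"
        }
    }
}

print(json.dumps(search_body, indent=2))
```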

There’s just one problem: what if a description includes the word beards? Or bearded? Or beardelicious? Unsurprisingly, a search for descriptions containing the exact word “beard” will miss all of these. Luckily, it’s extremely easy to configure Elasticsearch to create a field for us that is stemmed, meaning Elasticsearch will do the work of converting the word “bearded” to its stem, “beard,” before storing the word. This is done by creating an Elasticsearch mapping. First, you need to create an analyzer that Elasticsearch can run on the text before indexing it.
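As a sketch, an analyzer like that can be declared in the index settings roughly as follows. The names "english_stemmer" and "stemmed_english" are labels I made up; the "stemmer" filter type, the "standard" tokenizer, and the "lowercase" filter are built into Elasticsearch.

```python
import json

# Index settings defining a custom analyzer. "english_stemmer" and
# "stemmed_english" are hypothetical names chosen for this example.
analysis_settings = {
    "settings": {
        "analysis": {
            "filter": {
                # Built-in stemmer token filter, configured for English.
                "english_stemmer": {"type": "stemmer", "language": "english"}
            },
            "analyzer": {
                "stemmed_english": {
                    "type": "custom",
                    "tokenizer": "standard",  # split text into words
                    "filter": ["lowercase", "english_stemmer"],
                }
            },
        }
    }
}

print(json.dumps(analysis_settings, indent=2))
```

Filters run in the order listed, so words are lowercased before they are stemmed.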

Later, in the mapping for the field you’d like stemmed and tokenized, you apply this analyzer.
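One way to do that, sketched below, is a multi-field: the description is indexed once as plain text and once more under a stemmed sub-field. This assumes a custom analyzer named "stemmed_english" (a hypothetical name) has been defined in the index settings, and it uses the mapping layout of recent Elasticsearch versions; older versions nest "properties" under a type name.

```python
import json

# Mapping that keeps both an unstemmed and a stemmed copy of the text.
# "stemmed_english" is a hypothetical analyzer name; the sub-field name
# "stemmed" is likewise just a label.
mapping = {
    "mappings": {
        "properties": {
            "description": {
                "type": "text",  # unstemmed, default analysis
                "fields": {
                    "stemmed": {
                        "type": "text",
                        "analyzer": "stemmed_english",
                    }
                },
            }
        }
    }
}

print(json.dumps(mapping, indent=2))
```

The stemmed copy is then addressable in queries as "description.stemmed".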

With this mapping in place, the Elasticsearch server will automatically analyze the description of each object, extracting the stem of each word according to the English language stemmer. By the way, this stemmer wasn’t something we wrote or installed or even configured. It comes prepackaged with Elasticsearch, along with stemmers for French, German, Armenian and a whole bunch of other languages.

As a result, we can search for the string beard and return results that contain the string bearded. In the case that the original search query was the string bearded, we can even search for the stem beard but prioritize the full string bearded, since we’ve stored both the stemmed as well as the unstemmed description:
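A query along those lines could be sketched as a bool query with two "should" clauses, one against each copy of the text. The field names "description" and "description.stemmed" and the boost value of 2.0 are assumptions for illustration, not our production settings.

```python
import json


def build_beard_query(term):
    """Match on the stemmed field, but score exact-form matches higher."""
    return {
        "query": {
            "bool": {
                # With no "must" clause, at least one "should" has to match.
                "should": [
                    # Analyzed with the stemmer, so "bearded" matches "beard".
                    {"match": {"description.stemmed": {"query": term}}},
                    # Unstemmed copy: only the literal word matches, and it
                    # counts for more in the relevance score.
                    {"match": {"description": {"query": term, "boost": 2.0}}},
                ]
            }
        }
    }


query = build_beard_query("bearded")
print(json.dumps(query, indent=2))
```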

Using this query, I was able to find this highly bearded painting.

The Postman (Joseph-Étienne Roulin). 1889. Oil on canvas. 25 7/8 x 21 3/4 in. (65.7 x 55.2 cm). BF37

In short, Elasticsearch is making it easy and, dare I say, fun, to play with the data in the Barnes collection. Analyzing, comparing and discovering relationships in our data is feeling less like staring at a spreadsheet and more like surfing. Surfing on an ocean of art.

If you’re curious, why not try setting up Elasticsearch yourself? Next time we’ll talk about Kibana, and some of the fun stuff that we’re doing with our data on the back end.

The Barnes Foundation collection online project is funded by the Knight Foundation and our code is open source. Follow the Barnes Foundation on Medium.
