Full-text search and indexing with Bleve


The problem

The well-known Elasticsearch and Apache Solr, both written in Java, are complete solutions, but I don't want to run the JVM (a.k.a. the memory lover) on my server. Yes, I'm cheap.

The lover

Go. It's an easy-to-learn, simple, robust and performant programming language. I've been using Go for almost everything (I still have some Ruby on Rails/PHP/Python running around).

Google "golang" and meet happiness.


The solution

Bleve: it's a text indexing package for Go. Yes, a package! You don't need an extra service plus a connector/library to get text indexing with scoring, faceting and highlighting in your service.

The package is a mix of pure Go features and wrappers around some C/C++ libraries. In this article, we'll only use the pure Go features. If you need a little more power, go to the Bleve build docs and learn how to enable them.
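To make "text indexing with scoring" a bit more concrete, here is a toy inverted index in plain Go: each term points to the documents that contain it, and a query scores documents by how often its terms appear. This is only an illustration of the idea under simplifying assumptions (naive tokenization, raw term-frequency scores); it is not how Bleve is implemented, and the names InvertedIndex, tokenize and Hit are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// tokenize lowercases text and splits it on every rune that is not a letter or digit.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// InvertedIndex maps each term to the documents that contain it.
type InvertedIndex struct {
	postings map[string]map[string]int // term -> document ID -> term frequency
}

// NewInvertedIndex returns an empty index.
func NewInvertedIndex() *InvertedIndex {
	return &InvertedIndex{postings: make(map[string]map[string]int)}
}

// Add records every term of text under the document ID.
func (idx *InvertedIndex) Add(id, text string) {
	for _, term := range tokenize(text) {
		if idx.postings[term] == nil {
			idx.postings[term] = make(map[string]int)
		}
		idx.postings[term][id]++
	}
}

// Hit is a matching document with a naive score: the summed frequency of the query terms.
type Hit struct {
	ID    string
	Score int
}

// Search returns every document containing at least one query term, best score first.
func (idx *InvertedIndex) Search(query string) []Hit {
	scores := make(map[string]int)
	for _, term := range tokenize(query) {
		for id, tf := range idx.postings[term] {
			scores[id] += tf
		}
	}
	hits := make([]Hit, 0, len(scores))
	for id, score := range scores {
		hits = append(hits, Hit{ID: id, Score: score})
	}
	// Sort by score, then by ID so the order is deterministic.
	sort.Slice(hits, func(i, j int) bool {
		if hits[i].Score != hits[j].Score {
			return hits[i].Score > hits[j].Score
		}
		return hits[i].ID < hits[j].ID
	})
	return hits
}

func main() {
	idx := NewInvertedIndex()
	idx.Add("1", "dotGo 2015, the Go conference in Paris")
	idx.Add("2", "GopherCon 2015: Go, Go, Go")
	fmt.Println(idx.Search("go conference")) // [{2 3} {1 2}]
}
```

Bleve takes care of all of this for you, with real text analysis, relevance scoring and persistence, which is exactly why having it as a package is so convenient.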

To exemplify its use, I'll create an "Event Finder". I don't know your database preferences (I've been living inside MongoDB/Redis databases for the last 2 years), but I know something everyone can run… SQLite!

The objective is to create some events, index them and retrieve some data using Bleve search.


Let’s code!

What? Don’t you have Go on your machine? Shame on you! https://golang.org/doc/install

First, set up our project workspace.

# My Go studies project/folder
mkdir -p ~/Workspace/Go
cd ~/Workspace/Go
# Export GOPATH pointing to the current directory
export GOPATH=`pwd`
# Creating and accessing the project folder
mkdir -p ~/Workspace/Go/src/github.com/nassor/studies-blevesearch
cd ~/Workspace/Go/src/github.com/nassor/studies-blevesearch

Now let's download the packages into our $GOPATH. The purpose here isn't the database implementation, so, to avoid losing too much time on the SQL side, we will use an ORM package called gorm.

go get github.com/blevesearch/bleve/... # bleve package
go get github.com/mattn/go-sqlite3 # sqlite3 package
go get github.com/jinzhu/gorm # orm package

Everything is ready, so let's learn how to create an index connection. We'll write a Bleve function (in a conn package) that creates or connects to the index persistence. Bleve uses BoltDB as its default persistence, but you can choose others.

package conn

import "github.com/blevesearch/bleve"

var bleveIdx bleve.Index

// Bleve opens the index persistence, creating it if it can't be opened.
func Bleve(indexPath string) (bleve.Index, error) {
	// if bleveIdx isn't set yet...
	if bleveIdx == nil {
		var err error
		// try to open the persistence file...
		bleveIdx, err = bleve.Open(indexPath)
		// if it doesn't exist or something goes wrong...
		if err != nil {
			// create a new mapping and a new index
			mapping := bleve.NewIndexMapping()
			bleveIdx, err = bleve.New(indexPath, mapping)
			if err != nil {
				return nil, err
			}
		}
	}

	// return the index
	return bleveIdx, nil
}

Ref: github.com/nassor/studies-blevesearch/conn/bleve.go
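One thing to keep in mind: the package-level bleveIdx is not guarded, so two goroutines calling Bleve() for the first time at the same moment could both try to open or create the index. Below is a minimal sketch, using only the standard library, of an open-or-create helper wrapped in sync.Once. The Index interface, fakeIndex and errNotExist are hypothetical stand-ins for bleve.Index and Bleve's "path does not exist" error, so the example runs without Bleve installed.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Index is a hypothetical stand-in for bleve.Index in this sketch.
type Index interface {
	Name() string
}

// fakeIndex is a hypothetical index implementation used only for the demo.
type fakeIndex struct{ path string }

func (f *fakeIndex) Name() string { return f.path }

// errNotExist is a hypothetical stand-in for "index path does not exist".
var errNotExist = errors.New("index path does not exist")

var (
	once     sync.Once
	sharedIx Index
	openErr  error
	opens    int // how many times initialisation actually ran
)

// openOrCreate opens an existing index and creates one only when it is missing;
// any other open error is returned instead of being hidden.
func openOrCreate(path string, open, create func(string) (Index, error)) (Index, error) {
	ix, err := open(path)
	if errors.Is(err, errNotExist) {
		return create(path)
	}
	return ix, err
}

// Shared returns the process-wide index, initialising it exactly once
// even when many goroutines call it concurrently.
func Shared(path string, open, create func(string) (Index, error)) (Index, error) {
	once.Do(func() {
		opens++
		sharedIx, openErr = openOrCreate(path, open, create)
	})
	return sharedIx, openErr
}

func main() {
	open := func(p string) (Index, error) { return nil, errNotExist }
	create := func(p string) (Index, error) { return &fakeIndex{path: p}, nil }

	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, err := Shared("events.bleve", open, create); err != nil {
				fmt.Println("error:", err)
			}
		}()
	}
	wg.Wait()

	ix, _ := Shared("events.bleve", open, create)
	fmt.Println(ix.Name(), opens) // events.bleve 1
}
```

Two design notes: this version only creates a new index when the path is missing, so other open errors (a locked or corrupted file, for instance) are returned instead of silently replaced by a fresh, empty index; and sync.Once also caches a failed first attempt, so it is not retried. In Bleve, I believe the missing-path case is reported as bleve.ErrorIndexPathDoesNotExist, but check the docs for the version you use.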

Let's move on to our model. It's classic event data:

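package models

import (
	"strconv"
	"time"

	"github.com/blevesearch/bleve"
)

// Event is an event! wow! ;D
type Event struct {
	ID          int
	Name        string
	Description string
	Local       string
	Website     string
	Start       time.Time
	End         time.Time
}

// Index adds the event to the bleve index, using its ID as the document ID.
func (e *Event) Index(index bleve.Index) error {
	// strconv.Itoa gives the decimal digits ("42"); string(e.ID) would
	// produce the single character with code point 42 ("*") instead.
	return index.Index(strconv.Itoa(e.ID), e)
}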

Ref: github.com/nassor/studies-blevesearch/models/event.go

I added a method called Index; it receives a bleve.Index (an interface, not a struct) as a parameter and returns an error. The method is in charge of adding the event to the index.

The bleve.Index.Index() method accepts only a string as the document ID, so the integer ID has to be converted to its decimal string form. Because we are using the default index mapping, all the fields of the Event type will be indexed.
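Why the conversion matters: in Go, converting an integer to a string with string(...) treats the number as a Unicode code point, so a direct string(e.ID) conversion does not produce the digits. The short program below contrasts the two and shows the round trip from a search hit ID back to the integer primary key; docID and rowID are hypothetical helper names, not part of Bleve or the article's repository.

```go
package main

import (
	"fmt"
	"strconv"
)

// docID converts an integer primary key into the string ID given to the index.
func docID(id int) string {
	return strconv.Itoa(id)
}

// rowID converts a search hit ID back into the integer primary key for the SQL lookup.
func rowID(hitID string) (int, error) {
	return strconv.Atoi(hitID)
}

func main() {
	id := 65
	fmt.Printf("%q\n", string(rune(id))) // "A": the int is read as a code point
	fmt.Printf("%q\n", docID(id))        // "65": the decimal digits
	back, err := rowID(docID(id))
	fmt.Println(back, err) // 65 <nil>
}
```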

Below is the code I use to test the indexing:

func TestIndexing(t *testing.T) {
	_, eventList := dbCreate()
	idx := idxCreate()

	err := eventList[0].Index(idx)
	if err != nil {
		t.Error("It wasn't possible to create the index", err, ballotX)
	} else {
		t.Log("Should create an event index", checkMark)
	}

	idxDestroy()
	dbDestroy()
}

Ref: github.com/nassor/studies-blevesearch/models/event_test.go

Simple, isn't it? Retrieving the data takes a few more steps, but not many. :)

func TestFindByAnything(t *testing.T) {
	db, eventList := dbCreate()
	idx := idxCreate()
	indexEvents(idx, eventList)
	// clean up even if a check below stops the test early
	defer dbDestroy()
	defer idxDestroy()

	// Look for an event with any field matching "dotGo"
	query := bleve.NewMatchQuery("dotGo")
	searchRequest := bleve.NewSearchRequest(query)
	searchResult, err := idx.Search(searchRequest)

	if err != nil {
		// stop here: searchResult is nil when the search fails
		t.Fatal("Something went wrong with the search", err, ballotX)
	}
	t.Log("Should search the query", checkMark)

	if searchResult.Total != 1 {
		// stop here: Hits[0] below would panic on an empty result
		t.Fatal("Expected only 1 result, got ", searchResult.Total, ballotX)
	}
	t.Log("Should return only one result", checkMark)

	// Hit IDs are strings; convert back to the integer primary key (needs "strconv" imported)
	id, err := strconv.Atoi(searchResult.Hits[0].ID)
	if err != nil {
		t.Fatal("Unexpected hit ID", searchResult.Hits[0].ID, ballotX)
	}

	event := Event{}
	db.First(&event, id)

	if event.Name != "dotGo 2015" {
		t.Error("Expected \"dotGo 2015\", received: ", event.Name, ballotX)
	} else {
		t.Log("Should return an event with the name equal to", event.Name, checkMark)
	}
}

Ref: github.com/nassor/studies-blevesearch/models/event_test.go
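Why does "dotGo" find "dotGo 2015"? As far as I understand Bleve's defaults, bleve.NewMatchQuery runs the query text through the field's analyzer (the standard analyzer unless the mapping says otherwise: Unicode tokenization, lowercasing and English stop-word removal) and matches documents containing any of the resulting terms. The sketch below approximates that behaviour in plain Go; analyze, matches and the tiny stop-word list are hypothetical simplifications, not Bleve code.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stopWords is a tiny, hypothetical subset of an English stop-word list.
var stopWords = map[string]bool{"the": true, "a": true, "an": true, "in": true, "of": true, "and": true}

// analyze approximates a standard analyzer: split on non-letters/digits,
// lowercase each token and drop stop words.
func analyze(text string) []string {
	var terms []string
	for _, tok := range strings.FieldsFunc(text, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	}) {
		t := strings.ToLower(tok)
		if !stopWords[t] {
			terms = append(terms, t)
		}
	}
	return terms
}

// matches reports whether doc satisfies a match query: with requireAll=false any
// analyzed query term is enough (OR); with requireAll=true every term must appear (AND).
func matches(query, doc string, requireAll bool) bool {
	docTerms := make(map[string]bool)
	for _, t := range analyze(doc) {
		docTerms[t] = true
	}
	queryTerms := analyze(query)
	if len(queryTerms) == 0 {
		return false
	}
	for _, t := range queryTerms {
		if docTerms[t] && !requireAll {
			return true
		}
		if !docTerms[t] && requireAll {
			return false
		}
	}
	return requireAll
}

func main() {
	doc := "dotGo 2015 in Paris"
	fmt.Println(analyze(doc))                        // [dotgo 2015 paris]
	fmt.Println(matches("DOTGO", doc, false))        // true
	fmt.Println(matches("dotGo Berlin", doc, false)) // true
	fmt.Println(matches("dotGo Berlin", doc, true))  // false
}
```

The OR behaviour is why a multi-word query can return more hits than you expect. Bleve's MatchQuery also offers an operator setting to require all terms; check the API docs of your version for the exact name.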

The full code is at github.com/nassor/studies-blevesearch. Pull it, change it and have fun. \o/

And that's it! I'm still studying how to build more complex searches; when I have something new, I'll write a second part.

P.S.: While writing this article, I had the idea of using BoltDB as the main persistence, a simple key/value store instead of a SQL/NoSQL/NewSQL database. The thought behind it: "why do I need a second query structure if I'll use Bleve to do the queries?". I'll write an article about that study too.
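The idea in the P.S. could look roughly like this: store each event as JSON in a key/value store under the same string ID used in the index, and resolve search hits with plain key lookups instead of a second query language. This is a speculative sketch, not code from the article's repository: kvStore is a hypothetical map standing in for a BoltDB bucket, and the hit IDs are hard-coded instead of coming from a real Bleve search.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// Event mirrors the article's model, trimmed to two fields for the sketch.
type Event struct {
	ID   int
	Name string
}

// kvStore is a hypothetical stand-in for a BoltDB bucket: document ID -> JSON bytes.
type kvStore map[string][]byte

// put stores the event under the same string ID that would be used in the search index.
func (s kvStore) put(e Event) error {
	data, err := json.Marshal(e)
	if err != nil {
		return err
	}
	s[strconv.Itoa(e.ID)] = data
	return nil
}

// get loads an event by the ID returned in a search hit; ok is false if it is missing or unreadable.
func (s kvStore) get(id string) (e Event, ok bool, err error) {
	data, found := s[id]
	if !found {
		return e, false, nil
	}
	err = json.Unmarshal(data, &e)
	return e, err == nil, err
}

func main() {
	store := kvStore{}
	_ = store.put(Event{ID: 1, Name: "dotGo 2015"})
	_ = store.put(Event{ID: 2, Name: "GopherCon 2015"})

	// Pretend these IDs came back from a search (e.g. searchResult.Hits[i].ID).
	hitIDs := []string{"1"}
	for _, id := range hitIDs {
		e, ok, err := store.get(id)
		fmt.Println(e.Name, ok, err) // dotGo 2015 true <nil>
	}
}
```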


Originally published at nassor.me.