Indexing Geospatial Data with Lucene 6

Janaka Chathuranga
Oct 21, 2016

Let’s say you have a database with a table that holds geospatial data. I mean latitudes and longitudes ;) You also query that table frequently by those latitude and longitude values.

Querying a table on a non-indexed field is not a good idea. You need to index that column, otherwise your queries will take a while to complete. The question is: how are you gonna index it?

Let me give you an example. Let’s assume there is a table called student with a column called height. Will you index this column?

Hell yeah!

If you access it frequently, say you search students by height or sort them by height, it is really helpful. Since heights do not change often, the time it takes to update the index won’t be a problem either. For example, an index gives you the power to find the shortest student in the table without iterating through all the students.
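To make that concrete, here is a minimal sketch of what an index on height buys you. It uses a plain Java TreeMap as a stand-in for a database index; the class name HeightIndex and the sample students are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: a sorted in-memory "index" on the height column.
class HeightIndex {
    // Height in centimetres -> names of the students with that height.
    private final TreeMap<Integer, List<String>> byHeight = new TreeMap<>();

    void add(String name, int heightCm) {
        byHeight.computeIfAbsent(heightCm, h -> new ArrayList<>()).add(name);
    }

    // The smallest key sits at the front of the sorted tree, so no full scan is needed.
    List<String> shortest() {
        Map.Entry<Integer, List<String>> first = byHeight.firstEntry();
        return first == null ? new ArrayList<>() : first.getValue();
    }

    // Range lookup with both bounds inclusive, returned in ascending height order.
    List<String> between(int minCm, int maxCm) {
        List<String> result = new ArrayList<>();
        for (List<String> names : byHeight.subMap(minCm, true, maxCm, true).values()) {
            result.addAll(names);
        }
        return result;
    }

    public static void main(String[] args) {
        HeightIndex index = new HeightIndex();
        index.add("Amal", 172);
        index.add("Nimal", 158);
        index.add("Kamala", 165);
        System.out.println(index.shortest());
        System.out.println(index.between(160, 175));
    }
}
```

Both lookups walk only the part of the sorted tree they need, which is roughly what a B-tree index does for a single numeric column.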

Now assume you have a table called airport with latitude and longitude columns that represent the location of each airport. Are you gonna index those fields?

Are you? ;)

Yeah, you want to index them because you need to search airports by location. The problem is that indexing them as separate columns will not help much, because you don’t want to search airports by latitude and longitude separately.

Do you wanna search for the airport which has the lowest latitude value?

Amundsen–Scott South Pole Station has the lowest latitude value. Almost -90 degrees. ;)

Of course it is an interesting fact, but you won’t be doing this kind of search in most of your applications. No offense, Amundsen–Scott South Pole Station. You are doing a great job there!

Let’s see what kind of queries you actually run.

Which airports are closest to your town? Which airports are in the state you live in?
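Without a suitable index, a "what is near me" query means computing the distance to every single row. Here is a rough sketch of that brute-force approach using the haversine formula. The class name and the coordinates (Colombo, Ratmalana and Bandaranaike International) are approximate values I picked for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: answering a radius query by scanning every airport.
class NaiveNearbySearch {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius, approximate

    // Great-circle distance between two points given in degrees.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Checks every airport, so the cost grows linearly with the table size.
    static List<String> within(String[] names, double[][] latLon,
                               double lat, double lon, double radiusKm) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            if (haversineKm(lat, lon, latLon[i][0], latLon[i][1]) <= radiusKm) {
                hits.add(names[i]);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] names = {"Ratmalana", "Bandaranaike International"};
        double[][] latLon = {{6.8219, 79.8861}, {7.1811, 79.8836}}; // approximate
        // Airports within 20 km of Colombo (approx. 6.9271, 79.8612).
        System.out.println(within(names, latLon, 6.9271, 79.8612, 20.0));
    }
}
```

This works fine for a handful of rows, but it touches every row on every query, which is exactly what a spatial index is meant to avoid.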

To answer queries like these, indexing the columns in a typical database will not help. For that purpose there are systems called spatial databases. They index data using special structures designed for searching multi-dimensional space (for example, a k-d tree).
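If you have never seen a k-d tree, here is a minimal sketch of the idea for two dimensions. It treats latitude and longitude as flat coordinates and only answers bounding-box queries, so it is a teaching toy under those assumptions, not how any particular spatial database or Lucene implements its index. The class name KdTree and the sample points are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration: a 2-d tree over {latitude, longitude} pairs.
class KdTree {
    private static final class Node {
        final double[] point; // {latitude, longitude}
        final Node left;
        final Node right;

        Node(double[] point, Node left, Node right) {
            this.point = point;
            this.left = left;
            this.right = right;
        }
    }

    private final Node root;

    KdTree(double[][] points) {
        // Shallow copy so the caller's array order is left alone.
        root = build(points.clone(), 0, points.length, 0);
    }

    // Split on the median, alternating between latitude (axis 0) and longitude (axis 1).
    private static Node build(double[][] pts, int from, int to, int depth) {
        if (from >= to) {
            return null;
        }
        int axis = depth % 2;
        Arrays.sort(pts, from, to, Comparator.comparingDouble((double[] p) -> p[axis]));
        int mid = (from + to) >>> 1;
        return new Node(pts[mid],
                build(pts, from, mid, depth + 1),
                build(pts, mid + 1, to, depth + 1));
    }

    // Collects every point inside the box [minLat, maxLat] x [minLon, maxLon].
    List<double[]> range(double minLat, double maxLat, double minLon, double maxLon) {
        List<double[]> hits = new ArrayList<>();
        search(root, 0, new double[] {minLat, minLon}, new double[] {maxLat, maxLon}, hits);
        return hits;
    }

    private static void search(Node node, int depth, double[] min, double[] max,
                               List<double[]> hits) {
        if (node == null) {
            return;
        }
        int axis = depth % 2;
        double[] p = node.point;
        if (p[0] >= min[0] && p[0] <= max[0] && p[1] >= min[1] && p[1] <= max[1]) {
            hits.add(p);
        }
        // Only descend into a side that can still overlap the box.
        if (min[axis] <= p[axis]) {
            search(node.left, depth + 1, min, max, hits);
        }
        if (max[axis] >= p[axis]) {
            search(node.right, depth + 1, min, max, hits);
        }
    }

    public static void main(String[] args) {
        double[][] airports = {
            {6.8219, 79.8861}, {7.1811, 79.8836}, {6.2889, 81.1239}, {9.7922, 80.0700}
        }; // approximate coordinates of four Sri Lankan airports
        KdTree tree = new KdTree(airports);
        System.out.println(tree.range(6.5, 7.5, 79.5, 80.5).size());
    }
}
```

The point is that each level throws away the half of the space that cannot contain a match, and it does so on both dimensions at once instead of one column at a time.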

If you are familiar with any of these, go ahead and use one, but unfortunately most of us (including me) are not. Besides, running one as an external database service is quite difficult to maintain. This is where Lucene 6 comes into play… ;)

Apache Lucene is a popular, powerful Java library for indexing data. In the beginning its main purpose was to index text, but now it also lets you index other things such as multi-dimensional points and geospatial data.

If you are not familiar with it, don’t worry, because it’s Java; pretty simple ;) Add the Maven dependencies or put the jars on the classpath and you have the system running.

Don’t blame me if you find it difficult. I am not much of a Java fan either!

Next I will go through a simple example of how to use Lucene’s multi-dimensional space search.

First you need to add the Maven dependencies or put the jars on the classpath. If you are working with a Maven project, add the Lucene dependencies to your pom.xml.
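As a rough guide, the dependency block might look like the one below. The version number 6.2.1 is my assumption (any 6.x release should be similar), and as far as I know LatLonPoint ships in the lucene-sandbox module in the 6.x line, so check the API docs for the release you actually use.

```xml
<dependencies>
    <!-- Core indexing and search classes -->
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-core</artifactId>
        <version>6.2.1</version>
    </dependency>
    <!-- Common analyzers; harmless if your release already has StandardAnalyzer in core -->
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-analyzers-common</artifactId>
        <version>6.2.1</version>
    </dependency>
    <!-- Assumed home of LatLonPoint in Lucene 6.x -->
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-sandbox</artifactId>
        <version>6.2.1</version>
    </dependency>
</dependencies>
```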

Yeah, that’s simple.

Now we need to index the data. Without beating around the bush, I will walk through my sample code part by part.

First we initialize an index writer and add data to the index.

Here I index the details of Ratmalana Airport, which is a local airport in Sri Lanka. A StoredField means the value itself is stored in the index, so you can read it back from the search results.

Then comes today’s special: LatLonPoint. It is used to index the location value. The first parameter is the field name (similar to a column name in a database). Then come the latitude and longitude values.

As above, you can add many documents with different values to the index.
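The indexing steps described above can be sketched roughly like this. It is my reconstruction under assumptions, not necessarily the original sample code: the class name AirportIndexer, the field names "name" and "location", the in-memory RAMDirectory, the second airport and all coordinates (approximate) are choices I made for illustration. It assumes Lucene 6.x with the core and sandbox jars on the classpath, plus analyzers-common if your release keeps StandardAnalyzer there.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.LatLonPoint;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

import java.io.IOException;

// Hypothetical sketch of indexing airports with a LatLonPoint field.
class AirportIndexer {

    static void addAirport(IndexWriter writer, String name, double lat, double lon)
            throws IOException {
        Document doc = new Document();
        // Stored so the name can be read back from search results.
        doc.add(new StoredField("name", name));
        // Indexed for geo queries: field name, then latitude, then longitude.
        doc.add(new LatLonPoint("location", lat, lon));
        writer.addDocument(doc);
    }

    static Directory buildIndex() throws IOException {
        Directory directory = new RAMDirectory(); // in-memory index, assumed for brevity
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        // Closing the writer commits the added documents.
        try (IndexWriter writer = new IndexWriter(directory, config)) {
            addAirport(writer, "Ratmalana Airport", 6.8219, 79.8861);
            addAirport(writer, "Bandaranaike International Airport", 7.1811, 79.8836);
        }
        return directory;
    }

    public static void main(String[] args) throws IOException {
        Directory directory = buildIndex();
        System.out.println("Indexed airports into " + directory);
    }
}
```

For an index that survives a restart you would swap RAMDirectory for an on-disk directory such as FSDirectory.open(path).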

A document in Lucene is a data entity, quite similar to a row in a table. What we add to it are fields, which are similar to column values in an RDBMS. And keep in mind there is no schema, so documents do not all have to have the same fields.

Then we can create an index searcher and query for the values above.

The example runs two queries. The first one searches for airports within 30 km of Colombo, and the second one for airports within 20 km of Colombo.
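Those two searches might look roughly like the sketch below. Again, this is a reconstruction under assumptions: I am guessing that LatLonPoint.newDistanceQuery is the query in play (it takes the radius in meters), and the class name AirportSearch, the field names, the sample airports and the coordinates for Colombo (approximate) are mine. It assumes the same Lucene 6.x jars as before.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.LatLonPoint;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of radius searches around Colombo.
class AirportSearch {
    static final double COLOMBO_LAT = 6.9271; // approximate
    static final double COLOMBO_LON = 79.8612; // approximate

    // Builds a tiny in-memory index so the example is self-contained.
    static Directory sampleIndex() throws IOException {
        String[] names = {"Ratmalana Airport", "Bandaranaike International Airport"};
        double[][] latLon = {{6.8219, 79.8861}, {7.1811, 79.8836}}; // approximate
        Directory directory = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(directory, config)) {
            for (int i = 0; i < names.length; i++) {
                Document doc = new Document();
                doc.add(new StoredField("name", names[i]));
                doc.add(new LatLonPoint("location", latLon[i][0], latLon[i][1]));
                writer.addDocument(doc);
            }
        }
        return directory;
    }

    // Returns the stored names of up to 10 airports inside the given radius.
    static List<String> airportsNear(IndexSearcher searcher, double lat, double lon,
                                     double radiusMeters) throws IOException {
        Query query = LatLonPoint.newDistanceQuery("location", lat, lon, radiusMeters);
        List<String> names = new ArrayList<>();
        for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
            names.add(searcher.doc(hit.doc).get("name"));
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(sampleIndex())) {
            IndexSearcher searcher = new IndexSearcher(reader);
            System.out.println(airportsNear(searcher, COLOMBO_LAT, COLOMBO_LON, 30_000));
            System.out.println(airportsNear(searcher, COLOMBO_LAT, COLOMBO_LON, 20_000));
        }
    }
}
```

With these sample coordinates, Ratmalana is roughly 12 km from Colombo and Bandaranaike roughly 28 km, so the 30 km search should match both and the 20 km search only Ratmalana.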

Yeah. It’s really simple. Here is the complete code for the above example. This is a very basic example, and Apache Lucene offers many other query types.

If you are interested, you can see some more examples in my Git repo. For advanced information on queries and index types, refer to the API docs.

If you are interested in running Lucene as a service/server, try Apache Solr, which provides scalable, highly available, distributed indexing with flexible queries.

Yeah, that’s it!

Thank you for reading. ;)
