Communicating with Elasticsearch from Google App Engine

Published in Devs @ FOODit, Apr 13, 2015

Google App Engine promises to take away the hard work of developing, deploying and scaling Java applications by allowing developers to focus purely on code. However, nothing is free in this world, and the sandboxed environment Google provides for running applications can cause issues when using third-party libraries not specifically designed for Google App Engine (which, to be fair, is 99.9% of them). Here we take a look at the problems we had communicating with an elasticsearch server from within the Google App Engine sandboxed environment.

The full source code for this blog can be found at https://github.com/FOODit/Blog-20150413-Communicating-with-Elasticsearch-from-GAE.

Elasticsearch

Elasticsearch is a great search server that we have been using for a while at FOODit to search for meals. It comes with an excellent API for both indexing and search that is exposed via a REST interface and officially supported clients for a variety of languages.

Since we primarily use Java for our backend applications, using the native Java client for elasticsearch seems like a no-brainer. The client comes with a number of great interfaces and features for developers to take advantage of. Creating complex search requests is relatively easy and straightforward with the query API found under org.elasticsearch.index.query. Indexing documents is a breeze with the Update API.

One of my favorite features is the ability to add the client as a node to the elasticsearch cluster. This means that whenever the client wishes to perform an operation on the cluster (e.g. perform a search or index a document) the client will avoid performing a “double hop”. Since the client is part of the cluster, it knows on which shard to perform operations. For example, an index operation will automatically be run on the shard that will end up storing the data.

The problem with Google App Engine

These features all sound great and, as mentioned above, when interacting with elasticsearch from a Java application, using the official client is a no-brainer. However, when running your Java application on Google App Engine, a spanner is thrown into the works: using the elasticsearch Java client to connect to an elasticsearch server doesn’t actually work. If you follow one of the many examples of how to use the elasticsearch client and try to run your code in the GAE runtime, you will be faced with stack traces like this:

The restrictions put in place by GAE to make our deployments and management easier have resulted in us not being able to use a useful third-party library. Anybody that has used GAE will probably be used to these sorts of issues. After all, we can’t even use Apache HttpComponents on GAE.

So what do we do? We could use the extremely powerful REST interface directly. After all, anything that is possible through the Java client will also be available through the REST API. However, this means that we are going to have to start writing our own code for generating the JSON required by the REST API, which is exactly what the Java client already provides for us.

Using the parts of the client that work on GAE

After a bit of investigation we managed to uncover the classes from the elasticsearch Java client that GAE didn’t like. The main issue GAE was having was with the class org.elasticsearch.client.Client and the classes used to set the client up. Take these away and it’s possible (with a few modifications) to get the elasticsearch client up and running on GAE.

Now, org.elasticsearch.client.Client is actually a rather important class. All of our search and index requests to elasticsearch are normally going to be derived from this class. So then, how are we actually going to communicate with elasticsearch? The answer, which we have found and have been using for a while, is to use a hybrid approach: use the client wherever possible, but fall back to the REST API whenever we actually need to talk to elasticsearch. This way, we still get to use the really powerful elasticsearch query API whilst still running on GAE.

How it works

For searching we try to use the Java client as much as possible, and we can actually still use a large part of what it provides. Unfortunately, for indexing this is not very easy on GAE, so we don’t have much choice but to talk to the REST API without any help. In both cases, we send requests to elasticsearch ourselves over HTTP, using the google-http-client library, which has good support for GAE.

Searching

Outside of GAE, once a query has been built up using the elasticsearch query API, it is very simple to issue a search request to elasticsearch. We simply need to build up a SearchRequestBuilder from the Client, passing in the query, the from value and the number of results to return:

This is all that needs to be done to run the query in elasticsearch and retrieve the results.

On GAE, fortunately it is still possible to build up the query in exactly the same way. However, running the query and returning the results takes a little more effort.

On GAE, we need to take any query that we build and turn it into JSON that the REST API will understand. All of the classes we use to build up a query in elasticsearch contain a toString method that is capable of generating the required JSON. Unfortunately, we can’t use these. Any attempt to do so will result in more errors from GAE about classes that are blacklisted within the runtime. Under the hood, the toString methods will attempt to create an instance of org.elasticsearch.common.io.stream.BytesStreamOutput. This is a class that GAE does not like and will not run. The elasticsearch client needs a BytesStream in order to generate the JSON we require. To overcome this limitation we have written our own implementation that runs on GAE, called GaeBytesStream. This just forwards all calls onto a ByteArrayOutputStream. The source code for this can be found at https://github.com/FOODit/Blog-20150413-Communicating-with-Elasticsearch-from-GAE/blob/master/src/main/java/com/foodit/example/util/GaeBytesStream.java.
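The forwarding idea can be sketched without any elasticsearch dependency. The code below is only an illustration of the delegation pattern, not the actual GaeBytesStream: the BytesSource interface and the DelegatingBytesStream class are hypothetical stand-ins, whereas the real class implements elasticsearch's own BytesStream type.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the elasticsearch interface that exposes the written bytes.
interface BytesSource {
    byte[] toByteArray();
}

// Forwards every write to a plain ByteArrayOutputStream.
class DelegatingBytesStream extends OutputStream implements BytesSource {
    private final ByteArrayOutputStream delegate = new ByteArrayOutputStream();

    @Override
    public void write(int b) {
        delegate.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) {
        delegate.write(b, off, len);
    }

    @Override
    public byte[] toByteArray() {
        return delegate.toByteArray();
    }

    // Decodes everything written so far as UTF-8 text (the JSON we want to send).
    String utf8() {
        return new String(delegate.toByteArray(), StandardCharsets.UTF_8);
    }
}

class DelegatingBytesStreamDemo {
    public static void main(String[] args) throws IOException {
        DelegatingBytesStream out = new DelegatingBytesStream();
        out.write("{\"match_all\":{}}".getBytes(StandardCharsets.UTF_8));
        System.out.println(out.utf8());
    }
}
```

Because the wrapper only touches java.io classes, nothing in it depends on the elasticsearch stream class that the sandbox rejects.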

Generating the query is exactly the same on GAE as it is using the client anywhere else:

However, generating the JSON that we will send to the REST interface can be a laborious task. First we generate the JSON for the query:

We then need to add in the other properties that we wish to send to elasticsearch (the from value and the results size). Using GSON we are able to create a JSON document with all the details elasticsearch requires:
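As a rough, dependency-free sketch of the shape of that document, the snippet below wraps an already serialized query with the from and size properties. It uses plain string assembly rather than GSON purely to stay self-contained; the class name, the field name and the search term are assumptions made for illustration.

```java
class SearchBodySketch {
    // Wraps already-serialized query JSON with the paging properties
    // expected in an elasticsearch search request body.
    static String searchBody(String queryJson, int from, int size) {
        if (from < 0 || size < 0) {
            throw new IllegalArgumentException("from and size must not be negative");
        }
        return "{\"query\":" + queryJson + ",\"from\":" + from + ",\"size\":" + size + "}";
    }

    public static void main(String[] args) {
        // Hypothetical query JSON, as it might come out of the query builder.
        String queryJson = "{\"match\":{\"name\":\"pizza\"}}";
        System.out.println(searchBody(queryJson, 0, 10));
        // prints {"query":{"match":{"name":"pizza"}},"from":0,"size":10}
    }
}
```

In real code a JSON library such as GSON is the safer choice, since it takes care of escaping and nesting.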

We next need to send the search request to elasticsearch. This is done by sending a GET request using:
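For illustration only, here is a sketch of the same step using java.net.HttpURLConnection instead of google-http-client, so that it needs no extra dependencies. HttpURLConnection cannot attach a body to a GET, so this sketch uses POST, which the elasticsearch search endpoint also accepts. The class name, host, index and type are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class RestSearchSketch {
    // Builds the _search endpoint for a given index and type.
    static String searchUrl(String baseUrl, String index, String type) {
        return baseUrl + "/" + index + "/" + type + "/_search";
    }

    // Sends the JSON body and returns the raw response text.
    static String post(String url, String jsonBody) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(jsonBody.getBytes(StandardCharsets.UTF_8));
        }
        // Error responses are delivered on the error stream, which may be null.
        InputStream in = conn.getResponseCode() < 400 ? conn.getInputStream() : conn.getErrorStream();
        if (in == null) {
            return "";
        }
        try (InputStream body = in) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int read;
            while ((read = body.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        // Hypothetical host, index and type; no request is sent here.
        System.out.println(searchUrl("http://localhost:9200", "meals", "meal"));
    }
}
```

With an elasticsearch instance running, calling post(searchUrl(...), body) would return the search response as a JSON string, which still has to be parsed into results by hand.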

It’s worth noting that there are other considerations to take into account, such as sorting. Sorting needs to be handled slightly differently; the complete source code at https://github.com/FOODit/Blog-20150413-Communicating-with-Elasticsearch-from-GAE/blob/master/src/main/java/com/foodit/example/es/http/HttpBasedSearcherService.java shows how it is handled.

Indexing

Using the elasticsearch client, indexing is a relatively simple task. We just need to initialize an IndexRequestBuilder from the Client, making sure to pass in the JSON of the document we want to index:

On GAE, things get a bit more complicated since we can’t use the Client. Instead we have to build everything up ourselves and issue an index request through the REST API.

The above example becomes more complicated:
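A minimal sketch of such a REST index request is shown below, again using java.net.HttpURLConnection in place of google-http-client to stay dependency-free. It relies on the REST convention that a PUT to /index/type/id stores a document under a known id, while a POST to /index/type lets elasticsearch generate one. The class name and all index, type and id values are hypothetical, and ids are assumed to be URL-safe.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class RestIndexSketch {
    // PUT when we choose the id ourselves, POST when elasticsearch should generate it.
    static String indexMethod(String id) {
        return id == null ? "POST" : "PUT";
    }

    // Builds /index/type or /index/type/id, depending on whether an id is given.
    static String indexUrl(String baseUrl, String index, String type, String id) {
        String url = baseUrl + "/" + index + "/" + type;
        return id == null ? url : url + "/" + id;
    }

    // Sends the document JSON and returns the HTTP status code.
    static int index(String baseUrl, String index, String type, String id, String documentJson)
            throws IOException {
        String url = indexUrl(baseUrl, index, type, id);
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod(indexMethod(id));
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(documentJson.getBytes(StandardCharsets.UTF_8));
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Hypothetical values; no request is sent here.
        System.out.println(indexMethod("42") + " " + indexUrl("http://localhost:9200", "meals", "meal", "42"));
        // prints PUT http://localhost:9200/meals/meal/42
    }
}
```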

Conclusion

Although it is more complicated to do, it is possible to get the elasticsearch Java client running on GAE for searching. We just need to handle the communication between elasticsearch and the client ourselves. Although we miss out on some of the benefits the client gives us, at least we still get to use the query API.

Using the source code

The complete source code can be found at https://github.com/FOODit/Blog-20150413-Communicating-with-Elasticsearch-from-GAE.

It requires Java 7 and Maven 3.1 or greater. To run the project, simply run:

mvn appengine:devserver

This will start the GAE dev server on http://localhost:8080. If you have an elasticsearch instance running locally you can use the application to search and index documents. You will either need to create an index in elasticsearch or update the application to use an index you already have. The values which will be used for the index, type and the field to search on are defined in com.foodit.example.util.Properties.

Once the application is up and running you can trigger the indexing of a simple document by going to http://localhost:8080/index. A search can be run by going to http://localhost:8080/searcher?query=test.

These requests are handled by the simple servlets IndexingServlet and SearchingServlet. These hand the work to implementations of IndexerService and SearcherService respectively. There are two implementations of each of the interfaces. By default, the application uses the version that will run on GAE. There are comments in the servlets with instructions on how to switch to the implementation that relies on the banned GAE classes.

The easiest way to see the difference between the GAE version of the elasticsearch integration and the default version is to compare:

ClientBasedIndexerService with HttpBasedIndexerService
ClientBasedSearcherService with HttpBasedSearcherService

James Faulkner is a Senior Java Developer at FOODit — a young startup with a passion for food and tech, creating software to help independent restaurants grow. FOODit is always on the lookout for talented developers and is currently hiring. Connect with us via LinkedIn and Twitter.

Keyboard Image | CC-BY-SA 4.0 | http://de.wikipedia.org/wiki/Wikipedia:Kurier#/media/File:Backlit_keyboard.jpg