Expedia Group Technology — Software

Testing Elasticsearch Applications

Self-contained testing for Elasticsearch-backed applications.

João Pereira
Expedia Group Technology

--

The Goat team at Expedia Partner Solutions (part of Expedia Group™) recently started working on a project that uses Elasticsearch to provide advanced search capabilities to our application.


The application

For this new application, we decided our stack would be an AWS Lambda function written in Python, backed by Elasticsearch.

The problem

One significant challenge we encountered in our adoption of Elasticsearch was that of testing — how can we test that our application works correctly? Further to this, how can we ensure our testing is reliable, repeatable and fast?

There were a couple of things that we thought were essential to us when building these tests:

  • No need to install Elasticsearch locally
  • Elasticsearch should be virtualized locally
  • Fast feedback from the tests while developing
  • The tests should run locally and in a CI/CD pipeline with the same behavior
  • Each test class should have its own Elasticsearch instance
  • Each test should set up its own test data before it runs

Solution

In a previous project, where the application was backed by a Postgres database, we tested the application against an in-memory H2 database that we created on demand and loaded with consistent test data. This approach worked well, as it made our tests reliable and self-contained. We wanted to take a similar approach to testing our application against Elasticsearch. With that in mind, we wrote the tests in Python using the pytest-elasticsearch package, which allows us to start an Elasticsearch instance for each test class.

Since we also wanted to be able to run this in any environment, and we didn’t want to install Elasticsearch locally, we decided to package the tests and their dependencies inside a Docker container. Using Docker ensures the behavior is identical when running locally on a developer laptop and when running remotely in a CI/CD pipeline. With this setup, when we push our code to our repository we can be confident that our tests will run successfully.
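With the container in place, running the whole suite locally comes down to a build and a run. The image tag below is an arbitrary name of our choosing, not anything the project mandates:

```shell
# Build the image: installs Python, copies the sources and test dependencies
docker build -t lambda-es-tests .

# Run the functional tests inside the container; --rm removes it afterwards
docker run --rm lambda-es-tests
```

The same two commands work unchanged on a developer laptop and in a CI/CD pipeline step.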

The scope of our tests is limited to the interaction between the AWS Lambda function and Elasticsearch. In these tests we didn’t want to verify that the ELB can trigger an AWS Lambda function as a target (we will create an end-to-end smoke test for that); what we wanted to test was that, given an event, the AWS Lambda function can parse the event, query Elasticsearch and return a valid AWS Lambda response.
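To make that contract concrete, here is a minimal, hypothetical sketch of such a handler. The function and field names mirror the test code later in the post, but the query shape and response building are assumptions for illustration, not our actual implementation:

```python
import json


def handle_event(event, es_client, index):
    """Hypothetical Lambda handler sketch: parse the ELB event,
    query Elasticsearch, and build an ELB-style Lambda response."""
    params = event.get('multiValueQueryStringParameters') or {}
    wanted_types = params.get('type', [])

    # Match documents whose 'type' field matches any of the requested values
    result = es_client.search(index=index, body={
        'query': {'match': {'type': ' '.join(wanted_types)}}
    })
    hits = [hit['_source'] for hit in result['hits']['hits']]

    return {
        'statusCode': 200,
        'statusDescription': '200 OK',
        'isBase64Encoded': False,
        'multiValueHeaders': {'Content-Type': ['application/json;charset=utf-8']},
        'body': json.dumps(hits),
    }
```

Because the handler takes the Elasticsearch client as a parameter, the functional tests can hand it the client created by pytest-elasticsearch.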

Before going into more detail, we assume the reader has some knowledge about Docker, Elasticsearch and Python.

This is how our Dockerfile looks:

FROM elasticsearch:6.7.2

ENV PYTHON_VERSION="3.7.4"
ENV SRC_DIRECTORY="/usr/share/src"

RUN yum update -y \
 && yum install -y gcc openssl-devel bzip2-devel libffi-devel make \
 && cd /usr/src/ \
 && wget https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tgz \
 && tar xzf Python-${PYTHON_VERSION}.tgz \
 && cd Python-${PYTHON_VERSION} \
 && ./configure --enable-optimizations \
 && make altinstall \
 && rm /usr/src/Python-${PYTHON_VERSION}.tgz \
 && mkdir ${SRC_DIRECTORY} \
 && sed -i 's/-Xms1g/-Xms512m/g; s/-Xmx1g/-Xmx512m/g' /usr/share/elasticsearch/config/jvm.options

COPY src/ ${SRC_DIRECTORY}

RUN cd ${SRC_DIRECTORY} \
 && python3.7 -m pip install -r requirements.txt \
 && python3.7 -m pip install -r requirements-test.txt

USER elasticsearch

CMD ["pytest", "/usr/share/src/functionaltests/"]

We pull the Elasticsearch Docker image from Docker Hub. The image contains Elasticsearch installed under /usr/share/elasticsearch. This directory and its subdirectories are owned by a user called elasticsearch.

[elasticsearch@83bc8529580b ~]$ pwd
/usr/share/elasticsearch
[elasticsearch@83bc8529580b ~]$ ls -rlt
total 476
-rw-r--r-- 1 elasticsearch root 8519 Apr 29 09:03 README.textile
-rw-r--r-- 1 elasticsearch root 13675 Apr 29 09:03 LICENSE.txt
drwxr-xr-x 2 elasticsearch root 4096 Apr 29 09:08 plugins
-rw-r--r-- 1 elasticsearch root 427502 Apr 29 09:08 NOTICE.txt
drwxr-xr-x 3 elasticsearch root 4096 Apr 29 09:08 lib
drwxr-xr-x 31 elasticsearch root 4096 Apr 29 09:08 modules
drwxrwxr-x 2 elasticsearch root 4096 Apr 29 09:09 data
drwxr-xr-x 3 elasticsearch root 4096 Apr 29 09:09 bin
drwxrwxrwx 3 elasticsearch elasticsearch 96 Oct 9 07:43 tmp
drwxrwxr-x 1 elasticsearch root 4096 Oct 10 09:08 config
drwxrwxr-x 1 elasticsearch root 4096 Oct 10 09:09 logs

This is important to remember, because we will be using the elasticsearch user to run our functional tests, since it has permission to start Elasticsearch. Hence the two instructions at the end of the Dockerfile:

USER elasticsearch

CMD ["pytest", "/usr/share/src/functionaltests/"]

With the Dockerfile created, we added pytest-elasticsearch as a dependency in requirements-test.txt so that we could start writing our tests.
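For reference, a minimal requirements-test.txt for this setup might look like the fragment below; the pinned versions are illustrative, not necessarily the ones we used:

```text
pytest==5.2.1
pytest-elasticsearch==1.20.0
PyHamcrest==1.9.0
```

Keeping the test-only dependencies in their own file means the Lambda deployment package installs just requirements.txt.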

Below is a simple example of a successful functional test using pytest-elasticsearch that invokes a lambda, makes a query to Elasticsearch and returns a valid AWS Lambda response:

import json

import pytest
from hamcrest import assert_that, contains, equal_to, has_entries
from pytest_elasticsearch import factories

from src import main_lambda  # import path depends on your project layout

elasticsearch_proc = factories.elasticsearch_proc(port=-1, index_store_type='fs')
elasticsearch = factories.elasticsearch('elasticsearch_proc')


class TestSuccessResponses:

    @pytest.fixture(autouse=True)
    def spam_index(self, elasticsearch):
        elasticsearch.indices.create(index='spam')
        elasticsearch.indices.put_mapping(body={
            'properties': {
                'id': {
                    'type': 'keyword'
                },
                'type': {
                    'type': 'text',
                    'fields': {
                        'keyword': {
                            'type': 'keyword',
                            'ignore_above': 256
                        }
                    }
                }
            }
        }, doc_type='_doc', index='spam')

    def test_egg_finder(self, elasticsearch):
        elasticsearch.create('spam', '1', {
            'id': '1',
            'type': 'Scrambled'}, refresh=True)
        test_event = {
            'multiValueQueryStringParameters': {
                'type': ['scrambled']
            }
        }
        actual_response = main_lambda.handle_event(test_event, elasticsearch, 'spam')
        assert_that(actual_response, has_entries({
            'statusCode': equal_to(200),
            'statusDescription': equal_to('200 OK'),
            'isBase64Encoded': equal_to(False),
            'multiValueHeaders': equal_to({'Content-Type': ['application/json;charset=utf-8']})
        }))
        assert_that(json.loads(actual_response.get('body')), contains(has_entries({
            'id': equal_to('1'),
            'type': equal_to('Scrambled')
        })))

The key points to take from the functional test are:

elasticsearch_proc = factories.elasticsearch_proc(port=-1, index_store_type='fs')
  • factories.elasticsearch_proc — creates an Elasticsearch process fixture with the default Elasticsearch configuration. It is worth noting that it looks for the Elasticsearch executable under /usr/share/elasticsearch/bin/elasticsearch, the same location where Elasticsearch is installed inside the Docker container.
elasticsearch = factories.elasticsearch('elasticsearch_proc')
  • factories.elasticsearch — retrieves an Elasticsearch client that can be used to connect to Elasticsearch. You need to pass it the name of the Elasticsearch process fixture (in our case elasticsearch_proc); the first time it is called, it starts Elasticsearch at the beginning of the tests and stops it at the end.
@pytest.fixture(autouse=True)
def spam_index(self, elasticsearch):
    elasticsearch.indices.create(index='spam')
    elasticsearch.indices.put_mapping(body={
        'properties': {
            'id': {
                'type': 'keyword'
            },
            'type': {
                'type': 'text',
                'fields': {
                    'keyword': {
                        'type': 'keyword',
                        'ignore_above': 256
                    }
                }
            }
        }
    }, doc_type='_doc', index='spam')
  • We created a spam_index fixture so that we can create an index and mapping for each test. It runs before each test and, the first time it is called (since it references elasticsearch), it starts Elasticsearch.
def test_egg_finder(self, elasticsearch):
    elasticsearch.create('spam', '1', {
        'id': '1',
        'type': 'Scrambled'}, refresh=True)
    test_event = {
        'multiValueQueryStringParameters': {
            'type': ['scrambled']
        }
    }
    actual_response = main_lambda.handle_event(test_event, elasticsearch, 'spam')
    assert_that(actual_response, has_entries({
        'statusCode': equal_to(200),
        'statusDescription': equal_to('200 OK'),
        'isBase64Encoded': equal_to(False),
        'multiValueHeaders': equal_to({'Content-Type': ['application/json;charset=utf-8']})
    }))
    assert_that(json.loads(actual_response.get('body')), contains(has_entries({
        'id': equal_to('1'),
        'type': equal_to('Scrambled')
    })))
  • Our functional test test_egg_finder receives an Elasticsearch client that is used to call Elasticsearch. When the test runs, Elasticsearch is already up and running, since it was started by the spam_index fixture. The test is also responsible for setting up its own test data by adding a typed JSON document to a specific index, making it searchable.
elasticsearch.create('spam', '1', {
    'id': '1',
    'type': 'Scrambled'}, refresh=True)
  • The functional test then creates an event that will be consumed by our lambda.
test_event = {
    'multiValueQueryStringParameters': {
        'type': ['scrambled']
    }
}
actual_response = main_lambda.handle_event(test_event, elasticsearch, 'spam')
  • We then assert on the response from the lambda and we validate that we got the expected result from Elasticsearch.
assert_that(actual_response, has_entries({
    'statusCode': equal_to(200),
    'statusDescription': equal_to('200 OK'),
    'isBase64Encoded': equal_to(False),
    'multiValueHeaders': equal_to({'Content-Type': ['application/json;charset=utf-8']})
}))
assert_that(json.loads(actual_response.get('body')), contains(has_entries({
    'id': equal_to('1'),
    'type': equal_to('Scrambled')
})))

Conclusions

We did find one caveat when running tests with pytest-elasticsearch: memory usage inside the Docker container grew quickly because of the Elasticsearch instances running in it. The default memory limit for Docker is 2GB, and since we were running two Elasticsearch instances (for the success and failure scenarios), at some point memory usage reached that 2GB limit and the container began to struggle. We fixed this by lowering the Elasticsearch heap from the default 1GB to 512MB, adding the line below to the Dockerfile to replace the Xmx and Xms parameters in the Elasticsearch jvm.options configuration file:

RUN sed -i 's/-Xms1g/-Xms512m/g; s/-Xmx1g/-Xmx512m/g' /usr/share/elasticsearch/config/jvm.options

Another enhancement would be to run the tests in parallel. If every test sets up a dedicated index and adds its data only to that index, the tests can safely run at the same time.
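A sketch of what that could look like, reusing the elasticsearch client fixture from earlier; the naming helper and the dedicated_index fixture are our own illustration, not part of pytest-elasticsearch:

```python
import uuid

import pytest


def unique_index_name(prefix='spam'):
    # A fresh index name per test keeps test data fully isolated,
    # so tests never read or clobber each other's documents
    return f'{prefix}-{uuid.uuid4().hex}'


@pytest.fixture
def dedicated_index(elasticsearch):
    name = unique_index_name()
    elasticsearch.indices.create(index=name)
    yield name  # the test receives the index name and loads its own data
    elasticsearch.indices.delete(index=name)
```

A test would then take dedicated_index as an argument and pass the name through to the Lambda handler instead of the hard-coded 'spam' index.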

Overall, we really like that we can develop changes in our project and get immediate feedback by running the functional tests locally. It gives us confidence that once our changes pass locally, they will run smoothly through the CI/CD pipeline after we push them. Our tests are independent and repeatable, as each test class starts its own Elasticsearch with its own indexes and test data.

Useful links:

Elasticsearch process and client fixtures for py.test: https://pypi.org/project/pytest-elasticsearch/

Elasticsearch Docker image in Docker Hub: https://hub.docker.com/_/elasticsearch
