Managing the test data for functional tests

Akshay Maldhure
Published in Circles.Life
3 min read · Dec 9, 2020

Background

Most functional tests need test data to work. Consider the classic example of an end-to-end login test for an application running in a “testing” environment, where the application services are typically configured to talk to a real database. Such a test needs the credentials of a user, so the test flow must either create a user or use an existing one. This is where accessing and managing test data comes into the picture.

This article describes one way of decoupling test data creation and management from the tests themselves.

Problem

Creating, accessing and managing test data during test execution poses the following challenges:

  1. Unavailability of a data mocking system: you need to rely on one or more “real” downstream systems to create data.
  2. Complexity in creating new data: there is no straightforward way to create data (e.g. missing APIs, or multiple systems that need updating after data is created).
  3. The state of test data matters to the tests: test data may not be reusable if we cannot reset its state after test execution.
  4. Resilience: some systems might fail to create new test data.

Approach

To address these challenges, we decided to store test data in a centralised data store that our automation suites use for all their test data needs during test execution.

We achieved this by building a test data management system (TDMS) that uses an Elasticsearch instance as a data store.

Our testing process already includes a set of test cases that validate the data generation flows. We used these tests to seed the data store with newly generated test data in a fixed format (schema), either through automation test hooks or through cron (maintenance) jobs.
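As a minimal sketch of the seeding step, assuming records are written through Elasticsearch's `_bulk` API, a test hook or cron job could serialise freshly created records into the newline-delimited JSON (NDJSON) body that endpoint expects. The helper name `build_bulk_payload` and the use of each record's `id` as the Elasticsearch `_id` are illustrative assumptions, not details of our TDMS.

```python
import json
from typing import Iterable, Mapping


def build_bulk_payload(index: str, docs: Iterable[Mapping]) -> str:
    """Build an NDJSON request body for Elasticsearch's _bulk API.

    Hypothetical helper: each document is preceded by an "index" action that
    targets `index` and reuses the document's "id" field as its _id, so
    re-seeding the same record overwrites it instead of duplicating it.
    """
    lines = []
    for doc in docs:
        # Action/metadata line: which index and document id to write.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc["id"])}}))
        # Source line: the document itself, serialised on a single line.
        lines.append(json.dumps(doc))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Example: a record produced by a data generation test.
new_users = [
    {"id": 1,
     "personalDetails": {"firstName": "John", "lastName": "Doe", "age": 28},
     "accountStatus": "active"},
]
print(build_bulk_payload("user", new_users))
```

The resulting string would be sent as a `POST /_bulk` request with the `Content-Type: application/x-ndjson` header. Keying documents by `id` keeps seeding idempotent, which matters when the same cron job runs repeatedly.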

Going back to our login example, the most fundamental data entity is a user. We could therefore store the test data in a corresponding Elasticsearch index called “user”, whose documents look like this:

{
  "id": 1,
  "personalDetails": {
    "firstName": "John",
    "lastName": "Doe",
    "age": 28
  },
  "accountStatus": "active"
},
{
  "id": 2,
  "personalDetails": {
    "firstName": "Stephen",
    "lastName": "Williams",
    "age": 30
  },
  "accountStatus": "suspended"
}
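Because tests filter on these fields, it can help to pin down the index mapping explicitly instead of relying on dynamic mapping. The sketch below is an assumption about how such a schema could be expressed: `accountStatus` is mapped as a `keyword` so that exact-match filters work, and a hypothetical `validate_user_doc` helper rejects malformed documents before they are seeded. The set of allowed statuses is likewise assumed; the article only shows `active` and `suspended`.

```python
# Hypothetical explicit mapping for the "user" index; it could be supplied as
# the "mappings" section of the request that creates the index.
USER_INDEX_MAPPING = {
    "properties": {
        "id": {"type": "integer"},
        "personalDetails": {
            "properties": {
                "firstName": {"type": "keyword"},
                "lastName": {"type": "keyword"},
                "age": {"type": "integer"},
            }
        },
        "accountStatus": {"type": "keyword"},
    }
}

# Assumed set of valid statuses (only these two appear in the examples).
KNOWN_STATUSES = {"active", "suspended"}


def validate_user_doc(doc: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the doc looks valid."""
    problems = []
    if not isinstance(doc.get("id"), int):
        problems.append("id must be an integer")
    details = doc.get("personalDetails")
    if not isinstance(details, dict):
        problems.append("personalDetails must be an object")
    else:
        for name in ("firstName", "lastName"):
            value = details.get(name)
            if not isinstance(value, str) or not value:
                problems.append(f"personalDetails.{name} must be a non-empty string")
        if not isinstance(details.get("age"), int):
            problems.append("personalDetails.age must be an integer")
    if doc.get("accountStatus") not in KNOWN_STATUSES:
        problems.append("accountStatus must be one of " + ", ".join(sorted(KNOWN_STATUSES)))
    return problems


john = {"id": 1,
        "personalDetails": {"firstName": "John", "lastName": "Doe", "age": 28},
        "accountStatus": "active"}
print(validate_user_doc(john))  # prints: []
```

Validating at seeding time keeps a malformed record from surfacing later as a confusing test failure.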

Tests can now query TDMS for data that matches their requirements: for example, an active user to test a successful login, or a suspended user to test a failed login.
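A lookup such as "get an active user" maps naturally onto an Elasticsearch `bool` query with a `term` filter. The sketch below builds that request body and, so that it runs without a cluster, also evaluates the same filter against in-memory documents. It assumes `accountStatus` is mapped as a `keyword` field; `build_user_query` and `first_match` are hypothetical helper names.

```python
from typing import Optional


def build_user_query(status: str, size: int = 1) -> dict:
    """Request body for POST /user/_search returning up to `size` users with this status."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Filter context: exact match on a keyword field, no scoring needed.
                "filter": [{"term": {"accountStatus": status}}]
            }
        },
    }


def first_match(docs: list[dict], query: dict) -> Optional[dict]:
    """Local stand-in for the search: apply the query's term filters to `docs`.

    Only supports top-level fields, which is enough for accountStatus.
    """
    terms = [clause["term"] for clause in query["query"]["bool"]["filter"]]
    for doc in docs:
        if all(doc.get(field) == value for term in terms for field, value in term.items()):
            return doc
    return None


users = [
    {"id": 1, "personalDetails": {"firstName": "John", "lastName": "Doe", "age": 28},
     "accountStatus": "active"},
    {"id": 2, "personalDetails": {"firstName": "Stephen", "lastName": "Williams", "age": 30},
     "accountStatus": "suspended"},
]

active_user = first_match(users, build_user_query("active"))        # login success test
suspended_user = first_match(users, build_user_query("suspended"))  # login failure test
print(active_user["id"], suspended_user["id"])  # prints: 1 2
```

Using `filter` rather than `must` fits this use case: tests need any record that matches exactly, not a relevance ranking.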

Figure: A test data management system, showing the flow of test data to and from the application under test.

However, there are a couple of problems with this approach.

  1. The state of the test data is not updated in TDMS after a test runs, so a stored record can go stale.
  2. Multiple tests are likely to end up using the same test data during execution.

To address these problems, we added a data-locking mechanism. Once a record is locked, no other test can use it until the data is refreshed in TDMS.
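The locking rule can be sketched with an in-memory store, assuming each record carries a hypothetical `locked` flag: claiming a record finds an unlocked match and locks it in one atomic step, and only a refresh (for example, by a maintenance job that has reset the record's state) unlocks it again. This is not our actual implementation. With Elasticsearch as the store, the same check-and-set could be made safe across concurrent test runners with its optimistic concurrency control (`if_seq_no` and `if_primary_term` on the update request); the sketch uses a `threading.Lock` instead.

```python
import threading
from typing import Optional


class LockingDataPool:
    """Hypothetical in-memory stand-in for TDMS with per-record locks."""

    def __init__(self, docs: list[dict]):
        # Every record starts out unlocked.
        self._docs = {doc["id"]: {**doc, "locked": False} for doc in docs}
        # Guards the find-and-lock sequence so two tests cannot claim the same record.
        self._mutex = threading.Lock()

    def claim(self, status: str) -> Optional[dict]:
        """Lock and return an unlocked record with this status, or None if none is free."""
        with self._mutex:
            for doc in self._docs.values():
                if doc["accountStatus"] == status and not doc["locked"]:
                    doc["locked"] = True
                    return dict(doc)  # a copy, so callers cannot unlock it directly
            return None

    def refresh(self, doc_id: int, **new_state) -> None:
        """Store the record's current state and unlock it for reuse."""
        with self._mutex:
            doc = self._docs[doc_id]
            doc.update(new_state)
            doc["locked"] = False


pool = LockingDataPool([
    {"id": 1, "accountStatus": "active"},
    {"id": 2, "accountStatus": "suspended"},
])
first = pool.claim("active")   # locks user 1
second = pool.claim("active")  # None: the only active user is locked
pool.refresh(first["id"])      # state reset after the test, so user 1 is unlocked
third = pool.claim("active")   # user 1 can be used again
```

Passing the new state to `refresh` also covers the stale-state problem: if a test suspends a user, the refresh can record `accountStatus="suspended"` so later queries see the real state.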

Future plans

While this approach has solved our most pressing test data problem, we plan to address some of its shortcomings and turn it into a holistic system that is useful for the entire organisation by:

  1. Refreshing and updating data in real time, so that records become available again immediately after they are consumed.
  2. Building a UI tool to filter and access test data, so that anyone can fetch data that matches their testing requirements.
  3. Ensuring there is enough test data for all test scenarios, since some tests may invalidate or terminate data points permanently.
