Test Data Management

Published in

Segmentify Tech Blog

6 min read · Feb 11, 2022

Tests, tests everywhere. I open with this phrase because it sums up my point: testing is unavoidable at every stage of the software process. Sometimes testers do it, sometimes developers, and sometimes people in other roles. Everyone tests at different stages and with different techniques, yet some testing problems are common to all of them. Today we will talk about one of these common problems: test data.

Testing is required at every stage. Using the correct data in each test phase increases the quality of the test.

Let’s begin then.

TDM: Test Data Management
CI/CD: Continuous Integration and Continuous Delivery

Test activities are one of the most critical points of the software development cycle and have become an integral part of CI/CD processes. However, verifying the correctness of existing and newly added functions on every delivery is a challenge for testing.

Products developed sprint by sprint create different testing needs in each sprint. Therefore, the test data you planned in the previous sprint may not be suitable for the next one. In that case, we need to generate new, realistic test data that complies with the standards and covers the improvements made in the new sprint.

Test data management is a process in its own right: it needs to be constantly updated, and its quality affects product quality. Test data is expected to cover different characteristics, bring test activities closer to the real customer experience, and increase the quality of the test. We can use test data as the input to a test or as the expected output to check against. With incorrect test data, the quality measurement may be inaccurate or the test run may fail. Worse still, defects can be overlooked. With test data that resembles the live environment, problems in the live system can be discovered beforehand.

Main Issues Caused by Problems in Test Data Management

1. Data Availability

Test teams may not be able to access the data they need for extensive and dependent testing, because the test data lives in different sources. For example, customer information, billing information, collections and campaign information can each be held in a different system. To gather this information, the test team must obtain data from all of the source systems.

2. Incomplete Data

In some scenarios, a relatively large amount of data is required. For example, you need data for 100 different customers that meet certain criteria, but you could only get 50 from your environments. In this case, you would be running test activities with incomplete data. Therefore, a successful TDM system must be able to synthesise new datasets.

3. Data Quality

Most of the time we do have data. However, it may not comply with data quality standards for some reason. With TDM, you can increase the quality of your data and make it meet specific criteria. Quality problems that test data often has:

Corrupt or incorrect data
Unnecessary or missing data
Unmasked or identifiable data

An important point to note here is that you should mask and anonymise the data of natural or legal persons before using them. If this issue is not taken seriously, you may face heavy fines and sanctions.

How to Set Up the Right Test Data Management System

We have seen why test data management matters and how errors in it affect testing and quality.

How do we set up the right test data management system? What steps should it include? Let’s look for answers to these questions.

Test data management should follow a defined process with clear steps. This gives you full control and oversight throughout the process. One essential point is that the process is agile, so that it fits the development flow and the CI/CD philosophy. In addition, a correctly planned and implemented test data management system serves not only the test team but every other team that needs to test.

1. Identification of Needs

In this step, the criteria are clearly defined. For example, what features should the account on which you will run the tests have? If you are testing push notifications in Safari browsers, you must identify or create an account that has a Safari certificate and an active push module.

Looked at another way, this is also the stage where we decide how the data will be classified. For the tests we run on the Search module, we can classify the data we will use under search-test-data and then determine the basic properties of this class, such as the required data size and the update frequency (daily or weekly). The list of such properties may grow depending on the company, industry and test area.
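One way to make such a classification explicit is to record it in code. The following is a minimal sketch in Python, not a description of any existing tool: the class `DataSpec`, its field names and the example values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Tuple


@dataclass(frozen=True)
class DataSpec:
    """Describes one class of test data (hypothetical structure)."""
    name: str                          # class name, e.g. "search-test-data"
    min_records: int                   # required data size
    refresh_every: timedelta           # how often the data must be updated
    required_fields: Tuple[str, ...]   # fields every record must carry


# Illustrative values only; real criteria come out of the needs analysis.
search_spec = DataSpec(
    name="search-test-data",
    min_records=100,
    refresh_every=timedelta(days=1),
    required_fields=("query", "expected_product_ids"),
)
```

Keeping the criteria in one place like this lets the later steps (collection, updating, quality checks) read the same definition instead of repeating it.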

2. Collection and Storage of Test Data

Once the necessary criteria are set, this stage determines where and how the data will be collected. Any development work needed to collect data from different sources is carried out; then the relevant databases are created and the connections are established. While collecting test data, pay attention to the criteria determined in the previous stage.
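As a rough illustration of collecting from several sources, the sketch below merges per-customer records into one record per customer. The function `merge_sources` and the three in-memory dictionaries are hypothetical stand-ins; in practice each source would be a database or service of its own.

```python
def merge_sources(*sources):
    """Combine records keyed by customer id into one record per customer."""
    merged = {}
    for source in sources:
        for customer_id, fields in source.items():
            # Create the customer entry on first sight, then add this source's fields.
            merged.setdefault(customer_id, {}).update(fields)
    return merged


# Hypothetical sample data standing in for three separate systems.
customers = {1: {"name": "Customer A"}, 2: {"name": "Customer B"}}
billing = {1: {"invoice_total": 120.0}}
campaigns = {2: {"campaign": "spring-sale"}}

test_data = merge_sources(customers, billing, campaigns)
```

If two sources supply the same field for a customer, the later source wins in this sketch; a real system would need an explicit rule for such conflicts.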

3. Updating Test Data

This phase depends on the update frequency, which is a characteristic of the test data. Some test data may need to be updated hourly, some continuously, and some weekly. An update mechanism should therefore be set up to match the needs of each dataset and refresh it automatically.

In addition, this stage should check that the data is healthy and complete. Data may become corrupted, and different problems may arise depending on where it is used and on the operations performed by the teams that use it. To detect these problems in advance, it is necessary to measure the quality of the data and take the required actions.
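A simple health check along these lines could look like the sketch below. It is only an illustration: the function `find_problems`, the record layout and the two rules it applies (staleness and missing required fields) are assumptions, and a real check would cover many more cases.

```python
from datetime import datetime, timedelta


def find_problems(records, required_fields, last_updated, refresh_every, now):
    """Return a list of human-readable problems found in a dataset."""
    problems = []
    # Staleness: the dataset was not refreshed within its update interval.
    if now - last_updated > refresh_every:
        problems.append("dataset is stale")
    # Completeness: every record must have a non-empty value for each required field.
    for index, record in enumerate(records):
        missing = [f for f in required_fields if record.get(f) in (None, "")]
        if missing:
            problems.append(f"record {index} is missing: {', '.join(missing)}")
    return problems


# Hypothetical dataset: the second record has an empty email.
records = [
    {"email": "a@example.com", "name": "A"},
    {"email": "", "name": "B"},
]
problems = find_problems(
    records,
    required_fields=("email", "name"),
    last_updated=datetime(2022, 2, 1),
    refresh_every=timedelta(days=1),
    now=datetime(2022, 2, 3),
)
```

Passing `now` in as a parameter, instead of reading the clock inside the function, keeps the check itself easy to test.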

4. Masking Data

If the data we use is not covered by adequate confidentiality and security measures, we may face data protection problems. To avoid them, at this stage we apply masking to anonymise the data we will use. This way, the data does not identify any person or institution.
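To make the idea concrete, here is a minimal masking sketch. The helper names `mask_email` and `mask_record`, the salt value and the choice of sensitive fields are all hypothetical. Note also that replacing a value with a salted hash is pseudonymisation rather than full anonymisation, so whether it is sufficient depends on the data protection rules that apply to you.

```python
import hashlib


def mask_email(email, salt="test-env-salt"):
    """Replace an email with a stable, non-identifying placeholder address."""
    # The same input always yields the same output, so relations between records survive.
    digest = hashlib.sha256((salt + email.lower()).encode("utf-8")).hexdigest()[:12]
    return f"user_{digest}@example.com"


def mask_record(record, sensitive_fields=("name", "phone")):
    """Return a copy of the record with personal fields masked."""
    masked = dict(record)  # never modify the original record
    if "email" in masked:
        masked["email"] = mask_email(masked["email"])
    for field in sensitive_fields:
        if field in masked:
            masked[field] = "***"
    return masked


# Hypothetical input record.
original = {"email": "Jane@Example.org", "name": "Jane", "plan": "pro"}
masked = mask_record(original)
```

Non-personal fields such as `plan` are left untouched, so the masked record can still drive the same test scenarios.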

5. Synthesising Data

It may not always be possible to collect sufficient data from production environments. For this reason, data should be synthesised at this stage for cases that need more data. Synthesised data is artificial data that conforms to the rules of real production data.
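Returning to the earlier example of needing 100 customers while having only 50, a generator can fill the gap. The sketch below is an assumption-laden illustration: the function `synthesise_customers`, the field names and the value ranges are invented here, whereas a real generator would take its rules from the production data.

```python
import random


def synthesise_customers(count, start_id=1, seed=0):
    """Generate `count` artificial customer records following simple rules."""
    rng = random.Random(seed)          # seeded, so every run produces the same data
    plans = ["free", "basic", "pro"]   # hypothetical rule: only these plans exist
    return [
        {
            "id": customer_id,
            "email": f"synthetic_{customer_id}@example.com",
            "plan": rng.choice(plans),
            "monthly_orders": rng.randint(0, 50),  # hypothetical valid range
        }
        for customer_id in range(start_id, start_id + count)
    ]


# We have 50 real (masked) customers and need 100 in total.
existing_count = 50
extra_customers = synthesise_customers(100 - existing_count, start_id=existing_count + 1)
```

Seeding the generator makes failures reproducible: a test that breaks on synthetic data will break on the same record the next time it runs.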

6. Transfer and Availability of Data

After all these stages, a structure should be established that gives every team that needs test data access to the data it wants. This can be manual retrieval for manual testing, or technical channels such as database access and APIs for test automation systems and developers. The point to note here is that the data should be reachable quickly and without loss. In addition, attention should be paid to security at this step.

Another point is that data synthesis can be turned into a system that runs continuously, so that automation or developers can synthesise as much data as they want at any time.

As the picture shows, we usually use four different types of data in our tests:

Types of data in data tests: No data, valid data, invalid data and illegal data format

No Data: It is used to measure the system’s response without data entry.

Valid Data: It is used to measure the system’s response with correct and format-appropriate data.

Invalid Data: It is used to measure the system’s response with data that is in the correct format but has incorrect values.

Illegal Data: It is used to measure the system’s response with data that does not fit the input format.

We should consider all four of these data types when creating our datasets.
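As a small illustration, the sketch below applies one case of each type to a login email field. Everything in it is hypothetical: the function `check_login_email`, the deliberately simple email pattern, the registered address and the result labels are assumptions made for the example.

```python
import re

# Deliberately simple pattern for illustration; not a complete email validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
REGISTERED = {"user@example.com"}  # hypothetical set of known accounts


def check_login_email(value):
    """Classify the input the way a login form might respond to it."""
    if value is None or value == "":
        return "missing"
    if not isinstance(value, str) or not EMAIL_PATTERN.match(value):
        return "bad_format"
    if value not in REGISTERED:
        return "unknown"
    return "ok"


# One test case per data type: (input, expected response).
CASES = {
    "no data": ("", "missing"),
    "valid data": ("user@example.com", "ok"),
    "invalid data": ("nobody@example.com", "unknown"),
    "illegal data": ("not-an-email", "bad_format"),
}

for label, (value, expected) in CASES.items():
    assert check_login_email(value) == expected, label
```

Keeping the four cases in a single table makes it easy to see at a glance whether a dataset covers every type.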

We have come to the end of this article. In the next article, we will go deeper into the steps of creating test data management. I can’t wait already! Thank you :)

Written by Rıfat Koçak