Planet Test Automation: First Steps in Data-Driven Testing

Dec 18, 2019

This article complements the corresponding webinar from the Planet Test Automation series we are running at Inflectra. Once the recording of the webinar is available, it will be referenced here.

Data-Driven Testing

One day a tester walks into a bar. Do you know this story? It is exactly about data-driven testing. We have a test scenario: a tester gets into a bar and orders something. The tester may get into the bar in different ways: run, squeeze, jump, and many others. And the tester may order different things. By choosing a way of getting into the bar and an order, we can run the test scenario many times and test how the bar works under different circumstances.

This joke is missing just one part — expected outputs. What should happen if someone orders a lizard in a glass?

Formal Diagram of Data-Driven Testing

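More formally, a data-driven test reads rows of input data and expected outputs from a data source, runs the same test logic for each row, and compares the actual output with the expected one. Simple yet powerful.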
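To make the schema concrete, here is a minimal Python sketch of the bar story. Everything in it is an assumption made for illustration: the `order_at_bar` function, the menu, and the expected outputs (including the choice that a lizard in a glass is refused) are hypothetical and do not come from any real system.

```python
# Hypothetical system under test: a bar that serves only drinks on its menu.
MENU = {"beer", "wine", "water"}

def order_at_bar(entry, drink):
    """Return what the bar does for a given way of entering and a given order."""
    if drink in MENU:
        return "served"
    return "refused"

# Test data: each row is (way of entering, order, expected output).
ROWS = [
    ("run", "beer", "served"),
    ("squeeze", "water", "served"),
    ("jump", "lizard in a glass", "refused"),  # assumed expectation
]

def run_data_driven_test(rows):
    """Run the same test logic once per data row and collect the failures."""
    failures = []
    for entry, drink, expected in rows:
        actual = order_at_bar(entry, drink)
        if actual != expected:
            failures.append((entry, drink, expected, actual))
    return failures

if __name__ == "__main__":
    print(run_data_driven_test(ROWS))
```

The test logic lives in `run_data_driven_test`, while `ROWS` can grow or change without touching it.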

Advantages of Data-Driven Testing

The ability to run the same test with different input data reveals the power of test automation. Manual execution of a data-driven test is not always viable, for several reasons:

- it is too slow,
- it is prone to human errors,
- it is boring as hell.

Automated tests run fast and never get tired or make mistakes.

The data-driven approach itself enables separation of test data and logic. An automation engineer can work on the test logic while testers and domain experts work on the data. Since data is not hard-coded into tests, it is easier to manage.

Complexity of Data-Driven Testing

Making a data-driven test requires some knowledge and skills. It is not rocket science, but as a test automation engineer you may need to know

- how to design a data-driven test,
- how to parameterize a test,
- how to work with files, spreadsheets and databases within a test,
- how to code,
- techniques such as equivalence class testing, boundary value testing, and pairwise testing.

The list is certainly not complete, but you get the idea.

Components of Data-Driven Testing

Let’s now dig into the details.

Data Sources

You may store data for a test in various sources. It can be a simple plain text file or a text file in CSV, XML or JSON format. You may use Excel spreadsheets or databases. Typically, data is organized into tables.
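As a small illustration of a tabular data source, the sketch below reads test rows from CSV using Python's standard `csv` module. The column names and the inline CSV text are assumptions made for the example; in practice the data would typically live in a separate file.

```python
import csv
import io

# Hypothetical CSV content; normally this would be a file stored next to the test.
CSV_TEXT = """quantity,expected
10,accepted
100,refused
qwerty,invalid
"""

def load_rows(stream):
    """Read test data rows from a CSV stream as a list of dictionaries."""
    return list(csv.DictReader(stream))

# io.StringIO stands in for an opened file here.
rows = load_rows(io.StringIO(CSV_TEXT))

if __name__ == "__main__":
    for row in rows:
        print(row["quantity"], "->", row["expected"])
```

Note that every value read from CSV is a string, so the test logic has to convert types where needed.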

Also, data may be generated on the fly. If you need unique IDs, random dates, or strings of a certain length, it is sometimes convenient to generate such data during playback of a test.
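Generating such values takes only a few lines. The following sketch uses the Python standard library; the helper names are made up for this example.

```python
import random
import string
import uuid
from datetime import date, timedelta

def unique_id():
    """Return a random 32-character hexadecimal identifier."""
    return uuid.uuid4().hex

def random_date(start, end, rng=random):
    """Return a random date between start and end, both inclusive."""
    span_days = (end - start).days
    return start + timedelta(days=rng.randint(0, span_days))

def random_string(length, rng=random):
    """Return a random string of ASCII letters with the given length."""
    return "".join(rng.choice(string.ascii_letters) for _ in range(length))

if __name__ == "__main__":
    print(unique_id())
    print(random_date(date(2000, 1, 1), date(2000, 12, 31)))
    print(random_string(12))
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which helps when a failure needs to be replayed.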

Data Origin

Data may have different origins. It can be a prebuilt table, so the same data is used every time you run a test. Or it can be a random set of values generated for each test execution. Some values may be required to be unique, for example, a transaction ID. You may need a sequence that increments a value before each test run. Sensitive data may be stored in encrypted form. All these methods may be combined within a single test; there are no constraints.
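One possible way to get a value that increments before each test run is to keep the last value in a small file. This is only a sketch under that assumption; the function name and the storage format are hypothetical, and a shared database sequence would be a more robust choice when tests run in parallel.

```python
import os
import tempfile

def next_sequence_value(path):
    """Read the last value from a file, increment it, store it, and return it."""
    last = 0
    if os.path.exists(path):
        with open(path) as f:
            last = int(f.read().strip() or 0)
    current = last + 1
    with open(path, "w") as f:
        f.write(str(current))
    return current

if __name__ == "__main__":
    # A temporary directory keeps the example from touching real files.
    counter_file = os.path.join(tempfile.mkdtemp(), "sequence.txt")
    print(next_sequence_value(counter_file))
    print(next_sequence_value(counter_file))
```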

Data Output & Comparison

If a test needs to compare actual and expected values, this can be organized in two ways: the test may compare simple values like numbers or strings individually, or it may output data to an external data store and compare it with a sample output. For example, suppose you need a test to check the data on a patient registration preview page.

Implement the test so that it reads the inner text of the enclosing div element and writes it to a file. Then compare that file with a file containing the expected content. Such a file can be obtained from a successful run of the test.
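The comparison step could look roughly like the sketch below. It works on strings so that it stays self-contained; reading the captured text and the expected content from files is left out. The function name and the sample text are assumptions, and trimming whitespace before comparing is a design choice that may not suit every page.

```python
import difflib

def diff_against_expected(actual_text, expected_text):
    """Return unified-diff lines; an empty list means the texts match."""
    # Trim each line so that indentation changes in the page do not fail the test.
    actual = [line.strip() for line in actual_text.strip().splitlines()]
    expected = [line.strip() for line in expected_text.strip().splitlines()]
    return list(difflib.unified_diff(expected, actual, "expected", "actual", lineterm=""))

if __name__ == "__main__":
    expected = "Name: John Smith\nGender: Male\n"
    actual = "Name: John Smith\nGender: Female\n"
    for line in diff_against_expected(actual, expected):
        print(line)
```

Returning the diff rather than a plain boolean makes a failed run easier to investigate.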

Data Usage Scenarios

Input data is useful for many purposes. One is to populate the system with records that should exist prior to test execution. This is called data seeding. For example, if you plan to check that search is working correctly, you need to have some data in your system. Data seeding is usually performed at the back-end level, for example, by running a database script or by adding records via a REST/SOAP API. Sometimes setting up initial data via the UI is also a viable approach. Data seeding goes hand in hand with data cleanup procedures.
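The seed, test, cleanup cycle can be sketched with an in-memory SQLite database. The `patients` table, its columns, and the sample records are hypothetical and are not taken from any real application schema.

```python
import sqlite3

# Hypothetical records that must exist before a search test runs.
PATIENTS = [("John", "Smith"), ("Jane", "Doe")]

def seed(conn, rows):
    """Create the table if needed and insert the seed records."""
    conn.execute("CREATE TABLE IF NOT EXISTS patients (given_name TEXT, family_name TEXT)")
    conn.executemany("INSERT INTO patients VALUES (?, ?)", rows)
    conn.commit()

def search(conn, family_name):
    """Stand-in for the feature under test: find given names by family name."""
    cur = conn.execute(
        "SELECT given_name FROM patients WHERE family_name = ?", (family_name,)
    )
    return [row[0] for row in cur.fetchall()]

def cleanup(conn):
    """Remove seeded records so the next run starts from a known state."""
    conn.execute("DELETE FROM patients")
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    seed(conn, PATIENTS)
    print(search(conn, "Doe"))
    cleanup(conn)
    print(search(conn, "Doe"))
```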

It is also important to check a system not only with correct data but also with invalid input. A system must process invalid data in a consistent way and display relevant error messages to the user. So be prepared to generate invalid data for your tests.

Data Selection

UI tests are slow compared to unit tests. If we are writing a unit test for an Add function, we may execute it a million times with different valid and invalid parameters. This is not viable in UI testing. We should select data for our data-driven test cases carefully. It must be representative and concise. Several simple techniques may help with data selection.

Equivalence Class Testing

If we can split the whole set of possible values for some input parameter into subsets, and a system must behave in the same way for every value from a specific subset (and differently for values from different subsets), then such subsets are called equivalence classes. To test the system, we should take at least one value from each class.

Let’s return to our bar where a tester orders beer. Imagine that the bar should work this way:

- if someone orders 1–50 beers: take the order,
- if someone orders 51+ beers: refuse the order (are you kidding?),
- any other input value is invalid (better have some sleep).

In this case, for example, we may run the test with 10, 100 and qwerty. All three input values belong to different classes of equivalence.
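The bar rules and the three representative values can be written down as a short sketch. The function `handle_beer_order` and the result labels are assumptions made for illustration.

```python
def handle_beer_order(raw_quantity):
    """Hypothetical bar logic: accept 1-50 beers, refuse 51 or more, else invalid."""
    try:
        quantity = int(raw_quantity)
    except (TypeError, ValueError):
        return "invalid"
    if 1 <= quantity <= 50:
        return "accepted"
    if quantity >= 51:
        return "refused"
    return "invalid"

# One representative input per equivalence class, with the expected result.
EQUIVALENCE_ROWS = [
    ("10", "accepted"),
    ("100", "refused"),
    ("qwerty", "invalid"),
]

if __name__ == "__main__":
    for raw, expected in EQUIVALENCE_ROWS:
        actual = handle_beer_order(raw)
        print(raw, expected, "OK" if actual == expected else "FAIL")
```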

Boundary Value Testing

Let’s expand this example to edge cases. This is where bugs like to hide. We will additionally use 1, 50, 51, 1000000, 0 and -1. This is called boundary value testing. We take border values of our classes of equivalence.
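For the bar example, the boundary rows might look like the sketch below. The `classify` function restates the assumed bar rules for integer input, and `boundary_values` is a hypothetical helper that returns the values on and just outside both edges of a range.

```python
def classify(quantity):
    """Hypothetical bar rule for integer input: 1-50 accepted, 51+ refused, else invalid."""
    if 1 <= quantity <= 50:
        return "accepted"
    if quantity >= 51:
        return "refused"
    return "invalid"

def boundary_values(low, high):
    """Return the values on and just outside both edges of the closed range [low, high]."""
    return [low - 1, low, high, high + 1]

# Boundary rows from the text, with expected results under the assumed rules.
BOUNDARY_ROWS = [
    (-1, "invalid"),
    (0, "invalid"),
    (1, "accepted"),
    (50, "accepted"),
    (51, "refused"),
    (1000000, "refused"),
]

if __name__ == "__main__":
    print(boundary_values(1, 50))
    for quantity, expected in BOUNDARY_ROWS:
        print(quantity, classify(quantity) == expected)
```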

Pairwise Testing

Imagine that a test takes 3 input parameters and each parameter has 4 different values. If we want to check all combinations of input values, we end up with 4 × 4 × 4 = 64 data rows. But practice shows that most bugs are revealed by a single parameter value or by the values of a pair of parameters. There is a combinatorial technique called pairwise testing that lets us generate a set of data rows containing all possible combinations of values for every pair of parameters. Such a set is substantially smaller than the set of all combinations. In our example it has just 16 rows.
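The arithmetic can be checked with a short sketch. For three parameters with four values each, the full Cartesian product has 4 × 4 × 4 = 64 rows, while 16 rows are enough to cover every pair of values. The construction used here, rows of the form (a, b, (a + b) mod 4), is a simple Latin-square trick chosen for illustration; it is an assumption of this example and not the algorithm of any particular pairwise tool, which would also handle parameters with different numbers of values.

```python
from itertools import combinations, product

def pairwise_rows(n=4):
    """Build n*n rows (a, b, (a + b) % n) that cover all value pairs of three parameters."""
    return [(a, b, (a + b) % n) for a in range(n) for b in range(n)]

def covers_all_pairs(rows, n=4, params=3):
    """Check that every pair of parameters takes all n*n value combinations."""
    for i, j in combinations(range(params), 2):
        seen = {(row[i], row[j]) for row in rows}
        if len(seen) != n * n:
            return False
    return True

# Values are encoded as indices 0..3; map them to real test values as needed.
all_rows = list(product(range(4), repeat=3))

if __name__ == "__main__":
    rows = pairwise_rows()
    print(len(all_rows), len(rows), covers_all_pairs(rows))
```

Since any two parameters alone already need 4 × 4 = 16 rows, no pairwise set for this example can be smaller.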

Boundary value and pairwise testing substantially increase the number of data values to check. Therefore, they should be used carefully.

Final Notes

There is a temptation to generate a lot of data and run test cases with all input parameter combinations. Try not to overuse data-driven testing, and keep the number of data rows to the minimum you need for reasonable test coverage.

Also, it is good practice to run a test case with a single row of data before looping through all table rows. Make sure the system under test works in general, and only then run data-driven tests.
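One way to follow this advice in code is to treat the first row as a smoke check and continue only if it passes. The sketch below is a hypothetical runner; `check` stands for whatever function executes the test logic for one row and returns True on success.

```python
def run_with_smoke_row(rows, check):
    """Run check on the first row; loop over the remaining rows only if it passes."""
    if not rows:
        return {"smoke_passed": False, "failures": []}
    first, rest = rows[0], rows[1:]
    if not check(first):
        # The system does not work in general, so running every row is pointless.
        return {"smoke_passed": False, "failures": [first]}
    failures = [row for row in rest if not check(row)]
    return {"smoke_passed": True, "failures": failures}

if __name__ == "__main__":
    rows = [(1, 1), (2, 2), (2, 3)]
    print(run_with_smoke_row(rows, lambda row: row[0] == row[1]))
```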

It may be OK to order (-1) beers but a lizard in a glass may be too much overhead for UI testing.

Demo

This section illustrates the concepts outlined above with the help of Rapise, the test automation tool we develop at Inflectra.

OpenMRS is our application under test. You may use the online demo or download the standalone edition and simply run it with a click of a mouse. The only prerequisite is to have Java installed on your machine.