Basics of Software Testing

Published in CodeX, Sep 18, 2021 (11 min read)

A software bug is a defect in a computer program that causes it to break down completely, behave in unexpected ways, or produce erroneous results. Bugs are an inherent part of software development and can cause catastrophic damage in critical systems. According to a study done by Cambridge University in 2013, the annual global cost of software errors was 312 billion USD, while the size of the software industry in 2013 was approximately 407 billion USD; in other words, the cost of bugs was roughly three-quarters of the size of the entire industry. Software testing, debugging, and verification are crucial for producing clean, almost bug-free software. In this article, we will dive deep into the world of software testing.

What is Software Testing?

Some famous Software Bugs

Before understanding software testing, let us look at some examples of software bugs that resulted in significant losses.

Ariane 5 Rocket: The Ariane 5 rocket veered off course and self-destructed about 40 seconds after liftoff on its maiden flight. It reused the guidance system of Ariane 4, but the flight trajectory of Ariane 5 was different, and the combination was never properly system-tested. A conversion from a 64-bit floating-point value to a 16-bit integer overflowed and raised an exception, which brought down the guidance system and led to the loss of the rocket. The European Space Agency had spent 10 years and 7 billion USD to produce Ariane 5.
Pentium Floating Point Bug: A flaw in early Pentium processors caused floating-point division to return incorrect results for certain operands. The bug was triggered rarely (about 1 in 9 billion divisions), but it was present. Intel spent 475 million USD to correct it.
Mars Climate Orbiter: Contact with the Mars Climate Orbiter was lost in 1999 while it was trying to enter orbit around Mars; it approached the planet at far too low an altitude and was destroyed. The cause was a mismatch in units (imperial versus metric) between software systems, which went undetected due to a lack of thorough integration testing.
Therac-25 Radiotherapy Machine: A software bug caused the entered radiation level to be ignored, so patients received massive overdoses, which resulted in the death of 3 patients.
Toyota Unintended Acceleration: Bugs in the electronic throttle control system caused cars to accelerate on their own. This may have resulted in approximately 89 deaths in accidents, and 8 million vehicles were recalled as a result.

So, what is Software Testing?

Defects in software can come from several sources: incomplete requirements, design flaws, programming errors, defects in third-party tools, etc. To ensure that a program does what we want it to do, it needs to be tested thoroughly, and the code needs to be reviewed and formally verified.

The first part of this process is software testing: evaluating software by observing its execution, that is, running a program on chosen inputs and checking whether the corresponding outputs are correct, in order to find failures and thereby improve the software.

How does a bug result in failure?

A bug infects the program state during execution. As the infected state propagates, it may cause a failure. In short, a bug leads to an infection, which propagates and may end in a failure of the system. Some failures are apparent, such as wrong output, non-termination, a crash, or a freeze. However, in most cases, failures are not apparent. What constitutes a failure can be ambiguous, which is why we always need a specification: a failure can be defined only when the program has a specification. The specification of a program must be a detailed description of what the program should do, and a bug shows up as a failure to meet that specification.

For example, for a function that takes an array of integers as input and returns a sorted array, the specification would be:

The function requires (input): a non-null array of integers as an input.

The function ensures (returns): a permutation of the input array that is sorted.
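The "ensures" clause above can be turned into an executable check. The following is only a minimal sketch in Python; the names meets_sort_spec and buggy_sort are hypothetical and chosen for illustration.

```python
from collections import Counter


def meets_sort_spec(original, result):
    """Check the 'ensures' clause: result is a sorted permutation of original."""
    # Same elements with the same multiplicities means "permutation".
    is_permutation = Counter(original) == Counter(result)
    # Every element is <= its successor means "sorted" (ascending).
    is_sorted = all(result[i] <= result[i + 1] for i in range(len(result) - 1))
    return is_permutation and is_sorted


def buggy_sort(values):
    # Hypothetical faulty implementation: sorts, but silently drops duplicates.
    return sorted(set(values))


print(meets_sort_spec([3, 1, 2], sorted([3, 1, 2])))      # True
print(meets_sort_spec([3, 1, 3], buggy_sort([3, 1, 3])))  # False
```

Note that "sorted" alone is not enough: the output of buggy_sort is sorted, yet it violates the specification because it is not a permutation of the input.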

The Contract Metaphor: Design By Contract

Bertrand Meyer coined the term Design by Contract (also known as contract programming) for an approach to designing software. A “contract” is the preferred specification metaphor for procedural and object-oriented programming languages. The principles of contract programming are similar to those of a legal contract between a client and a supplier in real life. The supplier (the implementer of a method/function) guarantees to deliver a result to the client (the implementer of the calling method, for example a main() function), provided that the client follows the contract (one or more pairs of requires/ensures clauses defining the mutual obligations of supplier and client).

A contract can be defined as follows: Let F() be a function. The specification for function F() requires some pre-condition and ensures some post-condition. If a caller (client) of F() fulfills the required pre-condition, then the callee (supplier) of F() ensures that the post-condition holds after F() finishes.

Therefore, a method/function fails when it is called in a state fulfilling the required pre-condition of its contract but does not terminate in a state fulfilling the post-condition to be ensured. Conversely, a method/function is correct if, whenever it is started in a state fulfilling the required pre-condition, it terminates in a state fulfilling the post-condition to be ensured.
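As a rough illustration, a contract can be checked at run time. The sketch below assumes a small hand-written decorator named contract and an example function integer_sqrt; both are hypothetical and not part of any particular library.

```python
import functools


def contract(requires, ensures):
    """Attach a pre-condition (requires) and a post-condition (ensures)."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args):
            # Client's obligation: the pre-condition must hold on entry.
            if not requires(*args):
                raise ValueError("pre-condition violated by the caller")
            result = func(*args)
            # Supplier's obligation: the post-condition must hold on exit.
            if not ensures(result, *args):
                raise AssertionError("post-condition violated: the function failed")
            return result
        return wrapper
    return decorate


@contract(
    requires=lambda n: isinstance(n, int) and n >= 0,
    ensures=lambda r, n: r * r <= n < (r + 1) * (r + 1),
)
def integer_sqrt(n):
    """Largest integer r with r * r <= n."""
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r


print(integer_sqrt(10))  # 3
```

A ValueError here points at the caller, while an AssertionError points at the implementation, which mirrors the split of obligations between client and supplier.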

Is Testing just a bunch of test cases?

Essentially, testing is all about test cases. Finding good test cases, and sufficiently many of them, is hard, so we often use coverage criteria to judge whether we have enough. Even a good set of test cases cannot exclude all failures. Testing involves designing test inputs, running tests, analyzing results, and reporting results to developers.

Testing Levels based on Software Activity: V-Model

There are different levels of testing when it comes to software that consumers use:

Acceptance Testing: Assess software with respect to user requirements.
System Testing: Assess software with respect to the system-level specification.
Integration Testing: Assess software with respect to high-level design.
Unit Testing: Assess software with respect to low-level design.

Along with the above testing levels, Regression Testing is also essential. Regression Testing is testing that is done after changes in the software, and it is a standard part of the maintenance phase of software development. The purpose of Regression Testing is to ensure that the changes did not cause failures.

Failures on higher levels are challenging to debug, as propagation from bug to failure is difficult to trace back to the source. Hence, Unit Testing forms the very base for Software Testing, and while developing any software, thorough unit testing is crucial.

Test Cases, Test Sets, and Test Suites

A Test Case is a tuple (Method, Input, Output), where Method is the method under test, Input is a tuple (Parameters, Initial State) of the call parameters and the initial state, and Output is a function on the return value and final state that tells whether they comply with the correct behavior. In simple words, a test case consists of initialization, a call to the method under test, and a decision whether the test succeeds or fails.
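The tuple view of a test case can be written down almost literally. This is a minimal sketch, assuming the state is passed to the method as its first argument; the names UnitTestCase, oracle, and push are hypothetical.

```python
import copy
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class UnitTestCase:
    method: Callable[..., Any]          # Method under test
    parameters: tuple                   # Input: call parameters
    initial_state: Any                  # Input: initial state
    oracle: Callable[[Any, Any], bool]  # Output: judges return value and final state

    def run(self):
        # Initialization: work on a copy so the test case can be rerun.
        state = copy.deepcopy(self.initial_state)
        # Call to the method under test.
        result = self.method(state, *self.parameters)
        # Decision: does the outcome comply with the correct behavior?
        return self.oracle(result, state)


def push(stack, item):
    """Hypothetical method under test: appends item, returns the new size."""
    stack.append(item)
    return len(stack)


case = UnitTestCase(
    method=push,
    parameters=(7,),
    initial_state=[1, 2],
    oracle=lambda result, state: result == 3 and state == [1, 2, 7],
)
print(case.run())  # True
```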

A Test Set consists of several test cases, and a Test Suite consists of corresponding test sets for different methods.

Automated Testing Tools

Several tools provide automated and repeatable testing (Jasmine for JavaScript, JUnit for Java, PyTest for Python are some examples). Using such tools, an extensive collection of tests can be run automatically, and the testing code can be integrated into the source code, thus enabling unit testing in an organized way. After debugging, the tests can be rerun to check if the failure has been eliminated or not, and regression testing can be done as well.
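For instance, with PyTest a test is just a function whose name starts with test_ and that uses plain assert statements. The sketch below assumes a hypothetical unit under test, is_leap_year, kept in the same file for brevity.

```python
def is_leap_year(year):
    """Hypothetical unit under test: the Gregorian leap-year rule."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def test_year_divisible_by_four_is_leap():
    assert is_leap_year(2024)


def test_ordinary_year_is_not_leap():
    assert not is_leap_year(2023)


def test_century_is_not_leap():
    assert not is_leap_year(1900)


def test_year_divisible_by_400_is_leap():
    assert is_leap_year(2000)
```

If this is saved in a file such as test_leap_year.py, running pytest in that directory collects and runs the four tests; rerunning the same command after every change gives a simple form of regression testing.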

Such tools often support the Extreme Programming paradigm, which involves writing the test cases first, before writing the actual code, and doing regression testing after every incremental change. Extreme Programming relies heavily on unit and acceptance testing. This has several benefits: developers gain confidence that the code will meet the specifications, and they understand the specifications and requirements better.

Incremental Testing using Automated Testing Tools

Testing a unit using these tools may often require Stubs and Drivers. Stubs simulate the behavior of components not yet developed, and Drivers simulate the environment from where the procedure is called. Stubs are required to replace called procedures, and Drivers are required to replace calling procedures.
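To make the two roles concrete, here is a small sketch. The unit total_price, the stub StubTaxService, and the driver function are all hypothetical names invented for this example.

```python
class StubTaxService:
    """Stub: replaces a called component that is not yet developed."""

    def rate_for(self, country):
        # Canned answer instead of a real tax lookup.
        return 0.25


def total_price(net_prices, country, tax_service):
    """Unit under test: depends on a tax service it calls."""
    rate = tax_service.rate_for(country)
    return sum(net_prices) * (1 + rate)


def driver():
    """Driver: replaces the calling code that would normally invoke the unit."""
    result = total_price([10, 20], "XX", StubTaxService())
    return result == 37.5


print(driver())  # True
```

The stub sits below the unit in the call hierarchy and the driver sits above it, which is why top-down testing needs stubs and bottom-up testing needs drivers.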

Top-Down Incremental Testing

In Top-Down testing, we test the main procedure first and then go down the call hierarchy. This kind of testing requires stubs but no drivers. Top-Down testing is advantageous if major bugs occur towards the top level. However, it may tempt developers to defer testing specific modules, especially those whose stubs are challenging to produce.

Bottom-Up Incremental Testing

In Bottom-Up testing, we test the leaves of the call hierarchy first and then move up to the root. A procedure is not tested until all of its children (the procedures it calls) have been tested. This kind of testing requires drivers but no stubs. Bottom-Up testing is advantageous if major bugs occur towards the bottom level. However, this type of testing may be complex, since the program as a whole does not exist until all the units are added.

Coverage Criteria

How do we know if we have enough tests? Exhaustive testing is impossible in general. Coverage criteria help us determine how much of the code is exercised by a set of tests. Most metrics used as quality criteria for test suites describe the degree of some kind of coverage, and these metrics are called Coverage Criteria. They are essential for testing safety-critical software. Let us look at different categories of coverage criteria:

Control Flow Graph Coverage

In Control Flow Graph Coverage, the program is represented as a graph. Every statement is represented by a node, and edges describe the control flow between statements. Edges can be constrained by conditions.

An Execution Path is a path through the control flow graph that starts at the entry point and is either infinite or ends at one of the exit points.

There are three ways that we can describe coverage in a control flow graph:

Statement Coverage is satisfied by a test suite if and only if for every node n in the control flow graph there is at least one test in the test suite causing an execution path through n.
Branch Coverage is satisfied by a test suite if and only if for every edge e in the control flow graph there is at least one test in the test suite causing an execution path through e. Branch Coverage subsumes Statement Coverage.
Path Coverage is satisfied by a test suite if and only if for every execution path ep of the control flow graph there is at least one test in the test suite causing ep. Path Coverage subsumes Branch Coverage. Note that Path Coverage generally cannot be achieved in practice, because loops can give rise to an unbounded number of execution paths.
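The difference between Statement Coverage and Branch Coverage shows up as soon as an if has no else. The sketch below instruments a tiny function by hand; the node names, the path argument, and the coverage helper are assumptions made for illustration, not how real coverage tools work internally.

```python
# Control flow graph of absolute(): three nodes and three edges.
NODES = {"test", "negate", "return"}
EDGES = {("test", "negate"), ("negate", "return"), ("test", "return")}


def absolute(x, path):
    """Absolute value, recording the visited nodes in path."""
    path.append("test")
    if x < 0:
        path.append("negate")
        x = -x
    path.append("return")
    return x


def coverage(inputs):
    """Return (statement coverage reached, branch coverage reached)."""
    seen_nodes, seen_edges = set(), set()
    for x in inputs:
        path = []
        absolute(x, path)
        seen_nodes.update(path)
        # Consecutive nodes on the path are the edges that were taken.
        seen_edges.update(zip(path, path[1:]))
    return seen_nodes == NODES, seen_edges == EDGES


print(coverage([-3]))     # (True, False)
print(coverage([-3, 4]))  # (True, True)
```

The single input -3 visits every node but never takes the edge that skips the negation, so it reaches Statement Coverage without Branch Coverage.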

Logic Coverage

Logical expressions can come from many sources in a program: if statements, while statements, etc. Logic Coverage can be divided into three types of coverage criteria: Decision Coverage, Condition Coverage, and Modified Condition Decision Coverage.

Decision Coverage is satisfied by a test suite for a given decision d if it contains at least two tests, one where d evaluates to false and one where d evaluates to true. For a given program, Decision Coverage is satisfied by a test suite if it satisfies Decision Coverage for all decisions in the program.

Condition Coverage is satisfied for a given condition c by a test suite if it contains at least two tests, one where c evaluates to false and one where c evaluates to true. Conditions are the boolean sub-expressions present in decisions. For a given program, Condition Coverage is satisfied by a test suite if it satisfies Condition Coverage for all conditions in the program. Note that Condition Coverage does not imply Decision Coverage or vice versa. For example, for the decision (a and b), the two tests (a = true, b = false) and (a = false, b = true) satisfy Condition Coverage, but the decision evaluates to false in both, so Decision Coverage is not satisfied.

Modified Condition Decision Coverage (MCDC) is satisfied for a given condition c in decision d by a test suite if it contains at least two tests, one where c evaluates to false and one where c evaluates to true, such that the decision d evaluates differently in the two tests while all other conditions in d evaluate identically in both. In other words, each condition must be shown to affect the outcome of the decision on its own. For a given program, Modified Condition Decision Coverage is satisfied by a test suite if it satisfies Modified Condition Decision Coverage for all conditions in the program. MCDC is required by some industrial certification standards, for example DO-178C for the most critical avionics software.
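The three criteria can be compared on one small decision. The following sketch hard-codes a hypothetical decision (a and b) with two conditions and checks each criterion for a given list of tests; the function names are assumptions made for this example.

```python
def decision(a, b):
    """Hypothetical decision with two conditions, a and b."""
    return a and b


def decision_coverage(tests):
    # The decision must evaluate to both True and False.
    return {decision(a, b) for a, b in tests} == {True, False}


def condition_coverage(tests):
    # Each condition (position 0 is a, position 1 is b) must take both values.
    return all({t[i] for t in tests} == {True, False} for i in range(2))


def mcdc(tests):
    # For each condition, find two tests that differ only in that condition
    # and for which the decision evaluates differently.
    for i in range(2):
        found = any(
            t1[i] != t2[i]
            and t1[1 - i] == t2[1 - i]
            and decision(*t1) != decision(*t2)
            for t1 in tests
            for t2 in tests
        )
        if not found:
            return False
    return True


suite_1 = [(True, False), (False, True)]
suite_2 = [(True, True), (False, True)]
suite_3 = [(True, True), (True, False), (False, True)]

print(condition_coverage(suite_1), decision_coverage(suite_1))  # True False
print(condition_coverage(suite_2), decision_coverage(suite_2))  # False True
print(mcdc(suite_3))                                            # True
```

The third suite needs only three tests for two conditions, which is typical of MCDC: it is much cheaper than trying all combinations of condition values.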

Input Space Partitioning

Ultimately all testing is about choosing elements from the input space. Input Space Partitioning considers this quite literally; the input space is partitioned into regions assumed to contain “equally useful values,” and test cases contain values from each region.

Partitioning of a domain defines a set of blocks such that the blocks are pairwise disjoint (there is no overlap between any of them), and together the blocks cover the entire domain. Usually, different partitionings are combined.

After partitioning, there are several strategies to choose values from the blocks. Some strategies include sub-partitioning blocks, exploring boundary conditions, and including valid, invalid, and unique values. Usually, the process for Input Space Partitioning starts with looking at the specification, then dividing input space into regions for which the program acts similarly. Then some inputs are taken from the regions, mainly the borders.
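As a sketch of this process, consider a hypothetical function classify whose (assumed) specification says that scores from 0 to 49 fail, scores from 50 to 100 pass, and everything else is invalid. The blocks and test inputs below are illustrative choices, not the only possible ones.

```python
def classify(score):
    """Hypothetical unit under test."""
    if score < 0 or score > 100:
        return "invalid"
    return "pass" if score >= 50 else "fail"


# Partitioning of the integers: pairwise disjoint blocks that cover the domain.
# Each block maps to (membership test, expected result for the whole block).
BLOCKS = {
    "below range": (lambda s: s < 0, "invalid"),
    "fail": (lambda s: 0 <= s <= 49, "fail"),
    "pass": (lambda s: 50 <= s <= 100, "pass"),
    "above range": (lambda s: s > 100, "invalid"),
}

# One interior value per block plus the values at the borders.
TEST_INPUTS = [-10, -1, 0, 25, 49, 50, 75, 100, 101, 500]


def check_partition():
    """Return True if every test input behaves like its block predicts."""
    for score in TEST_INPUTS:
        matching = [name for name, (contains, _) in BLOCKS.items() if contains(score)]
        # Exactly one block must contain the value (disjoint and complete).
        if len(matching) != 1:
            return False
        expected = BLOCKS[matching[0]][1]
        if classify(score) != expected:
            return False
    return True


print(check_partition())  # True
```

Most of the chosen inputs sit on block borders (-1, 0, 49, 50, 100, 101), since off-by-one mistakes in comparisons tend to show up there.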

Independent Path Coverage

Independent Path Coverage consists of designing test cases such that every linearly independent path in the program is executed at least once. A linearly independent path is any execution path through the program that contains at least one edge not present in the other paths considered so far.

For simple programs, it is pretty easy to identify linearly independent paths. However, for complicated programs, it is not so easy to determine the number of independent paths. McCabe’s Cyclomatic Complexity helps us find an upper bound for the number of linearly independent paths of a program.

Given a control flow graph G, McCabe’s Cyclomatic Complexity V(G) is defined as:

V(G) = E - N + 2

where N is the number of nodes in G and E is the number of edges in G.
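The formula is easy to evaluate once the control flow graph is written down as a list of edges. The graph below is a made-up example (an if/else followed by a while loop) with a single entry and a single exit, which is the setting this form of the formula assumes.

```python
def cyclomatic_complexity(edges):
    """V(G) = E - N + 2 for a connected control flow graph given as edges."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2


# Hypothetical control flow graph:
#   A: if condition, B: then branch, C: else branch,
#   D: while condition, E: loop body, F: exit
cfg = [
    ("A", "B"), ("A", "C"),  # if / else
    ("B", "D"), ("C", "D"),  # both branches join at the loop condition
    ("D", "E"), ("E", "D"),  # loop body and back edge
    ("D", "F"),              # loop exit
]

print(cyclomatic_complexity(cfg))  # 3
```

Here E = 7 and N = 6, so V(G) = 7 - 6 + 2 = 3, which matches the usual rule of thumb of "number of decisions plus one" (one if and one while).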

Black-Box Testing and White-Box Testing

Black-Box Testing consists of deriving test suites from external descriptions of the software, and no knowledge of the source code is required.

White-Box Testing consists of deriving test suites from the source code of the software.

Mutation Testing

Mutation Testing is another way to find out if we have enough test cases. It involves making small random changes (mutations) to the function/method under test. The resulting mutant is incorrect most of the time, so at least one test should fail on it. If a test fails (it “kills” the mutant), the test suite has done its job; if no test can kill the mutant, a test case is missing.
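A hand-made example shows the idea. Real mutation testing tools generate mutants automatically; in this sketch the mutant is written by hand, and the names is_adult, mutant_is_adult, weak_tests, and strong_tests are hypothetical.

```python
def is_adult(age):
    """Original function under test."""
    return age >= 18


def mutant_is_adult(age):
    """Mutant: the operator >= has been replaced by >."""
    return age > 18


def weak_tests(f):
    """Test set without a boundary value; returns True if all tests pass."""
    return f(30) is True and f(5) is False


def strong_tests(f):
    """Same tests plus the boundary value 18."""
    return weak_tests(f) and f(18) is True


print(weak_tests(is_adult), weak_tests(mutant_is_adult))      # True True
print(strong_tests(is_adult), strong_tests(mutant_is_adult))  # True False
```

The weak test set passes on both the original and the mutant, so the mutant survives and reveals the missing test for age 18; the strong test set kills it.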

Conclusion

Software Testing is a vast field, and many advancements have happened in automated testing. Since testing plays a crucial role in the quality of any commercial software, especially safety-critical software, it is important to follow rigorous testing procedures. In this article, we dived into the world of software testing and covered the basics. In the next part, we will look at Software Debugging.

References:

Cambridge Study: http://www.prweb.com/releases/2013/1/prweb10298185.htm
ARIANE 5 Rocket: https://www-users.cse.umn.edu/~arnold/disasters/ariane5rep.html
Pentium Bug: https://en.wikipedia.org/wiki/Pentium_FDIV_bug
Mars Climate Orbiter Bug: https://www.simscale.com/blog/2017/12/nasa-mars-climate-orbiter-metric/
Therac-25 Radiotherapy Machine Bug: http://users.csc.calpoly.edu/~jdalbey/SWE/Papers/THERAC25.html
Toyota Unintended Acceleration Bug: https://users.ece.cmu.edu/~koopman/pubs/koopman14_toyota_ua_slides.pdf