Understanding Pairwise Test Generation

Jan 3, 2012 | Blogs

Combinatorial test data generators produce data tables for testing. The most basic and most commonly used strategy is known variously as pairwise testing, all-pairs testing, covering arrays or Taguchi designs. (The term orthogonal array is also sometimes used, but it actually refers to a method for designing statistical experiments and is subtly different.)

Pairwise testing tries to alleviate the following practical problem: your system has ten configuration parameters, and every configuration parameter has ten different, interesting values. How can you test that your system behaves correctly with all the different configurations?

Because there are 10 billion different configurations, you cannot test them all. The proposal of pairwise testing is that it is enough to test every possible value combination for any two of the ten variables (this is where the name all-pairs testing comes from). A rough calculation shows that there are 45 pairs of configuration variables, and for every pair there are one hundred (ten times ten) different value combinations, so 4,500 tests should suffice. This is obviously an improvement over 10 billion, but all the pairs can actually be covered by a much smaller test set, because every single test sets all ten variables and therefore covers 45 pairs at the same time.
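
The arithmetic above is easy to check with a few lines of Python. This is only a sketch of the counting argument; the variable names are made up for illustration. The last figure is a simple lower bound: any two variables on their own already need 100 distinct rows, so no pairwise suite for this system can be smaller than that.

```python
from math import comb

num_params = 10        # configuration parameters
values_per_param = 10  # interesting values per parameter

exhaustive = values_per_param ** num_params        # every full configuration
param_pairs = comb(num_params, 2)                  # ways to choose two parameters
value_pairs = param_pairs * values_per_param ** 2  # value pairs that must be covered
pairs_per_test = param_pairs                       # one test fixes all ten parameters
lower_bound = values_per_param ** 2                # two parameters alone need this many rows

print(exhaustive)      # 10000000000
print(param_pairs)     # 45
print(value_pairs)     # 4500
print(pairs_per_test)  # 45
print(lower_bound)     # 100
```

The gap between 4,500 and 100 is the room a pairwise generator has to work with. How close a real tool gets to the lower bound depends on the tool and on the model.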

The theoretical basis for pairwise testing is what is known as the coupling effect. This is the practical hypothesis that software faults can be discovered by relatively simple tests. The coupling effect was studied by, among others, Jefferson Offutt, who wrote in 1992 that “the major conclusion from this investigation is the fact that by explicitly testing for simple faults, we are also implicitly testing for more complicated faults” (J. Offutt: Investigations of the software testing coupling effect).

The connection to pairwise testing is this: the coupling effect hypothesis suggests that if a fault manifests with a specific setting of configuration variables, it is most likely caused by only a small subset of those variable values.

Of course, there is no reason why coupling exactly two variables would always be the best strategy. A natural extension of pairwise testing is to cover not only pairs but also triples, quartets and so on. Doing this uniformly for all small subsets of data variables is not necessarily worthwhile, so advanced combinatorial data generation tools allow users to define the “strength” of data combination individually for different data variables and their combinations.
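
To get a feel for why strength is worth tuning, the following sketch counts how many value combinations a table must contain at a given strength. The function name `tuples_to_cover` is hypothetical and not taken from any particular tool; it simply sums, over every group of `strength` variables, the product of their domain sizes.

```python
from itertools import combinations
from math import prod

def tuples_to_cover(domain_sizes, strength):
    """Count the value combinations a table of the given strength must contain."""
    return sum(prod(group) for group in combinations(domain_sizes, strength))

sizes = [10] * 10  # ten variables with ten values each
for strength in (2, 3, 4):
    print(strength, tuples_to_cover(sizes, strength))
# 2 4500
# 3 120000
# 4 2100000
```

The count grows quickly with strength, which is one reason to raise the strength only for the variable groups where interactions are most likely to matter.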

Combinatorial data generation is a very good way to generate discrete test data tables. Some people do not consider it to be a model-based testing approach at all because there does not seem to be any real model of the system anywhere. However, it is a prominent approach for automated test design and the input to a combinatorial test data generator can be seen as a special purpose model that models the space of inputs the system can accept.

Sometimes other types of model-based testing tools include combinatorial data generation capabilities. For example, a finite state machine path generator can generate test cases that then pick their actual data inputs from a data table generated with a combinatorial test data generator.

A combinatorial data generation input for testing multi-platform DVD-playing software could look like this:

Parameter        | Values
-----------------|------------------------------------------
Platform         | Windows XP, Windows Vista, Windows 7, Mac
Word length      | 32-bit, 64-bit
Hardware         | DVD, DVD-RW, Blu-ray
External display | None, VGA, HDMI

There are 4 x 2 x 3 x 3 = 72 different combinations of the testing parameters, but a pairwise test generator can create a smaller test suite that covers all the pairs.
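
As a rough illustration, the same input can be written down as a Python dictionary and the relevant counts computed from it. The dictionary layout is an assumption made for this sketch, not the input format of any specific generator.

```python
from itertools import combinations
from math import prod

parameters = {
    "Platform": ["Windows XP", "Windows Vista", "Windows 7", "Mac"],
    "Word length": ["32-bit", "64-bit"],
    "Hardware": ["DVD", "DVD-RW", "Blu-ray"],
    "External display": ["None", "VGA", "HDMI"],
}

sizes = [len(values) for values in parameters.values()]

exhaustive = prod(sizes)                                        # all full combinations
pairs_to_cover = sum(a * b for a, b in combinations(sizes, 2))  # value pairs over all column pairs
largest, second = sorted(sizes, reverse=True)[:2]
min_rows = largest * second                                     # the two largest columns need this many rows

print(exhaustive, pairs_to_cover, min_rows)  # 72 53 12
```

So there are 53 value pairs to cover, and no pairwise table for this input can have fewer than 12 rows, because Platform combined with either three-valued parameter already has 12 combinations.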

This example is small enough to do by hand. Start with a data table that contains all the combinations of two of the longest rows, Platform and Hardware:

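Platform      | Word length | Hardware | External display
--------------|-------------|----------|-----------------
Windows XP    |             | DVD      |
Windows XP    |             | DVD-RW   |
Windows XP    |             | Blu-ray  |
Windows Vista |             | DVD      |
Windows Vista |             | DVD-RW   |
Windows Vista |             | Blu-ray  |
Windows 7     |             | DVD      |
Windows 7     |             | DVD-RW   |
Windows 7     |             | Blu-ray  |
Mac           |             | DVD      |
Mac           |             | DVD-RW   |
Mac           |             | Blu-ray  |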
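
This first step is just a Cartesian product of the two chosen columns. A minimal sketch, with the other two columns left empty for now (the dictionary-per-row layout is an assumption made for illustration):

```python
from itertools import product

platforms = ["Windows XP", "Windows Vista", "Windows 7", "Mac"]
hardware = ["DVD", "DVD-RW", "Blu-ray"]

# One row per Platform/Hardware combination; the remaining columns are filled in later.
rows = [
    {"Platform": p, "Word length": None, "Hardware": h, "External display": None}
    for p, h in product(platforms, hardware)
]

for row in rows:
    print(row["Platform"], "|", row["Hardware"])
print(len(rows))  # 12
```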

Note how this table has all the pairwise combinations of the two configuration variables Platform and Hardware (4 x 3 = 12 combinations). Next, fill in the values of External display so that we cover all its pairings with both Platform and Hardware at the same time:

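Platform      | Word length | Hardware | External display
--------------|-------------|----------|-----------------
Windows XP    |             | DVD      | None
Windows XP    |             | DVD-RW   | VGA
Windows XP    |             | Blu-ray  | HDMI
Windows Vista |             | DVD      | VGA
Windows Vista |             | DVD-RW   | HDMI
Windows Vista |             | Blu-ray  | None
Windows 7     |             | DVD      | HDMI
Windows 7     |             | DVD-RW   | None
Windows 7     |             | Blu-ray  | VGA
Mac           |             | DVD      | None
Mac           |             | DVD-RW   | VGA
Mac           |             | Blu-ray  | HDMI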
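
The External display values above happen to match a simple cyclic shift: within each platform the displays run through None, VGA, HDMI in order, and the starting point moves one step for each new platform. Whether the table was originally built this way is not stated, so treat the following as one possible way to reproduce it rather than as the method behind it. The trick relies on Hardware and External display having the same number of values.

```python
from itertools import product

platforms = ["Windows XP", "Windows Vista", "Windows 7", "Mac"]
hardware = ["DVD", "DVD-RW", "Blu-ray"]
displays = ["None", "VGA", "HDMI"]

rows = []
for (pi, platform), (hi, hw) in product(enumerate(platforms), enumerate(hardware)):
    # Shift the display by one position per platform (a Latin-square style pattern).
    display = displays[(pi + hi) % len(displays)]
    rows.append((platform, hw, display))

# Collect the distinct pairs that the External display column forms with the other two.
platform_display = {(p, d) for p, _, d in rows}
hardware_display = {(h, d) for _, h, d in rows}
print(len(platform_display), len(hardware_display))  # 12 9
```

All 4 x 3 = 12 Platform/External display pairs and all 3 x 3 = 9 Hardware/External display pairs appear, which is exactly what this step needs.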

By now, every possible pairing of values in the three filled-in columns is covered in the table. Note that in order to achieve this, the ordering of values for the External display column had to be crafted carefully, and that the sequence looks “random” at first but is not actually random. Now the remaining job is to fill in the Word length column. This is easy because there are only two possible values for this column, so we can easily cover all the required pairwise combinations with the other columns:

Platform      | Word length | Hardware | External display
--------------|-------------|----------|-----------------
Windows XP    | 32-bit      | DVD      | None
Windows XP    | 32-bit      | DVD-RW   | VGA
Windows XP    | 64-bit      | Blu-ray  | HDMI
Windows Vista | 64-bit      | DVD      | VGA
Windows Vista | 64-bit      | DVD-RW   | HDMI
Windows Vista | 32-bit      | Blu-ray  | None
Windows 7     | 32-bit      | DVD      | HDMI
Windows 7     | 64-bit      | DVD-RW   | None
Windows 7     | 64-bit      | Blu-ray  | VGA
Mac           | 32-bit      | DVD      | None
Mac           | 32-bit      | DVD-RW   | VGA
Mac           | 64-bit      | Blu-ray  | HDMI

This is now a covering array, or an all-pairs test data table, for the example problem. If you pick any two variables from the original test data table and any one value for each of them, there is at least one row in the final table that contains both of those values. Larger test data tables are not easy to construct by hand, which is why combinatorial test data generators are useful.
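
The covering property can be checked mechanically, and a very naive generator is not hard to sketch either. The code below is an illustration only: the function names are made up, and the greedy strategy (repeatedly add the full combination that covers the most still-missing pairs) is a textbook-style simplification that enumerates every combination, so it only suits small inputs like this one. It is not a description of how any particular tool works.

```python
from itertools import combinations, product

DOMAINS = [
    ["Windows XP", "Windows Vista", "Windows 7", "Mac"],  # Platform
    ["32-bit", "64-bit"],                                 # Word length
    ["DVD", "DVD-RW", "Blu-ray"],                         # Hardware
    ["None", "VGA", "HDMI"],                              # External display
]

HAND_TABLE = [
    ("Windows XP", "32-bit", "DVD", "None"),
    ("Windows XP", "32-bit", "DVD-RW", "VGA"),
    ("Windows XP", "64-bit", "Blu-ray", "HDMI"),
    ("Windows Vista", "64-bit", "DVD", "VGA"),
    ("Windows Vista", "64-bit", "DVD-RW", "HDMI"),
    ("Windows Vista", "32-bit", "Blu-ray", "None"),
    ("Windows 7", "32-bit", "DVD", "HDMI"),
    ("Windows 7", "64-bit", "DVD-RW", "None"),
    ("Windows 7", "64-bit", "Blu-ray", "VGA"),
    ("Mac", "32-bit", "DVD", "None"),
    ("Mac", "32-bit", "DVD-RW", "VGA"),
    ("Mac", "64-bit", "Blu-ray", "HDMI"),
]

def pairs_in(row):
    """All ((column, value), (column, value)) pairs covered by a single row."""
    return {((i, row[i]), (j, row[j])) for i, j in combinations(range(len(row)), 2)}

def required_pairs(domains):
    """Every value pair that a pairwise table must contain."""
    return {
        ((i, a), (j, b))
        for i, j in combinations(range(len(domains)), 2)
        for a, b in product(domains[i], domains[j])
    }

def uncovered_pairs(table, domains):
    """Required pairs that no row of the table contains."""
    covered = set().union(*(pairs_in(row) for row in table))
    return required_pairs(domains) - covered

def greedy_pairwise(domains):
    """Naive greedy generator: keep adding the combination that covers the most missing pairs."""
    missing = required_pairs(domains)
    candidates = list(product(*domains))
    table = []
    while missing:
        best = max(candidates, key=lambda row: len(pairs_in(row) & missing))
        table.append(best)
        missing -= pairs_in(best)
    return table

print(len(uncovered_pairs(HAND_TABLE, DOMAINS)))  # 0: the hand-made table covers every pair

generated = greedy_pairwise(DOMAINS)
print(len(generated), len(uncovered_pairs(generated, DOMAINS)))
```

The greedy table is guaranteed to cover all pairs, but it is not guaranteed to be as small as the hand-made one. Getting close to the minimum on large inputs is the hard part that dedicated generators address.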