Automatic Test Data Generation by Constraint Solving

Dec 12, 2011 | Frequently Asked Questions

Conformiq Designer generates test data automatically from system models using a technique known as “constraint solving”. In this blog post I want to explain how this works. It is a genuinely frequent question: “where does the test data come from?” Many first-time users are confused by the fact that there is no manual test data entry, so I hope to provide some answers here.

Let’s say we are testing a simple system for calculating your income tax, and that the taxation system has three brackets. For income below $50,000 you pay 10%; for income between $50,000 and $100,000 you pay 25%; and for income above $100,000 you pay 35%. Each rate applies only to the part of the income that falls within its bracket. In the traditional, manual test case design approach you would create a test data table that could look something like this:

Income      Expected Tax
$0          $0
$25,000     $2,500
$75,000     $11,250
$200,000    $52,500

Creating this table would be a crucial part of the manual test design process. In the Conformiq approach, however, you do not create any test data tables. Instead your model directly replicates the actual, desired system functionality in something like this:

if (income < 50000)
  tax = income * 0.1;
else if (income < 100000)
  tax = 50000 * 0.1 + (income - 50000) * 0.25; /* X */
else
  tax = 50000 * 0.1 + 50000 * 0.25 + (income - 100000) * 0.35;

This piece of your model describes how your system operates, but it does not describe how to test it. There is no data table or test data provided by a human; you have only the system description.
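
To make the comparison with the manual table concrete, here is a minimal Python sketch of the same three-bracket logic. Python and the function name `tax` are choices made for this illustration only; they are not how a Conformiq model is written. The loop simply checks the sketch against the rows of the manual table above.

```python
def tax(income):
    """Illustrative transcription of the three-bracket model (not Conformiq code)."""
    if income < 50000:
        return income * 0.1
    elif income < 100000:
        return 50000 * 0.1 + (income - 50000) * 0.25
    else:
        return 50000 * 0.1 + 50000 * 0.25 + (income - 100000) * 0.35


# Rows of the manually designed table: (income, expected tax).
manual_table = [(0, 0), (25000, 2500), (75000, 11250), (200000, 52500)]

for income, expected in manual_table:
    # Compare with a small tolerance because the rates are floating-point values.
    assert abs(tax(income) - expected) < 1e-6, (income, tax(income))
```

If the model and the hand-made table ever disagreed, one of the assertions would fail. That is exactly the kind of inconsistency that manual test data can introduce.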

Now, Conformiq Designer generates test cases automatically from this block of model code. The generated test cases contain both actual test inputs as well as expected test outputs. How does this work?

There are three different control flows through the model code, corresponding to the three taxation brackets. The test generation algorithm in Conformiq Designer attempts to generate at least one test case for each of those three control flows. Let’s use the $50,000 – $100,000 bracket as an example. Conformiq’s test generation algorithm first observes that in order to reach the line marked by /* X */, income must be $50,000 or above, because otherwise the very first condition is true and the second alternative can never be reached. So the tool internally logs a constraint that states:

  • income >= 50000

It is then clear that the second condition must evaluate to true, so the tool logs another constraint that states:

  • income < 100000

No further constraints are needed, so in order to generate an actual test case for the second taxation bracket, Conformiq Designer needs to internally solve the combined constraint:

  • income >= 50000 and income < 100000
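
The constraint collection described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it handles a chain of `income < threshold` tests and nothing else, and the function name `path_constraints` is hypothetical. It says nothing about how Conformiq Designer represents constraints internally.

```python
import math


def path_constraints(thresholds):
    """Return one (lower, upper) pair per control flow through an
    if / else-if / else chain of `income < threshold` tests.
    Each pair means: lower <= income < upper."""
    paths = []
    lower = -math.inf  # no earlier condition has been negated yet
    for threshold in thresholds:
        # Reaching and taking this branch means every earlier test was false
        # (income >= lower) and this test is true (income < threshold).
        paths.append((lower, threshold))
        # Falling through to the next branch negates this test.
        lower = threshold
    paths.append((lower, math.inf))  # the final else branch
    return paths


paths = path_constraints([50000, 100000])
lower, upper = paths[1]  # the branch marked /* X */
print(f"income >= {lower} and income < {upper}")
# prints: income >= 50000 and income < 100000
```

Each path accumulates the negation of every condition it skipped plus the condition it took, which is the same reasoning the walk-through applies to the second bracket.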

One mechanical way to solve such a simple minimum-maximum constraint is to take the midpoint of the minimum and maximum values, for the solution

  • income = (50000 + 100000) / 2 = 75000
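
The midpoint rule is easy to write down. The sketch below assumes integer incomes and a half-open range `lower <= income < upper`; the function name `solve_range` is made up for this example, and a real constraint solver is far more general than this.

```python
def solve_range(lower, upper):
    """Return an integer with lower <= value < upper, or None if none exists."""
    if lower >= upper:
        return None  # the range is empty, so the path cannot be reached
    # Integer midpoint; it always lies inside a non-empty half-open range.
    return (lower + upper) // 2


print(solve_range(50000, 100000))  # 75000
```

Returning `None` for an empty range mirrors the situation where a control flow is infeasible and no test case can be generated for it.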

Now that there is a concrete value for the income variable, Conformiq Designer can generate the expected output value by simply executing the model forwards with the chosen value. By this process, the value of the expected tax becomes

  • tax = 50000 * 0.1 + (75000 - 50000) * 0.25 = 11250

So the complete test case has the input income set to $75,000 and the expected tax set to $11,250.
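
Putting the steps together, the whole walk-through can be sketched as a tiny generator that produces one test case per bracket. Everything here is an assumption made for illustration: the names `model`, `BRACKETS` and `generate_tests`, the integer midpoint rule, and in particular the income domain of 0 to 200000 that bounds the first and last brackets (this post does not specify one). It is not Conformiq Designer's actual algorithm.

```python
def model(income):
    """The system model: three-bracket tax calculation."""
    if income < 50000:
        return income * 0.1
    elif income < 100000:
        return 50000 * 0.1 + (income - 50000) * 0.25
    else:
        return 50000 * 0.1 + 50000 * 0.25 + (income - 100000) * 0.35


# One (lower, upper) constraint per control flow: lower <= income < upper.
# The outer bounds 0 and 200000 are an assumed input domain, not given in this post.
BRACKETS = [(0, 50000), (50000, 100000), (100000, 200000)]


def generate_tests(model, constraints):
    """Solve each path constraint, then run the model forward on the solution."""
    tests = []
    for lower, upper in constraints:
        income = (lower + upper) // 2       # solve the constraint (midpoint rule)
        expected = round(model(income), 2)  # execute the model to get the expected output
        tests.append((income, expected))
    return tests


for income, expected in generate_tests(model, BRACKETS):
    print(f"input income = {income}, expected tax = {expected}")
```

The second generated case is the one derived above: an income of 75000 with an expected tax of 11250. Note that the expected value is never typed in by hand; it comes from executing the model on the solved input.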

This is obviously a simple example, and in real-life models the constraint systems can be much more complex. They can also contain mixed data types, including numbers, strings, records, and arrays.

The point of this process is that it relieves the human test designer of a lot of low-level, error-prone work, such as designing individual test data entries. Because the tool aims for at least one test case per control flow through the model, it helps achieve high coverage, and it removes many of the risks related to accidentally incorrect test data items.