Extended String Testing with RegEx

Nov 21, 2013 | Blogs

Quite recently we added support for “regular expressions” in Conformiq models. As this is quite an interesting topic, I’d like to describe the feature in a bit more detail in this post.

A regular expression is simply a sequence of characters that forms a pattern for searching and matching text. The concept dates back to the 1950s, when Stephen Kleene formalized the description of regular languages; it later came into common use with Unix tools such as ed and grep. Regular expressions are a convenient way to express a condition such as “does this character string represent a valid email address?”

In testing, too, it is common practice to check whether input fields, protocol data fields, and the like match or fail to match a pattern, and those patterns are often regular expressions. We frequently need to test a system with valid and invalid email addresses, string-encoded IP addresses, fully qualified domain names, and so on. This is why we at Conformiq decided to introduce support for regular expression matching in modeling, together with the related test generation heuristics.

Without this support, it is quite cumbersome to create models that express logic such as “an input field on a web page must contain a valid e-mail address”. If the modeling language cannot represent a typical email pattern, the user is forced to fall back on fixed, constant test data, which reduces productivity, makes models more fragile, and limits their reuse. To give a typical example of the workaround users previously had to apply, here is a snippet that models “email validation”:

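// Workaround: only these two hard-coded addresses count as valid
if (msg.email == "john.doe@conformiq.com" ||
    msg.email == "jane.doe@conformiq.com") {
    requirement "Email Address Validation/Valid Address";
} else {
    // Any other input is restricted to a few hand-picked invalid strings
    require (msg.email == "" ||
             msg.email == "@conformiq.com" ||
             msg.email == "john@d");
}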

Clumsy and limited, I would say, and oh so error prone.

So today, with Conformiq Designer, to say that an input field holds a valid email address (any valid email address, not just the ones we have explicitly managed to think of), one can write the following, using a simplified email pattern:

if (msg.email.matches("[a-zA-Z0-9]+@[a-zA-Z0-9]+\\.[a-zA-Z]{2,4}"))
    requirement "Valid Email Address";

where the requirement “Valid Email Address” is covered only if the input string is a valid (simplified) email address. The Designer tool is now responsible for calculating “interesting” character strings that match the given pattern, just as it does for all other kinds of data in the model. I don’t want to spend my time figuring out all the possible inputs with which I should stimulate the SUT, and why should I? I have better things to do. Let the computer do this kind of work: it is much faster, it does not make accidental mistakes, and it does not miss things.
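For readers who want to try the pattern outside the model, here is a small Java sketch that applies the same simplified regular expression with `String.matches`. It assumes that the modeling language's `matches` has the same full-string semantics as Java's (the whole string must match, not just a substring); `SimplifiedEmailCheck` is a hypothetical class name used only for illustration.

```java
import java.util.List;

// Hypothetical helper: applies the simplified email pattern from the model
// snippet using Java's String.matches, which requires the WHOLE string to match.
// Assumption: the modeling language's matches() has the same full-match semantics.
class SimplifiedEmailCheck {
    // Same pattern as in the model; "\\." is a regex-escaped literal dot.
    static final String EMAIL = "[a-zA-Z0-9]+@[a-zA-Z0-9]+\\.[a-zA-Z]{2,4}";

    static boolean isValid(String s) {
        return s.matches(EMAIL);
    }

    public static void main(String[] args) {
        List<String> candidates = List.of(
            "john@conformiq.com",     // plain valid address
            "a@b.cd",                 // shortest top-level domain allowed by {2,4}
            "a@b.c",                  // top-level domain too short
            "a@b.abcde",              // top-level domain too long
            "john.doe@conformiq.com", // no '.' allowed before '@' in this pattern
            "@conformiq.com"          // empty part before '@'
        );
        for (String c : candidates) {
            System.out.printf("%-24s %s%n", c, isValid(c) ? "valid" : "invalid");
        }
    }
}
```

Note that this simplified pattern rejects `john.doe@conformiq.com`, because it allows no dot before the `@`. Boundary cases such as the one- and five-letter top-level domains are the kind of inputs worth exercising in tests.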

Because even the simplified regular expression above already looks rather technical, we have packaged some of the patterns our customers use in their modeling efforts into a library. To express the email pattern above, you can simply write:

if (CharacterClasses.isEmailAddress(msg.email))
    requirement "Valid email address";

which produces the same result.
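The idea of a pattern library can be sketched in Java as a named predicate that wraps a precompiled pattern. The contents of Conformiq's `CharacterClasses` library are not shown in this post, so `PatternLibrary` and its pattern are hypothetical stand-ins that simply reuse the simplified email pattern from above.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for a pattern library such as CharacterClasses.
// The real library's email pattern is not shown in the post; the simplified
// pattern from the model snippet is reused here as an assumption.
final class PatternLibrary {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern EMAIL =
        Pattern.compile("[a-zA-Z0-9]+@[a-zA-Z0-9]+\\.[a-zA-Z]{2,4}");

    private PatternLibrary() {} // static helpers only

    // A descriptive name hides the raw regular expression from calling code.
    static boolean isEmailAddress(String s) {
        return s != null && EMAIL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isEmailAddress("jane@conformiq.com")); // true
        System.out.println(isEmailAddress("jane@conformiq"));     // false: no top-level domain
    }
}
```

Compiling the pattern once and hiding it behind a descriptive name keeps the regular expression in one place, so every caller benefits if it is later refined.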

The next question regarding pattern matching is, of course, the related test generation heuristics. What test data should the tool select for model fragments like the ones above?

A full answer is beyond the scope of this post, but in short, we developed quite a nice algorithm that the Designer tool applies to character strings “constrained” by string patterns. From it, the tool produces a selection of tests that verify that the system under test correctly accepts valid strings and rejects invalid ones.
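The post does not describe Conformiq's algorithm, so the following Java sketch uses a much simpler, swapped-in technique to illustrate the goal: build candidate strings from the structure of the simplified pattern for the positive side, take near-miss variations of a valid address for the negative side, and keep only the candidates that the regular expression actually accepts or rejects. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// A deliberately naive sketch of pattern-driven test data selection.
// This is NOT Conformiq Designer's algorithm; it only illustrates the goal of
// choosing strings the SUT must accept and strings it must reject.
class NaiveEmailTestData {
    static final Pattern EMAIL =
        Pattern.compile("[a-zA-Z0-9]+@[a-zA-Z0-9]+\\.[a-zA-Z]{2,4}");

    // Positive candidates: parts mixing digits, lower case, and upper case,
    // with the top-level domain varied over its boundary lengths 2, 3, and 4.
    static List<String> positives(int n) {
        String[] locals = {"a", "7", "aZ0"};
        String[] domains = {"b", "B0", "x9y"};
        String[] tlds = {"ab", "ABC", "abCD"};
        List<String> out = new ArrayList<>();
        for (String l : locals) {
            for (String d : domains) {
                for (String t : tlds) {
                    if (out.size() >= n) return out;
                    String s = l + "@" + d + "." + t;
                    if (EMAIL.matcher(s).matches()) out.add(s); // keep only real matches
                }
            }
        }
        return out;
    }

    // Negative candidates: near misses around the valid address "a@b.cd";
    // each comment names the part of the pattern the string violates.
    static List<String> negatives(int n) {
        String[] nearMisses = {
            "",          // empty string
            "@b.cd",     // empty part before '@'
            "a@.cd",     // empty domain label
            "ab.cd",     // missing '@'
            "a@bcd",     // missing '.'
            "a@b.c",     // top-level domain too short
            "a@b.cdefg", // top-level domain too long
            "a@b.c1",    // digit in the top-level domain
            "a~@b.cd",   // illegal character before '@'
            "a@b.cd.",   // trailing dot
            "a@b.cd@"    // extra '@' at the end
        };
        List<String> out = new ArrayList<>();
        for (String m : nearMisses) {
            if (out.size() >= n) break;
            if (!EMAIL.matcher(m).matches()) out.add(m); // keep only real mismatches
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("must accept: " + positives(10));
        System.out.println("must reject: " + negatives(10));
    }
}
```

The parameter `n` caps how many strings of each kind are returned. A real generator would derive its candidates from the pattern itself rather than from hand-written lists.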

So let’s see how this works out in practice. I created a very simple model that classifies whether the email address provided to the system is valid. The model takes in a message, checks the ‘email’ field of the message in exactly the same way as the model code fragment above, and responds with whether the email address was valid. The table below shows a sample of the test cases that Conformiq Designer generated from this model.

Positive Tests     Negative Tests
--------------     --------------
7@a.aaaaA.ZZa      a@a.Haaa.Z@
3@a.aa0.ZZ         J@a.a0.z
A@-.Zua            M@-.{
a0@-.ZZDa          a@0.ZZ\
a@-.ZZP            a@O.aaaA~
a@Ba.ZZa           a@a.aa0.ZK,
a@a.aaGaA.Za       a@a.aaaUa.ZZZZ:
a@a.aaaa.ZZvz      a@a.aaaaA.QZZZz
a@aA.ZmZ           a@a.aca.Z,
a@aH.ZZA           f@a.aZ~
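The way such tests are used can be sketched in Java: the classification model described above acts as the oracle that labels each input as valid or invalid, and every test sends the input to the system under test and compares its verdict with that label. `Message`, `Verdict`, `TestCase`, and the deliberately sloppy SUT that only looks for an `@` are hypothetical, and the simplified pattern from earlier in the post serves as the oracle's check.

```java
import java.util.List;
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical sketch of running generated tests against a system under test.
// Message, Verdict, TestCase, and sloppySut are illustrations, not Conformiq APIs.
class EmailClassifierTests {
    record Message(String email) {}
    enum Verdict { VALID, INVALID }
    record TestCase(Message input, Verdict expected) {}

    static final Pattern EMAIL =
        Pattern.compile("[a-zA-Z0-9]+@[a-zA-Z0-9]+\\.[a-zA-Z]{2,4}");

    // Reference behavior mirroring the model: classify the 'email' field.
    static Verdict model(Message m) {
        return EMAIL.matcher(m.email()).matches() ? Verdict.VALID : Verdict.INVALID;
    }

    // A deliberately sloppy SUT that only checks for an '@'.
    static Verdict sloppySut(Message m) {
        return m.email().contains("@") ? Verdict.VALID : Verdict.INVALID;
    }

    // Runs every test case and returns the emails whose verdict differs from the expectation.
    static List<String> failures(List<TestCase> tests, Function<Message, Verdict> sut) {
        return tests.stream()
            .filter(t -> sut.apply(t.input()) != t.expected())
            .map(t -> t.input().email())
            .toList();
    }

    public static void main(String[] args) {
        // Expected verdicts come from the model acting as the test oracle.
        List<TestCase> tests = List.of("a@b.cd", "7@x9.ABC", "a@b.c", "ab.cd", "a~@b.cd")
            .stream()
            .map(e -> new TestCase(new Message(e), model(new Message(e))))
            .toList();
        System.out.println("model:  " + failures(tests, EmailClassifierTests::model));     // prints "model:  []"
        System.out.println("sloppy: " + failures(tests, EmailClassifierTests::sloppySut)); // prints "sloppy: [a@b.c, a~@b.cd]"
    }
}
```

The sloppy implementation passes every positive test yet wrongly accepts `a@b.c` and `a~@b.cd`; only the negative tests reveal the bug.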

Not enough tests? Too many? No problem. Just adjust the number of test case variants that the tool generates for pattern matches. Here I asked for 10 tests that verify valid patterns and 10 that verify invalid ones.

I think this is quite a nifty and unique feature, and I encourage you to give it a spin to see how it can reduce your modeling effort and increase coverage in your test design!