Randomized Testing - What, Why, When?

Still hardcoding concrete values in your tests? That's a shame, because better techniques exist that provide better coverage and test isolation. Let's talk about how we can improve testing by adding randomization. We'll cover techniques like Property-Based Testing, Model-Based Testing, Test Oracles, etc. The purpose is not to dive deeply into the details of each of them, but to give an overview and show possible applications.

By the way, if you know Russian and don't feel like reading, here is my talk at Heisenbug 2016 that covers randomized testing in more detail and with more examples.

Why is hardcoded test data bad?

Suppose we're testing validation, e.g. a First Name field that accepts alphanumeric values from 2 to 20 symbols long. We've done the equivalence partitioning, and for the happy path we chose a value from 3 to 19 symbols long - e.g. Barney. But this has drawbacks:

  • No clear winner. Why Barney? Why not Michael, Ohno or Fedor?
  • Prone to the Pesticide Paradox. As the codebase grows, the odds of new bugs creeping into the First Name validation increase. But because we use the same "pesticide" (test) and never change it, the chances that we miss those defects go up as well. We could say the bugs become resistant to one particular pesticide, and it doesn't prevent them from multiplying. If only we were using many different pesticides..
  • Poor test isolation. If we write System Tests that run against the same environment again and again, what do we do about unique fields? Emails often must be unique - what happens if we run the same test twice and it tries to create 2 users with the same email? The 2nd run will fail. Even if we decide to clean the data somehow, we're still prone to race conditions - what if multiple people run the same test from their local machines?
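The uniqueness problem in particular has a cheap randomized fix: append a random suffix to every value that must be unique. A minimal sketch (the `uniqueEmail` helper and the `example.com` domain are made up for illustration):

```java
import java.util.UUID;

public class UniqueEmails {
    // Hypothetical helper: a random suffix ensures that two test runs -
    // even concurrent ones from different machines - never try to
    // register the same address.
    static String uniqueEmail(String prefix) {
        return prefix + "-" + UUID.randomUUID() + "@example.com";
    }

    public static void main(String[] args) {
        System.out.println(uniqueEmail("barney"));
        System.out.println(uniqueEmail("barney")); // a different address every time
    }
}
```

With this, rerunning the same System Test never collides with leftovers from the previous run.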

Fixing test data with randomization

Randomized testing addresses all the concerns gracefully - instead of taking a particular value we'll generate it randomly every time. Let's compare the approaches:

Partition               Example-based          Randomized
Happy Path              Barney                 alphanumeric(3, 19)
Positive Min Boundary   Mi                     alphanumeric(2)
Positive Max Boundary   Blah Blah Blah Blah1   alphanumeric(20)
Negative Min Boundary   a                      alphanumeric(1)
Negative Max Boundary   Blah Blah Blah Blah12  alphanumeric(21)
Numbers only            12345                  numeric(2, 20)
With spaces             aB 0A                  between(2, 20).with(spaces()).alphanumeric()

You can use libraries like Datagen that already implement this kind of randomization in a nice way.

Notice that we still leverage boundary values and equivalence partitioning. That's because bugs usually concentrate on the boundaries, and we have a choice: either run the same test many times to be almost sure a boundary value was used, or optimize our tests by using the boundaries explicitly. The randomization then happens within the equivalence class.
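A hand-rolled generator along these lines is only a few lines of code. This is a sketch, not Datagen's actual implementation - only the `alphanumeric(min, max)` naming is borrowed from the table above:

```java
import java.util.Random;

public class RandomStrings {
    private static final String ALPHANUMERIC =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private static final Random RND = new Random();

    // Length is picked uniformly from [minLength, maxLength]: the
    // randomization stays inside the equivalence class, while the
    // boundaries themselves are chosen explicitly by the caller.
    static String alphanumeric(int minLength, int maxLength) {
        int length = minLength + RND.nextInt(maxLength - minLength + 1);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHANUMERIC.charAt(RND.nextInt(ALPHANUMERIC.length())));
        }
        return sb.toString();
    }

    // Exact-length variant for the boundary cases, e.g. alphanumeric(20)
    static String alphanumeric(int length) {
        return alphanumeric(length, length);
    }

    public static void main(String[] args) {
        System.out.println(alphanumeric(3, 19)); // happy-path value, new every run
    }
}
```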

Does it find more bugs in reality?

In simple cases like validation it doesn't find many more bugs compared to traditional approaches - otherwise the whole idea of equivalence classes would be invalid. After randomizing validation tests for a couple of years, the only case I remember when it helped find more bugs was a date conversion that didn't work properly for some not-commonly-used dates. But this fact shouldn't stop you from using randomization - don't forget about the other goodies like test isolation.

Validation though is just one of the applications. Randomized testing can be even more beneficial when testing business logic, algorithms and concurrency.

Testing algorithms with random data

With validation things are easy - we know the result while we're crafting the input. But with algorithms it's different: if the input is random, then the output is random too. So how do we check the output if we don't know it beforehand? Implementing the same algorithm in the tests would be absurd!

Property Based Testing

Instead of checking the actual result we can check its properties. So if we test summation we can check that a + b == b + a or that a + 0 == a, where a and b are randomly generated values. Or if you test sorting, you could generate a random list of objects, sort it, and then go over the elements checking that each one is <= the next.
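Sketched without any framework - just random inputs in a loop checking the two properties named above (the JDK sort stands in for the algorithm under test):

```java
import java.util.Arrays;
import java.util.Random;

public class SortProperties {
    public static void main(String[] args) {
        Random rnd = new Random();
        for (int run = 0; run < 100; run++) {
            // Property 1: summation is commutative for random a, b
            long a = rnd.nextLong(), b = rnd.nextLong();
            if (a + b != b + a) throw new AssertionError("a + b != b + a");

            // Property 2: after sorting, each element is <= the next one
            int[] list = rnd.ints(rnd.nextInt(50) + 1).toArray();
            Arrays.sort(list); // the algorithm under test - stubbed with the JDK sort here
            for (int i = 0; i < list.length - 1; i++) {
                if (list[i] > list[i + 1]) throw new AssertionError("not sorted at index " + i);
            }
        }
        System.out.println("all properties held for 100 random inputs");
    }
}
```

Note that neither check needs to know the expected output in advance - only a relation the output must satisfy.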

Often this is combined with repeating the same test many times to increase the probability of finding defects. There are specialized frameworks that can do this for you - they usually have names ending with QuickCheck, after the pioneering Haskell QuickCheck. E.g. for Java there is JUnit QuickCheck. Additionally, those frameworks often provide so-called shrinkage: if the test failed with value 1000 but passed with 500, the framework will start picking values in between to find the boundaries where failures start and where they end.
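The essence of shrinkage can be shown in a few lines. This is a naive bisection sketch, not what any real QuickCheck framework does internally (they shrink structured data too, not just integers):

```java
import java.util.function.IntPredicate;

public class Shrinker {
    // Given a known-passing value and a known-failing value, bisect
    // between them to find the smallest input that still fails.
    static int shrink(int passing, int failing, IntPredicate fails) {
        while (failing - passing > 1) {
            int mid = (passing + failing) / 2;
            if (fails.test(mid)) {
                failing = mid;   // still fails - the boundary is lower
            } else {
                passing = mid;   // passes - the boundary is higher
            }
        }
        return failing;
    }

    public static void main(String[] args) {
        // Imaginary bug: the code under test breaks for all inputs >= 768
        IntPredicate bug = n -> n >= 768;
        System.out.println(shrink(500, 1000, bug)); // prints 768
    }
}
```

Reporting "fails starting at 768" is far more useful to a developer than "failed with 83251".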

While this is a very impressive technique, it's also extremely complicated. When I tried it, I spent a couple of days thinking about how to fully cover my algorithm (and I never finished). It requires practice and mathematical thinking, so be prepared to meet difficulties. On the bright side, you don't have to cover your code 100% with this technique - you can still combine it with example-based testing to fill the holes.

You can read more about Property Based Testing in this nice article.

Using Test Oracles

Another option, and probably the most cost-effective one, is using Test Oracles. If there is an easier (but not as optimal or secure) way of doing the same thing, you can use the simpler implementation in the tests and cross-check the results of both algorithms. E.g. you can use the default sorting of your programming SDK as a reference implementation while checking your super-fast, super-complicated sorting algorithm.
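A sketch of the idea, with a plain insertion sort standing in for the "super-complicated" implementation and the JDK sort playing the trusted oracle:

```java
import java.util.Arrays;
import java.util.Random;

public class OracleTest {
    // Stand-in for the complicated implementation under test
    static int[] fancySort(int[] input) {
        int[] a = input.clone();
        for (int i = 1; i < a.length; i++) {
            int key = a[i], j = i - 1;
            while (j >= 0 && a[j] > key) { a[j + 1] = a[j]; j--; }
            a[j + 1] = key;
        }
        return a;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        for (int run = 0; run < 100; run++) {
            int[] input = rnd.ints(rnd.nextInt(30)).toArray();
            // The oracle: slow but trusted reference implementation
            int[] expected = input.clone();
            Arrays.sort(expected);
            if (!Arrays.equals(fancySort(input), expected)) {
                throw new AssertionError("mismatch on " + Arrays.toString(input));
            }
        }
        System.out.println("implementation agrees with the oracle on 100 random inputs");
    }
}
```

The oracle doesn't need to be fast or pretty - it only needs to be correct and independent of the code under test.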

Reverse operations to get the original value

This is particularly useful if you transform objects into one another (DTO->Entity, Entity->DTO). You fill ObjectA with random values, convert it into ObjectB, then convert that back into ObjectA'. If ObjectA equals ObjectA', then both transformations work correctly. This is very efficient, especially if you employ reflection-based comparison (e.g. reflectionEquals() from Unitils).

But it's not limited to DTO-Entity transformations. A lot of operations have reverse counterparts: string tokenizing + concatenation, replacing a string occurrence + replacing it back, element insertion + element removal, etc. The principle is the same - after the operation you apply the reverse function and check that the result equals the original input.
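A small sketch of one such pair - replace a character, then replace it back. The input generator is deliberately constrained (lowercase alphanumeric only) so that the round trip is guaranteed lossless:

```java
import java.util.Random;

public class RoundTrip {
    private static final String CHARS = "abcdefghijklmnopqrstuvwxyz0123456789";

    static String randomWord(Random rnd) {
        int len = rnd.nextInt(10) + 1;
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) sb.append(CHARS.charAt(rnd.nextInt(CHARS.length())));
        return sb.toString();
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        for (int run = 0; run < 100; run++) {
            String original = randomWord(rnd);
            // Operation + its reverse counterpart. '@' cannot occur in the
            // alphanumeric input, so reversing cannot clobber anything.
            String there = original.replace('a', '@');
            String back = there.replace('@', 'a');
            if (!back.equals(original)) throw new AssertionError("round trip lost: " + original);
        }
        System.out.println("round trip preserved all 100 random inputs");
    }
}
```

The same shape works for DTO->Entity->DTO': generate, convert, convert back, compare.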

Model Based Testing to test states and transitions

I'm still waiting for a chance to try it out, but I can't resist mentioning Model-Based Testing after watching this inspiring presentation. Instead of randomly generating data, as we did previously, this approach generates random behaviour. The idea is that we build a model of our app: ActionA can be done after ActionB, but only if ActionC happened. Then we ask our framework to invoke these operations in random order (it must follow the rules, though). After that we check that the expected side effect (the final result) is what we described in the model.
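A toy sketch of the idea without any framework. The "app" here is an invented stand-in (a stack), the model is simply the expected size, and the only rule is "pop is legal only on a non-empty stack":

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Random;

public class ModelBased {
    public static void main(String[] args) {
        Random rnd = new Random();
        Deque<Integer> sut = new ArrayDeque<>(); // system under test
        int modelSize = 0;                       // trivial model: the expected size

        // Generate random behaviour, not random data: 1000 steps in a
        // random order, respecting the model's rule about when pop is legal
        for (int step = 0; step < 1000; step++) {
            if (modelSize > 0 && rnd.nextBoolean()) {
                sut.pop();
                modelSize--;
            } else {
                sut.push(step);
                modelSize++;
            }
            // After every action, the real system must match the model
            if (sut.size() != modelSize) {
                throw new AssertionError("model and system diverged at step " + step);
            }
        }
        System.out.println("1000 random actions, model and system stayed in sync");
    }
}
```

Real frameworks do the same thing with richer models, many actions, and shrinking of the failing action sequence.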

This can be useful when testing complicated flows that transition from state to state. But what's marvelous is that you can test it in a concurrent environment as well. With that, you not only check that the flow is correct, but also that it works under high load with many concurrent users. After all, if we bought a ticket 10 times, then in the end we should have 10 unique tickets - even if we did it from multiple threads.

Some of the frameworks also provide shrinkage here - they will try to find the minimal sequence of steps that results in the bug. And that is so important, because after doing 1000 steps in 10 threads it would be hard to reproduce the problem through all the noise.

Decreasing number of combinations to test

Sometimes there are object graphs where each object can be in different states. Returning to our example: 20 account types, 30 legal entity types with 5 account roles each. If the business rules for different types or roles may differ, we should test that. But there are so many combinations that it may take too much time to run the tests - especially if there are many features we want to test for those combinations.

But if we build a model by which we can generate these object graphs correctly, we can partition the combinations and check only a single one from a given equivalence class. This can reduce the number of tests from 1K to just 3. And if some particular combination is buggy, at some point we'll find out, because the combinations differ from run to run.
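A sketch of that sampling, with an invented partitioning rule: suppose the business logic only depends on the account role, so each role forms one equivalence class, and we test a single random representative per class on every run:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class CombinationSampling {
    public static void main(String[] args) {
        Random rnd = new Random();
        List<String> accountTypes = new ArrayList<>();
        for (int i = 1; i <= 20; i++) accountTypes.add("ACCOUNT_" + i);
        // Hypothetical role names - placeholders for the real domain
        List<String> roles = Arrays.asList("OWNER", "VIEWER", "AUDITOR", "SIGNER", "ADMIN");

        // Group all 100 combinations into equivalence classes by role
        Map<String, List<String[]>> classes = new LinkedHashMap<>();
        for (String type : accountTypes) {
            for (String role : roles) {
                classes.computeIfAbsent(role, k -> new ArrayList<>())
                       .add(new String[]{type, role});
            }
        }

        // One random representative per class instead of the full matrix -
        // a different representative on every run
        for (List<String[]> combos : classes.values()) {
            String[] pick = combos.get(rnd.nextInt(combos.size()));
            System.out.println("testing " + pick[0] + " / " + pick[1]);
        }
    }
}
```

Over many runs the representatives drift across the whole matrix, so a buggy combination eventually gets hit.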

Even if the number of combinations is manageable, at some point when the functionality has stabilized we can reduce the number of cases by checking only a small subset of them each time. This speeds up our tests.

Randomizing Environment Settings

We can also randomize user locales, the OS used to run the tests, SDK versions, etc. Of course, only environments that are expected to be compatible with our app should be used.
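Locale is the easiest one to sketch: pick a random locale from an explicitly supported set (the set below is an assumption for illustration) and run the test under it. Locale-sensitive formatting then varies from run to run:

```java
import java.util.Locale;
import java.util.Random;

public class RandomLocale {
    // Only locales the app is expected to support - we randomize within
    // the supported set, never across everything the machine offers
    static final Locale[] SUPPORTED = { Locale.US, Locale.GERMANY, Locale.FRANCE };

    static Locale pick(Random rnd) {
        return SUPPORTED[rnd.nextInt(SUPPORTED.length)];
    }

    public static void main(String[] args) {
        Locale locale = pick(new Random());
        // Number formatting differs per locale:
        // 1234.5 renders as "1,234.5" in en_US but "1.234,5" in de_DE
        System.out.println(locale + ": " + String.format(locale, "%,.1f", 1234.5));
    }
}
```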

But I want tests to be reproducible and stable!

Well, that's true - the test needs to be designed so that it (the test itself) won't fail in different environments. I.e. we don't want the test to fail just because someone's machine is too fast or too slow. But we still want to find real bugs. So if production code (not the test) fails because we changed the environment while we expect it to work on that env, then we've found something. If a test fails for a reason, then the test did its job. After all, this is why we write tests.

But how do we reproduce a failure if the test relies on randomness? For that we need one of two things: a) logging (values, environment); b) the ability to set the seed of the random generator. Ideally we need both. I haven't worked with manually set seeds myself, but I can see how helpful they may be: if a failure happens, we can reuse the same seed and all the "random" values will repeat.
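The seed mechanism is easy to demonstrate with java.util.Random: log the seed at the start of the run, and a rerun with the same seed replays the exact same "random" sequence:

```java
import java.util.Random;

public class SeededRun {
    public static void main(String[] args) {
        // Pick the seed up front and log it - on failure, rerun with
        // this seed and every "random" value repeats exactly
        long seed = System.currentTimeMillis();
        System.out.println("running with seed " + seed);

        Random firstRun = new Random(seed);
        Random replay = new Random(seed);
        for (int i = 0; i < 10; i++) {
            if (firstRun.nextInt() != replay.nextInt()) {
                throw new AssertionError("seeded runs diverged");
            }
        }
        System.out.println("replay produced identical values");
    }
}
```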

Randomizing manual tests

Randomization is applicable in manual tests as well though it won't be as chaotic:

  • Randomizing data. You can achieve that by saying "Enter an alphanumeric value from 3 to 29 symbols long" instead of "Enter the value Barney". Each test engineer will choose their own value when going over the test case. You may also want to use online services like Text Mechanic to help you out.
  • Randomizing behaviour. Instead of describing every step of the test case in detail, you can choose something more generic. E.g. instead of saying "Press the login button, enter your credentials" you can say "Sign In". Another popular technique is using checklists instead of test cases. So instead of a tedious test case with several steps we just say "Check that the first name saves with a happy-path value (alphanumeric from 3 to 29)". This way every engineer may choose their own path to the destination and accomplish what's intended.

To sum up

Randomized testing is helpful because:

  • Provides better parameter coverage
  • Helps us find cases we didn't think of, which pushes us to think about our domain more deeply
  • Isolates tests so that they don't step on each other's toes
  • Allows testing a large number of combinations in a practical time frame

Does it make sense to use both hardcoded and randomized data? Well, when the testing looks complicated (Property-Based), it makes sense to keep examples as well so the tests stay understandable. But apart from that, there don't seem to be many reasons to do so.

Note that randomizing data is just one piece of advice for proper test data management - see Effective Data Management for more. Also, check out the Test Pyramid as an example of a project where randomized testing is used.