Still hardcoding concrete values in your tests? That's a pity, because there are techniques that give better coverage and better test isolation. Let's talk about how we can improve testing by adding randomization. We'll cover techniques like Property Based Testing, Model Based Testing, Test Oracles, etc. The purpose is not to dive deeply into the details of each kind, but to give an overview and show possible applications.
By the way, if you know Russian and don't like reading, here is my talk at Heisenbug 2016 that covers randomized testing in more detail and with more examples.
Suppose that we test a validation. E.g. the First Name field can accept alphanumeric values from 2 to 20 characters long. We've done the equivalence partitioning and for the happy path we chose a value of 3 to 19 characters - e.g. Barney. But this has drawbacks: the same value is exercised on every run, and tests that keep reusing the same data are harder to isolate from each other.
Randomized testing addresses these concerns gracefully - instead of taking a particular value we'll generate it randomly every time. Let's compare the approaches:
Partition | Example-based | Randomized |
---|---|---|
Happy Path | Barney | alphanumeric(3, 19) |
Positive Min Boundary | Mi | alphanumeric(2) |
Positive Max Boundary | Blah Blah Blah Blah1 | alphanumeric(20) |
Negative Min Boundary | a | alphanumeric(1) |
Negative Max Boundary | Blah Blah Blah Blah12 | alphanumeric(21) |
Numbers only | 12345 | numeric(2, 20) |
With spaces | aB 0A | between(2, 20).with(spaces()).alphanumeric() |
... | ... | ... |
You can use libraries like Datagen that already implement this kind of randomization in a nice way.
Notice that we still leverage boundary values and equivalence partitioning. This is because bugs tend to concentrate on the boundaries. So we can either run the same test many times and hope that the boundary values eventually get picked, or we can optimize our tests by targeting the boundaries explicitly and randomize only within each equivalence class.
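Here's a minimal sketch of what such tests could look like in Java with JUnit 4. `isValidFirstName` is a hypothetical validator standing in for the production code, and `alphanumeric` is a plain-JDK stand-in for what a library like Datagen would give you:

```java
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import java.util.concurrent.ThreadLocalRandom;
import org.junit.Test;

public class FirstNameValidationTest {
    @Test public void acceptsHappyPathNames() {
        // any length within the valid range; the value differs on every run
        assertTrue(isValidFirstName(alphanumeric(3, 19)));
    }
    @Test public void acceptsMinAndMaxBoundaries() {
        assertTrue(isValidFirstName(alphanumeric(2, 2)));
        assertTrue(isValidFirstName(alphanumeric(20, 20)));
    }
    @Test public void rejectsValuesOutsideBoundaries() {
        assertFalse(isValidFirstName(alphanumeric(1, 1)));
        assertFalse(isValidFirstName(alphanumeric(21, 21)));
    }

    /** Hypothetical system under test - in a real project this lives in production code. */
    private static boolean isValidFirstName(String name) {
        return name.matches("[a-zA-Z0-9]{2,20}");
    }

    /** Plain-JDK stand-in for a data generation library: random alphanumeric string of random length. */
    private static String alphanumeric(int minLen, int maxLen) {
        String chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
        int len = ThreadLocalRandom.current().nextInt(minLen, maxLen + 1);
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++)
            sb.append(chars.charAt(ThreadLocalRandom.current().nextInt(chars.length())));
        return sb.toString();
    }
}
```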
In simple cases like validation this doesn't find many more bugs than the traditional approach - otherwise the whole idea of equivalence classes would be invalid. After randomizing validation tests for a couple of years, the only case I remember where it helped find extra bugs was a date conversion that didn't work properly for some rarely used dates. But this shouldn't stop you from using randomization - don't forget about the other goodies like test isolation.
Validation though is just one of the applications. Randomized testing can be even more beneficial when testing business logic, algorithms and concurrency.
With validation things are easy - we know the result when we're crafting the input. But with algorithms it's different - if the input is random, then the output is random. So how do we check the output if we don't know it beforehand? Implementing the same algorithm in the tests is absurd!
Instead of checking the actual result we can check its properties. So if we test summation we can check that a + b == b + a, or that a + 0 == a, where a and b are randomly generated values. Or if you test sorting, you could generate a random list of objects, sort it, and then go over the elements checking that each one is <= the next (for an ascending sort).
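As a minimal sketch (plain JUnit 4, no special framework yet), such property checks might look like this; Collections.sort stands in for whatever sorting implementation you actually test:

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import org.junit.Test;

public class PropertyStyleTest {
    @Test public void additionIsCommutativeAndHasZeroIdentity() {
        long a = ThreadLocalRandom.current().nextLong(-1_000_000, 1_000_000);
        long b = ThreadLocalRandom.current().nextLong(-1_000_000, 1_000_000);
        assertEquals(a + b, b + a); // commutativity
        assertEquals(a, a + 0);     // zero is the identity element
    }

    @Test public void sortingProducesNonDecreasingSequence() {
        List<Integer> list = new ArrayList<>();
        int size = ThreadLocalRandom.current().nextInt(0, 100);
        for (int i = 0; i < size; i++)
            list.add(ThreadLocalRandom.current().nextInt());

        Collections.sort(list); // replace with the sorting implementation under test

        for (int i = 1; i < list.size(); i++)
            assertTrue("elements out of order at index " + i, list.get(i - 1) <= list.get(i));
    }
}
```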
Often this is combined with repeating the same test many times to increase the probability of finding defects. There are specialized frameworks that can do this for you - they usually have names ending with QuickCheck, which comes from the pioneering Haskell QuickCheck. E.g. for Java there is JUnit QuickCheck. Additionally, these frameworks often provide so-called shrinkage - if the test failed with value 1000 but passed with 500, the framework will start picking values in between to find the boundaries where failures start and where they end.
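With junit-quickcheck the commutativity property above could look roughly like this (the exact annotations and packages may differ between versions, so treat this as a sketch):

```java
import static org.junit.Assert.assertEquals;

import com.pholser.junit.quickcheck.Property;
import com.pholser.junit.quickcheck.runner.JUnitQuickcheck;
import org.junit.runner.RunWith;

@RunWith(JUnitQuickcheck.class)
public class AdditionProperties {
    // the framework generates random a and b and runs the property many times,
    // shrinking the values if it ever finds a failing pair
    @Property public void additionIsCommutative(int a, int b) {
        assertEquals(a + b, b + a);
    }
}
```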
While this is a very impressive technique, it's also extremely complicated. When I tried it I spent a couple of days thinking about how to fully cover my algorithm (and I never finished). It requires practice and mathematical thinking, so be prepared to meet difficulties. On the bright side, you don't have to cover your code 100% with this technique - you can still combine it with example-based testing to fill the holes.
You can read more about Property Based Testing in this nice article.
Another option, and probably the most cost-effective one, is using Test Oracles. If there is an easier (but not as optimal or secure) way of doing the same thing, you can use the simpler implementation in the tests and cross-check the results of the two algorithms. E.g. you can use the default sorting of your SDK as a reference implementation while checking your super-fast, super-complicated sorting algorithm.
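A minimal sketch of this cross-check, where Collections.sort is the trusted oracle and mySuperFastSort is a hypothetical placeholder for the algorithm under test:

```java
import static org.junit.Assert.assertEquals;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import org.junit.Test;

public class SortingOracleTest {
    @Test public void matchesJdkSortOnRandomInput() {
        List<Integer> input = new ArrayList<>();
        int size = ThreadLocalRandom.current().nextInt(0, 1000);
        for (int i = 0; i < size; i++)
            input.add(ThreadLocalRandom.current().nextInt());

        List<Integer> expected = new ArrayList<>(input);
        Collections.sort(expected);                                     // trusted reference (the oracle)

        List<Integer> actual = mySuperFastSort(new ArrayList<>(input)); // the algorithm under test

        assertEquals(expected, actual);
    }

    /** Hypothetical algorithm under test - stands in for your own implementation. */
    private static List<Integer> mySuperFastSort(List<Integer> list) {
        Collections.sort(list); // placeholder so the sketch compiles
        return list;
    }
}
```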
This is particularly useful if you transform objects into one another (DTO->Entity, Entity->DTO). You fill ObjectA with random values, then convert it into ObjectB, then back into ObjectA'. If ObjectA equals ObjectA', then both transformations work correctly. This is very efficient testing, especially if you employ reflection-based comparison (e.g. reflectionEquals() from Unitils).
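A sketch of that round trip, with hypothetical PersonDto/PersonEntity types and converters, using Unitils' assertReflectionEquals for the field-by-field comparison:

```java
import static org.unitils.reflectionassert.ReflectionAssert.assertReflectionEquals;

import java.util.concurrent.ThreadLocalRandom;
import org.junit.Test;

public class PersonConversionTest {
    @Test public void dtoSurvivesRoundTripThroughEntity() {
        PersonDto original = randomPersonDto();      // fill the DTO with random values

        PersonEntity entity = toEntity(original);    // ObjectA -> ObjectB
        PersonDto roundTripped = toDto(entity);      // ObjectB -> ObjectA'

        // field-by-field comparison via reflection, so newly added fields are covered automatically
        assertReflectionEquals(original, roundTripped);
    }

    private static PersonDto randomPersonDto() {
        PersonDto dto = new PersonDto();
        dto.firstName = "name" + ThreadLocalRandom.current().nextInt(1_000_000);
        dto.age = ThreadLocalRandom.current().nextInt(0, 120);
        return dto;
    }

    // Hypothetical types and converters - stand-ins for your real DTO, entity and mapping code
    static class PersonDto { String firstName; int age; }
    static class PersonEntity { String firstName; int age; }

    private static PersonEntity toEntity(PersonDto d) {
        PersonEntity e = new PersonEntity();
        e.firstName = d.firstName; e.age = d.age;
        return e;
    }
    private static PersonDto toDto(PersonEntity e) {
        PersonDto d = new PersonDto();
        d.firstName = e.firstName; d.age = e.age;
        return d;
    }
}
```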
But it's not limited to DTO-Entity transformations. A lot of operations have reverse counterparts: string tokenization + concatenation, replacing a string occurrence + replacing it back, element insertion + element removal, etc. The principle is the same - after the operation you apply the reverse function and check that the result equals the original input.
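For example, tokenization and concatenation might be cross-checked like this (a minimal sketch with a randomly generated sentence):

```java
import static org.junit.Assert.assertEquals;

import java.util.concurrent.ThreadLocalRandom;
import org.junit.Test;

public class ReverseOperationTest {
    @Test public void tokenizeAndConcatenateGiveBackTheOriginal() {
        // random sentence of random words, no leading/trailing/double spaces so the round trip is exact
        int words = ThreadLocalRandom.current().nextInt(1, 10);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words; i++) {
            if (i > 0) sb.append(' ');
            sb.append("word").append(ThreadLocalRandom.current().nextInt(1000));
        }
        String original = sb.toString();

        String[] tokens = original.split(" ");          // the operation
        String concatenated = String.join(" ", tokens); // its reverse

        assertEquals(original, concatenated);
    }
}
```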
I'm still waiting for an opportunity to try it out, but I can't resist mentioning Model Based Testing after watching this inspiring presentation. Instead of randomly generating data, as we did previously, this approach generates random behaviour. The idea is that we build a model of our app: ActionA can be done after ActionB, but only if ActionC happened. Then we ask the framework to invoke these operations in random order (it must follow the rules, though). After that we check that the expected side effects (the final result) match what we described in the model.
This can be useful when testing complicated flows that transition from state to state. But what's marvelous is that you can test this in a concurrent environment as well. Then you check not only that the flow is correct, but also that it works under high load with many concurrent users. After all, if we bought a ticket 10 times, in the end we should have 10 unique tickets even if we did that from multiple threads.
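Real model-based frameworks generate random sequences of different actions and check richer invariants; the drastically simplified sketch below only shows the final-state check from the ticket example, with a hypothetical TicketService standing in for the real app:

```java
import static org.junit.Assert.assertEquals;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import org.junit.Test;

public class TicketPurchaseModelTest {
    @Test public void concurrentPurchasesYieldUniqueTickets() throws Exception {
        TicketService service = new TicketService(); // hypothetical system under test
        int purchases = 1000;
        ExecutorService pool = Executors.newFixedThreadPool(10);

        // the "model" here is trivial: after N buy actions we expect exactly N distinct tickets
        List<Future<String>> results = new ArrayList<>();
        for (int i = 0; i < purchases; i++)
            results.add(pool.submit(service::buyTicket));

        Set<String> uniqueTickets = new HashSet<>();
        for (Future<String> f : results)
            uniqueTickets.add(f.get());
        pool.shutdown();

        assertEquals(purchases, uniqueTickets.size());
    }

    /** Hypothetical implementation; a real test would call the actual app instead. */
    static class TicketService {
        private final AtomicLong seq = new AtomicLong();
        String buyTicket() { return "ticket-" + seq.incrementAndGet(); }
    }
}
```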
Some of the frameworks also provide shrinkage here - they will try to find the minimal sequence of steps that results in the bug. And that is really important, because after doing 1000 steps in 10 threads it's hard to reproduce the problem amid so much noise.
Sometimes there are object graphs where each object can be in different states. Returning to our example: 20 account types, 30 legal entity types, 5 account roles each. If the business rules differ between types or roles, we should test that. But there are so many combinations that it may take too much time to run the tests, especially if there are many features we want to test for these combinations.
But if we build a model according to which we can generate these object graphs correctly, we can partition those combinations and check only a single one from each equivalence class. This can reduce the number of tests from ~1K to just 3. And if some particular combination is buggy, at some point we'll find out, because the generated combinations differ from run to run.
Even if the number of combinations is manageable, at some point, when the functionality has stabilized, we can reduce the number of cases by checking only a small subset of them each time. By doing this we speed up our tests.
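A tiny sketch of that sampling, with hypothetical enums standing in for the real account and legal entity types:

```java
import java.util.concurrent.ThreadLocalRandom;

public class RandomCombinations {
    // hypothetical enums - in reality there would be 20, 30 and 5 values respectively
    enum AccountType { SAVINGS, CHECKING, LOAN }
    enum LegalEntityType { PERSON, COMPANY, GOVERNMENT }
    enum AccountRole { OWNER, VIEWER, SIGNATORY }

    /** Picks one random combination per run instead of iterating over all of them. */
    static Object[] randomCombination() {
        return new Object[]{
            sample(AccountType.values()),
            sample(LegalEntityType.values()),
            sample(AccountRole.values())
        };
    }

    static <T> T sample(T[] values) {
        return values[ThreadLocalRandom.current().nextInt(values.length)];
    }
}
```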
We can also randomize user locales, the OS used to run the tests, SDK versions, etc. Of course, only environments that are expected to be compatible with our app should be used.
That said, the tests need to be designed so that they themselves don't fail in different environments - i.e. we don't want a test to fail just because someone's machine is too fast or too slow. But we still want to find real bugs. So if the production code (not the test) fails because we changed the environment, while we expect it to work in that environment, then we've found something. If a test fails for a real reason, then the test did its job - after all, this is why we write tests.
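A small sketch of how locale randomization could be wired in, restricted to a hypothetical list of supported locales and logged so a failing run can be diagnosed:

```java
import java.util.Locale;
import java.util.concurrent.ThreadLocalRandom;

public class RandomEnvironment {
    // only locales our app officially supports - we don't randomize beyond the supported matrix
    private static final Locale[] SUPPORTED = { Locale.US, Locale.GERMANY, Locale.FRANCE, Locale.JAPAN };

    /** Call e.g. from a @BeforeClass method and log the choice so failures can be reproduced. */
    public static Locale pickAndApplyRandomLocale() {
        Locale locale = SUPPORTED[ThreadLocalRandom.current().nextInt(SUPPORTED.length)];
        System.out.println("Running tests with locale: " + locale);
        Locale.setDefault(locale);
        return locale;
    }
}
```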
But how do we reproduce a failure if the test relies on randomness? For that we need one of two things: a) logging (values, environment); b) the ability to set the seed of the random generator. Ideally we need both. I haven't worked with manually set seeds myself, but I can see how helpful that may be: if a failure happens we can reuse the same seed and all the "random" values will repeat.
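A minimal sketch of option (b), assuming a hypothetical -Dtest.seed system property for replaying a failed run:

```java
import java.util.Random;

public class ReproducibleRandom {
    /** One generator per test run; log the seed so a failing run can be replayed exactly. */
    public static Random newLoggedRandom() {
        // allow overriding the seed (e.g. -Dtest.seed=12345) to reproduce a previous failure
        String override = System.getProperty("test.seed");
        long seed = override != null ? Long.parseLong(override) : System.nanoTime();
        System.out.println("Random seed for this run: " + seed);
        return new Random(seed);
    }
}
```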
Randomization is applicable to manual testing as well, though there it won't be as chaotic.
Randomized testing is helpful because it gives better coverage within each equivalence class, keeps tests isolated from each other's data, and occasionally catches bugs that hardcoded examples would never hit.
Does it make sense to use both hardcoded and randomized data? Well, when the testing gets complicated (as with Property Based Testing) it makes sense to keep some examples as well so that the tests stay understandable. But apart from that there don't seem to be many reasons to do so.
Note that randomizing data is just one piece of advice for proper test data management - see Effective Data Management for more. Also, check out the Test Pyramid as an example of a project where randomized testing is used.