Shrinking

8. Shrinking

Shrinking is the mechanism by which a property-based testing framework can be told how to simplify failure cases enough to let it figure out exactly what the minimal reproducible case is.

Sometimes the input required to find a failure can be fairly large or complex. Finding the initial failing case may have required hundreds of attempts, and it may contain vast amounts of irrelevant information. The framework will then attempt to reduce that data set through shrinking. It generally does so by transforming all the generators used and trying to bring them back towards their own zero point:

A number tends to shrink from floating point values towards integers, and integers tend to shrink towards the number 0.
binaries tend to shrink from things full of bytes towards the empty binary (<<>>)
lists also tend to shrink towards the empty list
elements([A,B,C]) will shrink towards the value A

In short, containers tend to empty themselves, and other values try to find a neutral point. The nice aspect of this is that as custom generators are built from other generators, the shrinking is 'inherited', and a custom generator may get its own shrinking for free.

For some data types however, there is no good zero point: a vector of length 15 will always have length 15, and same with a tuple. Similarly, larger recursive data structures that have been defined by the user may not have obvious legal ways to shrink. This will particularly be the case in statically-typed languages that take their inspiration from the Haskell implementation (without covering everything the Haskell version covers), where actual types declarations are used as generators, rather than composable functions as in Erlang. For many statically-typed languages where the base Property-based testing framework doesn’t have very powerful built-in combinators for generators, the default shrinking rules are sometimes not that impressive, especially when the type system cannot appropriately represent actual data type constraints.

In such frameworks, whatever information ends up not encodable in the type system (such as a collection being ordered, a tree that is expected to be balanced, a UUID in string form that should be case insensitive, and so on) needs to be specified as shrinking rules.^[1]

In PropEr and Quickcheck, the need to specify shrinking still exists, but far less often. Shrinking is rather a tool to be used in order to give hints to the test framework in order to better reduce the size of the data set while still keeping it relevant after a failure.

8.1. ?SHRINK

The simplest way to shrink is through the ?SHRINK(Generator, [AltGenerators]) macro. This macro allows the specification of the actual generator as usual, but also allows the specification of alternative generators to be used when trying to reduce the size of the data structure.

This is useful when there are multiple variations in data that may or may not be the source of failures, and there are simple ways to encode safer patterns through the alternative generators.

For example, when dealing with time, January 1st 1970 is a more common and meaningful zero value than the year 0 itself (never mind that year zero does not necessarily exist in the first place). When dealing with a date and time piece of data, re-centering the 'zero' towards that epoch may be practical.

Let’s take a look at the following set of generators used to create strings of the form "1997-08-04T12:02:18-05:00", in accordance with RFC 8601. This set of generators will be centering its shrinking efforts towards January 1st January 1970. Also note that the RFC is somewhat lax and allows (or rather, does not forbid) the notation of a timezone that is +99:76, even if that is nonsensical from a human perspective.

strdatetime() ->
    ?LET(DateTime, datetime(), to_str(DateTime)).

datetime() ->
    {date(), time(), timezone()}.

date() ->
    ?SUCHTHAT({Y,M,D}, {year(), month(), day()}, calendar:valid_date(Y,M,D)).

year() ->
    ?SHRINK(range(0, 9999), [range(1970,2000), range(1900, 2100)]).

month() ->
    range(1,12).

day() ->
    range(1,31).

time() ->
    {range(0,24), range(0,59), range(0,60)}.

timezone() ->
    {elements(['+','-']),
     ?SHRINK(range(0,99), [range(0,14), 0]),
     ?SHRINK(range(0,99), [0,15,30,45])}.

%% Helper to convert the internal format to a string
to_str({ {Y,M,D}, {H,Mi,S}, {Sign,Ho,Mo} }) ->
    lists:flatten(
      io_lib:format("~4..0b-~2..0b-~2..0bT~2..0b:~2..0b:~2..0b~s~2..0b:~2..0b",
                    [Y,M,D,H,Mi,S,Sign,Ho,Mo])
    ).

Various forms of shrinking are used: years will be generated from year 0 to 9999, but will start shrinking on errors with ranges between 1970 and 2000, and if that fails, 1900 to 2100. Multiple alternative generators can be used.

Similarly, timezones will look for 00-99 for their ranges, but will try to settle on hours between 0 an 14 (expected human ranges, not just formatting oddities) and minutes that match currently existing timezones.

You can see a sample of how shrinking would work by calling the sampleshrink function:

1> proper_gen:sampleshrink(prop_shrink:strdatetime()).
"1757-06-26T02:36:60-64:38"
"1995-06-26T02:36:60-64:38"
"1970-06-26T02:36:60-64:38"
"1970-01-26T02:36:60-64:38"
"1970-01-01T02:36:60-64:38"
"1970-01-01T00:36:60-64:38"
"1970-01-01T00:00:60-64:38"
"1970-01-01T00:00:00-64:38"
"1970-01-01T00:00:00+64:38"
"1970-01-01T00:00:00+09:38"
"1970-01-01T00:00:00+00:38"
"1970-01-01T00:00:00+00:00"
ok

In practice, shrinking will be done less linearly than this. A given attempt at shrinking that fails to create another failing case won’t be used. If a test failed on every month of July, the shrinking would end up looking like "1970-07-01T00:00:00+00:00".

8.2. ?LETSHRINK

A more complex way to shrink is through the ?LETSHRINK([Pattern, …], [Generator, …], Expression) macro. The macro always requires patterns and generators to be in lists. One example usage may look like:

?LETSHRINK([A,B,C], [list(number()), list(number()), list(number())],
           A ++ B ++ C)

In this expression, the generators A, B, and C are merged together into one with A ++ B ++ C, much in the same way we would do it in a regular ?LET macro. The difference is that if the property-based testing framework has to generate a simpler term when shrinking, it will instead pick just one of A, B, or C without applying the transformation and return that directly.

?LETSHRINK is particularly appropriate for recursive structures, data made through branching, and just all kinds of pieces of data that are generated by smashing others together and applying transformations, since taking a part of it is a legitimate way to get a simpler version.

The most common form of ?LETSHRINK is the one to build binary trees. A generator for binary trees whose size is directed by its argument N would look like:

tree(N) when N =< 1 ->
    {leaf, number()};
tree(N) ->
    PerBranch = N div 2,
    {branch, tree(PerBranch), tree(PerBranch)}.

Running sampleshrink/1 on it reveals that shrinking always takes place on the contents of the tree, but not its structure (since the size is bound):

1> proper_gen:sampleshrink(prop_shrink:tree(4)).
{branch,{branch,{leaf,13},{leaf,0.6154862580810709}},
        {branch,{leaf,8},{leaf,-3}}}
{branch,{branch,{leaf,0},{leaf,0.6154862580810709}},
        {branch,{leaf,8},{leaf,-3}}}
{branch,{branch,{leaf,0},{leaf,-6}},{branch,{leaf,8},{leaf,-3}}}
{branch,{branch,{leaf,0},{leaf,0}},{branch,{leaf,8},{leaf,-3}}}
{branch,{branch,{leaf,0},{leaf,0}},{branch,{leaf,0},{leaf,-3}}}
{branch,{branch,{leaf,0},{leaf,0}},{branch,{leaf,0},{leaf,0}}}

Each of the five trees above contain exactly 4 entries, and all the integers tend to 0, but the tree structure itself is fixed in size. The only way to get the tree to shrink is through parametrizing it with the Size variable obtained from the ?SIZED(Var, Exp) macro. But if the failure require internal tree elements to be large while the tree structure itself is small, then there’s few chances that shrinking by size only will work well.

Here’s the shrink-friendly version:

tree_shrink(N) when N =< 1 ->
    {leaf, number()};
tree_shrink(N) ->
    PerBranch = N div 2,
    ?LETSHRINK([L, R], [tree_shrink(PerBranch), tree_shrink(PerBranch)],
               {branch, L, R}).

By building individual branches through the ?LETSHRINK macro and assembling them in there, the framework is able to prune parts of the trees it would detect as irrelevant to a test case failure. This allows to generate much bigger trees to initially find bugs, without losing clarity when getting a good counterexample:

2> proper_gen:sampleshrink(prop_shrink:tree_shrink(16)).
{branch,{branch,{branch,{branch,{leaf,28},{leaf,-0.039220389013186946}},
                        {branch,{leaf,3.9013940456284684},{leaf,-3}}},
                {branch,{branch,{leaf,14.576812882147989},{leaf,-16}},
                        {branch,{leaf,-2.0345272435474966},
                                {leaf,-8.151564195158691}}}},
        {branch,{branch,{branch,{leaf,-12},{leaf,-85}},
                        {branch,{leaf,16.829645166380576},{leaf,-1}}},
                {branch,{branch,{leaf,1},{leaf,5.058669843388856}},
                        {branch,{leaf,7},{leaf,2}}}}}
{branch,{branch,{branch,{leaf,28},{leaf,-0.039220389013186946}},
                {branch,{leaf,3.9013940456284684},{leaf,-3}}},
        {branch,{branch,{leaf,14.576812882147989},{leaf,-16}},
                {branch,{leaf,-2.0345272435474966},
                        {leaf,-8.151564195158691}}}}
{branch,{branch,{leaf,28},{leaf,-0.039220389013186946}},
        {branch,{leaf,3.9013940456284684},{leaf,-3}}}
{branch,{leaf,28},{leaf,-0.039220389013186946}}
{leaf,28}
{leaf,0}

Rather than a constant tree size, each shrink subsequently makes the tree smaller (if the property still fails). This helps in figuring out precisely why a test failed with more accuracy: if the tree size is at play, then it will remain large. If the tree contents are at play, then a smaller tree with a better focus on the bad elements will be generated.

In most cases, for a framework like PropEr or Quickcheck, adding shrinking instructions is not something that needs to be done as part of writing the generator the first time around. Rather, it’s something that will be worth doing once a confusing counter-example is found and the minimal counter-example given by the framework is not understandable on its own.

As with other property-based testing debugging practices, the work will be iterative and a bit explorative. The generator, its shrinking, the test cases, and our understanding of the program itself will all get to improve as we discover the hidden properties embedded in the code that was written.

8.3. Exercises

8.3.1. Question 1

What are the two macros used for shrinking and what do their arguments stand for?

8.3.2. Question 2

What are the differences between the ?LETSHRINK macro and the ?LET macro?

8.3.3. Question 3

In the following property, a list of servings for a meal is generated. If any serving contains dairy, the property fails:

prop_too_much_dairy() ->
    ?FORALL(Food, meal(), dairy_count(Food) == 0).

dairy_count(L) ->
    length([X || X <- L, is_dairy(X)]).

is_dairy(cheesesticks) -> true;
is_dairy(lasagna) -> true;
is_dairy(icecream) -> true;
is_dairy(milk) -> true;
is_dairy(_) -> false.

The generator looks like this:

meal() ->
    ?LETSHRINK([Appetizer, Drink, Entree, Desert],
               [elements([soup, salad, cheesesticks]),
                elements([coffee, tea, milk, water, juice]),
                elements([lasagna, tofu, steak]),
                elements([cake, chocolate, icecream])],
               [Appetizer, Drink, Entree, Desert]).

However, whenever the test case fails, we instead get results set that always contain the four courses. Fix the ?LETSHRINK usage in the generator so that data can be appropriately shrunk.

8.3.4. Question 4

In the case study chapter about having items on discount, the following generator was introduced:

item_price_special() ->
    %% first LET: PriceList
    ?LET(PriceList, price_list(),
         %% second LET: SpecialList
         ?LET(SpecialList, special_list(PriceList),
              %% third LET: Regular + Special items and prices
              ?LET({{RegularItems, RegularExpected},
                    {SpecialItems, SpecialExpected}},
                   {regular_gen(PriceList, SpecialList),
                    special_gen(PriceList, SpecialList)},
                   %% And merge + return initial lists:
                   {RegularItems ++ SpecialItems,
                    RegularExpected + SpecialExpected,
                    PriceList, SpecialList}))).

This generator merges two types of data: those without a special, and those with a special. The two types are joined together into one larger item list with prices, which can then be passed to the property.

Modify the generator so that both types of pricing can shrink independently.

8.4. Solutions

8.4.1. Question 1

What are the two macros used for shrinking and what do their arguments stand for?

?SHRINK(Generator, [AltGenerators, …])
?LETSHRINK([Pattern, …], [Generator, …], Expression)

8.4.2. Question 2

What are the differences between the ?LETSHRINK macro and the ?LET macro?

?LETSHRINK always takes a list of arguments and generators, whereas ?LET takes any pattern and any generator in its first two arguments. When failing in a test case and needing a new simplified generator, one of the generators in the list will be used, but without the transformations being applied. A ?LET macro has no such rule and the transformation is always applied.

8.4.3. Question 3

In the following property, a list of servings for a meal is generated. If any serving contains dairy, the property fails:

prop_too_much_dairy() ->
    ?FORALL(Food, meal(), dairy_count(Food) == 0).

dairy_count(L) ->
    length([X || X <- L, is_dairy(X)]).

is_dairy(cheesesticks) -> true;
is_dairy(lasagna) -> true;
is_dairy(icecream) -> true;
is_dairy(milk) -> true;
is_dairy(_) -> false.

The generator looks like this:

meal() ->
    ?LETSHRINK([Appetizer, Drink, Entree, Desert],
               [elements([soup, salad, cheesesticks]),
                elements([coffee, tea, milk, water, juice]),
                elements([lasagna, tofu, steak]),
                elements([cake, chocolate, icecream])],
               [Appetizer, Drink, Entree, Desert]).

However, whenever the test case fails, we instead get results set that always contain the four courses. Fix the ?LETSHRINK usage in the generator so that data can be appropriately shrunk.

The problem here is that the property expects a list coming out of the generator, but the way ?LETSHRINK works is that every single of the variable between Appetizer, Drink, Entree, or Dessert is an atom put in a list. During a failing test case, the shrinking attempt will fail as the property cannot work through receiving an atom when it expects a list.

Instead, the generator should be reworked as follows:

meal() ->
    ?LETSHRINK([Appetizer, Drink, Entree, Desert],
               [[elements([soup, salad, cheesesticks])],
                [elements([coffee, tea, milk, water, juice])],
                [elements([lasagna, tofu, steak])],
                [elements([cake, chocolate, icecream])]],
               Appetizer ++ Drink ++ Entree ++ Desert).

This ensures that every single variable is always a list, and the final result passed to the property in a successful case also remains a list. The elements generated should technically have the same type as the one returned for ?LETSHRINK to work well.

8.4.4. Question 4

In the case study chapter about having items on discount, the following generator was introduced:

item_price_special() ->
    %% first LET: PriceList
    ?LET(PriceList, price_list(),
         %% second LET: SpecialList
         ?LET(SpecialList, special_list(PriceList),
              %% third LET: Regular + Special items and prices
              ?LET({{RegularItems, RegularExpected},
                    {SpecialItems, SpecialExpected}},
                   {regular_gen(PriceList, SpecialList),
                    special_gen(PriceList, SpecialList)},
                   %% And merge + return initial lists:
                   {RegularItems ++ SpecialItems,
                    RegularExpected + SpecialExpected,
                    PriceList, SpecialList}))).

Modify the generator so that both types of pricing can shrink independently.

Since the special list and the price list are both required for the rest of operations, those two ?LET will be left alone. The third ?LET is where the merging takes place, and that’s where we’ll operate.

The gotcha there is once again that the types generated by the ?LET expression as a whole ({Items, Expected, PriceList, SpecialList}) do not match what the generators inside the ?LET produce ({{RegularItems, RegularExpected}, {SpecialItems, SpecialExpected}}). We can’t just replace it wholesale for a ?LETSHRINK.

To do so, we must ensure that the values match. This can be done by adding an expression:

item_price_special() ->
    %% first LET: PriceList
    ?LET(PriceList, price_list(),
         %% second LET: SpecialList
         ?LET(SpecialList, special_list(PriceList),
              %% third LET: Regular + Special items and prices
              ?LET({Items, Price},
                   %% Help shrinking by first trying only one of the
                   %% two possible lists in case a specific form causes
                   %% the problems on its own
                   ?LETSHRINK([{RegularItems, RegularExpected},
                               {SpecialItems, SpecialExpected}],
                              [regular_gen(PriceList, SpecialList),
                               special_gen(PriceList, SpecialList)],
                              %% And we merge:
                              {RegularItems ++ SpecialItems,
                               RegularExpected + SpecialExpected}),
                    {Items, Price, PriceList, SpecialList}))).

In this implementation, the inputs and outputs of ?LETSHRINK are always 2-tuples that include a list of items and a price expected. The surrounding ?LET is in charge of taking these values and pairing them up in a 4-tuple with the price list and special list. This then lets the framework get the chance to isolate whether the problem is with regular items, special items, or only both at once.

1. For more reading on the distinction, see Integrated vs type based shrinking and Compositional shrinking by David R. MacIver, author of Hypothesis

PropEr Testing