Custom Generators

5. Custom Generators

Basic stateless properties can cover a wide array of situations with only the basic data types provided by a property-based testing framework. On the other hand, most programs sooner or later end up using more complex data types, and custom generators become a necessity. This chapter introduces methods to write custom generators, and also shows how to evaluate how useful or relevant they actually are to the test case at hand.

5.1. Limitations of Default Generators

By default, the data types provided by PropEr include very generic Erlang types. Very generic data types imply a very large search space, and it may therefore be difficult to find examples of data that are varied enough to uncover edge cases.

The last exercise of Stateless Properties' questions asked to generate a property that validates the system’s ability to count words in strings. A candidate property for that test could be:

word_count(String) ->
    Stripped = string:trim(dedupe_spaces(String), both, " "),
    Spaces = lists:sum([1 || Char <- Stripped, Char =:= $\s]),
    case Stripped of
        "" -> 0;
        _ -> Spaces + 1
    end.

dedupe_spaces([]) -> [];
dedupe_spaces([$\s,$\s|Rest]) -> dedupe_spaces([$\s|Rest]);
dedupe_spaces([H|T]) -> [H|dedupe_spaces(T)].

prop_word_count_candidate() ->
    ?FORALL(String, non_empty(string()),
        begin
            %% assume the word count is equivalent to the number of spaces (not
            %% at the beginning or the end of the string) plus one, since a
            %% string that contains a single word won't have a space in it.
            Splits = length([1 || C <- string:trim(String,both," "), C == $\s]),
            word_count(String) =:= Splits+1
        end).

This property has to operate on generic strings that are fairly random-looking. The test itself focuses on word counting, which is a fairly specific activity based on a fairly specific subset of strings. It should not be surprising that the fairly random strings might therefore be a suboptimal fit for a word-counting test. The property can pass dozens of iterations without problem, and it should be reliable, but the test would be more trustworthy if it had a more specialized generator.

In fact, if this word-counting candidate property were to be run over more tests than the default number of iterations (100), a failure would be detected whenever two or more space characters (' ') are in a sequence in a generated string, or when a string with just spaces is submitted. This error should logically happen really early on in a good property test that is related to breaking up text on space characters, but since the default generators are so general, the odds for this specific case to show up are relatively low.

The breadth of the string() type is large: it will have the ability to search through over 125,000 characters, with lengths unbound. While this means a lot of irrelevant or repetitive tests will be run in specific cases (such as counting words), it can also prove very useful as a sanity check to see if wild unexpected inputs are handled properly. This type of testing is generally related to fuzzing. General generators, as it turns out, tend to make property-based testing a useful tool for fuzzing.

Custom generators can fortunately be written, allowing the data generated to be retargeted in arbitrary ways. In the case of a word counter, the test input would be far more relevant if the following generators were available:

splitting character: any sequence of one space or more
word: any sequence of characters that are not splitting characters

In fact, the test itself could be simpler. By creating alternating sequences of words and splitting characters, while maintaining a count of the words put in it, a custom generator could return strings with their expected word count. The property test would only need to make sure that the program returns the expected count:

prop_word_count() ->
    ?FORALL({WordyString, WordCount}, wordy_string(),
        begin
            word_count(WordyString) =:= WordCount
        end).

In order to write a wordy_string() generator, the property-based testing framework must offer some primitives. Before optimizing the generator, though, it would be useful to know how to measure how good their distribution is, and then see the primitives themselves.

5.2. Generator Distribution

To turn weak generators into better one first requires to demonstrate that the generator is weak. Analysis of the data set for each execution is required. The simplest way is to call proper_gen:pick() on a generator and look at the results. This tends to be cumbersome and time consuming.

5.2.1. collect

Instead of forcing manual accounting, PropEr gives a function called collect(Category, Property) that can count instances of each Category being input into Property. The metrics are then reported.

The function is used within the ?FORALL macro, around the test of the property:

prop_collect1() ->
    ?FORALL(Bin, binary(), collect(byte_size(Bin), is_binary(Bin))).

Running the property shows the metrics accumulated:

===> Testing prop_generators:prop_collect1()
... [...]
OK: Passed 100 test(s).

10% 2
7% 0
7% 3
6% 1
6% 6
6% 9
6% 11
5% 7
[...]

Since the byte size of binary() can vary considerably, regrouping the metrics by modifying the collector can prove worthwhile:

prop_collect2() ->
    ?FORALL(Bin, binary(),
            collect(to_range(10, byte_size(Bin)), is_binary(Bin))).

to_range(M, N) ->
    Base = N div M,
    {Base*M, (Base+1)*M}.

The to_range/2 function takes a value M and classifies it into buckets of entries over a range N:

===> Testing prop_generators:prop_collect2()
... [...]
OK: Passed 100 test(s).

56% {0,10}
27% {10,20}
13% {20,30}
3% {30,40}
1% {40,50}
===>
1/1 properties passed

In this case, more than 80% of generated binaries had a length between 0 and 20 bytes.

This type of information can be particularly useful when discriminating groups of data are known to be part of the problem space, and it is known that the test should cover them. Word counting, for example, had the categories for 'words' and 'splitting characters'. A modified version of the property could look at how many space characters were found:

prop_word_count_collected() ->
    ?FORALL(String, non_empty(string()),
        begin
            Splits = length([1 || C <- string:trim(String,both," "), C == $\s]),
            collect(Splits, word_count(String) =:= Splits+1)
        end).

This version uses the Splits value as a count of how many white space characters (\$s) are in the string, and those are reported in collect/2:

... [...]
OK: Passed 100 test(s).

96% 0
4% 1
===>
1/1 properties passed

Out of 100 tests, only 4 of them actually had space characters in them. In all of them, only one space character was generated. Technically speaking, the 100 tests only ran against strings that contained 1 or 2 words in them. Nothing more.

5.2.2. aggregate

Another function to gather statistics is aggregate(). aggregate() is similar to collect(), with the exception that it can take a list of categories to store:

prop_aggregate() ->
    Suits = [club, diamond, heart, spade],
    ?FORALL(Hand, vector(5, {oneof(Suits), choose(1,13)}),
            aggregate(Hand, true)). % `true' makes it always pass

This property’s generator creates a hand of five cards in a list. There may be duplicate cards, since nothing keeps the generator from returning something like {1, spade} 5 times. The list of cards is passed directly to aggregate/2. The function can use these multiple values and generate statistics:

OK: Passed 100 test(s).

3% {spade,11}
2% {club,1}
2% {club,8}
2% {heart,4}
2% {heart,8}
2% {heart,9}
2% {diamond,3}
[...]

The results show the overall probability of each card individually.

By using collect/2 or aggregate/2 with specific qualifiers, the distribution of the generated data can be analyzed. This is critical to ensuring the quality of the tests written, especially to make sure not all hundreds of test runs essentially boil down to variations of a single case.

5.3. Generator Sizes

Generators can nearly all be parametrized by an implicit size parameter handled by the framework. The size parameter is initially small, but as the number of test runs increase, the value is grown. The end objective is to start with simpler generated data and progressively add to their complexity.

Controlling the size manually can therefore be useful if highly complex data is what appears to trigger property failures. In such cases, the function resize(Size, Generator) can be called over an existing generator to using a given size value. The Size value must be a positive integer.

The prop_collect2() function introduced in Generator Distribution has a collector that would report the byte size of generated binaries, with the following result set:

56% {0,10}
27% {10,20}
13% {20,30}
3% {30,40}
1% {40,50}

By resizing the generator to some arbitrary value, the data size can be increased:

prop_resize() ->
    ?FORALL(Bin, resize(150, binary()),  % <= resized here
            collect(to_range(10, byte_size(Bin)), is_binary(Bin))).

The value 150 was chosen by trial and error. Running the test again reports new statistics:

15% {90,100}
10% {110,120}
9% {80,90}
8% {130,140}
7% {40,50}
7% {50,60}
7% {120,130}
[...]

The data sizes are still varied; they are not fixed to exactly 150, but where 80% of the results were 20 bytes of smaller before, the vast majority of them now sits well above 40 bytes in size.

One caveat of using the resize function with static values is that some of the variability of result sets is possibly lost. It is possible that both smaller or larger sizes would prompt interesting results. Another caveat is that relative sizing of various elements becomes cumbersome. For example, the following property test’s generator requires shorter names than biographies:

prop_profile1() ->
    ?FORALL(Profile, [{name, resize(10, string())},
                      {age, pos_integer()},
                      {bio, resize(350, string())}],
        begin
            NameLen = to_range(10, length(proplists:get_value(name, Profile))),
            BioLen = to_range(300, length(proplists:get_value(bio, Profile))),
            aggregate([{name, NameLen}, {bio, BioLen}], true)
        end).

Because size requirements vary between elements of the generator, multiple resize/2 function calls must be made and kept synchronized. Instead, PropEr and Quickcheck introduce a macro of the form ?SIZED(VarName, Expression), which introduces the variable VarName into the scope of Expression, bound to the 'size' value for the current execution.

prop_profile2() ->
    ?FORALL(Profile, [{name, string()},
                      {age, pos_integer()},
                      {bio, ?SIZED(Size, resize(Size*35, string()))}],
        begin
            NameLen = to_range(10, length(proplists:get_value(name, Profile))),
            BioLen = to_range(300, length(proplists:get_value(bio, Profile))),
            aggregate([{name, NameLen}, {bio, BioLen}], true)
        end).

In this property, the bio string is desired to be 35 times larger than the currently chosen size. Doing this allows to scale both the name and the bio relative to each other without imposing an anchor in terms of absolute size. This, in turn, provides some additional flexibility for the framework to do what it should:

→ rebar3 proper -m prop_generators -p prop_profile1,prop_profile2
===> Testing prop_generators:prop_profile1()
[...]

45% {name,{0,10}}
40% {bio,{0,300}}
10% {bio,{300,600}}
5% {name,{10,20}}

===> Testing prop_generators:prop_profile2()
[...]

32% {name,{0,10}}
28% {bio,{0,300}}
12% {bio,{300,600}}
10% {name,{10,20}}
7% {name,{20,30}}
4% {bio,{600,900}}
3% {bio,{900,1200}}
1% {bio,{1200,1500}}
1% {name,{30,40}}

This shows both names and bios tending to test over a much wider range of sizes, and in a less brittle manner.

5.4. Generators as Functions

All the generators shown until now were simple sequences of function calls, returning abstract pieces of data to be evaluated later. Since the generators return data, they can be wrapped into arbitrary functions. This allows to turn properties like this:

?FORALL({Profile1, Profile2, ...},
        {[{name, string()},
          {age, pos_integer()},
          {bio, ?SIZED(Size, resize(Size*35, string()))}],
         [{name, string()},
          {age, pos_integer()},
          {bio, ?SIZED(Size, resize(Size*35, string()))}],
         ...}
    begin
        ...
    end.

Into this one:

prop_profile3() ->
    ?FORALL({Profile1, Profile2, Profile3}, {profile(), profile(), profile()},
        begin
            lists:all(fun erlang:is_list/1, [Profile1, Profile2, Profile3])
        end).

profile() ->
    [{name, string()},
     {age, pos_integer()},
     {bio, ?SIZED(Size, resize(Size*35, string()))}].

Any generator can be defined as such a function.

5.5. Transforming Generators

In the course of writing generators, one of the common patterns is the need to generate data types that cannot simply be described in basic Erlang terms. For example, one may want to generate a queue of key/value pairs, using the queue module. Just using tuples and lists won’t be enough to enforce the internal constraint of the data structure.

One simple implementation may be to first generate a list of key/value pairs, and then turn them into queues inside the property expression:

prop_queue_naive() ->
    ?FORALL(List, list({term(), term()}),
        begin
            Queue = queue:from_list(List),
            queue:is_queue(Queue)
        end).

This makes reuse a bit annoying, since every property requiring a queue would then need to coordinate between its generator and priming them partly before the actual test. A better approach would be to use the ?LET(InstanceOfType, TypeGenerator, Transform) macro to apply a transformation to the generated dataset itself, without preventing the generator from being composable with others:

prop_queue_nicer() ->
    ?FORALL(Q, queue(), queue:is_queue(Q)).

queue() ->
    ?LET(List, list({term(), term()}), queue:from_list(List)).

Since the macro returns a generator itself (rather than an actualized data type), this allows for much nicer usage overall, since composability is preserved.

5.6. Restricting Generators

A common trait to all default generators is that they’re pretty broad in data they generate, and from time to time, there’s specific counterexamples that need to be excluded. Using the non_empty() generator to remove empty lists or binaries from the generated data set is one example of this. Such a filter generator can be implemented with the ?SUCHTHAT(InstanceOfType, TypeGenerator, BooleanExp):

non_empty(ListOrBinGenerator) ->
     ?SUCHTHAT(L, ListOrBinGenerator, L =/= [] andalso L =/= <<>>).

If any term L is an empty list or binary, the generated data will be discarded and generation will try a gain.

There is a limit to the number of tries; after too many failed attempts, the generation will fail with an error message such as:

Error: Couldn’t produce an instance that satisfies all strict constraints after 50 tries.

Similarly, generating a list of even or uneven numbers could be done by using ?SUCHTHAT() macros:

even() -> ?SUCHTHAT(N, integer(), N rem 2 =:= 0).
uneven() -> ?SUCHTHAT(N, integer(), N rem 2 =/= 0).

However, in this specific case it may be faster to use a transform to get there:

even() -> ?LET(N, integer(), N*2).
uneven() -> ?LET(N, integer(), N*2 + 1).

Since these transforms can generate correct data on the first try every time, they will be more efficient than using ?SUCHTHAT. One should be on the lookout for possibilities of reworking a restriction into a transformation.

Another thing to note is that not just guard expressions are accepted in SUCHTHAT macros. For example, the following generator looks for ISO Latin-1^[1] strings by using the io_lib:printable_latin1_list/1 function:^[2], which will let us restrict down the range of string().

latin1_string() ->
    ?SUCHTHAT(S, string(), io_lib:printable_latin1_list(S)).

The problem in this case is that for such generators as string(), the search space is large enough to be fairly ineffective. Generation can become long or fail too often. It then becomes interesting to look at building custom generators probabilistically.

5.7. Probabilistic Generators

Most of the default generators introduced in Default Generators tend to strictly return specific data types and are fairly straightforward. A few of them can be parametrized to influence how they work, and one of them specifically sees enough usage to warrant its own section.

The frequency([{N, Types}]) generator is special in that it allows probabilistic distributions of various types and sizes, and ends up being used in a lot of custom generators to allow such control.

The frequency/1 generator will prove especially useful when designing generators for complex data types where a uniform distribution is not desirable. Looking back at strings, the frequency of some characters could be used to make them look more 'realistic'.

When we were playing with the various string representations, we mentioned that it was often tricky to get values that were representative of real text: just using string() tended to yield a lot of control characters, codepoints that were very "out there", and very little in terms of latin1 or ASCII characters regular people are used to. This can be remediated with frequency/1:

text_like() ->
    list(frequency([{80, range($a, $z)},              % letters
                    {10, $\s},                        % whitespace
                    {1,  $\n},                        % linebreak
                    {1, oneof([$., $-, $!, $?, $,])}, % punctuation
                    {1, range($0, $9)}                % numbers
                   ])).

1> proper_gen:pick(proper_types:string()).
{ok,[35,15,0,12,2,3,3,1,10,25]}
2> proper_gen:pick(prop_generators:text_like()).
{ok,"rdnpw hxwd"}
3> proper_gen:pick(proper_types:resize(79, prop_generators:text_like())).
{ok,"vyb hhceqai  m f ejibfiracplkcn gqfvmmbspbt\nn.qbbzwmd"}

This generator produces something a lot closer to realistic text than what is obtained through string(). By tweaking the frequency numbers, specific characters can see their probability raised or lowered as required to properly exercise a property. A CSV^[3] parser would likely require a higher frequency of quotation marks, commas, and line breaks than what would usually be obtained, to properly exercise expected corner cases, for example.

5.8. Recursive Generators

Many generators require step-wise decision making, and others require a repetitive structure in order to work. Recursion is well-indicated for those, but recursive generators have a few tricks to consider.

The word count problem was specifically well-tailored for a recursive generator. The skeleton property looked like this:

prop_word_count() ->
    ?FORALL({WordyString, WordCount}, wordy_string(),
            word_count(WordyString) =:= WordCount).

Where wordy_string() generator returns fully formed bits of text with a count of the number of words in it. To generate the text itself, two generators are required:

splitting character: any sequence of one space or more^[4]
word: any sequence of characters that are not splitting characters

With an implementation such as:

splitter() ->
    %% one or more space characters
    non_empty(list($\s)).

word() ->
    %% a series of characters that are not a space
    ?SUCHTHAT(Chars, non_empty(string()),
              not lists:member($\s, Chars)).

In order to be able to produce a word count, the step-by-step approach will require generating a word, then a splitter, increment the count, and then repeat again, and so on for the entire string. This form of iteration is a perfect fit for a recursive generator. All recursive functions have the same fundamental elements:

one or more base cases where recursion stops;
one or more cases where a function calls itself.

In the case of a wordy string, this could look like:

wordy_string(StringFragments, Count) ->
    frequency([
        %% base case: return the string
        {1, {StringFragments, Count}},
        %% recursive case: add a word, a splitter, and increment the
        %% counter before trying again
        {50, wordy_string([word(), splitter() | Acc], Count+1)}
    ]).

This would, through a probabilistic and recursive generator, produce strings of around 50 words. The base case returns the generated fragments and count with a probability of 1 and the recursive case adds to both with a probability of 50. Changing the weight of either clauses would impact string length. The generator is not quite right though, because it will generate a list of string fragments, not the flat list required for a true string. The LET macro can be used for this:

wordy_string(StringFragments, Count) ->
    frequency([
        %% base case: return the flattened string
        {1, {?LET(Generated, Fragments, lists:append(Generated)), Count}},
        %% recursive case: add a word, a splitter, and increment the
        %% counter before trying again
        {50, wordy_string([word(), splitter() | Acc], Count+1)}
    ]).

Running this generator would unfortunately never terminate. This is because the wordy_string/2 generator calls itself, and Erlang is an eager language. Before frequency/1 has the time to make a selection, the recursive call to wordy_string/2 has already taken place. This will repeat indefinitely (or at least until memory runs out).

For those specific case, QuickCheck and PropEr have a ?LAZY() macro, which allows to defer the evaluation of an expression until it is actually required by the generator. This yields the final form:

wordy_string() ->
    wordy_string([], 0).

wordy_string(Fragments, Count) ->
    frequency([
        %% base case: return the flattened string
        {1, {?LET(Generated, Fragments, lists:append(Generated)), Count}},
        %% recursive case: add a word, a splitter, and increment the
        %% counter before trying again
        {50, ?LAZY(wordy_string([word(), splitter() | Fragments], Count+1))}
    ]).

Note that a wrapper has also been added so that the generator can be called without arguments. With this and the laziness, the generator can then take a time proportional to the length of the generated string to run, rather than infinity:

→ rebar3 as test shell
[...]
3> {ok, {String, _Count}} = proper_gen:pick(prop_generators:wordy_string()).
{ok,{[4,0,112,10,16,2,4,4,9,8,32,32,32,4,5,0,3,0,2,6,4,12,6,
      32,32,7|...],
     34}}
4> io:format("~ts~n", [String]).
^D^@p
^P^B^D^D        ^H   ^D^E^@^C^@^B^F^D^L^F  ^G^S^K^E^C^C^A^C^A        ^D^_          ^D
     ^H^A^A         ^R^P^A  ^D^F^B^C^F^W^N^C     ^G       ^H^C^@          ^A^G^L^Z^B          ^Y^A^@ ^A        ^E^@^P^C^L^@   ^A^P^E^B^E   $^@    ^D^D  ^B
      ^F^A^E^O^H^B       ^V
^A^K^H^@         ^Y^O^E      ^@^E^D^O^B    ^D             ^B^@^B^T         ^@^A^G^D^D^C^A^C^@       ^A^@^A^K  ^H^K)
ok

This is a funny definition of 'words', but then again, those are what the specifications are. Property-based testing is a great way to reveal how woefully underspecified a software project is. With the generator plugged into the previous prop_word_count definition, the property can now run for an arbitrary number of tests:

→ rebar3 proper -m prop_generators -p prop_word_count -n 5000
[...]
OK: Passed 5000 test(s).
===>
1/1 properties passed

The distribution of words to spaces should now be much higher than before. The same kind of logic can be applied to any recursive data structure.

The one caveat with this approach is that probabilities for recursion can mean two things:

Because the probabilities are fixed, the size of the created data structure will generally be unchanging
Because there are probabilities at all, there is a chance some data structures will be enormous, and possibly not terminate

In some cases, the potential variation for very large or very small structures will be desirable, but not always. Recursive functions can be written the way they usually are in Erlang, using a marker such as a counter that decrements to know when to stop:

wordy_string_deterministic(0, Fragments, Count) ->
    %% base case: return the flattened string
    {?LET(Generated, Fragments, lists:append(Generated)), Count};
wordy_string_deterministic(N, Fragments, Count) ->
    %% recursive case: add a word, a splitter, and increment the
    %% counter before trying again
    wordy_string_deterministic(N-1, [word(), splitter() | Fragments], Count+1).

Because the structure is flat, the ?LAZY macro can be avoided. All that is needed is a wrapper to seed the size parameter. It turns out that the ?SIZED macro is perfect for this:

wordy_string_deterministic() ->
    %% Set a size proportional to length
    ?SIZED(Length, wordy_string_deterministic(Length, [], 0)).

The variability of the data structure size will then be managed by the framework. Someone wanting smaller or larger lists could still use resize/2 to impact the total size.

5.9. Symbolic Calls

In the case where the generator is used to build a given data structure through side-effects, such as sequences of writes to a file, filling an ETS table, or just sending messages to a process, a counter-example built from regular or recursive generators will look like on opaque structure. In the case of a file or socket, it may be a 'Port' data type (#Port<0.123>—Erlang’s file descriptors, more or less); in the case of ETS, an integer table identifier; for a process, a Pid (<0.123.0>). Another difficult to read type of generator is for data structures that are too abstract in representation to reasonably make sense out of them from the printout. In such cases, digging for the bad state would be cumbersome, possibly requiring temporary debugging code to be added to the property since the opaque data type shows no relevant information. Symbolic calls can help.

A symbolic call is just a special notation for function calls that can be used in generators. Rather than executing the operation straight away, the calls are built up as a data structure. Once the generator materializes, then they get executed at once. The notation is {call, Module, Function, ArgumentList}:

Function Call

Symbolic Call

Automated Symbolic Call (PropEr only)

sets:new()

{call, sets, new, []}

{'$call', sets, new, []}

queue:join(Q1, Q2)

{call, queue, join, [Q1, Q2]}

{'$call', queue, join, [Q1, Q2]}

lists:sort([1,2,3])

{call, lists, sort, [[1,2,3]]}

{'$call', lists, sort, [[1,2,3]]}

local(Arg) ^[5]

{call, ?MODULE, local, [Arg]}

{'$call', ?MODULE, local, [Arg]}

By using this format, calls can be made simpler when shrinking:

dict_gen() ->
    ?LET(X, list({integer(),integer()}), dict:from_list(X)).

dict_symb() ->
    ?SIZED(Size, dict_symb(Size, {call, dict, new, []})).
dict_symb(0, Dict) ->
    Dict;
dict_symb(N, Dict) ->
    dict_symb(N-1, {call, dict, store, [integer(), integer(), Dict]}).

dict_autosymb() ->
    ?SIZED(Size, dict_autosymb(Size, {'$call', dict, new, []})).
dict_autosymb(0, Dict) ->
    Dict;
dict_autosymb(N, Dict) ->
    dict_autosymb(N-1, {'$call', dict, store, [integer(), integer(), Dict]}).

In properties, they look like:

prop_dict_gen() ->
    ?FORALL(D, dict_gen(), dict:size(D) < 5).

prop_dict_symb() ->
    ?FORALL(DSymb, dict_symb(), dict:size(eval(DSymb)) < 5).

prop_dict_autosymb() ->
    ?FORALL(D, dict_autosymb(), dict:size(D) < 5).

The first one is the standard generator, giving a standard dictionary. The second one uses symbolic calls, which require calling eval(Gen) ^[6] to execute the symbolic calls into the final dictionary. The automated symbolic calls are automatically evaluated when being returned.

The three properties are set up to fail, to show the difference on shrinking:

===> Testing prop_generators:prop_dict_gen()
...............!
Failed: After 16 test(s). (1)
{dict,5,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],
 [],[],[],[],[]},{{[],[[3|-6]],[],[],[],[[-1|-5]],
[[-14|10],[2|3]],[[5|1]],[],[],[],[],[],[],[],[]}}}
[...]
===> Testing prop_generators:prop_dict_symb()
..............!
Failed: After 15 test(s). (2)
{call,dict,store,[-6,8,{call,dict,store,[46,-13,{call,dict,store,[2,-2,{call,dict,store,[22,-2,{call,dict,store,[-12,-2,{call,dict,new,[]}]}]}]}]}]}
[...]
===> Testing prop_generators:prop_dict_autosymb()
............!
Failed: After 13 test(s). (3)
{'$call',dict,store,[7,2,{'$call',dict,store,[-1,2,{'$call',dict,store,[1,18,{'$call',dict,store,[-9,0,{'$call',dict,store,[-10,3,{'$call',dict,new,[]}]}]}]}]}]}
[...]

1	The standard dictionary fails by always showing the full data structure
2	The symbolic calls unsurprisingly show to their raw form; since they are evaluated within the property, the symbolic calls are all that show up
3	The automated symbolic calls also show their raw form, without having needed to be evaluated

With the help of symbolic calls, generated terms that are too obscure (whether through their complex size or because side-effects hide the way the state was obtained) can be made simpler to debug.

5.10. Exercises

5.10.1. Question 1

What are the functions to use to introspect the distribution of entries in a test run?

5.10.2. Question 2

What is the macro to be used to modify a generator with regular Erlang functions?

5.10.3. Question 3

When and why would someone need the ?LAZY macro?

5.10.4. Question 4

The following probabilistic generator creates trees that are not necessarily balanced through probabilistic means:

%% The tree generates a data type that represents the following types:
-type tree() :: tree(term()).
-type tree(T) :: {node,
                  Value :: T,
                  Left  :: tree(T) | undefined,
                  Right :: tree(T) | undefined}.

tree() ->
    tree(term()).

tree(Type) ->
    frequency([
        {1, {node, Type, tree(Type), undefined}},
        {1, {node, Type, undefined, tree(Type)}},
        {5, {node, Type, tree(Type), tree(Type)}}
    ]).

Make sure it can consistently terminate and generate trees of an interesting size. Using the Size variable may be an advantage for better complexity scaling.

5.10.5. Question 5

The following generators allow to generate stamps of the format {Hour, Minute, Second}:

stamp() -> {hour(), min(), sec()}.
hour() -> choose(0,23).
min()  -> choose(0,59).
sec()  -> choose(0,59).

The following are pairs of modified or restricted timestamps. Compare their implementations, and explain which of each pair is the most appropriate.

%% Return hours in the morning
am_stamp1() ->
    ?SUCHTHAT({H,M,S}, stamp(), H < 12).
am_stamp2() ->
    ?LET({H,M,S}, stamp(), {H rem 12, M, S}).

%% Return two ordered timestamps
stamps1() ->
    ?SUCHTHAT({S1, S2}, {stamp(), stamp()}, S1 =< S2).
stamps2() ->
    ?LET({S1, S2}, {stamp(), stamp()}, {min(S1,S2), max(S1,S2)}).

%% Return any time that does not overlap standup meeting
no_standup1() ->
    ?SUCHTHAT({H,M,_}, stamp(), H =/= 9 orelse M > 10).
no_standup2() ->
    ?LET({H,M,S}, stamp(),
         case H of
            9 when M =< 10 -> {8, M, S};
            _ -> {H,M,S}
         end).

5.10.6. Question 6

Write a symbolic generator that would create a file with various bytes in it, and return the active file handler (or file descriptor) for it. When a property fails, the user should be able to see the failing output without requiring to peek inside files.

You may need wrappers for file:open/2 and file:write/2 to ensure easier composability of these functions. Here’s an example of those:

file_open(Name, Opts) ->
    {ok, Fd} = file:open(Name, Opts),
    %% ensure the file is refreshed on each test run
    file:truncate(Fd),
    Fd.

file_write(Fd, Data) ->
    ok = file:write(Fd, Data),
    Fd.

5.11. Solutions

5.11.1. Question 1

What are the functions to use to introspect the distribution of entries in a test run?

collect(Term, BoolExpression)
aggregate([Term], BoolExpression)

5.11.2. Question 2

What is the macro to be used to modify a generator with regular Erlang functions?

?LET(Var, Generator, ErlangExpr); otherwise, the generators are abstract representations that won’t be modifiable the way the generated data would be.

5.11.3. Question 3

When and why would someone need the ?LAZY macro?

Whenever a generator that may probabilistically choose multiple branches is called, using eager evaluation means that all alternative paths for it will be evaluated at once. If the generator is made recursive, this can quickly blow the size of the computation to very large levels. The ?LAZY(Generator) macro allows to only evaluate a given branch when required, ensuring faster execution with more predictable memory usage.

5.11.4. Question 4

The following probabilistic generator creates trees that are not necessarily balanced through probabilistic means:

%% The tree generates a data type that represents the following types:
-type tree() :: tree(term()).
-type tree(T) :: {node,
                  Value :: T,
                  Left  :: tree(T) | undefined,
                  Right :: tree(T) | undefined}.

tree() ->
    tree(term()).

tree(Type) ->
    frequency([
        {1, {node, Type, tree(Type), undefined}},
        {1, {node, Type, undefined, tree(Type)}},
        {5, {node, Type, tree(Type), tree(Type)}}
    ]).

Make sure it can consistently terminate and generate trees of an interesting size. Using the Size variable may be an advantage for better complexity scaling.

The first step will be to ensure the generator can terminate through by using ?LAZY macros:

tree() ->
    tree(term()).

tree(Type) ->
    frequency([
        {3, {node, Type, undefined, undefined}},
        {2, {node, Type, ?LAZY(tree(Type)), undefined}},
        {2, {node, Type, undefined, ?LAZY(tree(Type))}},
        {3, {node, Type, ?LAZY(tree(Type)), ?LAZY(tree(Type))}}
    ]).

Although the tree may terminate, finding a good balance can still prove tricky. The numbers have been modified a bit to fit better, but using the Size variable proves more effective to put a predictable boundary on growth:

limited_tree(Type) ->
    ?SIZED(Size, limited_tree(Size, Type)).

limited_tree(Size, Type) when Size =< 1 ->
    {node, Type, undefined, undefined};
limited_tree(Size, Type) ->
    frequency([
        {1, {node, Type, ?LAZY(limited_tree(Size-1, Type)), undefined}},
        {1, {node, Type, undefined, ?LAZY(limited_tree(Size-1, Type))}},
        {5, {node, Type,
             %% Divide to avoid exponential growth
            ?LAZY(limited_tree(Size div 2, Type)),
            ?LAZY(limited_tree(Size div 2, Type))}}
    ]).

Although more verbose, it behaves much better.

5.11.5. Question 5

The following generators allow to generate stamps of the format {Hour, Minute, Second}:

stamp() -> {hour(), min(), sec()}.
hour() -> choose(0,23).
min()  -> choose(0,59).
sec()  -> choose(0,59).

The following are pairs of modified or restricted timestamps. Compare their implementations, and explain which of each pair is the most appropriate.

%% Return hours in the morning
am_stamp1() ->
    ?SUCHTHAT({H,M,S}, stamp(), H < 12).
am_stamp2() ->
    ?LET({H,M,S}, stamp(), {H rem 12, M, S}).

%% Return two ordered timestamps
stamps1() ->
    ?SUCHTHAT({S1, S2}, {stamp(), stamp()}, S1 =< S2).
stamps2() ->
    ?LET({S1, S2}, {stamp(), stamp()}, {min(S1,S2), max(S1,S2)}).

%% Return any time that does not overlap standup meeting
no_standup1() ->
    ?SUCHTHAT({H,M,_}, stamp(), H =/= 9 orelse M > 10).
no_standup2() ->
    ?LET({H,M,S}, stamp(),
         case H of
            9 when M =< 10 -> {8, M, S};
            _ -> {H,M,S}
         end).

For morning stamps

The first implementation, while clear to its intent, is likely to be far less efficient as it is going to require dropping roughly half of all the generated terms to replace them with new ones, assuming a roughly equal distribution for digits between 0 and 23 for the H position.

By comparison, the generator with a ?LET uses a modulus to ensure that any hour greater than 11 starts over at 0. It will need to do one operation per generator, and will never have to retry. It is the better implementation.

For ordered stamps

Similarly in the second one, chances are roughly 50% that the first generator will need to be discarded because the probability that one stamp is greater than the other should be fairly even. Using the min/2 and max/2 functions gives a proper ordering without needing to generate newer stamps. The second solution is better.

For meeting hours

In this set of generators, there is no clearly defined way to use a ?LET to transform data that is not acceptable into acceptable one. The filtering done there is roughly the same as the one in ?SUCHTHAT and some ad-hoc procedure is used to generate alternative data.

Because the range of restricted samples is fairly limited when compared to the whole problem space, the readability of the first solution is likely better in the long run.

5.11.6. Question 6

You may need wrappers for file:open/2 and file:write/2 to ensure easier composability of these functions. Here’s an example of those:

file_open(Name, Opts) ->
    {ok, Fd} = file:open(Name, Opts),
    %% ensure the file is refreshed on each test run
    file:truncate(Fd),
    Fd.

file_write(Fd, Data) ->
    ok = file:write(Fd, Data),
    Fd.

Using the provided wrapper functions, it is enough to export them and then use them with {'$call, …} symbolic calls:

file(Name) ->
    ?SIZED(Size,
           lines(Size, {'$call', ?MODULE, file_open, [Name, [read, write, raw]]})).

lines(Size, Fd) ->
    if Size =< 1 -> Fd
     ; Size > 1  -> lines(Size-1, {'$call', ?MODULE, file_write, [Fd, bin()]})
    end.

bin() ->
    non_empty(binary()).

The file is opened in both read/write mode to be more flexible to whoever will use it when running the test. The raw mode is entirely optional. It makes things go faster, but removes some flexibility when using the file module regarding encoding and buffering. Then the file is created. The generator asks for a name (which may or may not be a generator) and uses the Size parameter to know how big of a file to generate.

Through the usage of the shim functions, as many bytes can be added as desired over multiple calls. In this case, non-empty binaries are just sufficient, but the generator could as well have been written to let the user pass a generator for the data to file and lines, becoming fully configurable.

1. https://en.wikipedia.org/wiki/ISO/IEC_8859-1

2. http://erlang.org/doc/man/io_lib.html#printable_latin1_list-1

3. https://en.wikipedia.org/wiki/Comma-separated_values

4. If this were actual software needing to handle unicode, there would likely be a need to know the locale for language-specific splitters and a better handling of combining characters, but for the sake of simplicity, regular spaces will do

5. the function must be exported

6. proper_symb:eval/1 and eqc_symbolic:eval/1 are the fully qualified versions of the calls

PropEr Testing