Property-Based Testing
In the following we elaborate on the idea of testing functions by stating properties that are expected to hold for them, and then verifying the implementations against randomly generated test cases.
Properties
When dealing with type classes we discussed the concept of laws, which are
properties that should hold for instances of a given type class. For instance,
if a type T is an instance of Eq we expect x == x to evaluate to True
for every possible value x :: T.
Properties are not, however, intrinsically linked to type classes. An example
of a property not related to any type class is the interaction between length
and (++). This holds in general, but for concreteness consider lists of
integers. For any two lists xs :: [Integer] and ys :: [Integer] we have
length (xs ++ ys) = length xs + length ys
Another way of stating the same thing is that the function
prop_lengthAppend :: [Integer] -> [Integer] -> Bool
prop_lengthAppend xs ys = length (xs ++ ys) == length xs + length ys
returns True for all possible arguments. We could come up with a number of test cases, e.g.
tediousTestCases :: [([Integer], [Integer])]
tediousTestCases = [([], []), ([0], [1, 2]), ([3, 4, 5], [])] -- etc.
and test with something like all (\(xs, ys) -> prop_lengthAppend xs ys) tediousTestCases, but this is quite tedious. QuickCheck automates this tedium
away by generating (somewhat) random inputs. The workhorse is quickCheck,
which accepts anything Testable (explained later) and runs it on a number of
different inputs. Simply running quickCheck prop_lengthAppend covers more
cases than any unit test suite we would realistically have the patience to
maintain. The default is 100 tests, but if we want more we can run e.g.
quickCheck $ withMaxSuccess 10000 prop_lengthAppend
Of course, no number of test cases is enough to argue total correctness, but tuning the number of tests allows us to trade time for certainty.
Counterexamples
Consider another property, stating that (++) is commutative:
prop_appendCommutative :: [Integer] -> [Integer] -> Bool
prop_appendCommutative xs ys = xs ++ ys == ys ++ xs
Running quickCheck prop_appendCommutative quickly falsifies this claim and
prints out a counterexample such as [0] and [1]. QuickCheck is very useful
when we are genuinely unsure whether a property holds, since in practice false
properties often have easy-to-find counterexamples.
The Value of Properties
Why do we care whether a property like prop_lengthAppend holds? It does,
after all, not directly say anything about the correctness of length, (+)
or (++). For instance, given the obviously wrong definitions
_ + _ = 0
_ ++ _ = []
the property would still hold. The crucial observation is that, in practice, code is seldom wrong in ways that happen not to violate any properties. Observing that a number of non-trivial properties involving some function all hold is therefore a good proxy for the correctness of that function.
But if properties are merely good proxies for correctness, why is that better
than just testing correctness directly? The reason is that many properties are
like the ones we have seen so far: they can be expressed as some boolean
condition with variables that should hold for all choices of those variables.
This is easy to test using QuickCheck or similar systems. Direct testing is
harder to automate. That would require producing many test cases like [] ++ [] == [] and [1, 2] ++ [3, 4] == [1, 2, 3, 4] and so on, which is manual (and
error-prone) work.
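For very small inputs, a property can even be checked exhaustively rather than randomly. The sketch below (the helper names are our own, not part of QuickCheck) enumerates every list over the elements 0 and 1 of length at most 3 and checks the length/append property on every pair:

```haskell
-- All lists over the elements 0 and 1 of length at most 3: a small,
-- exhaustively enumerable slice of the input space.
smallLists :: [[Integer]]
smallLists = concatMap listsOfLength [0 .. 3]
  where
    listsOfLength :: Int -> [[Integer]]
    listsOfLength 0 = [[]]
    listsOfLength n = [x : xs | x <- [0, 1], xs <- listsOfLength (n - 1)]

-- Check the property on every pair of small lists.
checkLengthAppend :: Bool
checkLengthAppend =
  and
    [ length (xs ++ ys) == length xs + length ys
      | xs <- smallLists,
        ys <- smallLists
    ]
```

This style scales poorly (the number of cases explodes with the size bound), which is exactly the gap that random generation fills.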
The Testable Type Class
The quickCheck function in fact works on any type which is an instance of
Testable. The primary instances and their semantics for testing are worth
going over:
- () is testable and succeeds if it returns () (the only possible value of the type ()) and fails if an exception occurs.
- Bool is testable, where True means success and False or an exception means failure.
- Maybe a is testable if a is. Nothing means that the test should be discarded and counted neither as a success nor as a failure. Just result has the meaning of result.
- a -> b is testable if a is Arbitrary (meaning that we have a way of generating values of that type; see the next section) and b is Testable. The semantics is that f :: a -> b succeeds if f x :: b succeeds for all x :: a. In practice this is tested by generating random values of a. Note that this instance applies recursively, so e.g. Integer -> Integer -> Bool is Testable because Integer -> Bool is Testable; and Integer -> Bool is Testable because Bool is Testable.
This works great. We can write down properties using familiar Haskell types
without even depending on the QuickCheck library. However, what if we want to
collect all our properties into a list for test organisation purposes? If we
have p :: Integer -> Bool and q :: String -> Bool then [p, q] is not
well-typed. The Testable type class has a method property :: a -> Property
which converts any Testable value into a Property. Think of Property as a
Testable value of unknown type. A list of properties should be of type
[Property] and constructed like [property p, property q].
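To build intuition for the recursive function instance, here is a toy, exhaustive analogue of Testable (entirely our own sketch; real QuickCheck generates arguments randomly via Arbitrary rather than enumerating them):

```haskell
-- A toy, exhaustive analogue of Testable. Bool is checkable directly,
-- and a function is checkable whenever its argument type can be
-- enumerated.
class Check a where
  check :: a -> Bool

instance Check Bool where
  check = id

class Enumerate a where
  enumerate :: [a]

instance Enumerate Bool where
  enumerate = [False, True]

-- The recursive instance, mirroring the Testable instance for a -> b.
instance (Enumerate a, Check b) => Check (a -> b) where
  check f = all (check . f) enumerate
```

Here check (\x y -> (x && y) == (y && x)) goes through the function instance twice before bottoming out at Bool, mirroring how Integer -> Integer -> Bool is Testable via Integer -> Bool.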
Arbitrary
So far we have relied on QuickCheck automatically coming up with values for
our properties. The mechanism behind this is the Arbitrary type class alluded
to above, which defines a method arbitrary :: Gen a that is supposed to
define the "canonical" generator for the type. Hence, we first need to
understand generators.
Generator Basics
As a first approximation the type Gen a represents a probability distribution
over elements of a. It can also be thought of as a computation that can
depend on random choices. Generators are defined using a combination of
primitives and monad operations.
The simplest generator is pure x, which produces the value x with
probability 1. Given a list of generators gs :: [Gen a], the generator oneof gs chooses one of the generators with equal probability. For instance, oneof [pure "heads", pure "tails"] produces "heads" and "tails" with equal
probability, but oneof [pure "heads", pure "tails", pure "tails"] is biased
in favour of "tails".
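To make the idea of "a computation that can depend on random choices" concrete, here is a toy, deterministic stand-in for Gen that merely threads an integer seed (the real Gen also carries a size parameter and uses a splittable random generator; all names here are our own):

```haskell
-- A toy generator: a function from a seed to a value plus the next seed.
newtype G a = G {runG :: Int -> (a, Int)}

instance Functor G where
  fmap f (G g) = G (\s -> let (x, s') = g s in (f x, s'))

instance Applicative G where
  pure x = G (\s -> (x, s))
  G gf <*> G gx = G $ \s ->
    let (f, s') = gf s
        (x, s'') = gx s'
     in (f x, s'')

instance Monad G where
  G g >>= k = G $ \s -> let (x, s') = g s in runG (k x) s'

-- A simple linear congruential step for the seed.
nextSeed :: Int -> Int
nextSeed s = (1103515245 * s + 12345) `mod` 2147483648

-- Pick uniformly from a non-empty list of generators, like oneof.
oneofG :: [G a] -> G a
oneofG gs = G $ \s ->
  let s' = nextSeed s
   in runG (gs !! (s' `mod` length gs)) s'
```

Because G is a monad, generators compose with do-notation just like Gen does.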
It is also possible to control the bias explicitly using frequency, which is
like oneof but allows specifying a weight for each option. The biased
example above would more idiomatically be written as frequency [(1, pure "heads"), (2, pure "tails")].
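How a weighted choice might be resolved internally can be sketched as follows: given a "roll" r with 0 <= r < total weight, walk the list subtracting weights until the roll falls within an entry (a sketch of the idea only, not QuickCheck's actual implementation):

```haskell
-- Resolve a weighted choice given a roll in [0, total weight).
pickWeighted :: [(Int, a)] -> Int -> a
pickWeighted ((w, x) : rest) r
  | r < w = x
  | otherwise = pickWeighted rest (r - w)
pickWeighted [] _ = error "pickWeighted: roll out of range"
```

With the weights [(1, "heads"), (2, "tails")], rolls 0, 1 and 2 resolve to "heads", "tails" and "tails" respectively, giving the intended 1:2 bias.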
QuickCheck has a function called sample which takes a generator and prints a
handful of example values (11 with the default settings). This is quite useful
for getting a rough sense of what a generator produces and is often sufficient
to spot simple biases like the one in the previous example.
Recursive Generators
QuickCheck has a combinator called listOf which generates [a] given a
generator for a. Let us generate a list of integers using the standard
integer generator given by its Arbitrary instance. An example output is:
> sample $ listOf (arbitrary :: Gen Integer)
[]
[-1]
[2,1]
[-6,1,-6,-1]
[6]
[7,-10,8,2,-7,0,-7]
[-5,-4,-6,-5,-3,-2,-5,6,-7,-5,-6]
[]
[-11,2,7,-16,-11,11,-14,5,-12,13,12]
[18,-13,17,-9,-16]
[20,8,-12,-4,16,8,7,4,-20,-1,-6,8,6,16,14,-8,14,-6,-1]
Note that there is a decent spread both in the length of the list and the
individual integer values. How would we go about implementing a combinator like
listOf? A first attempt might be:
list1 :: Gen a -> Gen [a]
list1 g = oneof [pure [], (:) <$> g <*> list1 g]
This is a choice between an empty list and a list consisting of an element
generated by g followed by a recursively generated list. Each choice has
equal chance, so 50% of lists will be empty, another 25% will have just a
single element and so on.
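This claim can be made precise: each recursive step stops with probability 1/2, so the lengths are geometrically distributed. A small calculation (the helper name is our own):

```haskell
-- Probability that list1 produces a list of exactly length n: the
-- recursion must continue n times and then stop, each with probability
-- 1/2, giving (1/2)^(n+1).
pLen :: Int -> Double
pLen n = 0.5 ^ (n + 1)
```

Summing n * pLen n over all n shows that the expected length is just 1, so long lists are vanishingly rare.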
Alas, the distribution leaves something to be desired:
> sample $ list1 (arbitrary :: Gen Integer)
[]
[]
[0,1]
[4]
[-7,-4]
[]
[5,-8]
[1]
[]
[]
[8]
Every other sample is an empty list and long lists are exceedingly unlikely,
which makes this generator inefficient for exploring the search space. A second
attempt might be to use frequency to introduce a bias towards longer lists:
list2 :: Gen a -> Gen [a]
list2 g = frequency [(1, pure []), (9, (:) <$> g <*> list2 g)]
The resulting distribution is better, but still heavily favours short lists with an occasional longer list.
> sample $ list2 (arbitrary :: Gen Integer)
[0]
[2,0]
[-2]
[1,5,4]
[0,6,-4]
[5,-2,-6,-3,-5]
[-4]
[-2]
[-11]
[-11]
[16]
We could try adjusting the bias, but no matter which weights we use, the length of the list will follow a geometric distribution (the probability of a given length decays exponentially), which is not really what we want.
For our third attempt, we exploit the fact that Gen is a monad. First
generate a non-negative integer n, and then generate a list of length n:
list3 :: Gen a -> Gen [a]
list3 g = abs <$> (arbitrary :: Gen Int) >>= go
  where
    go 0 = pure []
    go n = (:) <$> g <*> go (n - 1)
Now the distribution is similar to QuickCheck's listOf.
> sample $ list3 (arbitrary :: Gen Integer)
[]
[-2,-1]
[]
[-5,0,6,-2]
[-8,-1,-6,5,5,8,4,8]
[2,1,-8,-2,7,-6,-7,-5,-8,-10]
[2,4,2,4,5,-12,5,-5,12,-4,-2,-1]
[2,11,-10,6,-3,3,-8,11,9,9,7]
[6,13,9,6,3,3,6,13,-10,-4,2,-16,-9,-16,0]
[6,-10,12,-5,-12,14]
[-14,0]
Size Dependent Generators
When testing a property it is often a good idea to start with small values and
then gradually increase the complexity of the test cases. QuickCheck uses this
approach by giving generators access to a size parameter (similar to a
Reader monad) which under default settings starts at 0 and increases by 1
every test up to a maximum of 99.
The size can be accessed directly using getSize :: Gen Int, but usually a
neater approach is to use the combinator sized :: (Int -> Gen a) -> Gen a, which
turns a size-dependent generator into an ordinary one. A good generator
respects the size parameter, so our list generator is more idiomatically
written as:
list4 :: Gen a -> Gen [a]
list4 g = sized $ \n -> chooseInt (0, n) >>= go
  where
    go 0 = pure []
    go n = (:) <$> g <*> go (n - 1)
This is essentially the definition of listOf in QuickCheck (where go is
known as vectorOf).
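The size parameter behaves much like the environment of a Reader monad. The toy model below (names are our own; randomness is omitted entirely) captures just that aspect:

```haskell
-- A toy model of the size parameter: a "generator" is simply a function
-- of the size, as in a Reader monad.
newtype Sized a = Sized {runSizedGen :: Int -> a}

-- Read the current size, like getSize.
getSize' :: Sized Int
getSize' = Sized id

-- Build a generator from a size-dependent one, like sized.
sized' :: (Int -> Sized a) -> Sized a
sized' f = Sized (\n -> runSizedGen (f n) n)

-- A "list generator" that respects the size parameter.
listOfSize :: a -> Sized [a]
listOfSize x = sized' (\n -> Sized (const (replicate n x)))
```

Running such a generator amounts to supplying the size, just as QuickCheck supplies an increasing size on each test.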
Shrinking
Suppose we define our own type of pairs with an instance of Arbitrary:
data Pair a b = Pair a b
  deriving (Show)

instance (Arbitrary a, Arbitrary b) => Arbitrary (Pair a b) where
  arbitrary = Pair <$> arbitrary <*> arbitrary
We can now define a version of commutativity for (++) that takes the input as
a Pair:
prop_appendCommutative' :: Pair [Integer] [Integer] -> Bool
prop_appendCommutative' (Pair xs ys) = xs ++ ys == ys ++ xs
QuickCheck still finds a counterexample. A possible output is:
> quickCheck prop_appendCommutative'
*** Failed! Falsified (after 5 tests):
Pair [-2,1,3,-2] [-4,4,-1,2]
However, the counterexample is not as simple as the counterexample to our
original property. Running quickCheck multiple times will reveal that
prop_appendCommutative consistently produces small counterexamples while
prop_appendCommutative' produces counterexamples of various sizes.
The secret ingredient is shrinking. The Arbitrary type class also defines a
member shrink :: a -> [a] which takes a value and produces a list of
shrinks, i.e. "slightly smaller" values. The idea is that if x is a
counterexample to some property then any of the elements in the list shrink x
could also be counterexamples.
When QuickCheck finds a counterexample x it tests the property for each
shrink of x. If that results in another, by definition simpler, counterexample,
the process repeats recursively until a (locally) simplest counterexample is
reached.
For a value Pair x y, a natural notion of "slightly smaller" is a pair that is
slightly smaller in either the first component or the second component. The
complete Arbitrary instance is thus:
instance (Arbitrary a, Arbitrary b) => Arbitrary (Pair a b) where
  arbitrary = Pair <$> arbitrary <*> arbitrary
  shrink (Pair x y) =
    [Pair x' y | x' <- shrink x] ++ [Pair x y' | y' <- shrink y]
Why not shrink both components simultaneously? Well, suppose x' is slightly
smaller than x and y' is slightly smaller than y. Then Pair x' y' is
slightly smaller than Pair x y' (or Pair x' y), which in turn is slightly
smaller than Pair x y, so assuming that either Pair x y' or Pair x' y is
also a counterexample, the process reaches Pair x' y' in two steps. In
general, there is a trade-off between efficiency (i.e. not producing too many
shrinks) and likelihood of finding the very simplest counterexample.
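The shrinking loop itself is easy to sketch in pure code: repeatedly replace the current counterexample by its first shrink that still fails the property (QuickCheck's actual loop is more sophisticated, but the idea is the same; the helper names below are our own):

```haskell
-- Greedy shrinking: follow the first failing shrink until no shrink
-- fails any more, i.e. until a locally simplest counterexample.
minimize :: (a -> [a]) -> (a -> Bool) -> a -> a
minimize shr failing x =
  case filter failing (shr x) of
    [] -> x
    (x' : _) -> minimize shr failing x'

-- Shrinking for lists: drop one element at a time.
shrinkOne :: [a] -> [[a]]
shrinkOne xs = [take i xs ++ drop (i + 1) xs | i <- [0 .. length xs - 1]]

-- Shrinking for pairs, one component at a time, as in the instance above.
shrinkPair :: ([a], [a]) -> [([a], [a])]
shrinkPair (xs, ys) =
  [(xs', ys) | xs' <- shrinkOne xs] ++ [(xs, ys') | ys' <- shrinkOne ys]
```

Starting from the counterexample ([1, 2, 3], [4, 5]) to commutativity of (++), this loop shrinks it down to ([3], [5]).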
Tasty QuickCheck
The Tasty testing framework has support for QuickCheck via the package
tasty-quickcheck. For example, testProperties "properties" props is a
simple test tree, given props :: [(String, Property)].
This test tree exposes options to Tasty on the command line. For example, to control the number of tests we can run
$ cabal test --test-option=--quickcheck-tests=10000
Testing stateful systems
The examples above assume that we are testing a pure function, where properties involve a relationship between the inputs and outputs of the function. Regrettably, however, many computer systems are stateful: they encapsulate some internal data, which is modified in response to commands, and divulge information in response to requests. A network service is a clean example of a stateful system, but even a conventional mutable data structure, such as a resizable array, fits the definition.
Large stateful systems are often complicated and error-prone, and hence it is very desirable to be able to test them effectively. It turns out that property-based testing is also an effective approach for such systems (and in industry, this may even be the most impactful use case), although it requires us to build some additional infrastructure.
The basic idea is that we take the so-called software under test (SUT) and construct a model that captures the most important properties of the SUT. In many cases a stateful system has implementation details that are important to its operation (such as caching, optimisations, persistency in a database, integration with other systems, etc), but which are not part of its external interface. A model is a program that imitates some subset of the behaviour of the SUT, and is typically much simpler than the SUT. We then randomly generate commands that interact with the SUT and the model, and test that the observable behaviour is the same. Essentially, the model is an executable specification that we compare against the SUT.
One very important detail is that there is no requirement that the SUT is implemented in the same language as the property-based testing framework (e.g., Haskell in our case). Indeed, it is a common approach to construct a model in Haskell and use QuickCheck to verify that some other network-based system matches the behaviour of the model. This is particularly useful when the choice of technology for the SUT is constrained due to efficiency or integration concerns.
A sample stateful system
The SUT we will test in the following does happen to be implemented in Haskell
for simplicity. Specifically, we will define a datatype DynamicArray a for
mutable arrays of elements of type a with efficient support for appending
elements. Since the array is mutable, it will live in IO. The implementation
details of the array are not important, and so will be covered somewhat briefly.
First we will need to import some library functions.
import Control.Monad (forM_)
import Data.Array.IO (IOArray)
import Data.Array.MArray (newArray_, readArray, writeArray)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
Our representation of DynamicArray will be quite similar to what you may have
seen in systems programming classes, or implemented yourself. The idea is to
have an underlying array with room for more elements than have actually been
inserted yet, with the inserted elements all being at the front of the array,
such that there is room to grow at the end. We call the size of this array the
capacity. When the number of elements inserted by the user exceeds the
capacity, then we bump the capacity by some factor, allocate a new array of that
size, then copy the old elements to the new array. If we always double the
capacity, then it can be shown that appending an element can be done in constant
amortised time.
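The doubling argument can be checked concretely by simulating the resizing policy and counting element copies (a standalone sketch of the policy, not the implementation below):

```haskell
-- Count the element copies performed by n appends, starting from
-- capacity 10 and doubling the capacity whenever it is exceeded.
copiesFor :: Int -> Int
copiesFor n = go 0 10 0
  where
    go used cap copies
      | used == n = copies
      | used < cap = go (used + 1) cap copies
      | otherwise = go used (cap * 2) (copies + used)
```

For 100 appends, resizes occur at sizes 10, 20, 40 and 80, for 150 copies in total; in general the geometric sum stays below 2n, which is what makes the amortised cost per append constant.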
data DynamicArray a
  = DynamicArray
      { -- Number of elements inserted.
        daUsed :: IORef Int,
        -- Capacity.
        daCap :: IORef Int,
        -- Underlying array.
        daArr :: IORef (IOArray Int a)
      }
Because we need to modify both the element count and the capacity, we put them
in mutable IORefs. The underlying array is an IOArray Int a, which is a
mutable (but non-resizable) array provided by the array library. The Int type
argument is the index type; other index types are possible (for instance tuples,
to represent multidimensional arrays), but we will not make use of this.
The interface to IOArray is through various class-polymorphic functions
defined by Data.Array.MArray. For simplicity of exposition, the types in the
following have been monomorphised to be specific to IOArray.
An IOArray is produced by the function newArray_ of the following type:
newArray_ :: (Int,Int) -> IO (IOArray Int a)
The (Int,Int) pair is the smallest and largest valid index (these arrays do
not have to start at zero, although ours do). Initially, all elements of the
array will be undefined. Reading an undefined element will cause an IO
exception.
We arbitrarily decide that the initial capacity of our dynamic arrays will be 10.
newDynamicArray :: IO (DynamicArray a)
newDynamicArray = do
  let capacity = 10
  arr <- newArray_ (0, capacity - 1)
  used_ref <- newIORef 0
  capacity_ref <- newIORef capacity
  arr_ref <- newIORef arr
  pure $
    DynamicArray
      { daUsed = used_ref,
        daCap = capacity_ref,
        daArr = arr_ref
      }
When indexing an array we check whether the index is in-bounds, returning
Nothing if not.
index :: Int -> DynamicArray a -> IO (Maybe a)
index i (DynamicArray used_ref _cap_ref arr_ref) = do
  used <- readIORef used_ref
  if i >= 0 && i < used
    then do
      arr <- readIORef arr_ref
      Just <$> readArray arr i
    else pure Nothing
Inserting always succeeds and produces no result beyond modifying the array, but we have to resize the underlying array if the new element causes the capacity to be exceeded.
insert :: a -> DynamicArray a -> IO ()
insert x (DynamicArray used_ref cap_ref arr_ref) = do
  used <- readIORef used_ref
  cap <- readIORef cap_ref
  arr <- readIORef arr_ref
  if used < cap
    then do
      writeArray arr used x
      writeIORef used_ref $ used + 1
    else do
      let cap' = cap * 2
      arr' <- newArray_ (0, cap' - 1)
      forM_ [0 .. used - 1] $ \i ->
        writeArray arr' i =<< readArray arr i
      writeArray arr' used x
      writeIORef arr_ref arr'
      writeIORef used_ref $ used + 1
      writeIORef cap_ref cap'
Overwriting an existing element is quite similar to indexing. We return
Nothing on out-of-bounds, and Just () on success.
write :: Int -> a -> DynamicArray a -> IO (Maybe ())
write i x (DynamicArray used_ref _cap_ref arr_ref) = do
  used <- readIORef used_ref
  if i >= 0 && i < used
    then do
      arr <- readIORef arr_ref
      writeArray arr i x
      pure $ Just ()
    else pure Nothing
Our final operation allows the deletion of an element anywhere in the array.
This is a somewhat costly operation, as we have to shift all elements after the
deleted one to the left. Further, to avoid using too much memory, if the
capacity is too large after the deletion, we shrink the array back down to a
smaller size. As before, we return Nothing in case the index is out of bounds.
delete :: Int -> DynamicArray a -> IO (Maybe ())
delete i (DynamicArray used_ref cap_ref arr_ref) = do
  used <- readIORef used_ref
  if i >= 0 && i < used
    then do
      cap <- readIORef cap_ref
      arr <- readIORef arr_ref
      forM_ [i + 1 .. used - 1] $ \j ->
        writeArray arr (j - 1) =<< readArray arr j
      writeIORef used_ref $ used - 1
      if used < cap `div` 2
        then do
          let cap' = cap `div` 2
          arr' <- newArray_ (0, cap' - 1)
          forM_ [0 .. used - 2] $ \j ->
            writeArray arr' j =<< readArray arr j
          writeIORef arr_ref arr'
          writeIORef cap_ref cap'
        else pure ()
      pure $ Just ()
    else pure Nothing
Defining a model
There is a lot of intricate index-fiddling code in the above. Can we really be
sure it is correct? At a semantic level, what DynamicArray is doing is not
complicated: it is merely maintaining a sequence of elements. The complexity
comes from our desire for certain performance characteristics. Let us define a
model that captures the central behaviour of DynamicArray, but without caring
about operational matters such as performance.
data Model a = Model [a]
  deriving (Show)

initModel :: Model a
initModel = Model []
A model is described by its internal state, by the commands that can be sent to it, and by the responses that may be produced in return. These commands usually resemble the API of the SUT, but perhaps simplified in various ways. Because the system we are testing here is simple, the commands correspond almost directly to its API.
data Command a
  = Insert a
  | Index Int
  | Write Int a
  | Delete Int
  deriving (Eq, Show)

data Response a = Success | Failure | Elem a
  deriving (Eq, Show)
We then define functions corresponding to each of the commands. Each function returns a new model state, as well as a response. For example, the function corresponding to an insertion is defined as follows.
cmdInsert :: a -> Model a -> (Model a, Response a)
cmdInsert x (Model xs) = (Model $ xs ++ [x], Success)
We define functions for the remaining commands, and finally a step function
for simulating the effect of running a certain Command on a Model:
cmdIndex :: Int -> Model a -> (Model a, Response a)
cmdIndex i (Model xs) =
  if i >= 0 && i < length xs
    then (Model xs, Elem (xs !! i))
    else (Model xs, Failure)

cmdWrite :: Int -> a -> Model a -> (Model a, Response a)
cmdWrite i x (Model xs) =
  if i >= 0 && i < length xs
    then (Model $ take i xs ++ [x] ++ drop (i + 1) xs, Success)
    else (Model xs, Failure)

cmdDelete :: Int -> Model a -> (Model a, Response a)
cmdDelete i (Model xs) =
  if i >= 0 && i < length xs
    then (Model $ take i xs ++ drop (i + 1) xs, Success)
    else (Model xs, Failure)

step :: Model a -> Command a -> (Model a, Response a)
step m (Insert x) = cmdInsert x m
step m (Index i) = cmdIndex i m
step m (Write i x) = cmdWrite i x m
step m (Delete i) = cmdDelete i m
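Before wiring the model up to the SUT, it can be instructive to replay a command sequence through the model alone. The sketch below restates the definitions in condensed form (guards instead of the cmd* helpers) so that it stands on its own; runModel is our own helper name:

```haskell
data Command a = Insert a | Index Int | Write Int a | Delete Int

data Response a = Success | Failure | Elem a
  deriving (Eq, Show)

newtype Model a = Model [a]

inBounds :: Int -> [a] -> Bool
inBounds i xs = i >= 0 && i < length xs

step :: Model a -> Command a -> (Model a, Response a)
step (Model xs) (Insert x) = (Model (xs ++ [x]), Success)
step (Model xs) (Index i)
  | inBounds i xs = (Model xs, Elem (xs !! i))
  | otherwise = (Model xs, Failure)
step (Model xs) (Write i x)
  | inBounds i xs = (Model (take i xs ++ [x] ++ drop (i + 1) xs), Success)
  | otherwise = (Model xs, Failure)
step (Model xs) (Delete i)
  | inBounds i xs = (Model (take i xs ++ drop (i + 1) xs), Success)
  | otherwise = (Model xs, Failure)

-- Replay a command list, collecting the responses.
runModel :: Model a -> [Command a] -> [Response a]
runModel _ [] = []
runModel m (c : cs) = let (m', r) = step m c in r : runModel m' cs
```

For example, runModel (Model []) [Insert 1, Insert 2, Index 1, Delete 0, Index 1] yields [Success, Success, Elem 2, Success, Failure].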
Similarly, we also define a function for executing a Command on our SUT. This
involves mapping the SUT API to the Command/Response types, which in our
case is quite simple, but can sometimes be a bit laborious.
exec :: DynamicArray a -> Command a -> IO (Response a)
exec a (Insert x) = do
  insert x a
  pure Success
exec a (Index i) = do
  r <- index i a
  case r of
    Just x -> pure $ Elem x
    Nothing -> pure Failure
exec a (Write i x) = do
  r <- write i x a
  case r of
    Just () -> pure Success
    Nothing -> pure Failure
exec a (Delete i) = do
  r <- delete i a
  case r of
    Just () -> pure Success
    Nothing -> pure Failure
Specifying a property
We denote a sequence of commands as a program. The testing problem is now to ensure that the SUT and the model behave identically for all programs.
newtype Program a = Program [Command a]
  deriving (Show)
First, of course, we have to define appropriate Arbitrary instances. We are
going to define these in quite a simple way; the earlier remarks about good
generators still apply, but it turns out we can explore a lot of the test space
even without putting any particular thought into it.
instance (Arbitrary a) => Arbitrary (Command a) where
  arbitrary =
    oneof
      [ Insert <$> arbitrary,
        Index <$> arbitrary,
        Write <$> arbitrary <*> arbitrary,
        Delete <$> arbitrary
      ]

instance (Arbitrary a) => Arbitrary (Program a) where
  arbitrary = Program <$> listOf arbitrary
  shrink (Program l) = map Program (shrink l)
Let us look at what programs end up being generated.
> sample (arbitrary :: Gen (Program Int))
Program []
Program [Insert (-2),Insert (-2)]
Program []
Program [Index 6,Insert 1]
Program [Delete 0]
Program []
Program [Write 4 4,Write (-2) (-12),Index 9,Write (-8) 12,Insert (-5),Delete 5,Index 6]
Program [Write 9 (-12),Index (-7),Delete 10,Write (-2) (-5),Insert (-6),Write 13 13,Insert 12]
Program []
Program [Write (-18) 12,Insert (-8),Insert 9,Insert (-8),Insert (-10),Delete (-6),Index (-12),Index (-4),Delete (-6),Index (-16),Index (-10),Index (-14),Delete 17]
Program [Write 15 13,Delete (-11),Insert (-20)]
Once again we generate a good number of empty programs, but we also generate
many commands that are bound to fail: look at how many programs perform
Index or Delete commands without first inserting any elements. It is good
and necessary to test these cases, but perhaps they are generated too often.
Generating good sequences of commands for complicated stateful systems is an
interesting topic in itself, but the above is sufficient for our needs.
Next we define a procedure for executing a Program a on both a DynamicArray a and a Model a, producing a boolean that indicates whether the same
responses were observed. Recall that we cannot directly inspect the internal
state of a DynamicArray, but we can check its observable behaviour.
runProgram :: (Eq a) => DynamicArray a -> Model a -> Program a -> IO Bool
runProgram c0 m0 (Program cmds0) = go c0 m0 cmds0
  where
    go _c _m [] = pure True
    go c m (cmd : cmds) = do
      sut_resp <- exec c cmd
      let (m', model_resp) = step m cmd
      if sut_resp == model_resp
        then go c m' cmds
        else pure False
Finally, we just need to construct a property that uses runProgram. The
crucial building block is the function ioProperty, which allows us to turn an
arbitrary IO operation into a Property:
ioProperty :: Testable prop => IO prop -> Property
We must of course be very careful when using this function, as it is easy to use
it to construct a property that behaves nondeterministically, which can make it
difficult for QuickCheck to shrink test cases properly. In our case we are
merely using it to construct the initial dynamic array and to execute
runProgram:
prop_array :: Program Int -> Property
prop_array prog = ioProperty $ do
  c <- newDynamicArray
  runProgram c initModel prog
Finally we can verify the property.
> quickCheck prop_array
+++ OK, passed 100 tests.
Of course, to demonstrate that things really work, let us introduce a bug. For
example, let us remove the line writeArray arr' used x from the else-branch
in the definition of insert and rerun the tests:
> quickCheck prop_array
*** Failed! (after 56 tests and 13 shrinks):
Exception:
MArray: undefined array element
CallStack (from HasCallStack):
error, called at libraries/array/Data/Array/Base.hs:914:16 in array-0.5.8.0-74ab:Data.Array.Base
Program [Insert 21,Insert (-33),Insert (-55),Insert 46,Insert (-48),Insert 47,Insert 37,Insert 19,Insert (-35),Insert (-37),Insert 55,Delete 0,Index 9]
We are not told exactly what is wrong (the exception is thrown from the underlying array library), but we are given a sequence of commands that reliably leads to the problem, which we can then use for debugging.
It is important to realise that although this particular test targets a piece of
code that happens to be implemented in Haskell, same as the model, there is no
real requirement for that. The idea just requires that there is some handle
representing the SUT (in our case, DynamicArray) and some operation (which may
be in IO) that can perform state changes based on Commands. There is nothing
that fundamentally prevents us from testing a remote network service with this
approach, or the control system for a robot, or for that matter a C library
invoked through a foreign-function interface. We might in those cases need to
write more complicated exec functions, of course.
Finally, you may rightly object that the amount of infrastructure needed to
perform testing of this kind is rather large; indeed, our test harness is more
lines of code than the entire DynamicArray implementation. This is a fair
objection, but real systems tend to grow in complexity much faster than their
models, so the technique scales well as the SUT becomes more complicated, even
though it is often impractical for testing the simplest systems.