Stateful property-based testing with QuickCheck State Machine

Mike Solomon


17th Apr 2020

Property-based testing is a technique where you make assertions about a system's output with respect to its input. For example, if the input to a system (a function, a server, etc) is two numbers, property-based testing could assert that the output of the system should be the sum of these numbers. This type of testing frees you from having to come up with input data. Instead, you define relationships between the system's input and output. Then the test runner verifies that the relationships hold.

Stateful property-based testing (SPBT) is another technique for when the tested system retains a state. This is the case, for example, when the system is a database or a queue or a file. If I write an entry to a database and then list all entries in the database, I would expect the entry I wrote to be part of the list. That is a stateful property of the database.

There are libraries available in several different languages for SPBT. In this article, I will use quickcheck-state-machine. I like quickcheck-state-machine for many reasons:

  1. It is written in Haskell, which means you get access to Haskell's type safety and fast performance.
  2. Its opinionated structure splits SPBT into component parts, which helped my learning process.
  3. It builds a state machine, which can be manipulated outside of the test. quickcheck-state-machine's function prettyCommands uses the state machine, for example, to make really nice logs after the test is run.
  4. Fine-grained control of generation and shrinking is possible. This allows you to do more targeted testing.
  5. Its use of the higher-kinded types (HKTs) Symbolic and Concrete. It allows you to extract commands from a state machine using the Symbolic HKT and then run it using the Concrete HKT.
  6. It can test the parallel execution of commands to find bugs arising from race conditions.

This article shows how to use quickcheck-state-machine to build a state machine and use it for SPBT. It uses version 0.7.0 of quickcheck-state-machine. The system under test will be a FIFO queue of integers that uses the file system to store entries.

As the quickcheck-state-machine library is under active development, the API is subject to change. I will do my best to revise this article as the API changes.

Model, Command, Response#

The fundamental building blocks of a state machine built with quickcheck-state-machine come in three types. One represents a model of the system, one represents the commands that can be issued to the system and one represents responses to the commands.

All three need to be polymorphic in accepting an HKT with the signature (Type -> Type), which I'll call r. This polymorphism will never be used directly. But it is used by quickcheck-state-machine internally to inject two different HKTs: Symbolic and Concrete.

The Symbolic HKT is used by quickcheck-state-machine when generating a series of commands from a state machine. In contrast, the Concrete HKT is used when the state machine is executing. In models that can be created in pure contexts like the one below, this distinction is not useful. But when models use types that only exist in monadic contexts, the distinction is important.

Let's unpack what's going on here. The Model is an array of integers that we'll use to simulate a FIFO queue. There are three Commands - you can Push an integer onto the queue, Pop something off of the queue (either nothing or an integer), and AskLength to the queue. The Responses to these three commands are confirming that a value has been Pushed, telling us the integer that has been Popped and TellLength.

It isn't necessary to have a one-to-one correspondence between commands and responses. Haskell's pattern matching will allow us to define the function for any valid command/response pair.

Defining the queue#

Here is a FIFO queue for integers that reads and writes the queue to the file system. Each integer is separated by a colon:

Initializing the model#

The first step in creating my state machine is to initialize the model. The initializer function needs to be polymorphic as it will eventually accept the Symbolic and Concrete HKTs depending on if you are in generation or execution mode. In this case, as I am using an array as the underlying model, the logical initializer is an empty array.


The next thing I need to do for my state machine is to create transitions. The transitions are used to both generate commands and execute the tests, so the function needs to remain polymorphic.

The transition function takes a model, a command, and a response. It then returns the underlying model after the command has been applied. You can think of the model as transitioning from one state to the next.

In the implementation below, I make my own FIFO queue with Pop and Push. Then AskLength will return the length of the model:


Preconditions are guards that apply to certain commands based on the current state. Top represents the precondition always being satisfied. Bot is the opposite, the precondition is never satisfied. The Logic type contains various boolean operators that can be applied to the model and command. The outcome of the operator determines if the precondition is satisfied or not.

Because the pre-condition is only used when generating lists of programs, it doesn't need to use concrete values. So it doesn't need to be polymorphic and exists only for the Symbolic HKT.

In this model, every command can be executed irrespective of the state. So I return Top:


Postconditions are where the correctness of the response is asserted. I like this API because it provides a one-stop-shop for all assertions. In other SPBT libraries, it is easy to litter assertions all over the place, which makes the code more difficult to read. In quickcheck-state-machine, the only checks for correct behavior are in the postconditions.

Postconditions only are checked when the state machine is actually running. This means they only exist in the Concrete HKT.

Note that the model passed to the postcondition function is the one before the command executes. It is often useful to apply the transition to the model when evaluating the response, as I do below:


Invariants take a model and assert that the model is always in a certain state, irrespective of the command and response. Invariants also run after every step in the state machine, which makes them expensive to run. Because of this, quickcheck-state-machine uses a Maybe to allow for no invariants to be returned.

As there is no invariant behavior I want to see in this model, I can return Nothing:


The generator is one of the places that quickcheck-state-machine really shines. You create generators using QuickCheck combinators, so any existing QuickCheck custom combinators can be repurposed for quickcheck-state-machine.

Here, I use the oneof combinator, which generates commands with a uniform distribution. Because I am in the command generation phase, the Symbolic HKT is used:


Like in QuickCheck, the shrinker takes a value and returns an array of new values to test. Most QuickCheck programs never use the shrinker directly, but here, I use it to specify what does and doesn't need to be shrunk. This allows the generation to move fast through values that have no logical relationship.

For example, below, I only apply the shrinker to numbers pushed onto the stack, as I want to test if the size of the numbers matters. In all other places, there is no shrinker used:


Semantics take a command using the Concrete HKT, which signifies that it is used only when the tests are actually executing, and returns the result of the execution in the monadic context (here IO).


The purpose of Mock is to generate dummy responses when the state machine is in command generation mode (thus the Symbolic HKT). It is the foil to Semantics, which creates Concrete responses from real commands during text execution mode. The content of the mock responses is thrown away, as all the library uses mock for is to create a Response used to effectuate a transition between states.

One nice thing about mock is that, if you want to, you can create a full-fledged mock of your model, and this can be useful if you'd like to use SPBT to generate (Comamnd Response) pairs that can be used to induce a spec of the model (ie to induce a JSON schema or an OpenAPI spec).


Cleanup, like the semantics, exists within the monad that the system is executing. It's called after each series of commands is executed.

Since I don't need any cleanup for our queue, I can write an empty function in the IO monadic context:

Building the state machine#

Now that I have all of the ingredients, I can build my state machine.

Because the system under test takes one argument, I also pass that argument to the state machine.


Now for the fun part, let's run my tests!

First, I want each test to execute in its own FIFO queue, which means a different file for each queue. I chose the pcg unique random number generator to accomplish this. This guarantees that each number generated will be unique during the run of a program.

Then, I define the property.

forAllCommands uses a state machine in its first argument to generate the commands. This is a state machine that will only run for the Symbolic HKT, not the Concrete HKT, so it won't ever touch IO. The next argument is a lower bound for the number of commands in a sequence. I use Nothing for the lower bound, meaning no lower bound.

The last argument to forAllCommands is a function that accepts a sequence of commands and returns a monadic property. Monadic properties are defined in QuickCheck.Monadic and can be used whenever a property exists in a monadic context. The convenience method monadicIO can be used to define properties in the IO context, and run lifts the result of the monadic execution to the PropertyM context. PropertyM is QuickCheck's monad transformer, and in our case we are transforming the IO monad. To learn more about monadic transformations, mtl is a great place to start.

So, the sequence below is:

  1. Create a new random number and lift it to the PropertyM monadic context.
  2. Create a file name using this number.
  3. Create a state machine using this filename.
  4. Call runCommands from quickcheck-state-machine, which already executes in the PropertyM context, so there is no need to prefix it with run.
  5. Use quickcheck-state-machine's pretty printer prettyCommands on the histogram generated by run result.

Lastly, I execute the test and create the queues directory if it doesn't exist yet:

When I run stack test from the command line, I see the following:

And voila! Our tests pass.

GitHub repo#

All of this is on the github repo 'meeshkan/quickcheck-state-machine-example'.


Stateful property based testing is a great way to find bugs in stateful systems. SPBT exists in several other frameworks as well:

I hope you find this technique useful! If you'd like your repos to benefit from automatic SPBT, I will take this opportunity to shamelessly plug the Meeshkan alpha on

Follow up exercises#

Here are three follow up exercises you can do to expand your understanding of quickcheck-state-machine and SPBT!


Let's create a bug in the queue!

In the implementation of the FIFO queue, instead of adding the number to the head via show x : split, add it to the tail using split ++ [show x]. See if it's caught.


Create a new bug where the queue stops accepting new values once there are 50 values.

You can do this in the pushToQueue function. If you run the tests as-is, you won't find it. There are three ways you can find the bug:

  • Increase the number of times QuickCheck runs
  • Change the generator so that the frequency of push is greater than the frequency of pop. For inspiration, check out QuickCheck generator combinators and see if there is one that allows certain outcomes to happen with greater frequency than others.
  • Change the lower bound in forAllCommands from Nothing to something a bit higher.

This may take a long time to run depending on your parameters because of the shrinker. The shrinker will try to find a specific range of values to produce the bug, but because the bug is not linked to the specific value, it won't be able to meaningfully shrink.

For example, here is an excerpt of console output that would happen if you are able to provoke the bug. Here, I see that the postcondition for AskLength failed at the barrier of 50 results in the queue.


In the implementation of the queue, we use the funtion withFile for all file-based IO. Haskell also has the functions writeFile and readFile. Try using these instead and you'll hit a nasty bug!

Can you anticipate what the bug will be? Once you run into the bug, was your guess right? How does this bug show the difference between withFile vs writeFile and readFile?

Newer postOlder post

Don’t miss the next post!

Absolutely no spam. Unsubscribe anytime.


About usCareersContactT&C