Introduction to properties-driven development

Kimmo Sääskilahti

Data Science Engineer

27th Apr 2020

Properties-driven development means the application of property-based testing to test-driven development. Test-driven development asks us to constantly write tests to ensure our code is easily testable and usable. Property-based testing asks us to write test case generators instead of hard-coded example inputs and outputs and ensure given properties hold for our code.

Thinking in terms of properties forces us to be very explicit about what our code can and cannot do. We're effectively adopting a design by contract approach, which helps us understand the problem we're trying to solve before diving into coding.

In this article, we'll learn what properties-driven development looks like. We'll also apply these principles to develop a module for a sorted dictionary.

What's in this tutorial#

⚠️ Prerequisites:

  • A general understanding of software testing.
  • (Optional) Python 3+* if you want to follow along.

* This guide will use Python for code examples, but the concepts aren't limited to any specific programming language. So even if you don't know Python, we'd encourage you to read along anyway.

💻 References:

This GitHub repository contains all the featured code examples as tests. The repository also contains instructions for how to execute them.

What is properties-driven development?#

As mentioned above, properties-driven development is test-driven development with property-based tests. Test-driven development asks us to think what our code should do and put that into a test. In the context of this article, it's not that important whether we write our tests before, after, or during the implementation. What's important is that tests guide the development. Property-based testing asks us to formulate those tests as properties.

For example, let's say we're writing code for converting a CSV into a JSON array. Instead of jumping into writing a CSV parser, test-driven development asks us to first come up with test cases.

Here's an example CSV input:

This should be turned into the following JSON:

This example would make a nice unit test for our code! However, the test would have quite a few assumptions baked in:

  • Keys are distinct.
  • Values are integers.
  • There are no missing values.

Even if we wanted to be thorough and write unit tests to cover each of those assumptions... our imaginations are limited. For example, we already forgot to mention the assumptions that the list of keys is non-empty and that there's at least one row in the CSV.

Property-based tests are excellent for forcing us to be explicit about our assumptions. To come up with properties for our CSV-to-JSON parser, we would need to first generate CSVs we expect our parser to be able to handle.

Here's the pseudocode for this type of generator:

  1. Generate keys: A list of arbitrary strings.
  2. Generate the number of rows: A non-negative integer.
  3. Generate rows: For each row and key, generate an arbitrary string value.

Can you see how many tricky cases our generator forces us to cover? The list of keys may be empty or the number of rows may be zero. We also assume our code works with arbitrary strings (including empty strings and strings containing commas). And that those strings can work as both keys and values. Creating a generator ensures that our CSV parser is pushed to its limits.

But this generator might turn out to be too demanding for our use case. If that happens, we can relax the generator to produce friendlier data. For example, we may find that our code crashes with empty input. If we know that empty inputs are filtered out elsewhere in the code, we can change the generator to produce only non-empty inputs and add an assertion in our code that the input is expected be non-empty. Instead of programming by coincidence, we've consciously made the decision what our code is expected to handle.

The generator we created is an example of a bottom-up approach to data generation. Instead of generating random CSVs directly, we generate them step by step and keep track of what goes in. So when the time comes to write an assertion that our code did the right thing, we know what the expected result is. This avoids the problem where we need to duplicate the possibly complex implementation in tests. For example, with our generator, we know the length of the resulting JSON array should be equal to the number of rows generated at step 2. That's a good property! If you want to dive deeper into how to apply property-based testing to CSV parsing, take a look at this chapter from the PropEr testing book.

With that, we should have an understanding of what properties-driven development is. Next, let's apply these principles in an example project.

Example project: Sorted dictionary#

As an example project, we'll build a custom dictionary in Python. We'll call the data structure SortedDict and expect it to always keep its keys in a particular order. This type of sorted dictionary might be useful, for example, when you want to keep track of users sorted by birth date. By keeping keys sorted, we're able to traverse the sorted list of key-value pairs in linear time.

For our project, we'll keep the keys sorted by storing the key-value pairs in a binary search tree.

Implementing a sorted dictionary is a good example for properties-driven development for a few reasons. First, while property-based testing is most common in functional programming where data structures are immutable, the example shows it can be equally as useful for testing mutable objects. Second, while the implementation of the binary tree is straightforward, it's also complex enough to deserve good tests. Especially the deletion logic can be error-prone and hide complex bugs.

To implement the binary search tree, we'll resort to the reference algorithms from Introduction to Algorithms textbook.

For the sake of this article, we'll assume the keys are integers and that the keys themselves are used for comparison (instead of providing a custom callable per value like SortedDict in sortedcontainers does).

What should it do?#

To come up with properties, we first need to come up with the requirements for our SortedDict. One way to come up with requirements is to play around with the API that we expect our module to have.

First, we expect that we can search for an inserted key:

We also expect that keys are always sorted irrespective of the insertion order. Continuing on the previous example:

Like with a standard dictionary in Python, we expect re-inserting an existing key will result in the value being overwritten:

We expect that searching for a non-existing key will raise a KeyError:

Finally, if we delete a key, we expect that searching for it will also raise a KeyError:

Listing requirements#

Now that we have an idea of what we expect our code to do, we can try to generalize those previous examples into requirements:

  1. Key-value pairs added to our sorted dictionary can be searched.
  2. Adding a key that already exists overwrites the existing one.
  3. Keys are always sorted.
  4. Searching for a non-existing key raises a KeyError.
  5. Deleting a key and then searching for it raises a KeyError.

These requirements will serve as the basis for our properties.

Next, we'll write a property for the first requirement: Being able to insert key-value pairs into the dictionary and then search for them.

Implementing insert and search#

Before writing the test case generators, we first add the basic skeleton for SortedDict:

The __setitem__ method is called when sorted_dict[key] = item is invoked. Similarly, __getitem__ defines the behaviour for sorted_dict[key].

This skeleton doesn't implement setting or getting the keys yet. So once we're done writing the test, we can run it to make sure it fails. This gives us confidence our test works as expected.


We'll use the Hypothesis library for writing properties for our sorted dictionary. Hypothesis supports generating almost any kind of data you can imagine. It also provides decorators for writing property-based tests.

To get started with writing our first property-based test, we need a generator of key-value tuples. The entry point to generators in Hypothesis is the hypothesis.strategies module, which we alias as some below because it reads nicely. Strategies in Hypothesis are essentially "clever data generators" that you can compose to generate very complex data.

Here's how to compose generators in Hypothesis to produce a list of key-value tuples:

The function first defines a generator some_keys for keys, which we assume are integers. For values, we assume any binary is valid. We then define some_kvs, a generator of key-value tuples using some_keys and some_values as generators of keys and values, respectively. Finally, we return the generator some.lists(some_kvs), which generates lists of tuples.

To see what kind of data the generator creates, we can call some_key_value_tuples().example():

Now that we have a generator for lists of key-value tuples, we want to build a generator generating instances of SortedDict containing those tuples. With Hypothesis, we can use hypothesis.strategies.composite for this purpose as follows:

The @some.composite decorator injects the draw function into the decorated function definition. The draw function can be used to sample values from another generator. Above, we sample a list of key-value tuples from the some_key_value_tuples() generator and then insert those key-value pairs into our SortedDict. Now some_sorted_dicts() is a data generator generating instances of SortedDict.

However, there's a problem: when we get an instance of SortedDict from the generator, we don't know anymore what data went into creating the sorted dictionary. We therefore also return the key-value pairs from the generator, with the pairs stored in a standard Python dictionary:

The variable expected contains the key-value pairs we expect to find from our sorted dictionary. The key-value pairs it contains act as a "model" for what our sorted dictionary should contain. Using a standard dictionary as the model is useful, because it handles duplicates in the same way we expect SortedDict to handle, i.e., by overwrite.


With the test case generator some_sorted_dicts defined, we're ready to state our first property: Any keys inserted into the dictionary can be searched.

The test case is marked as a Hypothesis test with the @given function decorator. When this test is run, Hypothesis will generate 100 different sorted dictionaries and verify that all of the key-value pairs expected in the dictionary are found.

If we run the tests now, they should fail: we haven't implemented __setitem__ and __getitem__ yet. For those, we need the binary search tree.

Binary search tree#

For the binary search tree, we'll use the implementation from the Introduction to Algorithms book.

Tree is defined as a dataclass as follows:

The @dataclass decorator adds, for example, an __init__ method for creating a tree with an optional root argument. If the tree is empty, root is equal to None. Otherwise, it contains a Node defined like this:

A Node has a key and a value. It also contains references to its left and right children as well as its parent.

Inserting a key and a value to the tree happens as follows:

To search from the tree, we need to go down the tree until a match is found. Or, if no match is found, raise a KeyError:

Our search involves a helper function _search_node that searches a Node by key. We'll need this when implementing deletion later.

Putting it all together#

With the tree supporting insert and search, we can use this tree in our SortedDict:

If we now run the property-based test for insertion and search, we should see it pass!

Now we have implemented the first requirement with a property-based test. Let's take a look at our remaining requirements:

  1. Key-value pairs added to our sorted dictionary can be searched.
  2. Adding a key that already exists overwrites the existing one.
  3. Keys are always sorted.
  4. Searching for a non-existing key raises a KeyError.
  5. Deleting a key and then searching for it raises a KeyError.

In this case, we'll also consider our second requirement implemented as we trust Hypothesis to have generated duplicate keys. We could work on the requirement of keeping keys sorted next, but we want to be sure the requirement also holds after deletions. So that's the next requirement we'll tackle: Deleting a key and then searching for it raises a KeyError.

Implementing deletion#

The requirement states that searching for a deleted key raises a KeyError. But how can we put that into a property? One way is to use our some_sorted_dicts() generator from earlier and let Hypothesis draw one of the keys for deletion. We then delete that key and ensure searching raises a KeyError.

To draw a key to delete, we could use composite again to create a composite strategy. This strategy would yield a sorted dictionary, the dictionary of expected contents, and a key to delete. However, in this case, it makes more sense to use the built-in data strategy from Hypothesis. This strategy comes with the draw method and we can use it to sample a value from a generator during the test.

Here's how we can use data() to sample a key to delete from the list of inserted keys:

This test first uses sampled_from to create a generator called some_key. When sampled, some_key randomly chooses one key from the list of inserted keys. We then use data.draw() to sample a key that we want to delete. The label argument is added so that Hypothesis can print a clearer error message if it finds a failing test case. Once a key is drawn, we delete the key and verify that searching for it raises a KeyError.

But there's a problem with our implementation: The list of inserted keys may be empty. We need to cover that case by ensuring the test only runs for non-empty dictionaries. We can do that by using filter with our generator:

The filter ensures that the dictionary is non-empty.

With the property ready, we can now move on to the implementation. The delete operation in our test, del sorted_dict[key], translates to sorted_dict.__delitem__(key). So we need to define __delitem__ in SortedDict:

Again, we delegate the deletion to the tree with the tree.delete function. Because the deletion is somewhat tricky but can be copied from the Introduction to Algorithms book, we refer to in the accompanying repository for its implementation.

Having tackled deletion, our list of requirements now looks as following:

  1. Key-value pairs added to our sorted dictionary can be searched.
  2. Adding a key that already exists overwrites the existing one.
  3. Keys are always sorted.
  4. Searching for a non-existing key raises a KeyError.
  5. Deleting a key and then searching for it raises a KeyError

As our last example, we'll write a property-based test for the third requirement: Keys are always sorted. The property for final requirement (Searching for a non-existing key raises a KeyError) can be found in in the accompanying repository.

Ensuring keys are sorted#

A property like "Keys are always sorted" is an invariant: Whatever operations are performed, it should remain true. Assuming SortedDict has a keys() method returning the list of keys, we can write a property-based test ensuring that all the instances of SortedDict generated by some_sorted_dicts() have sorted keys:

The implementation for sorted_dict.keys() uses in-order walk to traverse the tree. The implementation can be found in and in the accompanying repository.

The previous property ensures that keys are always sorted after they are inserted. But this property should also be checked after deletion. So, we should make sure that the invariant holds in our earlier deletion property:

While this works, there's something unsatisfactory about it. The premise of property-based testing is that we can cover all kinds of unexpected cases. In test_keys_sorted and test_search_after_delete, we're hard-coding the cases where the invariant should be tested. Is there something we can do to randomize testing the invariant?

Enter stateful testing. With stateful tests, we define operations that can be run, but the framework decides the order.

With Hypothesis, we can use RuleBasedStateMachine for that. We won't go into details, but here's an example for SortedDict:

The methods defined in our RuleBasedStateMachine, decorated with @rule, define the commands the state machine runs. The decorator @invariant ensures that we check after every command if the keys are being sorted. We also keep a model of our current expected state in self.state to know what keys our SortedDict is expected to contain.

Final touch: Add doctest#

Remember how we played around with the expected API of our SortedDict to come up with the properties? For example, we assumed the following would hold:

It would be great to ensure these smaller examples also hold for our implementation. Writing properties can be complex and error-prone, so it's valuable to have these human-readable examples to serve as anchors. They ensure that no matter what happens, our code works as expected at least with small example inputs.

These tests serve as documentation, which means they make excellent doctests. We can add the tests at the top of our module in a docstring:

In pytest, we can enable doctests with the --doctest-modules flag. Add the following in your pytest.ini to always run doctests:

When pytest is run, it now also runs the examples from the documentation. And with that, we're done implementing the features we set out to implement for our SortedDict!


In this article, we learned how to apply property-based testing to test-driven development. We also used this principle to develop a sorted dictionary. Thinking in terms of properties instead of concrete examples can help us slow down and be precise about what our code is expected to do.

It's important to remember that property-based testing doesn't replace regular unit tests. Instead, it provides a new tool to our testing toolboxes. And sometimes, property-based testing isn't the right tool for the job. For an interesting article on when property-based testing shines, take a look at this article by Brujo Benavides. However, the more experienced we get at writing test case generators and property-based tests, the more natural it becomes to use them for testing almost any kind of code.

Thank you for reading! As always, we're happy to hear your feedback.

Additional resources#

Newer postOlder post

Don’t miss the next post!

Absolutely no spam. Unsubscribe anytime.


About usCareersContactT&C