The Three Pillars of Trustworthy Test Automation

28th January 2016

An article by John Ferguson Smart, Antony Marcano and Andy Palmer

Few will deny the importance of Automated Acceptance Testing in modern
software development. A high quality set of automated acceptance tests helps you deliver valuable features sooner by reducing the wasted time spent in manual testing and fixing bugs. When combined with Behaviour Driven Development, Automated Acceptance Testing also guides and validates development effort, helping teams focus both on building the features that matter and ensuring that they work.

But though Test Automation is widely recognised as a Good Thing, many teams struggle with implementing effective automated test suites for projects of any size. They find that the tests are slow to run, and that each test takes longer and longer to write. They end up spending way too much time troubleshooting sporadic and hard-to-reproduce test failures.

In this article, we look at common root causes of problems like these, and at what we can do to avoid them. We examine the key characteristics of a good automated test suite, and how they can help you spot where yours might have room for improvement.

What do we want from a good automated test suite?

A lot of things go into making a test suite worth its weight in development time, but most of them really boil down to one thing: a good automated test suite is above all trustworthy. You can believe what the tests tell you about the state of the application, and act on that information to make decisions. If a test fails, then something in the application is broken, and the application is not ready to release. Conversely, if all of the tests pass, then you can say with a reasonable degree of confidence that the application is good to go.

But what makes for a trustworthy test suite?

Trustworthy test suites give you confidence

Functional coverage is a key part of trustworthiness. You cannot confidently say that an application is ready to be deployed if the automated tests only check half of the features. A trustworthy automated test suite should tell you much more than simply what tests have been executed. It should also provide information about:

  • what features have been tested
  • how comprehensively they were tested
  • what features were not tested

But how comprehensively does each feature need to be tested, for the application to be deemed deployable? A more interesting question is perhaps, what is the least amount of automated testing you can get away with, without compromising your level of confidence in the application?

The answer to this question depends on your team culture. The number of tests you need to write to give you enough confidence will vary depending on how closely your team works together. For example, teams that practise BDD and TDD will find less need for comprehensive, detailed automated acceptance tests than teams that are less mature in these areas.

Not all tests provide the same level of confidence. For example, if a choice must be made, experienced teams prioritise automated tests that check overall business flow and illustrate how the application delivers value over tests that check more precise, more detailed aspects of the application.

Another key aspect of trustworthiness is how well the test suite tells you about things that go wrong.

When red means "Broken"

Arguably, the fundamental role of modern automated tests is not to test the application so much as to describe the expected behaviour of the application, and demonstrate that the application conforms to this expected behaviour.

When a test fails, it can mean one of three things:

  • If the behaviour illustrated by the test is still correct, then the application no longer behaves in the expected way, and a regression has occurred.
  • If the behaviour illustrated by the test is no longer valid, then the test is no longer correct, and needs to be fixed or replaced.
  • If the behaviour illustrated by the test is still correct, but the application has implemented this behaviour in a different way, then the test needs to be updated (and was probably not written at the right level of abstraction).

Where developers are closely involved in the test automation process, they typically run the automated test suite on their development machines before committing changes, or at least monitor the results of the test runs on a build server, to ensure that their changes have not broken any existing functionality or any test implementations. They help keep the tests trim and up to date and the application code testable. They help ensure that any failures occurring on the build server are genuine application failures. In other words, they help to keep the tests trustworthy.

When automated testing is seen as a separate activity, things start to get more complicated.

When red means “Might be broken”

Unfortunately, not all real world projects work like this. Especially when test automation is considered a separate activity, or a "service" provided by a separate team, it is up to the test automation team to understand what a test failure means. Depending on the quality of the automated tests and on the nature of the application, this can represent a considerable drain on the test automation efforts. Time spent diagnosing failing tests is time not spent on writing new ones.

A high quality test automation suite needs to cater for these situations too, and streamline the triage process as much as possible. A trustworthy test suite should make a clear distinction between application regressions (where the application is not behaving as expected), broken tests (for example, a test might fail because the web page structure has changed, and the test can no longer find the information it needs to verify the state of the application), and environment or infrastructure-related failures (for example, if an external service used by the application is unavailable). When a test fails for whatever reason, a trustworthy test suite helps identify the nature and the cause of the problem. For example, different error types should appear differently in the test reports, and error messages should be informative and expressed in domain terms, not in technical ones.
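
By way of illustration, here is a minimal sketch of how this kind of failure triage might be wired into a JUnit 5 suite using a TestWatcher extension and Selenium exception types. The category names and classification rules are assumptions made for this example, not features of any particular framework.

```java
// A sketch of failure triage: register on a test class with
// @ExtendWith(FailureTriageWatcher.class).
import org.junit.jupiter.api.extension.ExtensionContext;
import org.junit.jupiter.api.extension.TestWatcher;
import org.openqa.selenium.NoSuchElementException;
import org.openqa.selenium.WebDriverException;

import java.net.ConnectException;

public class FailureTriageWatcher implements TestWatcher {

    enum FailureType { APPLICATION_REGRESSION, BROKEN_TEST, INFRASTRUCTURE }

    @Override
    public void testFailed(ExtensionContext context, Throwable cause) {
        // In a real suite this would feed the test report; here we simply log it.
        System.err.printf("[%s] %s: %s%n",
                classify(cause), context.getDisplayName(), cause.getMessage());
    }

    private FailureType classify(Throwable cause) {
        if (cause instanceof AssertionError) {
            // The application did not behave as expected: a genuine regression.
            return FailureType.APPLICATION_REGRESSION;
        }
        if (cause instanceof NoSuchElementException) {
            // The page structure changed under the test: the test itself is broken.
            return FailureType.BROKEN_TEST;
        }
        if (cause instanceof ConnectException || cause instanceof WebDriverException) {
            // The browser, the network or an external service let us down.
            return FailureType.INFRASTRUCTURE;
        }
        return FailureType.BROKEN_TEST;
    }
}
```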

So what makes a trustworthy test?

Trustworthiness is the cornerstone of a good automated test suite, but trustworthiness does not come for free. Trustworthy tests are:

  • Reliable
  • Responsive, and
  • Scalable

Let’s look at these three areas.

Trustworthy test suites are Reliable

Anyone who has done any work in test automation will be familiar with the concept of a "flaky" test, a test that sporadically fails in a way that is difficult to reproduce and diagnose. These tests are the bane of an automated tester's life: they waste valuable developer time, slow down feedback, and erode confidence in the test suite.

Trustworthy test suites avoid flaky tests at all cost. Tests that fail for reasons other than an application regression or a genuine change in behaviour should never be considered acceptable or normal. On the contrary, the team should aggressively hunt down and address the causes of test "flakiness".

A good automated test should help you identify and troubleshoot sporadic test failures. Trustworthy tests provide precise error messages and diagnostics, catering not only for expected outcomes but also for common application errors. For example, if submitting a request can result in a dialog with a technical error message, make sure this message gets into your test report.
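
For example, here is a minimal sketch of how a Selenium-based test could surface an application error dialog in its failure message; the page structure (an element with id "error-dialog") is purely hypothetical.

```java
// A sketch of a diagnostic helper for web tests, assuming Selenium WebDriver
// and JUnit 5. The "error-dialog" element id is a made-up example.
import static org.junit.jupiter.api.Assertions.fail;

import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

public class DiagnosticHelper {

    // Call this right after an action that may trigger an application error dialog.
    // If the dialog is present, fail immediately with its message so the test report
    // shows the actual application error rather than a vague downstream timeout.
    public static void failIfErrorDialogShown(WebDriver driver, String action) {
        List<WebElement> dialogs = driver.findElements(By.id("error-dialog"));
        if (!dialogs.isEmpty()) {
            fail("Application error while " + action + ": " + dialogs.get(0).getText());
        }
    }
}
```

A step that submits a request can call failIfErrorDialogShown(driver, "submitting the request") immediately afterwards, so the report records what the application actually said went wrong.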

Trustworthy test suites are Responsive

Good automated test suites are responsive. The test suite runs quickly and is run often, allowing regressions to be detected early.

Test suites can be slow for many reasons. The more layers of the application a test exercises, the slower it will be. For example, a test that drives a Web UI is naturally slower than a test that works directly with an API, yet it provides stronger evidence that the application as a whole behaves as expected.

Experienced teams try to limit the use of web tests to illustrating UI behaviour and overall application flow. Striking a balance between high confidence and faster feedback can be achieved by implementing just enough testing through the UI and supplementing that with more extensive testing at the lower levels.
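
As a rough sketch of what those lower-level tests might look like, the example below checks a detailed business rule directly against a hypothetical REST endpoint, leaving a single web test elsewhere to illustrate the end-to-end journey. The endpoint, parameters and response field are assumptions made for the example.

```java
// A sketch of a fast, API-level check of a detailed business rule,
// assuming a hypothetical /api/quotes endpoint. Not a real API.
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.junit.jupiter.api.Test;

public class QuoteCalculationApiTest {

    private final HttpClient client = HttpClient.newHttpClient();

    @Test
    void premiumIncludesYoungDriverLoading() throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/api/quotes?age=19&car=hatchback"))
                .GET()
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        assertEquals(200, response.statusCode());
        // The field name is illustrative; the point is that detailed rules are
        // checked here, quickly, rather than through the browser.
        assertTrue(response.body().contains("youngDriverLoading"),
                "Expected the quote to include a young driver loading");
    }
}
```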

The application itself may also be slow – tests can only run as fast as the application they exercise. Indeed, slow test performance is often a symptom of slow application performance. Some teams try to work around this by investing huge amounts of time in parallelising and optimising tests. While these endeavours are valuable, it is wise to consider whether greater value can be obtained by first investing in the performance of the application itself.

The application may not be test friendly. For example, can you set up the test environment to a given state before performing a test using an API call, or do you have to step through the UI? Does the application provide a convenient API for tests to query the state of domain objects, or must the tests do everything through the UI? Mature teams design testability into the application from the outset, with testers, developers and infrastructure folk working closely together to ensure that the application and its environments are easy to test. And if an application is easy to test, it is generally also more robust and more stable.
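
To make this concrete, here is a minimal sketch of putting the application into a known state through a dedicated test-data endpoint rather than by clicking through the UI. The /test-api/accounts endpoint and its payload are hypothetical; the point is that one API call replaces a long trek through the registration screens.

```java
// A sketch of API-based test data setup, assuming a hypothetical test-only
// endpoint that the team has added to make the application test-friendly.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Locale;

public class TestDataSetup {

    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    // Put the application into a known state with one call, instead of
    // stepping through the UI at the start of every test.
    public static void createAccount(String username, double balance) throws Exception {
        String json = String.format(Locale.ROOT,
                "{\"username\":\"%s\",\"balance\":%.2f}", username, balance);
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/test-api/accounts"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();

        HttpResponse<String> response =
                CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 201) {
            throw new IllegalStateException("Could not set up test account: " + response.body());
        }
    }
}
```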

Trustworthy test suites are Scalable

Good test suites are designed with scalability in mind. As the test suite grows, new tests should be easier and quicker to add than previous ones. The impact of changes to the application is strictly limited, making it easier and faster to update the test suite.

A sub-optimal test suite, or a test suite running against an application that is not test-friendly, displays the opposite behaviour. The first tests are quick to write, but each new one takes progressively longer, until the cost of maintenance severely hinders writing new tests, or new tests can only be added at the cost of allowing existing tests to fall into disrepair.

Scalability relies on many factors:

  • Code quality: Well-designed, high quality test automation code is essential. Test automation code is particularly prone to duplication, so strong code design and refactoring skills are key; test automation is as much about programming as any other domain we choose to automate (see the sketch after this list).
  • Performance: It is hard to scale your test suite if it takes an unreasonable amount of time to run.
  • Stability: Unstable, flaky tests are a surefire way to prevent your tests from scaling - if you write additional tests but leave unstable tests in place, it won't be long before you are spending all your time triaging broken tests and writing no new tests at all.
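
Here, for instance, is a minimal sketch of keeping UI details in one place with a simple Page Object; the login page shown is hypothetical. When the page changes, only this class changes, not every test that needs to log in.

```java
// A sketch of a Page Object that removes duplicated UI plumbing from tests,
// assuming Selenium WebDriver. The element ids are made-up examples.
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

public class LoginPage {

    private final WebDriver driver;

    public LoginPage(WebDriver driver) {
        this.driver = driver;
    }

    // Tests express intent ("log in as..."); the locators live here, once.
    public void loginAs(String username, String password) {
        driver.findElement(By.id("username")).sendKeys(username);
        driver.findElement(By.id("password")).sendKeys(password);
        driver.findElement(By.id("login-button")).click();
    }
}
```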

Conclusion - are all the pillars equal?

So what should you prioritise to obtain a world-class test automation suite? For many teams, high functional coverage is the go-to criterion for determining whether a test suite is trustworthy. It is true that with higher automated coverage comes a higher level of confidence. After all, how much confidence can you place in a test suite that only tests half of your application’s features?

However, as with many real-world problems, things aren’t as simple as they appear on the surface.

Coverage at any cost

Many teams write automated acceptance and regression tests with the express or implicit goal of achieving very high functional coverage, at the cost of other factors such as stability or quality. But high functional coverage is necessary, not sufficient, for successful test automation, and other aspects may actually be more important in the short term. For example:

  • Aiming for high test coverage at the cost of stability will only increase uncertainty.
  • Aiming for high test coverage but not ensuring that the tests can run quickly results in slow and cumbersome test suites that cannot give feedback in a timely manner.
  • Aiming for high test coverage but not ensuring that the framework is designed and written cleanly using high quality coding practices will ultimately result in lower test output, as the cost of maintaining the tests increases over time.

Look to your fundamentals

Trustworthiness is the most important characteristic of an automated test suite, but you cannot achieve trustworthiness by simply focusing on the most obvious aspects such as the number of tests written.

The key to maintaining a trustworthy test suite is to continually make sure you are not prioritising test throughput at the cost of quality and sustainability. This can be tricky, and many teams fall into this habit without even noticing. Here are a few characteristic red flags that can tell you if you should slow down and spend some time stabilising your fundamentals before forging ahead with more tests:

  • Your tests now take longer to write than they did at the start of the project;
  • You spend a lot of time diagnosing and troubleshooting flaky or unstable tests;
  • Your test suite is taking longer and longer to run.

Rather than focusing on raw test count metrics, experienced teams focus on building and enabling a high quality test suite; when they do this, high coverage naturally follows.
