BDD Discovery Pitfalls – When the gherkins hide the garden
In this article, we will take a look at a real-world example of BDD scenario refactoring. Sue's team works in the regulatory department of an international bank, building a workflow application that makes it easier for auditors to review risky or suspicious trades. They had just received a new story to work on.
The story was a simple one, about adding some new columns to a consolidated report which the users can download as an Excel spreadsheet.
Task audit reporting
In order to check that task reviews are being completed in time
As an auditor
I want to see when task reviews start and are completed
The acceptance criteria provided looked like this:
- Report should contain the following columns:
- Review start date
- Review due date
- Review completed date
- Review completed on time
- Date fields should have a format of dd-MMM-yyyy HH:mm:ss
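For reference, this is a conventional date pattern; a minimal sketch of what it produces, assuming Java's standard DateTimeFormatter conventions (the class and sample value below are illustrative, not part of the team's code):

import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class ReportDateFormatExample {
    public static void main(String[] args) {
        DateTimeFormatter reportFormat =
                DateTimeFormatter.ofPattern("dd-MMM-yyyy HH:mm:ss", Locale.ENGLISH);
        LocalDateTime completed = LocalDateTime.of(2018, 5, 15, 15, 0, 0);
        // Prints "15-May-2018 15:00:00"
        System.out.println(reportFormat.format(completed));
    }
}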
Sue didn't anticipate any particular difficulty.
From Acceptance Criteria to Feature Files
They met with the product owner to discuss the requirements, and then came up with a feature file containing scenarios along the following lines:
Scenario: Task audit report columns
Given the auditor has logged on to the task review application
When he downloads the latest consolidated report
Then the report should contain the following columns:
| Review Start Date |
| Review Due Date |
| Review Completed Date |
| Review Completed On Time |
Scenario: Task audit report column date formats
Given the auditor has logged on to the task review application
When he downloads the latest consolidated report
Then the report columns should have the following formats:
| Column                   | Format               |
| Review Start Date        | dd-MMM-yyyy HH:mm:ss |
| Review Due Date          | dd-MMM-yyyy HH:mm:ss |
| Review Completed Date    | dd-MMM-yyyy HH:mm:ss |
| Review Completed On Time | Yes/No               |
These scenarios certainly do accurately reflect the acceptance criteria specified in the original story. They check both which columns are displayed, and that the dates in these columns follow the specified formats. And the product owner was happy with them. What else could we ask for?
Trouble in Prod
So they built the feature, made sure all the acceptance tests were green, and deployed it. But once in production, issues started to appear. In particular, the bank's most reliable auditors were suddenly being reported for submitting late reviews! It turned out that the 'review due' and 'review completed' dates had been swapped.
In fact, these scenarios missed their mark. While they are very detail-focused, they do a poor job of illustrating the overall business behaviour. We can see what format these dates should be recorded in, but where do they come from and how can we be confident that the correct values are going into the correct columns? For example, what happens if the report is generated for an uncompleted review? What value should appear in the 'Review completed on time' column before the review is completed, and whose responsibility is it to calculate this value?
As they stand, these acceptance criteria would fail if a timestamp was missing the seconds (which is probably not a big deal from a business perspective), but would pass if the dates were in the wrong columns, or even if they were all hard-coded to the same value. In other words, the scenarios check what the product owner asked for, but they don't demonstrate that the feature actually works.
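To make this concrete, a format-only step definition might look something like the sketch below. This is a hypothetical Cucumber-JVM binding (the reportRows field, and however the 'When' step populates it, are assumptions, not the team's actual code); notice that nothing in it ties a value back to the review it describes:

import io.cucumber.datatable.DataTable;
import io.cucumber.java.en.Then;
import java.util.List;
import java.util.Map;
import static org.junit.jupiter.api.Assertions.assertTrue;

public class ReportFormatSteps {

    // One map per report row, keyed by column name; assumed to be filled in by the 'When' step.
    private List<Map<String, String>> reportRows;

    @Then("the report columns should have the following formats:")
    public void theReportColumnsShouldHaveTheFollowingFormats(DataTable expectedFormats) {
        for (Map<String, String> expected : expectedFormats.asMaps()) {
            String column = expected.get("Column");
            String format = expected.get("Format");
            for (Map<String, String> row : reportRows) {
                String value = row.get(column);
                // This only checks the *shape* of each value: a report with the due and
                // completed dates swapped, or hard-coded, still passes.
                assertTrue(matchesFormat(value, format),
                        column + " value '" + value + "' does not match format " + format);
            }
        }
    }

    private boolean matchesFormat(String value, String format) {
        if ("Yes/No".equals(format)) {
            return "Yes".equals(value) || "No".equals(value);
        }
        // Rough structural check for dd-MMM-yyyy HH:mm:ss
        return value != null && value.matches("\\d{2}-[A-Z][a-z]{2}-\\d{4} \\d{2}:\\d{2}:\\d{2}");
    }
}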
A better approach
A more experienced team would not have stopped here with the initial acceptance criteria. They would have drilled the product owner or BA further, maybe asking questions like:
- What should go in the 'Review Completed' column when the review is not yet completed?
- What is the 'On time' status for a review that is not yet completed?
- What is the 'On time' status for a review that is overdue but not yet completed?
- Are there any review task types where the review dates should not appear?
They would ask for examples: "Can you give me an example of what is reported for a basic case, when an auditor completes a review task on time?" "What if the review task is completed late?" "What if it is abandoned?" And so on.
Some of the scenarios covering these examples might look something like this (the dates are shortened for readability):
Scenario: Should report review start, due and completed dates for Volcker review tasks
Given Bill Smith started a 'Volcker Non Trading' review on 10-May-2018 that is due on 17-May-2018
And he completed the review on 15-May-2018 at 15:00
When the consolidated review report is generated
Then the report should contain an entry with the following values:
| Auditor    | Review Task         | Review Start | Review Due  | Completed   | On Time |
| Bill Smith | Volcker Non Trading | 10-May-2018  | 17-May-2018 | 15-May-2018 | Yes     |
Or this:
Scenario Outline: Should record whether tasks were completed on time
Given Bill Smith started a 'Volcker Non Trading' review on <Review Start> that is due on <Review Due>
And he completed the review on <Completed>
When the consolidated review report is generated for 20-May-2018
Then the report should contain an entry with the following values:
| Auditor    | Review Task         | Review Start   | Review Due   | Completed   | On Time   |
| Bill Smith | Volcker Non Trading | <Review Start> | <Review Due> | <Completed> | <On Time> |
Examples:
| Auditor    | Review Task         | Review Start | Review Due  | Completed   | On Time |
| Bill Smith | Volcker Non Trading | 10-May-2018  | 17-May-2018 | 15-May-2018 | Yes     |
| Bill Smith | Volcker Non Trading | 10-May-2018  | 17-May-2018 | 18-May-2018 | No      |
| Bill Smith | Volcker Non Trading | 10-May-2018  | 18-May-2018 |             | No      |
| Bill Smith | Volcker Non Trading | 10-May-2018  | 25-May-2018 |             |         |
These scenarios do include examples of the date formats and show the requested columns. But they show them in the context of worked examples that illustrate different flows through the system, and different variations of inputs and outcomes. They show not only what columns are being reported, but also why they are significant.
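The matching step definition can then bind whole rows rather than just their shape. A minimal sketch, again as a hypothetical Cucumber-JVM binding with an assumed reportRows structure filled in by the 'When' step:

import io.cucumber.datatable.DataTable;
import io.cucumber.java.en.Then;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import static org.junit.jupiter.api.Assertions.assertTrue;

public class ConsolidatedReportSteps {

    // One map per report row, keyed by column name; assumed to be filled in by the 'When' step.
    private List<Map<String, String>> reportRows;

    @Then("the report should contain an entry with the following values:")
    public void theReportShouldContainAnEntryWithTheFollowingValues(DataTable expectedEntries) {
        for (Map<String, String> expected : expectedEntries.asMaps()) {
            // A matching row must carry the expected value in every expected column,
            // so swapped or hard-coded dates now make the scenario fail.
            boolean found = reportRows.stream().anyMatch(row ->
                    expected.entrySet().stream().allMatch(cell ->
                            Objects.equals(cell.getValue(), row.get(cell.getKey()))));
            assertTrue(found, "No report entry matching " + expected);
        }
    }
}

The Java here is only a sketch; the real difference is that the improved scenarios give a step like this concrete expected values, pinning each column to the review it describes.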
Conclusion
This sort of mistake happens more often than you might think. Sue's team lacked experience in BDD requirements gathering workshops, and had fallen into one of the classic pitfalls: they assumed that the product owner knew best. They hadn't challenged the requirements enough. And as a result, the acceptance criteria in the story only covered a small part of the real business requirements, and missed some of the most important ones.
Three Amigos sessions are not simply a walk-through of the next user stories fresh off the backlog. They are dynamic, engaging sessions where team members actively challenge and cross-examine a user story, trying to spot ambiguities, unmask missing assumptions and uproot uncertainty. And techniques like Feature Mapping and Example Mapping are there for us to help structure and channel these conversations.