Page Objects that Suck Less – Tips for writing more maintainable Page Objects

John Ferguson Smart | Mentor | Author | Speaker - Author of 'BDD in Action'.
Helping teams deliver more valuable software sooner
19th January 2019

(Learn about more modern agile test automation practices in the Modern Agile Test Automation Playbook)

Page Objects are one of the most widely used test automation design patterns around. Most QA engineers have used a variation of Page Objects at some point. However, the pattern is often misunderstood and used poorly, which can result in test automation code that is fragile and hard to maintain.

In this article, we will look at some of the limits and pitfalls of the Page Objects pattern, as well as some approaches that can help you ensure that your Page Objects don't lead you to a maintenance nightmare.

Origins of the Page Objects pattern

The basic idea of Page Objects is sound - separate the logic about how you find the elements of a page (for example, id fields, XPath expressions or CSS selectors) from how you interact with these elements (what value you enter into a field, for example). The idea is to keep the selector logic in one place, so that if a page element changes, you only need to update the code in one place.
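
As a minimal sketch of this idea, the example below keeps the selectors for a login form in a single class. FakeBrowser is a hypothetical in-memory stand-in for a real driver such as WebDriver, and the selectors and method names are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a browser driver: stores typed values by selector.
class FakeBrowser {
    private final Map<String, String> fields = new HashMap<>();

    void type(String cssSelector, String value) {
        fields.put(cssSelector, value);
    }

    String valueOf(String cssSelector) {
        return fields.getOrDefault(cssSelector, "");
    }
}

// Page Object: the only class that knows which selectors the login form uses.
class LoginForm {
    private static final String USERNAME = "#username"; // assumed selector
    private static final String PASSWORD = "#password"; // assumed selector

    private final FakeBrowser browser;

    LoginForm(FakeBrowser browser) {
        this.browser = browser;
    }

    void enterCredentials(String username, String password) {
        browser.type(USERNAME, username);
        browser.type(PASSWORD, password);
    }

    String enteredUsername() {
        return browser.valueOf(USERNAME);
    }
}
```

If the id of the username field changes, only the USERNAME constant needs to be updated; the tests that call enterCredentials() are unaffected.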

The Page Objects pattern has been around for a while. In the context of Selenium WebDriver testing, the model was originally proposed by Simon Stewart as a way to introduce testers to a more maintainable, Object Oriented approach to test automation.

Page Objects are indeed much more maintainable than unstructured Selenium test scripts, and work well for simple scenarios. But for more complex UIs and more complex scenarios, they need to be used together with other design patterns to ensure that the Page Objects themselves do not become a maintenance overhead.

Limits of the Page Object model

Many articles have been written on the limits of the Page Object model. Here are a few of the key limitations:

  • They don't work well for modern UI designs: web applications today are much more complex and more interactive than they were a decade ago. Rich UI components are used across multiple pages, and on any given page users can perform a wide variety of actions. When you try to model each page as a single Page Object class, you end up with large and complex classes that can be hard to understand and harder to maintain.
  • They make you think at the wrong level of abstraction: A more subtle problem is that using Page Objects gives you a skewed perception of your test automation architecture. When you build your test framework around Page Objects, you tend to see everything in terms of interactions with a web page. And this leads to tests that do everything via the UI, even tasks which could be done more reliably and more quickly by calling a REST API or querying a database.
  • They can easily become bloated and hard to maintain: When all you have is Page Objects, all your logic goes in your Page Objects. Page Objects often have a mix of code that locates elements on a page (for example, a quantity dropdown, a size radio button, and an Add To Cart button) and code that interacts with these elements (e.g. an addDressesToCart() method). There ends up being a lot of code doing a lot of different things in a single class. This makes the class harder to understand and to maintain. It also means that when you modify one of the Page Object methods for your own test, you may well break it for other tests. Large Page Object classes undergo a lot of churn, and this inevitably introduces risk.

Tips for writing good Page Objects

Despite these limitations, if your current test automation suite relies on Page Objects, do not despair! There are a few simple things you can do to help ensure that your Page Objects do not become a liability.

Think Page Components, not Page Objects

If your Page Objects are a bit on the heavy side, one of the simplest improvements is to break them up into smaller classes. If you have a complex page, don't try to place every field into a single class. Instead, you can use the concept of Page Components. A Page Component represents a specific part of the page that helps the user perform a specific task. A login form, a navigation hierarchy, a search result list, or the details about the current user: all of these would make great Page Components.

For example, consider the following web page:

On this page, we can identify at least four different sections, each of which could be represented by independent components:

  • The menu bar at the top of the page
  • The “Plan a journey” search panel
  • The “My journeys” history box
  • The Search box in the top right hand corner

Some of these, such as the menu bar and the search box, appear on many pages. And other components, such as the travel preferences shown below, only appear in certain situations. But most importantly, the test code that interacts with each of these components can be developed and maintained independently.

In coding terms, a page component looks just like a page object, only much smaller. Below you can see an example of a Page Component class for the menu bar. (The examples in the rest of this article will use Serenity BDD Page Objects, but the principles apply equally to any Page Object implementation).

class MenuBar extends PageObject {

    @FindBy(css = ".collapsible-menu")
    private WebElement menuBar;

    public void selectMenuItem(String menuItem) {
        menuBar.findElement(By.linkText(menuItem)).click();
    }
}
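
To illustrate how components can be shared between pages, here is a sketch in plain Java. It does not use Serenity BDD or WebDriver: InteractionLog is a hypothetical stand-in for a browser session, and all class names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a browser session: records each interaction.
class InteractionLog {
    final List<String> entries = new ArrayList<>();

    void record(String interaction) {
        entries.add(interaction);
    }
}

// Component that knows only about the menu bar.
class MenuBarComponent {
    private final InteractionLog browser;

    MenuBarComponent(InteractionLog browser) {
        this.browser = browser;
    }

    void selectMenuItem(String menuItem) {
        browser.record("menu: " + menuItem);
    }
}

// Component that knows only about the search box.
class SearchBoxComponent {
    private final InteractionLog browser;

    SearchBoxComponent(InteractionLog browser) {
        this.browser = browser;
    }

    void searchFor(String term) {
        browser.record("search: " + term);
    }
}

// A page becomes a thin composition of the components it displays.
class HomePageSketch {
    final MenuBarComponent menuBar;
    final SearchBoxComponent searchBox;

    HomePageSketch(InteractionLog browser) {
        this.menuBar = new MenuBarComponent(browser);
        this.searchBox = new SearchBoxComponent(browser);
    }
}

// Another page reuses the same menu bar component without duplicating it.
class ResultsPageSketch {
    final MenuBarComponent menuBar;

    ResultsPageSketch(InteractionLog browser) {
        this.menuBar = new MenuBarComponent(browser);
    }
}
```

A change to the menu bar markup would then touch MenuBarComponent only, however many pages display it.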

Keep business logic out of your Page Objects

Another reason Page Objects become large and hard to maintain is that they try to do too many things.

For example, suppose we wanted to check that a particular menu item was enabled for certain users, and disabled for others. We could add some more methods to the MenuBar class to perform these checks:

class MenuBar extends PageObject {

    @FindBy(css = ".collapsible-menu")
    private WebElement menuBar;

    public void selectMenuItem(String menuItem) {
        menuBar.findElement(By.linkText(menuItem))
           .click();
    }

    public void checkMenuItemIsEnabled(String menuItem) {
        assertTrue(
            menuBar.findElement(By.linkText(menuItem))
                   .isEnabled()
        );
    }

    public void checkMenuItemIsDisabled(String menuItem) {
        assertFalse(
            menuBar.findElement(By.linkText(menuItem))
                   .isEnabled()
        );
    }
}

A test using this class would check the state of the menu item by calling this page object method:

    @Test
    public void plan_my_journey_is_available_for_all_users() {
        navigate.toTheTFLHomePage();
        menuBar.checkMenuItemIsEnabled("Plan a journey");
    }

However, adding logic into a Page Object class is another road to complexity and maintenance headaches. You would also need to add methods for other checks: what if we need to check that an element is hidden entirely?

A cleaner approach is to make Page Object classes responsible for doing a single job: locating elements on the page. The MenuBar Page Object could be limited to locating menu items by name:

public class MenuBar extends PageObject {

    @FindBy(css = ".collapsible-menu")
    private WebElement menuBar;

    public WebElement itemFor(String menuItem) {
        // WebElement exposes findElement(), not find()
        return menuBar.findElement(By.linkText(menuItem));
    }
}

The test code could now refer to this element directly, to make whatever assertions it sees fit. In the following example, we use AssertJ to check that a given menu item is both displayed and enabled:

    @Test
    public void plan_my_journey_is_available_for_all_users() {
        navigate.toTheTFLHomePage();
        assertThat(menuBar.itemFor("Plan a journey"))
                          .matches(WebElement::isDisplayed)
                          .matches(WebElement::isEnabled);
    }

Model user behaviour not user interfaces

In many test automation projects built around Page Objects, the Page Objects are manipulated directly in the tests. For example, the following code uses Page Objects in a fairly typical way:

    TFLHomePage tflHomePage;
    TFLSearchResultPage tflSearchResultPage;
    TFLStatusUpdatesPage tflStatusUpdatesPage;

    @Test
    public void trips_between_two_stations() {
        tflHomePage.open();
        tflHomePage.selectFrom("Paddington Underground Station");
        tflHomePage.selectTo("Liverpool Street");
        tflHomePage.clickOnPlanMyJourney();
        tflSearchResultPage.waitForResults();

        assertThat(tflSearchResultPage.getJourneyResultFrom())
            .isEqualTo("Paddington Underground Station");
        assertThat(tflSearchResultPage.getLastStop())
            .endsWith("to Liverpool Street");
    }

This code hides the location logic for the page elements inside page objects, as prescribed by the Page Object pattern. It cleanly separates WebDriver calls from the test code itself. And yet tests like this still quickly become hard to maintain.

It is not hard to see why. Imagine we need to write another, similar test, one where the user prefers to travel by bus.

    @Test
    public void choose_a_preferred_transport_mode() {
        tflHomePage.open();
        tflHomePage.clickOnEditPreferences();
        tflHomePage.deselectAllTravelModes();
        tflHomePage.selectTravelMode("Bus");
        tflHomePage.selectFrom("Paddington Underground Station");
        tflHomePage.selectTo("Liverpool Street");
        tflHomePage.clickOnPlanMyJourney();
        tflSearchResultPage.waitForResults();

        assertThat(tflSearchResultPage.getItinerary())
            .contains("205 bus to Liverpool Street Station");
    }

The code in this test is almost, but not quite, identical to the previous one. In a real-world test suite, there would be many others like it. Each describes how the user interacts with the user interface in great detail. But this makes the tests verbose and noisy, which in turn makes them hard to read and harder to maintain.

They are also fragile. The Page Object model keeps the locators for each field in a central place, which is good. But our tests are tightly coupled to the user interface. If the broader UI design changes, the impact is spread across many tests.

For instance, imagine a change where, instead of simply entering a departure and destination station, the user needs to type the name of the station, and then select the station in a dynamic dropdown:

The code to automate this interaction might now look like this:

    tflHomePage.selectFrom("Paddington Underground Station");
    tflHomePage.clickOnStopInDropdownNamed(
                           "Paddington Underground Station");
    tflHomePage.selectTo("Liverpool Street");
    tflHomePage.clickOnStopInDropdownNamed("Liverpool Street");

This change is not hard in itself, but using a classic Page Object model, it would impact every single test that exercises the “Plan a journey” feature. And the more code you need to change, the more likely you are to make a mistake and break a test.

Encapsulate interaction logic inside step methods

A more maintainable and more robust approach is not just to model how the user interacts with the application, but to model what the user is trying to do in business terms.

Let’s see what we mean by this. In the first test we saw, the user is performing three tasks:

  • Navigate to the TFL home page
  • Plan a journey from Paddington to Liverpool Street Station
  • View the proposed itinerary.

We could make the original test more readable by reorganising the interactions into “step” methods to reflect this breakdown:

    @Test
    public void trips_between_two_stations() {

        navigateToTFLHomePage();

        planAJourneyBetween("Paddington Underground Station",
                            "Liverpool Street");

        assertThat(tflSearchResultPage.getJourneyResultFrom())
            .isEqualTo("Paddington Underground Station");
        assertThat(tflSearchResultPage.getLastStop())
            .endsWith("to Liverpool Street");
    }

This way, the UI interactions for each step are grouped in one place, making them easier to maintain and making the test code easier to understand. For example, the planAJourneyBetween() method would look like this:

    public void planAJourneyBetween(String departure,
                                    String destination) {
        tflHomePage.selectFrom(departure);
        tflHomePage.clickOnStopInDropdownNamed(departure);
        tflHomePage.selectTo(destination);
        tflHomePage.clickOnStopInDropdownNamed(destination);
        // Submit the search before waiting, as the original test did
        tflHomePage.clickOnPlanMyJourney();
        tflSearchResultPage.waitForResults();
    }
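
The benefit is easier to see when the interactions are recorded. The following sketch replaces the real page object with a hypothetical RecordingHomePage that simply logs each call; the class names are assumptions made for this example and are not part of any framework.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory page object that records each interaction.
class RecordingHomePage {
    final List<String> interactions = new ArrayList<>();

    void selectFrom(String station) {
        interactions.add("type from: " + station);
    }

    void selectTo(String station) {
        interactions.add("type to: " + station);
    }

    void clickOnStopInDropdownNamed(String station) {
        interactions.add("pick: " + station);
    }

    void clickOnPlanMyJourney() {
        interactions.add("click plan");
    }
}

// Step methods: the only place that knows the order of the UI interactions.
class JourneyStepsSketch {
    private final RecordingHomePage homePage;

    JourneyStepsSketch(RecordingHomePage homePage) {
        this.homePage = homePage;
    }

    void planAJourneyBetween(String departure, String destination) {
        homePage.selectFrom(departure);
        // The dropdown selection is added here once, not in every test
        homePage.clickOnStopInDropdownNamed(departure);
        homePage.selectTo(destination);
        homePage.clickOnStopInDropdownNamed(destination);
        homePage.clickOnPlanMyJourney();
    }
}
```

Every test that calls planAJourneyBetween() picks up the new dropdown behaviour without being edited.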

Keep step methods in “action” classes

Breaking a test into smaller step methods is a useful technique, but you can only use these methods within a single test class. Oftentimes, these step methods can be used across many tests, or even across multiple projects.

An approach that scales better is to put the step methods in their own distinct classes, rather than including them in the test classes themselves. This makes it easier to reuse the step methods in different tests. We call these classes Action classes.

Serenity BDD provides special support for this pattern, but you can use the same approach with any framework. An example of a class that encapsulates navigation tasks can be seen below:

public class Navigate {
    TFLHomePage homePage;
    MenuBar menuBar;

    public void toTheTFLHomePage() {
        homePage.open();
    }

    public void toMenuItem(String menuItem) {
        menuBar.itemFor(menuItem).click();
    }
}

Now, rather than adding a method to the test case, we would use an instance of the Navigate class. In Serenity BDD, we can use the @Steps annotation to instantiate the navigate field for us:

   @Steps
   Navigate navigate;

   @Test
   public void choose_a_preferred_transport_mode() {
       navigate.toTheTFLHomePage();
       ...
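
Outside Serenity BDD, nothing instantiates the action class for you, so one option is plain constructor injection. The sketch below shows this under stated assumptions: OpenablePage and ClickableMenu are hypothetical interfaces standing in for real page objects, and NavigateActions is an invented name.

```java
// Hypothetical minimal contracts, standing in for real page objects.
interface OpenablePage {
    void open();
}

interface ClickableMenu {
    void click(String menuItem);
}

// Action class wired by hand instead of by the @Steps annotation.
class NavigateActions {
    private final OpenablePage homePage;
    private final ClickableMenu menuBar;

    NavigateActions(OpenablePage homePage, ClickableMenu menuBar) {
        this.homePage = homePage;
        this.menuBar = menuBar;
    }

    void toTheTFLHomePage() {
        homePage.open();
    }

    void toMenuItem(String menuItem) {
        menuBar.click(menuItem);
    }
}
```

A test would create the page objects once, pass them to new NavigateActions(...), and then call navigate.toTheTFLHomePage() exactly as in the Serenity version.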

Use action classes to model user tasks

There are many ways to organise step methods into classes. One approach is to organise steps by user role, so that each step class represents a different type of user. However, different users may want to perform the same tasks.

A more flexible approach is to group step methods by business task. For example, we could have a PlanMyJourneySteps class, which includes methods related to different ways a user can plan a journey between two stations.

Using Serenity BDD, the PlanMyJourneySteps might look something like this:

public class PlanMyJourneySteps extends UIInteractionSteps {

    @Step("Plan my journey from {0} to {1}")
    public void between(String from, String to) {
        selectStation(FROM_STATION, from);
        selectStation(TO_STATION, to);
        clickOnPlanYourJourney();
    }

    @Step("Plan my journey from {0} to {1} on {2}")
    public void between(String from,
                        String to,
                        String departureDate) {
        selectStation(FROM_STATION, from);
        selectStation(TO_STATION, to);
        changeDepartureDateTo(departureDate);
        clickOnPlanYourJourney();
    }
    ...

Let’s look at the implementation of these methods more closely.

Lean Page Objects and Action classes

We saw earlier how it makes sense to keep our Page Objects clear of business logic, and have them focus on locating elements on the web page. I like to call these Lean Page Objects. It is the step methods that orchestrate the actual interactions with the page.

There are two ways we can write Lean Page Objects. One approach is to use @FindBy-annotated WebElement fields and to write wrapper methods which interact with these fields, like this:

public class ChooseStationsPageComponent extends PageObject {

    @FindBy(id="InputFrom")
    WebElement fromStation;

    @FindBy(id="InputTo")
    WebElement toStation;

    public static String STATION_SUGGESTION = 
      "//span[contains(@class,'tt-suggestion')][contains(.,'%s')]";

    public void selectFrom(String from) {
        fromStation.sendKeys(from);
    }

    public void selectTo(String to) {
        toStation.sendKeys(to);
    }

    public void clickOnStopInDropdownNamed(String station) {
        String locator =  String.format(STATION_SUGGESTION, 
                                        station);
        findBy(locator).click();
    }
}

The action class would declare the page object and call these methods:

public class ChooseStationSteps {

    ChooseStationsPageComponent chooseStations;

    @Step
    private void selectDepartureStation(String stationName) {
        chooseStations.selectFrom(stationName);
        chooseStations.clickOnStopInDropdownNamed(stationName);
    }

    @Step
    private void selectDestinationStation(String stationName) {
        chooseStations.selectTo(stationName);
        chooseStations.clickOnStopInDropdownNamed(stationName);
    }
}

The second approach is to have Page Objects that are responsible solely for locating web elements, and have the methods in the action classes manipulate these elements. So rather than exposing methods which manipulate web elements, the Page Objects expose only locators.

Using this approach, our Page Object could look like this:

public class PlanMyJourneyUI {
    public static By FROM_STATION = By.id("InputFrom");
    public static By TO_STATION = By.id("InputTo");

    private static String STATION_SUGGESTION = 
      "//span[contains(@class,'tt-suggestion')][contains(.,'%s')]";

    public static By stationSuggestionFor(String stationName) {
        return By.xpath(
           String.format(STATION_SUGGESTION, stationName)
        );
    }
}

And the UI Interaction step class might look like this:

public class ChooseStationSteps extends UIInteractionSteps {

    @Step
    private void selectDepartureStation(String stationName) {
        find(FROM_STATION).sendKeys(stationName);
        find(stationSuggestionFor(stationName)).click();
    }

    @Step
    private void selectDestinationStation(String stationName) {
        find(TO_STATION).sendKeys(stationName);
        find(stationSuggestionFor(stationName)).click();
    }
}

In Serenity BDD, the UIInteractionSteps class marks a class as a WebDriver-enabled action class, and gives the methods full access to the Serenity WebDriver API. The @Step annotation tells Serenity BDD to include this step (with corresponding screenshots) in the test report.

The two approaches are similar, but the second approach tends to result in leaner, more flexible code and a better separation of concerns.
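
One caveat applies to the STATION_SUGGESTION template in both approaches: the station name is placed between single quotes, so a name containing an apostrophe (such as "King's Cross") would produce an invalid XPath expression. XPath 1.0 has no escape character for quotes, so the usual workaround is to switch quote style or use concat(). The helper below is a sketch of that workaround; XPathLiterals and stationSuggestionXPath are hypothetical names and are not part of Serenity BDD or Selenium.

```java
// Hypothetical helper: builds an XPath 1.0 string literal for arbitrary text.
final class XPathLiterals {

    private XPathLiterals() {
    }

    static String literal(String text) {
        if (!text.contains("'")) {
            return "'" + text + "'";
        }
        if (!text.contains("\"")) {
            return "\"" + text + "\"";
        }
        // Both quote types present: join single-quoted pieces with a quoted apostrophe.
        StringBuilder parts = new StringBuilder("concat(");
        String[] pieces = text.split("'", -1);
        for (int i = 0; i < pieces.length; i++) {
            if (i > 0) {
                parts.append(", \"'\", ");
            }
            parts.append("'").append(pieces[i]).append("'");
        }
        return parts.append(")").toString();
    }

    // Same expression as the STATION_SUGGESTION template, with a safely quoted name.
    static String stationSuggestionXPath(String stationName) {
        return "//span[contains(@class,'tt-suggestion')][contains(.,"
                + literal(stationName) + ")]";
    }
}
```

The resulting string could then be passed to By.xpath() in place of the String.format() call.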

Separate actions and questions

So far our step methods have focused on doing things to the UI - clicking buttons, selecting values in dropdown lists, and so on. The other type of step method involves reading values from the UI (or from somewhere else) to check whether the application has done what we expect. You might call these methods “query” methods.

Some folks prefer to keep query methods in the same classes as the action methods. I like to put them in a separate class, as it makes the test code a little more readable. It also gives you a bit more flexibility - some query methods might use UI interactions, whereas others might query a REST end point or a database.

For example, on the result page of the TFL application, there is a short summary of the itinerary at the top of the page, which includes the departure and destination station:

Suppose that our test needs to check the From and To fields on this page. The Page Object where the locators for these fields are stored looks like this:

public class ResultsSummaryUI {
    public static By FROM = By.xpath("//div[contains(@class,'summary')][span[.='From:']]/span[2]");
    public static By TO = By.xpath("//div[contains(@class,'summary-row')][span[.='To:']]/span[2]");
}

We could keep the code that queries the web page in a JourneyResults question class, like this one:

public class JourneyResults extends UIInteractionSteps {

    public String departureStation() {
        return find(ResultsSummaryUI.FROM).getText();
    }

    public String destinationStation() {
        return find(ResultsSummaryUI.TO).getText();
    }
}

Finally, the test code would use the JourneyResults question class to check whether the departure and destination stations are correctly displayed.

    @Steps
    JourneyResults theJourneyResults;

    @Test
    public void see_trips_between_two_stations() {
        navigate.toTheTFLHomePage();
        planMyJourney.between("Paddington Underground Station",
                              "Liverpool Street");
        assertThat(theJourneyResults.departureStation())
                   .isEqualTo("Paddington Underground Station");
        assertThat(theJourneyResults.destinationStation())
                   .isEqualTo("Liverpool Street");
    }

Notice how the variable names vary between action classes (planMyJourney and navigate) and query classes (theJourneyResults). This happens a lot. Query methods and action methods are used in different contexts, and the test code generally reads better if they are kept in different classes.
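
Because questions live in their own classes, the source of an answer can change without the test noticing. The sketch below illustrates this with a hypothetical JourneyQuestions interface and two hand-written implementations. In a real suite one might read the page through WebDriver and the other might call a REST endpoint; here both are replaced by in-memory stubs, and the keys used are invented for the example.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical question contract: what the test wants to know, not how it is found.
interface JourneyQuestions {
    String departureStation();

    String destinationStation();
}

// Answers questions by reading element text (the lookup is stubbed as a function).
class UiJourneyQuestions implements JourneyQuestions {
    private final Function<String, String> textOfElement;

    UiJourneyQuestions(Function<String, String> textOfElement) {
        this.textOfElement = textOfElement;
    }

    @Override
    public String departureStation() {
        return textOfElement.apply("summary.from");
    }

    @Override
    public String destinationStation() {
        return textOfElement.apply("summary.to");
    }
}

// Answers the same questions from an API response (stubbed as a map of fields).
class ApiJourneyQuestions implements JourneyQuestions {
    private final Map<String, String> response;

    ApiJourneyQuestions(Map<String, String> response) {
        this.response = response;
    }

    @Override
    public String departureStation() {
        return response.get("from");
    }

    @Override
    public String destinationStation() {
        return response.get("to");
    }
}
```

A test that depends only on JourneyQuestions reads the same whichever implementation is supplied.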

Next Steps - consider adopting the Screenplay pattern

The techniques we have looked at in this article are often what happens naturally when we apply Object Oriented software development principles to test automation. We can take this approach further by modelling user actions and questions not as methods, but as objects in their own right. This more advanced approach makes it possible to write extremely expressive test automation code, using small, reusable domain-specific classes which act as building blocks for a more sophisticated test framework.

An example of a Screenplay test is shown below:

    @Test
    public void see_trips_between_two_stations() {
        Actor tim = Actor.named("Tim")
                     .whoCan(BrowseTheWeb.withDriver(driver));

        given(tim).wasAbleTo(Navigate.toTheTFLHomePage());

        when(tim).attemptsTo(
            PlanAJourney.from("Paddington Underground Station")
                        .to("Liverpool Street")
                        .travellingBy("Bus")
        );

        then(tim).should(
            seeThat(
                TheProposedJourney.stops(),
                contains("205 bus to Liverpool Street Station")
            )
        );
    }

Many of the concepts we discussed in this article, such as lean page objects and the separation of actions and questions, are also core concepts in Screenplay. You can learn more about the Screenplay pattern here.
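
To make the idea of actions and questions as objects more concrete, here is a deliberately small sketch in plain Java. It is not the Serenity Screenplay API: SketchActor, Performable and Answerable are hypothetical names, and the "application" is just an in-memory map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical: an action is an object that knows how to perform itself.
interface Performable {
    void performAs(SketchActor actor);
}

// Hypothetical: a question is an object that knows how to answer itself.
interface Answerable<T> {
    T answeredBy(SketchActor actor);
}

// The actor carries what it acts on; here, an in-memory stand-in for the application.
class SketchActor {
    final Map<String, String> app = new HashMap<>();

    void attemptsTo(Performable... tasks) {
        for (Performable task : tasks) {
            task.performAs(this);
        }
    }

    <T> T asksFor(Answerable<T> question) {
        return question.answeredBy(this);
    }
}

// A task object created through a readable factory method.
class PlanAJourneySketch implements Performable {
    private final String from;
    private final String to;

    private PlanAJourneySketch(String from, String to) {
        this.from = from;
        this.to = to;
    }

    static PlanAJourneySketch between(String from, String to) {
        return new PlanAJourneySketch(from, to);
    }

    @Override
    public void performAs(SketchActor actor) {
        actor.app.put("from", from);
        actor.app.put("to", to);
    }
}

// A question object that reads a value back from the application.
class TheDestinationSketch implements Answerable<String> {
    @Override
    public String answeredBy(SketchActor actor) {
        return actor.app.get("to");
    }
}
```

Because tasks and questions are ordinary objects, they can be composed, passed around and reused in ways that methods on a step class cannot.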

Conclusion

Left to their own devices, test automation frameworks built around Page Objects are prone to becoming unwieldy and hard to maintain. And this inevitably leads to test suites that are brittle and expensive to maintain.

But Page Objects don’t have to suck! In this article, we have discussed several simple techniques and approaches that you can use to organise and structure your Page Objects more effectively.

Using Lean Page Objects, which focus exclusively on locating elements on a page, leads to Page Object classes that are simpler and more focused. Action classes implement business-facing methods that manipulate the elements located by the page objects, which can be orchestrated together to form higher level business steps. And Question classes provide methods to query the system and make assertions about the results of a test.

Together, this gives us a simple but effective way to structure our test automation code and avoid many of the maintenance issues that come with traditional page objects.

Where to from here?

Writing better test automation code is great, but it's just one part of the puzzle. Teams often struggle with the bigger picture.

I've written up an approach to agile test automation that I've personally seen and used in dozens of projects, and that I can vouch for.

It's called "Flipped Testing", and you can read about how it works here:

Flipped Testing: A modern approach to building better agile test automation frameworks.

© 2019 John Ferguson Smart