Automated Tests of a Flutter application

Maxime Pontoire
11 min read · Oct 26, 2022


SNCF Connect developers are loving Flutter

Introduction

Many applications now use the Flutter framework to be available on iOS and Android (with a single code base). Over the last two years, the community has made the framework stable and has provided a lot of tools to test the code and the application. Within the same period, we started to develop SNCF Connect, an application now used by millions of people in France. Our ambition was to move to “continuous delivery” by using state-of-the-art solutions and automated tests.

If you have already built an application, you know how complicated it can be to test it each time you add a feature. You can write unit tests for the code and “golden” tests to check changes in the UI, but these tests are static and do not validate the feature and the navigation in the application.

To solve this issue, the pyramid of tests (check this article by Alexandre Poichet about the pyramid of tests with Flutter) provides “integration tests” and “UI tests”. But if you have already developed these tests on native applications, you know how difficult it can be to maintain them, and to trust their results when testing a complex application.

This article will focus on how we managed to develop, maintain and use automated tests in the continuous delivery of SNCF Connect. SNCF Connect is a new digital service, inheriting from OUI.sncf, that can be used to plan a trip door-to-door.

A bit of history

Before SNCF Connect were OUI.sncf and L’Assistant, two applications to book trains, search for a door-to-door itinerary, check traffic information, buy services… all over Europe. These applications were developed for the App Store and the Google Play Store, using the native framework of each OS.

UI tests were also developed using the native UI testing frameworks provided by each OS (XCTest for iOS, Espresso for Android). This meant coding, testing and maintaining tests on each platform. Other frameworks like Appium were available, but not for testing an iOS app.

Moreover, these frameworks were not stable: you couldn’t be sure of getting the same results each time you ran a test. Why? Because the more complex the route your app has to test, the more you face UI animations, network calls… and native frameworks do not work well on complex applications.

On OUI.sncf, we disabled UI animations and “mocked” as much as we could, but on iOS, the XCTest framework failed inconsistently. In that case, testers or CI jobs had to retry the test, so as to be sure of the failure (and avoid false negatives).

Choosing the right pill

With SNCF Connect, we tested several frameworks for our Flutter app. The requirements were:

  • Stability: results should be reliable (if a test fails, the failure should be the same on every execution)
  • Integration: tests had to run both on a local machine and in CI
  • Reporting: generated reports should be understandable by a developer or a tester
  • Intuitive: developing tests should be easy to start and to maintain

Bonus: since we are building both an iOS and an Android application, it would be great to have a single code base to test both platforms.

Since we already had a lot of experience with native testing frameworks, we knew that they couldn’t fulfill our requirement of stability.

We tried several solutions and moved to Appium, but not the Flutter version of the SDK. At the time we started SNCF Connect, the Appium Flutter Driver was still in beta and was lacking features.

In the meantime, we decided how we wanted to write and organize our tests. On OUI.sncf, we had tested Gherkin and Cucumber to describe the way we wanted to test our backend (more about Gherkin/Cucumber). This language makes tests easier to understand, for developers and testers alike. In addition, when writing a Gherkin scenario describing a functionality step by step, we were also writing the specifications of our application (and website too). On SNCF Connect, we use it to describe the steps of our tests, and a test can be reused on web, mobile and the BFF (Backend For Frontend), when relevant.

So our final decision was clear: Appium + Cucumber, in Kotlin. Kotlin was the right language to use, since our backend (more precisely our BFF) was written in Kotlin. Moreover, Kotlin was already known by our Android developers, and is not that far from Swift, well known by our iOS developers.

Assembling Lego

For those who have never implemented a test written in Gherkin, Cucumber makes things really easy. Cucumber reads each step (a step describes an interaction a user has with the application) of a .feature file.

Gherkin Sample in one of our .feature files
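To give an idea, such a scenario, using the step matched by the glue below, could read like this (a hypothetical reconstruction, not the actual sample):

# Reconstruction sketch, not the real .feature file; the zone name is invented
Fonctionnalité: Info trafic

  Scénario: Consulter l'info trafic d'une zone
    Lorsque l'utilisateur accède à l'info trafic de la zone "Gares parisiennes"
    Alors l'écran d'info trafic de la zone s'affiche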

Then, Cucumber tries to find the “glue” matching it and runs the code linked to it. A glue is either the exact sentence to match, or a regex:

import io.cucumber.java.fr.Lorsque

class WhenSteps {

    @Lorsque("(?i)^l'utilisateur accède à l'info trafic de la zone \"(.+)\"$")
    fun selectZone(zoneName: String) {
        TrafficInfoScreen.selectZone(zoneName)
    }
}

Basically, the glue here is a regex matching the start of the sentence, and it uses “(.+)” to extract the parameter passed to the linked method. If you look at what the code does, this method is only a call to an action on a screen.

Why? Because we’re using the Page Object Pattern to structure the code. We had already used this pattern on OUI.sncf, and it is still perfectly suited to Gherkin and Cucumber.

This pattern is powerful, since each screen (or part of a screen) is described in its own class, with a method for each action/interaction available in it. Each action returns the Page the app is moving to.

In our test module, we built an abstract class with all the required methods:

abstract class PageScreen {

    private val logger = KotlinLogging.logger {}
    private val modalErrorByElement = ByLocalizedTextAndClass("genericErrorTitle", Locators.viewClassName)
    private val backButtonByElement =
        ByLocalizedTextAndClass("accessibility_back_button", Locators.buttonClassName)

    abstract fun isScreenDisplayed(): Boolean

    fun activity(msg: String) {
        logger.info {
            this::class.toString().replace(packageName, "").replace("class", "").trim() + " => " + msg
        }
    }

    fun logError(msg: String) {
        logger.error {
            this::class.toString().replace(packageName, "").replace("class", "").trim() + " => " + msg
        }
    }

    fun checkIfErrorIsDisplayed(): PageScreen {
        if (doesScreenContainElement(modalErrorByElement, 15)) {
            logError("Une erreur de type inconnu s'est produite")
            fail("Une erreur de type inconnu s'est produite")
        }
        return this
    }

    fun goBackToPreviousScreen(): PageScreen {
        activity("Retour vers l'écran précédent")
        Getters.getElementBy(backButtonByElement).click()
        return this
    }
}

As you can see, this class provides helpers used to navigate in the application, or to log the Appium commands sent with the driver to the emulator/device (which covers our Reporting requirement).

But you may have noticed:

abstract fun isScreenDisplayed(): Boolean

Implemented in each class inheriting from PageScreen, this method is mandatory to make sure that we are on the right screen before moving to the next action.
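For illustration, here is a minimal sketch of what a concrete page object like the TrafficInfoScreen called in the glue above could look like (the selectors and the selection logic are assumptions, not the actual code):

// Hypothetical page object: each screen implements isScreenDisplayed()
// with a selector that only exists on that screen.
object TrafficInfoScreen : PageScreen() {

    private val zonesTitleByElement =
        ByLocalizedTextAndClass("trafficInfo.zones.title", Locators.viewClassName)

    override fun isScreenDisplayed(): Boolean =
        doesScreenContainElement(zonesTitleByElement, 15)

    fun selectZone(zoneName: String): TrafficInfoScreen {
        activity("Sélection de la zone $zoneName")
        // The zone name comes straight from the Gherkin step, so match the raw text
        Getters.getElementBy(ByTextAndClass(zoneName, Locators.viewClassName)).click()
        return this
    }
}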

The way we build things with the Page Object Pattern and Gherkin/Cucumber also makes our code base very simple to develop and maintain: coding a step makes it available for all gherkins using it. Changing the code inside it (to match a change in the application) also updates it for all the tests. And if you consider making a bigger change in the application (like adding/removing a screen), all you need to do is update the steps to navigate to the new screen, code the new interactions and re-chain the previous calls if necessary. All of this covers our Intuitive requirement.

Diving into the code

As explained earlier, we chose Appium to interact with our application. Appium provides helpers to select, scroll and click in the application, like a real user would do.

We have developed our own methods based on Appium:

/**
 * Returns an element of the screen, using a By selector.
 *
 * @param selector way to select the object.
 * @param secondsToWait timeout, maximum number of seconds to wait.
 * @param maxNumberOfSwipes maximum number of swipes while scrolling to the element.
 *
 * @return MobileElement matching the selector.
 */
fun getElementBy(selector: By, secondsToWait: Long = SECONDS_TO_WAIT, maxNumberOfSwipes: Int = 5): MobileElement {
    logger.debug { "$selector, timeout : $secondsToWait" }
    scrollTo(selector, maxNumberOfSwipes = maxNumberOfSwipes)
    waitForVisibility(selector, secondsToWait)
    return driver.findElement(selector) ?: error("Elément $selector non trouvé !")
}

This method is used to query the DOM of the application to find a matching element. If no element is found, an error (including the queried selector) is logged in the report. The cause could be either a badly coded selector, or a real error in the application.
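The PageScreen class shown earlier also relies on doesScreenContainElement, which this article doesn’t detail. A plausible sketch of this non-failing counterpart, built on the standard Selenium wait API underneath Appium (an assumption, not the actual implementation):

import java.time.Duration
import org.openqa.selenium.By
import org.openqa.selenium.TimeoutException
import org.openqa.selenium.support.ui.ExpectedConditions
import org.openqa.selenium.support.ui.WebDriverWait

// Sketch: return false instead of failing when nothing matches within the timeout.
fun doesScreenContainElement(selector: By, secondsToWait: Long): Boolean =
    try {
        WebDriverWait(driver, Duration.ofSeconds(secondsToWait))
            .until(ExpectedConditions.presenceOfElementLocated(selector))
        true
    } catch (e: TimeoutException) {
        false
    }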

Let’s see getElementBy in action:

private val stationsBoardBigLinesBnByElement = ByLocalizedTextAndClass(
    "stationBoards.big.lines.label",
    viewClassName
)

fun switchToBigLinesTab() {
    activity("Ouverture de l'onglet Grandes lignes")
    getElementBy(stationsBoardBigLinesBnByElement).click()
}

This is basically how a glue works: first log the action about to be run, then use the helper to find an element and click on it.

Regarding the selector:

private val stationsBoardBigLinesBnByElement = ByLocalizedTextAndClass(
    "stationBoards.big.lines.label",
    viewClassName
)

We developed our own selector class:

class ByLocalizedTextAndClass(
    accessibilityId: String,
    clazz: ByClassName,
    caseSensitive: Boolean = false,
    withoutAccent: Boolean = false,
    input: Map<String, String> = emptyMap()
) : ByTextAndClass(
    accessibilityId.localize(input),
    clazz,
    caseSensitive = caseSensitive,
    withoutAccent = withoutAccent
)

Two parameters are required:

  • the accessibilityId (the string/ID we’re looking for on the screen): searching for it in the DOM is the only way we can find an object, since “real” IDs change
  • the className (the type of object we’re looking for): this speeds up the search by filtering first on the type, and then looking for the accessibilityId.

The className provided is an alias for the real object type, depending on the platform running the test:

val viewClassName: ByClassName = when (platform) {
    ANDROID -> ByClassName("android.view.View")
    IOS -> ByClassName("XCUIElementTypeStaticText")
}

With these aliases, we only need to use the right className to find the object, no matter which OS is running the application. Thus, we develop a single test for both iOS and Android at the same time. This covers our Bonus requirement.

About the other optional parameters:

  • caseSensitive: whether the search should match case
  • withoutAccent: whether the search should ignore accents
  • input: to replace parameter placeholders inside the string

You may have noticed the term Localized in the class name. This is because we use localization for our tests: since finding an object means finding its accessibilityId (i.e. its text), it’s easier to code a selector with a non-changing key. Even if the translated text evolves, no change in the code is needed (again, better stability of the test and simplicity of development). Plus, a test developed this way can be used to test any supported language.
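To give an idea, here is a minimal sketch of what such a localize extension could be (an assumption; the real implementation resolves keys from the app’s translation files):

// Sketch, not the actual SNCF Connect code: resolve a localization key to its
// translated text, then substitute named parameters from the "input" map.
// "translations" stands for whatever structure holds the current locale's strings;
// the second entry is an invented key, just to show parameter substitution.
val translations: Map<String, String> = mapOf(
    "stationBoards.big.lines.label" to "Grandes lignes",
    "trafficInfo.zone.label" to "Info trafic {zone}"
)

fun String.localize(input: Map<String, String> = emptyMap()): String {
    var text = translations[this] ?: this // fall back to the key itself
    input.forEach { (param, value) -> text = text.replace("{$param}", value) }
    return text
}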

Running a test

Once a test and its steps have been developed, we need to execute it: first, to see if the test runs like it should, then to check that the development satisfies the Gherkin and the functionality.

Using Appium for Kotlin on SNCF Connect enabled us to launch a test as a Gradle task. In the test project, we defined run configurations for the IDEs (IntelliJ, VSCode) used by our developers, to launch a specific test or a full set.

How? By using annotations:

@mobile @sdk @web
@IVTS-XXXXX @test_module

We defined annotations describing on which platforms a test can run, since, as I said earlier, we chose to use the same Gherkin no matter the OS.

Then, even if the Gherkin scenarios in our feature files are classified by directories matching a module, we used the @IVTS-XXXXX annotation as a test ID: IVTS was the initial code name of the project, followed by the number of the Jira entry where you can find more details about the test. We also developed a @module_name annotation to describe which module the test is about.

That way, when testing a specific feature, we can use these tags to launch either a specific test, or all the tests impacted by the new developments.
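For instance, the Gradle side can forward a project property to Cucumber’s standard cucumber.filter.tags option (a sketch assuming a Cucumber-JVM runner wired to the Gradle test task; the property name is arbitrary):

// build.gradle.kts — sketch, not our actual build script: forward a "tags"
// property to Cucumber's tag filter, so a single test (@IVTS-XXXXX) or a
// whole module (@test_module) can be launched.
tasks.test {
    systemProperty(
        "cucumber.filter.tags",
        project.findProperty("tags")?.toString() ?: "@mobile"
    )
}

A developer (or a shell script) could then run something like ./gradlew test -Ptags="@mobile and @test_module".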

This execution can be run in several ways: with a launch configuration in an IDE, as explained before, or using a shell script with a Gradle task in it.

That script is the same one running on our Continuous Integration, based on gitlab-ci (which covers our Integration requirement).

Automated tests in Continuous Integration

I won’t focus on how we build/deploy the website, the application, or the BFF on our CI.

SNCF Connect is based on a mono-repo: all the code of the application, the website, the BFF, Lambdas (serverless), tests and other assets (even Infra-as-Code) lives in the same Git repository, shared by all the developers.

This implies that each time a change is made, a Merge Request is opened, and we have to detect failures or regressions as soon as possible.

To do that, several test jobs run in a “merge” pipeline: unit tests, golden tests, BFF tests and, of course, automated tests of the application and the website. We can lower the number of tests if the merge request only impacts a specific platform (e.g. not running automated tests on the website if we’re only modifying the application).

Automated test jobs

As you can see in this screenshot of a pipeline, several jobs (some of them blocking) run in parallel.

These tests validate several layers of the front end and the BFF; that’s why we call them “end-to-end” tests, even if we mock some external APIs.

During the preparation step of the pipeline:

  • we build the test code for each platform
  • we deploy a built application to Saucelabs, our provider of real devices (N.B. we can easily change this provider and replace it with another service based on the Appium driver)

During the test execution step of the pipeline:

  • we test the BFF, to be sure that all of its APIs are still valid
  • we test the application/website with a mocked automated test; this covers the most common use case, with high value for our business.

In the next few weeks, we will add more tests to our CI job, with our top use cases, to detect potential regressions sooner.

Conclusion (and next steps)

Since the beginning of the development of SNCF Connect, we have made a lot of decisions regarding our automated tests, and have challenged them all along.

Regarding Appium, one great thing about it is that it’s very stable: if a test fails, it will fail the same way each time you run it (with the same mocked responses from the BFF), whether on CI or on a local machine (which fulfills our Stability requirement).

Using the Page Object pattern makes development and refinement easier. The stability of Appium and its ability to run tests on either iOS or Android also improve the reliability of the results on each platform, together with our customized reporting.

These tests have helped us detect potential regressions in the app (and on the website too, with its dedicated automated tests) and gaps in our CI architecture, before moving SNCF Connect to production and making it available to our millions of users.

Even if it took several months to build a large library of implemented steps, coding new ones is really fast now. So is onboarding new developers and testers, since tests are written in their native language, and implementing new steps means using the stable helpers we developed on top of Appium.

What’s next?

We’re currently releasing a new version of the application, the website and the BFF each week (including features, not only fixes). Releasing at this pace is mandatory for us, to continuously deliver value to our customers (with a better “time-to-market” than with OUI.sncf). But this implies a high workload for the functional testers. Since our Jira is connected to XRay, we plan to launch XRay executions directly from our Continuous Integration platform, to test modules of the application or the website by using their tags.

Stay tuned to see updates about how we automated the full non-regression test process of the SNCF Connect application.

And you? Have you ever faced these problems? What were your solutions? Feel free to contact us, or join us to work on this: we are looking for testers/automation specialists, with open vacancies at the moment 😉
