Flutter and the practical test pyramid with the BLoC pattern.

Alexandre POICHET
9 min read · Dec 21, 2021


Introduction

Flutter is still considered a young framework, but testing is already well established in its ecosystem. Developers have several tools available to test their code, and it is worth knowing how these tools work together, along with a few interesting libraries that can help. So, technically, a Flutter developer has no good reason to skip testing.

However, setting up software testing is often painful. It is a big world, full of pitfalls, with different categories and sub-categories ranging from unit testing to user testing, by way of load and performance testing. How do you choose an efficient strategy and stick to it? Let me help you.

In this article we will restrict the scope to functional, automated tests: tests usually implemented by developers as part of a new feature, which we want to replay easily and continuously in order to detect regressions as early as possible in the development cycle.

Choose your strategy

Test Pyramid

When we develop a new feature, we study the different possible scenarios upfront, from the nominal case through the alternative cases to the error cases. These are the use cases we would like to find in our tests. But where? At what level? Directly in the user tests? As always, exhaustiveness has a price: if you want to automate all the use cases as end-to-end tests (i.e. user tests) in the development cycle, you need the corresponding resources to execute them in a reasonable time, and the bigger the project gets, the longer the tests take to run. So we must find a compromise, which is exactly what the test pyramid teaches us.

Test Pyramid: base with Unit Tests, intermediate with Integration Tests and the top with UI Tests. (figure 1 of 9)
  • Unit test

They form the base of the pyramid because they are numerous and quick to run. This is where we can cover the use cases exhaustively, but within a restricted perimeter in terms of interactions: a unit test has no intrinsic functional value. Concretely, we test a well-isolated business rule or a well-defined algorithm. Unit tests are considered white-box tests because they are low level; you need to know the code to write them.

  • Integration test

At the intermediate level, we can test calls between services or verify that components assemble correctly. We are not systematically looking for functional value, but checking that the interactions work as expected. Depending on the test, we can speak of a grey-box test, because we know part of the code but not necessarily all of it.

  • User interface test

At the top of the pyramid, we find the user interface tests, which are intended to exercise a feature from start to finish while examining the graphical part (positioning of elements, rendering of images, presence of properties, etc.). As a user, we do not know the details of the code behind it; we test the result of a feature, hence the name black-box test. Despite their usefulness, these tests are few in number, because they are costly in execution time and hard to maintain due to their broad coverage, so we will try to rationalize them.

State management with the BLoC pattern

Flutter is a declarative framework: it requires us to rebuild the view, or part of the view, according to a changing state.

UI = f(state): formula to render the UI from a changing state. (figure 2 of 9)

Depending on the size and functionality of your application, you may need to share this state across multiple views or, conversely, isolate a specific part of the application with a restricted state change.

And what about navigation? Often we just want to redirect the user to a new page rather than rebuild the current one, preserving a navigation history that is easier to maintain.

Diagram of state vs navigation: navigating takes you to a new screen with the same state, while rebuilding the same screen gives you a new state. (figure 3 of 9)

In the context of a complex application (one with a large number of features), it is difficult to define a maintainable and scalable architecture with the Flutter framework's tools alone. To help with the state management of your application, there is a plethora of libraries.

We will choose the BLoC library. This pattern fits the state/navigation scheme shown above perfectly. It also lets us cleanly separate the logic from the presentation, so we can categorize our tests more easily and prioritize them.

Applying the strategy

We will try to apply these principles with examples. Below is a small and very simplistic form for adding passengers:

Form to add a traveler: a simple demonstration app for adding a traveler, built with the Flutter framework and the BLoC pattern. (figure 4 of 9)
  • A form screen with field validation.
  • A Cubit to manage the display of the description text according to the drop-down list selection.
  • A BLoC to manage exchanges with the server.
  • A success screen confirming that the passenger was added.
  • A snack bar displayed when the server returns an error.
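As a concrete anchor for the tests that follow, here is a minimal sketch of what such a Cubit might look like. The names (`TravelerType`, `TravelerDescriptionCubit`) and the description strings are illustrative assumptions, not the article's actual code:

```dart
import 'package:flutter_bloc/flutter_bloc.dart';

// Hypothetical traveler typologies offered by the form's drop-down list.
enum TravelerType { adult, young, senior, child }

// A Cubit whose state is the description text matching the selected type.
class TravelerDescriptionCubit extends Cubit<String> {
  TravelerDescriptionCubit() : super('');

  void selectType(TravelerType type) {
    switch (type) {
      case TravelerType.adult:
        emit('Traveler between 26 and 59 years old');
        break;
      case TravelerType.young:
        emit('Traveler between 12 and 25 years old');
        break;
      case TravelerType.senior:
        emit('Traveler aged 60 or more');
        break;
      case TravelerType.child:
        emit('Traveler under 12 years old');
        break;
    }
  }
}
```

The logic lives entirely in the Cubit, with no widget involved, which is what makes it so cheap to test.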

Tests with the BLoC framework

Here is an example of a test confirming that the state returned by the traveler description Cubit depends on the type of traveler selected. In this unit test, we can cover the selection cases exhaustively.
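With the bloc_test package, such a unit test might look like the following sketch (`TravelerDescriptionCubit`, `TravelerType` and the expected string are assumed names standing in for the article's code):

```dart
import 'package:bloc_test/bloc_test.dart';
import 'package:flutter_test/flutter_test.dart';

void main() {
  // One blocTest per selection case gives exhaustive coverage cheaply.
  blocTest<TravelerDescriptionCubit, String>(
    'emits the young traveler description when "young" is selected',
    build: () => TravelerDescriptionCubit(),
    act: (cubit) => cubit.selectType(TravelerType.young),
    expect: () => ['Traveler between 12 and 25 years old'],
  );
}
```

`build` creates a fresh Cubit, `act` drives it, and `expect` asserts on the ordered list of emitted states.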

In the following example, we check that, when a new traveler is added, the BLoC emits the states corresponding to the answers returned by the repository:
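A sketch of such a test, mocking the repository with the mocktail package (the BLoC, its events/states, and `TravelerRepository` are hypothetical names; the states are assumed to implement value equality, e.g. via Equatable):

```dart
import 'package:bloc_test/bloc_test.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:mocktail/mocktail.dart';

class MockTravelerRepository extends Mock implements TravelerRepository {}

void main() {
  final traveler = Traveler(type: TravelerType.young, firstName: 'Alice');

  blocTest<AddTravelerBloc, AddTravelerState>(
    'emits [Loading, Success] when the repository call succeeds',
    build: () {
      final repository = MockTravelerRepository();
      // Stub the server exchange so the test stays fast and deterministic.
      when(() => repository.addTraveler(traveler))
          .thenAnswer((_) async {});
      return AddTravelerBloc(repository: repository);
    },
    act: (bloc) => bloc.add(AddTravelerSubmitted(traveler)),
    expect: () => [AddTravelerLoading(), AddTravelerSuccess()],
  );
}
```

A symmetrical test stubbing a thrown exception would cover the error state emitted for the snack bar.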

Widget test and navigation test

We can test the rendering of the available states with widget tests: we simulate an input state and check the behavior of the resulting widgets.
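A widget test along those lines might look like this sketch, where we inject a prepared Cubit and assert on the rendered output (`TravelerDescriptionCubit`, `TravelerDescriptionText` and the expected string are assumed names):

```dart
import 'package:flutter/material.dart';
import 'package:flutter_bloc/flutter_bloc.dart';
import 'package:flutter_test/flutter_test.dart';

void main() {
  testWidgets('shows the description matching the simulated state',
      (tester) async {
    // Put the Cubit in a known input state before pumping the widget.
    final cubit = TravelerDescriptionCubit()..selectType(TravelerType.young);

    await tester.pumpWidget(
      MaterialApp(
        home: BlocProvider.value(
          value: cubit,
          child: const TravelerDescriptionText(),
        ),
      ),
    );

    // Check the behavior of the output widgets.
    expect(find.text('Traveler between 12 and 25 years old'),
        findsOneWidget);
  });
}
```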

We can also perform navigation tests with a widget test. At the end of the test, we check that the initial screen is no longer in the tree and that the redirection to the new screen has taken place.
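For instance, a navigation test could be sketched as follows (`AddTravelerFormScreen` and `SuccessScreen` are assumed names, and the sketch assumes the tapped button triggers the navigation without a real server call):

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

void main() {
  testWidgets('navigates from the form to the success screen',
      (tester) async {
    await tester.pumpWidget(
      const MaterialApp(home: AddTravelerFormScreen()),
    );

    await tester.tap(find.byType(ElevatedButton));
    await tester.pumpAndSettle(); // let the route transition finish

    // The initial screen has left the tree; the new screen is displayed.
    expect(find.byType(AddTravelerFormScreen), findsNothing);
    expect(find.byType(SuccessScreen), findsOneWidget);
  });
}
```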

GUI tests thanks to golden tests

Golden tests come from the golden_toolkit library and compare a widget's rendering, pixel by pixel, against a reference capture in PNG format.

These tests are very useful but costly in execution time, so we will try to rationalize them! They can check that the graphic charter is respected, and they can also check accessibility.

In this example, we get captures of the main screen.
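A golden test of that kind might be sketched as follows (`AddTravelerFormScreen` and the golden file name are assumptions; golden_toolkit also expects real fonts to be loaded, typically via `loadAppFonts()` in `flutter_test_config.dart`):

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:golden_toolkit/golden_toolkit.dart';

void main() {
  testGoldens('add traveler form matches its reference capture',
      (tester) async {
    await tester.pumpWidgetBuilder(
      const AddTravelerFormScreen(),
      surfaceSize: const Size(400, 800), // fixed size for stable pixels
    );

    // Compares the rendering pixel by pixel against
    // goldens/add_traveler_form.png (create it with --update-goldens).
    await screenMatchesGolden(tester, 'add_traveler_form');
  });
}
```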

Golden test screen captures: from left to right, standard screen / double-zoom screen / semantics screen. (figure 5 of 9)

Then, what about the many intermediate states? A form, for example, has many possible combinations. To check that the view adapts correctly, should we write a golden test for each variation? No: for these cases, it is better to use the widget tests we saw before.

Go further with user tests

At the end of the chain, we can complete the picture with end-to-end user tests. This time there is no state simulation: the logic and presentation layers are tested together. We reproduce the user's gestures in test code and confirm that the result matches what is expected. Running these tests often requires a dedicated environment, and they unfortunately take a lot of time.

There are many tools available to perform this type of test, including:

  • The flutter_driver library and the so-called integration tests with a dedicated environment, described in the official Flutter documentation.
  • Android/iOS test-automation software such as Appium. The advantage here is that you can combine the execution with a test-writing tool like Cucumber to build and validate your tests in the Gherkin language. You can find examples in this article by a colleague: https://medium.com/@maxime.pontoire/automated-tests-of-a-flutter-application-3d878e9d8a61
  • Firebase Test Lab, which can perform user testing on real devices, without code, using a robot.

Since these are the tests that take the most time and resources to run, it is sometimes useful to modulate their number. During a delivery phase, for instance, you can run more tests to cover a larger scope and gain confidence before going to production.

Below is a complete example of an end-to-end test with flutter_driver.

We chose to automate the test case for adding a young traveler because it is the most important feature of our application.
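A sketch of what the driver-side file might look like (all keys, labels and screen names are assumptions; the app side is assumed to be an instrumented entry point in `test_driver/app.dart` calling `enableFlutterDriverExtension()`):

```dart
// test_driver/app_test.dart -- runs against an instrumented build of
// the app, reproducing the user's gestures on the real UI.
import 'package:flutter_driver/flutter_driver.dart';
import 'package:test/test.dart';

void main() {
  late FlutterDriver driver;

  setUpAll(() async => driver = await FlutterDriver.connect());
  tearDownAll(() async => driver.close());

  test('adds a young traveler end to end', () async {
    await driver.tap(find.byValueKey('traveler_type_dropdown'));
    await driver.tap(find.text('Young'));
    await driver.tap(find.byValueKey('name_field'));
    await driver.enterText('Alice');
    await driver.tap(find.byValueKey('submit_button'));

    // Reaching the success screen validates the whole chain, UI included.
    await driver.waitFor(find.byValueKey('success_screen'));
  });
}
```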

Strategy Results

Code coverage

Code coverage results: table report with excellent code coverage. (figure 6 of 9)

We can see that we have excellent code coverage, even if it is only an indicator.

Execution time

Execution speed naturally varies from one machine to another, so it is difficult to provide a rigorous study. Nevertheless, I tried to design a credible experiment to obtain the results presented in the graph below.

Graphical result of tests execution time: comparison of test execution time for End to End / Golden / Widget / BLoC. (figure 7 of 9)

These tests were performed on a MacBook Pro M1 using a script: we run the whole test suite ten times in a row, in random order, and average the results. We do not count the time needed to set up the environment required to run the end-to-end tests (estimated at 30 seconds).
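The measurement script itself is not shown; a minimal sketch of what it could look like (the command under test and the run count are assumptions, and runs are simply repeated rather than shuffled):

```shell
#!/bin/sh
# Run a test command N times and print the average wall-clock time.
# CMD is an assumption: point it at e.g. "flutter test test/bloc".
N=10
CMD="${CMD:-true}"
total=0
i=0
while [ "$i" -lt "$N" ]; do
  start=$(date +%s)
  $CMD > /dev/null 2>&1
  end=$(date +%s)
  total=$((total + end - start))
  i=$((i + 1))
done
avg=$((total / N))
echo "average: ${avg}s over ${N} runs"
```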

Remarks:

  • On average, the end-to-end test takes more time to execute than a golden test.
  • Widget tests save a lot of time compared to golden tests.
  • The execution time of the BLoC tests is low compared to the others.

We arrive at a total execution time of 8.5 seconds. However, we could go much lower by running tests in parallel.

Now let's assume we only had end-to-end tests to cover all our cases. We have four nominal cases, one per passenger typology, plus some error cases to cover. We can estimate the execution time at more than 15 seconds, and that without a single golden test to check the graphical part!

In this minimalist example, the gain is measured in seconds. On a large project, we quickly talk in minutes, and depending on the number of executions per day, we can save several hours…

Conclusion

Separating logic from presentation allowed us to isolate categories of tests and to run them in parallel in the development cycle, i.e. in continuous integration.

But if we stopped there, there would be significant gaps in our testing strategy. We need to automate use cases that cross these two layers and reproduce user behavior as closely as possible.

Test pyramid adapted to Flutter: BLoC tests at the base, widget and golden tests in the middle, and end-to-end tests at the top. (figure 8 of 9)

To summarize, we can see that we have succeeded in reproducing and automating a test strategy that approaches the test pyramid.

We start from the base with fast tests and, consequently, the possibility of covering the cases exhaustively. This is a good thing, because this is where we isolated the logic, which frequently corresponds to the business part of the application.

In the intermediate part, we test the graphical response to the different available states thanks to the golden tests. Widget tests, in turn, let us cover all the intermediate cases while avoiding a golden test, which is costly in execution time, without forgetting the navigation tests.

Finally, we carefully choose which end-to-end user tests we want to include in our continuous integration. To help with the selection, we can use Pareto’s law, taking the 20% of cases that produce 80% of the effects.

Here is a decision tree to help you build your Flutter tests with the BLoC pattern.

Decision diagram: a decision tree for applying tests in Flutter with the BLoC pattern. (figure 9 of 9)
