Mobile App Complexity: Things to Consider While Estimating Test Effort

Feyza Dayan
Published in Trendyol Tech
May 11, 2021
Image on cadabra.studio

“Prediction is not certain but gives direction.”

Before discussing the situations that affect our tests when testing mobile applications, let’s talk about the strategy we use to determine our scores. There are many estimation techniques. Since we work in an Agile way, I would like to focus on a few techniques that are common in Agile teams.

T-shirt Size Estimation: In this method, issues are grouped according to their size. Then, each issue is given an estimate of XS, S, M, L, or XL.

Image by Ashish Dhawan on netsolutions

Affinity Grouping Estimation: In this method, the first issue is read to the team members and placed on the wall. The second issue is read and the team is asked whether it is smaller or larger than the first one; its placement on the wall follows the team’s answer (bigger on the right, smaller on the left). In this way, all issues are grouped and, after affinity grouping is complete, unit values such as points can be assigned. The leftmost group is given a value of 1 point, the second group 2 points, the third 3 points, the fourth 5 points, and the rightmost 8 points.
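The final step of affinity grouping, turning wall positions into points, can be sketched in a few lines of Python. This is only an illustration: the input shape (a list of groups ordered left to right) and the issue titles are assumptions made for the example.

```python
# Point values assigned to the groups on the wall, from leftmost (smallest)
# to rightmost (largest), as described above.
GROUP_POINTS = [1, 2, 3, 5, 8]


def assign_points(groups_left_to_right):
    """Map every issue to the point value of the group it was placed in.

    `groups_left_to_right` is a list of groups; each group is a list of
    issue titles. This input shape is an assumption made for the sketch.
    """
    if len(groups_left_to_right) > len(GROUP_POINTS):
        raise ValueError("more groups than available point values")
    points = {}
    for value, group in zip(GROUP_POINTS, groups_left_to_right):
        for issue in group:
            points[issue] = value
    return points


# Hypothetical wall after the team has finished grouping.
wall = [
    ["Change button text"],
    ["Add deep link to basket"],
    ["New filter option", "Login with Google"],
    ["Redesign product page"],
]
print(assign_points(wall))
```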

Image on chrissterling

Planning Poker Estimation: In this method, estimates are made using the values of the Fibonacci sequence. Each issue is given a value such as 1 point or 2 points according to its size.
Fibonacci sequence: 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, …

Image on agilestationery

The estimation technique we apply while testing our mobile apps at Trendyol is planning poker, and we make our evaluations with it. So how do we decide which number in the Fibonacci sequence a test score corresponds to? What do we consider to make the best estimate?

In addition to our functional tests, some of the issues to be developed bring situations that affect our tests. So what are these situations, and how did we divide them? Let’s go through them one heading at a time.

One note first: we score the least complex issue as 1 and the most complex issue as 8, because the largest issue we can complete in one sprint is 8 points. The parameters we pay attention to are explained below.
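To make the scale concrete, here is a minimal Python sketch that generates the planning poker values up to a cap. The function name and the `SPRINT_CAP` constant are made up for the illustration; only the values themselves and the 8-point limit come from the text.

```python
def planning_poker_scale(max_points):
    """Return the Fibonacci-style values (1, 2, 3, 5, 8, ...) up to max_points."""
    scale = []
    a, b = 1, 2
    while a <= max_points:
        scale.append(a)
        a, b = b, a + b
    return scale


SPRINT_CAP = 8  # largest score used for a single issue, as stated above
print(planning_poker_scale(SPRINT_CAP))  # [1, 2, 3, 5, 8]
```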

Story with A/B test

Image on waracle

At Trendyol, we attach great importance to user feedback and we use the A/B test technique in many of our features. Let me briefly explain the technique with an example. Suppose we plan a development called “A/B Testing of the Add to Cart Button Color”: 50 percent of our users see the button in green and 50 percent see it in orange. Since this development includes more than one variant, it automatically increases our test score: besides the development itself, each variant has to be tested under A/B. This includes configs, DB connections, cross tests, etc. So we say it can be 3 or 5 points, depending on the complexity of the issue.

After our feature goes live with A/B, the versions are observed and measured for a while, and the version our users like the most is rolled out to 100% of users. The code of the other version is deleted in a later sprint.
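The 50/50 split in the button color example can be pictured with a small Python sketch. It uses deterministic hashing, which is one common way to assign users to variants; it is an assumption for illustration and not a description of the tooling used at Trendyol. The experiment name and user ids are hypothetical.

```python
import hashlib

VARIANTS = ["green", "orange"]  # the two button colors from the example


def assign_variant(user_id, experiment="add_to_cart_button_color"):
    """Deterministically place a user in one of two equally sized buckets.

    Hashing the experiment name together with the user id keeps the
    assignment stable across sessions. This is an illustrative approach,
    not a description of any particular A/B testing tool.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = digest[0] % len(VARIANTS)
    return VARIANTS[bucket]


# Both variants must be covered by the tests, so list which sample users
# (hypothetical ids) fall into which bucket.
for uid in ["user-1", "user-2", "user-3", "user-4"]:
    print(uid, assign_variant(uid))
```

For the tester, the practical point is that every variant needs its own test pass, which is why the score grows with the number of variants.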

Just UI changes (text, color, font)

In some of our stories, the desired improvement is only a UI change. By UI change, I mean a text change, a color change, or just a font change. Since this does not increase the complexity of the test, we say it can only be 1 point.

We test everything, even if it’s just a text, color, or font change. In addition to these, we also do User Acceptance Tests.

Marketing event tests

We measure our marketing events with more than one tool, such as Firebase and Google Analytics. When the product owner brings a priority job from the backlog to the planning meeting, he or she can say that the events of this issue also need to be developed. In that case, marketing tests should be carried out as well as the functional tests of the development.

Event example of an issue with A/B test
Test result example

This situation increases test complexity, so we give a higher score than we normally would. For example, if a job requires an effort of 2 points without marketing tests, we score it as 3 points once marketing tests are added.

Replatforming / Redesigning

At Trendyol, we offer a lot of new features to our users. A new feature can be small, or it can change an entire page. The product owner may want to completely change the design and functions of a page and measure the result. In this case, all the cases of the page must be tested again, and the changed functions must be tested separately. When these issues are also A/B jobs, the test complexity grows even more. Sometimes we only make performance-enhancing changes within the scope of replatforming.

Image by Kuba Lang on bpol.net

In that case, rather than any design or functional change, we aim for the user to take action faster while navigating the page. This again affects the score we give for the test effort, and we evaluate it accordingly. As replatforming issues are large, we usually give a score of 5 or 8.

Technical issues

These improvements do not come from the product owner’s backlog but from the technical backlog. Since we are building big things, the quality of the code should not deteriorate. We do technical work so that new code can be added more easily and quickly. These issues include Application Theme, Style, Modularization, Refactoring, Performance & Scalability, CI/CD processes, and so on.

In the test estimation of these developments, we proceed according to the feedback we receive from the developer. The changes made and the affected pages are written on the job as a comment, and the score of the job varies accordingly. Depending on complexity, we usually estimate 3 or 5 points.

Adding deep link

Some of the requested developments also need page redirection tests: the desired page is expected to open through a deep link. We can give 2 points, because issues that require a deep link do not increase the test complexity much.

In the above image, a deep link that leads to the basket page is tested. The URL is entered and, when the redirection happens, the basket page is expected to open.

We write the required URL and perform the redirection test from the Notes app on iOS, and from a browser or any related app on Android.
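The expectation behind such a test (a given URL should resolve to a given page) can be written down as a small Python sketch. The `exampleapp://` scheme, the routing table, and the screen names are all hypothetical; the real deep link format of the app is not given in this article.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical routing table: deep link host -> screen the app should open.
ROUTES = {
    "basket": "BasketScreen",
    "product": "ProductDetailScreen",
}


def resolve_deeplink(url, expected_scheme="exampleapp"):
    """Return (screen, query parameters) for a deep link, or None if unknown."""
    parsed = urlparse(url)
    if parsed.scheme != expected_scheme:
        return None
    screen = ROUTES.get(parsed.netloc)
    if screen is None:
        return None
    return screen, parse_qs(parsed.query)


# The check mirrors the manual test: open the URL, expect the basket page.
print(resolve_deeplink("exampleapp://basket"))
print(resolve_deeplink("exampleapp://unknown-page"))
```

A table like `ROUTES` also doubles as a checklist of the deep links that need a redirection test.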

Working with a 3rd party

In our application, some improvements require third-party tools. Some of these are login with Google, login with Facebook, push notification tools, etc. Test complexity increases a little more when development is needed in these areas, because we may need to get in touch with people outside of our team, and that’s an extra step.

Even if we can communicate comfortably, there is a possibility that problems we did not foresee will arise: tools not working, third-party test environments being intermittently unavailable, etc.

Image by Lauren Johnson on adweek

We raise the test score to the next value in the Fibonacci sequence, because test complexity increases a little more in work that depends not only on third parties but also on other teams within the company. For example, we give 3 points to an improvement that would otherwise be 2, and 5 points to one that would otherwise be 3.
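The “move to the next value” rule is easy to express in Python. This is a minimal sketch: the function name is made up, and keeping an 8 at 8 is an assumption based on the sprint cap, since the text does not say what happens at the top of the scale.

```python
SCALE = [1, 2, 3, 5, 8]  # planning poker values up to the 8-point cap


def bump_score(score, scale=SCALE):
    """Return the next value on the scale; the largest value stays as it is.

    Keeping the largest value unchanged is an assumption of this sketch.
    """
    index = scale.index(score)  # raises ValueError for values not on the scale
    return scale[min(index + 1, len(scale) - 1)]


print(bump_score(2))  # 3
print(bump_score(3))  # 5
```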

Unit test

We take care to write unit tests for each issue, so unit tests are written for every developed feature. Since unit tests do not affect the functional tests of the application, we do not give any test points for them.

The unit tests are merged from their feature branches into the develop branch.

Example of unit test

Regression test

The regression test is our most comprehensive test.

For the regression test, we do not give any test points; we just open a task called Regression Task.

For more information about Regression Testing, you can refer to my previous article below.

Additional info: we also link the related bugs to this task. When our regression is finished, we release the update to 1% of our users.

For more information about Release Process, you can refer to my previous article below.

Whether or not you score your tests, it is much more efficient to group issues first and then score them. When testing your mobile applications, considering the parameters above will help you give the best test estimate.
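As a rough illustration of “group first, then score”, the Python sketch below groups a backlog by issue type and shows the usual candidate scores for each group. The type keys and the backlog items are hypothetical; the score values are the ones mentioned in the sections above. Marketing event and third-party work are left out because they raise an existing score rather than define one.

```python
from collections import defaultdict

# Usual candidate scores per issue type, taken from the sections above.
# An empty tuple means no test points are given for that type.
TYPICAL_SCORES = {
    "ab_test": (3, 5),
    "ui_only": (1,),
    "replatforming": (5, 8),
    "technical": (3, 5),
    "deep_link": (2,),
    "unit_test": (),
    "regression": (),
}


def group_issues(issues):
    """Group (title, issue_type) pairs by issue type, keeping input order."""
    groups = defaultdict(list)
    for title, issue_type in issues:
        groups[issue_type].append(title)
    return dict(groups)


# Hypothetical sprint backlog.
backlog = [
    ("Change checkout button text", "ui_only"),
    ("Open basket page via deep link", "deep_link"),
    ("Update banner font", "ui_only"),
    ("Redesign product detail page", "replatforming"),
]

for issue_type, titles in group_issues(backlog).items():
    print(issue_type, TYPICAL_SCORES[issue_type], titles)
```

The candidates are only a starting point; the team still picks the final value in the planning poker session.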

After testing starts, situations and conditions may change, so you should always be able to update your estimates.

Feyza Dayan
Trendyol Tech

Sr. Developer in Test at Trendyol International @Berlin, MBA, BSc. Computer Engineering https://www.linkedin.com/in/feyzadayan/