How we set a decentralized QA (Quality Assurance) culture at ManoMano

Olivier DENNEMONT
ManoMano Tech team
May 15, 2020 · 16 min read
Photo by Laura Ockel on Unsplash

QA: why is it always the fifth wheel?

Everyone agrees that quality is critical. Yet so many companies don’t have strong QA processes, let alone dedicated QA roles. ManoMano is no exception: it took no less than 5 years (!) before we staffed our first Head of QA (Yann Person), who initiated a QA spirit before I took over the role in mid-2019, about a year ago. So why is it so hard to do QA for real?

First, the ROI of QA is hard to measure. As long as you don’t experience severe bugs, you think everything is fine. You don’t realize that the cost of code entropy grows much faster without a QA department; you only realize it when you want to “revamp” parts of your legacy system… Nor do you identify some very real bugs that your users, on the contrary, do experience, hurting your conversion rate… You don’t realize the pain it creates among your tech employees: engineers struggling to test because they don’t have the right tools or data fixtures, PMs who waste time repeating critical tests manually for lack of a better solution…

Second, QA is hard to staff. Of course tech roles in general are hard to staff, but QA is even harder. The job seems less attractive to engineers, who prefer coding, and to business-minded people, who prefer Product Management roles. So you end up with developers who were not good enough at coding, or frustrated product people… Fortunately this is changing, as QA becomes more of an engineering discipline and less of the “monkey” job it may have been at some point (here are the specs, we are launching tomorrow, could you run the tests?).

But most importantly, most people think of QA engineers as testers. In reality, it is not that simple. There’s a general confusion between Quality Assurance (QA) and Quality Control (QC).

Quality Assurance vs. Quality Control

Quality Control happens at the end of a development iteration. It aims at catching defects. It’s important to note that this happens after the fact, once the defects have made their way into the code (but hopefully before they reach real customers). Testing is its main component and consists of running a list of checks tailored to the specific product, verifying that all features behave as expected and that nothing obvious is broken. QA engineers are experts in designing test cases. As it’s mathematically impossible to predict and cover all possible defects, the value of QA engineers here resides in identifying the most critical use cases.

The value of QA engineers is certainly not in manually executing the same tests again and again, at each and every release (regression testing). This is clearly a chore; it can and should be delegated to a machine (by automating tests), unless your company really is into wasting time and money¹.

Other types of tests, those that are not run repeatedly, do not belong to QA alone. They are essential and belong to the whole team: development, product, UI/UX, QA, security… Different roles don’t all look at the same aspects of the product, so their tests are complementary.

Quality Assurance, on the other hand, focuses on preventing defects by putting the right processes in place. That’s actually the hardest part of the QA job and what makes it so distinct. You could probably cover the QC effort by sharing it between product, designers and developers. But Quality Assurance is a whole other world and requires trained specialists.

Key challenges faced by QA at ManoMano

Clarifying the role of QA is one of our ongoing duties. Not having dedicated manual testers can be a source of frustration for teams that were used to that process. And since Quality Assurance doesn’t show its effects over short periods of time, it can seem counterproductive not to invest all efforts in pure testing.

Staffing good QA engineers, as mentioned above, is hard. We find it especially difficult in France, where for decades many companies have been using methodologies directly transposed from the industrial era to build software². Candidates coming from that background are rarely prepared for a fast-paced e-commerce company like ours. They expect to be handed a checklist of tests that they’ll have to execute, while we actually need them to be creative and think outside the box. Things change very fast in our market, so it is absolutely vital to create value in small increments while focusing on what matters most; this requires people who are agile.

The context of hyper-growth makes it hard to keep up with old-school QA approaches. Think whole new teams created frequently, more developers, loads of new features, codebases growing exponentially. We had to adopt a new strategy that could scale and put the focus on what’s important instead of trying to be exhaustive.

Another element came into the picture over the last few months, when it became apparent that most of our traffic comes from mobile devices. All of our testing so far was done with a tool that only runs in Chrome on desktop computers; it became urgent to find solutions that cover mobile browsers, as well as our soon-to-come native applications.

In 2019, ManoMano migrated its infrastructure to the cloud to make its platform more scalable and improve time-to-market, among other benefits. Coupled with the generalization of our microservices architecture and a strong focus on APIzation, this considerably expanded the surface QA has to cover and required new tools.

With all these challenges in mind, we could start shaping an overall strategy.

QA Strategy for ManoMano

Our team mission is:

Enable and promote continual improvement across the tech organization

My role as Head of QA was to define the strategy to fulfill this mission. I will explain it in more detail in the following paragraphs, but here are the 5 key steps I had in mind:

Step 1: define key principles (less is more, quality at all stages, trust…) to guide us along the journey

Step 2: define the right QA team organization (tooling vs. functional teams) for scalability

Step 3: recruit great QA profiles (sourcing, interviewing…) for effectiveness

Step 4: lay the technical foundations (tooling for CI/CD, test repo, automation…) to have the right tools

Step 5: collect and manage feedback to measure effectiveness

Step 1: Define key principles

These are the principles that we stand for:

  • Less is More: we believe in the KISS principle. Making things complex doesn’t bring more value, and it makes them harder to maintain. We don’t care much about raw code coverage; rather, we want to focus on key paths of the user journey and make sure those components are tested. We use the MoSCoW method to classify test scenarios by level of importance, which we evaluate based on factors such as the probability of occurrence and the business risk. We also organize work so that the most important scenarios are written and automated first. Less important scenarios are handled later if time permits, or simply never.
  • Quality at All Stages: as seen earlier, testing is just one facet of quality assurance and is traditionally done at the very end of the development iteration. Quality at all stages allows anomalies to be detected earlier in the process, and also helps prevent them.
  • Trust: test results must give a clear, unambiguous piece of information: can this feature be accepted? Can that release be shipped to production? 100% of successful tests means yes. Anything below that is a clear no. The code changes are rejected until someone fixes the failures and it gets back to 100%. If you start ignoring failures today, tomorrow you’ll ignore more of them³. And more. And developers will stop trusting the tests. Eventually they’ll bypass them. And then they’re pointless.
  • Automated over Manual: manual testing is and will remain the only way to get a sense of the actual user experience. It’s a powerful way of designing unanticipated test scenarios as part of exploratory testing sessions. But here the DRY principle applies: once scenarios are written down and will clearly be executed more than a handful of times, we spend a bit of extra time automating them, then move on to something else. Automating tests is not an easy task, especially for people without any programming experience, but it quickly pays off. Manual scripted testing is error-prone, time-consuming and not scalable; it’s a bottleneck for the whole team.

Step 2: Define a QA organization

We created two distinct areas within QA:

  • The QA Tooling team is made up of software development engineers. Their mission is to pick/make and integrate the right QA tools to be used by all the other teams. This team is small and its size doesn’t vary too much. This team is regarded as an enabler in our organization.
  • The Functional QA team is composed of QA engineers who are assigned to different Groups of product teams (a Group typically includes 2 to 3 teams). While they belong to the QA team, they work on a daily basis with product managers and developers. They are responsible for helping the feature teams integrate agile QA methodologies, processes and tools. They advocate for good quality at all stages. They bring expertise in both manual and automated testing and lead the testing effort by example. They participate in team discussions to ensure topics like testability and risk assessment are not left aside when designing new features. Testing is not their most valuable activity; communicating is.

This separation makes the team very scalable as we don’t need to reinvent the wheel just because new people join.

Our QA team can grow at will, while keeping its core small

Having all tools centralized makes it easier to improve by small iterations and close the feedback loop.

In the long run, the functional QA team tends to grow along with the rest of the tech organization. But we can absorb delays in recruitment by modulating the degree of involvement of each QA engineer in product teams, depending on various factors such as their level of maturity and the business risk of the features they are designing.

Step 3: Recruit QA profiles

So far we have been looking for two types of profiles:

  • Software Development engineers: this requires a software engineering background. These profiles are usually developers who became interested in QA at some point in their career. Their role is to build and maintain the technical foundations of the QA strategy. They select the right tools for our needs among those available on the market and in the open source community. If no solution exists, they develop new tools. They listen to internal users’ feedback and improve accordingly. We constantly challenge our tooling and never hesitate to switch to totally different approaches.
  • Quality Assurance engineers: they are here to spread a QA spirit across the whole organization. Although they are part of the QA team, they spend most of their time within a product Group, meaning they are involved with up to 3 product teams. Among various other activities, they help PMs refine their user stories and write acceptance criteria following BDD techniques. They participate in automating test scenarios with developers. They help teams ensure the testability of features even before coding starts. They collect and enhance defect reports, feeding them back into the teams’ process.

We know we have to work on our employer branding within the QA community. We hired contractors to help us cope in the short term. It took us around a year to staff the team, starting with the most critical product areas. We currently have 11 engineers in the functional team and 2 in the tooling team. The team is distributed over 4 offices across 3 countries.

Recently, we created a third role: QA lead. This role involves managing a team of QA engineers, being the QA spokesperson at the Tribe level and having an influence beyond the sole scope of QA. At the moment, the team has one QA lead and we’re considering adding more.

Our recruitment interview process consists of 4 rounds:

  • Interview with a recruiter (phone call)
  • Management interview (phone or video conference)
  • Interview with QA engineers (in-person or video conference)
  • Interview with a lead developer and/or a product manager (in-person or video conference)

During the process, we try to determine whether candidates will fit our company values, while being very transparent about what they will find by working here, but also what they won’t. To be considered for the job, a candidate has to demonstrate her QA knowledge and skills of course, but also that she has an agile mindset and will take a very functional approach to our business. It’s rare to find such gems, but they do exist. And I feel privileged that many of them decided to join our team.

Step 4: Lay the technical foundations

Code repository & CI/CD: Gitlab is the tool we use at ManoMano for managing, building, testing and deploying software projects. In the past, the regression test suite was run in Jenkins, as a separate process owned by the QA team. Now that we have migrated it to Gitlab CI, regression tests are first-class citizens of the actual delivery flow, which is more consistent with our approach. Pipelines typically include many jobs such as unit tests, security checks and end-to-end tests, in addition to actually building the code and deploying builds.
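
To make this more concrete, here is a minimal sketch of what such a pipeline could look like; the stages, job names and scripts below are hypothetical, not our actual configuration.

```yaml
# .gitlab-ci.yml: illustrative sketch only, with hypothetical job names and scripts
stages:
  - build
  - test
  - e2e
  - deploy

build:
  stage: build
  script:
    - npm ci
    - npm run build

unit_tests:
  stage: test
  script:
    - npm run test:unit

security_checks:
  stage: test
  script:
    - npm audit

end_to_end_tests:
  stage: e2e
  script:
    - npm run test:e2e   # the automated regression suite runs inside the delivery pipeline

deploy_staging:
  stage: deploy
  script:
    - ./deploy.sh staging   # placeholder deployment script
```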

Test management: initially, our test scenarios were stored in a Google Sheets document. While this did the job at the beginning, it could not be a long-term solution. After evaluating different solutions on the market, we finally picked Xray. Thanks to its seamless integration with Jira Cloud (it’s actually a plugin), we only had to make small adjustments to our existing workflows. It also avoids making everyone switch back and forth between different tools. Another reason we selected this tool is that it supports scenarios written in Gherkin, which was a prerequisite for our BDD approach. On the downside, the cloud version of Xray doesn’t suggest known steps as you type and offers no proper synchronization with a Git repository.
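
As an illustration, a scenario written in Gherkin could look like the following; the feature and its steps are a hypothetical example, not taken from our actual test repository.

```gherkin
Feature: Add a product to the cart

  Scenario: A visitor adds an in-stock product to an empty cart
    Given I am on the product page of an in-stock product
    When I click the "Add to cart" button
    Then the cart counter displays "1"
    And the product appears in my cart
```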

Web UI test automation: until now, Cypress was the go-to tool for automating end-to-end tests in the browser. Even though it’s a smartly designed and powerful tool, it has one severe drawback for us: it only runs on desktop with Chromium-based browsers. TestCafé appeared to be the best option in our case: it runs on virtually any popular browser, even on actual mobile devices, and it also uses JavaScript, which makes the migration easier for people coming from Cypress. We started developing a custom software application that combines TestCafé and Cucumber (a BDD tool). But recently we came across an open source package named CodeceptJS that already does that. We are still evaluating it, but it looks very promising.
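
To give an idea of what such end-to-end tests look like, here is a minimal TestCafé sketch; the page URL and selectors are hypothetical placeholders, not our real test code.

```javascript
// checkout_smoke_test.js: illustrative sketch with a placeholder URL and selectors
import { Selector } from 'testcafe';

fixture('Product page smoke test')
    .page('https://www.example.com/some-product'); // placeholder URL

test('a visitor can add a product to the cart', async t => {
    await t
        .click(Selector('[data-test="add-to-cart"]'))             // hypothetical selector
        .expect(Selector('[data-test="cart-count"]').innerText)   // hypothetical selector
        .eql('1');
});
```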

Native mobile UI test automation: we are currently developing our very first native mobile application. As TestCafé only works with web browsers, we had to find another tool for UI testing. Our initial choice was Detox, which seems to be the most powerful solution for a React Native app, but we had no time to ramp up QA engineers on that tool before the project began. We finally went for a combination of Appium and Cucumber Java, which our new QAs already had experience with. This decision allowed us to have good functional coverage from the very first iteration. We may still consider moving to another tool in the future (Detox or another), yet for the moment it works just fine. Incidentally, CodeceptJS can use both Appium and Detox as internal controllers, which is another reason we are considering it.
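
Since CodeceptJS can drive mobile tests through Appium, here is a rough sketch of what its configuration could look like for an Android build; the file paths and values are assumptions for illustration only, not our actual setup.

```javascript
// codecept.conf.js: rough sketch with hypothetical paths, not our actual configuration
exports.config = {
  tests: './mobile/*_test.js',
  output: './output',
  helpers: {
    Appium: {
      platform: 'Android',
      device: 'emulator',
      app: './builds/app-release.apk', // hypothetical path to the build under test
    },
  },
  name: 'mobile-regression',
};
```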

Monitoring: Datadog is the solution we use at ManoMano for monitoring our infrastructure and applications. In addition, we’ll be using Firebase to monitor any crashes occurring in our mobile apps. These are not specifically QA tools; it’s everyone’s responsibility to keep an eye on what is happening in production. SRE and development teams even set up alerts to make sure nothing critical goes unnoticed.

API functional testing: Postman is our solution of choice, along with its command-line counterpart, Newman. We find Postman very good for fast prototyping thanks to its very visual interface. Collections designed in Postman can then be exported and run in CI pipelines as regression tests. We regret that Postman doesn’t integrate well with our Jira/Gitlab workflow.
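
As an illustration, Newman can also be invoked programmatically from Node and wired into a CI job; the collection and environment file names below are placeholders.

```javascript
// run_api_regression.js: illustrative sketch with placeholder file names
const newman = require('newman');

newman.run({
    collection: require('./catalog-api.postman_collection.json'),   // placeholder collection
    environment: require('./staging.postman_environment.json'),     // placeholder environment
    reporters: ['cli', 'junit'],   // JUnit output can be picked up by the CI pipeline
}, (err, summary) => {
    if (err) { throw err; }
    // Fail the CI job if any assertion failed
    process.exit(summary.run.failures.length > 0 ? 1 : 0);
});
```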

HTTP debugging: we are soon going to use Charles Proxy for monitoring the communication between a mobile device and various server endpoints. This is especially useful for debugging during the development phases of mobile apps, or when diagnosing a bug: determining whether an issue comes from a service or from the mobile app itself is usually a major step toward resolution. Charles also has an iOS version that allows recording a session directly on the phone, which is very handy when you don’t have a computer around.

The tools mapped to our different environments

Step 5: Collect and manage defects

Even with the best tools and processes, there will always be some bugs that make their way to production. And coding mistakes are not the only thing that can interrupt the business or ruin the user experience: failure of a payment service provider (PSP), outage of a CDN… The challenge here is to make these events as infrequent and as low-impact as possible, at least for the things we have control over. Most critical defects in production can be quickly detected by the powerful tools (such as Datadog) put in place by our SRE team and used by all product teams to monitor their technical logs and metrics. But sometimes the case is too specific and customers are the ones who inform us. Even worse, it can happen that nobody is aware of an ongoing problem while the business is being impacted in a non-obvious way; critical data not being tracked is one example.

As the QA team, we wanted to measure the frequency and the impact of these defects: initially to get a sense of the situation, and later on to monitor the evolution as a probe for continual improvement. All our teams already had some form of bug tracking in place, but the fact that they had multiple, inconsistent workflows made it virtually impossible to extract any meaningful data. We had also received requests from our customer-facing teams for more visibility on the progress of the bugs they reported.

We needed a single solution for reporting and tracking defects across the entire organization, with a normalized scale of severity levels. And not only addressing bugs this time, but all kinds of anomalies (for example, a photo that doesn’t match the product description). It had to be easy to use and offer transparency over the full process. And it should come with metrics and dashboards from which each team could monitor their key indicators. As we found no off-the-shelf solution, we decided to design our own, named “RAID” (for Report, Analyze, Immobilize Defects).

An overview of our unified defect management workflow

Defect reporting and tracking

At ManoMano, Slack is the main communication tool, so it felt natural that people should be able to report a defect directly from there. With this in mind, we developed a custom Slack application in JavaScript, running as a microservice on Amazon ECS. It displays a report form in Slack, collects input from the user, and eventually stores it in a new Jira ticket. Additionally, the RAID application listens to updates made on defect tickets (through Jira webhooks) and notifies the watchers via Slack. Setting someone as a watcher guarantees she will get updates for the lifetime of the ticket. Our system automatically adds users as watchers when they reported the defect themselves, were involved in the resolution process, or were associated as stakeholders based on the information contained in the ticket. In any case, individual users can add or remove themselves directly from Jira.
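
For illustration, the notification part of such a service could be sketched roughly as follows; the Express route, watcher lookup and environment variable are simplified assumptions, not the actual RAID code.

```javascript
// Rough sketch of a Jira-webhook-to-Slack notification flow (not the actual RAID code)
const express = require('express');
const { WebClient } = require('@slack/web-api');

const app = express();
const slack = new WebClient(process.env.SLACK_BOT_TOKEN); // hypothetical env variable

app.use(express.json());

// Jira calls this endpoint (via a webhook) whenever a defect ticket is updated
app.post('/jira-webhook', async (req, res) => {
  const issue = req.body.issue;
  const watchers = await getWatchers(issue.key); // hypothetical lookup of Slack user IDs

  for (const userId of watchers) {
    await slack.chat.postMessage({
      channel: userId,
      text: `Defect ${issue.key} was updated: ${issue.fields.summary}`,
    });
  }
  res.sendStatus(200);
});

// Placeholder: in reality the watcher list would come from Jira or a database
async function getWatchers(issueKey) {
  return [];
}

app.listen(3000);
```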

Reporting & dashboard

The whole system was designed with data in mind. We need good visibility so we can identify recurring patterns, efficiency bottlenecks, suboptimal delivery processes… We first developed a minimal version of a dashboard as a Golang script that pulls defect tickets from the Jira Cloud API, augments the data with computed fields (time spent in each status, number of severity upgrades/downgrades…) and uploads everything into a Google Sheets document, where we can do all kinds of post-processing and build graphical representations.

Meanwhile, we are developing a more comprehensive version named ManoMeter. It is mainly composed of an always-on server and an instance of the Elastic Stack; everything runs on AWS, like the rest of our platform. The server is a microservice written in Golang. It receives events (updates made to defect tickets) from Jira via webhooks, computes a bunch of derived fields and more complex metrics, and finally pours the results into an Elasticsearch database. The data can then be visualized in Kibana, where users can create their own filters and graphical analyses. We also plan to have TV dashboards so teams can keep an eye on their key performance indicators.

Conclusion

We started our QA effort in March 2018, so we have now been working on it for 2 years. There are so many more things we want to address in the future. The context evolves along the way, requiring us to rethink our strategy, and we’ll continue to challenge it.

First, recruiting has clearly been a bottleneck. Finding profiles who know the theory and practice of QA, have an agile mindset, have experience automating tests, truly understand the product and can put themselves in the shoes of a customer is hard. It took us almost a year, but at the time of writing, we are fully staffed.

Second, creating two distinct roles within the QA team was key to making QA a scalable process. The QA Tooling team should be kept small, but has to be composed of highly technical people. The Functional QA team can scale by hiring more people without having to change the process or the tools. And the value of its members lies in their ability to help feature teams continually improve the quality of their deliveries, reducing the risk of breaking things while allowing a steady release pace.

And finally, we are very lucky at ManoMano to have tech employees that really care about quality, and that considerably eased our task!

So in a nutshell: still a lot of work but QA is now in everybody’s mind.

I’d like to thank all my fellow QA engineers, without them all the things detailed above would remain pure theory. And special thanks to Pierre FOURNIER for helping me put this article together and for being a mentor since I joined ManoMano.

[1] Robert C. Martin, The Clean Coder: A Code of Conduct for Professional Programmers, Chapter 7.

[2] Look up MOA/MOE; it is so weird it doesn’t even translate into English.

[3] Martin Fowler, “Eradicating Non-Determinism in Tests”.


Olivier DENNEMONT, ManoMano Tech team. I am an Engineering Manager with a strong focus on quality.