Trimble MAPS — Our Comprehensive Testing Strategy

by George Lutz

At Trimble MAPS we embrace the challenge of validating a wide variety of products, both software applications and map data, for official release. These products span both client-side and server-side, and they run on most major mobile operating systems, including Android, Windows CE, Windows Mobile 6 (yes, really), Linux, and iOS, as well as on Windows servers and desktops. The target hardware ranges from relatively ancient to bleeding edge. Beyond that, our products have a rich set of great features, all of which must both work correctly and run efficiently.

That’s just the software. We also release frequent map data updates. Here’s the point: we need a comprehensive testing strategy in order to constantly and successfully meet our customers’ needs as well as our own very high expectations.

The strategy is three-dimensional: we test every product, across several functional and non-functional testing categories, and, very importantly, at various time intervals.

Product × Testing Category × Time Interval

PRODUCTS

There are four main product groups: 1) PC*MILER, 2) CoPilot GPS and CoPilot Truck Navigation, 3) PC*MILER Web Services, ALK Maps, and other web services, and 4) map data for all regions of the world. These products often work together to form a complete solution, but they typically run in very different environments: PC*MILER usually runs on Windows desktops, CoPilot on the full range of mobile devices, and the web services on high-performance servers. The map data is used across all products, so it must be designed, optimized, and tested to suit them all.

TESTING CATEGORIES

In order to release awesome products consistently, the following testing categories must always be part of a test plan. These categories create natural redundancies, layering coverage rather than risking gaps in coverage. An adequate test plan considers everything described here.

  1. Functional Testing: The product does what it’s supposed to do…
  2. Performance & Efficiency Testing: …and does it reasonably quickly.
  3. Endurance Testing: The product runs well for an extended time.
  4. Security Testing: The product itself is secured and can be trusted to keep user data private.
  5. Fuzz Testing: The product is stable even under uncommon conditions.
  6. Real World Data Testing: The product is validated against known real-world use cases.
  7. Exploratory Testing: The product is used the way a human being would use it, including UX testing, sans test plan.

Functional Testing — This is the backbone of the testing process. Any software product has precise functional requirements which must be written down and validated. Each release must pass each functional test case.

Functional testing may, and should, include both manual and automated testing; both are fine paths to validating expected functionality. How and when to rely on automated testing is beyond the scope of this article, though many of the categories below lend themselves nicely to automation. Functional Testing also includes unit testing, integration testing, and system testing, as do many of the categories that follow.
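
To make this concrete, here is a minimal sketch of what one automated functional test might look like, written with pytest-style assertions and the requests library. The endpoint, parameters, and response fields are hypothetical stand-ins rather than our actual API.

```python
# A minimal functional test sketch. BASE_URL and the response fields are placeholders.
import requests

BASE_URL = "https://example.internal/route"  # hypothetical routing endpoint

def test_route_between_two_known_cities():
    params = {"origin": "Princeton, NJ", "destination": "New York, NY"}
    response = requests.get(BASE_URL, params=params, timeout=30)

    # The service should answer successfully...
    assert response.status_code == 200

    body = response.json()
    # ...and the route should be plausible: present and within a sane distance range.
    assert 40 <= body["distance_miles"] <= 120
```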

Performance & Efficiency Testing — For most of our products, poor performance or regressions in performance are as problematic as, or even worse than, functional defects. We’ve found that most performance problems on mobile devices can be identified simply by testing on low-end (limited RAM, slow or single-core CPU) hardware, so we focus our attention there. Also, rather than testing on the worst devices very late in the test cycle, we front-load performance testing on low-end hardware. We need to find performance problems sooner rather than later, since they can often take some time to resolve. This is true in all product categories. We also need to test new data releases on low-end hardware, because changes in data sizes can certainly regress performance.

Usually, if our code runs well on the worst hardware, it will run fine on other hardware too. This gets to another important point: what not to test. We can’t test everything everywhere — the matrix of products, platforms, features, and data regions is too large (consider the Android ecosystem alone); we need to be efficient about how we use our time. As such, we can focus most performance testing on the worst hardware that we’re required to support rather than on all hardware.

Regardless of product, performance testing requires maintaining historical performance benchmarks. Without historical data, it’s impossible to know what level of performance is acceptable. Without well-organized benchmarking, we’d be blind to regressions unless they’re bad enough to be obvious.
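
As a rough illustration of benchmarking against history, the sketch below times an operation, compares the result to a stored baseline, and fails on a significant regression. The baseline file, threshold, and benchmark name are assumptions made for the example.

```python
# Benchmark-vs-baseline sketch. The baseline file and threshold are illustrative only.
import json
import statistics
import time

BASELINE_FILE = "benchmarks/route_timings.json"  # hypothetical history store
REGRESSION_THRESHOLD = 1.15                      # flag anything 15% slower than baseline

def time_operation(operation, runs=5):
    """Time an operation several times and return the median in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

def check_against_baseline(name, operation):
    with open(BASELINE_FILE) as f:
        baselines = json.load(f)  # e.g. {"plan_long_route": 850.0}

    measured_ms = time_operation(operation)
    baseline_ms = baselines[name]
    if measured_ms > baseline_ms * REGRESSION_THRESHOLD:
        raise AssertionError(
            f"{name} regressed: {measured_ms:.0f}ms vs baseline {baseline_ms:.0f}ms"
        )
```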

Endurance Testing — Real-world use of our software often requires that the application or service run for many days or even many weeks without a restart. Our products need to win the sprints and win the marathons. This applies especially to Web Services, which run under massive concurrent load. Functional and single-threaded testing only scratches the surface here.

For client-side products, it is effective to simply leave them running for several days; even during functional testing, there’s rarely a need to restart. We also have explicit test cases that run certain operations for several days in a row. Keeping an eye on memory consumption is key here, since memory leaks and memory fragmentation are common reasons why a product works well for a few hours but not for a few days. Flatlined memory use over several days is an encouraging sign that our products will be stable in the real-world use cases mentioned above and that they will win the marathon!
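
A minimal sketch of that kind of memory watch follows, assuming the psutil package and a known process id for the application under test; the thresholds and sampling cadence are illustrative only.

```python
# Endurance-style memory monitoring sketch. Thresholds and intervals are illustrative.
import time
import psutil

def watch_memory(pid, hours=72, sample_minutes=10, max_growth_mb=50):
    """Sample resident memory periodically and flag sustained growth (a possible leak)."""
    process = psutil.Process(pid)
    samples = []
    for _ in range(int(hours * 60 / sample_minutes)):
        rss_mb = process.memory_info().rss / (1024 * 1024)
        samples.append(rss_mb)
        # A flat line is the goal; steady growth over days suggests a leak or fragmentation.
        if rss_mb - samples[0] > max_growth_mb:
            print(f"WARNING: memory grew {rss_mb - samples[0]:.0f} MB since start")
        time.sleep(sample_minutes * 60)
    return samples
```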

Security Testing — For our applications, we focus on securing network transmissions and securing data at rest. This typically means using HTTPS for all network communications by default. Network analyzers like Fiddler or the Chrome DevTools are needed here to confirm that Secure by Default is actually in place.

Security Testing also includes privacy testing, and negative testing is an important part of it. For example, when logged into account A, one must not be able to access data from account B in any form, and vice versa; any such attempt must be rejected with an access error.
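
Here is a hedged sketch of such a negative privacy test; the endpoint, token, and resource id are hypothetical placeholders.

```python
# Negative privacy test sketch. The API URL, token, and trip id are placeholders.
import requests

API = "https://example.internal/api"     # hypothetical service URL
ACCOUNT_A_TOKEN = "token-for-account-a"  # provided by the test environment
ACCOUNT_B_TRIP_ID = "b-12345"            # a resource known to belong to account B

def test_account_a_cannot_read_account_b_data():
    response = requests.get(
        f"{API}/trips/{ACCOUNT_B_TRIP_ID}",
        headers={"Authorization": f"Bearer {ACCOUNT_A_TOKEN}"},
        timeout=10,
    )
    # The request must be rejected outright, not partially fulfilled.
    assert response.status_code in (403, 404)
```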

Fuzz Testing — Fuzz Testing is straightforward and, when done well, extremely effective. Fuzzing means applying random inputs to user-facing APIs or user interfaces in search of problematic edge cases, performance issues, and functional defects. Fuzz Testing is used to discover totally unpredictable defects and, at the very least, provides another layer of redundancy beyond every other test category. For example, if we pass the coordinate “-0, 0” into some API parameter, does it respond rationally? Or does it crash the server? Or does it result in several minutes of unnecessary processing on the server? Over time, a good Fuzz Test is more creative than any human and will expose some crazy scenarios long before the crazy scenarios expose you.
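
A rough sketch of what coordinate fuzzing against an API could look like is below; the endpoint and the pass/fail rules are assumptions for illustration, and the point is simply “no crashes, no absurd latency” rather than exact outputs.

```python
# Coordinate fuzzing sketch. ROUTE_API and the latency bound are illustrative assumptions.
import random
import requests

ROUTE_API = "https://example.internal/route"  # hypothetical routing endpoint
EDGE_CASES = [(-0.0, 0.0), (90.0, 180.0), (-90.0, -180.0), (91.0, 181.0), (float("nan"), 0.0)]

def fuzz_coordinates(iterations=1000, max_seconds=5.0):
    for i in range(iterations):
        # Start with known-nasty values, then move on to random (sometimes invalid) ones.
        lat, lon = EDGE_CASES[i] if i < len(EDGE_CASES) else (
            random.uniform(-95, 95), random.uniform(-185, 185))
        try:
            response = requests.get(
                ROUTE_API,
                params={"origin": f"{lat},{lon}", "destination": "40.35,-74.66"},
                timeout=max_seconds,  # anything slower than this is itself a defect
            )
            # Server errors on junk input are defects; a clean 4xx rejection is acceptable.
            assert response.status_code < 500, f"server error for ({lat}, {lon})"
        except requests.exceptions.Timeout:
            raise AssertionError(f"request for ({lat}, {lon}) exceeded {max_seconds}s")
```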

Another great example of Fuzz Testing is rapidly sending random user events (like screen taps) to a UI. Can the application withstand this random barrage for several hours or days? If so, it’s a really good sign for application stability. We test CoPilot like this daily (continuously). Over the years, we’ve found many crashes this way, often related to improper thread synchronization. These are usually the types of “random” crashes that a tester or user may see once and never duplicate. Fuzz Testing exposes them all eventually.
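
On Android specifically, one practical way to generate that kind of barrage is the platform’s built-in monkey tool; below is a small sketch wrapping it from Python. The package name is a placeholder, not the real CoPilot identifier, and this is one possible approach rather than a description of our actual harness.

```python
# UI fuzzing sketch using Android's monkey tool via adb. The package name is a placeholder.
import subprocess

def run_ui_fuzz(package="com.example.copilot", events=100_000, seed=42, throttle_ms=50):
    """Fire a long barrage of random taps, swipes, and key events at the app."""
    subprocess.run(
        [
            "adb", "shell", "monkey",
            "-p", package,              # confine events to the app under test
            "-s", str(seed),            # a fixed seed makes any crash reproducible
            "--throttle", str(throttle_ms),
            str(events),
        ],
        check=True,
    )
```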

Real-World-Data Testing — Nothing is more deflating than thoroughly testing some feature with loads of fabricated data, only to have it fail in the real world when a customer does something slightly different than what was considered during design and testing. Feeding real, anonymized data back into testing prevents these gaps.

For us, real-world data may mean the actual routes run by real users, the locations they geocode, and so on. We care far more about the specific routes our customers run regularly than about a set of random routes, so we run those real-world routes regularly through our automated test suite. Twitter calls this tap compare, so we do too. This is especially important when the underlying map data changes.
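
A simplified sketch of a tap-compare style check is shown below, assuming two deployments (the current release and a release candidate) and a hypothetical distance field to diff; the URLs and tolerance are placeholders.

```python
# Tap-compare sketch: replay real (anonymized) routes against two builds and diff results.
import requests

STABLE_URL = "https://stable.example.internal/route"        # current production build
CANDIDATE_URL = "https://candidate.example.internal/route"  # build under test

def tap_compare(real_routes, tolerance_miles=0.5):
    """real_routes: an iterable of dicts like {"origin": "...", "destination": "..."}."""
    mismatches = []
    for route in real_routes:
        stable = requests.get(STABLE_URL, params=route, timeout=30).json()
        candidate = requests.get(CANDIDATE_URL, params=route, timeout=30).json()
        # Small differences may be expected after a map data update; large ones need review.
        if abs(stable["distance_miles"] - candidate["distance_miles"]) > tolerance_miles:
            mismatches.append((route, stable["distance_miles"], candidate["distance_miles"]))
    return mismatches
```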

Relatedly, on mobile devices we always want to focus testing on the most commonly used hardware, operating system versions, and customer use scenarios. The more we understand how customers use the product, the better. There are analytics for this, of course, but there’s also plenty of value in talking to our colleagues who work directly with our customers, like the support team.

One last item to mention is the idea of exhaustive testing (which may also apply to the Endurance Testing category). For example, if a customer has 100,000 routes that we know they run all the time, how many should we test against? The correct answer is all of them, but there can be a tendency to short the test and only try a few hundred. The cost of running the rest is relatively small, though. A good example of exhaustive testing is something we call “geteverywhere”, which runs routes to and from each city in a data set. While the set is large, it’s not infinite, and thus it lends itself to exhaustive testing. Don’t short large, yet finite, data sets; test them exhaustively. Even a set in the millions is very large, but still finite.
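
For illustration, here is a sketch of what a geteverywhere-style exhaustive run might look like. The routing call is injected as a parameter because the real interface is product-specific; everything here is a hypothetical stand-in.

```python
# Exhaustive "geteverywhere"-style sketch: route between every ordered pair of cities.
from itertools import permutations

def get_everywhere(cities, route_between):
    """route_between(origin, destination) is whatever call drives the routing engine
    under test; it is passed in because the real interface is product-specific."""
    failures = []
    # Large but finite: every ordered pair is exercised, not a sampled few hundred.
    for origin, destination in permutations(cities, 2):
        try:
            result = route_between(origin, destination)
            if not result or result.get("distance_miles", 0) <= 0:
                failures.append((origin, destination, "empty or zero-length route"))
        except Exception as exc:
            failures.append((origin, destination, repr(exc)))
    return failures
```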

Exploratory Testing — While formal Functional, Performance, and other test plans are essential, they are insufficient, because it is impossible to plan for every scenario. There’s also the strong possibility that the quality of the whole product is much less than the sum of the results of each individual test case.

Exploratory testing, in my opinion, is a professional phrasing for “using the product like an end user uses the product.” The more one knows about how the customer will actually use the product, the better one will be at Exploratory Testing. Much of Exploratory Testing naturally overlaps with other categories. That’s the goal: we want to layer our testing, avoiding reliance on any single test category. Exploratory Testing in particular is intended to provide redundancy and to confirm that the quality of the whole is equal to the quality of the parts.

For example, is there a test case for using CoPilot in heavy NYC traffic while driving through a tunnel, when a phone call comes in, while the internet connection is lost? Doubtful; there’s probably a single test case for each of those things separately. Yet this type of craziness happens constantly in real-world use. Is there a test case to zoom in and out as quickly as possible 20 times in a row while rotating the map as fast as you can? Is there a test case to plan a trip with 20 stops and 180 turn instructions and rotate the phone while scrolling the giant itinerary list? Maybe, but the point is that there are thousands of extreme, creative scenarios that a test plan will not realistically capture. These scenarios, even if not especially common, are great ways to expose defects.

We step back and use the product just as a user will, while also creatively seeking to break it, and without being bound by a written test plan. A good example of exploratory testing is CoPilot drive testing. CoPilot is a navigation product, and people use it while driving. This year, for example, I’ve joined a few half-day drive tests from Princeton to New York City. Each time, we took 4–5 people total driving about 100 miles with 6–8 devices loaded up. That’s 600–800 device-miles of testing across a variety of devices, in difficult driving conditions, with trained eyes fixed on the product. These drive tests produced a number of real, actionable defect tickets of varying severity, across categories like performance, functionality, and usability (note: usability testing is encapsulated in this category, although UX engineers may conduct their own extended usability testing too). Again, this is all without a structured test plan.

Exploratory Testing really is best when there are few barriers to entry. Anyone who understands the products can and should participate: developers, marketing, sales, HR, and project and product managers all ought to join in. Everyone has a unique point of view. As long as getting a build to test is easy and there’s a clear feedback channel, a lot of valuable information can come out of Exploratory Testing. This is also known as dogfooding, but try selling that.

The best testers are often the best exploratory testers. They surely have a test plan, but they’re not bound to it; they’re intelligent end users and genuine product experts, in addition to being thorough testers. Great testers naturally see and expose common functional pitfalls and have developed an intuition for pushing software to its breaking points. They can spot general performance issues, basic usability issues (without being UX experts), and design issues, and can even act as human fuzz testers.

Incidentally, Exploratory Testing will happen one way or the other: if we don’t do it ourselves, our customers will do it for us in the field. Our customers should not be our Exploratory Testers, though.

TIME INTERVALS

Our code base changes daily, and our customers are constantly using it in the field, 24x7 (like right now). It’s not nearly sufficient to execute a testing cycle prior to a release, approve the release, and then move on to the next thing. Releasing and maintaining a great product is just like working out: there’s no end game where things just cruise along nicely unattended in perpetuity. When we stop testing, we stop having a product.

There are four main time intervals to consider for any product release:

  1. Continuous Testing: Code changes every day. We must test it in parallel.
  2. Release Cycle Testing: Testing against a stable release branch in the final lead up to release.
  3. Beta Testing: A larger testing group, even if internal-only, using a stable release branch.
  4. Post-Release Analysis: Reviewing feedback from actual real world use, especially immediately after release.

Continuous Testing — Whenever code is changing, testing must be happening, lest we lose control of our technology. Continuous test coverage ensures that by the time we enter a Release Cycle, we already have strong confidence that our software is working well. Continuous Testing ultimately allows for faster Release Cycle Testing and more reliable release schedules. Without it, Release Cycles drag on while the fix-and-validate loop drains the life out of everyone involved.

Automated testing often falls into the Continuous Testing time interval: if a test is automated, it’s easy to run continuously. In fact, it’s essential to run automated tests continuously, as in hourly or daily, so that the test itself remains in a known healthy state. Automation not run at least once daily is suspect, and it will absolutely end up being less valuable. We run our most important tests nearly 100 times per week. Most importantly, this minimizes the window of failure: if a test was passing at 1pm and failing by 2:30pm, we can quickly narrow down what caused the problem. The only thing preventing automation from running on every individual commit is finite computing resources.

Continuous Testing definitely need not be automated testing, though. In fact, in the past year we’ve become aggressive about executing manual test plans outside of release cycles. Even if the next release is several months away, we care that the product is functional on the latest code base. The sooner we find problems, the better. We want a clean code base every day of the year, not just during the month of a release. This prevents layering problems on top of other problems and compels developers to fix issues before they’ve long since moved on to other work. Our experience without Continuous Testing has been painful: we typically stumble into significant release-blocking problems many months after they entered the code, which is deeply disruptive to forward progress.

We’ve found that Exploratory Testing on a continuous basis is an especially effective way of keeping the code base constantly stable, allowing us to know first-hand whether a public release is possible at any given moment. In reality, every testing category should be run on a continuous interval, even if “continuous” means less often than every single day or week.

Release Cycle Testing — This refers to the final stage before approving a public release. This is when we check every box on the formal written test plan. During this test cycle, we want to reduce changes in the code base significantly, ideally eliminating all change.

During this interval we must pass every test in the plan, for each category. If we’ve done a good job with Continuous Testing, Release Cycle Testing ought to be smooth sailing with very few surprises.

Again, every testing category should be run during the Release Cycle time interval.

Beta Testing — Beta testing generally refers to allowing a small number of actual customers to preview upcoming releases. While this is not always possible, internal betas are necessary and must be done for all releases. Beta Testing, in this case, can be considered an Exploratory Test conducted just before official release; of course, this includes usability testing. Even during the Beta Testing phase, we want to look closely at real-time statistics.

Post-Release (including Real Time) Analysis — The product has been validated and released! This is a noteworthy milestone, but if we stopped here, we’d be unsuccessful. Analysis and testing must continue after Release Cycle and Beta Testing are complete. There remains a lot to learn about our products — mainly whether they actually do what we expected them to do, all of the time, for every customer.

Proactively analyzing feedback from a real, released product can yield huge benefits to product quality. Ultimately, the point of testing is to make sure the user has a perfect experience. Even if we miss a defect prior to release, having good post-release analysis in place can allow us to fix something before 99% of users encounter it. That’s still well worthwhile. Especially in the days immediately following release, there is much to learn.

The internal customer support team is a great source of analysis here. Communicating with them proactively is critical.

Crash logs are another great source of real time analysis. There must be a regular review of incoming crash logs. While we’ve found some difficult to analyze, many do direct us to real problems users are experiencing.

Server request/response logs are a great source as well. Beyond baseline server and service monitoring, we’ve recently been aggressive about alerting on application-level anomalies. For example, if a service receives 100% more traffic than it did yesterday, that may indicate a problem, and it’s definitely worth understanding why. If some request takes more than 10 seconds to process when it usually takes 100ms, that’s worth understanding as well. Anomaly alerting is awesome, and every web service must implement it.
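
As a minimal sketch of those two application-level checks, the snippet below compares day-over-day request counts and flags unusually slow requests; the log-reading and alerting plumbing is assumed rather than shown.

```python
# Application-level anomaly check sketch. Log collection and real alerting are assumed.

def alert(message):
    # Placeholder: in practice this would page someone or post to a monitoring channel.
    print(f"ANOMALY: {message}")

def check_traffic_anomaly(requests_today, requests_yesterday, max_increase=1.0):
    """Flag when traffic more than doubles (a 100% increase) day over day."""
    if requests_yesterday > 0:
        increase = (requests_today - requests_yesterday) / requests_yesterday
        if increase > max_increase:
            alert(f"traffic up {increase:.0%} vs yesterday: {requests_today} requests")

def check_slow_requests(request_durations_ms, usual_ms=100, slow_ms=10_000):
    """Flag individual requests that take many seconds when the norm is ~100ms."""
    for duration in request_durations_ms:
        if duration > slow_ms:
            alert(f"request took {duration / 1000:.1f}s (typical is ~{usual_ms}ms)")
```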

SUMMARY

We now apply this testing strategy to everything we release. This comprehensive, layered approach gives us the general direction needed to deliver high-quality products to market. Our success depends on our collective ability to correctly, creatively, and thoroughly define, develop, and execute specific tests covering each testing category at each time interval.

In upcoming posts, we’ll focus more on some of the actual tools we use here and discuss more about our test automation strategy. Welcome to the Trimble MAPS Engineering Blog!

Interested in joining our team at Trimble MAPS? Click here!
