Save the Drama — For Project LLAMA
At Life360, we work to bring peace of mind to families. One way we do that is by using sensors and location: we keep people connected and in sync by keeping them virtually in touch with each other, for instance by letting you know whether your kids are safe and whether they are driving safely. On the Test Engineering team, as we build tools to continually improve quality, we need a way to simulate these scenarios so that we can verify the product works. For instance, how well do we report a location while the user is driving at 65 mph? Last year, to make location updates fast and efficient, the engineering team started using MQTT alongside HTTP. To test MQTT, the Test Engineering team responded with our own tool: LLAMA.
LLAMA was the internal code name for Life360 Location and MQTT Automation (Test Engineering specializes in coming up with cool project code names as well). This was a project undertaken in late 2017 by the Test Engineering team.
MQTT is a messaging protocol based on a publish–subscribe (pub-sub) mechanism. Location updates are published to topics on the MQTT broker server, and each client subscribes to the topics it cares about; the broker then delivers the published location updates to those subscribers.
MQTT-based location updates were integrated around early 2017, and from the beginning we were concerned about how we could test them. As members of the test team, we also proactively look for gaps in test coverage and see how we can help the developers by building a framework around the missing coverage and helping them test it. So, around late 2017, we came up with a concrete plan to write the framework and the tests for the MQTT location updates.
The plan was to start by writing tests for the MQTT broker server (the publisher). The main reason to start from the platform side was to ensure that location updates are always published. Once we were sure there were no issues (no regressions) on the platform side, we could confidently develop and test on the client side.
We set off with the idea that we needed to add these tests. The problem at the time was that we were not clear on how MQTT was implemented in our app, since the MQTT protocol is usually found in sensor-based applications or IoT apps. We started by reading up on the MQTT protocol using the information the development team provided. The next thing we did was work out how it was integrated into our mobile application: we read through the Android code and, with the help of the developers, got a fair idea of what MQTT is and how it worked in our app.
Once we were clear about the MQTT protocol and its implementation, we decided to add the tests for the publisher, i.e. the MQTT broker server. Our thinking was that adding an API would make it easy to start the validation. All of our tests, the server API tests as well as the UI tests, are developed in Java, so we intended to keep the MQTT tests consistent with the rest rather than do something different. We toyed with the idea of writing the code in Python or shell scripts, but uniformity won out. Then we came across the Eclipse Paho client library: it was already being used on the Android client, and it is meant to be used from Java. So we decided to adopt this library and boom!
We had everything we needed to write the tests for the MQTT broker server.
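The core of such a broker check can be sketched with the Paho Java client. This is a minimal illustration, not Life360's actual test code: the broker URL, topic name, and credentials below are placeholders, and the real tests cover far more scenarios.

```java
// Minimal sketch of a broker-side test using the Eclipse Paho Java client
// (org.eclipse.paho.client.mqttv3). Broker URL, topic, and credentials are
// illustrative placeholders, not real values.
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class BrokerPublishCheck {
    public static void main(String[] args) throws Exception {
        CountDownLatch received = new CountDownLatch(1);

        MqttClient client = new MqttClient("tcp://broker.example.com:1883",
                                           MqttClient.generateClientId());
        MqttConnectOptions options = new MqttConnectOptions();
        options.setUserName("test-user");                 // hypothetical test account
        options.setPassword("test-pass".toCharArray());
        client.connect(options);

        // Subscribe to a (hypothetical) location topic and wait for one update.
        client.subscribe("location/updates", (topic, message) -> {
            System.out.println("Received: " + new String(message.getPayload()));
            received.countDown();
        });

        // Fail the check if no location update is published within 30 seconds.
        if (!received.await(30, TimeUnit.SECONDS)) {
            throw new AssertionError("No MQTT location update received within 30s");
        }
        client.disconnect();
    }
}
```

A check like this exercises the broker end to end (connect, authenticate, subscribe, receive) rather than just pinging the port.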
After the tests for the broker server were completed, we moved on to the client tests. The app has a way to indicate whether it is connected to the MQTT broker server and receiving updates, based on the color of a certain UI element. The challenge here was verifying the color of that element. We toyed with the idea of screenshot comparisons, but concluded it would not work in our case: we had too many dynamic components to control, and at the end of the day there was still a chance that a false positive could sneak past the checks. So we reviewed the UI elements to see what we could consume for testing, and decided to append the color code (hex code) to the IDs of the elements. We went ahead and implemented this solution on iOS and proved that it would work reliably and consistently.
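The idea can be sketched as follows: the app, under a test configuration, appends the current hex color to an element's ID, so a UI test can read the color as text instead of comparing pixels. The ID format here ("name_#RRGGBB") is an assumed convention for illustration, not the actual one used.

```java
// Sketch of the ID-based color check. The element ID format ("name_#RRGGBB")
// is a hypothetical convention; in a real UI test the ID would come from the
// automation framework rather than a hard-coded string.
public class ColorIdCheck {
    // Extract the hex color code appended to an element ID, or null if absent.
    static String colorFromId(String elementId) {
        int idx = elementId.lastIndexOf("_#");
        return idx >= 0 ? elementId.substring(idx + 1) : null;
    }

    public static void main(String[] args) {
        String indicatorId = "mqtt_status_indicator_#00FF00"; // as read from the UI tree
        String color = colorFromId(indicatorId);
        // Green here would mean the client is connected and receiving updates.
        if (!"#00FF00".equals(color)) {
            throw new AssertionError("Expected green indicator, got " + color);
        }
        System.out.println("Indicator color: " + color);
    }
}
```

Because the check is a plain string comparison, it is immune to the rendering noise that defeats screenshot diffing.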
Now came the turn of the Android client. Since this was not something that was supposed to ship to users, we could implement it in a hacky way and control its visibility by means of an experiment; it turned out the Android client had a similar mechanism. In this case we had to work with the development team to have the color code added as an attribute of another element on the same screen.
Positive Outcome — We ended up finding a client bug that had not been discovered in any of the previous regression runs: MQTT updates were not received instantly; the user had to perform some actions and refresh the screen to get the updates.
To identify issues, we run these tests (platform and UI tests) every 3 hours and ensure that everything behaves as expected. With everything set up, we needed the results reported somewhere, so that the Infrastructure Engineering team would have another weapon in their arsenal for alerting when there is an unexpected outage. As a company we are very heavy users of Slack, and we have a lot of channels dedicated to specific topics, so we report the results in the Slack channels the team monitors to identify any potential issues.
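Posting results to Slack boils down to an HTTPS POST of a small JSON payload to an incoming-webhook URL. A minimal sketch, with a placeholder webhook URL and an assumed message format (the real reporting surely includes more detail):

```java
// Sketch of reporting a test run to Slack via an incoming webhook.
// The webhook URL below is a placeholder; Slack incoming webhooks accept a
// JSON body with a "text" field.
import java.net.URI;
import java.net.http.HttpRequest;

public class SlackReporter {
    // Build the minimal JSON payload Slack incoming webhooks expect.
    static String buildPayload(String suite, int passed, int failed) {
        String status = failed == 0 ? "PASSED" : "FAILED";
        return String.format("{\"text\": \"%s: %s (%d passed, %d failed)\"}",
                             suite, status, passed, failed);
    }

    public static void main(String[] args) {
        String payload = buildPayload("LLAMA MQTT broker tests", 41, 1);
        System.out.println(payload);

        // Delivery is a plain HTTPS POST of the payload (URL is a placeholder):
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://hooks.slack.com/services/T000/B000/XXXX"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(payload))
            .build();
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

The send itself is commented out here; in a scheduled job it would run after every 3-hour test pass.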
Infrastructure Issues — We run a lot of services in our server backend, which means we run a lot of monitoring of those services. Some services are easier to monitor than others. Prior to the Test Engineering team doing the LLAMA work, we had limited visibility into the MQTT services. Our monitoring gives us a bunch of external metrics about a service (how busy the CPU is, how much network traffic is flowing, whether the service is consuming its incoming messages) but cannot directly tell us whether the MQTT broker is actually doing its job.
The LLAMA test suite provides a view into the behavior of our MQTT broker that simple scraping does not. The tests attempt to exercise all of the actions that the client would perform: authenticating to the service, requesting current status, and pulling data from subscribed topics. These tests give us visibility into the broker’s behavior that an external “black box” monitor cannot reach.
Now that we have the test suite running and publishing failures to our status channel in Slack, we have much better visibility when the broker goes awry. We hope the tests will even catch degradation before our external monitoring picks it up, allowing us to fix problems that much faster.