Dixie, turning chaos to your advantage

What can you do if
… your app is crashing due to an unreliable network?
… you want to make sure your app can withstand edge cases?
… you want to test your app components against a variety of inputs?


Four members of Team Distinction (Phillip Wheatley, Tamas Flamich, Zsolt Varnai and Peter Adam Wiesner) came together and created Dixie to help developers find an effective answer to these questions. The story begins here.


The problem

Today’s development teams have to create more and more complex software in less and less time to fulfill the needs of clients and investors. Here at Distinction, we also face this challenge while creating mobile apps across multiple platforms. We believe that being able to react immediately to constantly changing requirements without sacrificing the perfect user experience is a key element in the product development cycle.

As our teams scaled out, it became increasingly difficult for one developer to understand the entire codebase and visualize the impact a new modification could cause. This increased the risk of unexpected side effects and mysterious crashes. Ideally we would like to build an app that can also handle any situation the developers didn’t expect during the creation process.

One possible solution is a very high level of unit test coverage (approaching 100%) across the code base. The side effect is that the team reacts to change far more slowly, and slow reactions can be extremely damaging to the development cycle (not to mention that 100% code coverage can, in some cases, be almost impossible to achieve). Alternatively, identifying the most critical parts of the code and testing them thoroughly can add great value to your development efforts.

Our idea for the Distinction Research Lab project was to create a tool, dubbed Dixie, which helps developers verify that the critical parts of an application can gracefully handle a wide variety of common malfunctions and unexpected behaviors.

The solution

The original inspiration for Dixie came from Peter Adam Wiesner, who read an article about a tool called Chaos Monkey, written by Netflix backend developers.

Chaos Monkey is a service which identifies groups of systems and randomly terminates one of the systems in a group.

Basically, Chaos Monkey allows developers to attack their own system to an extent which helps highlight potential weaknesses. Knowing of and reacting to these weaknesses helps increase long-term quality and builds confidence in the stability of the system. We thought a tool like this could be just as useful for our existing projects. Rather than a system of servers, our tool targets code components and modifies their behavior.

An application written in an object-oriented language can be visualized as similar to a network of servers communicating with each other. Just as a server can go down, a component of code can also start behaving incorrectly, which can then affect all the other components which are reliant upon it. This component could have all kinds of responsibilities, such as handling network communication, providing location information, managing user data or loading files from the file system.

A generally acceptable result in this scenario would be for the system to degrade gracefully, and recover from the error while minimizing the amount of harm to user experience. Ideally, the system will not continue to propagate errors, and should definitely not crash completely.

Dixie

Dixie can be thought of as a chaos generator: it causes specified components to function differently and, in a manner similar to Chaos Monkey, helps simulate worst-case scenarios. A developer using this tool can deliberately attempt to break the app. If the attempt succeeds and the app does not handle the breakage gracefully, that is a clear sign the code needs changes to increase its fault tolerance.

Dixie achieves this in a very simple way. The developer specifies the components and methods that should be changed, and chooses how they should be changed. We call this set of developer specifications a ‘profile’. A profile can then be applied in a single line of code, which causes Dixie to rewire the internal structure of the application.

For example, one such profile causes the method URLWithString on the NSURL component to always return nil. During our testing of Dixie this turned out to be a very useful scenario, so we created a predefined profile for it.

Changes can be reverted at any time, which gives developers a great deal of control over how and where they apply Dixie. For the proof-of-concept prototype we chose Objective-C, because it allowed us to make many low-level changes during program execution.
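To make the apply-and-revert idea concrete, here is a minimal sketch in Python. It is not Dixie's API: Dixie is written in Objective-C, and the names Profile, UrlFactory and always_none are hypothetical, chosen only to mirror the profile concept described above.

```python
class Profile:
    """Hypothetical stand-in for a profile: (owner class, method name, replacement) entries."""

    def __init__(self, entries):
        self.entries = entries
        self._originals = []

    def apply(self):
        # Swap each target method for its replacement, remembering the original.
        # Assumes the method is defined directly on the owner class.
        for owner, name, replacement in self.entries:
            self._originals.append((owner, name, owner.__dict__[name]))
            setattr(owner, name, replacement)

    def revert(self):
        # Restore in reverse order so repeated entries unwind correctly.
        while self._originals:
            owner, name, original = self._originals.pop()
            setattr(owner, name, original)


class UrlFactory:
    """Hypothetical component, loosely analogous to NSURL's URLWithString."""

    def url_with_string(self, text):
        return "url:" + text


def always_none(*args, **kwargs):
    # Replacement behavior: ignore every argument and return None.
    return None


profile = Profile([(UrlFactory, "url_with_string", always_none)])
factory = UrlFactory()

profile.apply()
broken = factory.url_with_string("https://example.com")    # altered behavior
profile.revert()
restored = factory.url_with_string("https://example.com")  # original behavior
```

While the profile is applied, every caller of url_with_string receives None; after revert the original result comes back, so the change can be scoped to exactly the part of a test session that needs it.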

Live test

In a live test scenario, the application could be running on a simulator or on a real device. The app can be interacted with manually or with the help of UI automation tools. The testing session should prove whether or not the application provides the expected level of functionality and fault tolerance when the modified behavior is applied.

Unit test

Specific components can also be tested in a unit test scenario to see how they behave when their dependencies start behaving incorrectly. Dixie is able to run the tested method multiple times, providing a sequence of randomized input parameters. This can be extremely useful when we have no control over the inputs, or when we expect the method to provide a consistent result even when it receives unexpected inputs. Methods can also be called with predefined parameters, and the results can be validated thoroughly by the developer, who can then fix any broken behavior and write a unit test to ensure there are no regressions in the future.
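The randomized-input idea can be sketched as follows. This is an illustration in Python under stated assumptions, not how Dixie implements it: parse_age, age_contract, random_input and run_randomized are hypothetical names, and the input generator is deliberately simple.

```python
import random
import string


def parse_age(text):
    """Hypothetical method under test: returns an age as int, or None for bad input."""
    try:
        age = int(text)
    except (TypeError, ValueError):
        return None
    return age if 0 <= age <= 150 else None


def age_contract(result):
    # The result must be None or a plausible age.
    return result is None or 0 <= result <= 150


def random_input(rng):
    # A mix of plausible and hostile values: None, empty, numbers, junk, huge digits.
    choices = [
        None,
        "",
        str(rng.randint(-1000, 1000)),
        "".join(rng.choice(string.printable) for _ in range(rng.randint(1, 20))),
        "9" * rng.randint(1, 40),
    ]
    return rng.choice(choices)


def run_randomized(method, is_valid, runs, seed):
    """Call `method` with `runs` random inputs; collect inputs that raised or broke the contract."""
    rng = random.Random(seed)  # a fixed seed keeps a failing run reproducible
    failures = []
    for _ in range(runs):
        value = random_input(rng)
        try:
            result = method(value)
        except Exception as exc:
            failures.append((value, exc))
            continue
        if not is_valid(result):
            failures.append((value, result))
    return failures


failures = run_randomized(parse_age, age_contract, runs=500, seed=42)
```

Seeding the generator is the design choice that matters here: any input that breaks the method can be replayed, turned into a predefined parameter, and kept as a regression test.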

Creating chaos

In practice this altered behavior is achieved by replacing original method implementations with modified ones. This modified behavior is implemented by ‘chaos provider’ components that create the desired result and return it to the caller.

The most basic example is the nil chaos provider, which returns nil every time it is called. Expected and clearly invalid values are quickly spotted during development, but nil values are often overlooked. Using this chaos provider we can easily identify whether or not a component handles nil values effectively, and this helps minimize the number of potential bugs appearing in production code.
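As a rough illustration of what a nil provider reveals, the Python sketch below uses the standard library's unittest.mock.patch.object to replace a method temporarily. ProfileStore, greeting_fragile, greeting_safe and nil_provider are hypothetical names, and Python's None stands in for Objective-C's nil.

```python
from unittest import mock


class ProfileStore:
    """Hypothetical dependency that loads a user's display name."""

    def load_name(self, user_id):
        return "user-%d" % user_id


def greeting_fragile(store, user_id):
    # Assumes load_name never returns None.
    return "Hello, " + store.load_name(user_id).title()


def greeting_safe(store, user_id):
    # Falls back to a default when the dependency returns None.
    name = store.load_name(user_id)
    return "Hello, " + (name.title() if name else "guest")


def nil_provider(*args, **kwargs):
    """Chaos provider sketch: ignore the arguments and always return None."""
    return None


store = ProfileStore()

# Replace the real implementation for the duration of the block only.
with mock.patch.object(ProfileStore, "load_name", nil_provider):
    try:
        greeting_fragile(store, 7)
        fragile_survived = True
    except AttributeError:
        fragile_survived = False
    safe_result = greeting_safe(store, 7)

after = greeting_safe(store, 7)  # original behavior is back
```

The fragile caller fails as soon as its dependency returns None, while the defensive one degrades to a default greeting. That is exactly the kind of difference this provider is meant to surface.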

We created several other built-in providers which can, for example, throw exceptions or return specific results based on the method input parameters. You can easily create your own custom chaos provider implementations, using these predefined ones as templates. For example, you could create a sequential chaos provider, which simulates an unreliable network connection by providing the expected (and valid) result only after a number of failed attempts.
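A sequential provider of the kind described above might look like this in Python. It is a sketch under assumptions: SequentialProvider, fetch and fetch_with_retry are hypothetical, and the provider is passed in directly rather than swapped into a class, to keep the example short.

```python
class SequentialProvider:
    """Sketch: fail for the first `failures` calls, then delegate to the original."""

    def __init__(self, original, failures, error=ConnectionError):
        self.original = original
        self.remaining = failures
        self.error = error

    def __call__(self, *args, **kwargs):
        if self.remaining > 0:
            self.remaining -= 1
            raise self.error("simulated network failure")
        return self.original(*args, **kwargs)


def fetch(url):
    """Hypothetical network call; stands in for a real request."""
    return "payload from " + url


def fetch_with_retry(fetcher, url, attempts):
    # Code under test: retry on connection errors; `attempts` must be at least 1.
    last_error = None
    for _ in range(attempts):
        try:
            return fetcher(url)
        except ConnectionError as exc:
            last_error = exc
    raise last_error


flaky = SequentialProvider(fetch, failures=2)
result = fetch_with_retry(flaky, "https://example.com/data", attempts=3)
```

With two simulated failures and three attempts, the retry logic succeeds on the last try. Lowering attempts to two makes the same code raise ConnectionError, which shows whether the caller's retry budget matches the failure pattern being simulated.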

Examples

Dixie allows you to create your own tools, from the simplest custom profiles to complex patterns of behavior. Some examples that are easy to implement:

  • Altering the GPS coordinates, dates or times returned by the operating system.
  • Replacing localization strings to test label truncations without polluting production code.
  • Simulating missing properties in your data objects.
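The first of these ideas, altering the time reported by the system, can be illustrated in Python with unittest.mock.patch from the standard library. This only shows the concept and is not Dixie code; is_expired and the chosen clock values are hypothetical.

```python
import time
from unittest import mock


def is_expired(expires_at):
    """Hypothetical app logic that depends on the system clock."""
    return time.time() >= expires_at


expires_at = time.time() + 3600  # valid for one hour from now

fresh = is_expired(expires_at)

# Pretend the device clock jumped a day past the expiry.
with mock.patch("time.time", return_value=expires_at + 86400):
    after_jump = is_expired(expires_at)

# Pretend the clock was reset to the epoch.
with mock.patch("time.time", return_value=0.0):
    after_reset = is_expired(expires_at)
```

The clock reset case is the interesting one: the token looks valid again, which may or may not be what the application should accept.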

Future

What can Dixie add to the life of a software engineer?

We believe this tool can give you greater confidence in your code, so you can begin to trust the stability and security of your applications. You can easily create test cases that, for many components, would otherwise require modifying production code. You can also test your code without having to re-architect your entire application to make it easily testable.

During this Research Lab project we created the fundamental architecture of Dixie and defined a clean top-level API that makes it possible to implement all of the above-mentioned examples. In the future, it could be useful to examine how the current solution could be implemented on other platforms. Providing more advanced APIs would be another way to broaden the concept. This could allow us to easily apply systematically reproducible and randomizable changes to the code.

Creating Dixie was an interesting adventure for the team. In the past, we have all suffered from crash reports that are either not reproducible or involve difficult-to-test edge cases. We all hope that Dixie can help solve these issues in a much more productive and effective way.

So, developers, once we go public, if you want a way to quickly secure your apps against edge cases or unexpected scenarios, why not give Dixie a chance?

Tamas Flamich, Zsolt Varnai, Phillip Wheatley and Peter Adam Wiesner