Behavior-Driven Penetration Testing

Published in neXenio · 7 min read · Jan 21, 2019

“Hey, what happens if I put someone else’s user ID here?” — a similar thought has likely come to many software developers during their everyday routine. In this situation a developer has two options: either they continue the task they are currently working on and probably forget about the question they had in mind, or they go out of their way to check if their intuition is correct. Obviously, we want the developer to check. If their suspicion is correct the server might just grant unauthorized access to a resource, which means we are looking at a severe security flaw.

The scenario above exemplifies why it is important to make it as easy as possible to test whether a suspected security issue is present. At neXenio, we adapted the Behavior-Driven Development (BDD) approach to conceive penetration tests and execute them on our software. The setup we created allows us to build penetration tests very quickly and integrates well with the rest of our testing infrastructure.

In this article we want to present the infrastructure we built. We will detail the requirements we had, our design decisions arising from them, and discuss the system we implemented.

Requirements for a Penetration Testing Infrastructure

In order to explain how we arrived at the system we implemented we must briefly consider our environment at neXenio: The software we want to penetration test is a classic client-server application based on a REST architecture. It has, however, high demands regarding security. We employ end-to-end encryption and a zero-knowledge principle in order to ensure the client does not have to trust the server. But we don’t want our server to blindly trust the client either.

Organizationally, we adhere to an agile process and methodology. Our project is not driven by developers alone, but involves many other roles, such as product owners, designers, and QA. People in these roles should face a low barrier, too, when they come up with an idea for a potential attack vector.

We illustrated in the beginning why it is important that a software developer can quickly check if a suspected security issue is present in the system. While this could be facilitated with a simple collection of Postman scripts, the requirements above motivated us to take it a few steps further than that. On a technical level, the penetration testing infrastructure we build should act like a malicious client in order to verify that such a client cannot do any harm. It should be employed to test the entire system and the interplay of the components it is made of. Moreover, it must be integrated into our CI infrastructure: We should be able to quickly create a test case that serves as an automated and reproducible indicator for security issues. The test case should run quickly and often, and should instantly show the result in “all tests passed”-green or “uhh, there’s an issue”-red.

BDD with Python Behave

With the requirements specified in the previous section we have already established that we want to write tests in the sense of unit and integration tests. They will give us reproducible results that integrate well with our CI, which can automatically trigger them and display the results. Consequently, it only makes sense to base our penetration testing infrastructure on an existing testing framework.

Considering the non-technical requirement that not only developers should be able to specify test cases, Behavior-Driven Development is the perfect fit. BDD relies on test cases specified in natural language, based on the Gherkin syntax. This allows describing test cases without requiring knowledge of either the programming language we use or the tooling we create.

We found the Python framework Behave to suit our needs best. The Python programming language is a great language for scripting and many developers at neXenio are familiar with it. Furthermore, Python offers the excellent requests library. This is very handy: given the task to simulate a malicious client, one of the framework’s main responsibilities will be to forge and submit REST requests.

Behavior-Driven Penetration Testing

In this section we want to show you how we use Behave to address our individual requirements for a penetration testing infrastructure. Starting at the top, the business description of a (simplified, imaginary) flaw looks something like this in Gherkin syntax:
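A scenario consistent with this description might look like the sketch below. The feature text is a hypothetical reconstruction from the step phrases and tags quoted in this article, so the real scenario may well differ. To keep the example runnable, the text is held in a Python string, and a small helper (purely illustrative, not part of Behave) lists the tags and steps it contains.

```python
# Hypothetical feature text, assembled from the step phrases and tags
# quoted in this article; the feature and scenario titles are assumptions.
FEATURE = '''\
Feature: Profile access control

  @critical @fixed @NEXAMPLE-1234
  Scenario: A user cannot read another user's profile
    Given "Alice" is a registered user
    And "Bob" is a registered user
    Then "Alice" cannot access the profile of "Bob"
'''

STEP_KEYWORDS = ("Given", "When", "Then", "And", "But")


def outline(feature_text):
    """Split a feature text into its tags and its step lines (illustration only)."""
    tags, steps = [], []
    for raw in feature_text.splitlines():
        line = raw.strip()
        if line.startswith("@"):
            # A tag line may carry several whitespace-separated tags.
            tags.extend(line.split())
        elif line.split(" ", 1)[0] in STEP_KEYWORDS:
            steps.append(line)
    return tags, steps


tags, steps = outline(FEATURE)
print(tags)
for step in steps:
    print(step)
```

In a Behave project the text inside the string would normally live in its own `.feature` file; by default Behave looks for such files in a `features` directory.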

This description does not differ from any BDD scenario specification as it would be used in integration testing. Of course, instead of only "Alice" cannot access the profile of "Bob" we would also want to test "Alice" can access the profile of "Alice". However, compared to implementing this check directly as a part of our back-end's test infrastructure, developers need much less background knowledge about the internals of the back-end. The back-end can largely be treated as a black box and the tester can concentrate on playing the evil client. In our specific case at neXenio this helped a lot to encourage our front-end developers to specify scenarios and implement the steps.

Speaking of which, as usual in BDD frameworks, Behave expects step implementations underpinning the scenario description above. For example, the final check in the scenario shown above could be implemented similarly to this:
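A minimal sketch of such a step, under several assumptions: the user objects live in `context.users`, each exposes a `user_id` attribute and a `get_profile()` method, and the back-end rejects unauthorized reads with HTTP 403 or 404. These names and status codes are illustrative, not taken from the actual code base. The `StubUser` class only exists so the step can be exercised without a server.

```python
from types import SimpleNamespace

try:
    from behave import then  # registers the step when Behave is installed
except ImportError:
    # Fallback so the sketch also runs where Behave is not installed.
    def then(pattern):
        def register(func):
            return func
        return register


@then('"{attacker}" cannot access the profile of "{victim}"')
def step_cannot_access_profile(context, attacker, victim):
    # Assumption: a setup hook filled context.users before the scenario.
    attacker_user = context.users[attacker]
    victim_user = context.users[victim]
    response = attacker_user.get_profile(victim_user.user_id)
    # Assumption: unauthorized reads are answered with 403 or 404.
    assert response.status_code in (403, 404), (
        f"{attacker} could read the profile of {victim} "
        f"(HTTP {response.status_code})"
    )


class StubUser:
    """Stand-in for the real user class; answers without network traffic."""

    def __init__(self, user_id):
        self.user_id = user_id

    def get_profile(self, profile_id):
        # Only the owner may read a profile in this stub.
        status = 200 if profile_id == self.user_id else 403
        return SimpleNamespace(status_code=status)


context = SimpleNamespace(users={"Alice": StubUser(1), "Bob": StubUser(2)})
step_cannot_access_profile(context, "Alice", "Bob")  # passes silently
```

A failing assertion is all Behave needs to mark the step, and with it the scenario, as failed.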

The implementation of accessing a user profile can be nicely encapsulated in the user class.
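One way such a user class could be laid out is sketched below. The URL layout, the bearer-token header, and the constructor parameters are assumptions made for illustration. The class accepts any session object with a `get()` method; a `requests.Session()` would fit, while the `RecordingSession` here records the request instead of sending it so the example runs offline.

```python
from types import SimpleNamespace


class User:
    """A scriptable client identity that issues REST requests on its own behalf."""

    def __init__(self, name, user_id, token, session,
                 base_url="https://staging.example.invalid/api"):
        self.name = name
        self.user_id = user_id
        self.token = token          # hypothetical token obtained at login
        self.session = session      # e.g. requests.Session(); needs .get()
        self.base_url = base_url    # placeholder, not a real endpoint

    def get_profile(self, profile_id):
        """Request any profile ID, including ones this user does not own."""
        return self.session.get(
            f"{self.base_url}/users/{profile_id}/profile",
            headers={"Authorization": f"Bearer {self.token}"},
        )


class RecordingSession:
    """Fake session that records requests instead of sending them."""

    def __init__(self):
        self.calls = []

    def get(self, url, headers=None):
        self.calls.append((url, headers))
        return SimpleNamespace(status_code=403)


session = RecordingSession()
alice = User("Alice", user_id=1, token="alice-token", session=session)
response = alice.get_profile(2)  # Alice asks for Bob's profile
print(session.calls[0][0], response.status_code)
```

Because the class does not check whether the requested ID belongs to the caller, a step can use it to play the malicious client simply by passing someone else's ID.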

Maybe you noticed the users dictionary that appears somewhat magically in the snippet above. Admittedly, the code is simplified a little bit for readability, but it is a good example of how the BDD approach helps us to implement test cases more quickly. Behave allows us to specify fixtures to run before and after scenarios. We use them to automate test setup and tear-down tasks. Before each scenario, for example, user objects are created and placed in the users dictionary. Similar to the step spelled out above, the "[...] is a registered user" steps will reach into this dictionary for the appropriate user object and make sure the user is registered. After each scenario, users and other artifacts of our tests are removed again so the system is left in a clean state. As a result, the code for test setup and tear-down can easily, almost implicitly, be reused.
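The setup and tear-down described here can be sketched with Behave's `before_scenario` and `after_scenario` hooks, which Behave picks up from an `environment.py` file. Everything else in the sketch is hypothetical: the in-memory `FakeBackend`, the `User` methods, and the fixed list of names stand in for the real system and are not taken from the actual project.

```python
from types import SimpleNamespace


class FakeBackend:
    """In-memory stand-in for the system under test (hypothetical)."""

    def __init__(self):
        self.registered = set()


class User:
    def __init__(self, name, backend):
        self.name = name
        self.backend = backend

    def ensure_registered(self):
        self.backend.registered.add(self.name)

    def delete(self):
        self.backend.registered.discard(self.name)


BACKEND = FakeBackend()
KNOWN_NAMES = ("Alice", "Bob")  # assumption: names used across scenarios


def before_scenario(context, scenario):
    # Hook run by Behave before each scenario: create fresh user objects.
    context.users = {name: User(name, BACKEND) for name in KNOWN_NAMES}


def after_scenario(context, scenario):
    # Hook run by Behave after each scenario: leave the system clean.
    for user in context.users.values():
        user.delete()
    context.users = {}


def step_is_registered_user(context, name):
    # Body of a '"{name}" is a registered user' step (decorator omitted).
    context.users[name].ensure_registered()


# Simulate one scenario run without Behave.
context = SimpleNamespace()
before_scenario(context, scenario=None)
step_is_registered_user(context, "Alice")
print(sorted(BACKEND.registered))
after_scenario(context, scenario=None)
print(sorted(BACKEND.registered))
```

Since the hooks run around every scenario, a new scenario gets its users and its clean-up without any extra code in the feature file.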

The last pieces left to explain from the scenario description are the @critical, @fixed and @NEXAMPLE-1234 tags. Above all, these tags allow us to control test execution, running specific tests we are interested in and excluding others. For example, we could check only for security issues classified as "critical", or run regression tests on issues that are already "fixed". In addition, we use this mechanism to link test cases to bug reports in our issue tracking system, in case the test turned out to reveal an actual issue.
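On the command line, Behave selects scenarios by tag through its `--tags` option, for example `behave --tags=critical`. How tags could be mapped to bug reports is sketched below; the issue-key pattern and the tracker URL are assumptions for illustration, and the helpers are not part of Behave.

```python
import re

# Assumption: issue keys look like PROJECT-1234.
ISSUE_KEY = re.compile(r"^[A-Z][A-Z0-9]*-\d+$")
# Hypothetical tracker address, used as a placeholder only.
TRACKER_URL = "https://tracker.example.invalid/browse/{key}"


def normalize(tags):
    """Drop the leading '@' so tags compare equal with or without it."""
    return [tag.lstrip("@") for tag in tags]


def issue_links(tags):
    """Return one tracker link per tag that looks like an issue key."""
    return [TRACKER_URL.format(key=tag)
            for tag in normalize(tags) if ISSUE_KEY.match(tag)]


def has_tag(tags, wanted):
    """True if the scenario carries the wanted tag (single-tag filter)."""
    return wanted.lstrip("@") in normalize(tags)


scenario_tags = ["@critical", "@fixed", "@NEXAMPLE-1234"]
print(issue_links(scenario_tags))
print(has_tag(scenario_tags, "critical"), has_tag(scenario_tags, "@wip"))
```

A helper like `issue_links` could be called from a reporting hook so that a red test points straight at the related bug report.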

Discussion of Our Experiences

To get our new system up to speed, we created a dedicated team to implement the infrastructure in a hands-on manner. We figured the best way to implement the system would be to go full BDD: The “Pentesting Team” would come up with a test scenario, implement the steps required as well as the infrastructure to support it, come up with another test, and iterate. This approach worked very well, and even during the initial phase of building the system we were able to detect flaws, as developers were now given the opportunity to try attack vectors they had already had in mind for a while.

As for the question of who specifies the scenarios, we have not yet encouraged our product owners to participate in creating scenarios quite as much as we hoped. Most of the scenarios were specified by the same team that then implemented the underlying steps. However, decoupling the penetration testing infrastructure from the back-end’s integration test infrastructure allows a very diverse mix of developers to participate. The “Pentesting Team” is very popular, and roughly every two weeks new pentesters join (or leave). Developers across all components (back-end, front-end, desktop client) have already gotten their hands dirty with the tool.

Thanks to this diverse mix we also discovered that the tooling can easily be adjusted for other use cases. The scriptable client we created is not only employed to try to be malicious, but also to simulate regular user activity on our staging environments for load tests.

A disadvantage of our approach is that it creates some code duplication between the “real” client and the penetration testing tool. Although the malicious client tries to do many things the real client obviously should never do, and the real client has a lot of state and logic that are not needed for penetration testing, some functionality (such as the login flow) is duplicated. We hope to address this in the future, at least to some extent, with automatic code generation. We already generate part of the real client’s network code based on a Swagger description of the API. In the future we might be able to extend the amount of generated code in order to apply this to the penetration testing client as well.

Conclusion

We set out to minimize the time from anyone in the project thinking “What happens if…” to a flashing red light indicating a security issue. We discovered that it is important to create tooling that is accessible to a wide range of roles in the project, and that we can integrate the penetration testing infrastructure with our existing CI infrastructure. Making use of Behavior-Driven Development, we were able to adapt existing tools to suit our specific needs and found a development process that works very well for us. This way, we were able to address concrete issues in our software and encouraged developers with various technological backgrounds to participate in this endeavor to make the software we produce safer.

Are you interested in creating safer software with us? We are always looking for motivated engineers. Get in touch at nexenio.com.

Written by hrantzsch