Applying Systemic Threat Modelling to a Complex System

Published in

Nationwide Technology

11 min read · Aug 8, 2022

1- Introduction

At Nationwide Technology, security is at the heart of every service we provide for our members. We apply a systemic approach to study our IT systems, identify vulnerabilities, implement the controls required to mitigate risks, and develop trustworthy systems.

In my previous article, I described a method for building a view of complex systems for threat modelling. In this post, I will perform a threat modelling exercise on a real-world system. The exercise aims to show how the proposed method can be applied in practice and to help software engineers adopt the approach more efficiently in their development teams. By the end of this article, you should see how performing this exercise within your team gives you a holistic picture of the system and helps you embed security controls in the application.

In section 2, I’ll define an example and go through the threat modelling activity in the subsequent sections.

It is worth mentioning that this article aims to show the application of a methodology to a real-world system. Therefore, it focuses on the procedure, and the list of identified threats is not comprehensive and only contains some examples.

2- Introducing an Example

The diagram below shows the high-level view of a real-world system. The actual names are obfuscated, but that shouldn’t adversely affect the exercise. As mentioned in the previous article, we’re performing the threat modelling after the design has been completed, which is why we’re dealing with a finalised design.

Example Application — High Level Architecture

At first glance, it might not be obvious how to start or progress with the exercise. This article aims to demonstrate a practical method that can be applied to similar enterprise applications.

You might notice that, at the highest level, the components can be categorised into four logical sets:

A client application
A collection of microservices that implement the business logic and are developed by the development team
Several shared enterprise services that are used by multiple parts of the business
An array of external Software-as-a-Service offerings that the company uses in its application

3- Grouping

Grouping the components by deployment environment gives us a clearer view of the system and helps us start the threat modelling.
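As a minimal sketch of this grouping step, the snippet below sorts a small component inventory by deployment environment. The component names loosely follow this article, but the environment assigned to each one is an assumption made for illustration and is not taken from the architecture diagram.

```python
from collections import defaultdict

# Hypothetical inventory: the environment given to each component is an
# assumption for illustration, not a fact from the architecture diagram.
COMPONENTS = [
    ("client-app", "client"),
    ("api-proxy", "public-cloud"),
    ("microservice-G", "public-cloud"),
    ("microservice-F", "public-cloud"),
    ("shared-enterprise-service", "on-premise"),
    ("Ext-1", "external"),
]


def group_by_environment(components):
    """Return a mapping of deployment environment -> sorted component names."""
    groups = defaultdict(list)
    for name, environment in components:
        groups[environment].append(name)
    return {env: sorted(names) for env, names in groups.items()}


GROUPS = group_by_environment(COMPONENTS)
```

Each key of `GROUPS` then becomes one element of the highest abstraction level, and each pair of keys is a boundary whose interactions need to be studied.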

In the next step, we define the threats in the highest level of abstraction and design suitable constructs to mitigate the identified risks. The exercise should list the required controls at a high level and define the components at the architecture level.

4- Abstraction Levels

Now, we start at the highest level of abstraction and study the components and their interactions. This abstraction layer has four main elements: client application, public cloud environment, on-premise environment and external services. We should focus on each element and its integration with other components to identify the threats and design the mitigation controls.

Let’s focus on the client application, the cloud environment, and the interaction between them. Below are some of the threats:

Tampering of the messages in the communication channel
Spoofing of the backend APIs
Denial of Service attack on the exposed APIs

A robust authentication mechanism is needed to mitigate the risk of spoofing on the backend API. Other controls, such as API throttling, are required to minimise the risk of DoS attacks on the APIs. We might decide to use an API proxy to implement these policies. In an enterprise application, architecture decisions are made in forums to ensure rigorous consideration of all aspects of the system, including performance, cost and scalability. Let’s assume we decided to use an API proxy and selected Apigee for this application. In this case, we need to outline the available features that enable us to implement the controls:

Configuration of the TLS connection between client and server
Registration of client apps and use of JWT for authorisation
Use of the Apigee SpikeArrest policy for API throttling
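To make the throttling control more concrete, here is a small sketch of the smoothing idea behind a spike-arrest style limiter: a call is rejected if it arrives sooner than the minimum interval after the last accepted call. This is a simplified illustration, not Apigee's implementation; in Apigee the policy is configured declaratively, and the class and method names below are hypothetical.

```python
class SpikeArrestSketch:
    """Reject calls that arrive too soon after the last accepted call."""

    def __init__(self, rate_per_second: float):
        if rate_per_second <= 0:
            raise ValueError("rate_per_second must be positive")
        # A rate of 2 per second is smoothed into one call every 0.5 seconds.
        self.min_interval = 1.0 / rate_per_second
        self._last_allowed = None

    def allow(self, now: float) -> bool:
        """Return True if a call at time `now` (in seconds) is accepted."""
        if self._last_allowed is None or now - self._last_allowed >= self.min_interval:
            self._last_allowed = now
            return True
        return False


limiter = SpikeArrestSketch(rate_per_second=2)
decisions = [limiter.allow(t) for t in (0.0, 0.25, 0.5, 0.6, 1.0)]
```

Timestamps are passed in explicitly so that the behaviour is deterministic and easy to test. A real deployment would rely on the proxy's own clock and shared state.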

At this level, we don’t define the details of the controls. For example, the choice between JWS and JWE, the token claims, the signing algorithm and the TLS cipher suites are not described here; they will be studied when we move to the next level of abstraction (vertical move). It is crucial to continue the exercise and not leave the threat modelling at this level. Only a secure configuration of those details can ensure the effectiveness of the controls designed for the application.

The network segmentation of the public cloud infrastructure is another example of the controls that can be defined at this level and should be refined as we progress in the exercise.

5- Defining System Boundaries

Once we have completed our initial study of the system in the previous abstraction, we’ll move to the next abstraction level.

In this stage, we’ll define a set of user stories. A user story is a single piece of system functionality described at a particular abstraction layer. At the highest abstraction level, the user stories might be similar to the features visible to the system’s end users. As we move vertically to the lower abstraction levels, the user stories contain finer-grained details and describe the system’s internals, so they might not be visible to the system’s users. You can see some examples in the diagram below. TM01-TM09 are shown for illustration purposes.

There is more than one way to define system boundaries. The critical point is to keep the models as simple as possible: they should contain enough elements to construct a user story, but not so many that they become hard to study and maintain.

This method should ideally be applied to new features in the early stages of development to enable a secure-by-design delivery. However, when applied to an existing design, as in this exercise, it can still help to gain a systemic view of the application and provide remediation actions to improve the security posture.

The next step is to move horizontally between the defined user stories and study them individually. In some cases, we might decide to dive deeper to investigate the internals of a particular user story. I’ll show an example of those investigations later in this article.

In my previous article, I showed how to construct a threat model for each of the above user stories, so I focus only on the example here. The exercise is an application of a methodology Matin Mavaddat introduced here.

6- Studying TM04

Microservice-G takes photos of a government-issued document and a selfie of the user and passes them to Ext-1 for verification. The external service validates the document and checks whether the person pictured in the selfie matches the document. Ext-1 also extracts some metadata from the document (e.g. name, address, date of birth).

6–1- Scope

User story: We want to enable the client application to call the microservice-G and perform ID verification.

Components

Client Application
API proxy: A reverse proxy for microservice-G
Microservice-G: The microservice responsible for orchestrating the ID verification
Microservice-F: The microservice responsible for the cryptographic operations
Ext-1: An external service that performs the ID check

Actors:

Developer
Platform Engineer
Ext-1 admin

6–2- Interaction between system’s components

The graph below shows the interaction between the components defined in the previous section. You might notice that a few elements in the diagram are not listed above. That’s intentional, to ensure we focus on the scope of the user story. We’ll focus on those components in other user stories.

6–3- The flow of data in the system

After defining the components, their communication channels and the users, we capture our system’s data flow. The data flow diagram should show:

Components
Communication channels
The message format and content
The protocol that is used in communication
The interfaces of each component

You might notice that some components have more than one interface. In this example, Ext-1 exposes both an API and a GUI. It is essential to capture them in the exercise, as each interface is an attack surface that requires separate controls.
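The elements of the data flow diagram can also be captured as data, which makes it easy to list every interface as an attack surface and to spot interfaces that no modelled flow touches. The sketch below is one possible representation; the interface names ("REST API", "Admin GUI"), the HTTPS protocol and the message descriptions are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Interface:
    name: str      # hypothetical label, e.g. "REST API" or "Admin GUI"
    protocol: str  # assumed protocol, e.g. "HTTPS"


@dataclass
class Component:
    name: str
    interfaces: list = field(default_factory=list)


@dataclass(frozen=True)
class Flow:
    source: str
    target: str
    interface: str  # interface of the target that receives the message
    message: str


def attack_surfaces(components):
    """Every (component, interface) pair is a separate attack surface."""
    return [(c.name, i.name) for c in components for i in c.interfaces]


def unmodelled_interfaces(components, flows):
    """Interfaces that no captured flow reaches, so they still need review."""
    used = {(f.target, f.interface) for f in flows}
    return [s for s in attack_surfaces(components) if s not in used]


COMPONENTS = [
    Component("Microservice-G", [Interface("REST API", "HTTPS")]),
    Component("Ext-1", [Interface("REST API", "HTTPS"), Interface("Admin GUI", "HTTPS")]),
]
FLOWS = [
    Flow("API proxy", "Microservice-G", "REST API", "ID verification request"),
    Flow("Microservice-G", "Ext-1", "REST API", "document photo and selfie"),
]
GAPS = unmodelled_interfaces(COMPONENTS, FLOWS)
```

With this sample data, `GAPS` contains only the GUI of Ext-1, which is a reminder that the interface used by the Ext-1 admin needs its own controls.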

6–4- Identifying vulnerabilities

Once we have visibility of the system’s components and the data flow between them, we can identify the threats.

Threat: Exposure of the API key in the communication channel between microservice G & F
Component: Microservice-G & microservice-F communication channel
STRIDE: Information Disclosure
Mitigation Control: Revisit the design of the API. The API design shouldn’t include the API key in the URL! A better design would carry the API key in the “Authorization” header, which RFC 7235 defines for carrying credentials: https://www.rfc-editor.org/rfc/rfc7235
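A minimal sketch of this mitigation, using only the Python standard library, is shown below. The endpoint URL and the "ApiKey" scheme name are hypothetical; the actual scheme depends on what the receiving service supports.

```python
from urllib.request import Request


def build_request(url: str, api_key: str) -> Request:
    """Build a request that carries the API key in a header, not in the URL."""
    request = Request(url, method="POST")
    # "ApiKey" is an assumed scheme name used for illustration only.
    request.add_header("Authorization", f"ApiKey {api_key}")
    return request


# Hypothetical internal endpoint; no network call is made here.
example = build_request("https://microservice-f.internal.example/sign", "s3cr3t-key")
```

URLs tend to end up in access logs, proxies and monitoring tools, while header values are less commonly logged by default, which is why moving the key reduces the chance of accidental disclosure.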

Threat: A compromise in the Ext-1 system will expose sensitive personal data
Component: Ext-1
STRIDE: Information Disclosure
Mitigation Control: Obtain assurance from Ext-1 on the secure storage of sensitive data, and delete the sensitive data after ID verification. Ext-1 offers a deletion API to remove customer-sensitive data after verification is completed. This API can be used to reduce the exposure if Ext-1 is compromised.

Threat: A malicious user with access to the API token can use it to delete data before verification is completed
Component: Ext-1
STRIDE: Denial of Service
Mitigation Control: Implement authorisation, e.g. different tokens for deletion and retrieval
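Recording the findings in a structured form makes them easier to filter and track across user stories. The sketch below captures the three threats above in a small register; the class and field names are hypothetical and the wording of each entry is abbreviated from the text.

```python
from dataclasses import dataclass
from enum import Enum


class Stride(Enum):
    SPOOFING = "Spoofing"
    TAMPERING = "Tampering"
    REPUDIATION = "Repudiation"
    INFORMATION_DISCLOSURE = "Information Disclosure"
    DENIAL_OF_SERVICE = "Denial of Service"
    ELEVATION_OF_PRIVILEGE = "Elevation of Privilege"


@dataclass(frozen=True)
class Threat:
    description: str
    component: str
    categories: tuple
    mitigation: str


THREATS = [
    Threat(
        "API key exposed in the channel between microservice G and F",
        "Microservice-G & microservice-F communication channel",
        (Stride.INFORMATION_DISCLOSURE,),
        "Move the API key from the URL to the Authorization header",
    ),
    Threat(
        "Compromise of Ext-1 exposes sensitive personal data",
        "Ext-1",
        (Stride.INFORMATION_DISCLOSURE,),
        "Vendor assurance and deletion of data after verification",
    ),
    Threat(
        "Token holder deletes data before verification completes",
        "Ext-1",
        (Stride.DENIAL_OF_SERVICE,),
        "Separate tokens for deletion and retrieval",
    ),
]


def threats_for(component, threats):
    """Return the threats recorded against a given component."""
    return [t for t in threats if t.component == component]


def by_category(category, threats):
    """Return the threats that fall under a given STRIDE category."""
    return [t for t in threats if category in t.categories]
```

For example, `threats_for("Ext-1", THREATS)` returns the two entries that concern the external service.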

Microservice-F appears to be a critical service, as it implements some of the cryptographic operations for our system. It is sensible to dive deeper into the microservice implementation (vertical move) and study it separately. Further investigation might need to look into the detailed view diagram and the application code to identify the threats. The additional investigation uncovers vulnerabilities that were not visible in the previous layer, such as:

Threat: Misconfiguration of the JWT validation on microservice-F
STRIDE: Elevation of Privilege
Mitigation Control: Current JWT validation configuration in the code (below) is insecure and should be amended.

Threat: The *.pfx file for microservice-F is stored in version control. The file includes the private key used for signing JWTs
STRIDE: Spoofing, Information Disclosure
Mitigation Control: Store the private key in a Key Management Solution; store the password protecting the *.pfx file in a Key Management Solution; protect the file with a high-entropy password (the current value doesn’t meet this requirement).
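As a rough illustration of what "high entropy" means for the password protecting the file, the sketch below generates a password with the standard `secrets` module and estimates its entropy. The length and alphabet are assumptions, and the estimate only holds when every character is chosen uniformly at random, as it is here.

```python
import math
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 symbols


def generate_password(length: int = 32) -> str:
    """Generate a password using a cryptographically secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length: int, alphabet_size: int) -> float:
    """Entropy of a uniformly random password: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)


password = generate_password()
estimate = entropy_bits(len(password), len(ALPHABET))
```

With 32 characters drawn from 62 symbols, the estimate is roughly 190 bits. The generated value would then be stored in the Key Management Solution rather than next to the file.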

7- Studying Details of TM08

In this section, I’ll show an example of zooming further into the details of a particular user story to look into the cryptographic design of payload encryption and API authentication. I won’t repeat all the steps from the previous model; instead, I focus on the specific area of interest.

As part of a credit card application, a user can request a balance transfer. The applicant can provide a credit card number and the corresponding amount for the balance transfer, and an external service fulfils the request.

7–1- Scope

Secure design of the request payload encryption using symmetric cryptography

7–2- Interaction between system’s components

The diagram below illustrates the interactions between the major components of the design. The red arrow is the focus of this study.

7–3- Security Review

In the following two sections, we’ll explore two aspects of the design, identify the areas for improvement and offer proposals.

7–3–1 Payload Encryption

Microservice-K encrypts the payload before sending it to Ext-4. The key is derived from the API key using PBKDF2 with the following parameters:

PRF = HMAC-SHA512
Password = <API_Key>
Salt = <Access_Id>
Count = 10,000
Derived-key length = 256 bits
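The parameters above map directly onto `hashlib.pbkdf2_hmac` from the Python standard library. The sketch below is a minimal illustration; the UTF-8 encoding of the API key and access ID is an assumption, and the sample values are placeholders.

```python
import hashlib


def derive_key(api_key: str, access_id: str, iterations: int = 10_000) -> bytes:
    """Derive a 256-bit key with PBKDF2 using HMAC-SHA512 as the PRF."""
    return hashlib.pbkdf2_hmac(
        "sha512",                   # PRF = HMAC-SHA512
        api_key.encode("utf-8"),    # Password = <API_Key> (encoding assumed)
        access_id.encode("utf-8"),  # Salt = <Access_Id> (encoding assumed)
        iterations,                 # Count
        dklen=32,                   # Derived-key length = 256 bits
    )


# Placeholder inputs for illustration only.
key = derive_key("example-api-key", "example-access-id")
```

Keeping the iteration count as a parameter makes it straightforward to raise it later without changing the rest of the code.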

The diagram below shows the details of the construct.

An initial review of the design shows:

At the time of writing, OWASP suggests 120,000 iterations for HMAC-SHA512, and NIST recommends that the count be as high as the environment can tolerate. The count can be lowered from 120,000 if it impacts performance, but there is a significant gap between the recommendation and the current number.
PBKDF2 has been studied for decades and is considered secure in many use cases (including this one). It is known to be susceptible to GPU-accelerated attacks, which may not apply to this case.
AES-GCM is an authenticated encryption mode; therefore, it assures the confidentiality and authenticity of the data at the same time. Adding an HMAC signature in the header and encrypting the payload with AES-GCM can lead to an insecure pattern. AES-GCM can also authenticate additional data that has not been encrypted, which can fulfil the purpose of the HMAC in the current design. The proposal is illustrated in the diagram below.

7–3–2 API Authentication

The Ext-4 API uses an HMAC signature to authenticate the caller. The signature string consists of the HTTP request method, the request date and the resource path of the URL. The API key is used as the HMAC key, with SHA-512 as the hash function.

The review of the design reveals that the API key is used on two occasions:
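A minimal sketch of this kind of request signing is shown below, using the standard `hmac` module. The article does not specify how the three fields are joined or how the signature is encoded, so the newline delimiter and hexadecimal output are assumptions, as are the sample values.

```python
import hashlib
import hmac


def signature_string(method: str, request_date: str, resource_path: str) -> str:
    """Join the signed fields; the newline delimiter is an assumption."""
    return "\n".join((method.upper(), request_date, resource_path))


def sign_request(key: bytes, method: str, request_date: str, resource_path: str) -> str:
    """Compute an HMAC-SHA512 signature over the signature string (hex encoded)."""
    message = signature_string(method, request_date, resource_path).encode("utf-8")
    return hmac.new(key, message, hashlib.sha512).hexdigest()


def verify_request(key: bytes, method: str, request_date: str, resource_path: str,
                   received_signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_request(key, method, request_date, resource_path)
    return hmac.compare_digest(expected, received_signature)


# Placeholder values for illustration only.
KEY = b"example-api-key"
DATE = "Mon, 08 Aug 2022 10:00:00 GMT"
signature = sign_request(KEY, "POST", DATE, "/balance-transfers")
```

Note that the request body is not part of the signature string in this design, so the signature authenticates the caller and the target resource rather than the payload.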

HMAC signature generation for API authentication
Password for the input of PBKDF2 to derive an encryption key

Due to their low entropy and possibly insufficient randomness, passwords cannot be used directly as cryptographic keys. Therefore, PBKDF2 is used to derive cryptographic keys from low-entropy passwords. There are two issues with the design:

The API key is treated as a cryptographic key in one place (HMAC in the request header for API authentication) but is not considered random enough in another (encrypting the payload with AES-GCM, where it is first passed through PBKDF2).
The API key is used for both signature generation and payload encryption. Generally, a single key should be used for only one purpose, so the current practice is not recommended and should be avoided (Section 5.2, https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-57pt1r5.pdf). Unique cryptographic keys should be derived from the API key, which can then be used for HMAC generation and encryption.
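One possible way to derive separate keys from a single API key is sketched below: PBKDF2 stretches the API key into a master secret, and HKDF-Expand (RFC 5869) derives one key per purpose from it using distinct labels. This is an illustration of key separation under stated assumptions, not necessarily the construct proposed in the design review; the labels, the encodings and the default iteration count are all assumptions.

```python
import hashlib
import hmac


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand from RFC 5869, instantiated with HMAC-SHA512."""
    hash_len = hashlib.sha512().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output is too long")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        # T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha512).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_purpose_keys(api_key: str, access_id: str, iterations: int = 120_000) -> dict:
    """Derive independent keys for encryption and for API authentication."""
    master = hashlib.pbkdf2_hmac(
        "sha512", api_key.encode("utf-8"), access_id.encode("utf-8"), iterations, dklen=64
    )
    return {
        # Hypothetical labels: each purpose gets its own `info` value.
        "encryption": hkdf_expand(master, b"payload-encryption", 32),
        "authentication": hkdf_expand(master, b"api-authentication", 64),
    }
```

Because each key is bound to its own label, a weakness in how one key is used does not directly expose the other, which is the intent of the single-purpose rule in NIST SP 800-57.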

The above-proposed design will address both issues and will be more secure.

We need to continue the exercise for the rest of the defined user stories to identify all of the vulnerabilities and design the controls required to manage the risk. It is essential to note that as we develop the threat models, we constantly move horizontally and vertically between the abstraction layers. By doing so, we better comprehend the system design in multiple layers, identify the weaknesses and propose improvements.

8- Conclusion

In this article, I applied the threat modelling methodology I described in my previous blogs to a real-world example. The exercise demonstrates a systemic approach to securing complex systems in enterprise environments, like the ones we deal with at Nationwide Technology!

I showed, through concrete examples, how moving between abstraction levels enables us to identify the vulnerabilities in different layers and to design and implement appropriate controls to manage the risks.

The security designs produced by the threat modelling exercise facilitate the development of security design patterns, ultimately leading to enterprise technical security standards. I explored the path in another article.