Introduction to Software Architecture — Quality Attributes Requirements (Part 2)

Published in Geek Culture · 15 min read · Jul 28, 2021

In Introduction to Software Architecture (Part 1), we formally defined software architecture (SA) and discussed its importance in the project lifecycle. We also discussed topics such as architectural drivers, structures and perspectives, and then we talked about some of the barriers facing the application of SA as an engineering discipline.

In this post, we’ll go one step further and focus on quality attributes (QAs), one of the most influential and critical architectural drivers that we defined in part 1. As we’ve seen, QAs can have even more impact on design decisions than functional requirements do, hence they require special attention from architects.

Even though system functionalities are the concrete observed output, they cannot satisfy all of the user’s needs by themselves. They need support to fulfil their responsibilities. Imagine a banking system that allows you to perform online wire transfer operations but your data is transferred through the network without any encryption. Or a web application that works perfectly when only 100 users are simultaneously using it but crashes when this number goes up.

Imagine a system that has all the needed features, but any tiny change requires a tremendous amount of time and effort. Or an infusion pump system that can deliver its fluid but is unreliable in the case of any unexpected event.

All these systems have the needed functionalities, but they are incomplete, which can make them potentially useless. Security, scalability, maintainability, reliability and safety are the missing parts in the previous examples.

Quality attributes of interest have the same importance as functionalities, or even more. As a user of the previously mentioned banking system, I might tolerate a bug in the functionality, but a security issue would be unforgivable.

Quality attributes are unseen until things go wrong. Most of the time, systems are refactored, or even reimplemented, not because they lack features, but because they are slow, insecure, unmaintainable, or lack any other quality attribute of interest.

In subsequent sections, we’ll formally define QA and show why they are critical for SA. We’ll talk about QA ambiguity and quantification and provide a framework to formally specify them in a measurable and testable way. We’ll also take a deeper look at some of the major quality attributes, and provide a general scenario for each of them as well as their strategies and tactics.

Definition

Sometimes called “ilities” after their suffix (e.g. “maintainab-ility”), or non-functional requirements, a quality attribute is a non-functional property of the system that should be measurable and testable and is used to evaluate a quality of interest of the system.

As we’ve seen in part 1 of this series, sometimes quality attributes are explicitly stated. We provided the following examples:

Currently, our website has 10,000 visitors a day. The website should operate exactly with the same levels of performance and availability, if we reach 1,000,000 visitors a day.
As a user, when I trigger operation “A”, it should complete and return its output in less than 0.5 seconds, since operation “B” will search for the output of “A” in exactly 0.5 seconds after “A” has been started.

In the first requirement, we have a target number (1,000,000), while in the second one, we have a clear deadline to be met by operation “A” (less than 0.5s). In most cases, however, QAs are either not explicitly stated or vaguely and ambiguously mentioned. It is the job of the architect to uncover and clearly specify them.

Here are 2 examples from part 1 of some ambiguously mentioned QAs:

The system should “quickly” respond to users’ requests.
The user interface should be “easy” to use.

Ambiguity

As we’ve seen, ambiguity is one of the challenges that architects need to deal with. We distinguish 2 types of ambiguity. The first one is linked to the quantification and measurement of the QA of interest. Examples of quantification ambiguities might be “a quick response”, “a secure system”, “a user-friendly user interface”.

The second one is concerned with the scope of the QA. For example, when the system doesn’t respond to a request, is this a security attack, a performance issue, or an availability problem?

Measurable and Testable QAs

A quality attribute, by definition, should be measurable and testable. Let’s consider the previously mentioned example: “The system should quickly respond to users’ requests.” Is 5 seconds considered “quick”? What about 3 seconds? Do we need to push it further down? We cannot answer these questions, because the requirement gives no clear specification.

What about the other ambiguously mentioned specification: “The user interface should be easy to use.” What does “easy” mean? Again, we cannot answer this question.

The architect has to clearly define acceptance criteria for each QA to be able to verify whether or not the system fulfills its responsibilities. Quality attributes are measured using either numbers, durations, percentages, or simply a true/false response.
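To make this concrete, here is a minimal sketch of how a vague word like “quickly” could be turned into a testable acceptance criterion. The 2-second threshold, the choice of the 95th percentile, and the function names are assumptions made for illustration; they do not come from part 1.

```python
import math

def percentile(samples, pct):
    """Return the pct-th percentile (nearest-rank method) of a non-empty list."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]

def meets_latency_target(latencies_s, threshold_s=2.0, pct=95):
    """True if the pct-th percentile latency is within the (assumed) threshold."""
    return percentile(latencies_s, pct) <= threshold_s

# Hypothetical response times, in seconds, collected during a test run.
measured = [0.4, 0.6, 0.5, 1.9, 0.7, 0.8, 0.3, 0.9, 1.1, 3.2]
print(meets_latency_target(measured))
```

The point is not the specific numbers but that the requirement now yields a true/false answer that anyone can re-check.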

Quality Attributes Specification

To deal with the ambiguity and quantification problems we’ve just discussed, there is a useful framework called the six-part scenario that architects use to formally specify quality attributes. As implied by its name, it has 6 parts:

Stimulus: An event or request arriving at the system that needs a response. A normal user request, a security attack, and an increase in website traffic are some examples of stimuli.
Source of stimulus: Who/What triggers the event or initiates the request. Knowing the source of stimulus is paramount. Dealing with a hacker is certainly different from responding to a normal user. A user, a system administrator, or an actuator are some examples of sources of stimulus.
Environment: System conditions under which the stimulus has been triggered. Dealing with a bug after a system upgrade is different from dealing with it before the upgrade. Similarly, dealing with an event in an overloaded situation is not the same as in normal operation. “Normal” is used to specify that there are no special conditions to mention before the arrival of the stimulus.
Artifact: The stimulus might concern a particular piece of the system, multiple pieces, or the whole system. The database system, the frontend layer, the registration module, and the whole system are some examples of possible artifacts.
Response: Actions to be taken in response to the stimulus. The response to a modifiability request (stimulus) is the implementation, testing and deployment of the requested feature. The response to a security attack (stimulus) is that data and services should be protected from unauthorized access.
Response measure: The response should be measurable so that we can test whether it fulfils a particular need. For example, a modifiability requirement might be measured as the number of person-days needed to implement, test and deploy the update request. A performance requirement might be measured as latency or throughput. A reliability requirement might be measured as the mean time to failure (MTTF).

The 6-part scenario is used to reason about a single QA under some conditions. For example, if we have five potential modifiability scenarios, we need a 6-part-scenario for each of them.
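One way to keep scenarios uniform is to record each one as a small data structure. The sketch below is only an illustration: the class name, field names, and the sample values are assumptions, not part of the framework itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QualityScenario:
    """One six-part scenario for a single quality attribute."""
    quality_attribute: str
    source: str
    stimulus: str
    environment: str
    artifact: str
    response: str
    response_measure: str

    def is_complete(self) -> bool:
        """A scenario is usable only when none of its six parts is blank."""
        parts = (self.source, self.stimulus, self.environment,
                 self.artifact, self.response, self.response_measure)
        return all(part.strip() for part in parts)

# Hypothetical performance scenario; every value is illustrative.
checkout_latency = QualityScenario(
    quality_attribute="performance",
    source="end user",
    stimulus="submits an order",
    environment="normal operation",
    artifact="checkout service",
    response="order is validated and confirmed",
    response_measure="confirmation returned in under 2 seconds",
)
```

Five modifiability scenarios would simply be five such records, each with its own response measure.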

Architecting for Quality Attributes

The terms “tactics”, “strategies”, and “design decisions” are synonyms found in the literature and refer to the steps needed to promote a quality attribute of interest. The scope of these tactics and strategies concerns a single QA. In other words, the architect should ignore the other quality attributes and how they impact the current one. We’ll see how to balance tradeoffs in a future article when we talk about styles and patterns.

Covering all QAs in a single article is not feasible. Therefore, we’ll focus on some of the most important ones, namely modifiability, usability, performance, and security. Along the way, we’ll briefly touch upon other QAs, such as maintainability, learnability, availability, scalability, and reliability.

Modifiability

Changes can be made to improve a functionality, to fix a defect, or to add a new feature. Changes can go beyond functionalities to target other quality attributes; they can improve performance, enable scalability, and make a system more secure.

Modifiability is primarily concerned with the cost and time needed to make the change. This means that the architect should be able to predict and/or specify the cost of change of future updates during design time. This seems to be a challenging task, and indeed it is.

Fortunately there are a number of tactics and patterns that promote modifiability and make it easier to predict the cost of change at design time. We’ll discuss modifiability tactics in the next section. Styles and patterns are the subjects of a coming article of this series.

Maintainability is part of modifiability. It is the degree to which a system can support changes.

Architecting for Modifiability

The main idea behind promoting modifiability is to avoid overly large modules: modules that handle multiple concerns and whose responsibilities overlap. Such structure means that any modification to any module will likely propagate to other modules.

We need to separate concerns. Avoiding big modules, reducing the coupling, and increasing the cohesion are the main strategies for promoting modifiability.


The static perspective can be used to reason about modifiability.

Reduce the size of modules: it is easier to maintain and modify multiple small modules that have limited responsibilities than a big module that manages multiple concerns. Splitting big modules into smaller ones is key to promoting modifiability. The following 2 sub-sections provide guidelines on how to appropriately split modules.

Low coupling: coupling is when the responsibilities of multiple modules overlap. High coupling means that any change to any module will likely affect other modules. To promote modifiability, coupling should be minimized so that a given change that concerns a given module will not propagate to other modules.

High cohesion: cohesion is when responsibilities within a module are strongly related. With high cohesion, an update to one of a module’s responsibilities will likely affect only other responsibilities within the same module, which promotes modifiability. Low cohesion can be fixed by moving the responsibilities that are unaffected by a given update to another existing or new module.
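To illustrate how a change propagates through coupled modules, here is a rough sketch that walks a module dependency graph and lists everything that might need attention when one module changes. The module names and the graph are hypothetical, and treating “depends on” as the only form of coupling is a simplifying assumption.

```python
from collections import deque

# Hypothetical dependency graph: each key depends on the modules in its value.
DEPENDS_ON = {
    "ui": {"orders", "catalog"},
    "orders": {"payments", "catalog"},
    "catalog": {"storage"},
    "payments": {"storage"},
    "storage": set(),
}

def impacted_by_change(module, depends_on):
    """Return every module that directly or transitively depends on `module`."""
    # Invert the graph: for each module, which modules depend on it?
    dependents = {name: set() for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(name)

    impacted, queue = set(), deque([module])
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in impacted:
                impacted.add(dependent)
                queue.append(dependent)
    return impacted

print(sorted(impacted_by_change("payments", DEPENDS_ON)))
```

The fewer modules such a walk returns for a typical change, the lower the coupling, and the easier it is to estimate the cost of that change at design time.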

Modifiability is the enemy of performance.

Six-part Scenario for Modifiability

Here are some examples of the six-part scenario for modifiability:

Source of stimulus: The client, a user, an administrator.
Stimulus: Add a new feature, fix a bug, improve the response time of a particular module, optimize some database queries.
Artifact: The whole system, the database system, the product management module.
Environment: Runtime, development time, test time.
Response: Make, test and deploy the change.
Response Measure: Cost of change, time needed to make the change (x person-days).

Usability

Usability is the degree to which a user can use a system effectively and efficiently. The main purpose of usability is to improve the user experience, and hence user satisfaction.

Usability is critical for some kinds of commercial systems in the sense that it directly affects the user’s perception of the system.

Usability and modifiability go hand in hand, primarily because the separation of concerns that we discussed in modifiability tactics also promotes usability and allows quick changes and fixes during operation mode.

Architecting for Usability

When designing for usability, we generally care about 2 things:

Learnability: do users quickly become familiar with the user interface? Chatbot systems, tooltips, help popups, and how-to videos and tutorials are good ways to guide users while using the system.
System initiative: What support does the system provide to the user? Progress bar for long-running operations, regular feedback messages, being able to undo an operation, and next-step suggestions based on history are some examples of support that the system may provide to users.
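As an illustration of system initiative, the following sketch shows one simple way to support undo: every change records how to revert itself. The `Wishlist` class and its methods are hypothetical, and real systems often need more (persistence, redo, limits on history).

```python
class Wishlist:
    """Hypothetical wishlist that remembers how to undo each change."""

    def __init__(self):
        self.items = []
        self._undo_stack = []  # callables that revert the latest changes

    def add(self, item):
        self.items.append(item)
        # Undo runs in last-in, first-out order, so the added item is
        # always the last element when this undo step executes.
        self._undo_stack.append(lambda: self.items.pop())

    def remove(self, item):
        index = self.items.index(item)
        self.items.pop(index)
        # Restore the item at its original position on undo.
        self._undo_stack.append(lambda: self.items.insert(index, item))

    def undo(self):
        """Revert the most recent change; return False if nothing to undo."""
        if not self._undo_stack:
            return False
        self._undo_stack.pop()()
        return True
```

Counting how often `undo` is called is also one of the empirical facts that can feed back into usability decisions.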

Prototyping and Iterative design: This is a user-driven approach and an effective way to design an efficient user interface. It consists of developing quick and iterative prototypes.

Prototypes should take into account positive and negative sides of previous versions (if any), preferences of end users (if known beforehand), and trends used in similar types of applications (if any). There exist several tools for creating quick and interactive user interfaces.

Empirical facts: the goal is to assess and quantify user satisfaction. The elapsed time for a given task, the ratio of successful operations to errors, the number of times users use the support provided by the system (e.g. undo an operation), and the types of questions users ask on chatbot, are a few examples of facts that architects can look at to enhance usability.

A/B Testing: This is a great way to explore what users like/dislike and adjust decisions accordingly. A/B testing is out of scope for this article, but in a nutshell, it is a user experience methodology that randomly splits users into two or more groups and then presents a different version of the system to each group. As shown in figure 1, the ultimate goal of A/B testing is to statistically determine which version of the system performs better.
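To give a feel for the “statistically determine” part, here is a minimal sketch of a two-proportion z-test, one common way to compare two versions. It is an assumption that the outcome of interest is a simple success/failure per user (for example, completed checkout or not); the counts used below are made up.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference in success rates."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("cannot test when every user or no user succeeded")
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical results: version A vs. version B, 5,000 users each.
z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between the two versions is unlikely to be due to chance alone; what counts as “small” is a decision to make before running the experiment.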

Ask end users: surveys and polls are other ways to get valuable insights about what users need. Beta versions might also be a good way to test a new version of the system by end users.

Six-part Scenario for Usability

Examples of the six-part scenario for usability:

Stimulus: Interaction with the system via its user interface (use a system feature, execute a long-running task, delete an item.)
Source of stimulus: End user, administrator.
Artifact: The whole system, the product management module, the wishlist module.
Environment: Runtime.
Response: The user performs the operation effectively and efficiently.
Response Measure: ratio of successful tasks to errors, user satisfaction (e.g. using a poll), the elapsed time of a task.

Performance

Performance is one of the most challenging QAs to deal with (albeit a very exciting one), not only because it involves multiple complex concepts, but also because it is the enemy of many other QAs, such as modifiability, maintainability, testability, and security.

Performance is a time-based property:

how long it takes to respond to a request or an event (latency);
how many transactions have been processed in a given unit of time (throughput);
what is the deadline between states’ transitions of a sub-part of the system (deadline).
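The three measures can be computed from the same request log. The sketch below assumes a hypothetical log of arrival and completion timestamps and a 0.5-second deadline; the names are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class Request:
    arrived_at: float    # seconds since the start of the observation
    completed_at: float  # seconds since the start of the observation

def performance_summary(requests, deadline_s):
    """Compute mean latency, throughput, and deadline miss rate."""
    latencies = [r.completed_at - r.arrived_at for r in requests]
    # Observation window: first arrival to last completion.
    window = max(r.completed_at for r in requests) - min(r.arrived_at for r in requests)
    misses = sum(1 for latency in latencies if latency > deadline_s)
    return {
        "mean_latency_s": sum(latencies) / len(latencies),
        "throughput_per_s": len(requests) / window,
        "miss_rate": misses / len(requests),
    }

# Hypothetical log of four requests.
log = [Request(0.0, 0.25), Request(0.5, 1.5), Request(1.0, 1.5), Request(1.5, 2.0)]
summary = performance_summary(log, deadline_s=0.5)
print(summary)
```

Note that a system can have good throughput and still miss deadlines, which is why the measures are specified separately.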

The architect cannot ignore performance, even if it is not explicitly stated or not a primary QA candidate for the system at hand, simply because users do not accept very slow systems. “Quick” responses are always better.

When it comes to performance, one of the most important concepts that architects should carefully analyze is concurrency. Concurrency is a huge topic that needs a dedicated series of posts (I’ll get to it in the coming weeks).

Concurrency is about the parallel execution of processes (and threads) and their synchronization and prioritization. Parallel execution generally offers better performance and resource utilization than sequential execution, but it comes at a cost.

Concurrency errors are among the most difficult bugs to catch and fix. Race conditions, deadlocks, and starvation are some examples. The interleaving nature of processes and threads makes it very difficult for standard testing techniques to catch these kinds of errors. For critical systems, a formal verification technique is a must.

Another challenge is to deal with the occurrence frequency of events: whether it is deterministic or not, and what scheduling policy is adequate for the situation at hand.

Architecting for Performance

There are multiple tactics that promote performance, ranging from optimal and efficient algorithms to powerful CPUs and faster networks. However, not all of them are always feasible, and it is the job of the architect to optimally use what is available.

In this section, we’ll discuss some general rules to take into account when performance is critical. We will not dive into concurrency concerns even though we’ll mention some of them.

Concurrency: Parallel execution, along with an adequate scheduling policy, is a great way to improve performance. For example, the architect can use a priority-based scheme: high-priority events can be processed first.

In such a case, concerns such as starvation/fairness and preemption policy, among others, arise and should be carefully analyzed by the architect.
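As a rough illustration of the starvation concern, the sketch below shows a priority scheduler with aging: the longer an event waits, the more urgent it becomes, so low-priority events are eventually served. Aging is one possible answer among several, and the class names, the tick-based clock, and the aging rate are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Event:
    name: str
    priority: int       # lower value = more urgent
    enqueued_tick: int

class AgingScheduler:
    """Priority scheduler where waiting events gradually gain urgency."""

    def __init__(self, aging_rate=1.0):
        self.aging_rate = aging_rate  # urgency gained per tick waited (assumed)
        self.tick = 0
        self.pending = []

    def submit(self, name, priority):
        self.pending.append(Event(name, priority, self.tick))

    def dispatch(self):
        """Advance one tick and return the name of the most urgent event."""
        if not self.pending:
            return None
        self.tick += 1

        def effective_priority(event):
            waited = self.tick - event.enqueued_tick
            return event.priority - self.aging_rate * waited

        # On ties, min() keeps the event that was submitted first.
        chosen = min(self.pending, key=effective_priority)
        self.pending.remove(chosen)
        return chosen.name
```

With `aging_rate=0` this degenerates into a strict priority scheme, in which a steady stream of high-priority events can starve a low-priority one indefinitely.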

If, on the other hand, events have the same priority, a common strategy consists of limiting the number of events in a stream to be processed first before processing the next stream. This provides more predictability, but other concerns arise: queue management, overflow situations, lost events (miss rate) ...
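The queue management and overflow concerns can be sketched with a bounded queue that drops events when full and keeps count of what was lost. Dropping the newest event is just one possible overflow policy, chosen here for simplicity; the class and method names are hypothetical.

```python
from collections import deque

class BoundedEventQueue:
    """FIFO queue with fixed capacity that counts events lost on overflow."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._events = deque()
        self.accepted = 0
        self.dropped = 0

    def offer(self, event):
        """Enqueue the event, or drop it if the queue is full."""
        if len(self._events) >= self.capacity:
            self.dropped += 1
            return False
        self._events.append(event)
        self.accepted += 1
        return True

    def drain(self, batch_size):
        """Remove and return up to batch_size events, oldest first."""
        count = min(batch_size, len(self._events))
        return [self._events.popleft() for _ in range(count)]

    @property
    def miss_rate(self):
        """Fraction of offered events that were dropped."""
        total = self.accepted + self.dropped
        return self.dropped / total if total else 0.0
```

The `miss_rate` value is exactly the kind of number that can serve as a response measure in a performance scenario.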

For very hard-deadline constraints (the so-called “real-time”), tactics go beyond processes to co-locate multiple instructions — of different types — within the same process so that CPU interrupts are reduced, resulting in a reduction of the CPU context switches.

Again, concurrency is out of scope of this series, but it is worth mentioning the previous concepts and tactics in the context of SA.

Concurrency can be documented and viewed from the dynamic perspective.

Reducing intermediaries: As we’ve seen, separation of concerns and usage of intermediaries are good for modifiability, but they hurt performance. This is a very common tradeoff between performance and modifiability.

In the context of performance, intermediaries should be reduced or completely removed (if applicable) and elements should be grouped and co-located. Intermediaries can be message brokers linking some elements of the system, different nodes in the network requiring remote calls, or simply local subroutine calls.

The dynamic perspective can be used to reason about intermediaries.

Efficient use of resources: Dispatching heavy computations to multiple nodes significantly improves performance. Also, replicating and caching data is a way to have ready-to-use responses.

Issues such as which data should be cached and data consistency and synchronization are some of the concerns that the architect needs to deal with.
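To illustrate the consistency tradeoff, here is a minimal sketch of a read-through cache whose entries expire after a fixed time. Time-based expiry is only one of several invalidation strategies; the class name, the `loader` callback, and the TTL value are assumptions for this example.

```python
import time

class TTLCache:
    """Tiny read-through cache whose entries expire after ttl_s seconds."""

    def __init__(self, loader, ttl_s, clock=time.monotonic):
        self._loader = loader   # function that fetches the fresh value
        self._ttl_s = ttl_s
        self._clock = clock     # injectable so tests can control time
        self._entries = {}      # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            self.hits += 1
            return entry[0]
        # Missing or stale: reload and remember when it expires.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (value, now + self._ttl_s)
        return value

    def invalidate(self, key):
        """Drop an entry explicitly, for example after the source data changes."""
        self._entries.pop(key, None)
```

A longer TTL means more ready-to-use responses but a longer window in which callers may see stale data, which is the synchronization concern mentioned above.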

Of course, additional powerful CPUs and faster networks are other considerations for the architect to negotiate.

Scalability is the property of a system to handle load increases without compromising performance. It’s often associated with two common strategies: scaling horizontally (e.g. adding more nodes/servers) and scaling vertically (e.g. adding more resources to a single node).

Six-part Scenario for Performance

Examples of the six-part scenario for performance:

Stimulus: The occurrence of an event (at either a deterministic or non-deterministic frequency).
Source of stimulus: An end user, an actuator, an external system.
Artifact: A sub-part of the system, the whole system.
Environment: Overload, emergency, normal.
Response: Processing the arriving events, system state transition.
Response Measure: Throughput, latency, miss rate, deadline.

Security

Security is a measure of the system’s ability to prevent unauthorized usage of data and services while providing services to legitimate users. Security combines 6 main concerns, which are:

Confidentiality: a property asserting that secret information is protected from unauthorized access. For example, your personal information on a website can be accessed only by you.
Authorization: a property asserting that legitimate users can perform needed tasks.
Authentication: a property asserting that the user involved in a transaction is truly who they claim to be.
Integrity: a property asserting that data is delivered correctly without unauthorized modifications.
Availability: a property asserting that the system shall be available for authorized users. We’ll talk briefly about availability in a moment.
Non-repudiation: a property asserting that the performer of a task cannot deny later having performed the task. For example, the sender of an email cannot deny having sent the email, and the recipient cannot deny having received it.

Architects should care about 3 main concerns, which are: the business assets that need protection, the kinds of possible threats to consider, and how the system should respond to these threats.

Availability: It is part of reliability. Availability is the property of a system to be ready for use when the user needs it. It is expressed by the ratio of the available system time to the total working time.
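The ratio can be written down directly. The sketch below also derives the downtime budget implied by a target; the function names and the example figures (a 720-hour month, a 99.9% target) are illustrative assumptions.

```python
def availability(uptime_hours, total_hours):
    """Fraction of the total working time during which the system was usable."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if not 0 <= uptime_hours <= total_hours:
        raise ValueError("uptime_hours must be between 0 and total_hours")
    return uptime_hours / total_hours

def allowed_downtime_hours(target, total_hours):
    """Downtime budget implied by an availability target such as 0.999."""
    return (1 - target) * total_hours

# Hypothetical month of 720 working hours with 1 hour of downtime.
print(f"{availability(719, 720):.4%}")
print(f"{allowed_downtime_hours(0.999, 720):.2f} hours per month")
```

Expressing the target as a downtime budget often makes the requirement easier to discuss with stakeholders than a percentage alone.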

Architecting for Security

Resisting attacks: This tactic aims to secure the system against potential attacks. Authentication, authorization, encryption, as well as data integrity checks are some examples. Using firewalls, setting allowed ports and IP addresses are some other ways to resist attacks.
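As one example of a data integrity check, the sketch below signs a message with an HMAC and verifies it on receipt, using Python’s standard `hmac` and `hashlib` modules. The key, the message format, and the function names are placeholders; key management is deliberately left out.

```python
import hashlib
import hmac

def sign(message: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(message: bytes, key: bytes, tag: str) -> bool:
    """Check the tag in constant time to avoid leaking timing information."""
    return hmac.compare_digest(sign(message, key), tag)

# Placeholder key for illustration only; never hard-code real secrets.
SECRET_KEY = b"demo-key-do-not-use-in-production"
transfer = b"from=alice;to=bob;amount=100"
tag = sign(transfer, SECRET_KEY)

print(verify(transfer, SECRET_KEY, tag))
print(verify(b"from=alice;to=bob;amount=900", SECRET_KEY, tag))
```

Any modification of the message in transit changes the expected tag, so the receiver can detect tampering without needing to trust the network.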

Another tactic for resisting attacks is limiting the exposure of your resources. For example deploying your system on a geographically distributed infrastructure, so that if any given node is down because of a security attack, the rest of the infrastructure remains untouched.

Reliability is the property of a system to keep operating over time under some predefined conditions. Reliability is measured as the mean time to failure (MTTF).

Availability Tactics: Because availability is part of security, availability tactics can help the system recover from a security attack. Data mirroring, redundancy, and rolling back to a previously known state are some strategies to consider.

Another important availability tactic, which is critical when reliability is among the primary QAs of interest, is degraded availability. This aims to maintain the critical functions of the system and turn off less important functions.

For example, if a bug occurs in an aircraft’s software during flight, the aircraft must, at any cost, keep flying and land, even if the lighting or air conditioning in the cabin has failed.
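In code, degraded availability can be as simple as tagging each function as critical or not and shedding the non-critical ones when a fault is reported. This is a deliberately simplified sketch: the class names are hypothetical, and the assumption that any fault triggers degraded mode would be far too coarse for a real safety-critical system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Function:
    name: str
    critical: bool

class DegradableSystem:
    """Hypothetical system that sheds non-critical functions when degraded."""

    def __init__(self, functions):
        self._functions = list(functions)
        self.degraded = False

    def report_fault(self):
        # Assumption: any reported fault switches the system to degraded mode.
        self.degraded = True

    def recover(self):
        self.degraded = False

    def active_functions(self):
        """Names of the functions that stay on in the current mode."""
        return [f.name for f in self._functions
                if f.critical or not self.degraded]

aircraft = DegradableSystem([
    Function("flight control", critical=True),
    Function("cabin lighting", critical=False),
    Function("air conditioning", critical=False),
])
```

The architectural work lies in deciding, ahead of time, which functions are critical and making sure they do not depend on the ones that may be switched off.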

Six-part Scenario for Security

Examples of the six-part scenario for security:

Stimulus: An attack (denial of service, SQL injections, cross-site scripting, etc.)
Source of stimulus: A hacker (a human or a system).
Artifact: Data store, data produced by the system, some services of the system…
Environment: Runtime, test time.
Response: Data and services should be protected from unauthorized access. In case of an exploited vulnerability, the damage should be evaluated and a roll back to a previous safe state should be done. If availability or reliability are paramount, defined critical functions should continue to operate either in the compromised node or delegated to other safe nodes. In any case, the system should save the attack details in reports for analysis and security improvements.
Response Measure: The number of detected vulnerabilities, the number of exploited vulnerabilities, the number of known attacks that were resisted by the system, the number of services compromised for a given attack, how long it took to return to normal operation mode after a successful attack …

Wrapping up

That’s it for this part. We started by defining quality attributes and showed their importance in the software project lifecycle. We talked about the six-part scenario framework and showed how it formally specifies and quantifies QAs.

We also took a deeper look at modifiability, usability, performance, and security, and talked about their tactics. Then, we briefly discussed some other QAs, such as maintainability, learnability, availability, scalability, and reliability.

In coming posts, we’ll focus on styles and patterns, documentation and SA deliverables, as well as architecture evaluation.

References
[1] Software Architecture in Practice, 3rd Edition, Bass, Clements, & Kazman, 2013
[2] Documenting Software Architectures: Views and Beyond, 2nd Edition, Clements et al., 2011
[3] Architecting Software Intensive Systems: A Practitioner’s Guide, Anthony J. Lattanze, 2017