Applying Decision Analysis to a Safety Critical System

Choosing a Service Oriented Architecture to Replace a Legacy Monolith

Nicolas Malloy
The Interlock
Jun 23, 2019

Dilemma

Software maintenance can be a grueling activity if a system’s architectural design is not set up to support it. This analysis considers a legacy system built around a large monolithic architecture. A monolithic architecture is the traditional unified model for the design of a software program. This kind of architecture is designed to be self-contained; components of the program are interconnected and interdependent (WhatIs, 2016).

The Safety Critical (SC) System under analysis provides weapon and attack control functions and the initiation and control of the weapon launch sequence for both tactical and defensive weapons. Historically, monolithic architectures like this are difficult to maintain. The issue has prompted the investigation of alternative architectures.

The decision analysis will identify an architectural approach that lowers the risk of making changes to the code base. Specifically, three Service Oriented Architecture (SOA) design approaches will be considered. A SOA is essentially a collection of services that communicate with each other. The communication can involve simple data passing, or it can involve two or more services coordinating some activity. A SOA also needs some means of connecting services to each other (Architecture, 2017).
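To make the two kinds of communication concrete, here is a minimal in-process sketch. It is not taken from the SC System; the `ServiceBus` class, the topic names, and the payload fields are all hypothetical, and a real SOA would typically connect services over a network or middleware layer rather than a Python dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Message:
    """Plain data passed between services (hypothetical shape)."""
    topic: str
    payload: dict


class ServiceBus:
    """Hypothetical stand-in for the 'means of connecting services'."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[Message], dict]] = {}

    def register(self, topic: str, handler: Callable[[Message], dict]) -> None:
        self._handlers[topic] = handler

    def request(self, topic: str, payload: dict) -> dict:
        # Route the request to whichever service registered for the topic.
        return self._handlers[topic](Message(topic, payload))


def position_service(msg: Message) -> dict:
    # Simple data passing: return data and touch no other service's state.
    return {"object_id": msg.payload["object_id"], "range_km": 42.0}


def make_range_check_service(bus: ServiceBus) -> Callable[[Message], dict]:
    def handler(msg: Message) -> dict:
        # Coordination: this service asks another service for data via the bus.
        pos = bus.request("position.query", {"object_id": msg.payload["object_id"]})
        return {"object_id": pos["object_id"], "in_range": pos["range_km"] < 100.0}
    return handler


bus = ServiceBus()
bus.register("position.query", position_service)
bus.register("range.check", make_range_check_service(bus))

result = bus.request("range.check", {"object_id": 7})
print(result)
```

The point of the sketch is that each service is reachable only through its registered topic, so a change inside one service cannot silently alter another as long as the message contract holds.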

The analysis follows the seven steps of the formal decision analysis process outlined in Michael Frank’s book, Choosing Safety:

• Identify a Decision Opportunity

• Create a Problem Statement

• State the Objective and Attributes

• Create Alternatives

• Develop a Decision Model

• Develop a Value Model

• Synthesize Decision and Value Models to Rank Alternatives (Frank, 2008).

Identifying a decision opportunity

Code maintenance issues plague the current system design. Almost every time code maintenance is performed, a new issue caused by that maintenance is discovered later. These inadvertent bugs cause schedule delays and unplanned increases in program cost. In several instances they have affected SC System functionality.

If such a bug were to cause the inadvertent launch of a weapon, the consequences could be catastrophic. The program has an opportunity to determine how these code maintenance issues can be dealt with using a SOA. If these issues are not corrected, the program risks degrading its safety posture and missing critical milestones because maintenance development is taking too long.

Creating a problem statement

There was a time when systems were created to perform a specific task and only that task. Software engineering was simpler then, because once a system had been developed and had passed testing, the only real work remaining was occasional maintenance. That is not very common by today’s standards.

Modern systems are constantly being updated to support new features that increase performance. This can be great for the operators, but it comes at a risk. Depending on how the system gets architected, adding new features can be problematic. In the case of the SC System, making changes to the software doesn’t always work out as planned.

When software engineers receive a Problem Report (PR), they change the code to fix it. Sometimes code elsewhere in the system depends on the code being changed. If those dependencies are not fully understood when the changes are committed, the result can be a software build with inadvertent bugs.
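The following sketch illustrates how such a hidden dependency can arise and how a service boundary limits it. It is purely illustrative: the names `shared_settings`, `CommsService`, and `WatchdogService` are invented for this example and do not describe the SC System.

```python
# --- Monolithic style: shared mutable state creates a hidden dependency ---
shared_settings = {"timeout_s": 5}


def comms_apply_fix(new_timeout_s: int) -> None:
    """A Problem Report fix in one area edits the shared settings."""
    shared_settings["timeout_s"] = new_timeout_s


def watchdog_limit_s() -> int:
    """A distant function that silently depends on the same settings."""
    return shared_settings["timeout_s"] * 2


before = watchdog_limit_s()
comms_apply_fix(30)
after = watchdog_limit_s()  # changed, although the watchdog code was never touched


# --- Service style: each service owns its state behind an interface ---
class CommsService:
    def __init__(self, timeout_s: int) -> None:
        self._timeout_s = timeout_s

    def apply_fix(self, new_timeout_s: int) -> None:
        self._timeout_s = new_timeout_s


class WatchdogService:
    def __init__(self, base_timeout_s: int) -> None:
        self._base_timeout_s = base_timeout_s

    def limit_s(self) -> int:
        return self._base_timeout_s * 2


comms = CommsService(timeout_s=5)
watchdog = WatchdogService(base_timeout_s=5)
comms.apply_fix(30)
isolated = watchdog.limit_s()  # unaffected by the change inside CommsService

print(before, after, isolated)
```

In the first half, the fix changes the watchdog limit as a side effect. In the second half, the same fix stays inside the service that owns the setting.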

Bugs can cause the system to perform in ways that were not intended. Peer reviews are always performed on the software changes to catch issues like this but sometimes things are missed. When dealing with a Combat System one bug can cause a lot of problems.

Stating the objective and attributes

The objective of this decision analysis is to evaluate the options using a systematic process in order to identify the safest, most maintainable, and most cost-effective architectural design for the SC System. The issue is recurring, and an improvement needs to be identified. Such an improvement would allow developers to maintain code without worrying about debilitating code dependencies.

The expectation is that a SOA would prevent changes from introducing inadvertent bugs that cascade throughout the system. If this could be accomplished, it would improve the system’s safety, eliminate the associated schedule impacts that drive cost increases, and improve overall system maintainability. There are three attributes that will be measured for this decision analysis:

• Safety: The system is considered safe if the proposed design complies with the Joint Software System Safety Engineering Handbook (JSSSEH) and uses a strongly typed language for implementation.

• Cost: The cost of the design is acceptable if the projected cost falls below $1,000,000.

• Maintainability: The system is considered maintainable if the risk associated with code modifications is Low. Low risk will exist if the design is based on a SOA. Medium or High risk will exist if a monolithic architecture is proposed.

Safety has an obvious importance in the decision analysis because this system is responsible for launching weapons. When making architectural choices the system architect must take the design guidance provided in JSSSEH into consideration. The purpose of the handbook is to provide management and engineering guidelines. The guidelines aid in achieving a reasonable level of assurance that the software will execute within the system context with an acceptable level of safety risk (Activity, 2012). The JSSSEH also helps to ensure Combat System software safety design compliance is maintained.

For this analysis, safety will be derived from a qualitative measurement. It is also important that the alternative system architecture use a strongly typed programming language. A strongly typed programming language is one in which each type of data (such as integer, character, hexadecimal, packed decimal, and so forth) is predefined as part of the programming language and all constants or variables defined for a given program must be described with one of the data types (Harbeck, 1999). If the design architecture is compliant with the JSSSEH and uses a strongly typed programming language for its implementation, then it will have met the desired criteria. Safety is weighted as 0.50.

Cost is a huge factor in determining the feasibility of change because program budgets are generally fixed. Software development cannot account for 100% of the budget. Obviously, there are other areas that need funding to ensure a program is successful. These areas include but are not limited to systems engineering, training, testing, and documentation. No matter how revolutionary or beneficial a decision is if the cost exceeds the budget it won’t happen. For this analysis the cost will be derived from a quantitative measurement. If the design cost is below $1,000,000 then it meets the desired criteria. Cost is weighted as 0.30.

Software maintainability is heavily reliant upon the architectural design of a system. Depending on the architecture, a system can be risky to update or modify. Peer reviews are used to catch issues that may find their way into the system as the result of code modifications, but there are no guarantees that everything will be caught. For this analysis the software design is measured qualitatively. If there is Low risk in modifying the code used for the architectural design, then it will meet the desired criteria. Maintainability is weighted as 0.20. The attributes and weighted values are shown in Table 1.

Table 1 Attributes and Weighted Values
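The attributes, their weights, and their pass criteria as stated in the text can be captured in a small data structure. This is only a sketch; the `Attribute` class and the `weights_are_normalized` helper are names chosen for illustration. The check at the end confirms that the three weights sum to 1.0, which is what a weighted score between 0 and 1 requires.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Attribute:
    name: str
    weight: float
    measurement: str  # "qualitative" or "quantitative", as described in the text
    criterion: str    # condition under which the attribute counts as met


ATTRIBUTES = (
    Attribute("Safety", 0.50, "qualitative",
              "JSSSEH compliant and implemented in a strongly typed language"),
    Attribute("Cost", 0.30, "quantitative",
              "projected cost below $1,000,000"),
    Attribute("Maintainability", 0.20, "qualitative",
              "Low risk associated with code modifications"),
)


def weights_are_normalized(attributes: Iterable[Attribute]) -> bool:
    """Return True when the weights sum to 1.0 (within float tolerance)."""
    return math.isclose(sum(a.weight for a in attributes), 1.0)


print(weights_are_normalized(ATTRIBUTES))
```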

Creating the alternatives

As system design concepts evolve it is important to determine if adopting them can provide lasting benefits. Given the circumstances with the current monolithic architecture it is clear that SOA approaches should at the very least be investigated. The alternatives to the monolithic architecture that will reduce the likelihood of inadvertent bugs being generated during code maintenance are as follows:

Alternative 1: Re-architect the system with a service-based approach from the ground up using the C++ language. The system would be designed such that functions are allocated to specific services, eliminating the existing monolithic architecture altogether. C++ would be used as the sole programming language to achieve this goal. This language was chosen because it can interface with just about any other language. Additionally, just about any system can compile and run C++. This language would provide the SC System with flexibility for future development.

Alternative 2: Re-architect the system with a service-based approach from the ground up using the Ada language. The system would be designed with functions allocated to specific services, eliminating the existing monolithic architecture altogether. This language was chosen because it is designed to support large programs with long lifespans. Additionally, Ada is particularly effective for systems where reliability and efficiency are important. Ada would be used as the sole programming language to achieve this goal.

Alternative 3: Modularize the existing system such that each module performs a specific service, retaining the current Ada language. An approach known as application decoupling would be used to achieve the modularity. Essentially, the system would be split up into modules using the existing code base. Ada would be used as the sole programming language to achieve this goal.

For a Combat System, many factors beyond safety, cost, and maintainability can contribute to design decisions, but for this analysis it was determined that these three factors play the most significant role in the decision to address recurring issues with inadvertent bugs.

Of course, these are highly subjective determinations to cast upon a system design that has not yet been prototyped but for this analysis it is acceptable considering that the inputs are being provided by a Subject Matter Expert (SME). Each alternative has its benefits and disadvantages which will be measured against the attributes. Table 2 shows the objectives and attributes for this analysis.

Table 2 Objectives and Attributes

Develop a decision model

The decision model was based on the alternatives, the attributes, and the outcomes of those attributes as they pertained to each alternative. By taking this approach, it was possible to derive a measurable result that can later be used in the value model. The attributes pulled the areas of interest into context for each of the alternatives so that the resulting consequences could be ascertained.

The documented results for each of the consequences were treated as having been met or not met. If the consequence was met, then a value of 1 was used to represent it. Otherwise, if it was not met a value of 0 was assigned. Table 3 shows the decision model scoring.

Table 3 Decision Model Scoring
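The met or not met conversion can be sketched as follows. The consequence values are taken from the outcomes summarized later in the article, with one labeled assumption: JSSSEH compliance is assumed to hold for all three alternatives, since the text attributes Alternative 1's safety result to the language criterion alone. The `Consequence` class and its field names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Consequence:
    """Assessed outcome for one alternative (field names are illustrative)."""
    jssseh_compliant: bool         # ASSUMPTION: taken as True for every alternative
    strongly_typed_language: bool  # as judged in the article: Ada yes, C++ no
    cost_below_threshold: bool     # projected cost below $1,000,000
    modification_risk: str         # "Low", "Medium", or "High"


def score(c: Consequence) -> Dict[str, int]:
    """Convert consequences to 1 (met) or 0 (not met) per attribute."""
    return {
        "Safety": int(c.jssseh_compliant and c.strongly_typed_language),
        "Cost": int(c.cost_below_threshold),
        "Maintainability": int(c.modification_risk == "Low"),
    }


CONSEQUENCES = {
    "Alternative 1": Consequence(True, False, False, "Low"),  # C++ re-architecture
    "Alternative 2": Consequence(True, True, False, "Low"),   # Ada re-architecture
    "Alternative 3": Consequence(True, True, True, "Low"),    # Ada modularization
}

DECISION_MODEL = {name: score(c) for name, c in CONSEQUENCES.items()}

for name, scores in DECISION_MODEL.items():
    print(name, scores)
```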

Develop a value model

The converted scores derived during the decision model step are then used to generate a value model. The value model is comprised of the chosen attributes, the alternatives, and their converted scores that were derived from the consequences. Table 4 shows the weighted scoring of the attributes.

Table 4 Weighted Scoring of Attributes Value Model

Summary of Decision and Value Model Outcomes

Alternative 3 met all three of the design attributes captured in the decision and value models. The final weighted values show that Alternative 3 had a final score of 1.0 while alternatives 1 and 2 had scores of 0.2 and 0.8 respectively.

In terms of safety, Alternatives 2 and 3 met the criteria while Alternative 1 did not. The difference comes from the requirement for a strongly typed implementation language: Alternatives 2 and 3 use Ada, while Alternative 1 uses C++. Strongly typed languages are preferred for safety critical systems, particularly those involving weapons.

Alternative 3 met the desired cost while Alternatives 1 and 2 did not. This is because of the effort required to totally re-architect a system from the ground up. Re-architecting requires a program to stand up new systems, software, test, integration, and documentation teams, whereas when modularizing a system those groups already exist. The one downside to modularization is the risk of breaking software interfaces. Fortunately, the potential cost of that risk is predicted to be low.

Alternatives 1, 2, and 3 all presented maintainable approaches for the system architecture. Whether the system is re-architected or modularized from an existing code base, maintainability will improve. Each system would be expected to exhibit fewer inadvertent bugs than the legacy monolithic design. The final ranking of the proposed alternatives based on the derived weighted scores is as follows:

1. Alternative 3 received a weighted score of 1.0

2. Alternative 2 received a weighted score of 0.8

3. Alternative 1 received a weighted score of 0.2
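The weighted scoring and ranking can be reproduced with a short sketch. It assumes the value model is a simple weighted sum of the 0/1 scores, which the text implies but does not state as a formula, and it uses the weights given earlier (0.50, 0.30, 0.20) together with the met or not met outcomes described in the summary. Under those stated weights the sketch gives Alternative 2 a score of 0.7 rather than the 0.8 reported above; the ranking is the same either way.

```python
from typing import Dict, List, Tuple

# Weights as stated in the text.
WEIGHTS = {"Safety": 0.50, "Cost": 0.30, "Maintainability": 0.20}

# 1 = criterion met, 0 = not met, per the outcomes described in the summary.
SCORES = {
    "Alternative 1": {"Safety": 0, "Cost": 0, "Maintainability": 1},
    "Alternative 2": {"Safety": 1, "Cost": 0, "Maintainability": 1},
    "Alternative 3": {"Safety": 1, "Cost": 1, "Maintainability": 1},
}


def weighted_score(scores: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of the binary attribute scores (assumed value model)."""
    return round(sum(weights[attr] * scores[attr] for attr in weights), 2)


def rank(all_scores: Dict[str, Dict[str, int]],
         weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (alternative, weighted score) pairs, best first."""
    totals = {name: weighted_score(s, weights) for name, s in all_scores.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


RANKING = rank(SCORES, WEIGHTS)
for position, (name, total) in enumerate(RANKING, start=1):
    print(f"{position}. {name}: {total}")
```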

Conclusion

The decision analysis identified Alternative 3 as the best architectural approach for making the code base easier and less risky to maintain. Additionally, it ensured that the chosen alternative met the programmatic needs: maintainability, cost, and Department of Defense (DoD) standards for safety.

Alternative 3 will use an architectural approach that is compliant with the JSSSEH. This is important because the fielding of naval Combat Systems is influenced by the Weapon System Safety Engineering Review Board (WSSERB). The WSSERB expects a system to meet the guidance spelled out in the JSSSEH for safety critical software systems.

Architecturally, Alternative 3 will be the cheapest approach to creating a SOA. Modularizing the system will save approximately $750,000 in development costs from start to finish when compared to a re-architecting effort. Lastly, Alternative 3 will result in a system that carries lower risk for code modifications. Once the system has been modularized, code changes will be less likely to affect other areas of the system. Alternative 3 offers the safest, most cost-effective, and most maintainable approach.

References

Activity, N. O. (2012). Joint Software System Safety Engineering Handbook. Washington D.C.

Architecture, S. (2017, November 13). Service-Oriented Architecture (SOA) Definition. Retrieved from Service Architecture: https://www.service-architecture.com/articles/web-services/service-oriented_architecture_soa_definition.html

Frank, M. V. (2008). Choosing Safety: A Guide to Using Probabilistic Risk Assessment and Decision Analysis in Complex, High-consequence Systems. Washington, DC: Resources for the Future.

Harbeck, R. (1999, December 3). Strongly Typed. Retrieved from TechTarget: http://whatis.techtarget.com/definition/strongly-typed

WhatIs. (2016, May). Monolithic Architecture. Retrieved from WhatIs: http://whatis.techtarget.com/definition/monolithic-architecture

Nicolas Malloy
The Interlock

AV System Safety Engineer | Passionate about Resilience Engineering and Data Science