Transparent Software Development

Shawn Stafford
10 min read · Nov 19, 2018


Mount Baker, North Cascades — Photo by Andy Porter

I’ve spent my entire Release Engineering career chasing transparency. It started with the idea that we needed more transparency in the build and unit test process, so we collected data about these events and created a UI to visualize them. It later expanded into the traceability of the software development process as we tried to link software requirements to commits, commits to defects, and so forth. Host level monitoring, metrics collection, event messaging, and log aggregation all followed that same theme: collect the data, surface the data, utilize the data.

Only recently have I started to realize that each attempt at transparency has suffered from a lack of vision and cohesion. It’s a classic “forest for the trees” problem of implementing a singular solution to solve an immediate problem without understanding how it fits into a larger ecosystem. When we collected build event data, we sent it straight to a database because that was the tactical need. When we linked commits with defects, we configured the source control system to talk directly to the bug tracking system because that was the tactical need. After years of making incremental improvements in transparency, I feel I can take a step back and reflect on the forest. I hope that by talking about the variety of tools that comprise the Software Development Lifecycle (SDLC), you as the reader can agree that visibility matters.

Software Development Lifecycle Applications

When I talk about the SDLC, I’m referring to the process of producing software and, more specifically, to the collection of applications that facilitate it: requirements tracking, source control, continuous integration, bug tracking, and test case management. Ideally these applications all function together to provide greater transparency and traceability. There are commercial solutions such as IBM Jazz (formerly Rational) or HP ALM, but these can be incredibly expensive, difficult to implement, and seldom provide the best user experience. The goal should be to create a loosely coupled system where applications can be added or removed quickly, or even run in parallel, as the needs of the organization change.

The challenge with transparency is that it takes effort, especially when the applications are not designed to integrate with each other. Some vendors attempt to provide a complete SDLC solution that meets all of a developer’s needs, but these suites are expensive and seldom provide a satisfying user experience. Other applications may provide hooks and plugins to integrate with other vendors, but this often limits you to a narrow combination of applications that happen to integrate in specific, pre-defined ways. What developers really need is the ability to select the best-of-breed applications that suit their development process.

Although software development is not a uniform process across all teams, there needs to be a high degree of transparency and traceability throughout the process. Even within a single company, different projects or organizations may have different needs. Developer requirements, corporate acquisitions, and process evolution can all result in a multitude of overlapping applications and tools. Standardization is ideal, but seldom achievable. The solution to that problem is information sharing. The idea that applications should expose all their events in an open and transparent manner is what I’ll refer to as “transparent SDLC.”

In simple development environments, it is often possible to configure the applications to communicate directly with each other. The source control system might send commit information to the defect tracking system, establishing links between commits and defects. The continuous integration system might poll source control for changes which trigger actions. Although this is a step in the right direction, it creates a tightly coupled system that can be brittle and prone to failure. Taking a single application off-line for maintenance can disrupt the entire development process, or result in missed events which never get sent to their destination.

Direct Messaging = Tight Coupling

How can we get to that happy place where the development process is highly transparent, but the applications are loosely coupled? The solution is messaging. With the rise of big data, streaming platforms like Kafka or Pulsar have become the hub for centralizing and distributing data. By emitting event messages from each application, it is possible to create a loosely coupled environment with a high degree of transparency. The key is selecting a platform which allows multiple producers and multiple consumers to access the topics or message queues.

Message Publishing = Loose Coupling
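
As a concrete illustration, here is a minimal sketch of what publishing a commit event might look like using the kafka-python client. The broker address, topic name, and event fields are assumptions made for the example, not a prescribed schema.

```python
# Minimal sketch: a source control hook publishing a commit event to Kafka.
# The broker address, topic name, and event fields are hypothetical.
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

commit_event = {
    "type": "commit",
    "repository": "example-repo",               # hypothetical repository
    "revision": "a1b2c3d",
    "author": "jdoe",
    "message": "Fix login timeout (DEF-1234)",  # references a defect ID
}

# The producer only emits the event; it knows nothing about who consumes it.
producer.send("scm.commits", value=commit_event)
producer.flush()
```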

Once applications begin emitting event messages, they create an open transaction log that can be consumed by any application, user, or process interested in acting on those events. For example, a message consumer might look for source control commits, parse each message for defect or requirement IDs, and update the defect or requirements tracking system with the commit information (a minimal consumer along these lines is sketched after the list below). This approach simplifies the maintenance of the producing applications because they only need to emit messages. The code for consuming the events and taking action can be deployed independently of any of the applications it integrates with. This has a multitude of benefits:

  • Individual applications can be taken offline without impacting any other application, because producers and consumers act at their own pace.
  • Integration complexity is managed outside of the applications, using public APIs such as REST calls to interact with each application. This makes it possible to configure, deploy, and update the integration without impacting upstream or downstream applications.
  • Multiple clients can consume a message and take action. This allows the same message to trigger events in different applications, or to be replicated to a test and production environment simultaneously.
  • Anyone with access to the streaming platform can implement their own consumer, performing actions or analysis based on their needs.
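
To make the defect-linking example above concrete, here is a minimal consumer sketch. It assumes the same hypothetical scm.commits topic, a DEF-1234 style defect ID convention, and a fictional defect tracker REST endpoint; a real integration would call the tracker’s actual API.

```python
# Minimal sketch: consume commit events, look for defect IDs such as
# "DEF-1234", and post a comment to a hypothetical defect tracker REST API.
import json
import re

import requests
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "scm.commits",                        # same hypothetical topic as above
    bootstrap_servers="localhost:9092",
    group_id="defect-linker",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

DEFECT_ID = re.compile(r"DEF-\d+")        # hypothetical defect ID convention

for record in consumer:
    event = record.value
    for defect_id in DEFECT_ID.findall(event.get("message", "")):
        # Hypothetical endpoint; substitute your tracker's real API here.
        requests.post(
            f"https://defects.example.com/api/issues/{defect_id}/comments",
            json={"body": f"Linked commit {event['revision']} in {event['repository']}"},
            timeout=10,
        )
```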

With the appropriate messaging platform and format in place, a world of possibilities opens up with minimal additional effort. In addition to application events triggering integration events with other SDLC applications, the events themselves can be analyzed for trends or errors. Events can be streamed to Elasticsearch and visualized in Kibana. Metrics can be collected in a time series database and presented with Grafana. With a little elbow grease, messages in the OpenTracing format can even be consumed by applications like Haystack to produce visualizations of a complex end-to-end process.

Message Consumers
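
For instance, a small consumer along these lines could forward every SDLC event into Elasticsearch for exploration in Kibana. This sketch assumes the Elasticsearch 8.x Python client and hypothetical broker, topic, and index names.

```python
# Sketch: forward SDLC events from a Kafka topic into Elasticsearch so they
# can be explored in Kibana. Broker, topic, and index names are hypothetical.
import json

from elasticsearch import Elasticsearch
from kafka import KafkaConsumer

es = Elasticsearch("http://localhost:9200")

consumer = KafkaConsumer(
    "sdlc.events",
    bootstrap_servers="localhost:9092",
    group_id="event-indexer",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for record in consumer:
    # Each event becomes a searchable document that Kibana can chart.
    es.index(index="sdlc-events", document=record.value)
```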

Messaging

In a complex software development environment, it is often difficult to predict or manage integration across multiple applications. The applications from various vendors may not support direct integration with other applications in the environment, or technical and architectural challenges may prevent the use of integration points. In order to achieve a loosely coupled architecture, a streaming platform can be used to receive events from each of the SDLC applications.

When properly implemented, event messages from each SDLC application will be published to a central messaging platform, and any interested application can consume those messages and perform the appropriate actions. For example, a change management system (source control) might publish an event each time a new change is committed. A continuous build environment may listen for these changes and initiate compilation of the code to produce a consumable artifact. Similarly, a defect management system may examine the changes for references to defect identifiers and establish the appropriate links between a source code change and the corresponding defect. While not as convenient as direct integration between applications, the loose coupling provides a much greater degree of flexibility.
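
A sketch of the build-trigger scenario might look like the following: a consumer in its own consumer group reacts to the same commit events by starting a build through the CI server’s REST API. The endpoint shown is hypothetical; a real setup would call whatever CI system is in place.

```python
# Sketch: an independent consumer group that triggers CI builds from the same
# hypothetical "scm.commits" topic. The CI endpoint shown is also hypothetical.
import json

import requests
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "scm.commits",
    bootstrap_servers="localhost:9092",
    group_id="ci-trigger",   # separate group: receives its own copy of each event
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for record in consumer:
    event = record.value
    # Substitute your build system's real API (Jenkins, GitLab CI, etc.).
    requests.post(
        "https://ci.example.com/api/builds",
        json={"repository": event["repository"], "revision": event["revision"]},
        timeout=10,
    )
```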

In addition to application events, the messaging platform can also carry a wide variety of application data such as log messages and metrics. Writing logs and metrics to the message queue makes the data available to multiple message consumers, so it can be loaded into a time series database or indexed by a search engine and exposed to the appropriate users. Standard third-party tools can then be used in the infrastructure to provide transparency and automation on a scale that might not otherwise be possible in a tightly coupled environment. Users can begin to analyze and interact with information in ways that meet their needs, without customizing or impacting the SDLC applications directly.

Message Producers and Consumers
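
The same pattern works for operational data. As a sketch, a small agent on each host could publish resource measurements to a metrics topic, where any number of consumers (a time series database writer, an alerting job) can pick them up. The topic and field names here are assumptions.

```python
# Sketch: a per-host agent publishing basic resource metrics to a
# hypothetical "host.metrics" topic, using psutil and kafka-python.
import json
import socket
import time

import psutil
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda sample: json.dumps(sample).encode("utf-8"),
)

while True:
    sample = {
        "host": socket.gethostname(),
        "timestamp": time.time(),
        "cpu_percent": psutil.cpu_percent(interval=None),
        "memory_percent": psutil.virtual_memory().percent,
    }
    producer.send("host.metrics", value=sample)
    time.sleep(10)   # emit one sample every 10 seconds
```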

Some key requirements of a centralized streaming platform are:

  • High throughput
    As the number of message producers and the volume of messages increase, the platform must be capable of handling the load. This is particularly true for log aggregation, which can generate an extremely high message volume.
  • High availability
    Applications will constantly be producing and consuming messages. Any outage or disruption may result in a loss or delay of data. The streaming platform must be designed with clustering and high availability in mind. Even upgrades of the platform itself must be possible without requiring an outage or downtime. When the message platform is highly available, it takes the burden off of the consumers to be highly available as well. Consumers can be taken offline for brief periods of time for upgrades or maintenance and can catch up on missed messages when they come back online.
  • Multiple producers and consumers
    A cornerstone of this loosely coupled architecture is the ability for multiple producers to write to the same message stream, and for multiple consumers to read from the same stream. Supporting multiple producers allows distributed applications to merge their data into a single location. Supporting multiple consumers allows the data from a single stream to be used by a variety of applications. For example, the environment might currently publish performance metrics from each application host to a single message stream and consume them into a time series database, but in the future a new consumer could send the same data to an alternate database in order to evaluate a new software project or use case.
  • Event stream processing
    It may be necessary to analyze log messages in real time to look for errors or patterns and then take action, such as triggering an alert or a self-healing action. Messages or logs may need to be parsed or enriched before being sent to their final destination. Event stream processing allows the log messages to be processed independently by different consumers. For example, HTTP access logs might be augmented with GeoIP data and persisted in Elasticsearch, while the metrics from those same logs are aggregated and stored in a time series database (see the sketch after this list).
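
As one small illustration of the stream-processing idea in the last bullet, a consumer could read parsed access-log events, aggregate response-code counts over a short window, and publish the rollup to a second topic for a metrics consumer to store. The topic names and field names are assumptions.

```python
# Sketch: a tiny stream processor that rolls up HTTP status-code counts from a
# hypothetical "http.access" topic and publishes one aggregate per minute to
# "http.metrics", where a time series consumer could store it.
import json
import time
from collections import Counter

from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    "http.access",
    bootstrap_servers="localhost:9092",
    group_id="status-rollup",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda m: json.dumps(m).encode("utf-8"),
)

WINDOW_SECONDS = 60
counts, window_start = Counter(), time.time()

for record in consumer:
    counts[str(record.value.get("status"))] += 1
    if time.time() - window_start >= WINDOW_SECONDS:
        producer.send(
            "http.metrics",
            value={"window_start": window_start, "status_counts": dict(counts)},
        )
        counts, window_start = Counter(), time.time()
```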

Log Aggregation and Metrics Collection

In order to provide transparency into the health and capacity of any environment, the Operations team must have real-time access to information. For example, host metrics such as memory or CPU utilization can provide insight into the load that a system is currently under. For web-based applications or systems that make heavy use of REST API calls, analysis of the HTTP request logs can provide valuable insight into the number of requests or response times, which can help correlate performance degradation with user activity.

By collecting this data and exposing it through application dashboards, the Operations team can understand the application’s load and performance characteristics over time. Dashboards built from key logs and metrics present the information in a readily understandable format that can be used to make informed decisions about how to improve application performance, reliability, and scalability.

Application Logs

Logs are a valuable resource for understanding application behavior and usage patterns. When logs are collected from the various hosts into a central location and parsed into structured documents, they can be indexed and later presented in a graphical dashboard to provide a meaningful view of the data. To use a specific example, an HTTP request log is typically composed of the following pieces of information (a parsing sketch follows the list):

  • Client information (IP Address, client software)
  • Requested URL
  • HTTP Response Code
  • Response time and/or response size
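
As a sketch of what “parsed into structured documents” might look like, here is a regex-based parser for a combined-log-format request line. Real log formats vary, so treat the pattern as a starting point rather than a universal parser.

```python
# Sketch: parse an Apache/Nginx combined-log-format line into a structured
# document suitable for indexing. Adjust the regex to your server's format.
import re

LOG_PATTERN = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_access_log(line: str) -> dict:
    """Return a structured document for one log line, or {} if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if not match:
        return {}
    doc = match.groupdict()
    doc["status"] = int(doc["status"])
    doc["size"] = None if doc["size"] == "-" else int(doc["size"])
    return doc

sample = ('203.0.113.7 - - [19/Nov/2018:10:15:32 +0000] "GET /api/v1/builds HTTP/1.1" '
          '200 532 "-" "curl/7.61.0"')
print(parse_access_log(sample))
```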

When indexed and presented as a dashboard, this information can be used to answer complex operational questions such as:

  • What IP address or geographic region generated the most requests?
  • How many requests took longer than 5 seconds?
  • How frequently did a particular error or response code get returned?

A well-constructed dashboard can provide answers to common operational questions at a glance.

Kibana Dashboard
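
Behind a dashboard like this are ordinary search queries. As one example, the “longer than 5 seconds” question might be answered with a query like the following, assuming the Elasticsearch 8.x Python client, a hypothetical http-logs index, and a response_time_ms field.

```python
# Sketch: count requests slower than 5 seconds and break them down by status
# code, against a hypothetical "http-logs" index with a "response_time_ms" field.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="http-logs",
    size=0,                                               # aggregations only, no hits
    query={"range": {"response_time_ms": {"gt": 5000}}},  # slower than 5 seconds
    aggs={"by_status": {"terms": {"field": "status"}}},   # break down by response code
)

print("slow requests:", resp["hits"]["total"]["value"])
for bucket in resp["aggregations"]["by_status"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```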

Getting Started

The task of establishing a streaming platform just to support your software development process may seem daunting and overly complex, but you don’t have to implement it all at once. Start by collecting logs and surfacing the data through Elasticsearch and Kibana. Next, select a core application such as source control and begin emitting events, then tackle a use case such as establishing links between defects and commits. Once you are able to demonstrate the technology in practice, there will be a multitude of opportunities to expand on its use. And the great part of the streaming platform approach is that it allows subject matter experts to take control of their own applications. The ability to integrate applications and increase transparency is no longer dictated by plugin availability or vendor support; it is available to anyone who can imagine a better way to connect the dots.


Shawn Stafford

Release Engineer with an interest in pipeline traceability and observability.