How to collect information about Jenkins for analysis and data visualization


If you support a reasonably large Jenkins instance, or you support a large number of instances, you have probably been faced with a performance problem. Like many Continuous Integration (CI) applications, Jenkins works quite well at a small scale but can degrade significantly without proper care and feeding. This article will present several examples of how to export Jenkins events, logs, and metrics to help identify opportunities for improvement.
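As a starting point, a small polling script can capture a few controller-level health metrics without any plugins. The sketch below is a minimal example, assuming the standard Jenkins JSON API is reachable at the usual `/queue/api/json` and `/computer/api/json` paths; the URL and credentials are placeholders, and the output would normally be forwarded to a time series backend rather than printed.

```python
# Minimal sketch: poll a Jenkins controller's JSON API for a few health
# metrics (queue depth and executor utilization) so they can be forwarded
# to a time series store for visualization.
import time
import requests

JENKINS_URL = "https://jenkins.example.com"   # hypothetical instance
AUTH = ("metrics-bot", "api-token")           # hypothetical credentials

def collect_once():
    queue = requests.get(f"{JENKINS_URL}/queue/api/json",
                         auth=AUTH, timeout=10).json()
    computers = requests.get(f"{JENKINS_URL}/computer/api/json",
                             auth=AUTH, timeout=10).json()
    return {
        "queue_length": len(queue.get("items", [])),
        "busy_executors": computers.get("busyExecutors", 0),
        "total_executors": computers.get("totalExecutors", 0),
    }

if __name__ == "__main__":
    while True:
        metrics = collect_once()
        # Replace this print with a push to Graphite, InfluxDB, Prometheus
        # Pushgateway, or whatever backend you use for dashboards.
        print(int(time.time()), metrics)
        time.sleep(60)
```

Verify the endpoint paths and field names against your Jenkins version before relying on them; dedicated metrics plugins expose far more detail, but even these few numbers make queue backlogs and executor saturation visible over time.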



If you work in a software development organization, you probably deal with a variety of software development applications. These include core applications like source control, bug tracking, and continuous integration. In a small organization (fewer than 500 engineers), scalability is not typically a concern. Most applications in this area perform quite well when the number of users or the amount of concurrent activity remains small. This article is for the other end of the organizational spectrum: the teams who experience the challenges of using Software Development Lifecycle (SDLC) applications that were not designed to meet the needs of a large organization.

The term “commodity hardware” is nearly synonymous with horizontal scalability. When the hardware is inexpensive and seldom fault tolerant, you need to design your software to provide fault tolerance. It needs to be capable of distributing the workload across multiple systems and expanding to include more servers as usage demands. The idea of “commodity software” is similar: if the application is not designed to scale horizontally with increased load, you must distribute the projects and users across multiple instances.


Time series data is often used by operations teams to investigate performance issues. Since we cannot anticipate the source of a performance issue ahead of time, we often err on the side of caution and collect as much data as possible as frequently as possible. This makes it easy to get a very granular picture of what was happening in an environment within the last hour, day, or week. However, keeping more than a week or two of data can become difficult as the number of hosts and volume of data increases.
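One common mitigation is to downsample older data: keep the raw, high-resolution samples for a week or two, and retain only coarse aggregates for the long term. The sketch below illustrates the idea with pandas; the column names and sample values are hypothetical, and most time series databases offer equivalent retention or rollup policies natively.

```python
# Illustrative sketch: downsample high-resolution metrics (e.g. 10-second
# samples) into hourly aggregates so months or years of history stay
# affordable to store.
import pandas as pd

# Assume a DataFrame of raw samples indexed by timestamp.
raw = pd.DataFrame(
    {"cpu_percent": [12.0, 55.0, 43.0, 71.0]},
    index=pd.to_datetime([
        "2021-01-01 00:00:10", "2021-01-01 00:30:00",
        "2021-01-01 01:00:05", "2021-01-01 01:45:00",
    ]),
)

# Keep mean and max per hour; the raw points can then be expired after a
# week or two while the hourly rollup is retained long term.
hourly = raw["cpu_percent"].resample("1h").agg(["mean", "max"])
print(hourly)
```

The trade-off is that fine-grained spikes disappear from the rollup, which is why keeping both the max and the mean (rather than the mean alone) is usually worth the extra column.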

There are many reasons why it may be necessary to retain data for longer periods of time. It is often useful to refer to data that is weeks, months, or even years old in the following…



In a previous post, I covered Software Development Applications: an overview of the various applications that support the SDLC. This is a companion post covering the variety of applications that can be used to provide operational support and transparency for those applications and the infrastructure they run on.

While software development applications provide a foundation for the development process, operations applications support the infrastructure underneath them, providing traceability and transparency into the health of the software development applications. …


Mount Baker, North Cascades — Photo by Andy Porter

I’ve spent my entire Release Engineering career chasing transparency. It started with the idea that we needed more transparency in the build and unit-test process, so we collected data about these events and created a UI to visualize them. It later expanded into the traceability of the software development process as we tried to link software requirements to commits, commits to defects, and so forth. Host-level monitoring, metrics collection, event messaging, and log aggregation all followed the same theme: collect the data, surface the data, utilize the data.

Only recently have I started to realize that each attempt at transparency has suffered from a lack of vision and cohesion. It’s a classic “forest for the trees” problem of implementing a singular solution to solve an immediate problem without understanding how it fits into a larger ecosystem. When we collected build event data, we sent it straight to a database because that was the tactical need. When we linked commits with defects, we configured the source control system to talk directly to the bug tracking system because that was the tactical need. After years of making incremental improvements in transparency, I feel I can take a step back and reflect on the forest. I hope that by talking about the variety of tools that comprise the Software Development Lifecycle (SDLC), you, as the reader, will agree that visibility matters. …


“Care and Quality are internal and external aspects of the same thing. A person who sees Quality and feels it as he works is a person who cares. A person who cares about what he sees and does is a person who’s bound to have some characteristic of quality.”
- Robert Pirsig, Zen and the Art of Motorcycle Maintenance

Early in my technical career, a manager recommended I read “Zen and the Art of Motorcycle Maintenance” by Robert Pirsig. I found it to be a thought-provoking narrative about the struggle for Quality, even when you may not know exactly how to define it. Since then, the idea of Quality has been a subconscious part of my decision-making process. In this article I’ll talk about the role of Operations in the area of software application maintenance, and try to relate it back to the central theme of Quality. All quotes shown here are taken from Mr. …


What to do when it’s 3am and the servers are melting down


A runbook is an operational reference that describes an application in its deployed environment. It should be easy to read, consistent across all applications, and accurate. This is the document an on-call responder reaches for at 3am when a SEV1 alert wakes them up, so it should be as straightforward and to the point as possible. Although this article assumes there is a dedicated Operations team, it is equally useful for DevOps teams, system administrators, or a plain old developer who needs to understand the deployment environment. …


Image for post
Image for post

Software development is a highly collaborative process which requires customer input, planning, development, and product testing. The application infrastructure which underlies the Software Development Lifecycle (SDLC) should support this collaboration and provide traceability and transparency. These tools should provide a foundation for taking software from the initial idea all the way through to release. In order to establish traceability, each phase of the development process should capture structured data which can be linked to the other phases of development.

The diagram below illustrates the interconnected nature of the software development process. Once an idea is formalized into a requirement, it must then be implemented in software and packaged for use, before finally being tested and released to a customer. Each of these phases may require unique software capable of managing the information associated with that stage of the software lifecycle. It is this interconnected feedback loop which ultimately forms the relationships between each phase and which provides transparency to the development lifecycle. …
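To make the idea of linked, structured data a little more concrete, here is a rough sketch of the kind of records each phase might capture, with each record holding a reference to the artifact that preceded it. The field names are illustrative only, not a prescribed schema.

```python
# Sketch of traceability links across SDLC phases: a release can be walked
# back through builds and commits to the requirements that motivated it.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Requirement:
    req_id: str             # e.g. an issue tracker key
    description: str

@dataclass
class Commit:
    sha: str
    req_ids: List[str]      # requirements referenced in the commit message

@dataclass
class Build:
    build_id: str
    commit_shas: List[str]  # commits included in the build

@dataclass
class Release:
    version: str
    build_ids: List[str]
    test_results: List[str] = field(default_factory=list)

# Walking these references in reverse answers questions like
# "which requirements shipped in release 2.4?"
```

However the data is actually stored, the important property is that each phase records identifiers that the neighboring phases can resolve, since that is what turns isolated tools into a traceable feedback loop.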

About

Shawn Stafford

Release Engineer with an interest in pipeline traceability and observability.
