Data for Engineering Ops

Emily Batchos
Dandy Engineering, Product & Data Blog

--

At Dandy we like to work as intelligently as possible, making sure that we are always working on the right things and that we're not losing time on avoidable roadblocks that weren't visible in the moment. This extends to software engineering as well, in a practice we call Eng(ineering) Ops.

How many tools does a typical engineering team use to understand its output? Usually more than one, which makes monitoring that output both difficult and ineffective. With our Eng Ops data work, we pulled data from disparate sources (especially Jira and GitHub) together to build comprehensive analyses and give our engineers a unified, authoritative view of the engineering process.

GitHub Data

Our GitHub PR template has a designated slot to log the Jira ticket, but only in the title, so the ticket key first had to be parsed out.

In a few lines of SQL, we set up some CASE logic to pull out the linking key. In the future, we could design more robust text parsing, but until then, we're dependent on quality data entry.
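As a rough illustration (not our production SQL), the same key extraction can be sketched in Python with a regular expression. The `DAND-` project prefix and the title format below are hypothetical:

```python
import re

# Jira keys look like PROJECT-123; the exact project prefix in a real
# title depends on the PR template, so this pattern is a generic guess.
JIRA_KEY_RE = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")

def extract_jira_key(pr_title):
    """Return the first Jira-style key found in a PR title, or None."""
    match = JIRA_KEY_RE.search(pr_title)
    return match.group(1) if match else None

print(extract_jira_key("[DAND-123] Add sprint parsing"))  # DAND-123
print(extract_jira_key("Fix typo in README"))             # None
```

Titles that omit the key fall through to `None`, which is exactly the "dependent on quality data entry" caveat above.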

For GitHub data, another important piece of context is that the "difference" measurements are calculated at the commit level (each commit's diff is relative to its parent). This means that if we sum the incremental changes across a multi-commit PR, we would likely under- or over-count the changes. However, for a merged PR we can select the merge commit, whose diff reflects the net changes associated with the PR.

Example showing commit-level vs. PR-level data:

Commit A: +15 lines added, −5 lines removed

Commit B: +10 lines added, −4 lines removed

Commit C (merge commit): +10 lines added, −3 lines removed

This means that we can now quantify the aggregate changes created by a PR. While work that was “done” and then “undone” in a subsequent commit prior to merge is at risk of not being captured, we ultimately capture the aggregate work product that an engineer submits to a codebase.
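To make the over-counting risk concrete, here is a small sketch using the illustrative numbers from the example above, comparing the naive per-commit sum against the merge commit's net figures:

```python
# Per-commit diffs from the example above: (lines added, lines removed)
commits = {
    "A": (15, 5),
    "B": (10, 4),
}
# The merge commit's diff reflects the PR's net change against the base branch.
merge_commit = (10, 3)

# Naively summing every commit counts work that was later undone.
summed_added = sum(added for added, _ in commits.values())      # 25
summed_removed = sum(removed for _, removed in commits.values())  # 9

print(f"Summed commits: +{summed_added} / -{summed_removed}")
print(f"Merge commit:   +{merge_commit[0]} / -{merge_commit[1]}")
```

The gap between +25/−9 and +10/−3 is precisely the "done then undone" churn that the merge commit collapses away.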

Now that we can quantify the volume of code changes and attach Jira tickets and their context to the data, we unlock a world of tracking possibilities that we couldn't reach before.

Incorporating Jira

Our Dandy Engineering team has a system for tracking work with estimated effort via a point system. We create a Jira ticket ("issue") for each task we work on, and establish a set of tickets to be completed in a weekly "sprint". We use GitHub to track our codebase, and in our pull request (PR) submission process, we reference the Jira ticket driving the work. This means a reviewer can easily look up the purpose of a code change and understand its context.

On the Jira side, we need to make sure our data accounts for two Dandy-specific complexities:

  1. Sprint is a custom field, and has to be tracked via the field history.
  2. A ticket's sprint can also change over time if a piece of assigned work is not completed in the expected week.

Both of these complications meant that we needed to do some manual searching in the Jira data to identify which custom field on an issue represents the sprint. This is easy enough with a quick SQL query:

SELECT DISTINCT custom_field_name, field_value
FROM jira_data.multi_select_field_history
WHERE field_value LIKE '%sprintnamesample%'

Once we identified the custom field name (which shows as custom_field_XXX), we could reference that field directly in our query to dynamically identify the sprints a ticket was associated with.

Since issues can be assigned to multiple sprints, we select the most recent sprint an issue was assigned to. We also included a quick calculation to flag whether an issue (and its associated pull request) predated a particular weekly sprint, by comparing the sprint start date to the ticket start date. In a separate view, we can build out full sprint history by issue, but our goal here was to focus on the issues and pull requests in the current sprint.

Now that we have connected these two sources of tracking, we have access to powerful insights around the Code-driven activity of our teams. We can now identify the answers to key questions like:

  1. How much work (pull requests) needs review right now? → This gives the current backlog.
  2. How long has a PR been in review? → Is a review backlog preventing us from delivering work?
  3. How much work is approved but not merged? → How much work needs last-mile delivery?
  4. How much work is associated with the current sprint, and how many individuals are completing it?
  5. How much work is already completed for this sprint?
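As a sketch of question 2, once PRs and tickets are joined, review age falls out of a simple date difference. The PR numbers, dates, and record shape below are invented:

```python
from datetime import date

# Hypothetical open-PR records: (pr_number, date review was requested)
open_prs = [
    (101, date(2023, 1, 2)),
    (102, date(2023, 1, 8)),
]

today = date(2023, 1, 10)  # pinned so the example is reproducible
review_ages = {pr: (today - requested).days for pr, requested in open_prs}

print(review_ages)  # {101: 8, 102: 2}
oldest = max(review_ages, key=review_ages.get)
print(f"PR #{oldest} has waited {review_ages[oldest]} days for review")
```

Thresholding these ages (say, anything over five days) is one way to turn the metric into a concrete action item for the team.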

While this is just the beginning of unlocking the capabilities of the productivity tracking tools, we can now succinctly identify action items for our team — thus reducing administrative burden and increasing the ability of engineers to engineer!
