Ryan Sites
Nov 6 · 5 min read

Automate Everything… Even the minutia

Command Line Interface (cli)
Photo by Shahadat Shemul on Unsplash

What is DevOps?

DevOps is a set of practices that automates the processes between software development and IT teams, so that they can build, test, and release software faster and more reliably. The concept of DevOps is founded on building a culture of collaboration between teams that historically functioned in relative silos.

What is CI/CD?

Continuous Integration (CI) is a development practice that requires developers to integrate code into a shared repository several times a day. Each check-in is then verified by an automated build, allowing teams to detect problems early. By integrating regularly, you can detect errors quickly, and locate them more easily.

At the heart of every engineering firm, the fundamental concepts of automation are being spewed from the mouths of everyone within the organization, from leadership and the non-technical all the way down to the systems analysts and junior engineers. “Automate, automate, automate” they say, “faster time to market”, “feature flags are a must”, and of course, “we should deploy to production every day”.

Let me preface the rest of this post by saying that I do NOT disagree with any of the quotes above. I have seen a tremendous increase in overarching efficiencies within the DevOps space over the last decade. We see it with better source control tools such as git and GitHub, and open source delivery tools like Jenkins, allowing us to create automated deployments whether a team is using GitFlow or trunk-based development.

Let’s not exclude our friends docker and kubernetes, paving the way for next-level automation. First was docker, which gave us OS-level virtualization to deliver software in packages called containers. Then came kubernetes, a portable, extensible, open-source platform for managing containerized workloads and services, which facilitates both declarative configuration and automation. (We first had to learn how it was pronounced!) Throw in helm charts, which gave us a templating engine around the deployment of docker images to kubernetes, and some would say we have reached engineering utopia. My work is done here…

… or is it?

Now that we have automated the process of delivering our software, including but not limited to the testing of said software (acceptance, performance, and e2e), static code analysis, binary storage of packages, and deployment of components to production, it’s time we look elsewhere to find opportunities for efficiency.

The minutia

I recently spent time as a boots-on-the-ground engineer for one of the product teams I oversee. It had been a hot minute since the last time I had done day-to-day scrumming with an agile team, but it did not take me long to see that there were opportunities to automate even the simplest of everyday tasks within software delivery. The team was using various communication channels such as Slack/Flowdock for chat, as well as GitHub Pages and Confluence wiki for documentation. However, there were several times we saw questions like “How did you set up a token to authenticate to that?”, or “can someone send me the curl command used to post a document there?”, etc. It became abundantly clear that ample time was being spent repeating the same mundane steps, day in and day out, to complete user stories… the minutia.

I could go into detail about how something as simple as creating gists in GitHub to share would suffice, and I use this technique quite frequently. As a matter of fact, the open source tool Lepton has become one of my favorite tools for managing all of my gists! I could also go on and on about how tools like Insomnia, a REST client, have simplified development workflows numerous times for my teams. But today, I want to talk about using a custom-built, project-specific command line interface (cli), which in my opinion might be one of the most undervalued tools in the shed.

The cli

I have been promoting the concept of local sandbox environments for all of my engineering teams. The ability to quickly stand up production-like replica environments has become increasingly easy with the advent of tools like kubernetes and helm. However, getting them initialized the first time, with all needed software components within a system running, and initiating an end-to-end test can be cumbersome, especially when new engineers are introduced to the team. The documentation to onboard an engineer alone can sometimes be several pages long.

The project mentioned leveraged SPARK on Kubernetes to perform traditional ETL jobs. Each client has a specific set of instructions to transform their data to a common format used in downstream systems. Spark would connect to S3 buckets, consume delimited text files, leverage the instructions to perform typical data transformations on DataFrames, and write the transformed delimited files back to S3 buckets for consumption downstream.

Performing this e2e without automation wouldn’t be enjoyable in the least, ignoring the fact that it would take a tremendous amount of time putting all of the pieces in place to execute correctly. This is where the cli comes in! We decided to take one two-week sprint to engineer a cli that would automate two critical functions for this team:

  1. Initialize Local Environment
  2. Execute E2E Test
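To give a feel for how little machinery this takes, here is a minimal sketch of a subcommand dispatcher in Node. The command names match the two functions above; everything else is hypothetical and not the actual @company/cli source:

```javascript
// Minimal subcommand dispatcher for a project cli (illustrative only).
const commands = {
  init: () => 'initialized local environment',
  e2e: () => 'executed end-to-end test',
};

// Look up and run the requested subcommand, e.g. run(['init']).
function run(args) {
  const [cmd] = args;
  if (!commands[cmd]) {
    throw new Error(`unknown command: ${cmd}`);
  }
  return commands[cmd]();
}
```

In the real tool each handler would shell out to git, helm, and spark-submit; wiring `run` to `process.argv.slice(2)` and an npm `bin` entry is all it takes to get `cli init` and `cli e2e` at the terminal.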

Init

The initialize local command (init) performed the following steps:

  1. Cloned all system components to temporary directory
  2. Executed helm charts for all system components
  3. Initialized local kubernetes environment
  4. Cleaned temporary directory

Our set-up instructions went from a myriad of prerequisites and 3 pages of “download this, run that” to only 5 prerequisites (SPARK, Docker for Mac, git, SBT, Node), with SBT and SPARK being the only two that aren’t required for all projects. The 3 pages of instructions to onboard a new engineer were reduced to:

npm i @company/cli
cli init
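For context, publishing a Node cli this way only takes a `bin` entry in `package.json`; a minimal, hypothetical manifest (the real package’s manifest isn’t shown in this post) looks like:

```json
{
  "name": "@company/cli",
  "version": "1.0.0",
  "bin": { "cli": "./bin/index.js" }
}
```

npm symlinks the `cli` name onto the user’s PATH at install time, which is what makes `cli init` available immediately after `npm i`.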

E2E

Running the cli e2e command from the terminal was even more magical. 🎉

The e2e command was responsible for uploading test files to MinIO, an Amazon S3-compatible object storage solution. Upon completion, it would then make an http call to retrieve the client-specific instructions from a Spring Boot api in local kubernetes, and pass those instructions as a parameter to the spark-submit call. This would create a SPARK driver pod in kubernetes that would execute the instructions provided and place the output files in a separate bucket in MinIO. After completion of the spark-submit, the cli would retrieve the newly placed files and run data-driven assertions that the transformations met the expected output. (You even get a 🍕 🍺 if all is well!)
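The e2e steps can be sketched the same way. Everything here (bucket names, the instructions endpoint, the spark-submit flags) is a placeholder I’ve invented to show the shape of the flow, not the team’s actual configuration:

```javascript
// Build the commands behind `cli e2e` for a given client (all names illustrative).
function e2eCommands(client) {
  return [
    // 1. Upload test input files to MinIO (S3-compatible object storage)
    `mc cp ./testdata/${client}/ local/input-bucket/${client}/ --recursive`,
    // 2. Fetch client-specific transform instructions from the Spring Boot api
    `curl -s http://localhost:8080/clients/${client}/instructions -o instructions.json`,
    // 3. spark-submit creates a SPARK driver pod in local kubernetes
    `spark-submit --master k8s://https://localhost:6443 --deploy-mode cluster ` +
      `local:///opt/transform.jar instructions.json`,
    // 4. Retrieve the transformed output for data-driven assertions
    `mc cp local/output-bucket/${client}/ ./results/${client}/ --recursive`,
  ];
}
```

In the real cli each command would run sequentially, with the final step feeding a small assertion library that diffs the output files against expected fixtures.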

This e2e process ran in 1–2 minutes each time, and could be executed from soup to nuts in less than 5 minutes. In a world where engineers move into and out of project teams constantly, I feel confident that I can stand behind my original statement that there is a ton of opportunity to create efficiencies in our work streams, and I think a custom cli is a great start!

Written by Ryan Sites

Software Engineer who happens to love bourbon…
