“Ops by Code” — DevOps for Ops
Prologue
When I first heard about the DevOps movement, I was on fire. Throughout my professional career, I have always tried to automate as many processes and tasks in operations as possible. Unfortunately, I stood alone with this vision for a long time and hoped that DevOps would change this. There weren’t many people in IT operations who enjoyed development and actively sought contact with developers. Now there was this term “DevOps,” and I saw a better (IT Ops) world coming.
At that time, I understood the term as addressing both sides equally: developers build up Ops know-how and gain an understanding of the problems of the IT department, while Operations builds up developer know-how and learns to work with Git and code.
Meanwhile, the movement has made a giant step forward. Almost everyone in IT has heard the term, and many tools have been created alongside the establishment of the public cloud.
Compared to its origins, much has developed in a positive direction.
- Most manufacturers deliver their products with APIs that follow a (REST) standard
- Git has become established and developed a strong ecosystem (GitHub, GitLab, etc.)
- CI/CD tools are widely available, and the entry barrier is lower than ever
- There are various tools for abstracting infrastructure (Terraform, Sparkel, Pulumi, CloudFormation)
- DevOps engineers are sought for many positions, not only the classic profiles
- Developers have made huge strides toward infrastructure, which has a positive effect on the tools mentioned above
The problem
Due to the lack of a clear definition of DevOps, the term is very much subject to change and fashion.
The generic term “DevOps” is currently evolving into a purely software-development-oriented description. As a result, my initial hope that IT Operations would also move in this direction is fading every day. Ops already see DevOps as “developer hype” and don’t perceive it as relevant to them and their daily work.
Operations, therefore, do not share the same vision and enthusiasm for the subject and work in many areas the same way as before.
The direction changed — Photo by Scott Blake on Unsplash
Some of you will surely object that Operations should use DevOps in the same way; after all, the goal is to break down the boundaries between “Ops” and “Devs.” Unfortunately, in many organizations, Operations is much more than just a contributor to development. Operations has many tasks that are difficult or impossible to achieve with the current state of DevOps tools and processes, and these tasks are mostly invisible to developers and higher management.
Typical non-DevOps-related tasks are:
- Client administration (Notebooks, Desktops, Smartphones, etc.)
- Software distribution and policy definition
- Operation of large numbers of legacy systems (cue Windows Server 2003+)
- Administration of physical networks, firewalls, and proxies (yes, dirty, but unfortunately partly necessary)
- Operation of “off-the-shelf” products, which typically cannot be changed
- Active Directory and the surrounding ecosystem
- etc…
Okay, back to the problem. As you can see, Operations sometimes faces completely different challenges. Of course, there are also many overlaps, which are already addressed via “DevOps.”
A more concrete concept for Ops
IT Operations also needs a strong vision and a term to align and organize around for the future. However, the term “DevOps” alone cannot achieve this.
Everything is running great in operations, no need to change!
Well, we all know that this is not true, and it is one of the reasons for the DevOps movement itself. Operations is often slow, and handling requests via tickets, Excel spreadsheets, and emails is complicated. Accordingly, a lot of manual work is involved, which results in long lead times, errors, and a lack of transparency.
Even if this sounds very negative, it is currently the best that operations can achieve: many tools have no (or bad) interfaces for automation, and even a medium-sized organization uses hundreds of such tools.
The highest goal in Operations is to maintain stability, just as the developers’ highest goal is “faster” feature delivery. ITIL, and the bureaucracy around it, aims at exactly this goal: it tries to make dependencies visible in order to plan for and minimize risks.
Due to the limitations of previous (and current) platforms and tools in Operations, this was only possible with a lot of paperwork. But now, more and more tools offer clean interfaces for automation. At the same time, Git offers an extreme degree of transparency on all changes. Taken together, they would offer enormous advantages for IT operations as well.
I call it “Operations by Code” (or ObC)
At least in theory, it should be possible to describe the complete infrastructure, or IT configuration, as code. This opens up enormous opportunities as well as challenges.
To make this very clear: I’m not talking about fancy web servers or systems related to development. I am talking about the systems that hold the company together, such as Active Directory, the (physical) network, or (client) software distribution.
All of these need to be managed as well — and ideally as code.
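To make that tangible, here is a minimal sketch (in Python, with invented group names and fields; this is an illustrative assumption, not a real Active Directory schema) of what such a description, kept in Git, could look like:

```python
# Hypothetical desired state of Active Directory groups, stored in Git.
# All names and fields are illustrative assumptions, not a real AD schema.
DESIRED_AD_GROUPS = {
    "sg-finance-readonly": {
        "members": ["alice", "bob"],
        "description": "Read access to the finance shares",
    },
    "sg-vpn-users": {
        "members": ["alice", "carol"],
        "description": "Allowed to connect to the VPN gateway",
    },
}
```

The point is not the format (JSON, YAML, or HCL would work just as well) but that the description lives in Git, where every change is reviewed and traceable.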
Core points of this concept are:
- Everything is stored in code
- The code is the leading system
- Changes are tested for known problems
- Regular and frequent deployments (see the sketch below)
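As a minimal sketch of what “the code is the leading system” could mean in practice: the described state is compared with the actual state, and every deviation becomes an explicit, reviewable change. The actual state is stubbed here; in reality it would be read from the target system.

```python
# Sketch: compute an explicit change plan from desired vs. actual state.
# "actual" is stubbed; in reality it would be queried from the target system.

desired = {
    "sg-finance-readonly": {"members": ["alice", "bob"]},
    "sg-vpn-users": {"members": ["alice", "carol"]},
}
actual = {
    "sg-finance-readonly": {"members": ["alice"]},  # drifted from the code
    "sg-legacy-admins": {"members": ["mallory"]},   # exists, but not in code
}

def plan_changes(desired: dict, actual: dict) -> dict:
    """Return which entries must be created, removed, or updated."""
    return {
        "create": sorted(set(desired) - set(actual)),
        "remove": sorted(set(actual) - set(desired)),
        "update": sorted(name for name in set(desired) & set(actual)
                         if desired[name] != actual[name]),
    }

print(plan_changes(desired, actual))
# {'create': ['sg-vpn-users'], 'remove': ['sg-legacy-admins'],
#  'update': ['sg-finance-readonly']}
```

Such a plan can then be reviewed like any other merge request before it is applied.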
If you have worked in IT Operations and tried to automate, you are most probably aware of how things look right now. A lot of software doesn’t have an interface at all (except for the UI). To achieve changes, you have to simulate the UI and reverse-engineer the registry or config files. Sometimes even modifying the (raw) database is required.
It has become a crucial criterion in the procurement process of “off-the-shelf” software to have these interfaces in place.
Here is a short example from practice
Now: In many organizations, firewall requests are processed via ITIL service requests in a semi-structured form (Excel, free text, etc.). Afterwards, the request is manually checked (ranges too large, critical ports, requests for someone else’s systems, etc.) and manually transferred to the firewall management.
Operations by Code: The rules are submitted in a structured way (form / JSON / code) and automatically tested against the same criteria. Then (Terraform) code is generated, merged into the firewall master configuration (with optional manual approval), and deployed.
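A minimal sketch of the automated check, assuming hypothetical field names, critical ports, and size limits (nothing here is a standard), run before any Terraform code is generated:

```python
import ipaddress

# Hypothetical structured firewall request; all field names are assumptions.
request = {
    "source": "10.1.0.0/24",
    "destination": "10.2.5.10/32",
    "port": 3389,
    "protocol": "tcp",
    "requester": "alice",
}

CRITICAL_PORTS = {23, 3389, 5900}  # e.g. Telnet, RDP, VNC; assumed policy
MAX_ADDRESSES = 256                # assumed limit: no range larger than a /24

def validate(rule: dict) -> list[str]:
    """Run the checks that used to be done manually; return found problems."""
    problems = []
    for field in ("source", "destination"):
        net = ipaddress.ip_network(rule[field])
        if net.num_addresses > MAX_ADDRESSES:
            problems.append(f"{field} range {net} is too large")
    if rule["port"] in CRITICAL_PORTS:
        problems.append(f"port {rule['port']} requires manual approval")
    return problems

print(validate(request))  # ['port 3389 requires manual approval']
```

On a clean result, a CI pipeline could render the Terraform code and open a merge request for the optional manual approval.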
In addition to replicating the current process, there is plenty of room for improvement: automatic pen-testing, multistage deployments, extended documentation, etc.
Even if at first glance automation seems like it would require less staff, I would assume the opposite. The new approach makes many previously hidden topics visible, especially regarding quality (consistency, security, efficiency), which have to be dealt with permanently.
Within this procedure, you can see the risk of building up barriers again. However, I see a greater chance of bringing development and IT Operations closer together. In the final stage, both concepts are very similar but approach the problem from two directions.
Do you see the same problems? Do you have ideas for making the concept more concrete?
I would love to hear your comments!
Greetings from Stuttgart
Malte