We are building the soul of your ITOps team

Vinod Jayaraman
4 min read · Apr 25, 2024


"All problems in computer science can be solved by another level of indirection." (David Wheeler)

That aphorism should be familiar to every software engineer. I have taken this to heart over the course of my career and almost every product I have built has its roots in virtualization — be it network, compute, or storage virtualization.

Building software with layers of abstraction makes sense. The cloud-native revolution is built on these principles: modularize, containerize, and distribute microservices.

The problem of too many layers of indirection

While the principles mentioned above helped software developers write better code and deploy faster, they came at an operational cost. To make sense of the highly complicated dependencies between the different layers, a host of observability and monitoring products have been created. As an industry, we set out to solve this problem by creating standardized ways of communicating between layers, using OpenMetrics, OpenTelemetry, and eBPF, to name a few. But that only got us so far, and now we are being inundated with this telemetry. The problem is that this volume of data cannot be processed by humans, nor can static dashboards adapt to or capture the state of highly dynamic environments. At least not in real time.

Given infinite time and effort, humans can sift through all this data. I know this firsthand: I have spent an inordinate amount of time debugging complicated cloud-native application deployments. We were successful in solving the most complicated of problems, but we had to rely on a select group of highly competent engineers. There is not enough skilled talent, and those engineers who do exist should spend their time building the next generation of software.

Correlating the metrics, logs, and tracing data created by these layers of indirection needs a new approach, and at NeuBird we are re-imagining how this can be built.

NeuBird: born in the GenAI era

We are building a new runtime for a new kernel

In our previous lives, when we approached a problem in the field, we first built a general understanding of the deployment environment, looked at the problem from a high level, and then peeled back the layers of the onion. Each step was guided by the data seen thus far and by knowledge of where to go next. But this does not scale: there are too many layers for a human to peel and too few people who know where to go next.

At NeuBird, we're taking a GenAI-native approach to replicate this. We are building a fine-tuned pipeline on top of LLMs that can take on the humongous task of analyzing and correlating hundreds of thousands of lines of logs, metrics, traces, and other telemetry associated with the modern software stack. Using a sequence of targeted convolutional filters, we can quickly identify the cause of a problem and come up with a solution in real time.
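To make the idea of a sequence of targeted filters concrete, here is a minimal sketch of how successive filters can narrow a flood of telemetry down to a short list of suspects. All names and the event shape below are illustrative assumptions for this post, not NeuBird's actual pipeline or API.

```python
# Hypothetical filter pipeline: each stage narrows the telemetry further.
# The event dicts and filter names are illustrative, not a real interface.

def anomaly_filter(events):
    """Keep only telemetry lines that report errors or critical anomalies."""
    return [e for e in events if e["level"] in ("error", "critical")]

def service_filter(events):
    """Group the surviving events by the service that emitted them."""
    by_service = {}
    for e in events:
        by_service.setdefault(e["service"], []).append(e)
    return by_service

def rank_filter(by_service):
    """Rank services by anomaly volume to suggest where to look first."""
    return sorted(by_service, key=lambda s: len(by_service[s]), reverse=True)

telemetry = [
    {"service": "api",   "level": "error",    "msg": "upstream timeout"},
    {"service": "db",    "level": "info",     "msg": "checkpoint complete"},
    {"service": "api",   "level": "critical", "msg": "connection pool exhausted"},
    {"service": "cache", "level": "error",    "msg": "eviction storm"},
]

suspects = rank_filter(service_filter(anomaly_filter(telemetry)))
print(suspects)  # the noisiest service surfaces first
```

Each filter does one small job and hands a smaller, more structured result to the next, which is the same narrowing motion a human debugger performs by hand.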

Building a new runtime for a new kernel

The LLM, as the new kernel, comes with near-infinite knowledge, but that knowledge is not always reliable. Building on top of the LLM requires new primitives and a different programming model. Our approach to building primitives on top of this kernel is heavily influenced by the principles of Unix: modularity, composition, and simplicity. The programming model is a filter chain that embodies a chain of thought: each filter builds upon the knowledge transferred from the previous filter and works on an isolated part, and together the filters solve one segment of the problem. In our world, these filters rely on infrastructure maps, logs, and metrics to perform their unit of work. The filter-chain operating system provides filters with the runtime primitives of scheduling, asynchronous execution, memory, isolation, and tracing.
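The programming model above can be sketched in a few lines: a shared context acts as the chain's memory, a tiny runtime schedules filters in order and records a trace, and each filter builds on what the previous one found. This is a minimal illustration under assumed names, not NeuBird's actual runtime.

```python
# Illustrative Unix-style filter chain: a shared Context is the "memory",
# run_chain is the "scheduler", and ctx.trace is the "tracing" primitive.
from dataclasses import dataclass, field

@dataclass
class Context:
    """Shared memory passed along the chain; each filter appends findings."""
    data: dict
    trace: list = field(default_factory=list)

def run_chain(filters, ctx):
    """Minimal runtime: schedule filters in order, tracing each step."""
    for f in filters:
        ctx.trace.append(f.__name__)  # record which filter ran (tracing)
        ctx = f(ctx)                  # each filter works on an isolated part
    return ctx

def map_infra(ctx):
    """First filter: consult the infrastructure map to pick a suspect layer."""
    ctx.data["suspect_layer"] = "network"
    return ctx

def check_logs(ctx):
    """Second filter: build on the previous finding to localize the fault."""
    layer = ctx.data["suspect_layer"]
    ctx.data["finding"] = f"retransmit storm in the {layer} layer"
    return ctx

result = run_chain([map_infra, check_logs], Context(data={}))
print(result.data["finding"])
print(result.trace)
```

The key design choice is that filters never call each other directly; they only read and extend the context, which keeps each one small, testable, and replaceable, in the Unix spirit.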

Retain abstractions yet remove operational complexity

Filter chains are extensible and composed of models trained to connect the dots across the different layers of a modern, complex infrastructure environment. We are solving the problem of correlating the layers of indirection: retaining abstractions while removing operational complexity.

“All problems”… Except the ones created by too many levels of indirection

So goes the corollary to the aphorism quoted at the start. Armed with this new runtime environment, with trained filters running on the LLM kernel, our mission is to tame the complexity of the modern software stack.

NeuBird is creating a cognitive ITOps workforce that is on the front lines, always on the on-call roster. We’re awake at 3 am and we’ll answer the first PagerDuty call — we are the soul of your new ITOps team.
