Stories by Kartik Dudeja on Medium

FinOps in Kubernetes with OpenCost

Kartik Dudeja — Sun, 19 Oct 2025 05:40:03 GMT

Kubernetes makes it easy to scale workloads, but it also makes costs… slippery. Pods scale up, nodes scale down (hopefully), and suddenly you get a cloud bill that looks like an unsolved puzzle.

That’s where FinOps (Financial Operations) comes in — a practice of bringing financial accountability to cloud spend. And for Kubernetes clusters, OpenCost is one of the best open-source tools to track and optimize your workloads’ cost.

In this workshop, we’ll set up OpenCost in a Kubernetes cluster, collect cost metrics, and build a Grafana dashboard to visualize them. By the end, you’ll have a working FinOps setup for your K8s workloads.

FinOps in Kubernetes — Why it Matters

FinOps is not just about saving money — it’s about creating financial visibility, accountability, and optimization across engineering, operations, and finance.

In Kubernetes, costs are tricky because:

Resources are shared across namespaces, teams, and services.
Autoscaling makes spend dynamic and unpredictable.
Cloud bills don’t map neatly to Kubernetes objects (pods, nodes, namespaces).

FinOps helps bridge that gap by answering questions like:

How much does each team/namespace cost per month?
Which workloads are over-provisioned?
What is the cost impact of autoscaling?
Can we charge back costs to product teams?

Enter OpenCost: the open-source project that measures Kubernetes costs in real-time.

Prerequisites

Before diving in, make sure you have:

A running Kubernetes cluster (on-prem or cloud, minikube works too).
kubectl installed and configured.
helm (Helm 3).
Prometheus + Grafana installed in your cluster. (If not, you can install using the Prometheus & Grafana Helm charts.)

Installing OpenCost on Kubernetes

kubectl create namespace opencost

helm install opencost --repo https://opencost.github.io/opencost-helm-chart opencost \
  --namespace opencost

This deploys OpenCost as a service inside the cluster.

Verifying OpenCost Installation

Check if pods are running:

kubectl get pods -n opencost

You should see something like:

opencost-xxxxxxx   1/1   Running   0   2m

Exploring the OpenCost UI

OpenCost exposes a built-in UI for visualizing cost allocations in real-time.

Port-forward the OpenCost UI

Run:

kubectl port-forward -n opencost svc/opencost 9000:9090

Now open your browser:

http://localhost:9000

You should see the OpenCost dashboard.

Integrating OpenCost with Prometheus

OpenCost exposes cost metrics in Prometheus format at:

http://opencost.opencost:9003/metrics

Add a scrape config so Prometheus pulls OpenCost metrics.

Edit your Prometheus config (prometheus.yaml or Helm values):

scrape_configs:
  - job_name: 'opencost'
    honor_labels: true
    static_configs:
      - targets: ['opencost.opencost:9003']

Apply the updated config and restart Prometheus.

Now you can query OpenCost metrics in Prometheus UI.
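Once Prometheus is scraping OpenCost, the same numbers are also reachable from scripts through the Prometheus HTTP API. The sketch below is only an illustration, not an official OpenCost client. It assumes Prometheus is reachable at http://localhost:9090 (for example through a port-forward) and that your install exports the node_total_hourly_cost metric with a node label; the function names are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"          # assumption: adjust to your setup
QUERY = "sum(node_total_hourly_cost) by (node)"   # assumption: metric/label as exported by your install


def parse_vector(payload):
    """Turn a Prometheus instant-query response into {node: hourly_cost}."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload}")
    costs = {}
    for sample in payload["data"]["result"]:
        node = sample["metric"].get("node", "unknown")
        # Each sample value is [unix_timestamp, "value-as-string"]
        costs[node] = float(sample["value"][1])
    return costs


def fetch_hourly_cost(base_url=PROMETHEUS_URL, query=QUERY):
    """Run an instant query against /api/v1/query and parse the result."""
    params = urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(f"{base_url}/api/v1/query?{params}", timeout=10) as resp:
        return parse_vector(json.load(resp))
```

Calling fetch_hourly_cost() returns a dictionary of node name to hourly cost, which you can sort or sum as needed. Keeping the parsing in its own function makes it easy to test against a saved response without a live Prometheus.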

Adding Custom Pricing in OpenCost

By default, OpenCost uses public cloud list prices (AWS, GCP, Azure) to estimate costs. But in real-world FinOps, you often want to use:

Discounted rates (Reserved Instances, Savings Plans, enterprise agreements)
On-prem pricing (your internal cost per vCPU, GB RAM, GB storage)
Spot instance prices
Blended costs across regions

OpenCost lets you override default prices with a custom pricing configuration file.

Create a custom pricing config

Create a file named custom-pricing.json:

{
  "description": "Custom pricing for on-prem Kubernetes cluster",
  "CPU": "0.02", 
  "RAM": "0.005", 
  "GPU": "0.95", 
  "storage": "0.0002",
  "zoneNetworkEgress": "0.01",
  "internetNetworkEgress": "0.12"
}

Here’s what the fields mean:

CPU → price per vCPU per hour (e.g., $0.02/hr)
RAM → price per GB RAM per hour
GPU → price per GPU per hour
storage → price per GB storage per hour
zoneNetworkEgress / internetNetworkEgress → price per GB of network egress (cross-zone and internet-bound traffic)
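To get a feel for what these rates mean, here is a rough back-of-the-envelope calculation using the prices from custom-pricing.json. It is a simplified sketch: the hourly_cost helper and the example workload (2 vCPU, 4 GB RAM, 50 GB storage) are made up for illustration, and OpenCost's real allocation model takes more into account than this straight multiplication.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)

# Prices copied from custom-pricing.json (kept as strings, as in the file)
pricing = {"CPU": "0.02", "RAM": "0.005", "GPU": "0.95", "storage": "0.0002"}


def hourly_cost(pricing, vcpus=0.0, ram_gb=0.0, gpus=0.0, storage_gb=0.0):
    """Hourly cost of a workload given the resources allocated to it."""
    return (
        vcpus * float(pricing["CPU"])
        + ram_gb * float(pricing["RAM"])
        + gpus * float(pricing["GPU"])
        + storage_gb * float(pricing["storage"])
    )


# Hypothetical workload: 2 vCPU, 4 GB RAM, 50 GB storage, no GPU
per_hour = hourly_cost(pricing, vcpus=2, ram_gb=4, storage_gb=50)
print(f"hourly:  ${per_hour:.4f}")
print(f"monthly: ${per_hour * HOURS_PER_MONTH:.2f}")
```

With these rates the example workload comes to $0.0700 per hour, or about $51.10 per month.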

Mount custom pricing in the Helm chart

When installing/upgrading OpenCost with Helm, mount your custom pricing config:

helm upgrade opencost --repo https://opencost.github.io/opencost-helm-chart opencost \
  --namespace opencost \
  --set opencost.customPricing.enabled=true \
  --set-file opencost.customPricing.configMap=custom-pricing.json

Now your cost metrics reflect your actual business costs, not just public cloud pricing. This is critical for accurate chargeback/showback in FinOps.

FinOps Best Practices in Kubernetes

Now that you have visibility, here’s how to turn data into savings:

Showback & Chargeback

Attribute costs to namespaces, teams, or applications.
Create monthly dashboards for finance + engineering.
Use showback (reporting) or chargeback (actual billing).

Rightsizing Workloads

Identify workloads requesting more CPU/Memory than needed.
Use OpenCost + Metrics Server to compare requests vs actual usage.
Tune resource requests/limits to reduce waste.
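The requests-vs-usage comparison boils down to a simple ratio. The sketch below assumes you have already pulled CPU requests and average usage (in cores) from your metrics source; the overprovision_report function, the 50% threshold, and the sample workloads are all hypothetical.

```python
def overprovision_report(workloads, threshold=0.5):
    """Return (name, utilization) for workloads using less than
    `threshold` of their requested CPU, lowest utilization first."""
    flagged = []
    for w in workloads:
        if w["cpu_request"] <= 0:
            continue  # no request set, so there is nothing to compare against
        utilization = w["cpu_usage"] / w["cpu_request"]
        if utilization < threshold:
            flagged.append((w["name"], round(utilization, 2)))
    return sorted(flagged, key=lambda item: item[1])


# Hypothetical sample data (cores)
workloads = [
    {"name": "api", "cpu_request": 2.0, "cpu_usage": 0.3},
    {"name": "worker", "cpu_request": 1.0, "cpu_usage": 0.8},
    {"name": "batch", "cpu_request": 0.0, "cpu_usage": 0.1},
]
print(overprovision_report(workloads))
```

Here only api is flagged, since it uses roughly 15% of what it requests. The same idea applies to memory.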

Eliminate Idle Resources

Spot unused PVCs, idle nodes, or old namespaces.
Set policies for automatic cleanup.

Use Autoscaling Wisely

Scale workloads up/down with HPA/VPA, but track the cost impact.
Sometimes autoscaling saves money, sometimes it spikes spend.

Set Alerts

Use Prometheus + Alertmanager to notify when spend per namespace crosses thresholds.
Example: “Alert if namespace cost > $500/day”.
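In practice this check would live in a Prometheus alerting rule, but the logic behind it is easy to see in a few lines of Python. This is a sketch under assumptions: the input is a mapping of namespace to current hourly cost, the hourly rate is simply projected over 24 hours, and the function name and sample numbers are hypothetical.

```python
def namespaces_over_budget(hourly_cost_by_namespace, daily_limit_usd=500.0):
    """Project each namespace's hourly rate to a full day and
    return the ones that exceed the daily limit."""
    over = {}
    for namespace, hourly in hourly_cost_by_namespace.items():
        projected_daily = hourly * 24
        if projected_daily > daily_limit_usd:
            over[namespace] = projected_daily
    return over


# Hypothetical hourly costs in USD
print(namespaces_over_budget({"payments": 25.0, "search": 10.0}))
```

With the sample data, payments projects to $600 per day and is flagged, while search stays under the limit at $240.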

Optimize Node Mix

Compare workloads to spot if GPU nodes or high-memory nodes are underutilized.
Shift to cheaper node pools when possible.

Optimize & Automate

Once your FinOps dashboards are live, you can take it further:

Budgets & Forecasting → align costs with business units.
Cross-Cluster Costing → monitor multiple clusters at once.
Integrations with Cloud Billing → map OpenCost data to AWS, GCP, or Azure invoices.

With this, you’re not just observing Kubernetes costs — you’re bringing accountability and efficiency to your platform. That’s true FinOps in action.

{
    "author"   :  "Kartik Dudeja",
    "email"    :  "kartikdudeja21@gmail.com",
    "linkedin" :  "https://linkedin.com/in/kartik-dudeja",
    "github"   :  "https://github.com/Kartikdudeja"
}

LLM Observability with OpenTelemetry: A Practical Guide

Kartik Dudeja — Sat, 27 Sep 2025 06:46:57 GMT

Large Language Models (LLMs) have quickly become the backbone of many modern applications — from chatbots to Retrieval-Augmented Generation (RAG) systems. But here’s the challenge: these models often behave like black boxes.

Without observability, we’re left guessing:

Why did the model respond that way?
Which prompt caused this hallucination?
How much are we spending on tokens?
What’s the latency impact of retrieval vs. generation?

This is where OpenTelemetry (OTel) steps in. By instrumenting our LLM applications, we can capture traces, metrics, and logs — turning the black box into a glass box.

Core Observability Signals for LLMs

When instrumenting an LLM app, we focus on:

1. Request Traces

Span for retrieval (with metadata: source, number of documents, latency).
Span for LLM inference (with metadata: model name, temperature, prompt, response length).

2. Metrics

Request Volume: Counter of incoming user queries.
Request Duration: Histogram for latency distribution.
Token Counters: Number of tokens generated/consumed.
Cost: Gauge or counter for estimated token cost.

3. Logs

Structured logs that capture prompts, responses, and errors.
Correlated with traces via trace IDs.

Tech Stack

LLM runtime: Ollama (local inference of Mistral)
Framework: LangChain
Vector DB: Chroma
Observability: OpenTelemetry Python SDK
Backends: Jaeger (traces), Prometheus (metrics), Loki (logs)

Instrumenting a RAG Application

Let’s consider a simple RAG pipeline:

Use a retriever to fetch relevant documents.
Build a prompt.
Send it to the LLM (e.g., via Ollama).
Return the answer.

With OpenTelemetry, we wrap each stage in spans, collect metrics, and emit logs.

Prerequisites: Building a RAG Application

Before we dive into instrumentation, you’ll need a working RAG (Retrieval-Augmented Generation) application.

If you don’t have one yet, follow this step-by-step tutorial first:

Building a Simple RAG Application with Ollama and LangChain
(this guide walks through setting up embeddings, a vector store, and a basic question-answering loop)

Once you have your RAG pipeline up and running, come back here — we’ll add observability so you can monitor and debug it like a pro.

Observability in Action

Let’s break down how Traces, Metrics, and Logs bring observability to an LLM-powered RAG application.

Traces: Following the Flow

Why Traces Matter
Traces help you follow a user query as it flows through your RAG pipeline — retrieval, prompt building, LLM generation. They provide visibility into where time is spent and what inputs/outputs influenced the result.

Instrumentation Code for Traces

We’ll wire up OpenTelemetry to:

1. Export traces to Jaeger (via OTLP).

2. Create spans for:

The user question loop.
The retriever call.
The LLM call.

3. Add semantic attributes about:

App metadata (app.name, app.version, etc).
SDK details (telemetry.sdk.language, telemetry.sdk.name).
Query text, model name, retriever results count, etc.

import time
import json

# --- OpenTelemetry Tracing ---
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Define resource attributes (metadata about the service)
resource = Resource.create({
    "service.name": "faq-rag",
    "service.version": "1.0.0",
    "app.environment": "dev",
    "app.owner": "observability-team",
    "telemetry.sdk.language": "python",
    "telemetry.sdk.name": "opentelemetry"
})

# --- Configure Tracing ---
trace.set_tracer_provider(TracerProvider(resource=resource))
tracer = trace.get_tracer(__name__)

# Configure OTLP exporter (sending traces to Jaeger/Collector)
otlp_trace_exporter = OTLPSpanExporter(endpoint="http://127.0.0.1:4317", insecure=True)
span_processor = BatchSpanProcessor(otlp_trace_exporter)
trace.get_tracer_provider().add_span_processor(span_processor)

# --- LangChain + Ollama ---
from langchain_ollama.llms import OllamaLLM
from langchain_core.prompts import ChatPromptTemplate
from vector import retriever

# Initialize the Ollama model
model = OllamaLLM(
    model="mistral",
    temperature=0.7,
    top_p=0.9
)

# Define the prompt template
template = """
You are an expert in answering questions about a pizza restaurant.

Here are some relevant reviews: {reviews}

Here is the question to answer: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

# Build pipeline
chain = prompt | model

# --- Interactive Loop ---
while True:

    question = input("Ask your question (q to quit): ")

    if question.lower() == "q":
        break

    start_request = time.time()

    with tracer.start_as_current_span("rag-request") as span:

        span.set_attribute("rag.query", question)

        # --- Retrieval step ---
        with tracer.start_as_current_span("vector-retrieval") as retrieval_span:
            start_retrieval = time.time()
            reviews = retriever.invoke(question)
            retrieval_time = time.time() - start_retrieval
            retrieval_span.set_attribute("retriever.engine", "chroma")
            retrieval_span.set_attribute("retriever.search.k", 5)
            retrieval_span.set_attribute("retriever.latency.ms", retrieval_time * 1000)
            retrieval_span.set_attribute("retriever.documents.count", len(reviews))

            # Keep only a short preview of each document to limit span size
            doc_previews = [
                (doc.page_content[:80] + "...") if len(doc.page_content) > 80 else doc.page_content
                for doc in reviews
            ]
            retrieval_span.set_attribute("retriever.documents.preview", json.dumps(doc_previews))

        # --- Build the prompt text (recorded on the LLM span) ---
        formatted_prompt = prompt.format_prompt(
            reviews=reviews,
            question=question
        ).to_string()

        # --- LLM step ---
        with tracer.start_as_current_span("llm-call") as llm_span:

            llm_span.set_attribute("llm.provider", "ollama")
            llm_span.set_attribute("llm.model.name", "mistral")
            llm_span.set_attribute("llm.request.temperature", getattr(model, "temperature", None))
            llm_span.set_attribute("llm.request.top_p", getattr(model, "top_p", None))
            llm_span.set_attribute("llm.prompt.details", formatted_prompt)

            start_llm = time.time()

            result = chain.invoke({
                "reviews": reviews,
                "question": question
            })

            llm_latency = time.time() - start_llm
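            # Rough approximation: whitespace-separated words, not real tokenizer counts
            tokens_in = len(formatted_prompt.split())
            tokens_out = len(str(result).split())
            cost_estimate = (tokens_in + tokens_out) * 0.000001  # placeholder rate, not a real price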

            # Response metadata
            llm_span.set_attribute("llm.response.details", str(result))
            llm_span.set_attribute("llm.response.tokens.input", tokens_in)
            llm_span.set_attribute("llm.response.tokens.output", tokens_out)
            llm_span.set_attribute("llm.response.tokens.total", tokens_in + tokens_out)
            llm_span.set_attribute("llm.response.cost.usd_estimate", cost_estimate)
            llm_span.set_attribute("llm.latency.ms", llm_latency * 1000)

        span.set_attribute("rag.answer.preview", str(result)[:120])

    print(f"\n{result}")
    print(80 * "-")

Logs: Capturing the Details

Why Logs Matter
Logs give you the raw evidence of what happened inside your LLM pipeline — including prompts, responses, and errors. Unlike traces (timing) and metrics (aggregates), logs capture content and context.

JSON Logging for Better ETL

Instead of plain text logs, it’s best to use structured JSON logs:

Easy to parse with tools like Loki, Elasticsearch, or any ETL pipeline.
Enables filtering and aggregation on fields (trace_id, span_id, user_id, etc.).
Standardized format across services.

With JSON, your observability backend can:

Extract fields for ETL pipelines (e.g., export tokens + cost for billing).
Enable structured search (e.g., “find all requests with doc_count < 2”).
Power dashboards that combine logs + metrics.
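To make the structured-search point concrete, here is a small sketch that filters JSON log lines in plain Python, the same kind of query a log backend would run for you. The doc_count and trace_id field names and the sample lines are assumptions for illustration; use whatever fields your application actually logs.

```python
import json


def filter_logs(lines, predicate):
    """Yield parsed JSON log records for which predicate(record) is true."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if predicate(record):
            yield record


# Hypothetical log lines
sample_lines = [
    '{"message": "Retrieved documents", "trace_id": "abc", "doc_count": 1}',
    '{"message": "Retrieved documents", "trace_id": "def", "doc_count": 5}',
    "plain text line that is not JSON",
]

# "Find all requests with doc_count < 2"
low_recall = list(
    filter_logs(sample_lines, lambda r: "doc_count" in r and r["doc_count"] < 2)
)
print([r["trace_id"] for r in low_recall])
```

This prints ['abc']. With plain text logs, the same question would need fragile regular expressions.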

Correlating Logs and Traces

To connect logs with traces:

Include trace_id and span_id in every log line.
Use the current span context from OpenTelemetry.

Code (Python with OTel)

import time
import json

# --- OpenTelemetry Tracing ---
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# --- OpenTelemetry Logging ---
import logging
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter

# Define resource attributes (metadata about the service)
resource = Resource.create({
    "service.name": "faq-rag",
    "service.version": "1.0.0",
    "app.environment": "dev",
    "app.owner": "observability-team",
    "telemetry.sdk.language": "python",
    "telemetry.sdk.name": "opentelemetry"
})

# --- Configure Tracing ---
trace.set_tracer_provider(TracerProvider(resource=resource))
tracer = trace.get_tracer(__name__)

# Configure OTLP exporter (sending traces to Jaeger/Collector)
otlp_trace_exporter = OTLPSpanExporter(endpoint="http://127.0.0.1:4317", insecure=True)
span_processor = BatchSpanProcessor(otlp_trace_exporter)
trace.get_tracer_provider().add_span_processor(span_processor)

# Setup logger provider
logger_provider = LoggerProvider(resource=resource)
log_exporter = OTLPLogExporter(endpoint="http://127.0.0.1:4317", insecure=True)
logger_provider.add_log_record_processor(BatchLogRecordProcessor(log_exporter))

# Custom JSON Formatter
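class JSONFormatter(logging.Formatter):
    # Attribute names present on every LogRecord; anything else was passed via `extra=`
    _STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {"message", "asctime"}

    def format(self, record):
        span = trace.get_current_span()
        span_context = span.get_span_context()
        log_record = {
            "timestamp": self.formatTime(record, self.datefmt),
            "severity": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Hex-encode the integer IDs so they match what tracing UIs display
            "trace_id": format(span_context.trace_id, "032x") if span_context.is_valid else None,
            "span_id": format(span_context.span_id, "016x") if span_context.is_valid else None,
        }

        # Add dict-style args if available
        if isinstance(record.args, dict):
            log_record.update(record.args)
        # Fields passed via `extra=` are set directly on the record
        # (there is no `record.extra` attribute), so copy the non-standard ones
        for key, value in vars(record).items():
            if key not in self._STANDARD_ATTRS:
                log_record[key] = value

        return json.dumps(log_record, default=str)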

# Attach JSON formatter to OTel handler
otel_handler = LoggingHandler(level=logging.INFO, logger_provider=logger_provider)
otel_handler.setFormatter(JSONFormatter())

logging.basicConfig(level=logging.INFO, handlers=[otel_handler])
logger = logging.getLogger("faq-rag")

# --- LangChain + Ollama ---
from langchain_ollama.llms import OllamaLLM
from langchain_core.prompts import ChatPromptTemplate
from vector import retriever

# Initialize the Ollama model
model = OllamaLLM(
    model="mistral",
    temperature=0.7,
    top_p=0.9
)

# Define the prompt template

template = """
You are an expert in answering questions about a pizza restaurant.

Here are some relevant reviews: {reviews}

Here is the question to answer: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

# Build pipeline
chain = prompt | model

# --- Interactive Loop ---
while True:

    question = input("Ask your question (q to quit): ")

    if question.lower() == "q":
        break

    logger.info("Received user query", extra={"query": question})

    start_request = time.time()

    with tracer.start_as_current_span("rag-request") as span:

        span.set_attribute("rag.query", question)

        # --- Retrieval step ---
        with tracer.start_as_current_span("vector-retrieval") as retrieval_span:
            start_retrieval = time.time()
            reviews = retriever.invoke(question)
            retrieval_time = time.time() - start_retrieval

            logger.info("Retrieved documents", extra={
                "query": question,
                "retriever.latency_ms": retrieval_time * 1000,
                "retriever.documents.count": len(reviews),
            })

            retrieval_span.set_attribute("retriever.engine", "chroma")
            retrieval_span.set_attribute("retriever.search.k", 5)
            retrieval_span.set_attribute("retriever.latency.ms", retrieval_time * 1000)
            retrieval_span.set_attribute("retriever.documents.count", len(reviews))

            doc_previews = [
                (doc.page_content[:80] + "...") if len(doc.page_content) > 80 else doc.page_content
                for doc in reviews
            ]
            retrieval_span.set_attribute("retriever.documents.preview", json.dumps(doc_previews))

        # --- Build the prompt text (recorded on the LLM span) ---
        formatted_prompt = prompt.format_prompt(
            reviews=reviews,
            question=question
        ).to_string()

        # --- LLM step ---
        with tracer.start_as_current_span("llm-call") as llm_span:
            llm_span.set_attribute("llm.provider", "ollama")
            llm_span.set_attribute("llm.model.name", "mistral")
            llm_span.set_attribute("llm.request.temperature", getattr(model, "temperature", None))
            llm_span.set_attribute("llm.request.top_p", getattr(model, "top_p", None))
            llm_span.set_attribute("llm.prompt.details", formatted_prompt)

            logger.info("Invoking LLM", extra={
                "model": "mistral",
                "temperature": getattr(model, "temperature", None),
                "top_p": getattr(model, "top_p", None),
                "prompt_preview": formatted_prompt[:120],
            })

            start_llm = time.time()

            result = chain.invoke({
                "reviews": reviews,
                "question": question
            })

            llm_latency = time.time() - start_llm
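            # Rough approximation: whitespace-separated words, not real tokenizer counts
            tokens_in = len(formatted_prompt.split())
            tokens_out = len(str(result).split())
            cost_estimate = (tokens_in + tokens_out) * 0.000001  # placeholder rate, not a real price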

            logger.info("LLM response generated", extra={
                "latency_ms": llm_latency * 1000,
                "tokens_in": tokens_in,
                "tokens_out": tokens_out,
                "cost_estimate": cost_estimate,
                "answer_preview": str(result)[:120]
            })

            # Response metadata
            llm_span.set_attribute("llm.response.details", str(result))
            llm_span.set_attribute("llm.response.tokens.input", tokens_in)
            llm_span.set_attribute("llm.response.tokens.output", tokens_out)
            llm_span.set_attribute("llm.response.tokens.total", tokens_in + tokens_out)
            llm_span.set_attribute("llm.response.cost.usd_estimate", cost_estimate)

            llm_span.set_attribute("llm.latency.ms", llm_latency * 1000)

        span.set_attribute("rag.answer.preview", str(result)[:120])

    print(f"\n{result}")
    print(80 * "-")

How correlation helps

From a trace in Jaeger, you can jump to the corresponding logs in Loki by filtering on trace_id.
From a log line, you can pivot back to the full trace to see the request lifecycle.
This links fine-grained, per-event records (logs) to the request-level context that ties them together (traces).
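One detail to watch: the OpenTelemetry Python span context exposes trace_id and span_id as integers, while Jaeger shows them as hex strings. Filtering logs by trace_id only works if both sides use the same representation. The helper below is a minimal sketch of the conversion; to_hex_ids is a hypothetical name and the sample IDs are arbitrary.

```python
def to_hex_ids(trace_id: int, span_id: int) -> dict:
    """Render integer trace/span IDs as zero-padded lowercase hex strings."""
    return {
        "trace_id": format(trace_id, "032x"),  # 128-bit value -> 32 hex chars
        "span_id": format(span_id, "016x"),    # 64-bit value  -> 16 hex chars
    }


# Arbitrary sample IDs
ids = to_hex_ids(0x4BF92F3577B34DA6A3CE929D0E0E4736, 0x00F067AA0BA902B7)
print(ids["trace_id"])
print(ids["span_id"])
```

This prints 4bf92f3577b34da6a3ce929d0e0e4736 and 00f067aa0ba902b7, which is the form you would paste into a trace search box or a log query.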

Metrics: Measuring What Matters

Why Metrics Matter
Metrics provide aggregated, time-series insights into your system. While traces help debug individual requests and logs capture raw details, metrics allow you to monitor trends (e.g., request rates, latency, cost over time).

For an LLM RAG pipeline, the key metrics are:

What Metrics to Collect

1. Request Volume

Counts how many requests hit the RAG pipeline.
Helps detect traffic spikes, drops, or usage trends.

2. Request Duration

Measures latency of end-to-end RAG queries.
Useful for SLO/SLI dashboards and user experience monitoring.

3. Token Usage

Tracks prompt_tokens and completion_tokens.
Shows efficiency of prompts and cost correlation.

4. Cost Estimation

Approximates dollar cost based on tokens and model pricing.
Useful for FinOps and controlling LLM usage bills.
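Hosted models usually price prompt and completion tokens separately, so a cost estimate is just two multiplications. The sketch below assumes per-1,000-token pricing; the estimate_cost_usd function and the rates used in the example are hypothetical placeholders, not real prices for any model.

```python
def estimate_cost_usd(prompt_tokens, completion_tokens,
                      input_price_per_1k, output_price_per_1k):
    """Estimate request cost from token counts and per-1K-token prices."""
    input_cost = (prompt_tokens / 1000) * input_price_per_1k
    output_cost = (completion_tokens / 1000) * output_price_per_1k
    return input_cost + output_cost


# Hypothetical rates: $0.0005 per 1K prompt tokens, $0.0015 per 1K completion tokens
cost = estimate_cost_usd(1200, 300, input_price_per_1k=0.0005, output_price_per_1k=0.0015)
print(f"${cost:.5f}")
```

For 1,200 prompt tokens and 300 completion tokens at these placeholder rates, the estimate is $0.00105. For a local model served by Ollama there is no per-token bill, so the rate would instead stand in for your own infrastructure cost.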

OTel Metrics Instrumentation

import time
import json

# --- OpenTelemetry Tracing ---
from opentelemetry import trace, metrics
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# --- OpenTelemetry Metrics ---
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

# --- OpenTelemetry Logging ---
import logging
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter

# Define resource attributes (metadata about the service)
resource = Resource.create({
    "service.name": "faq-rag",
    "service.version": "1.0.0",
    "app.environment": "dev",
    "app.owner": "observability-team",
    "telemetry.sdk.language": "python",
    "telemetry.sdk.name": "opentelemetry"
})

# --- Configure Tracing ---
trace.set_tracer_provider(TracerProvider(resource=resource))
tracer = trace.get_tracer(__name__)

# Configure OTLP exporter (sending traces to Jaeger/Collector)
otlp_trace_exporter = OTLPSpanExporter(endpoint="http://127.0.0.1:4317", insecure=True)
span_processor = BatchSpanProcessor(otlp_trace_exporter)
trace.get_tracer_provider().add_span_processor(span_processor)

# --- Configure Metrics ---
metric_exporter = OTLPMetricExporter(endpoint="http://127.0.0.1:4317", insecure=True)
reader = PeriodicExportingMetricReader(metric_exporter, export_interval_millis=5000)

provider = MeterProvider(resource=resource, metric_readers=[reader])
metrics.set_meter_provider(provider)
meter = metrics.get_meter(__name__)

# Define custom metrics
request_counter = meter.create_counter(
    "rag_requests_total",
    unit="1",
    description="Total number of RAG requests"
)

request_duration_hist = meter.create_histogram(
    "rag_request_duration_ms",
    unit="ms",
    description="Duration of RAG requests in milliseconds"
)

token_input_counter = meter.create_counter(
    "rag_tokens_input_total",
    unit="tokens",
    description="Total input tokens sent to LLM"
)

token_output_counter = meter.create_counter(
    "rag_tokens_output_total",
    unit="tokens",
    description="Total output tokens generated by LLM"
)

token_total_counter = meter.create_counter(
    "rag_tokens_total",
    unit="tokens",
    description="Total tokens (input + output)"
)

cost_counter = meter.create_counter(
    "rag_cost_usd_total",
    unit="usd",
    description="Estimated total cost of LLM requests"
)

# Setup logger provider
logger_provider = LoggerProvider(resource=resource)
log_exporter = OTLPLogExporter(endpoint="http://127.0.0.1:4317", insecure=True)
logger_provider.add_log_record_processor(BatchLogRecordProcessor(log_exporter))

# Custom JSON Formatter
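class JSONFormatter(logging.Formatter):
    # Attribute names present on every LogRecord; anything else was passed via `extra=`
    _STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {"message", "asctime"}

    def format(self, record):

        span = trace.get_current_span()
        span_context = span.get_span_context()

        log_record = {
            "timestamp": self.formatTime(record, self.datefmt),
            "severity": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Hex-encode the integer IDs so they match what tracing UIs display
            "trace_id": format(span_context.trace_id, "032x") if span_context.is_valid else None,
            "span_id": format(span_context.span_id, "016x") if span_context.is_valid else None,
        }
        # Add dict-style args if available
        if isinstance(record.args, dict):
            log_record.update(record.args)
        # Fields passed via `extra=` are set directly on the record
        # (there is no `record.extra` attribute), so copy the non-standard ones
        for key, value in vars(record).items():
            if key not in self._STANDARD_ATTRS:
                log_record[key] = value

        return json.dumps(log_record, default=str)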

# Attach JSON formatter to OTel handler
otel_handler = LoggingHandler(level=logging.INFO, logger_provider=logger_provider)
otel_handler.setFormatter(JSONFormatter())

logging.basicConfig(level=logging.INFO, handlers=[otel_handler])
logger = logging.getLogger("faq-rag")

# --- LangChain + Ollama ---
from langchain_ollama.llms import OllamaLLM
from langchain_core.prompts import ChatPromptTemplate
from vector import retriever

# Initialize the Ollama model
model = OllamaLLM(
    model="mistral",
    temperature=0.7,
    top_p=0.9
)

# Define the prompt template
template = """
You are an expert in answering questions about a pizza restaurant.

Here are some relevant reviews: {reviews}

Here is the question to answer: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

# Build pipeline
chain = prompt | model

# --- Interactive Loop ---
while True:
    question = input("Ask your question (q to quit): ")

    if question.lower() == "q":
        break

    logger.info("Received user query", extra={"query": question})

    start_request = time.time()
    with tracer.start_as_current_span("rag-request") as span:
        span.set_attribute("rag.query", question)

        # --- Retrieval step ---
        with tracer.start_as_current_span("vector-retrieval") as retrieval_span:
            start_retrieval = time.time()
            reviews = retriever.invoke(question)
            retrieval_time = time.time() - start_retrieval

            logger.info("Retrieved documents", extra={
                "query": question,
                "retriever.latency_ms": retrieval_time * 1000,
                "retriever.documents.count": len(reviews),
            })

            retrieval_span.set_attribute("retriever.engine", "chroma")
            retrieval_span.set_attribute("retriever.search.k", 5)
            retrieval_span.set_attribute("retriever.latency.ms", retrieval_time * 1000)
            retrieval_span.set_attribute("retriever.documents.count", len(reviews))

            doc_previews = [
                (doc.page_content[:80] + "...") if len(doc.page_content) > 80 else doc.page_content
                for doc in reviews
            ]
            
            retrieval_span.set_attribute("retriever.documents.preview", json.dumps(doc_previews))

        # --- Build the prompt text (recorded on the LLM span) ---
        formatted_prompt = prompt.format_prompt(
            reviews=reviews,
            question=question
        ).to_string()

        # --- LLM step ---
        with tracer.start_as_current_span("llm-call") as llm_span:

            llm_span.set_attribute("llm.provider", "ollama")
            llm_span.set_attribute("llm.model.name", "mistral")
            llm_span.set_attribute("llm.request.temperature", getattr(model, "temperature", None))
            llm_span.set_attribute("llm.request.top_p", getattr(model, "top_p", None))
            llm_span.set_attribute("llm.prompt.details", formatted_prompt)

            logger.info("Invoking LLM", extra={
                "model": "mistral",
                "temperature": getattr(model, "temperature", None),
                "top_p": getattr(model, "top_p", None),
                "prompt_preview": formatted_prompt[:120],
            })

            start_llm = time.time()
            result = chain.invoke({
                "reviews": reviews,
                "question": question
            })
            llm_latency = time.time() - start_llm

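            # Rough approximation: whitespace-separated words, not real tokenizer counts
            tokens_in = len(formatted_prompt.split())
            tokens_out = len(str(result).split())
            cost_estimate = (tokens_in + tokens_out) * 0.000001  # placeholder rate, not a real price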

            logger.info("LLM response generated", extra={
                "latency_ms": llm_latency * 1000,
                "tokens_in": tokens_in,
                "tokens_out": tokens_out,
                "cost_estimate": cost_estimate,
                "answer_preview": str(result)[:120]
            })

            # Response metadata
            llm_span.set_attribute("llm.response.details", str(result))
            llm_span.set_attribute("llm.response.tokens.input", tokens_in)
            llm_span.set_attribute("llm.response.tokens.output", tokens_out)
            llm_span.set_attribute("llm.response.tokens.total", tokens_in + tokens_out)
            llm_span.set_attribute("llm.response.cost.usd_estimate", cost_estimate)
            llm_span.set_attribute("llm.latency.ms", llm_latency * 1000)

            # --- Emit Metrics ---
            request_counter.add(1, {"rag.model": "mistral"})
            request_duration_hist.record((time.time() - start_request) * 1000, {"rag.model": "mistral"})
            token_input_counter.add(tokens_in, {"rag.model": "mistral"})
            token_output_counter.add(tokens_out, {"rag.model": "mistral"})
            token_total_counter.add(tokens_in + tokens_out, {"rag.model": "mistral"})
            cost_counter.add(cost_estimate, {"rag.model": "mistral"})

        span.set_attribute("rag.answer.preview", str(result)[:120])

    print(f"\n{result}")
    print(80 * "-")

Setting up the Observability Stack with Docker Compose

All telemetry (traces, metrics, logs) from your RAG app will flow into the OpenTelemetry Collector first, then get routed to the right backend:

Traces → Jaeger
Metrics → Prometheus
Logs → Loki
Visualization → Grafana

docker-compose.yml

services:

  otel-collector:
    container_name: otel-collector
    hostname: otel-collector
    image: otel/opentelemetry-collector-contrib:latest
    restart: always
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./config/otel-collector/otel-collector-config.yaml:/etc/otel-collector-config.yaml
    networks:
      - llm-obs-lab
    ports:
      - "4317:4317" # OTLP gRPC receiver
      - "4318:4318" # OTLP HTTP receiver

  jaeger:
    container_name: jaeger
    hostname: jaeger
    image: jaegertracing/all-in-one:latest
    restart: always
    volumes:
      - jaegar_data:/var/lib/jaeger
    networks:
      - llm-obs-lab
    ports:
      - "6831:6831/udp" # UDP port for Jaeger agent
      - "16686:16686" # Web UI
      - "14268:14268" # HTTP port for spans

  prometheus:
    container_name: prometheus
    hostname: prometheus
    image: prom/prometheus:latest
    restart: always
    command:
      - --storage.tsdb.retention.time=1d
      - --config.file=/etc/prometheus/prometheus.yml
    volumes:
      - prometheus_data:/prometheus
      - ./config/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    networks:
      - llm-obs-lab
    ports:
      - "9090:9090"

  grafana:
    container_name: grafana
    hostname: grafana
    image: grafana/grafana
    restart: always
    volumes:
      - grafana_data:/var/lib/grafana
      - "./config/grafana/datasources:/etc/grafana/provisioning/datasources"    
    networks:
      - llm-obs-lab
    ports:
      - "3000:3000"

  loki:
    container_name: loki
    hostname: loki
    image: grafana/loki:latest
    restart: always
    command:
      - -config.file=/etc/loki/local-config.yaml
    volumes:
      - loki_data:/loki
      - "./config/loki/loki-config.yaml:/etc/loki/local-config.yaml"
    networks:
      - llm-obs-lab
    ports:
      - "3100:3100"  

networks:
  llm-obs-lab:
    driver: bridge

volumes:
  loki_data: {}
  jaegar_data: {}
  grafana_data: {}  
  prometheus_data: {}

otel-collector-config.yaml

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch: {}

extensions:
  health_check: {}

exporters:

  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

  prometheus:
    endpoint: "0.0.0.0:9090"

  otlphttp:
    endpoint: http://loki:3100/otlp

service:
  pipelines:

    traces:
      receivers: [otlp]
      processors: [batch]      
      exporters: [otlp/jaeger]

    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]

    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]

prometheus.yml

global:
  scrape_interval: 5s
scrape_configs:
  - job_name: 'otel-collector'
    static_configs:
      - targets: ['otel-collector:9090']

loki-config.yaml

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

frontend:
  max_outstanding_per_tenant: 2048

pattern_ingester:
  enabled: true

limits_config:
  max_global_streams_per_user: 0
  ingestion_rate_mb: 50000
  ingestion_burst_size_mb: 50000
  volume_enabled: true

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

analytics:
  reporting_enabled: false

Grafana datasources.yaml

apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    access: proxy
    basicAuth: false
    isDefault: true
    jsonData:
      tlsSkipVerify: true
    editable: false

  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: false
    version: 1
    editable: false

  - name: Jaeger
    type: jaeger
    access: proxy
    url: http://jaeger:16686
    version: 1
    editable: false

Bring Up the Stack

docker-compose up -d

Jaeger UI → http://localhost:16686
Prometheus UI → http://localhost:9090
Grafana UI → http://localhost:3000 (user: admin, pass: admin)

How Data Flows

Your RAG app exports telemetry via OTLP (4317 gRPC, 4318 HTTP).
The OTel Collector ingests all telemetry, applies batching, and routes it:

Traces → Jaeger
Metrics → Prometheus (scraped at /metrics)
Logs → Loki
Grafana connects to all three for a unified view.
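
To see what Prometheus receives when it scrapes the Collector, it can help to parse a few lines of the text exposition format by hand. The sketch below is a deliberately simplified parser (no timestamps, no exemplars), and the metric names in `SAMPLE` are made up for illustration rather than taken from the app.

```python
import re

# Hypothetical scrape output; real metric names depend on your instrumentation.
SAMPLE = """\
# HELP rag_requests_total Number of RAG requests
# TYPE rag_requests_total counter
rag_requests_total{rag_model="mistral"} 3
rag_cost_usd_total{rag_model="mistral"} 0.00012
"""

LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> list:
    """Return (name, labels, value) tuples for each sample line."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if not match:
            continue  # lines with timestamps are ignored in this sketch
        name, label_blob, value = match.groups()
        labels = dict(LABEL_RE.findall(label_blob or ""))
        samples.append((name, labels, float(value)))
    return samples


samples = parse_metrics(SAMPLE)
for name, labels, value in samples:
    print(name, labels, value)
```

Pasting the real output of the Collector's metrics endpoint into `SAMPLE` is a quick way to confirm that the labels you expect are actually being exported.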

With this setup, you now have end-to-end observability for your RAG application:

Debug request flow in Jaeger
Track system health with Prometheus
Investigate application logs in Loki
Combine all of the above in Grafana dashboards

How This Improves Observability

Latency analysis: Traces show whether slow responses are due to retrieval or LLM generation.
Cost tracking: Token counts let you estimate $ spend directly from traces.
Debugging hallucinations: Seeing prompts + responses helps you identify if poor answers came from bad retrieval or bad generation.
Model governance: Attributes like model, temperature, top_p let you correlate behavior with configuration.

{
    "author"   :  "Kartik Dudeja",
    "email"    :  "kartikdudeja21@gmail.com",
    "linkedin" :  "https://linkedin.com/in/kartik-dudeja",
    "github"   :  "https://github.com/Kartikdudeja"
}

Going Serverless on Kubernetes with OpenFaaS

Kartik Dudeja — Sat, 30 Aug 2025 05:54:08 GMT

Build, ship, and scale functions — on your own Kubernetes cluster.

1. What is Serverless?

Serverless allows developers to write and deploy code without worrying about the underlying infrastructure. The server still exists — you just don’t manage it.

Instead of provisioning and scaling servers, you:

Write a function
Deploy it
Let the platform handle the rest (scaling, routing, etc.)

Serverless is about developer experience, efficiency, and auto-scaling — perfect for microservices, APIs, and background tasks.

2. What is OpenFaaS?

OpenFaaS (Functions-as-a-Service) is an open-source serverless framework built for Kubernetes and Docker.

Key Features:

Deploy serverless functions in containers
CLI, UI, and REST API support
Built-in Prometheus metrics
Auto-scaling via function invocation count
Supports multiple runtimes (Python, Node.js, Go, Bash, etc.)

Serverless on Kubernetes with OpenFaaS:

OpenFaaS runs as a set of Kubernetes components:

Gateway: Exposes functions over HTTP
Function Pods: Each function is a container
Prometheus: Scrapes function invocation metrics
Autoscaler: Adds/removes replicas based on load

3. Installing OpenFaaS on Kubernetes with Arkade

Arkade is a simple Kubernetes marketplace for installing apps.

Prerequisites

Kubernetes cluster (e.g., minikube, kind, k3s)
kubectl installed
arkade installed:

curl -sLS https://get.arkade.dev | sudo sh

Install OpenFaaS:

arkade install openfaas

It will:

Create openfaas and openfaas-fn namespaces
Deploy the gateway, faas-netes, UI, and Prometheus

4. Accessing the OpenFaaS UI

Get admin password:

PASSWORD=$(kubectl get secret -n openfaas basic-auth \
-o jsonpath="{.data.basic-auth}" | base64 --decode)
echo $PASSWORD

Port-forward the gateway:

kubectl port-forward -n openfaas svc/gateway 8080:8080

Visit: http://localhost:8080

Username: admin
Password: from the command above

5. Creating a Sample Python Function (python3-http)

We’ll use the python3-http template, which supports GET/POST with JSON or plain-text input.

Step 1: Install the OpenFaaS CLI

curl -sSL https://cli.openfaas.com | sudo sh

echo -n "$PASSWORD" | faas-cli login --username admin --password-stdin

Pull the python3-http template from the OpenFaaS template store:

faas-cli template store pull python3-http

Step 2: Create the function

faas-cli new openfaas-py-fn --lang python3-http

Edit openfaas-py-fn/handler.py:

def handle(event, context):
    name = event.body.decode('utf-8') or "World"
    return {
        "statusCode": 200,
        "body": f"Hello, {name}",
        "headers": {
            "Content-Type": "text/plain"
        }
    }
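
Before building the image, the handler can be exercised locally with a stand-in for the event object. `FakeEvent` below is a hypothetical test double that only mimics the `body` attribute the handler reads; it is not part of the python3-http template.

```python
from dataclasses import dataclass


@dataclass
class FakeEvent:
    """Minimal stand-in for the template's event object (assumption: body is bytes)."""
    body: bytes = b""


def handle(event, context):
    # Same logic as handler.py: fall back to "World" when the body is empty.
    name = event.body.decode("utf-8") or "World"
    return {
        "statusCode": 200,
        "body": f"Hello, {name}",
        "headers": {"Content-Type": "text/plain"},
    }


print(handle(FakeEvent(b"Kartik"), None)["body"])
print(handle(FakeEvent(), None)["body"])
```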

Step 3: Update stack.yaml file

version: 1.0
provider:
  name: openfaas
  gateway: http://127.0.0.1:8080
functions:
  openfaas-py-fn:
    lang: python3-http
    handler: ./openfaas-py-fn
    image: /openfaas-py-fn:1.1

Step 4: Build and Deploy

faas-cli build -f stack.yaml
faas-cli push -f stack.yaml
faas-cli deploy -f stack.yaml

6. Accessing the Function via cURL

curl -X POST http://localhost:8080/function/openfaas-py-fn -d "Testing"

Or through the UI → Click “Invoke” beside the function.
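
The same call can be made from Python using only the standard library. This sketch assumes the gateway is still port-forwarded to localhost:8080; `build_invoke_request` and `invoke` are illustrative helper names.

```python
import urllib.request

GATEWAY = "http://localhost:8080"  # assumes the gateway port-forward is running


def build_invoke_request(function_name: str, payload: str) -> urllib.request.Request:
    """Build a POST request for a function behind the OpenFaaS gateway."""
    url = f"{GATEWAY}/function/{function_name}"
    return urllib.request.Request(url, data=payload.encode("utf-8"), method="POST")


def invoke(function_name: str, payload: str, timeout: float = 10.0) -> str:
    """Send the request and return the response body as text."""
    request = build_invoke_request(function_name, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


req = build_invoke_request("openfaas-py-fn", "Testing")
print(req.get_method(), req.full_url)
# Call invoke("openfaas-py-fn", "Testing") once the function is deployed.
```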

7. Configuring Auto-Scaling

The OpenFaaS autoscaler monitors Prometheus metrics and scales functions automatically.

To customize:

Add this under the openfaas-py-fn entry in stack.yaml (OpenFaaS reads these scaling limits from labels, not annotations):

    labels:
      com.openfaas.scale.min: "1"
      com.openfaas.scale.max: "5"

Redeploy:

faas-cli deploy -f stack.yaml

Now, your function can scale up to 5 replicas during high load.
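
To build intuition for how the minimum and maximum bound the replica count, here is a simplified model of alert-driven scaling. It is a sketch under assumptions, not the autoscaler's actual code: the function name is hypothetical, and the rule of stepping by a percentage of the maximum (20% here) while an alert fires, then returning to the minimum, should be checked against the OpenFaaS documentation for your edition.

```python
def next_replicas(current: int, min_replicas: int, max_replicas: int,
                  scale_factor_pct: int, alert_firing: bool) -> int:
    """Return the replica count after one scaling decision (simplified model)."""
    # Integer ceiling of max_replicas * scale_factor_pct / 100.
    step = (max_replicas * scale_factor_pct + 99) // 100
    if alert_firing and step > 0:
        return min(current + step, max_replicas)  # never exceed the maximum
    return min_replicas  # load is gone: fall back to the minimum


replicas = 1
history = []
for _ in range(6):
    replicas = next_replicas(replicas, 1, 5, 20, alert_firing=True)
    history.append(replicas)
print(history)
print(next_replicas(replicas, 1, 5, 20, alert_firing=False))
```

With a minimum of 1 and a maximum of 5, the model climbs one replica per decision and then stays at 5 no matter how long the load lasts.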

8. Prometheus Metrics and Grafana Dashboard

OpenFaaS installs Prometheus by default.

Access Prometheus:

kubectl port-forward -n openfaas svc/prometheus 9090:9090

Visit http://localhost:9090

Sample queries:

gateway_function_invocation_total
gateway_functions_seconds_bucket

Grafana for Function Monitoring

You can install Grafana via Arkade:

arkade install grafana

Then port-forward it:

kubectl port-forward -n default svc/grafana 3000:3000

9. Load Testing with hey and Testing Auto-Scaling

hey is a lightweight load-testing tool.

Install hey:

go install github.com/rakyll/hey@latest

Test Load:

hey -z 10s -c 1 -m POST -d "Load Testing" http://127.0.0.1:8080/function/openfaas-py-fn

Flags:

-z 10s: Run for 10 seconds
-c 1: Use 1 concurrent worker
-m POST: Send POST requests
-d "Load Testing": Use this string as the request body

Observe Auto-Scaling

Check current function replicas:

kubectl get deploy -n openfaas-fn openfaas-py-fn

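The replica count should increase under heavy load. A single worker (-c 1) may not generate enough traffic to trigger scaling, so raise -c and -z if the count stays at 1.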

You can also monitor this behavior live via the OpenFaaS UI or Prometheus.

Final Thoughts

OpenFaaS + Kubernetes brings the best of both worlds:

The flexibility and portability of containers
The simplicity and scalability of serverless

With a single CLI and a UI, OpenFaaS makes it fun to deploy and manage functions — without giving up observability, control, or performance.

OpenTelemetry in Action on Kubernetes: Part 9 — Cluster-Level Observability with OpenTelemetry Agent + Gateway

Kartik Dudeja — Fri, 15 Aug 2025 06:10:14 GMT

Welcome to the grand finale of our observability series! So far, we’ve added visibility into our application through logs, metrics, and traces — all flowing beautifully into Grafana via OpenTelemetry Collector.

But there’s still one big puzzle piece left: the Kubernetes cluster itself.

In this final part, we’ll:

Collect host and node-level metrics using hostmetrics
Deploy a centralized Collector in Deployment mode (gateway)
Introduce ServiceAccount for permissions
Collect cluster-state metrics from the Kubernetes API server using k8s_cluster
Use the debug exporter to troubleshoot data pipelines
And finally, conclude the series with a high-level recap

Why Cluster-Level Observability Matters

While we’ve focused on application telemetry so far, it’s just one piece of the puzzle. For full visibility, we must also observe the Kubernetes cluster itself — the infrastructure running our apps.

Cluster observability helps us:

Monitor node health and resource usage
Track control plane performance (API server, scheduler, etc.)
Understand pod scheduling and evictions
Improve scaling decisions
Troubleshoot infrastructure-level issues
Strengthen security and governance

In short, without visibility into the cluster, you’re flying blind. This part of the series ensures you’re watching not just the app, but the platform beneath it.

Add hostmetrics Receiver in the Agent

We’ll start by updating our otel-collector-agent (running as DaemonSet) to use the hostmetrics receiver. This receiver scrapes system-level metrics from each node, such as CPU, memory, disk, filesystem, and load.

Config — otel-collector-agent-configmap.yaml

receivers:
  hostmetrics:
    collection_interval: 1m
    scrapers:
      cpu: {}
      memory: {}
      disk: {}
      load: {}
      filesystem: {}
      network: {}
      system: {}
processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 15
  batch:
    send_batch_size: 1000
    timeout: 5s
exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    enable_open_metrics: true
    resource_to_telemetry_conversion:
      enabled: true
service:
  pipelines:
    # collect metrics from otlp and hostmetrics receiver and expose in prometheus compatible format
    metrics:
      receivers: [otlp, hostmetrics]
      processors: [memory_limiter, batch]
      exporters: [prometheus]

Each hostmetrics receiver runs inside the agent pod on every node, giving us node-specific insights.

Deploy the OpenTelemetry Gateway

1. Why Deployment Mode?

Deployment Mode is used for centralized collection, aggregation, and export of telemetry data.
Unlike the DaemonSet agent, which runs on each node, a Deployment collector can scrape and process cluster-wide metrics.

2. Create a ServiceAccount, ClusterRole, and ClusterRoleBinding

To use the k8s_cluster receiver, the collector must have permission to access Kubernetes objects like nodes, pods, namespaces, etc.

What is a ServiceAccount in Kubernetes?

A ServiceAccount in Kubernetes is an identity used by pods to authenticate and interact securely with the Kubernetes API. While every pod gets a default ServiceAccount, you often need to create custom ones with specific RBAC (Role-Based Access Control) permissions for security and least privilege.

In our case, the OpenTelemetry Collector needs to read cluster state — like nodes, pods, and namespaces — to collect metrics using the k8s_cluster receiver. So, we create a dedicated ServiceAccount and bind it to a ClusterRole with read-only access to those resources. This ensures our collector can operate properly without over-privileging it.

# otel-collector-gateway-serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: otel-collector-gateway-sa
  namespace: observability
  labels:
    app: otel-collector-gateway  
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector-gateway-role
  labels:
    app: otel-collector-gateway
rules:
- apiGroups:
  - ""
  resources:
  - events
  - namespaces
  - namespaces/status
  - nodes
  - nodes/spec
  - pods
  - pods/status
  - replicationcontrollers
  - replicationcontrollers/status
  - resourcequotas
  - services
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - daemonsets
  - deployments
  - replicasets
  - statefulsets
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - jobs
  - cronjobs
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector-gateway-binding
  labels:
    app: otel-collector-gateway
subjects:
  - kind: ServiceAccount
    name: otel-collector-gateway-sa
    namespace: observability
roleRef:
  kind: ClusterRole
  name: otel-collector-gateway-role
  apiGroup: rbac.authorization.k8s.io

Apply it:

kubectl -n observability apply -f otel-collector-gateway-serviceaccount.yaml
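
To reason about what a ClusterRole like this one permits, a toy evaluator can help. The sketch below uses an abbreviated copy of the rules and a hypothetical `is_allowed` helper; it ignores wildcards, resourceNames, and non-resource URLs, so it is only a rough model of how RBAC matching works.

```python
READ_ONLY = ["get", "list", "watch"]

# Abbreviated subset of the ClusterRole rules, for illustration only.
RULES = [
    {"apiGroups": [""], "resources": ["events", "namespaces", "nodes", "pods", "services"],
     "verbs": READ_ONLY},
    {"apiGroups": ["apps"], "resources": ["daemonsets", "deployments", "replicasets", "statefulsets"],
     "verbs": READ_ONLY},
    {"apiGroups": ["batch"], "resources": ["jobs", "cronjobs"], "verbs": READ_ONLY},
]


def is_allowed(rules: list, api_group: str, resource: str, verb: str) -> bool:
    """True if any single rule grants the verb on the resource in that API group."""
    return any(
        api_group in rule["apiGroups"]
        and resource in rule["resources"]
        and verb in rule["verbs"]
        for rule in rules
    )


print(is_allowed(RULES, "", "pods", "list"))
print(is_allowed(RULES, "", "pods", "delete"))
print(is_allowed(RULES, "", "secrets", "get"))
```

Against a live cluster, `kubectl auth can-i list pods --as=system:serviceaccount:observability:otel-collector-gateway-sa` gives the authoritative answer.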

3. OpenTelemetry Collector Config with k8s_cluster Receiver

Create the config file as a ConfigMap.

# otel-collector-gateway-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-gateway-config
  namespace: observability
  labels:
    app: otel-collector-gateway
data:
  otel-collector-config.yaml: |
    receivers:
      k8s_cluster:
        auth_type: "serviceAccount"
        collection_interval: 30s
    processors:
      memory_limiter:
        check_interval: 1s
        limit_percentage: 80
        spike_limit_percentage: 15
      batch:
        send_batch_size: 1000
        timeout: 5s
    exporters:
      debug:
        verbosity: detailed
      prometheus:
        endpoint: "0.0.0.0:8889"
        enable_open_metrics: true
        resource_to_telemetry_conversion:
          enabled: true
    service:
      pipelines:
        metrics:
          receivers: [k8s_cluster]
          processors: [memory_limiter, batch]
          exporters: [prometheus]

Apply it:

kubectl -n observability apply -f otel-collector-gateway-configmap.yaml
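
A common cause of Collector startup failures is a pipeline that references a receiver, processor, or exporter that was never declared. The sketch below checks for that on a plain dictionary that mirrors the gateway config; `find_undefined_components` is a hypothetical helper, and in practice the dictionary would be loaded from the YAML file with a YAML library.

```python
import copy

# Dictionary form of the gateway config (values trimmed for brevity).
GATEWAY_CONFIG = {
    "receivers": {"k8s_cluster": {"auth_type": "serviceAccount", "collection_interval": "30s"}},
    "processors": {"memory_limiter": {}, "batch": {}},
    "exporters": {"debug": {"verbosity": "detailed"}, "prometheus": {"endpoint": "0.0.0.0:8889"}},
    "service": {
        "pipelines": {
            "metrics": {
                "receivers": ["k8s_cluster"],
                "processors": ["memory_limiter", "batch"],
                "exporters": ["prometheus"],
            }
        }
    },
}


def find_undefined_components(config: dict) -> list:
    """List every pipeline reference that has no matching top-level declaration."""
    problems = []
    for pipeline, spec in config["service"]["pipelines"].items():
        for kind in ("receivers", "processors", "exporters"):
            declared = config.get(kind, {})
            for name in spec.get(kind, []):
                if name not in declared:
                    problems.append(f"{pipeline}: {kind[:-1]} '{name}' is not defined")
    return problems


broken = copy.deepcopy(GATEWAY_CONFIG)
broken["service"]["pipelines"]["metrics"]["exporters"].append("jaeger")

print(find_undefined_components(GATEWAY_CONFIG))
print(find_undefined_components(broken))
```

Note that the reverse case is harmless: the debug exporter is declared but unused in the metrics pipeline, and the check does not flag it.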

4. Deploy the OpenTelemetry Collector

# otel-collector-gateway-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector-gateway
  namespace: observability
  labels:
    app: otel-collector-gateway  
spec:
  replicas: 1
  revisionHistoryLimit: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%           # Allow 25% more pods than desired during update
      maxUnavailable: 25%     # Allow 25% of desired pods to be unavailable during update
  selector:
    matchLabels:
      app: otel-collector-gateway
  template:
    metadata:
      labels:
        app: otel-collector-gateway
    spec:
      serviceAccountName: otel-collector-gateway-sa
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
          args: ["--config=/conf/otel-collector-config.yaml"]
          volumeMounts:
            - name: config-volume
              mountPath: /conf
          resources:
            requests:
              cpu: 10m
              memory: 32Mi
            limits:
              cpu: 50m
              memory: 128Mi
      volumes:
        - name: config-volume
          configMap:
            name: otel-collector-gateway-config

Apply it:

kubectl -n observability apply -f otel-collector-gateway-deployment.yaml

5. Expose Collector to Prometheus

# otel-collector-gateway-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: otel-collector-gateway
  namespace: observability
  labels:
    app: otel-collector-gateway
spec:
  selector:
    app: otel-collector-gateway
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
      protocol: TCP
    - name: otlp-http
      port: 4318
      targetPort: 4318
      protocol: TCP
    - name: prometheus
      port: 8889
      targetPort: 8889
      protocol: TCP    
  type: ClusterIP

Apply:

kubectl -n observability apply -f otel-collector-gateway-service.yaml

Then add this to your Prometheus scrape_configs:

- job_name: 'otel-collector-gateway'
  static_configs:
    - targets: ['otel-collector-gateway.observability.svc.cluster.local:8889']

Test and Verify

Check deployment status:

kubectl -n observability get all -l app=otel-collector-gateway

Special Mention: Debug Exporter — Your Observability Wingman

The debug exporter in OpenTelemetry Collector is a lightweight and incredibly helpful tool for developers and DevOps engineers when building or troubleshooting telemetry pipelines.

Instead of exporting telemetry data (like logs, metrics, and traces) to a backend system like Prometheus or Jaeger, the debug exporter simply writes the data to the Collector’s own log output, which you can read with kubectl logs. This means:

You can see exactly what telemetry data is being received and processed — live in the logs.
It helps validate instrumentation quickly, without setting up full observability backends.
It’s especially useful when you’re testing new receivers, processors, or pipelines, and want a quick look at the output.

When to Use

Local testing or dev environments.
Debugging broken data flow — if Prometheus or Jaeger isn’t showing what you expect.
Learning how OpenTelemetry transforms and routes telemetry data.

Example Configuration Snippet

exporters:
  debug:
    verbosity: detailed  # outputs full content of each signal

Then, reference it in your pipeline like this:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger, debug]  # otlp/jaeger: an OTLP exporter pointing at Jaeger

This ensures traces are sent to Jaeger and also printed to the console — great for double-checking what’s going in.

Conclusion: You Now Have Full Observability!

Over the past 9 parts, you’ve:

Containerized a real ML application
Instrumented it with OpenTelemetry
Collected traces, logs, and metrics
Deployed observability tools in Kubernetes
Visualized everything in Grafana
Monitored the entire Kubernetes cluster with Agent + Gateway mode

You’ve essentially built a production-grade observability platform from scratch — without cloud vendor lock-in.

Missed the previous article?

Check out Part 8: Visualize Everything, Building a Unified Observability Dashboard with Grafana to see how we got here.

OpenTelemetry in Action on Kubernetes: Part 8 — Visualize Everything, Building a Unified Observability Dashboard with Grafana

Kartik Dudeja — Sat, 02 Aug 2025 11:20:17 GMT

Why Visualization Matters

Telemetry data — logs, metrics, and traces — gives you deep insights into your system’s behavior. But let’s be honest: staring at JSON traces or YAML logs isn’t exactly thrilling.

That’s where visualization comes in.

A good dashboard:

Gives instant visibility into system health
Helps correlate metrics, logs, and traces
Makes debugging, alerting, and capacity planning effortless

Meet Grafana: The Observatory for Observability

Grafana is an open-source analytics and visualization platform designed to work with various telemetry backends — including:

Prometheus (for metrics)
Loki (for logs)
Jaeger (for traces)

Grafana is:

Pluggable
Real-time
Customizable

It turns raw observability data into actionable dashboards.

Deploying Grafana in Kubernetes

We’ll deploy Grafana with a basic Deployment and Service. You can customize it with a persistent volume or admin credentials if needed.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  labels:
    app: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:10.3.1
          resources:
            requests:
              cpu: "10m"
              memory: "56Mi"
            limits:
              cpu: "20m"
              memory: "128Mi"
          ports:
            - containerPort: 3000
          volumeMounts:
            - name: grafana-storage
              mountPath: /var/lib/grafana
          env:
            - name: GF_SECURITY_ADMIN_USER
              value: "admin"
            - name: GF_SECURITY_ADMIN_PASSWORD
              value: "admin"
      volumes:
        - name: grafana-storage
          emptyDir: {}  # Replace with PersistentVolumeClaim for persistence

---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  labels:
    app: grafana
spec:
  selector:
    app: grafana
  ports:
    - protocol: TCP
      port: 3000
      targetPort: 3000
  type: ClusterIP

Deploy Grafana

# Apply deployment and service files
kubectl -n observability apply -f grafana.yaml

# Check Grafana pod logs
kubectl logs -l app=grafana -n observability

# Port-forward Grafana service to access UI locally
kubectl -n observability port-forward svc/grafana 3000:3000

Now visit http://localhost:3000 in your browser.
Default credentials:

Username: admin
Password: admin

Configure Datasources in Grafana

Once inside the Grafana UI, follow these steps to add your observability backends:

Add Prometheus as a Datasource:

Go to Home → Connections → Data sources
Click Add new data source
Choose Prometheus
Set URL to:

http://prometheus.observability.svc.cluster.local:9090

Click Save & Test

Add Loki as a Datasource:

Repeat above steps, choose Loki
Set URL to:

http://loki.observability.svc.cluster.local:3100

Save & Test

Add Jaeger as a Datasource:

Choose Jaeger from the list
Set URL to:

http://jaeger.observability.svc.cluster.local:16686

Save & Test
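
The three data sources can also be created through Grafana's HTTP API instead of the UI. The sketch below only builds the requests for `POST /api/datasources`; it assumes Grafana is port-forwarded to localhost:3000 with the admin/admin credentials from this walkthrough, and `build_request` is an illustrative helper name.

```python
import base64
import json
import urllib.request

GRAFANA_URL = "http://localhost:3000"  # assumes the Grafana port-forward is running

DATASOURCES = [
    {"name": "Prometheus", "type": "prometheus",
     "url": "http://prometheus.observability.svc.cluster.local:9090"},
    {"name": "Loki", "type": "loki",
     "url": "http://loki.observability.svc.cluster.local:3100"},
    {"name": "Jaeger", "type": "jaeger",
     "url": "http://jaeger.observability.svc.cluster.local:16686"},
]


def build_request(datasource: dict, user: str = "admin",
                  password: str = "admin") -> urllib.request.Request:
    """Build a request that creates one data source (proxy access mode)."""
    body = json.dumps({**datasource, "access": "proxy"}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{GRAFANA_URL}/api/datasources",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


requests_to_send = [build_request(ds) for ds in DATASOURCES]
for request in requests_to_send:
    print(request.get_method(), request.full_url, json.loads(request.data)["name"])
# Send each one with urllib.request.urlopen(request) when Grafana is reachable.
```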

Explore Logs, Metrics, and Traces

Head over to the Explore tab in Grafana:

Select Loki → Run a log query like

{exporter="OTLP"} |= `house-price-service`

Select Jaeger → Search traces for your app, filtered by service name
Select Prometheus → Query custom app metrics

This is your real-time debugging playground.
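
The same LogQL query can be run outside Grafana against Loki's `query_range` HTTP endpoint. The sketch below only builds the URL, since the escaping of braces, quotes, and backticks is the part that is easy to get wrong; `build_query_range_url` is a hypothetical helper, and without explicit start and end parameters Loki falls back to its default time range.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

LOKI_URL = "http://loki.observability.svc.cluster.local:3100"


def build_query_range_url(logql: str, limit: int = 100) -> str:
    """Return a URL-encoded query_range request for the given LogQL expression."""
    params = urlencode({"query": logql, "limit": limit})
    return f"{LOKI_URL}/loki/api/v1/query_range?{params}"


QUERY = '{exporter="OTLP"} |= `house-price-service`'
url = build_query_range_url(QUERY)
print(url)

# Decoding the query string should give back the original expression.
decoded = parse_qs(urlsplit(url).query)
print(decoded["query"][0] == QUERY)
```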

Build a Unified Dashboard

Now let’s pull it all together.

Steps to Create a Dashboard:

1. Go to the Dashboards section → Click New Dashboard
2. Add a Panel:

For Metrics: Use Prometheus queries (e.g., request rate, latency)
For Logs: Use Loki query (e.g., by app label)
For Traces: Use Jaeger panel or link to trace visualizer

3. Organize the panels side-by-side:

App throughput (metric)
App logs (filtered view)
Recent traces

4. Save the dashboard and give it a name like House Price App Observability

Conclusion

You now have a complete, three-pillar observability stack running on Kubernetes:

Metrics via Prometheus
Logs via Loki
Traces via Jaeger
Visualized in Grafana

All powered by OpenTelemetry — the glue connecting them.

What’s Next?

You now have full visibility into your application — but what about the Kubernetes cluster itself?

In the final part of the series, we’ll expand our observability beyond the app and dive into cluster-level insights. This includes monitoring:

Node and pod CPU/memory usage
Kubernetes control plane metrics
Scheduler performance, kubelet stats, and more

Missed the previous article?

Check out Part 7: Let There Be Logs, Observability’s Final Pillar with Loki to see how we got here.

OpenTelemetry in Action on Kubernetes: Part 7 — Let There Be Logs, Observability’s Final Pillar with Loki

Kartik Dudeja — Wed, 30 Jul 2025 02:39:22 GMT

Logs: The Footprints of Your System

Logs are timestamped records of events that happen in your system — like breadcrumbs left behind by your application as it performs operations. They help you understand what happened, when it happened, and often why it happened.

In observability, logs play a key role when:

Metrics show a spike but don’t tell you why.
Traces reveal latency but not the root cause.
You want to debug something that happened at 3 AM… last Thursday.

Meet Loki — Prometheus for Logs

Loki, built by the folks at Grafana Labs, is a log aggregation system designed to be:

Lightweight: It indexes only labels, not the full log content.
Kubernetes-native: Integrates beautifully with pod logs.
Prometheus-like: Designed to feel familiar if you’ve used Prometheus.

Instead of shipping logs to a bulky ELK stack, Loki works smoothly with Promtail, Fluent Bit, or the OpenTelemetry Collector to aggregate logs from across your cluster.
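
The idea of indexing only labels can be illustrated with a toy in-memory store. This is a conceptual sketch, not how Loki is implemented: `TinyLogStore` is a made-up class that groups lines into streams keyed by their label set and scans line content only at query time.

```python
from collections import defaultdict


class TinyLogStore:
    """Toy log store: labels are indexed, log content is not."""

    def __init__(self):
        self._streams = defaultdict(list)  # frozenset of label pairs -> lines

    def push(self, labels: dict, line: str) -> None:
        self._streams[frozenset(labels.items())].append(line)

    def query(self, selector: dict, contains: str = "") -> list:
        wanted = set(selector.items())
        results = []
        for stream_labels, lines in self._streams.items():
            if wanted <= stream_labels:  # cheap lookup on labels only
                # Content filtering is a brute-force scan of the matching streams.
                results.extend(line for line in lines if contains in line)
        return results


store = TinyLogStore()
store.push({"app": "api", "namespace": "observability"}, "error: timeout")
store.push({"app": "api", "namespace": "observability"}, "request ok")
store.push({"app": "db", "namespace": "observability"}, "error: disk full")

print(store.query({"app": "api"}, contains="error"))
```

The trade-off is visible even at this scale: narrow label selectors keep queries cheap, while a broad selector forces a scan of every matching line.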

Deploying Loki in Kubernetes

Let’s deploy Loki using a simple YAML manifest that includes:

A Deployment to run the Loki service
A Service to expose Loki inside the cluster
A ConfigMap to configure how Loki receives and stores logs

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-config
  labels:
    app: loki
data:
  loki.yaml: |
    auth_enabled: false
    server:
      http_listen_port: 3100
    common:
      path_prefix: /loki
      ring:
        instance_addr: 127.0.0.1
        kvstore:
          store: inmemory
    ingester_client:
      grpc_client_config:
        max_send_msg_size: 104857600
        max_recv_msg_size: 104857600
      remote_timeout: 5s
    ingester:
      lifecycler:
        ring:
          kvstore:
            store: inmemory
          replication_factor: 1
    schema_config:
      configs:
        - from: 2020-10-27
          store: boltdb-shipper
          object_store: filesystem
          schema: v11
          index:
            prefix: index_
            period: 24h
    storage_config:
      boltdb_shipper:
        active_index_directory: /loki/index
        cache_location: /loki/cache
        shared_store: filesystem
      filesystem:
        directory: /loki/chunks
    limits_config:
      enforce_metric_name: false
      max_streams_per_user: 0
      max_chunks_per_query: 1000000
      max_query_series: 50000
      max_query_lookback: 720h
    ruler:
      storage:
        type: local
        local:
          directory: /loki/rules
      ring:
        kvstore:
          store: inmemory
    analytics:
      reporting_enabled: false

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: loki
  labels:
    app: loki
spec:
  replicas: 1
  selector:
    matchLabels:
      app: loki
  template:
    metadata:
      labels:
        app: loki
    spec:
      containers:
        - name: loki
          image: grafana/loki:2.9.2
          args:
            - "-config.file=/etc/loki/loki.yaml"
          ports:
            - name: http
              containerPort: 3100
          volumeMounts:
            - name: config
              mountPath: /etc/loki
              readOnly: true
            - name: storage
              mountPath: /loki
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
      volumes:
        - name: config
          configMap:
            name: loki-config
        - name: storage
          emptyDir: {}

---

apiVersion: v1
kind: Service
metadata:
  name: loki
  labels:
    app: loki
spec:
  selector:
    app: loki
  ports:
    - name: http-metrics
      port: 3100
      targetPort: 3100

The Loki manifest sets up a log aggregator inside your Kubernetes cluster that listens for incoming logs on a defined port. The service makes Loki accessible to other components, such as the OTEL Collector, while the ConfigMap gives Loki its brain — deciding how logs flow and where they go.

Updating OpenTelemetry Collector to Send Logs to Loki

We now need to tell the OTEL Collector Agent to collect logs using the filelog receiver and ship them off to Loki. Here's the flow:

filelog: Reads logs from Kubernetes pod files.
loki exporter: Pushes these logs to the Loki service using HTTP.

receivers:
  filelog:
    include: [ /var/log/pods/*/*/*.log ]
    start_at: beginning
    include_file_path: true
    include_file_name: true
exporters:
  loki:
    endpoint: "http://loki.observability.svc.cluster.local:3100/loki/api/v1/push"
    tls:
      insecure: true
    sending_queue:
      enabled: true
service:
  pipelines:
    # collect logs using 'filelog' receiver and ship them to loki
    logs:
      receivers: [filelog]
      processors: [memory_limiter, batch]
      exporters: [loki]
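
The paths matched by the filelog `include` glob carry useful metadata. The sketch below extracts it with a regular expression, assuming the usual kubelet layout of `/var/log/pods/<namespace>_<pod>_<uid>/<container>/<restart-count>.log`; the example path and the `parse_pod_log_path` helper are hypothetical.

```python
import re

POD_LOG_RE = re.compile(
    r"^/var/log/pods/"
    r"(?P<namespace>[^_/]+)_(?P<pod>[^_/]+)_(?P<uid>[^/]+)/"
    r"(?P<container>[^/]+)/(?P<restart>\d+)\.log$"
)


def parse_pod_log_path(path: str):
    """Return namespace, pod, uid, container, and restart count, or None if no match."""
    match = POD_LOG_RE.match(path)
    return match.groupdict() if match else None


# Made-up example path for illustration.
example = ("/var/log/pods/observability_loki-7d9c5b6f4-abcde_"
           "0f1e2d3c-1111-2222-3333-444455556666/loki/0.log")
info = parse_pod_log_path(example)
print(info)
```

Fields like these are what you would typically promote to log labels so that queries can filter by namespace, pod, or container.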

Deploying Loki to Kubernetes

# Apply the Loki manifests
kubectl -n observability apply -f loki.yaml

# Verify Loki is running
kubectl -n observability get pods -l app=loki

# Check Loki readiness (run this from a cluster node or pod; a ClusterIP is not reachable from outside the cluster)
curl -X GET "http://$(kubectl -n observability get svc -l app=loki -o json | jq -r '.items[].spec.clusterIP'):3100/ready"

What’s Next?

In Part 8, we’ll bring everything together with Grafana — the ultimate observability dashboard:

Visualizing traces from Jaeger
Querying metrics from Prometheus
Searching logs from Loki
All in a single unified interface.

The observability trifecta — complete, powerful, and open source. Stay tuned.

Missed the previous article?

Check out Part 6: Tracking Metrics with Prometheus and OpenTelemetry to see how we got here.

{
    "author"   :  "Kartik Dudeja",
    "email"    :  "kartikdudeja21@gmail.com",
    "linkedin" :  "https://linkedin.com/in/kartik-dudeja",
    "github"   :  "https://github.com/Kartikdudeja"
}

Kartik Dudeja — Sat, 19 Jul 2025 08:42:00 GMT

OpenTelemetry in Action on Kubernetes: Part 6 — Tracking Metrics with Prometheus and OpenTelemetry

Observability isn’t complete without metrics — the vital signs of your applications and services. In this part, we integrate Prometheus into our Kubernetes-based observability stack. You’ll learn how Prometheus works with OpenTelemetry, deploy it into your cluster, and finally visualize custom application metrics generated in Part 2.

What is Prometheus?

Prometheus is an open-source monitoring system that scrapes metrics from configured targets, stores them in a time-series database, and allows you to query them using PromQL. It’s widely adopted in the Kubernetes ecosystem for infrastructure and application monitoring.

Prometheus is pull-based: applications don't push metrics to it. Instead, each app exposes its metrics at an HTTP endpoint (by convention /metrics), and Prometheus scrapes those endpoints at a regular interval.

When integrated with OpenTelemetry, the OpenTelemetry Collector acts as a bridge — it collects metrics from instrumented applications and exposes them in a Prometheus-compatible format.
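
To show what "Prometheus-compatible format" means in practice, here is a small sketch that renders a counter in the Prometheus text exposition format, which is what a scrape of the Collector's endpoint returns. The label name endpoint and the sample value are assumptions for illustration.

```python
def render_counter(name, help_text, samples):
    """Render a counter in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


body = render_counter(
    "api_requests_total",
    "Total number of API requests",
    [({"endpoint": "/predict/"}, 42)],  # made-up sample value
)
print(body)
```

Each scrape is just an HTTP GET that returns plain text like this, which Prometheus parses and stores with a timestamp.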

What are Metrics?

Metrics are numerical data points that capture the health, performance, and resource usage of your system. For example:

API requests per second
Response latency
CPU and memory usage

In our app, we’ve already defined two custom metrics:

api_requests_total: total number of requests per endpoint
api_latency_seconds: histogram for API latency
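
A counter is just a number that goes up, but a histogram is less obvious. The sketch below shows how a Prometheus-style histogram counts observations into cumulative buckets; the bucket bounds and latency values are chosen purely for illustration and are not the ones the app uses.

```python
import math


class Histogram:
    """Cumulative-bucket histogram in the style Prometheus uses."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds) + [math.inf]
        self.counts = [0] * len(self.bounds)
        self.total = 0.0
        self.count = 0

    def observe(self, value):
        # A bucket counts every observation less than or equal to its bound.
        for i, bound in enumerate(self.bounds):
            if value <= bound:
                self.counts[i] += 1
        self.total += value
        self.count += 1


latency = Histogram([0.1, 0.5, 1.0])  # bucket bounds in seconds (illustrative)
for seconds in (0.05, 0.3, 0.3, 2.0):
    latency.observe(seconds)

for bound, n in zip(latency.bounds, latency.counts):
    le = "+Inf" if math.isinf(bound) else bound
    print(f'api_latency_seconds_bucket{{le="{le}"}} {n}')
```

Because buckets are cumulative, the +Inf bucket always equals the total number of observations, and quantiles can be estimated later from the bucket counts.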

Now, let’s expose them to Prometheus.

Prometheus Deployment

Let’s deploy Prometheus into our Kubernetes cluster. You’ll need the following YAML file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:latest
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus"
            - "--log.level=debug"
          resources:
            requests:
              cpu: "10m"
              memory: "56Mi"
            limits:
              cpu: "20m"
              memory: "128Mi"
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config-volume
              mountPath: /etc/prometheus/
            - name: storage-volume
              mountPath: /prometheus
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-config
        - name: storage-volume
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  labels:
    app: prometheus
spec:
  selector:
    app: prometheus
  ports:
    - protocol: TCP
      port: 9090
      targetPort: 9090
  type: ClusterIP
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
      - job_name: 'otel-collector-agent'
        static_configs:
          - targets: ['otel-collector-agent.observability.svc.cluster.local:8889']

This YAML file deploys Prometheus into the Kubernetes cluster with three key components:

A Deployment that runs the Prometheus server using the official image and mounts a configuration volume,
A Service that exposes Prometheus on port 9090, enabling access to its UI and scrape endpoint, and
A ConfigMap that provides the Prometheus scrape configuration, telling it to scrape metrics from itself and from the OpenTelemetry Collector agent on port 8889.

Together, these resources allow Prometheus to run continuously, collect metrics from OTEL, and expose them for querying and visualization.

Prometheus Config Explained

Here’s a minimal Prometheus configuration we’ll use:

global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'otel-collector-agent'
    static_configs:
      - targets: ['otel-collector-agent.observability.svc.cluster.local:8889']

This configuration tells Prometheus to scrape metrics every 15 seconds. It monitors itself (localhost:9090) and also scrapes the OpenTelemetry Collector agent at its service endpoint (otel-collector-agent.observability.svc.cluster.local:8889). This is where our app metrics are exposed.
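
As a rough illustration of how this configuration turns into HTTP requests, the sketch below expands the same scrape config (written as a Python dict) into the URLs Prometheus would fetch. It assumes the Prometheus defaults of scheme http and path /metrics, since the config overrides neither.

```python
scrape_config = {
    "global": {"scrape_interval": "15s"},
    "scrape_configs": [
        {"job_name": "prometheus", "static_configs": [{"targets": ["localhost:9090"]}]},
        {
            "job_name": "otel-collector-agent",
            "static_configs": [
                {"targets": ["otel-collector-agent.observability.svc.cluster.local:8889"]}
            ],
        },
    ],
}


def scrape_urls(config):
    """Yield (job, url) pairs, applying the default scheme and metrics path."""
    for job in config["scrape_configs"]:
        scheme = job.get("scheme", "http")
        path = job.get("metrics_path", "/metrics")
        for static in job.get("static_configs", []):
            for target in static["targets"]:
                yield job["job_name"], f"{scheme}://{target}{path}"


urls = list(scrape_urls(scrape_config))
for job, url in urls:
    print(job, url)
```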

Updating OpenTelemetry Collector Config

We need to update our OTEL Collector configuration to export metrics to Prometheus.

Here’s the relevant config:

    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317  # receive traces and metrics from instrumented application

    processors:
      memory_limiter:
        check_interval: 1s
        limit_percentage: 80
        spike_limit_percentage: 15

      batch:
        send_batch_size: 1000
        timeout: 5s

    exporters:
      otlp/jaeger:
        endpoint: "http://jaeger.observability.svc.cluster.local:4317"  # export traces to jaeger
        tls:
          insecure: true

      prometheus:
        endpoint: "0.0.0.0:8889"
        enable_open_metrics: true
        resource_to_telemetry_conversion:
          enabled: true

    service:
      pipelines:
        # collect trace data using otlp receiver and send it to jaeger
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp/jaeger]

        # collect metrics from otlp receiver and expose in prometheus compatible format
        metrics:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [prometheus]
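
The memory_limiter percentages are easier to reason about as absolute numbers. The sketch below assumes the behaviour described in the memory_limiter documentation: limit_percentage sets the hard limit as a share of total memory, and the soft limit is the hard limit minus the spike allowance. The 128 MiB figure is the container memory limit from the Collector DaemonSet in Part 4.

```python
def memory_limiter_thresholds(total_mib, limit_percentage, spike_limit_percentage):
    """Translate memory_limiter percentages into MiB thresholds."""
    hard = total_mib * limit_percentage / 100
    spike = total_mib * spike_limit_percentage / 100
    return {"hard_limit_mib": hard, "soft_limit_mib": hard - spike}


thresholds = memory_limiter_thresholds(128, limit_percentage=80, spike_limit_percentage=15)
print(thresholds)
```

With these settings the Collector starts refusing data at roughly 83 MiB and hits its hard limit at roughly 102 MiB, which leaves some headroom below the 128 MiB container limit.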

Deploying Prometheus to Kubernetes

# Apply Prometheus config and deployment
kubectl -n observability apply -f prometheus.yaml

Visualizing Custom Metrics

Make some API calls to the application:

API_ENDPOINT_IP=$(kubectl -n mlapp get svc -l app=house-price-service -o json | jq -r '.items[].spec.clusterIP')

curl -X POST "http://${API_ENDPOINT_IP}:80/predict/" \
    -H "Content-Type: application/json" \
    -d '{"features": [1200]}'

Open Prometheus UI:

kubectl port-forward svc/prometheus -n observability 9090:9090

In the Prometheus UI (http://localhost:9090), search for:

api_requests_total
api_latency_seconds

You should see data flowing in!
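
The same lookup can be scripted against the Prometheus HTTP API instead of the UI. This is a minimal sketch: the PromQL expression is just an example, and the response is hand-written in the documented shape so the code runs without a cluster.

```python
import json
import urllib.parse


def instant_query_url(base_url, promql):
    """Build the URL for the Prometheus instant-query HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def extract_samples(response_body):
    """Turn an instant-query JSON response into (labels, value) pairs."""
    doc = json.loads(response_body)
    if doc.get("status") != "success":
        raise ValueError(f"query failed: {doc.get('error')}")
    return [(item["metric"], float(item["value"][1])) for item in doc["data"]["result"]]


url = instant_query_url("http://localhost:9090", "sum(rate(api_requests_total[5m]))")

# Hand-written sample response, not real output from the cluster.
sample_response = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {"__name__": "api_requests_total"},
                    "value": [1721376000.0, "7"]}],
    },
})
samples = extract_samples(sample_response)
print(url)
print(samples)
```

With the port-forward from the previous step still running, fetching the printed URL returns a JSON document that extract_samples can parse.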

What’s Next?

Now that we’ve captured and visualized metrics, the observability story is coming together. But there’s still one pillar left — logs.

In Part 7, we’ll deploy Loki, the log aggregation system, and configure the OpenTelemetry Collector to ship structured logs from our app to Loki. Stay tuned!

Missed the previous article?

Check out Part 5: Tracing the Lines, Sending Spans from App to Jaeger to see how we got here.

Kartik Dudeja — Sat, 12 Jul 2025 16:03:57 GMT

OpenTelemetry in Action on Kubernetes: Part 5 — Tracing the Lines, Sending Spans from App to Jaeger

In the last part, we set up the OpenTelemetry Collector in agent mode to receive telemetry data from our ML app. But telemetry isn’t useful if it’s just sitting in logs, right? We want end-to-end traces that we can visualize, search, and troubleshoot.

And that’s exactly where Jaeger enters the scene.

What is Jaeger?

Jaeger is an open-source distributed tracing system, originally built by Uber, and now part of the CNCF. It helps you:

Monitor distributed transactions
Understand application latency
Perform root cause analysis
Visualize request flow across services

In short, if your app is a mystery novel, Jaeger is Sherlock Holmes.

What are Traces and Spans?

A trace is a complete journey of a request through your app — from start to finish.
A span is a single step in that journey, like one function call or one external API hit.

Think of a trace as the delivery of a pizza. Every span is a milestone in that process — order placed, pizza prepared, baked, out for delivery, delivered. Jaeger shows you the whole pizza journey.
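
The pizza analogy maps onto a simple data model. The sketch below is a simplified stand-in for real span objects, with invented IDs and timings: every span carries the trace ID it belongs to and the ID of its parent, which is how Jaeger reassembles the tree.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    name: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


# Invented timings for the pizza delivery.
spans = [
    Span("t1", "a", None, "pizza order", 0, 1800),
    Span("t1", "b", "a", "prepare", 50, 400),
    Span("t1", "c", "a", "bake", 400, 1000),
    Span("t1", "d", "a", "deliver", 1000, 1800),
]


def root_span(all_spans):
    return next(s for s in all_spans if s.parent_id is None)


def children_of(all_spans, parent):
    return [s for s in all_spans if s.parent_id == parent.span_id]


root = root_span(spans)
steps = children_of(spans, root)
slowest = max(steps, key=lambda s: s.duration_ms)
print(f"trace took {root.duration_ms} ms, slowest step: {slowest.name}")
```

Finding the slowest child span is essentially what you do by eye in the Jaeger timeline view.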

Jaeger Deployment in Kubernetes

Let’s deploy Jaeger in our Kubernetes cluster.

This YAML configuration sets up a single-instance Jaeger deployment in all-in-one mode within a Kubernetes cluster, suitable for development environments. The deployment uses the jaegertracing/all-in-one image and exposes key ports for telemetry (OTLP gRPC on 4317) and visualization (UI on 16686).

The associated ClusterIP service allows internal communication within the cluster, enabling the OpenTelemetry Collector to send trace data to Jaeger and providing access to the Jaeger UI via port forwarding for trace analysis and visualization.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger
  labels:
    app: jaeger
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jaeger
  template:
    metadata:
      labels:
        app: jaeger
    spec:
      containers:
      - name: jaeger
        image: jaegertracing/all-in-one:latest
        resources:
          requests:
            cpu: "10m"
            memory: "128Mi"
          limits:
            cpu: "20m"
            memory: "256Mi"        
        ports:
        - containerPort: 4317
        - containerPort: 6831
        - containerPort: 16686
        - containerPort: 14250
---
apiVersion: v1
kind: Service
metadata:
  name: jaeger
spec:
  selector:
    app: jaeger
  type: ClusterIP    
  ports:
  - name: ui
    port: 16686
    targetPort: 16686
  - name: grpc
    port: 4317
    targetPort: 4317

Save this configuration as jaeger.yaml and deploy Jaeger with the following command:

kubectl -n observability apply -f jaeger.yaml

You can verify it’s up using:

kubectl -n observability get all -l app=jaeger

The Service is of type ClusterIP, so the Jaeger UI is only reachable from inside the cluster. Use kubectl port-forward to access it locally:

kubectl -n observability port-forward svc/jaeger 16686:16686

Now open http://localhost:16686 in your browser.

Update the OpenTelemetry Collector Pipeline

Now that Jaeger is live, we need to update the OpenTelemetry Collector config to export spans to Jaeger.

# otel-collector-agent-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-agent-config
  namespace: observability
data:
  otel-collector-config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317  # receive traces and metrics from instrumented application
    processors:
      memory_limiter:
        check_interval: 1s
        limit_percentage: 80
        spike_limit_percentage: 15
      batch:
        send_batch_size: 1000
        timeout: 5s
    exporters:
      otlp/jaeger:
        endpoint: "http://jaeger.observability.svc.cluster.local:4317"  # export traces to jaeger
        tls:
          insecure: true
    service:
      pipelines:
        # collect trace data using otlp receiver and send it to jaeger
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp/jaeger]

Apply the updated ConfigMap:

kubectl -n observability apply -f otel-collector-agent-configmap.yaml

Restart the collector so it picks up the new config (it runs as a DaemonSet, as deployed in Part 4):

kubectl -n observability rollout restart daemonset otel-collector-agent

Test the Setup

Get the Endpoint IP from the K8s service:

API_ENDPOINT_IP=$(kubectl -n mlapp get svc -l app=house-price-service -o json | jq -r '.items[].spec.clusterIP')

Send a test request with curl or Postman from a machine that can reach the Service's ClusterIP, such as a cluster node:

curl -X POST "http://${API_ENDPOINT_IP}:80/predict/" \
  -H "Content-Type: application/json" \
  -d '{"features": [1200]}'

View Traces in Jaeger UI

Open Jaeger UI in your browser.

Select the house-price-service
Hit Find Traces
Voilà! You can now trace requests, view span timings, and debug latency in style.

Up Next: From Spans to Stats — Let’s Talk Metrics

Now that Jaeger is live and humming — collecting traces and giving us deep insights into our application’s behavior — it’s time to turn our attention to the second pillar of observability: metrics.

Stay tuned as we wire up Prometheus and bring metrics into the mix, completing another piece of our observability blueprint.

Kartik Dudeja — Mon, 07 Jul 2025 03:13:40 GMT

OpenTelemetry in Action on Kubernetes: Part 4 — Deploying OpenTelemetry Collector (Agent Mode) in Kubernetes

Welcome back, observability artisans! So far in our series:

We trained a simple ML model and wrapped it in a FastAPI app (Part 1).
We instrumented it with OpenTelemetry to emit traces, metrics, and logs (Part 2).
We dockerized and deployed the app in Kubernetes (Part 3).

Now it’s time to build the telemetry pipeline by deploying the OpenTelemetry Collector in agent mode. Think of it as your app’s personal observability sidekick — sitting beside your pod, collecting and forwarding telemetry like a seasoned ops ninja.

What Is the OpenTelemetry Collector?

The OpenTelemetry Collector is a vendor-agnostic service that can receive, process, and export telemetry data (metrics, logs, and traces). It acts like a modular observability router.

In agent mode, it's typically deployed as a DaemonSet, meaning one Collector pod runs on each node, close to the workloads whose telemetry it collects.

The Collector Pipeline — A Three-Stage Flow

The pipeline is made up of:

1. Receivers

These are the Collector's "ears." They listen for telemetry data from your app.
Example: OTLP receiver listens on port 4317 for gRPC telemetry.

Analogy: Like a parcel dropbox at the post office — it accepts incoming packages (telemetry).

2. Processors

Processors act like post-office sorters — they batch, sample, or modify telemetry before export.
Example: Batching to reduce load or adding attributes to spans.

Analogy: Sorting parcels by zip code before shipping.

3. Exporters

Exporters are your delivery trucks. They ship telemetry off to destinations like Prometheus, Jaeger, or Loki.

Analogy: The final delivery van that takes your parcel to your house.
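
The three stages compose like a chain of functions. The toy sketch below is not how the Collector is implemented; it only mirrors the receive, process, export flow with plain Python generators. The span names, the attribute, and the list used as a backend are all invented for illustration.

```python
def otlp_receiver(raw_items):
    """Receiver: accept incoming telemetry items (here, plain dicts)."""
    for item in raw_items:
        yield dict(item)  # copy so later stages never mutate the sender's data


def add_attribute(items, key, value):
    """Processor: attach an attribute to every item before export."""
    for item in items:
        item.setdefault("attributes", {})[key] = value
        yield item


def list_exporter(items, destination):
    """Exporter: deliver items to a destination (a list standing in for a backend)."""
    for item in items:
        destination.append(item)
    return len(destination)


backend = []
incoming = [{"name": "POST /predict/"}, {"name": "GET /health"}]  # made-up spans
exported = list_exporter(
    add_attribute(otlp_receiver(incoming), "k8s.namespace", "mlapp"),
    backend,
)
print(exported, backend)
```

Swapping the exporter while leaving the receiver and processor untouched is exactly the kind of change made to the real pipeline in later parts of this series.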

Configuration in Kubernetes: The ConfigMap

We store our OpenTelemetry pipeline config in a Kubernetes ConfigMap — a way to inject config data into pods as files or environment variables.

Step-by-Step: Deploying Otel Collector (Agent)

We’ll deploy three components:

1. ConfigMap (Collector Pipeline)

# otel-collector-agent-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-agent-config
  namespace: observability
data:
  otel-collector-config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
    processors:
      batch:
    exporters:
      debug:
        verbosity: detailed
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [debug]

This simple pipeline receives traces using OTLP (gRPC), batches them, and prints them to stdout using a debug exporter. We’ll replace this with Jaeger later in Part 5.

Deep Dive: Key Components of the OpenTelemetry Pipeline

Receiver: otlp with grpc Protocol

receivers:
  otlp:
    protocols:
      grpc:

What it does:

The receiver is the entry point into the Collector. In this case, we’re telling the collector to accept data over the OTLP (OpenTelemetry Protocol) using the gRPC transport.

OTLP is the default protocol for OpenTelemetry.
gRPC is a high-performance, open-source RPC framework — it’s fast, efficient, and used widely in modern telemetry systems.

Processor: batch

processors:
  batch:

What it does:

Processors manipulate or enhance telemetry after it’s received but before it’s exported.

The batch processor is highly recommended in most pipelines. It collects telemetry data in small batches and sends them together instead of one at a time. This improves performance and reduces resource usage.

Benefits:

Reduces the number of outgoing requests.
Improves throughput by sending larger payloads.
Helps smooth out traffic spikes.
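
The idea behind batching can be sketched in a few lines. This is a simplified model, not the Collector's implementation: it flushes when send_batch_size items are buffered or when timeout seconds have passed since the first buffered item. Those two names match the batch processor's settings; the small values below are picked so the example is easy to follow.

```python
class Batcher:
    """Flush when send_batch_size items are buffered or the timeout has elapsed."""

    def __init__(self, send_batch_size, timeout, flush):
        self.send_batch_size = send_batch_size
        self.timeout = timeout
        self.flush = flush
        self.buffer = []
        self.first_item_at = None

    def add(self, item, now):
        if not self.buffer:
            self.first_item_at = now
        self.buffer.append(item)
        if len(self.buffer) >= self.send_batch_size:
            self._send()

    def tick(self, now):
        """Call periodically; sends a partial batch once the timeout has elapsed."""
        if self.buffer and now - self.first_item_at >= self.timeout:
            self._send()

    def _send(self):
        self.flush(list(self.buffer))
        self.buffer.clear()
        self.first_item_at = None


sent = []
batcher = Batcher(send_batch_size=3, timeout=5.0, flush=sent.append)
for i in range(4):
    batcher.add(f"span-{i}", now=0.0)  # the third item triggers a size-based flush
batcher.tick(now=2.0)                  # too early, span-3 stays buffered
batcher.tick(now=5.0)                  # timeout reached, the partial batch is sent
print(sent)
```

Four spans leave in two requests instead of four, which is where the reduction in outgoing requests comes from.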

Exporter: debug

exporters:
  debug:
    verbosity: detailed

What it does:

Exporters are responsible for sending telemetry to an external backend (e.g., Jaeger, Prometheus, Datadog).

In this case, we’re using the debug exporter — which doesn’t send data to an external system but prints it to stdout.

verbosity: detailed means it will output detailed telemetry, including span names, attributes, and events.
This is great for local testing or debugging, but not suitable for production.

2. Deployment (DaemonSet — Agent Mode)

# otel-collector-agent-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector-agent
  namespace: observability
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1  # One pod at a time will be unavailable during update
  selector:
    matchLabels:
      app: otel-collector-agent
  template:
    metadata:
      labels:
        app: otel-collector-agent
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
          args: ["--config=/conf/otel-collector-config.yaml"]
          resources:
            requests:
              cpu: 10m
              memory: 32Mi
            limits:
              cpu: 50m
              memory: 128Mi          
          volumeMounts:
            - name: config-volume
              mountPath: /conf
            - name: varlog
              mountPath: /var/log
      volumes:
        - name: config-volume
          configMap:
            name: otel-collector-agent-config
        - name: varlog
          hostPath:
            path: /var/log

3. Service (Internal Communication)

# otel-collector-agent-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: otel-collector-agent
  labels:
    app: otel-collector-agent
spec:
  selector:
    app: otel-collector-agent
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
      protocol: TCP
    - name: otlp-http
      port: 4318
      targetPort: 4318
      protocol: TCP
    - name: prometheus
      port: 8889
      targetPort: 8889
      protocol: TCP    
  type: ClusterIP

Deploying with kubectl

# create a new namespace
kubectl create namespace observability

Deploy the Collector:

kubectl -n observability apply -f otel-collector-agent-configmap.yaml -f otel-collector-agent-service.yaml -f otel-collector-agent-daemonset.yaml

To check the status:

kubectl -n observability get all -l app=otel-collector-agent

Updating the App Deployment

We now need to add the OTLP endpoint to the app as an environment variable.

When you create a Kubernetes Service, it gets a DNS name like this:

<service-name>.<namespace>.svc.cluster.local

So our service name otel-collector-agent in namespace observability is reachable at:

otel-collector-agent.observability.svc.cluster.local:4317

Magic, courtesy of Kubernetes DNS.
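
The naming rule is mechanical enough to write down as code. A small sketch, assuming the default cluster domain cluster.local (some clusters are configured with a different one); the helper names are made up for this example.

```python
def service_dns_name(service, namespace, cluster_domain="cluster.local"):
    """Return the in-cluster DNS name of a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def otlp_endpoint(service, namespace, port=4317, scheme="http"):
    """Return an OTLP endpoint URL for a Service, defaulting to the gRPC port."""
    return f"{scheme}://{service_dns_name(service, namespace)}:{port}"


endpoint = otlp_endpoint("otel-collector-agent", "observability")
print(endpoint)
```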

Adding the OTEL_EXPORTER_OTLP_ENDPOINT environment variable in your application deployment tells the OpenTelemetry SDK where to send telemetry data (traces, metrics, and logs). This line effectively connects your instrumented app to the OpenTelemetry Collector, acting as the central receiver and router for all observability signals within the Kubernetes environment.

env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://otel-collector-agent.observability.svc.cluster.local:4317"
  - name: OTEL_EXPORTER_OTLP_INSECURE
    value: "true"

You’ll insert this under the container spec in your house-price-app.yaml.

After adding the environment variables, apply the updated application Deployment:

kubectl -n mlapp apply -f house-price-app.yaml

You can check the deployment rollout status with the following command:

kubectl -n mlapp rollout status deployment house-price-service

What’s Next?

In Part 5, we’ll deploy Jaeger, a UI-based distributed tracing tool, and rewire our OTEL pipeline to send trace data there. You’ll get to see spans, visualize your API behavior, and debug latency like a real tracing wizard.

Kartik Dudeja — Sat, 28 Jun 2025 13:40:06 GMT

OpenTelemetry in Action on Kubernetes: Part 3 — Deploying the Application on Kubernetes

Deploying Our Instrumented ML App to Kubernetes

Welcome to Part 3! If you’ve followed along so far, by the end of Part 2 you had:

A FastAPI-based machine learning app
Instrumented with OpenTelemetry for full-stack observability
Dockerized and ready to ship

Now, it’s time to bring in the big orchestration guns — Kubernetes.

Understanding Kubernetes Deployment & Service

Before we throw YAML at a cluster, let’s understand what these two crucial building blocks do:

Deployment

A Deployment in Kubernetes manages a set of replicas (identical Pods running our app). It provides:

Declarative updates: You describe what you want, K8s makes it so.
Rolling updates: Smooth upgrades without downtime.
Self-healing: If a Pod dies, K8s spins up a new one.

Think of it as a smart manager for your app’s pods.

Service

A Service exposes your app inside the cluster (or externally, if needed). It:

Provides a stable DNS name.
Load balances traffic between pods.
In our case, exposes:
Port 80 → App port 8000 (FastAPI HTTP)
Port 4317 → OTLP gRPC (Telemetry)

Kubernetes Manifest Breakdown

Let’s break down the configuration:

Deployment: house-price-service

apiVersion: apps/v1
kind: Deployment
metadata:
  name: house-price-service

We declare a Deployment that manages our app.

spec:
  replicas: 2

We want 2 replicas of our app running — high availability for the win.

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 25%
    maxUnavailable: 25%

Kubernetes replaces pods gradually: maxSurge caps how many extra pods may be created during the rollout, and maxUnavailable caps how many may be unavailable at once.
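
Percentages are awkward with only two replicas, so it helps to work out the absolute numbers. Kubernetes rounds maxSurge up and maxUnavailable down when converting percentages to pod counts; the sketch below applies that rule (the function name is made up for this example).

```python
import math


def rollout_bounds(replicas, max_surge_pct, max_unavailable_pct):
    """Convert rolling-update percentages into absolute pod counts."""
    surge = math.ceil(replicas * max_surge_pct / 100)               # rounded up
    unavailable = math.floor(replicas * max_unavailable_pct / 100)  # rounded down
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }


bounds = rollout_bounds(replicas=2, max_surge_pct=25, max_unavailable_pct=25)
print(bounds)
```

With 2 replicas, the rollout may run up to 3 pods at once and never drops below 2 available pods, so each old pod is removed only after its replacement is ready.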

containers:
  - name: app
    image: house-price-predictor:v2

We use the Docker image built in Part 2, deployed as a container.

ports:
  - containerPort: 8000   # App port
  - containerPort: 4317   # OTLP telemetry port

Complete Deployment Manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: house-price-service
  labels:
    app: house-price-service
spec:
  replicas: 2
  revisionHistoryLimit: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%           # Allow 25% more pods than desired during update
      maxUnavailable: 25%     # Allow 25% of desired pods to be unavailable during update
  selector:
    matchLabels:
      app: house-price-service
  template:
    metadata:
      labels:
        app: house-price-service
    spec:
      containers:
        - name: app
          image: house-price-predictor:v2
          imagePullPolicy: IfNotPresent
          resources:
            requests:
              cpu: "10m"
              memory: "128Mi"
            limits:
              cpu: "20m"
              memory: "256Mi"
          ports:
            - containerPort: 8000   # Application Port
            - containerPort: 4317   # OTLP gRPC Port

Service: house-price-service

apiVersion: v1
kind: Service
metadata:
  name: house-price-service
  labels:
    app: house-price-service

This ClusterIP Service lets other K8s workloads communicate with our app.

ports:
  - port: 80
    targetPort: 8000
  - port: 4317
    targetPort: 4317

The Service maps:

Port 80 → App HTTP server
Port 4317 → For OTLP spans, metrics, logs

Complete Service Manifest File:

apiVersion: v1
kind: Service
metadata:
  name: house-price-service
  labels:
    app: house-price-service  
spec:
  selector:
    app: house-price-service
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 8000
    - name: otlp-grpc
      protocol: TCP
      port: 4317
      targetPort: 4317
  type: ClusterIP

Put both manifests in a single file, house-price-app.yaml, separated by a line containing only ---.

Deploying with kubectl

Before deploying the app, let’s create a Kubernetes namespace. This helps group related resources together.

kubectl create namespace mlapp

Run the following to deploy your app:

kubectl -n mlapp apply -f house-price-app.yaml

To check the deployment status:

kubectl -n mlapp get deployments
kubectl -n mlapp get pods

To see pod logs (structured JSON + OpenTelemetry info):

kubectl -n mlapp logs -f -l app=house-price-service

To view the exposed service:

kubectl -n mlapp get svc -l app=house-price-service

Testing the App in Kubernetes

Get the Endpoint IP from the K8s service:

API_ENDPOINT_IP=$(kubectl -n mlapp get svc -l app=house-price-service -o json | jq -r '.items[].spec.clusterIP')

Send a test request with curl or Postman from a machine that can reach the Service's ClusterIP, such as a cluster node:

curl -X POST "http://${API_ENDPOINT_IP}:80/predict/" \
  -H "Content-Type: application/json" \
  -d '{"features": [1200]}'

You should get a prediction response like:

{"predicted_price": 170000.0}

And voilà — telemetry data is flowing.
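
For repeated testing, the same request can be issued from Python. This is a minimal sketch using only the standard library; the function names are made up, and the IP address is a placeholder for whatever ClusterIP your cluster assigned.

```python
import json
import urllib.request


def build_predict_request(host, features, port=80):
    """Build (but do not send) the POST request that the curl command issues."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/predict/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(host, features, timeout=5.0):
    """Send the request and return predicted_price from the JSON response."""
    request = build_predict_request(host, features)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["predicted_price"]


request = build_predict_request("10.96.0.10", [1200])  # placeholder ClusterIP
print(request.get_method(), request.full_url)
```

Calling predict with the real ClusterIP from a host that can reach it returns the predicted price as a float.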

What’s Next: Meet the OpenTelemetry Collector

In Part 4, we’ll introduce the OpenTelemetry Collector Agent:

Deploy it as a DaemonSet alongside your app
Configure it to collect traces, metrics, and logs
Route the data to a gateway, and onward to backends like Prometheus, Jaeger, and Loki

TL;DR: It’s where the real observability magic begins.
