<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Kartik Dudeja on Medium]]></title>
        <description><![CDATA[Stories by Kartik Dudeja on Medium]]></description>
        <link>https://medium.com/@kartikdudeja21?source=rss-14d13224d533------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/0*7xcdJOid6VP4HmJh</url>
            <title>Stories by Kartik Dudeja on Medium</title>
            <link>https://medium.com/@kartikdudeja21?source=rss-14d13224d533------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Wed, 20 May 2026 13:39:27 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@kartikdudeja21/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[FinOps in Kubernetes with OpenCost]]></title>
            <link>https://medium.com/@kartikdudeja21/finops-in-kubernetes-with-opencost-405560414928?source=rss-14d13224d533------2</link>
            <guid isPermaLink="false">https://medium.com/p/405560414928</guid>
            <category><![CDATA[devops]]></category>
            <category><![CDATA[cloud-finops]]></category>
            <category><![CDATA[finops]]></category>
            <category><![CDATA[opencost]]></category>
            <category><![CDATA[k8s]]></category>
            <dc:creator><![CDATA[Kartik Dudeja]]></dc:creator>
            <pubDate>Sun, 19 Oct 2025 05:40:03 GMT</pubDate>
            <atom:updated>2025-10-19T05:40:03.395Z</atom:updated>
            <content:encoded><![CDATA[<p>Kubernetes makes it easy to scale workloads, but it also makes costs… slippery. Pods scale up, nodes scale down (hopefully), and suddenly you get a cloud bill that looks like an unsolved puzzle.</p><p>That’s where <strong>FinOps</strong> (Financial Operations) comes in — a practice of bringing financial accountability to cloud spend. And for Kubernetes clusters, <strong>OpenCost</strong> is one of the best open-source tools to track and optimize your workloads’ cost.</p><p>In this workshop, we’ll set up OpenCost in a Kubernetes cluster, collect cost metrics, and build a Grafana dashboard to visualize them. By the end, you’ll have a working FinOps setup for your K8s workloads.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/812/1*86Nb0sFQZ8ty1Efw7kUX6Q.png" /></figure><h3>FinOps in Kubernetes — Why it Matters</h3><p>FinOps is not just about saving money — it’s about creating <strong>financial visibility, accountability, and optimization</strong> across engineering, operations, and finance.</p><p>In Kubernetes, costs are tricky because:</p><ul><li>Resources are <strong>shared across namespaces, teams, and services</strong>.</li><li>Autoscaling makes spend <strong>dynamic and unpredictable</strong>.</li><li>Cloud bills don’t map neatly to <strong>Kubernetes objects (pods, nodes, namespaces)</strong>.</li></ul><p>FinOps helps bridge that gap by answering questions like:</p><ul><li>How much does each <strong>team/namespace</strong> cost per month?</li><li>Which workloads are <strong>over-provisioned</strong>?</li><li>What is the cost impact of <strong>autoscaling</strong>?</li><li>Can we <strong>charge back</strong> costs to product teams?</li></ul><p>Enter <strong>OpenCost</strong>: the open-source project that measures Kubernetes costs in real-time.</p><h3>Prerequisites</h3><p>Before diving in, make sure you have:</p><ul><li>A running Kubernetes cluster (on-prem or cloud, minikube works too).</li><li>kubectl 
installed and configured.</li><li>helm (Helm 3).</li><li>Prometheus + Grafana installed in your cluster. (If not, you can install using the <strong>Prometheus &amp; Grafana Helm charts</strong>.)</li></ul><h3>Installing OpenCost on Kubernetes</h3><pre>kubectl create namespace opencost<br><br>helm install opencost --repo https://opencost.github.io/opencost-helm-chart opencost \<br>  --namespace opencost</pre><p>This deploys OpenCost as a service inside the cluster.</p><h3>Verifying OpenCost Installation</h3><p>Check if pods are running:</p><pre>kubectl get pods -n opencost</pre><p>You should see something like:</p><pre>opencost-xxxxxxx   1/1   Running   0   2m</pre><h3>Exploring the OpenCost UI</h3><p>OpenCost exposes a <strong>built-in UI</strong> for visualizing cost allocations in real-time.</p><h3>Port-forward the OpenCost UI</h3><p>Run:</p><pre>kubectl port-forward -n opencost svc/opencost 9000:9090</pre><p>Now open your browser:</p><p><a href="http://localhost:9000/">http://localhost:9000</a></p><p>You should see the <strong>OpenCost dashboard</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/0*7PAYxKt9lgCxZmV0.png" /></figure><h3>Integrating OpenCost with Prometheus</h3><p>OpenCost exposes cost metrics in <strong>Prometheus format</strong> at:</p><pre>http://&lt;opencost-service&gt;:9003/metrics</pre><p>Add a <strong>scrape config</strong> so Prometheus pulls OpenCost metrics.</p><p>Edit your Prometheus config (prometheus.yaml or Helm values):</p><pre>scrape_configs:<br>  - job_name: &#39;opencost&#39;<br>    honor_labels: true<br>    static_configs:<br>      - targets: [&#39;opencost.opencost:9003&#39;]</pre><p>Apply the updated config and restart Prometheus.</p><p>Now you can query OpenCost metrics in Prometheus UI.</p><h3>Adding Custom Pricing in OpenCost</h3><p>By default, OpenCost uses <strong>public cloud list prices</strong> (AWS, GCP, Azure) to estimate costs. 
But in real-world FinOps, you often want to use:</p><ul><li><strong>Discounted rates</strong> (Reserved Instances, Savings Plans, enterprise agreements)</li><li><strong>On-prem pricing</strong> (your internal cost per vCPU, GB RAM, GB storage)</li><li><strong>Spot instance prices</strong></li><li><strong>Blended costs across regions</strong></li></ul><p>OpenCost lets you override default prices with a <strong>custom pricing configuration file</strong>.</p><h3>Create a custom pricing config</h3><p>Create a file named custom-pricing.json:</p><pre>{<br>  &quot;description&quot;: &quot;Custom pricing for on-prem Kubernetes cluster&quot;,<br>  &quot;CPU&quot;: &quot;0.02&quot;,<br>  &quot;RAM&quot;: &quot;0.005&quot;,<br>  &quot;GPU&quot;: &quot;0.95&quot;,<br>  &quot;storage&quot;: &quot;0.0002&quot;,<br>  &quot;zoneNetworkEgress&quot;: &quot;0.01&quot;,<br>  &quot;internetNetworkEgress&quot;: &quot;0.12&quot;<br>}</pre><p>Here’s what the fields mean:</p><ul><li><strong>CPU</strong> → price per vCPU per hour (e.g., $0.02/hr)</li><li><strong>RAM</strong> → price per GB RAM per hour</li><li><strong>GPU</strong> → price per GPU per hour</li><li><strong>storage</strong> → price per GB storage per hour</li><li><strong>zoneNetworkEgress / internetNetworkEgress</strong> → price per GB of cross-zone and internet egress traffic</li></ul><h3>Mount custom pricing in the Helm chart</h3><p>When installing/upgrading OpenCost with Helm, mount your custom pricing config:</p><pre>helm upgrade opencost --repo https://opencost.github.io/opencost-helm-chart opencost \<br>  --namespace opencost \<br>  --set opencost.customPricing.enabled=true \<br>  --set-file opencost.customPricing.configMap=custom-pricing.json</pre><p>Before running this, confirm the value names under opencost.customPricing against the values.yaml of the chart version you install.</p><p>Now your cost metrics reflect <strong>your actual business costs</strong>, not just public cloud pricing. 
This is critical for accurate <strong>chargeback/showback</strong> in FinOps.</p><h3>FinOps Best Practices in Kubernetes</h3><p>Now that you have visibility, here’s how to turn data into savings:</p><h3>Showback &amp; Chargeback</h3><ul><li>Attribute costs to <strong>namespaces, teams, or applications</strong>.</li><li>Create monthly dashboards for finance + engineering.</li><li>Use showback (reporting) or chargeback (actual billing).</li></ul><h3>Rightsizing Workloads</h3><ul><li>Identify workloads requesting more CPU/Memory than needed.</li><li>Use OpenCost + Metrics Server to compare requests vs actual usage.</li><li>Tune resource requests/limits to reduce waste.</li></ul><h3>Eliminate Idle Resources</h3><ul><li>Spot unused PVCs, idle nodes, or old namespaces.</li><li>Set policies for automatic cleanup.</li></ul><h3>Use Autoscaling Wisely</h3><ul><li>Scale workloads up/down with HPA/VPA, but track the <strong>cost impact</strong>.</li><li>Sometimes autoscaling saves money, sometimes it spikes spend.</li></ul><h3>Set Alerts</h3><ul><li>Use Prometheus + Alertmanager to notify when spend per namespace crosses thresholds.</li><li>Example: “Alert if namespace cost &gt; $500/day”.</li></ul><h3>Optimize Node Mix</h3><ul><li>Compare workloads to spot if <strong>GPU nodes</strong> or <strong>high-memory nodes</strong> are underutilized.</li><li>Shift to cheaper node pools when possible.</li></ul><h3>Optimize &amp; Automate</h3><p>Once your FinOps dashboards are live, you can take it further:</p><ul><li><strong>Budgets &amp; Forecasting</strong> → align costs with business units.</li><li><strong>Cross-Cluster Costing</strong> → monitor multiple clusters at once.</li><li><strong>Integrations with Cloud Billing</strong> → map OpenCost data to AWS, GCP, or Azure invoices.</li></ul><p>With this, you’re not just observing Kubernetes costs — you’re bringing <strong>accountability and efficiency</strong> to your platform. 
That’s true <strong>FinOps in action</strong>.</p><pre>{<br>    &quot;author&quot;   :  &quot;Kartik Dudeja&quot;,<br>    &quot;email&quot;    :  &quot;kartikdudeja21@gmail.com&quot;,<br>    &quot;linkedin&quot; :  &quot;https://linkedin.com/in/kartik-dudeja&quot;,<br>    &quot;github&quot;   :  &quot;https://github.com/Kartikdudeja&quot;<br>}</pre>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[LLM Observability with OpenTelemetry: A Practical Guide]]></title>
            <link>https://medium.com/@kartikdudeja21/llm-observability-with-opentelemetry-a-practical-guide-18f3f51d6a50?source=rss-14d13224d533------2</link>
            <guid isPermaLink="false">https://medium.com/p/18f3f51d6a50</guid>
            <category><![CDATA[llm]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[opentelemetry]]></category>
            <category><![CDATA[observability]]></category>
            <dc:creator><![CDATA[Kartik Dudeja]]></dc:creator>
            <pubDate>Sat, 27 Sep 2025 06:46:57 GMT</pubDate>
            <atom:updated>2025-09-27T06:46:57.220Z</atom:updated>
            <content:encoded><![CDATA[<p>Large Language Models (LLMs) have quickly become the backbone of many modern applications — from chatbots to Retrieval-Augmented Generation (RAG) systems. But here’s the challenge: these models often behave like <strong>black boxes</strong>.</p><p>Without observability, we’re left guessing:</p><ul><li>Why did the model respond that way?</li><li>Which prompt caused this hallucination?</li><li>How much are we spending on tokens?</li><li>What’s the latency impact of retrieval vs. generation?</li></ul><p>This is where <strong>OpenTelemetry (OTel)</strong> steps in. By instrumenting our LLM applications, we can capture <strong>traces, metrics, </strong>and<strong> logs</strong> — turning the black box into a glass box.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zGlRIxf0QRmfiSDYg0lCfQ.png" /></figure><h3>Core Observability Signals for LLMs</h3><p>When instrumenting an LLM app, we focus on:</p><ol><li><strong>Request Traces</strong></li></ol><ul><li>Span for <strong>retrieval</strong> (with metadata: source, number of documents, latency).</li><li>Span for <strong>LLM inference</strong> (with metadata: model name, temperature, prompt, response length).</li></ul><p>2. <strong>Metrics</strong></p><ul><li><strong>Request Volume</strong>: Counter of incoming user queries.</li><li><strong>Request Duration</strong>: Histogram for latency distribution.</li><li><strong>Token Counters</strong>: Number of tokens generated/consumed.</li><li><strong>Cost</strong>: Gauge or counter for estimated token cost.</li></ul><p>3. 
<strong>Logs</strong></p><ul><li>Structured logs that capture <strong>prompts, responses, and errors</strong>.</li><li>Correlated with traces via trace IDs.</li></ul><h3>Tech Stack</h3><ul><li><strong>LLM runtime</strong>: <a href="https://ollama.ai/">Ollama</a> (local inference of Mistral)</li><li><strong>Framework</strong>: <a href="https://www.langchain.com/">LangChain</a></li><li><strong>Vector DB</strong>: Chroma</li><li><strong>Observability</strong>: OpenTelemetry Python SDK</li><li><strong>Backends</strong>: Jaeger (traces), Prometheus (metrics), Loki (logs)</li></ul><h3>Instrumenting a RAG Application</h3><p>Let’s consider a simple <strong>RAG pipeline</strong>:</p><ul><li>Use a retriever to fetch relevant documents.</li><li>Build a prompt.</li><li>Send it to the LLM (e.g., via <a href="https://ollama.ai/">Ollama</a>).</li><li>Return the answer.</li></ul><p>With OpenTelemetry, we wrap each stage in <strong>spans</strong>, collect <strong>metrics</strong>, and emit <strong>logs</strong>.</p><h3>Prerequisites: Building a RAG Application</h3><p>Before we dive into instrumentation, you’ll need a working <strong>RAG (Retrieval-Augmented Generation) application</strong>.</p><p>If you don’t have one yet, follow this step-by-step tutorial first:</p><p><a href="https://www.youtube.com/watch?v=E4l91XKQSgw"><strong>Building a Simple RAG Application with Ollama and LangChain</strong></a><br><em>(this guide walks through setting up embeddings, a vector store, and a basic question-answering loop)</em></p><p>Once you have your RAG pipeline up and running, come back here — we’ll add <strong>observability</strong> so you can monitor and debug it like a pro.</p><h3>Observability in Action</h3><p>Let’s break down how Traces, Metrics, and Logs bring observability to an LLM-powered RAG application.</p><h3>Traces: Following the Flow</h3><p><strong>Why Traces Matter</strong><br>Traces help you follow a user query as it flows through your RAG pipeline — retrieval, prompt 
building, LLM generation. They provide visibility into <strong>where time is spent</strong> and <strong>what inputs/outputs influenced the result</strong>.</p><h4>Instrumentation Code for Traces</h4><p>We’ll wire up OpenTelemetry to:</p><ol><li>Export traces to Jaeger (via OTLP).</li></ol><p>2. Create spans for:</p><ul><li>The <strong>user question</strong> loop.</li><li>The <strong>retriever call</strong>.</li><li>The <strong>LLM call</strong>.</li></ul><p>3. Add <strong>semantic attributes</strong> about:</p><ul><li>Service metadata (service.name, service.version, app.environment, etc.).</li><li>SDK details (telemetry.sdk.language, telemetry.sdk.name).</li><li>Query text, model name, retriever results count, etc.</li></ul><pre>import time<br>import json<br><br># --- OpenTelemetry Tracing ---<br>from opentelemetry import trace<br>from opentelemetry.sdk.trace import TracerProvider<br>from opentelemetry.sdk.resources import Resource<br>from opentelemetry.sdk.trace.export import BatchSpanProcessor<br>from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter<br><br># Define resource attributes (metadata about the service)<br>resource = Resource.create({<br>    &quot;service.name&quot;: &quot;faq-rag&quot;,<br>    &quot;service.version&quot;: &quot;1.0.0&quot;,<br>    &quot;app.environment&quot;: &quot;dev&quot;,<br>    &quot;app.owner&quot;: &quot;observability-team&quot;,<br>    &quot;telemetry.sdk.language&quot;: &quot;python&quot;,<br>    &quot;telemetry.sdk.name&quot;: &quot;opentelemetry&quot;<br>})<br><br># --- Configure Tracing ---<br>trace.set_tracer_provider(TracerProvider(resource=resource))<br>tracer = trace.get_tracer(__name__)<br><br># Configure OTLP exporter (sending traces to Jaeger/Collector)<br>otlp_trace_exporter = OTLPSpanExporter(endpoint=&quot;http://127.0.0.1:4317&quot;, insecure=True)<br>span_processor = BatchSpanProcessor(otlp_trace_exporter)<br>trace.get_tracer_provider().add_span_processor(span_processor)<br><br># --- LangChain + Ollama 
---<br>from langchain_ollama.llms import OllamaLLM<br>from langchain_core.prompts import ChatPromptTemplate<br>from vector import retriever<br><br># Initialize the Ollama model<br>model = OllamaLLM(<br>    model=&quot;mistral&quot;,<br>    temperature=0.7,<br>    top_p=0.9<br>)<br><br># Define the prompt template<br>template = &quot;&quot;&quot;<br>You are an expert in answering questions about a pizza restaurant.<br><br>Here are some relevant reviews: {reviews}<br><br>Here is the question to answer: {question}<br>&quot;&quot;&quot;<br><br>prompt = ChatPromptTemplate.from_template(template)<br><br># Build pipeline<br>chain = prompt | model<br><br># --- Interactive Loop ---<br>while True:<br><br>    question = input(&quot;Ask your question (q to quit): &quot;)<br><br>    if question.lower() == &quot;q&quot;:<br>        break<br><br>    start_request = time.time()<br><br>    with tracer.start_as_current_span(&quot;rag-request&quot;) as span:<br><br>        span.set_attribute(&quot;rag.query&quot;, question)<br><br>        # --- Retrieval step ---<br>        with tracer.start_as_current_span(&quot;vector-retrieval&quot;) as retrieval_span:<br>            start_retrieval = time.time()<br>            reviews = retriever.invoke(question)<br>            retrieval_time = time.time() - start_retrieval<br>            retrieval_span.set_attribute(&quot;retriever.engine&quot;, &quot;chroma&quot;)<br>            retrieval_span.set_attribute(&quot;retriever.search.k&quot;, 5)<br>            retrieval_span.set_attribute(&quot;retriever.latency.ms&quot;, retrieval_time * 1000)<br>            retrieval_span.set_attribute(&quot;retriever.documents.count&quot;, len(reviews))<br><br>            doc_previews = [<br>                (doc.page_content[:80] + &quot;...&quot;) if len(doc.page_content) &gt; 80 else doc.page_content<br>                for doc in reviews<br><br>            ]<br>            retrieval_span.set_attribute(&quot;retriever.documents.preview&quot;, 
json.dumps(doc_previews))<br><br>        # --- LLM Call ---<br>        formatted_prompt = prompt.format_prompt(<br>            reviews=reviews,<br>            question=question<br>        ).to_string()<br><br>        # --- LLM step ---<br>        with tracer.start_as_current_span(&quot;llm-call&quot;) as llm_span:<br><br>            llm_span.set_attribute(&quot;llm.provider&quot;, &quot;ollama&quot;)<br>            llm_span.set_attribute(&quot;llm.model.name&quot;, &quot;mistral&quot;)<br>            llm_span.set_attribute(&quot;llm.request.temperature&quot;, getattr(model, &quot;temperature&quot;, None))<br>            llm_span.set_attribute(&quot;llm.request.top_p&quot;, getattr(model, &quot;top_p&quot;, None))<br>            llm_span.set_attribute(&quot;llm.prompt.details&quot;, formatted_prompt)<br><br>            start_llm = time.time()<br><br>            result = chain.invoke({<br>                &quot;reviews&quot;: reviews,<br>                &quot;question&quot;: question<br>            })<br><br>            llm_latency = time.time() - start_llm<br>            tokens_in = len(formatted_prompt.split())<br>            tokens_out = len(str(result).split())<br>            cost_estimate = (tokens_in + tokens_out) * 0.000001  # fake cost<br><br>            # Response metadata<br>            llm_span.set_attribute(&quot;llm.response.details&quot;, str(result))<br>            llm_span.set_attribute(&quot;llm.response.tokens.input&quot;, tokens_in)<br>            llm_span.set_attribute(&quot;llm.response.tokens.output&quot;, tokens_out)<br>            llm_span.set_attribute(&quot;llm.response.tokens.total&quot;, tokens_in + tokens_out)<br>            llm_span.set_attribute(&quot;llm.response.cost.usd_estimate&quot;, cost_estimate)<br>            llm_span.set_attribute(&quot;llm.latency.ms&quot;, llm_latency * 1000)<br><br>        span.set_attribute(&quot;rag.answer.preview&quot;, str(result)[:120])<br><br>    print(f&quot;\n{result}&quot;)<br>    print(80 * 
&quot;-&quot;)</pre><h3>Logs: Capturing the Details</h3><p><strong>Why Logs Matter</strong><br>Logs give you the <strong>raw evidence</strong> of what happened inside your LLM pipeline — including prompts, responses, and errors. Unlike traces (timing) and metrics (aggregates), logs capture <strong>content and context</strong>.</p><h4>JSON Logging for Better ETL</h4><p>Instead of plain text logs, it’s best to use <strong>structured JSON logs</strong>:</p><ul><li>Easy to parse with tools like <strong>Loki</strong>, <strong>Elasticsearch</strong>, or any ETL pipeline.</li><li>Enables filtering and aggregation on fields (trace_id, span_id, user_id, etc.).</li><li>Standardized format across services.</li></ul><p>With JSON, your observability backend can:</p><ul><li>Extract fields for <strong>ETL pipelines</strong> (e.g., export tokens + cost for billing).</li><li>Enable <strong>structured search</strong> (e.g., “find all requests with doc_count &lt; 2”).</li><li>Power <strong>dashboards</strong> that combine logs + metrics.</li></ul><h4>Correlating Logs and Traces</h4><p>To connect logs with traces:</p><ul><li>Include <strong>trace_id</strong> and <strong>span_id</strong> in every log line.</li><li>Use the <strong>current span context</strong> from OpenTelemetry.</li></ul><p><strong>Code (Python with OTel)</strong></p><pre>import time<br>import json<br><br># --- OpenTelemetry Tracing ---<br>from opentelemetry import trace, metrics<br>from opentelemetry.sdk.trace import TracerProvider<br>from opentelemetry.sdk.resources import Resource<br>from opentelemetry.sdk.trace.export import BatchSpanProcessor<br>from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter<br><br># --- OpenTelemetry Logging ---<br>import logging<br>from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler<br>from opentelemetry.sdk._logs.export import BatchLogRecordProcessor<br>from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter<br><br># 
Define resource attributes (metadata about the service)<br>resource = Resource.create({<br>    &quot;service.name&quot;: &quot;faq-rag&quot;,<br>    &quot;service.version&quot;: &quot;1.0.0&quot;,<br>    &quot;app.environment&quot;: &quot;dev&quot;,<br>    &quot;app.owner&quot;: &quot;observability-team&quot;,<br>    &quot;telemetry.sdk.language&quot;: &quot;python&quot;,<br>    &quot;telemetry.sdk.name&quot;: &quot;opentelemetry&quot;<br>})<br><br># --- Configure Tracing ---<br>trace.set_tracer_provider(TracerProvider(resource=resource))<br>tracer = trace.get_tracer(__name__)<br><br># Configure OTLP exporter (sending traces to Jaeger/Collector)<br>otlp_trace_exporter = OTLPSpanExporter(endpoint=&quot;http://127.0.0.1:4317&quot;, insecure=True)<br>span_processor = BatchSpanProcessor(otlp_trace_exporter)<br>trace.get_tracer_provider().add_span_processor(span_processor)<br><br># Setup logger provider<br>logger_provider = LoggerProvider(resource=resource)<br>log_exporter = OTLPLogExporter(endpoint=&quot;http://127.0.0.1:4317&quot;, insecure=True)<br>logger_provider.add_log_record_processor(BatchLogRecordProcessor(log_exporter))<br><br># Custom JSON Formatter<br>class JSONFormatter(logging.Formatter):<br>    # Attribute names every LogRecord carries; anything else was passed via extra=<br>    _STANDARD_ATTRS = set(vars(logging.makeLogRecord({}))) | {&quot;message&quot;, &quot;asctime&quot;}<br><br>    def format(self, record):<br>        span = trace.get_current_span()<br>        span_context = span.get_span_context()<br>        log_record = {<br>            &quot;timestamp&quot;: self.formatTime(record, self.datefmt),<br>            &quot;severity&quot;: record.levelname,<br>            &quot;logger&quot;: record.name,<br>            &quot;message&quot;: record.getMessage(),<br>            # Hex-encode the IDs so they match the form Jaeger displays<br>            &quot;trace_id&quot;: format(span_context.trace_id, &quot;032x&quot;) if span_context.is_valid else None,<br>            &quot;span_id&quot;: format(span_context.span_id, &quot;016x&quot;) if span_context.is_valid else None<br>        }<br><br>        # Add extra attributes if available<br>        if hasattr(record, &quot;args&quot;) and isinstance(record.args, dict):<br>            log_record.update(record.args)<br>        # logging stores extra={...} fields as attributes on the record itself<br>        for key, value in vars(record).items():<br>            if key not in self._STANDARD_ATTRS:<br>                log_record[key] = value<br><br>        return json.dumps(log_record, default=str)<br><br># Attach JSON formatter to OTel handler<br>otel_handler = LoggingHandler(level=logging.INFO, logger_provider=logger_provider)<br>otel_handler.setFormatter(JSONFormatter())<br><br>logging.basicConfig(level=logging.INFO, handlers=[otel_handler])<br>logger = logging.getLogger(&quot;faq-rag&quot;)<br><br># --- LangChain + Ollama ---<br>from langchain_ollama.llms import OllamaLLM<br>from langchain_core.prompts import ChatPromptTemplate<br>from vector import retriever<br><br># Initialize the Ollama model<br>model = OllamaLLM(<br>    model=&quot;mistral&quot;,<br>    temperature=0.7,<br>    top_p=0.9<br>)<br><br># Define the prompt template<br><br>template = &quot;&quot;&quot;<br>You are an expert in answering questions about a pizza restaurant.<br><br>Here are some relevant reviews: {reviews}<br><br>Here is the question to answer: {question}<br>&quot;&quot;&quot;<br><br>prompt = ChatPromptTemplate.from_template(template)<br><br># Build pipeline<br>chain = prompt | model<br><br># --- Interactive Loop ---<br>while True:<br><br>    question = input(&quot;Ask your question (q to quit): &quot;)<br><br>    if question.lower() == &quot;q&quot;:<br>        break<br><br>    logger.info(&quot;Received user query&quot;, extra={&quot;query&quot;: question})<br><br>    start_request = time.time()<br><br>    with tracer.start_as_current_span(&quot;rag-request&quot;) as span:<br><br>        span.set_attribute(&quot;rag.query&quot;, question)<br><br>        # --- Retrieval step ---<br>        with tracer.start_as_current_span(&quot;vector-retrieval&quot;) as retrieval_span:<br>            start_retrieval = time.time()<br>            reviews = retriever.invoke(question)<br>            retrieval_time = time.time() - start_retrieval<br><br>            logger.info(&quot;Retrieved 
documents&quot;, extra={<br>                &quot;query&quot;: question,<br>                &quot;retriever.latency_ms&quot;: retrieval_time * 1000,<br>                &quot;retriever.documents.count&quot;: len(reviews),<br>            })<br><br>            retrieval_span.set_attribute(&quot;retriever.engine&quot;, &quot;chroma&quot;)<br>            retrieval_span.set_attribute(&quot;retriever.search.k&quot;, 5)<br>            retrieval_span.set_attribute(&quot;retriever.latency.ms&quot;, retrieval_time * 1000)<br>            retrieval_span.set_attribute(&quot;retriever.documents.count&quot;, len(reviews))<br><br>            doc_previews = [<br>                (doc.page_content[:80] + &quot;...&quot;) if len(doc.page_content) &gt; 80 else doc.page_content<br>                for doc in reviews<br>            ]<br>            retrieval_span.set_attribute(&quot;retriever.documents.preview&quot;, json.dumps(doc_previews))<br><br>        # --- LLM Call ---<br>        formatted_prompt = prompt.format_prompt(<br>            reviews=reviews,<br>            question=question<br>        ).to_string()<br><br>        # --- LLM step ---<br>        with tracer.start_as_current_span(&quot;llm-call&quot;) as llm_span:<br>            llm_span.set_attribute(&quot;llm.provider&quot;, &quot;ollama&quot;)<br>            llm_span.set_attribute(&quot;llm.model.name&quot;, &quot;mistral&quot;)<br>            llm_span.set_attribute(&quot;llm.request.temperature&quot;, getattr(model, &quot;temperature&quot;, None))<br>            llm_span.set_attribute(&quot;llm.request.top_p&quot;, getattr(model, &quot;top_p&quot;, None))<br>            llm_span.set_attribute(&quot;llm.prompt.details&quot;, formatted_prompt)<br><br>            logger.info(&quot;Invoking LLM&quot;, extra={<br>                &quot;model&quot;: &quot;mistral&quot;,<br>                &quot;temperature&quot;: getattr(model, &quot;temperature&quot;, None),<br>                &quot;top_p&quot;: getattr(model, &quot;top_p&quot;, 
None),<br>                &quot;prompt_preview&quot;: formatted_prompt[:120],<br>            })<br><br>            start_llm = time.time()<br><br>            result = chain.invoke({<br>                &quot;reviews&quot;: reviews,<br>                &quot;question&quot;: question<br>            })<br><br>            llm_latency = time.time() - start_llm<br>            tokens_in = len(formatted_prompt.split())<br>            tokens_out = len(str(result).split())<br>            cost_estimate = (tokens_in + tokens_out) * 0.000001  # fake cost<br><br>            logger.info(&quot;LLM response generated&quot;, extra={<br>                &quot;latency_ms&quot;: llm_latency * 1000,<br>                &quot;tokens_in&quot;: tokens_in,<br>                &quot;tokens_out&quot;: tokens_out,<br>                &quot;cost_estimate&quot;: cost_estimate,<br>                &quot;answer_preview&quot;: str(result)[:120]<br>            })<br><br>            # Response metadata<br>            llm_span.set_attribute(&quot;llm.response.details&quot;, str(result))<br>            llm_span.set_attribute(&quot;llm.response.tokens.input&quot;, tokens_in)<br>            llm_span.set_attribute(&quot;llm.response.tokens.output&quot;, tokens_out)<br>            llm_span.set_attribute(&quot;llm.response.tokens.total&quot;, tokens_in + tokens_out)<br>            llm_span.set_attribute(&quot;llm.response.cost.usd_estimate&quot;, cost_estimate)<br><br>            llm_span.set_attribute(&quot;llm.latency.ms&quot;, llm_latency * 1000)<br><br>        span.set_attribute(&quot;rag.answer.preview&quot;, str(result)[:120])<br><br>    print(f&quot;\n{result}&quot;)<br>    print(80 * &quot;-&quot;)</pre><p><strong>How correlation helps</strong></p><ul><li>From a <strong>trace in Jaeger</strong>, you can jump to the corresponding <strong>logs in Loki</strong> by filtering on trace_id.</li><li>From a <strong>log line</strong>, you can pivot back to the full <strong>trace</strong> to see the request 
lifecycle.</li><li>This links <strong>detailed per-event records</strong> (logs) to the <strong>end-to-end request context</strong> (traces).</li></ul><h3>Metrics: Measuring What Matters</h3><p><strong>Why Metrics Matter</strong><br>Metrics provide <strong>aggregated, time-series insights</strong> into your system. While traces help debug individual requests and logs capture raw details, metrics allow you to <strong>monitor trends</strong> (e.g., request rates, latency, cost over time).</p><p>For an LLM RAG pipeline, the key metrics are:</p><h4>What Metrics to Collect</h4><ol><li><strong>Request Volume</strong></li></ol><ul><li>Counts how many requests hit the RAG pipeline.</li><li>Helps detect traffic spikes, drops, or usage trends.</li></ul><p>2. <strong>Request Duration</strong></p><ul><li>Measures latency of end-to-end RAG queries.</li><li>Useful for SLO/SLI dashboards and user experience monitoring.</li></ul><p>3. <strong>Token Usage</strong></p><ul><li>Tracks input (prompt) and output (completion) tokens.</li><li>Shows how efficient prompts are and how token usage correlates with cost.</li></ul><p>4. 
<strong>Cost Estimation</strong></p><ul><li>Approximates $$ cost based on tokens and model pricing.</li><li>Useful for <strong>FinOps</strong> and controlling LLM usage bills.</li></ul><h4>OTel Metrics Instrumentation</h4><pre>import time<br>import json<br><br># --- OpenTelemetry Tracing ---<br>from opentelemetry import trace, metrics<br>from opentelemetry.sdk.trace import TracerProvider<br>from opentelemetry.sdk.resources import Resource<br>from opentelemetry.sdk.trace.export import BatchSpanProcessor<br>from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter<br><br># --- OpenTelemetry Metrics ---<br>from opentelemetry.sdk.metrics import MeterProvider<br>from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter<br>from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader<br><br># --- OpenTelemetry Logging ---<br>import logging<br>from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler<br>from opentelemetry.sdk._logs.export import BatchLogRecordProcessor<br>from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter<br><br># Define resource attributes (metadata about the service)<br>resource = Resource.create({<br>    &quot;service.name&quot;: &quot;faq-rag&quot;,<br>    &quot;service.version&quot;: &quot;1.0.0&quot;,<br>    &quot;app.environment&quot;: &quot;dev&quot;,<br>    &quot;app.owner&quot;: &quot;observability-team&quot;,<br>    &quot;telemetry.sdk.language&quot;: &quot;python&quot;,<br>    &quot;telemetry.sdk.name&quot;: &quot;opentelemetry&quot;<br>})<br><br># --- Configure Tracing ---<br>trace.set_tracer_provider(TracerProvider(resource=resource))<br>tracer = trace.get_tracer(__name__)<br><br># Configure OTLP exporter (sending traces to Jaeger/Collector)<br>otlp_trace_exporter = OTLPSpanExporter(endpoint=&quot;http://127.0.0.1:4317&quot;, insecure=True)<br>span_processor = 
BatchSpanProcessor(otlp_trace_exporter)<br>trace.get_tracer_provider().add_span_processor(span_processor)<br><br># --- Configure Metrics ---<br>metric_exporter = OTLPMetricExporter(endpoint=&quot;http://127.0.0.1:4317&quot;, insecure=True)<br>reader = PeriodicExportingMetricReader(metric_exporter, export_interval_millis=5000)<br><br>provider = MeterProvider(resource=resource, metric_readers=[reader])<br>metrics.set_meter_provider(provider)<br>meter = metrics.get_meter(__name__)<br><br># Define custom metrics<br>request_counter = meter.create_counter(<br>    &quot;rag_requests_total&quot;,<br>    unit=&quot;1&quot;,<br>    description=&quot;Total number of RAG requests&quot;<br>)<br><br>request_duration_hist = meter.create_histogram(<br>    &quot;rag_request_duration_ms&quot;,<br>    unit=&quot;ms&quot;,<br>    description=&quot;Duration of RAG requests in milliseconds&quot;<br>)<br><br>token_input_counter = meter.create_counter(<br>    &quot;rag_tokens_input_total&quot;,<br>    unit=&quot;tokens&quot;,<br>    description=&quot;Total input tokens sent to LLM&quot;<br>)<br><br>token_output_counter = meter.create_counter(<br>    &quot;rag_tokens_output_total&quot;,<br>    unit=&quot;tokens&quot;,<br>    description=&quot;Total output tokens generated by LLM&quot;<br>)<br><br>token_total_counter = meter.create_counter(<br>    &quot;rag_tokens_total&quot;,<br>    unit=&quot;tokens&quot;,<br>    description=&quot;Total tokens (input + output)&quot;<br>)<br><br>cost_counter = meter.create_counter(<br>    &quot;rag_cost_usd_total&quot;,<br>    unit=&quot;usd&quot;,<br>    description=&quot;Estimated total cost of LLM requests&quot;<br>)<br><br># Setup logger provider<br>logger_provider = LoggerProvider(resource=resource)<br>log_exporter = OTLPLogExporter(endpoint=&quot;http://127.0.0.1:4317&quot;, insecure=True)<br>logger_provider.add_log_record_processor(BatchLogRecordProcessor(log_exporter))<br><br># Custom JSON Formatter<br>class JSONFormatter(logging.Formatter):<br>    
def format(self, record):<br><br>        span = trace.get_current_span()<br>        span_context = span.get_span_context()<br><br>        log_record = {<br>            &quot;timestamp&quot;: self.formatTime(record, self.datefmt),<br>            &quot;severity&quot;: record.levelname,<br>            &quot;logger&quot;: record.name,<br>            &quot;message&quot;: record.getMessage(),<br>            &quot;trace_id&quot;: span_context.trace_id if span_context.is_valid else None,<br>            &quot;span_id&quot;: span_context.span_id if span_context.is_valid else None            <br>        }<br>        # Add extra attributes if available<br>        if hasattr(record, &quot;args&quot;) and isinstance(record.args, dict):<br>            log_record.update(record.args)<br>        if hasattr(record, &quot;extra&quot;) and isinstance(record.extra, dict):<br>            log_record.update(record.extra)<br><br>        return json.dumps(log_record)<br><br># Attach JSON formatter to OTel handler<br>otel_handler = LoggingHandler(level=logging.INFO, logger_provider=logger_provider)<br>otel_handler.setFormatter(JSONFormatter())<br><br>logging.basicConfig(level=logging.INFO, handlers=[otel_handler])<br>logger = logging.getLogger(&quot;faq-rag&quot;)<br><br># --- LangChain + Ollama ---<br>from langchain_ollama.llms import OllamaLLM<br>from langchain_core.prompts import ChatPromptTemplate<br>from vector import retriever<br><br># Initialize the Ollama model<br>model = OllamaLLM(<br>    model=&quot;mistral&quot;,<br>    temperature=0.7,<br>    top_p=0.9<br>)<br><br># Define the prompt template<br>template = &quot;&quot;&quot;<br>You are an expert in answering questions about a pizza restaurant.<br><br>Here are some relevant reviews: {reviews}<br><br>Here is the question to answer: {question}<br>&quot;&quot;&quot;<br><br>prompt = ChatPromptTemplate.from_template(template)<br><br># Build pipeline<br>chain = prompt | model<br><br># --- Interactive Loop ---<br>while True:<br>    
question = input(&quot;Ask your question (q to quit): &quot;)<br><br>    if question.lower() == &quot;q&quot;:<br>        break<br><br>    logger.info(&quot;Received user query&quot;, extra={&quot;query&quot;: question})<br><br>    start_request = time.time()<br>    with tracer.start_as_current_span(&quot;rag-request&quot;) as span:<br>        span.set_attribute(&quot;rag.query&quot;, question)<br><br>        # --- Retrieval step ---<br>        with tracer.start_as_current_span(&quot;vector-retrieval&quot;) as retrieval_span:<br>            start_retrieval = time.time()<br>            reviews = retriever.invoke(question)<br>            retrieval_time = time.time() - start_retrieval<br><br>            logger.info(&quot;Retrieved documents&quot;, extra={<br>                &quot;query&quot;: question,<br>                &quot;retriever.latency_ms&quot;: retrieval_time * 1000,<br>                &quot;retriever.documents.count&quot;: len(reviews),<br>            })<br><br>            retrieval_span.set_attribute(&quot;retriever.engine&quot;, &quot;chroma&quot;)<br>            retrieval_span.set_attribute(&quot;retriever.search.k&quot;, 5)<br>            retrieval_span.set_attribute(&quot;retriever.latency.ms&quot;, retrieval_time * 1000)<br>            retrieval_span.set_attribute(&quot;retriever.documents.count&quot;, len(reviews))<br><br>            doc_previews = [<br>                (doc.page_content[:80] + &quot;...&quot;) if len(doc.page_content) &gt; 80 else doc.page_content<br>                for doc in reviews<br>            ]<br>            <br>            retrieval_span.set_attribute(&quot;retriever.documents.preview&quot;, json.dumps(doc_previews))<br><br>        # --- LLM Call ---<br>        formatted_prompt = prompt.format_prompt(<br>            reviews=reviews,<br>            question=question<br>        ).to_string()<br><br>        # --- LLM step ---<br>        with tracer.start_as_current_span(&quot;llm-call&quot;) as llm_span:<br><br>            
llm_span.set_attribute(&quot;llm.provider&quot;, &quot;ollama&quot;)<br>            llm_span.set_attribute(&quot;llm.model.name&quot;, &quot;mistral&quot;)<br>            llm_span.set_attribute(&quot;llm.request.temperature&quot;, getattr(model, &quot;temperature&quot;, None))<br>            llm_span.set_attribute(&quot;llm.request.top_p&quot;, getattr(model, &quot;top_p&quot;, None))<br>            llm_span.set_attribute(&quot;llm.prompt.details&quot;, formatted_prompt)<br><br>            logger.info(&quot;Invoking LLM&quot;, extra={<br>                &quot;model&quot;: &quot;mistral&quot;,<br>                &quot;temperature&quot;: getattr(model, &quot;temperature&quot;, None),<br>                &quot;top_p&quot;: getattr(model, &quot;top_p&quot;, None),<br>                &quot;prompt_preview&quot;: formatted_prompt[:120],<br>            })<br><br>            start_llm = time.time()<br>            result = chain.invoke({<br>                &quot;reviews&quot;: reviews,<br>                &quot;question&quot;: question<br>            })<br>            llm_latency = time.time() - start_llm<br><br>            tokens_in = len(formatted_prompt.split())<br>            tokens_out = len(str(result).split())<br>            cost_estimate = (tokens_in + tokens_out) * 0.000001  # fake cost<br><br>            logger.info(&quot;LLM response generated&quot;, extra={<br>                &quot;latency_ms&quot;: llm_latency * 1000,<br>                &quot;tokens_in&quot;: tokens_in,<br>                &quot;tokens_out&quot;: tokens_out,<br>                &quot;cost_estimate&quot;: cost_estimate,<br>                &quot;answer_preview&quot;: str(result)[:120]<br>            })<br><br>            # Response metadata<br>            llm_span.set_attribute(&quot;llm.response.details&quot;, str(result))<br>            llm_span.set_attribute(&quot;llm.response.tokens.input&quot;, tokens_in)<br>            llm_span.set_attribute(&quot;llm.response.tokens.output&quot;, 
tokens_out)<br>            llm_span.set_attribute(&quot;llm.response.tokens.total&quot;, tokens_in + tokens_out)<br>            llm_span.set_attribute(&quot;llm.response.cost.usd_estimate&quot;, cost_estimate)<br>            llm_span.set_attribute(&quot;llm.latency.ms&quot;, llm_latency * 1000)<br><br>            # --- Emit Metrics ---<br>            request_counter.add(1, {&quot;rag.model&quot;: &quot;mistral&quot;})<br>            request_duration_hist.record((time.time() - start_request) * 1000, {&quot;rag.model&quot;: &quot;mistral&quot;})<br>            token_input_counter.add(tokens_in, {&quot;rag.model&quot;: &quot;mistral&quot;})<br>            token_output_counter.add(tokens_out, {&quot;rag.model&quot;: &quot;mistral&quot;})<br>            token_total_counter.add(tokens_in + tokens_out, {&quot;rag.model&quot;: &quot;mistral&quot;})<br>            cost_counter.add(cost_estimate, {&quot;rag.model&quot;: &quot;mistral&quot;})<br><br>        span.set_attribute(&quot;rag.answer.preview&quot;, str(result)[:120])<br><br>    print(f&quot;\n{result}&quot;)<br>    print(80 * &quot;-&quot;)</pre><h3>Setting up the Observability Stack with Docker Compose</h3><p>All telemetry (traces, metrics, logs) from your RAG app will flow into the <strong>OpenTelemetry Collector</strong> first, then get routed to the right backend:</p><ul><li><strong>Traces → Jaeger</strong></li><li><strong>Metrics → Prometheus</strong></li><li><strong>Logs → Loki</strong></li><li><strong>Visualization → Grafana</strong></li></ul><h3>docker-compose.yml</h3><pre>services:<br><br>  otel-collector:<br>    container_name: otel-collector<br>    hostname: otel-collector<br>    image: otel/opentelemetry-collector-contrib:latest<br>    restart: always<br>    command: [&quot;--config=/etc/otel-collector-config.yaml&quot;]<br>    volumes:<br>      - ./config/otel-collector/otel-collector-config.yaml:/etc/otel-collector-config.yaml<br>    networks:<br>      - llm-obs-lab<br>    ports:<br>      - 
&quot;4317:4317&quot; # OTLP gRPC receiver<br>      - &quot;4318:4318&quot; # OTLP HTTP receiver<br><br>  jaeger:<br>    container_name: jaeger<br>    hostname: jaeger<br>    image: jaegertracing/all-in-one:latest<br>    restart: always<br>    volumes:<br>      - jaeger_data:/var/lib/jaeger<br>    networks:<br>      - llm-obs-lab<br>    ports:<br>      - &quot;6831:6831/udp&quot; # UDP port for Jaeger agent<br>      - &quot;16686:16686&quot; # Web UI<br>      - &quot;14268:14268&quot; # HTTP port for spans<br><br>  prometheus:<br>    container_name: prometheus<br>    hostname: prometheus<br>    image: prom/prometheus:latest<br>    restart: always<br>    command:<br>      - --storage.tsdb.retention.time=1d<br>      - --config.file=/etc/prometheus/prometheus.yml<br>    volumes:<br>      - prometheus_data:/prometheus<br>      - ./config/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml<br>    networks:<br>      - llm-obs-lab<br>    ports:<br>      - &quot;9090:9090&quot;<br><br>  grafana:<br>    container_name: grafana<br>    hostname: grafana<br>    image: grafana/grafana<br>    restart: always<br>    volumes:<br>      - grafana_data:/var/lib/grafana<br>      - &quot;./config/grafana/datasources:/etc/grafana/provisioning/datasources&quot;<br>    networks:<br>      - llm-obs-lab<br>    ports:<br>      - &quot;3000:3000&quot;<br><br>  loki:<br>    container_name: loki<br>    hostname: loki<br>    image: grafana/loki:latest<br>    restart: always<br>    command:<br>      - -config.file=/etc/loki/local-config.yaml<br>    volumes:<br>      - loki_data:/loki<br>      - &quot;./config/loki/loki-config.yaml:/etc/loki/local-config.yaml&quot;<br>    networks:<br>      - llm-obs-lab<br>    ports:<br>      - &quot;3100:3100&quot;<br><br>networks:<br>  llm-obs-lab:<br>    driver: bridge<br><br>volumes:<br>  loki_data: {}<br>  jaeger_data: {}<br>  grafana_data: {}<br>  prometheus_data: {}</pre><h3>otel-collector-config.yaml</h3><pre>receivers:<br>  otlp:<br>    protocols:<br>      
grpc:<br>        endpoint: 0.0.0.0:4317<br>      http:<br>        endpoint: 0.0.0.0:4318<br><br>processors:<br>  batch: {}<br><br>extensions:<br>  health_check: {}<br><br>exporters:<br><br>  otlp/jaeger:<br>    endpoint: jaeger:4317<br>    tls:<br>      insecure: true<br><br>  prometheus:<br>    endpoint: &quot;0.0.0.0:9090&quot;<br><br>  otlphttp:<br>    endpoint: http://loki:3100/otlp<br><br>service:<br>  pipelines:<br><br>    traces:<br>      receivers: [otlp]<br>      processors: [batch]      <br>      exporters: [otlp/jaeger]<br><br>    logs:<br>      receivers: [otlp]<br>      processors: [batch]<br>      exporters: [otlphttp]<br><br>    metrics:<br>      receivers: [otlp]<br>      processors: [batch]<br>      exporters: [prometheus]</pre><h3>prometheus.yml</h3><pre>global:<br>  scrape_interval: 5s<br>scrape_configs:<br>  - job_name: &#39;otel-collector&#39;<br>    static_configs:<br>      - targets: [&#39;otel-collector:9090&#39;]</pre><h3>loki-config.yaml</h3><pre>auth_enabled: false<br><br>server:<br>  http_listen_port: 3100<br>  grpc_listen_port: 9096<br><br>common:<br>  instance_addr: 127.0.0.1<br>  path_prefix: /tmp/loki<br>  storage:<br>    filesystem:<br>      chunks_directory: /tmp/loki/chunks<br>      rules_directory: /tmp/loki/rules<br>  replication_factor: 1<br>  ring:<br>    kvstore:<br>      store: inmemory<br><br>frontend:<br>  max_outstanding_per_tenant: 2048<br><br>pattern_ingester:<br>  enabled: true<br><br>limits_config:<br>  max_global_streams_per_user: 0<br>  ingestion_rate_mb: 50000<br>  ingestion_burst_size_mb: 50000<br>  volume_enabled: true<br><br>query_range:<br>  results_cache:<br>    cache:<br>      embedded_cache:<br>        enabled: true<br>        max_size_mb: 100<br><br>schema_config:<br>  configs:<br>    - from: 2020-10-24<br>      store: tsdb<br>      object_store: filesystem<br>      schema: v13<br>      index:<br>        prefix: index_<br>        period: 24h<br><br>analytics:<br>  reporting_enabled: false</pre><h3>Grafana 
datasources.yaml</h3><pre>apiVersion: 1<br><br>datasources:<br>  - name: Prometheus<br>    type: prometheus<br>    url: http://prometheus:9090<br>    access: proxy<br>    basicAuth: false<br>    isDefault: true<br>    jsonData:<br>      tlsSkipVerify: true<br>    editable: false<br><br>  - name: Loki<br>    type: loki<br>    access: proxy<br>    url: http://loki:3100<br>    isDefault: false<br>    version: 1<br>    editable: false<br><br>  - name: Jaeger<br>    type: jaeger<br>    access: proxy<br>    url: http://jaeger:16686<br>    version: 1<br>    editable: false</pre><h3>Bring Up the Stack</h3><pre>docker-compose up -d</pre><ul><li><strong>Jaeger UI</strong> → <a href="http://localhost:16686/">http://localhost:16686</a></li><li><strong>Prometheus UI</strong> → <a href="http://localhost:9090/">http://localhost:9090</a></li><li><strong>Grafana UI</strong> → <a href="http://localhost:3000/">http://localhost:3000</a> (user: admin, pass: admin)</li></ul><h3>How Data Flows</h3><p>Your <strong>RAG app</strong> exports telemetry via OTLP (4317 gRPC, 4318 HTTP). The <strong>OTel Collector</strong> ingests all telemetry, applies batching, and routes it:</p><ul><li><strong>Traces → Jaeger</strong></li><li><strong>Metrics → Prometheus (scraped at /metrics)</strong></li><li><strong>Logs → Loki</strong></li><li><strong>Grafana</strong> connects to all three for a unified view.</li></ul><p>With this setup, you now have <strong>end-to-end observability for your RAG application</strong>:</p><ul><li>Debug request flow in <strong>Jaeger</strong></li><li>Track system health with <strong>Prometheus</strong></li><li>Investigate application logs in <strong>Loki</strong></li><li>Combine all of the above in <strong>Grafana dashboards</strong></li></ul><h3>How This Improves Observability</h3><ul><li><strong>Latency analysis</strong>: Traces show whether slow responses are due to retrieval or LLM generation.</li><li><strong>Cost tracking</strong>: Token counts let you 
estimate $ spend directly from traces.</li><li><strong>Debugging hallucinations</strong>: Seeing prompts + responses helps you identify if poor answers came from bad retrieval or bad generation.</li><li><strong>Model governance</strong>: Attributes like model, temperature, top_p let you correlate behavior with configuration.</li></ul><pre>{<br>    &quot;author&quot;   :  &quot;Kartik Dudeja&quot;,<br>    &quot;email&quot;    :  &quot;kartikdudeja21@gmail.com&quot;,<br>    &quot;linkedin&quot; :  &quot;https://linkedin.com/in/kartik-dudeja&quot;,<br>    &quot;github&quot;   :  &quot;https://github.com/Kartikdudeja&quot;<br>}</pre>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Going Serverless on Kubernetes with OpenFaaS]]></title>
            <link>https://medium.com/@kartikdudeja21/going-serverless-on-kubernetes-with-openfaas-1c2e18473468?source=rss-14d13224d533------2</link>
            <guid isPermaLink="false">https://medium.com/p/1c2e18473468</guid>
            <category><![CDATA[function-as-a-service]]></category>
            <category><![CDATA[serverless]]></category>
            <category><![CDATA[openfaas]]></category>
            <category><![CDATA[kubernetes]]></category>
            <dc:creator><![CDATA[Kartik Dudeja]]></dc:creator>
            <pubDate>Sat, 30 Aug 2025 05:54:08 GMT</pubDate>
            <atom:updated>2025-08-30T09:15:56.009Z</atom:updated>
            <content:encoded><![CDATA[<blockquote><strong><em>Build, ship, and scale functions — on your own Kubernetes cluster.</em></strong></blockquote><figure><img alt="" src="https://cdn-images-1.medium.com/max/922/1*NZNr5fYws6pOtg6xehqnyQ.png" /></figure><h3>1. What is Serverless?</h3><p><strong>Serverless</strong> allows developers to write and deploy code without worrying about the underlying infrastructure. The server still exists — you just don’t manage it.</p><p>Instead of provisioning and scaling servers, you:</p><ul><li>Write a function</li><li>Deploy it</li><li>Let the platform handle the rest (scaling, routing, etc.)</li></ul><blockquote><em>Serverless is about </em><strong><em>developer experience</em></strong><em>, </em><strong><em>efficiency</em></strong><em>, and </em><strong><em>auto-scaling</em></strong><em> — perfect for microservices, APIs, and background tasks.</em></blockquote><h3>2. What is OpenFaaS?</h3><p><strong>OpenFaaS</strong> (Functions-as-a-Service) is an open-source serverless framework built for Kubernetes and Docker.</p><h3>Key Features:</h3><ul><li>Deploy serverless functions in containers</li><li>CLI, UI, and REST API support</li><li>Built-in Prometheus metrics</li><li>Auto-scaling via function invocation count</li><li>Supports multiple runtimes (Python, Node.js, Go, Bash, etc.)</li></ul><h3>Serverless on Kubernetes with OpenFaaS:</h3><p>OpenFaaS runs as a set of Kubernetes components:</p><ul><li><strong>Gateway</strong>: Exposes functions over HTTP</li><li><strong>Function Pods</strong>: Each function is a container</li><li><strong>Prometheus</strong>: Scrapes function invocation metrics</li><li><strong>Autoscaler</strong>: Adds/removes replicas based on load</li></ul><h3>3. 
Installing OpenFaaS on Kubernetes with Arkade</h3><p><a href="https://github.com/alexellis/arkade">Arkade</a> is a simple Kubernetes marketplace for installing apps.</p><h3>Prerequisites</h3><ul><li>Kubernetes cluster (e.g., minikube, kind, k3s)</li><li>kubectl installed</li><li>arkade installed:</li></ul><pre>curl -sLS https://get.arkade.dev | sudo sh</pre><h3>Install OpenFaaS:</h3><pre>arkade install openfaas</pre><p>It will:</p><ul><li>Create openfaas and openfaas-fn namespaces</li><li>Deploy the gateway, faas-netes, UI, and Prometheus</li></ul><h3>4. Accessing the OpenFaaS UI</h3><h3>Get admin password:</h3><pre>PASSWORD=$(kubectl get secret -n openfaas basic-auth \<br>-o jsonpath=&quot;{.data.basic-auth}&quot; | base64 --decode)<br>echo $PASSWORD</pre><h3>Port-forward the gateway:</h3><pre>kubectl port-forward -n openfaas svc/gateway 8080:8080</pre><p>Visit: <a href="http://localhost:8080/">http://localhost:8080</a></p><p>Login with:</p><ul><li><strong>Username</strong>: admin</li><li><strong>Password</strong>: from the command above</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*UJatzaVx7lyGqlN1CPpCag.png" /></figure><h3>5. 
Creating a Sample Python Function (python3-http)</h3><p>We’ll use the python3-http template, which supports GET/POST with JSON or plain-text input.</p><h3>Step 1: Install the OpenFaaS CLI</h3><pre>curl -sSL https://cli.openfaas.com | sudo sh</pre><p>Log in via the CLI:</p><pre>faas-cli login --username admin --password $PASSWORD</pre><h4>Pull the python3-http template from the OpenFaaS template store</h4><pre>faas-cli template store pull python3-http</pre><h3>Step 2: Create the function</h3><pre>faas-cli new openfaas-py-fn --lang python3-http</pre><p>Edit openfaas-py-fn/handler.py:</p><pre>def handle(event, context):<br>    name = event.body.decode(&#39;utf-8&#39;) or &quot;World&quot;<br>    return {<br>        &quot;statusCode&quot;: 200,<br>        &quot;body&quot;: f&quot;Hello, {name}&quot;,<br>        &quot;headers&quot;: {<br>            &quot;Content-Type&quot;: &quot;text/plain&quot;<br>        }<br>    }</pre><h3>Step 3: Update the stack.yaml file</h3><pre>version: 1.0<br>provider:<br>  name: openfaas<br>  gateway: http://127.0.0.1:8080<br>functions:<br>  openfaas-py-fn:<br>    lang: python3-http<br>    handler: ./openfaas-py-fn<br>    image: &lt;DOCKERHUB_USERNAME&gt;/openfaas-py-fn:1.1</pre><h3>Step 4: Build and Deploy</h3><pre>faas-cli build -f stack.yaml<br>faas-cli push -f stack.yaml<br>faas-cli deploy -f stack.yaml</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/862/1*DXVzv3uZT8Sdt9R8jcq2tw.png" /></figure><h3>6. Accessing the Function via cURL</h3><pre>curl -X POST http://localhost:8080/function/openfaas-py-fn -d &quot;Testing&quot;</pre><p>Or use the <strong>UI</strong>: click “Invoke” beside the function.</p><h3>7. 
Configuring Auto-Scaling</h3><p>The OpenFaaS autoscaler monitors Prometheus metrics and scales functions automatically.</p><p>To customize the replica limits, add these labels under the function’s entry in stack.yaml:</p><pre>labels:<br>  com.openfaas.scale.min: &quot;1&quot;<br>  com.openfaas.scale.max: &quot;5&quot;</pre><p>Redeploy:</p><pre>faas-cli deploy -f stack.yaml</pre><p>Now, your function can scale up to 5 replicas during high load.</p><h3>8. Prometheus Metrics and Grafana Dashboard</h3><p>OpenFaaS installs <strong>Prometheus</strong> by default.</p><h3>Access Prometheus:</h3><pre>kubectl port-forward -n openfaas svc/prometheus 9090:9090</pre><p>Visit <a href="http://localhost:9090/">http://localhost:9090</a></p><p>Sample queries:</p><ul><li>gateway_function_invocation_total</li><li>gateway_function_invocation_duration_seconds</li></ul><h3>Grafana for Function Monitoring</h3><p>You can install Grafana via Arkade:</p><pre>arkade install grafana</pre><p>Then port-forward it:</p><pre>kubectl port-forward -n default svc/grafana 3000:3000</pre><p>Log in (default: admin/admin) and add Prometheus as a data source to create dashboards for OpenFaaS function metrics.</p><h3>9. 
Load Testing with hey and Testing Auto-Scaling</h3><p><a href="https://github.com/rakyll/hey">hey</a> is a lightweight load-testing tool.</p><h3>Install hey:</h3><pre>go install github.com/rakyll/hey@latest</pre><h3>Test Load:</h3><pre>hey -z 10s -c 1 -m POST -d &quot;Load Testing&quot; http://127.0.0.1:8080/function/openfaas-py-fn</pre><p>Flags:</p><ul><li>-z 10s: Run for 10 seconds</li><li>-c 1: 1 concurrent user</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rlUpr_d-UjAaxAAoL7wqSA.png" /></figure><h3>Observe Auto-Scaling</h3><p>Check current function replicas:</p><pre>kubectl get deploy -n openfaas-fn openfaas-py-fn</pre><p>It should scale up under heavy load.</p><p>You can also monitor this behavior live via the <strong>OpenFaaS UI</strong> or <strong>Prometheus</strong>.</p><h3>Final Thoughts</h3><p><strong>OpenFaaS + Kubernetes</strong> brings the best of both worlds:</p><ul><li>The flexibility and portability of containers</li><li>The simplicity and scalability of serverless</li></ul><p>With a single CLI and a UI, OpenFaaS makes it fun to deploy and manage functions without giving up observability, control, or performance.</p><pre>{<br>    &quot;author&quot;   :  &quot;Kartik Dudeja&quot;,<br>    &quot;email&quot;    :  &quot;kartikdudeja21@gmail.com&quot;,<br>    &quot;linkedin&quot; :  &quot;https://linkedin.com/in/kartik-dudeja&quot;,<br>    &quot;github&quot;   :  &quot;https://github.com/Kartikdudeja&quot;<br>}</pre>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[OpenTelemetry in Action on Kubernetes: Part 9 — Cluster-Level Observability with OpenTelemetry…]]></title>
            <link>https://medium.com/@kartikdudeja21/opentelemetry-in-action-on-kubernetes-part-9-cluster-level-observability-with-opentelemetry-2116cd014e47?source=rss-14d13224d533------2</link>
            <guid isPermaLink="false">https://medium.com/p/2116cd014e47</guid>
            <category><![CDATA[opentelemetry]]></category>
            <category><![CDATA[observability]]></category>
            <category><![CDATA[kubernetes]]></category>
            <dc:creator><![CDATA[Kartik Dudeja]]></dc:creator>
            <pubDate>Fri, 15 Aug 2025 06:10:14 GMT</pubDate>
            <atom:updated>2025-08-30T09:05:30.006Z</atom:updated>
            <content:encoded><![CDATA[<h3>OpenTelemetry in Action on Kubernetes: Part 9 — Cluster-Level Observability with OpenTelemetry Agent + Gateway</h3><p>Welcome to the grand finale of our observability series! So far, we’ve added visibility into our <strong>application</strong> through <strong>logs</strong>, <strong>metrics</strong>, and <strong>traces</strong> — all flowing beautifully into <strong>Grafana</strong> via <strong>OpenTelemetry Collector</strong>.</p><p>But there’s still one big puzzle piece left: <strong>the Kubernetes cluster itself</strong>.</p><p>In this final part, we’ll:</p><ul><li>Collect host and node-level metrics using hostmetrics</li><li>Deploy a <strong>centralized Collector in Deployment mode (gateway)</strong></li><li>Introduce ServiceAccount for permissions</li><li>Collect Kubernetes control plane metrics using k8s_cluster</li><li>Use the debug exporter to troubleshoot data pipelines</li><li>And finally, conclude the series with a high-level recap</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/962/1*9NkJmDXB8KkZz4DUu3Bv8Q.png" /></figure><h3>Why Cluster-Level Observability Matters</h3><p>While we’ve focused on application telemetry so far, it’s just one piece of the puzzle. For full visibility, we must also observe the <strong>Kubernetes cluster</strong> itself — the infrastructure running our apps.</p><p>Cluster observability helps us:</p><ul><li>Monitor <strong>node health and resource usage</strong></li><li>Track <strong>control plane performance</strong> (API server, scheduler, etc.)</li><li>Understand <strong>pod scheduling and evictions</strong></li><li>Improve <strong>scaling decisions</strong></li><li>Troubleshoot <strong>infrastructure-level issues</strong></li><li>Strengthen <strong>security and governance</strong></li></ul><p>In short, without visibility into the cluster, you’re flying blind. 
This part of the series ensures you’re watching <strong>not just the app, but the platform beneath it.</strong></p><h3>Add hostmetrics Receiver in the Agent</h3><p>We’ll start by updating our <strong>otel-collector-agent</strong> (running as DaemonSet) to use the hostmetrics receiver. This receiver scrapes system-level metrics from each node, such as CPU, memory, disk, filesystem, and load.</p><p><strong>Config — </strong><strong>otel-collector-agent-configmap.yaml</strong></p><pre>receivers:<br>  hostmetrics:<br>    collection_interval: 1m<br>    scrapers:<br>      cpu: {}<br>      memory: {}<br>      disk: {}<br>      load: {}<br>      filesystem: {}<br>      network: {}<br>      system: {}<br>processors:<br>  memory_limiter:<br>    check_interval: 1s<br>    limit_percentage: 80<br>    spike_limit_percentage: 15<br>  batch:<br>    send_batch_size: 1000<br>    timeout: 5s<br>exporters:<br>  prometheus:<br>    endpoint: &quot;0.0.0.0:8889&quot;<br>    enable_open_metrics: true<br>    resource_to_telemetry_conversion:<br>      enabled: true<br>service:<br>  pipelines:<br>    # collect metrics from otlp and hostmetrics receiver and expose in prometheus compatible format<br>    metrics:<br>      receivers: [otlp, hostmetrics]<br>      processors: [memory_limiter, batch]<br>      exporters: [prometheus]</pre><blockquote><em>Each </em><em>hostmetrics receiver runs inside the agent pod on every node, giving us </em><strong><em>node-specific insights</em></strong><em>.</em></blockquote><h3>Deploy the OpenTelemetry Gateway</h3><h3>1. Why Deployment Mode?</h3><ul><li><strong>Deployment Mode</strong> is used for centralized collection, aggregation, and export of telemetry data.</li><li>Unlike the <strong>DaemonSet agent</strong>, which runs on each node, a <strong>Deployment</strong> collector can scrape and process cluster-wide metrics.</li></ul><h3>2. 
Create a ServiceAccount, ClusterRole, and ClusterRoleBinding</h3><p>To use the k8s_cluster receiver, the collector must have permission to access Kubernetes objects like nodes, pods, namespaces, etc.</p><h4>What is a ServiceAccount in Kubernetes?</h4><p>A <strong>ServiceAccount</strong> in Kubernetes is an identity used by pods to authenticate and interact securely with the Kubernetes API. While every pod gets a default ServiceAccount, you often need to create custom ones with specific <strong>RBAC (Role-Based Access Control)</strong> permissions for security and least privilege.</p><p>In our case, the OpenTelemetry Collector needs to <strong>read cluster state</strong> — like nodes, pods, and namespaces — to collect metrics using the k8s_cluster receiver. So, we create a dedicated ServiceAccount and bind it to a <strong>ClusterRole</strong> with read-only access to those resources. This ensures our collector can operate properly without over-privileging it.</p><pre># otel-collector-gateway-serviceaccount.yaml<br>apiVersion: v1<br>kind: ServiceAccount<br>metadata:<br>  name: otel-collector-gateway-sa<br>  namespace: observability<br>  labels:<br>    app: otel-collector-gateway  <br>---<br>apiVersion: rbac.authorization.k8s.io/v1<br>kind: ClusterRole<br>metadata:<br>  name: otel-collector-gateway-role<br>  labels:<br>    app: otel-collector-gateway<br>rules:<br>- apiGroups:<br>  - &quot;&quot;<br>  resources:<br>  - events<br>  - namespaces<br>  - namespaces/status<br>  - nodes<br>  - nodes/spec<br>  - pods<br>  - pods/status<br>  - replicationcontrollers<br>  - replicationcontrollers/status<br>  - resourcequotas<br>  - services<br>  verbs:<br>  - get<br>  - list<br>  - watch<br>- apiGroups:<br>  - apps<br>  resources:<br>  - daemonsets<br>  - deployments<br>  - replicasets<br>  - statefulsets<br>  verbs:<br>  - get<br>  - list<br>  - watch<br>- apiGroups:<br>  - extensions<br>  resources:<br>  - daemonsets<br>  - deployments<br>  - replicasets<br>  verbs:<br>  - 
get<br>  - list<br>  - watch<br>- apiGroups:<br>  - batch<br>  resources:<br>  - jobs<br>  - cronjobs<br>  verbs:<br>  - get<br>  - list<br>  - watch<br>- apiGroups:<br>    - autoscaling<br>  resources:<br>    - horizontalpodautoscalers<br>  verbs:<br>    - get<br>    - list<br>    - watch<br>---<br>apiVersion: rbac.authorization.k8s.io/v1<br>kind: ClusterRoleBinding<br>metadata:<br>  name: otel-collector-gateway-binding<br>  labels:<br>    app: otel-collector-gateway<br>subjects:<br>  - kind: ServiceAccount<br>    name: otel-collector-gateway-sa<br>    namespace: observability<br>roleRef:<br>  kind: ClusterRole<br>  name: otel-collector-gateway-role<br>  apiGroup: rbac.authorization.k8s.io</pre><p>Apply it:</p><pre>kubectl -n observability apply -f otel-collector-gateway-serviceaccount.yaml</pre><h3>3. OpenTelemetry Collector Config with k8s_cluster Receiver</h3><p>Create the config file as a ConfigMap.</p><pre># otel-collector-gateway-configmap.yaml<br>apiVersion: v1<br>kind: ConfigMap<br>metadata:<br>  name: otel-collector-gateway-config<br>  namespace: observability<br>  labels:<br>    app: otel-collector-gateway<br>data:<br>  otel-collector-config.yaml: |<br>    receivers:<br>      k8s_cluster:<br>        auth_type: &quot;serviceAccount&quot;<br>        collection_interval: 30s<br>    processors:<br>      memory_limiter:<br>        check_interval: 1s<br>        limit_percentage: 80<br>        spike_limit_percentage: 15<br>      batch:<br>        send_batch_size: 1000<br>        timeout: 5s<br>    exporters:<br>      debug:<br>        verbosity: detailed<br>      prometheus:<br>        endpoint: &quot;0.0.0.0:8889&quot;<br>        enable_open_metrics: true<br>        resource_to_telemetry_conversion:<br>          enabled: true<br>    service:<br>      pipelines:<br>        metrics:<br>          receivers: [k8s_cluster]<br>          processors: [memory_limiter, batch]<br>          exporters: [prometheus]</pre><p>Apply it:</p><pre>kubectl -n observability apply 
-f otel-collector-gateway-configmap.yaml</pre><h3>4. Deploy the OpenTelemetry Collector</h3><pre># otel-collector-gateway-deployment.yaml<br>apiVersion: apps/v1<br>kind: Deployment<br>metadata:<br>  name: otel-collector-gateway<br>  namespace: observability<br>  labels:<br>    app: otel-collector-gateway  <br>spec:<br>  replicas: 1<br>  revisionHistoryLimit: 3<br>  strategy:<br>    type: RollingUpdate<br>    rollingUpdate:<br>      maxSurge: 25%           # Allow 25% more pods than desired during update<br>      maxUnavailable: 25%     # Allow 25% of desired pods to be unavailable during update<br>  selector:<br>    matchLabels:<br>      app: otel-collector-gateway<br>  template:<br>    metadata:<br>      labels:<br>        app: otel-collector-gateway<br>    spec:<br>      serviceAccountName: otel-collector-gateway-sa<br>      containers:<br>        - name: otel-collector<br>          image: otel/opentelemetry-collector-contrib:latest<br>          args: [&quot;--config=/conf/otel-collector-config.yaml&quot;]<br>          volumeMounts:<br>            - name: config-volume<br>              mountPath: /conf<br>          resources:<br>            requests:<br>              cpu: 10m<br>              memory: 32Mi<br>            limits:<br>              cpu: 50m<br>              memory: 128Mi<br>      volumes:<br>        - name: config-volume<br>          configMap:<br>            name: otel-collector-gateway-config</pre><p>Apply it</p><pre>kubectl -n observability apply -f otel-collector-gateway-deployment.yaml</pre><h3>5. 
Expose Collector to Prometheus</h3><pre># otel-collector-gateway-service.yaml<br>apiVersion: v1<br>kind: Service<br>metadata:<br>  name: otel-collector-gateway<br>  namespace: observability<br>  labels:<br>    app: otel-collector-gateway<br>spec:<br>  selector:<br>    app: otel-collector-gateway<br>  ports:<br>    - name: otlp-grpc<br>      port: 4317<br>      targetPort: 4317<br>      protocol: TCP<br>    - name: otlp-http<br>      port: 4318<br>      targetPort: 4318<br>      protocol: TCP<br>    - name: prometheus<br>      port: 8889<br>      targetPort: 8889<br>      protocol: TCP    <br>  type: ClusterIP</pre><p>Apply:</p><pre>kubectl -n observability apply -f otel-collector-gateway-service.yaml</pre><p>Then add this to your Prometheus scrape_configs:</p><pre>- job_name: &#39;otel-collector-gateway&#39;<br>  static_configs:<br>    - targets: [&#39;otel-collector-gateway.observability.svc.cluster.local:8889&#39;]</pre><h3>Test and Verify</h3><p>Check deployment status:</p><pre>kubectl -n observability get all -l app=otel-collector-gateway</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Tk8OFA1gUGec6Q1ybhCqfA.png" /></figure><h3>Special Mention: Debug Exporter — Your Observability Wingman</h3><p>The <strong>debug exporter</strong> in OpenTelemetry Collector is a lightweight and incredibly helpful tool for developers and DevOps engineers when building or troubleshooting telemetry pipelines.</p><p>Instead of exporting telemetry data (like logs, metrics, and traces) to a backend system like Prometheus or Jaeger, the debug exporter simply <strong>prints the data to the Collector’s stdout</strong>. 
This means:</p><ul><li>You can <strong>see exactly what telemetry data is being received and processed</strong>, live in the logs.</li><li>It helps <strong>validate instrumentation quickly</strong>, without setting up full observability backends.</li><li>It’s especially useful when you’re <strong>testing new receivers, processors, or pipelines</strong>, and want a quick look at the output.</li></ul><h3>When to Use</h3><ul><li><strong>Local testing or dev environments</strong>.</li><li><strong>Debugging broken data flow</strong>, when Prometheus or Jaeger isn’t showing what you expect.</li><li><strong>Learning how OpenTelemetry transforms and routes telemetry data.</strong></li></ul><h3>Example Configuration Snippet</h3><pre>exporters:<br>  debug:<br>    verbosity: detailed  # outputs full content of each signal</pre><p>Then, reference it in your pipeline like this:</p><pre>service:<br>  pipelines:<br>    traces:<br>      receivers: [otlp]<br>      processors: [batch]<br>      exporters: [otlp/jaeger, debug]</pre><p>This ensures traces are sent to Jaeger <strong>and also printed</strong> to the console, which is great for double-checking what’s going in. The pipeline references the otlp/jaeger exporter configured in Part 6, because the standalone jaeger exporter is no longer shipped in recent Collector releases.</p><h3>Conclusion: You Now Have Full Observability!</h3><p>Over the past 9 parts, you’ve:</p><ul><li>Containerized a real ML application</li><li>Instrumented it with OpenTelemetry</li><li>Collected traces, logs, and metrics</li><li>Deployed observability tools in Kubernetes</li><li>Visualized everything in Grafana</li><li>Monitored the entire Kubernetes cluster with Agent + Gateway mode</li></ul><p>You’ve essentially built a <strong>production-grade observability platform</strong> from scratch, <strong>without cloud vendor lock-in</strong>.</p><h3>Missed the previous article?</h3><p>Check out <a href="https://medium.com/@kartikdudeja21/opentelemetry-in-action-on-kubernetes-part-8-visualize-everything-building-a-unified-255450edd4d4"><strong>Part 8: Visualize Everything, Building a Unified Observability Dashboard with 
Grafana</strong></a> to see how we got here.</p><pre>{<br>    &quot;author&quot;   :  &quot;Kartik Dudeja&quot;,<br>    &quot;email&quot;    :  &quot;kartikdudeja21@gmail.com&quot;,<br>    &quot;linkedin&quot; :  &quot;https://linkedin.com/in/kartik-dudeja&quot;,<br>    &quot;github&quot;   :  &quot;https://github.com/Kartikdudeja&quot;<br>}</pre>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[OpenTelemetry in Action on Kubernetes: Part 8 — Visualize Everything, Building a Unified…]]></title>
            <link>https://medium.com/@kartikdudeja21/opentelemetry-in-action-on-kubernetes-part-8-visualize-everything-building-a-unified-255450edd4d4?source=rss-14d13224d533------2</link>
            <guid isPermaLink="false">https://medium.com/p/255450edd4d4</guid>
            <category><![CDATA[kubernetes]]></category>
            <category><![CDATA[opentelemetry]]></category>
            <category><![CDATA[observability]]></category>
            <dc:creator><![CDATA[Kartik Dudeja]]></dc:creator>
            <pubDate>Sat, 02 Aug 2025 11:20:17 GMT</pubDate>
            <atom:updated>2025-08-30T09:02:37.537Z</atom:updated>
            <content:encoded><![CDATA[<h3>OpenTelemetry in Action on Kubernetes: Part 8 — Visualize Everything, Building a Unified Observability Dashboard with Grafana</h3><h3>Why Visualization Matters</h3><p>Telemetry data — logs, metrics, and traces — gives you deep insights into your system’s behavior. But let’s be honest: staring at JSON traces or YAML logs isn’t exactly thrilling.</p><p><strong>That’s where visualization comes in.</strong></p><p>A good dashboard:</p><ul><li>Gives instant visibility into system health</li><li>Helps correlate metrics, logs, and traces</li><li>Makes debugging, alerting, and capacity planning effortless</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/962/1*9NkJmDXB8KkZz4DUu3Bv8Q.png" /></figure><h3>Meet Grafana: The Observatory for Observability</h3><p><strong>Grafana</strong> is an open-source analytics and visualization platform designed to work with various telemetry backends — including:</p><ul><li><strong>Prometheus</strong> (for metrics)</li><li><strong>Loki</strong> (for logs)</li><li><strong>Jaeger</strong> (for traces)</li></ul><p>Grafana is:</p><ul><li>Pluggable</li><li>Real-time</li><li>Customizable</li></ul><p>It turns raw observability data into <strong>actionable dashboards</strong>.</p><h3>Deploying Grafana in Kubernetes</h3><p>We’ll deploy Grafana with a basic Deployment and Service. 
You can customize it with a persistent volume or admin credentials if needed.</p><pre>apiVersion: apps/v1<br>kind: Deployment<br>metadata:<br>  name: grafana<br>  labels:<br>    app: grafana<br>spec:<br>  replicas: 1<br>  selector:<br>    matchLabels:<br>      app: grafana<br>  template:<br>    metadata:<br>      labels:<br>        app: grafana<br>    spec:<br>      containers:<br>        - name: grafana<br>          image: grafana/grafana:10.3.1<br>          resources:<br>            requests:<br>              cpu: &quot;10m&quot;<br>              memory: &quot;56Mi&quot;<br>            limits:<br>              cpu: &quot;20m&quot;<br>              memory: &quot;128Mi&quot;<br>          ports:<br>            - containerPort: 3000<br>          volumeMounts:<br>            - name: grafana-storage<br>              mountPath: /var/lib/grafana<br>          env:<br>            - name: GF_SECURITY_ADMIN_USER<br>              value: &quot;admin&quot;<br>            - name: GF_SECURITY_ADMIN_PASSWORD<br>              value: &quot;admin&quot;<br>      volumes:<br>        - name: grafana-storage<br>          emptyDir: {}  # Replace with PersistentVolumeClaim for persistence<br><br>---<br>apiVersion: v1<br>kind: Service<br>metadata:<br>  name: grafana<br>  labels:<br>    app: grafana<br>spec:<br>  selector:<br>    app: grafana<br>  ports:<br>    - protocol: TCP<br>      port: 3000<br>      targetPort: 3000<br>  type: ClusterIP</pre><h3>Deploy Grafana</h3><pre># Apply deployment and service files<br>kubectl -n observability apply -f grafana.yaml<br><br># Check Grafana pod logs<br>kubectl logs -l app=grafana -n observability<br><br># Port-forward Grafana service to access UI locally<br>kubectl -n observability port-forward svc/grafana 3000:3000</pre><p>Now visit <a href="http://localhost:3000/">http://localhost:3000</a> in your browser.<br><strong>Default credentials</strong>:</p><ul><li>Username: admin</li><li>Password: admin</li></ul><h3>Configure Datasources in 
Grafana</h3><p>Once inside the Grafana UI, follow these steps to add your observability backends:</p><h3>Add Prometheus as a Datasource:</h3><ul><li>Go to <strong>Home</strong> → <strong>Connections</strong> → <strong>Data sources</strong></li><li>Click <strong>Add new data source</strong></li><li>Choose <strong>Prometheus</strong></li><li>Set URL to:</li></ul><pre>http://prometheus.observability.svc.cluster.local:9090</pre><ul><li>Click <strong>Save &amp; Test</strong></li></ul><h3>Add Loki as a Datasource:</h3><ul><li>Repeat above steps, choose <strong>Loki</strong></li><li>Set URL to:</li></ul><pre>http://loki.observability.svc.cluster.local:3100</pre><ul><li>Save &amp; Test</li></ul><h3>Add Jaeger as a Datasource:</h3><ul><li>Choose <strong>Jaeger</strong> from the list</li><li>Set URL to:</li></ul><pre>http://jaeger.observability.svc.cluster.local:16686</pre><ul><li>Save &amp; Test</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*aWRMibMABdte8P2psEIIzg.png" /></figure><h3>Explore Logs, Metrics, and Traces</h3><p>Head over to the <strong>Explore</strong> tab in Grafana:</p><ul><li>Select <strong>Loki</strong> → Run a log query like</li></ul><pre>{exporter=&quot;OTLP&quot;} |= `house-price-service`</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hXefmWfYnyegRUQhIhOX_w.png" /></figure><ul><li>Select <strong>Jaeger</strong> → Search traces for your app, filtered by service name</li><li>Select <strong>Prometheus</strong> → Query custom app metrics</li></ul><blockquote><em>This is your real-time debugging playground.</em></blockquote><h3>Build a Unified Dashboard</h3><p>Now let’s pull it all together.</p><h3>Steps to Create a Dashboard:</h3><ol><li>Go to the <strong>Dashboards</strong> section → Click <strong>New Dashboard</strong></li><li>Add a <strong>Panel</strong>:</li></ol><ul><li><strong>For Metrics</strong>: Use Prometheus queries (e.g., request rate, latency)</li><li><strong>For Logs</strong>: Use Loki 
query (e.g., by app label)</li><li><strong>For Traces</strong>: Use Jaeger panel or link to trace visualizer</li></ul><p>3. Organize the panels side-by-side:</p><ul><li>App throughput (metric)</li><li>App logs (filtered view)</li><li>Recent traces</li></ul><p>4. Save the dashboard and give it a name like House Price App Observability</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/0*mbxmb9D2AuNYOGIo.png" /></figure><h3>Conclusion</h3><p>You now have a complete, <strong>three-pillar observability stack</strong> running on Kubernetes:</p><ul><li>Metrics via <strong>Prometheus</strong></li><li>Logs via <strong>Loki</strong></li><li>Traces via <strong>Jaeger</strong></li><li>Visualized in <strong>Grafana</strong></li></ul><p>All powered by OpenTelemetry — the glue connecting them.</p><h3>What’s Next?</h3><p>You now have full visibility into your application — but what about the <strong>Kubernetes cluster itself</strong>?</p><p>In the <a href="https://medium.com/@kartikdudeja21/opentelemetry-in-action-on-kubernetes-part-9-cluster-level-observability-with-opentelemetry-2116cd014e47">final part</a> of the series, we’ll expand our observability beyond the app and dive into <strong>cluster-level insights</strong>. 
This includes monitoring:</p><ul><li>Node and pod CPU/memory usage</li><li>Kubernetes control plane metrics</li><li>Scheduler performance, kubelet stats, and more</li></ul><h3>Missed the previous article?</h3><p>Check out <a href="https://medium.com/@kartikdudeja21/opentelemetry-in-action-on-kubernetes-part-7-let-there-be-logs-observabilitys-final-pillar-02f31e59ff55"><strong>Part 7: Let There Be Logs, Observability’s Final Pillar with Loki</strong></a> to see how we got here.</p><pre>{<br>    &quot;author&quot;   :  &quot;Kartik Dudeja&quot;,<br>    &quot;email&quot;    :  &quot;kartikdudeja21@gmail.com&quot;,<br>    &quot;linkedin&quot; :  &quot;https://linkedin.com/in/kartik-dudeja&quot;,<br>    &quot;github&quot;   :  &quot;https://github.com/Kartikdudeja&quot;<br>}</pre>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[OpenTelemetry in Action on Kubernetes: Part 7 — Let There Be Logs, Observability’s Final Pillar…]]></title>
            <link>https://medium.com/@kartikdudeja21/opentelemetry-in-action-on-kubernetes-part-7-let-there-be-logs-observabilitys-final-pillar-02f31e59ff55?source=rss-14d13224d533------2</link>
            <guid isPermaLink="false">https://medium.com/p/02f31e59ff55</guid>
            <category><![CDATA[opentelemetry]]></category>
            <category><![CDATA[observability]]></category>
            <category><![CDATA[kubernetes]]></category>
            <dc:creator><![CDATA[Kartik Dudeja]]></dc:creator>
            <pubDate>Wed, 30 Jul 2025 02:39:22 GMT</pubDate>
            <atom:updated>2025-09-05T11:19:06.742Z</atom:updated>
            <content:encoded><![CDATA[<h3><strong>OpenTelemetry in Action on Kubernetes: Part 7 — Let There Be Logs, Observability’s Final Pillar with Loki</strong></h3><h3>Logs: The Footprints of Your System</h3><p><strong>Logs</strong> are timestamped records of events that happen in your system — like breadcrumbs left behind by your application as it performs operations. They help you understand what happened, when it happened, and often why it happened.</p><p>In observability, logs play a key role when:</p><ul><li>Metrics show a spike but don’t tell you why.</li><li>Traces reveal latency but not the root cause.</li><li>You want to debug something that happened at 3 AM… last Thursday.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/962/1*9NkJmDXB8KkZz4DUu3Bv8Q.png" /></figure><h3>Meet Loki — Prometheus for Logs</h3><p><strong>Loki</strong>, built by the folks at Grafana Labs, is a log aggregation system designed to be:</p><ul><li><strong>Lightweight</strong>: It indexes only labels, not the full log content.</li><li><strong>Kubernetes-native</strong>: Integrates beautifully with pod logs.</li><li><strong>Prometheus-like</strong>: Designed to feel familiar if you’ve used Prometheus.</li></ul><p>Instead of shipping logs to a bulky ELK stack, Loki works smoothly with Promtail, FluentBit, or OpenTelemetry Collector to aggregate logs from across your cluster.</p><h3>Deploying Loki in Kubernetes</h3><p>Let’s deploy Loki using a simple YAML manifest that includes:</p><ol><li>A <strong>Deployment</strong> to run the Loki service</li><li>A <strong>Service</strong> to expose Loki inside the cluster</li><li>A <strong>ConfigMap</strong> to configure how Loki receives and stores logs</li></ol><pre>---<br>apiVersion: v1<br>kind: ConfigMap<br>metadata:<br>  name: loki-config<br>  labels:<br>    app: loki<br>data:<br>  loki.yaml: |<br>    auth_enabled: false<br>    server:<br>      http_listen_port: 3100<br>    common:<br>      path_prefix: /loki<br>      
ring:<br>        instance_addr: 127.0.0.1<br>        kvstore:<br>          store: inmemory<br>    ingester_client:<br>      grpc_client_config:<br>        max_send_msg_size: 104857600<br>        max_recv_msg_size: 104857600<br>      remote_timeout: 5s<br>    ingester:<br>      lifecycler:<br>        ring:<br>          kvstore:<br>            store: inmemory<br>          replication_factor: 1<br>    schema_config:<br>      configs:<br>        - from: 2020-10-27<br>          store: boltdb-shipper<br>          object_store: filesystem<br>          schema: v11<br>          index:<br>            prefix: index_<br>            period: 24h<br>    storage_config:<br>      boltdb_shipper:<br>        active_index_directory: /loki/index<br>        cache_location: /loki/cache<br>        shared_store: filesystem<br>      filesystem:<br>        directory: /loki/chunks<br>    limits_config:<br>      enforce_metric_name: false<br>      max_streams_per_user: 0<br>      max_chunks_per_query: 1000000<br>      max_query_series: 50000<br>      max_query_lookback: 720h<br>    ruler:<br>      storage:<br>        type: local<br>        local:<br>          directory: /loki/rules<br>      ring:<br>        kvstore:<br>          store: inmemory<br>    analytics:<br>      reporting_enabled: false<br><br>---<br><br>apiVersion: apps/v1<br>kind: Deployment<br>metadata:<br>  name: loki<br>  labels:<br>    app: loki<br>spec:<br>  replicas: 1<br>  selector:<br>    matchLabels:<br>      app: loki<br>  template:<br>    metadata:<br>      labels:<br>        app: loki<br>    spec:<br>      containers:<br>        - name: loki<br>          image: grafana/loki:2.9.2<br>          args:<br>            - &quot;-config.file=/etc/loki/loki.yaml&quot;<br>          ports:<br>            - name: http<br>              containerPort: 3100<br>          volumeMounts:<br>            - name: config<br>              mountPath: /etc/loki<br>              readOnly: true<br>            - name: storage<br>              
mountPath: /loki<br>          resources:<br>            requests:<br>              memory: &quot;256Mi&quot;<br>              cpu: &quot;250m&quot;<br>            limits:<br>              memory: &quot;512Mi&quot;<br>              cpu: &quot;500m&quot;<br>      volumes:<br>        - name: config<br>          configMap:<br>            name: loki-config<br>        - name: storage<br>          emptyDir: {}<br><br>---<br><br>apiVersion: v1<br>kind: Service<br>metadata:<br>  name: loki<br>  labels:<br>    app: loki<br>spec:<br>  selector:<br>    app: loki<br>  ports:<br>    - name: http-metrics<br>      port: 3100<br>      targetPort: 3100<br></pre><p>The Loki manifest sets up a log aggregator inside your Kubernetes cluster that listens for incoming logs on a defined port. The service makes Loki accessible to other components, such as the OTEL Collector, while the ConfigMap gives Loki its brain — deciding how logs flow and where they go.</p><h3>Updating OpenTelemetry Collector to Send Logs to Loki</h3><p>We now need to tell the <strong>OTEL Collector Agent</strong> to collect logs using the filelog receiver and ship them off to Loki. 
Here&#39;s the flow:</p><ul><li>filelog: Reads logs from Kubernetes pod files.</li><li>loki exporter: Pushes these logs to the Loki service using HTTP.</li></ul><pre>receivers:<br>  filelog:<br>    include: [ /var/log/pods/*/*/*.log ]<br>    start_at: beginning<br>    include_file_path: true<br>    include_file_name: true<br>exporters:<br>  loki:<br>    endpoint: &quot;http://loki.observability.svc.cluster.local:3100/loki/api/v1/push&quot;<br>    tls:<br>      insecure: true<br>    sending_queue:<br>      enabled: true<br>service:<br>  pipelines:<br>    # collect logs using &#39;filelog&#39; receiver and ship them to loki<br>    logs:<br>      receivers: [filelog]<br>      processors: [memory_limiter, batch]<br>      exporters: [loki]</pre><h3>Deploying Loki to Kubernetes</h3><pre># Apply the Loki manifests<br>kubectl -n observability apply -f loki.yaml<br><br># Verify Loki is running<br>kubectl -n observability get pods -l app=loki<br><br># check readiness of loki<br>curl -X GET &quot;http://$(kubectl -n observability get svc -l app=loki -o json | jq -r &#39;.items[].spec.clusterIP&#39;):3100/ready&quot;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*D2tUPLJMcM6-6TI-c03EXg.png" /></figure><h3>What’s Next?</h3><p>In <a href="https://medium.com/@kartikdudeja21/opentelemetry-in-action-on-kubernetes-part-8-visualize-everything-building-a-unified-255450edd4d4">Part 8</a>, we’ll bring everything together with <strong>Grafana</strong> — the ultimate observability dashboard:</p><ul><li>Visualizing traces from <strong>Jaeger</strong></li><li>Querying metrics from <strong>Prometheus</strong></li><li>Searching logs from <strong>Loki</strong></li><li>All in a single unified interface.</li></ul><p>The observability trifecta — complete, powerful, and open source. 
Stay tuned.</p><h3>Missed the previous article?</h3><p>Check out <a href="https://medium.com/@kartikdudeja21/opentelemetry-in-action-on-kubernetes-part-6-tracking-metrics-with-prometheus-and-6ba9e4ca9442"><strong>Part 6: Tracking Metrics with Prometheus and OpenTelemetry</strong></a> to see how we got here.</p><pre>{<br>    &quot;author&quot;   :  &quot;Kartik Dudeja&quot;,<br>    &quot;email&quot;    :  &quot;kartikdudeja21@gmail.com&quot;,<br>    &quot;linkedin&quot; :  &quot;https://linkedin.com/in/kartik-dudeja&quot;,<br>    &quot;github&quot;   :  &quot;https://github.com/Kartikdudeja&quot;<br>}</pre>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[OpenTelemetry in Action on Kubernetes: Part 6 — Tracking Metrics with Prometheus and…]]></title>
            <link>https://medium.com/@kartikdudeja21/opentelemetry-in-action-on-kubernetes-part-6-tracking-metrics-with-prometheus-and-6ba9e4ca9442?source=rss-14d13224d533------2</link>
            <guid isPermaLink="false">https://medium.com/p/6ba9e4ca9442</guid>
            <category><![CDATA[opentelemetry]]></category>
            <category><![CDATA[observability]]></category>
            <category><![CDATA[kubernetes]]></category>
            <dc:creator><![CDATA[Kartik Dudeja]]></dc:creator>
            <pubDate>Sat, 19 Jul 2025 08:42:00 GMT</pubDate>
            <atom:updated>2025-08-30T09:13:57.919Z</atom:updated>
            <content:encoded><![CDATA[<h3>OpenTelemetry in Action on Kubernetes: Part 6 — Tracking Metrics with Prometheus and OpenTelemetry</h3><p>Observability isn’t complete without <strong>metrics</strong>, the vital signs of your applications and services. In this part, we integrate <strong>Prometheus</strong> into our Kubernetes-based observability stack. You’ll learn how Prometheus works with OpenTelemetry, deploy it into your cluster, and finally visualize custom application metrics generated in <a href="https://medium.com/@kartikdudeja21/opentelemetry-in-action-on-kubernetes-part-2-instrument-and-dockerizing-15df04e2319a">Part 2</a>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/962/1*9NkJmDXB8KkZz4DUu3Bv8Q.png" /></figure><h3>What is Prometheus?</h3><p><strong>Prometheus</strong> is an open-source monitoring system that scrapes metrics from configured targets, stores them in a time-series database, and allows you to query them using PromQL. It’s widely adopted in the Kubernetes ecosystem for infrastructure and application monitoring.</p><p>Prometheus uses a <strong>pull model</strong>: applications don’t push metrics to it. Instead, apps <strong>expose metrics at an endpoint</strong>, and Prometheus regularly scrapes these endpoints to collect data.</p><p>When integrated with OpenTelemetry, the <strong>OpenTelemetry Collector</strong> acts as a bridge: it collects metrics from instrumented applications and exposes them in a Prometheus-compatible format.</p><h3>What are Metrics?</h3><p><strong>Metrics</strong> are numerical data points that capture the health, performance, and resource usage of your system. 
For example:</p><ul><li>API requests per second</li><li>Response latency</li><li>CPU and memory usage</li></ul><p>In our app, we’ve already defined two custom metrics:</p><ul><li>api_requests_total: total number of requests per endpoint</li><li>api_latency_seconds: histogram for API latency</li></ul><p>Now, let’s expose them to Prometheus.</p><h3>Prometheus Deployment</h3><p>Let’s deploy Prometheus into our Kubernetes cluster. You’ll need the following YAML file:</p><pre>apiVersion: apps/v1<br>kind: Deployment<br>metadata:<br>  name: prometheus<br>  labels:<br>    app: prometheus<br>spec:<br>  replicas: 1<br>  selector:<br>    matchLabels:<br>      app: prometheus<br>  template:<br>    metadata:<br>      labels:<br>        app: prometheus<br>    spec:<br>      containers:<br>        - name: prometheus<br>          image: prom/prometheus:latest<br>          args:<br>            - &quot;--config.file=/etc/prometheus/prometheus.yml&quot;<br>            - &quot;--storage.tsdb.path=/prometheus&quot;<br>            - &quot;--log.level=debug&quot;<br>          resources:<br>            requests:<br>              cpu: &quot;10m&quot;<br>              memory: &quot;56Mi&quot;<br>            limits:<br>              cpu: &quot;20m&quot;<br>              memory: &quot;128Mi&quot;<br>          ports:<br>            - containerPort: 9090<br>          volumeMounts:<br>            - name: config-volume<br>              mountPath: /etc/prometheus/<br>            - name: storage-volume<br>              mountPath: /prometheus<br>      volumes:<br>        - name: config-volume<br>          configMap:<br>            name: prometheus-config<br>        - name: storage-volume<br>          emptyDir: {}<br>---<br>apiVersion: v1<br>kind: Service<br>metadata:<br>  name: prometheus<br>  labels:<br>    app: prometheus<br>spec:<br>  selector:<br>    app: prometheus<br>  ports:<br>    - protocol: TCP<br>      port: 9090<br>      targetPort: 9090<br>  type: ClusterIP<br>---<br>apiVersion: 
v1<br>kind: ConfigMap<br>metadata:<br>  name: prometheus-config<br>data:<br>  prometheus.yml: |<br>    global:<br>      scrape_interval: 15s<br>    scrape_configs:<br>      - job_name: &#39;prometheus&#39;<br>        static_configs:<br>          - targets: [&#39;localhost:9090&#39;]<br>      - job_name: &#39;otel-collector-agent&#39;<br>        static_configs:<br>          - targets: [&#39;otel-collector-agent.observability.svc.cluster.local:8889&#39;]</pre><p>This YAML file deploys <strong>Prometheus</strong> into the Kubernetes cluster with three key components:</p><ul><li>A <strong>Deployment</strong> that runs the Prometheus server using the official image and mounts a configuration volume,</li><li>A <strong>Service</strong> that exposes Prometheus on port 9090, enabling access to its UI and scrape endpoint, and</li><li>A <strong>ConfigMap</strong> that provides the Prometheus scrape configuration, telling it to scrape metrics from itself and from the OpenTelemetry Collector agent on port 8889.</li></ul><p>Together, these resources allow Prometheus to run continuously, collect metrics from OTEL, and expose them for querying and visualization.</p><h3>Prometheus Config Explained</h3><p>Here’s a minimal Prometheus configuration we’ll use:</p><pre>global:<br>  scrape_interval: 15s<br>scrape_configs:<br>  - job_name: &#39;prometheus&#39;<br>    static_configs:<br>      - targets: [&#39;localhost:9090&#39;]<br>  - job_name: &#39;otel-collector-agent&#39;<br>    static_configs:<br>      - targets: [&#39;otel-collector-agent.observability.svc.cluster.local:8889&#39;]</pre><p>This configuration tells Prometheus to scrape metrics every <strong>15 seconds</strong>. It monitors itself (localhost:9090) and also scrapes the <strong>OpenTelemetry Collector agent</strong> at its service endpoint (otel-collector-agent.observability.svc.cluster.local:8889). 
This is where our app metrics are exposed.</p><h3>Updating OpenTelemetry Collector Config</h3><p>We need to update our OTEL Collector configuration to export metrics to Prometheus.</p><p>Here’s the relevant config:</p><pre>    receivers:<br>      otlp:<br>        protocols:<br>          grpc:<br>            endpoint: 0.0.0.0:4317  # receive traces and metrics from instrumented application<br><br>    processors:<br>      memory_limiter:<br>        check_interval: 1s<br>        limit_percentage: 80<br>        spike_limit_percentage: 15<br><br>      batch:<br>        send_batch_size: 1000<br>        timeout: 5s<br><br>    exporters:<br>      otlp/jaeger:<br>        endpoint: &quot;http://jaeger.observability.svc.cluster.local:4317&quot;  # export traces to jaeger<br>        tls:<br>          insecure: true<br><br>      prometheus:<br>        endpoint: &quot;0.0.0.0:8889&quot;<br>        enable_open_metrics: true<br>        resource_to_telemetry_conversion:<br>          enabled: true<br><br>    service:<br>      pipelines:<br>        # collect trace data using otlp receiver and send it to jaeger<br>        traces:<br>          receivers: [otlp]<br>          processors: [memory_limiter, batch]<br>          exporters: [otlp/jaeger]<br><br>        # collect metrics from otlp receiver and expose in prometheus compatible format<br>        metrics:<br>          receivers: [otlp]<br>          processors: [memory_limiter, batch]<br>          exporters: [prometheus]</pre><h3>Deploying Prometheus to Kubernetes</h3><pre># Apply Prometheus config and deployment<br>kubectl -n observability apply -f prometheus.yaml</pre><h3>Visualizing Custom Metrics</h3><ol><li>Make some API calls to the application:</li></ol><pre>API_ENDPOINT_IP=$(kubectl -n mlapp get svc -l app=house-price-service -o json | jq -r &#39;.items[].spec.clusterIP&#39;)</pre><pre>curl -X POST &quot;http://${API_ENDPOINT_IP}:80/predict/&quot; \<br>    -H &quot;Content-Type: application/json&quot; \<br>    -d 
&#39;{&quot;features&quot;: [1200]}&#39;</pre><ol start="2"><li>Open the Prometheus UI:</li></ol><pre>kubectl port-forward svc/prometheus -n observability 9090:9090</pre><ol start="3"><li>In the Prometheus UI (http://localhost:9090), search for:</li></ol><ul><li>api_requests_total</li><li>api_latency_seconds</li></ul><p>You should see data flowing in!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*vot_L58kmi-pEZigHSBr1Q.png" /></figure><h3>What’s Next?</h3><p>Now that we’ve captured and visualized <strong>metrics</strong>, the observability story is coming together. But there’s still one pillar left: <strong>logs</strong>.</p><p>In <a href="https://medium.com/@kartikdudeja21/opentelemetry-in-action-on-kubernetes-part-7-let-there-be-logs-observabilitys-final-pillar-02f31e59ff55">Part 7</a>, we’ll deploy <strong>Loki</strong>, the log aggregation system, and configure the OpenTelemetry Collector to ship structured logs from our app to Loki. Stay tuned!</p><h3>Missed the previous article?</h3><p>Check out <a href="https://medium.com/@kartikdudeja21/opentelemetry-in-action-on-kubernetes-part-5-tracing-the-lines-sending-spans-from-app-to-9891b1839e32"><strong>Part 5: Tracing the Lines, Sending Spans from App to Jaeger</strong></a> to see how we got here.</p><pre>{<br>    &quot;author&quot;   :  &quot;Kartik Dudeja&quot;,<br>    &quot;email&quot;    :  &quot;kartikdudeja21@gmail.com&quot;,<br>    &quot;linkedin&quot; :  &quot;https://linkedin.com/in/kartik-dudeja&quot;,<br>    &quot;github&quot;   :  &quot;https://github.com/Kartikdudeja&quot;<br>}</pre>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[OpenTelemetry in Action on Kubernetes: Part 5 — Tracing the Lines, Sending Spans from App to…]]></title>
            <link>https://medium.com/@kartikdudeja21/opentelemetry-in-action-on-kubernetes-part-5-tracing-the-lines-sending-spans-from-app-to-9891b1839e32?source=rss-14d13224d533------2</link>
            <guid isPermaLink="false">https://medium.com/p/9891b1839e32</guid>
            <category><![CDATA[observability]]></category>
            <category><![CDATA[kubernetes]]></category>
            <category><![CDATA[opentelemetry]]></category>
            <dc:creator><![CDATA[Kartik Dudeja]]></dc:creator>
            <pubDate>Sat, 12 Jul 2025 16:03:57 GMT</pubDate>
            <atom:updated>2025-08-30T09:13:19.494Z</atom:updated>
            <content:encoded><![CDATA[<h3>OpenTelemetry in Action on Kubernetes: Part 5 — Tracing the Lines, Sending Spans from App to Jaeger</h3><p>In the <a href="https://medium.com/@kartikdudeja21/opentelemetry-in-action-on-kubernetes-part-4-deploying-opentelemetry-collector-agent-mode-df1b060b3a01">last part</a>, we set up the OpenTelemetry Collector in <strong>agent mode</strong> to receive telemetry data from our ML app. But telemetry isn’t useful if it’s just sitting in logs, right? We want <strong>end-to-end traces</strong> that we can <strong>visualize, search, and troubleshoot</strong>.</p><p>And that’s exactly where <strong>Jaeger</strong> enters the scene.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/962/1*9NkJmDXB8KkZz4DUu3Bv8Q.png" /></figure><h3>What is Jaeger?</h3><p>Jaeger is an <strong>open-source distributed tracing system</strong>, originally built by Uber, and now part of the CNCF. It helps you:</p><ul><li>Monitor distributed transactions</li><li>Understand application latency</li><li>Perform root cause analysis</li><li>Visualize request flow across services</li></ul><p>In short, if your app is a mystery novel, Jaeger is Sherlock Holmes.</p><h3>What are Traces and Spans?</h3><ul><li>A <strong>trace</strong> is a complete journey of a request through your app — from start to finish.</li><li>A <strong>span</strong> is a single step in that journey, like one function call or one external API hit.</li></ul><p>Think of a <strong>trace</strong> as the delivery of a pizza. Every <strong>span</strong> is a milestone in that process — order placed, pizza prepared, baked, out for delivery, delivered. Jaeger shows you the whole pizza journey.</p><h3>Jaeger Deployment in Kubernetes</h3><p>Let’s deploy Jaeger in our Kubernetes cluster.</p><p>This YAML configuration sets up a single-instance Jaeger deployment in all-in-one mode within a Kubernetes cluster, suitable for development environments. 
The deployment uses the jaegertracing/all-in-one image and exposes key ports for telemetry (OTLP gRPC on 4317) and visualization (UI on 16686).</p><p>The associated ClusterIP service allows internal communication within the cluster, enabling the OpenTelemetry Collector to send trace data to Jaeger and providing access to the Jaeger UI via port forwarding for trace analysis and visualization.</p><pre>apiVersion: apps/v1<br>kind: Deployment<br>metadata:<br>  name: jaeger<br>  labels:<br>    app: jaeger<br>spec:<br>  replicas: 1<br>  selector:<br>    matchLabels:<br>      app: jaeger<br>  template:<br>    metadata:<br>      labels:<br>        app: jaeger<br>    spec:<br>      containers:<br>      - name: jaeger<br>        image: jaegertracing/all-in-one:latest<br>        resources:<br>          requests:<br>            cpu: &quot;10m&quot;<br>            memory: &quot;128Mi&quot;<br>          limits:<br>            cpu: &quot;20m&quot;<br>            memory: &quot;256Mi&quot;<br>        ports:<br>        - containerPort: 4317    # OTLP gRPC (traces from the collector)<br>        - containerPort: 6831<br>        - containerPort: 16686   # Jaeger UI<br>        - containerPort: 14250<br>---<br>apiVersion: v1<br>kind: Service<br>metadata:<br>  name: jaeger<br>spec:<br>  selector:<br>    app: jaeger<br>  type: ClusterIP<br>  ports:<br>  - name: ui<br>    port: 16686<br>    targetPort: 16686<br>  - name: grpc<br>    port: 4317<br>    targetPort: 4317</pre><p>Save this configuration as jaeger.yaml and deploy Jaeger with the following command:</p><pre>kubectl -n observability apply -f jaeger.yaml</pre><p>You can verify it’s up using:</p><pre>kubectl -n observability get all -l app=jaeger</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/693/1*5WQKrs1ZzDr9ZfX5TAVjag.png" /></figure><p>The Jaeger UI is served on port 16686 of the service’s ClusterIP, which is reachable only from inside the cluster. 
Use kubectl port-forward to access the Jaeger UI locally:</p><pre>kubectl -n observability port-forward svc/jaeger 16686:16686</pre><p>Now open <a href="http://localhost:16686/">http://localhost:16686</a> in your browser.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*12GU3I-XA06P9PAXMrGf8g.png" /></figure><h3>Update the OpenTelemetry Collector Pipeline</h3><p>Now that Jaeger is live, we need to <strong>update the OpenTelemetry Collector config</strong> to <strong>export spans to Jaeger</strong>.</p><pre># otel-collector-agent-configmap.yaml<br>apiVersion: v1<br>kind: ConfigMap<br>metadata:<br>  name: otel-collector-agent-config<br>  namespace: observability<br>data:<br>  otel-collector-config.yaml: |<br>    receivers:<br>      otlp:<br>        protocols:<br>          grpc:<br>            endpoint: 0.0.0.0:4317  # receive traces and metrics from instrumented application<br>    processors:<br>      memory_limiter:<br>        check_interval: 1s<br>        limit_percentage: 80<br>        spike_limit_percentage: 15<br>      batch:<br>        send_batch_size: 1000<br>        timeout: 5s<br>    exporters:<br>      otlp/jaeger:<br>        endpoint: &quot;http://jaeger.observability.svc.cluster.local:4317&quot;  # export traces to jaeger<br>        tls:<br>          insecure: true<br>    service:<br>      pipelines:<br>        # collect trace data using otlp receiver and send it to jaeger<br>        traces:<br>          receivers: [otlp]<br>          processors: [memory_limiter, batch]<br>          exporters: [otlp/jaeger]</pre><p>Apply the updated ConfigMap:</p><pre>kubectl -n observability apply -f otel-collector-agent-configmap.yaml</pre><p>Restart the collector to pick up the new config (it was deployed as a DaemonSet in Part 4):</p><pre>kubectl -n observability rollout restart daemonset otel-collector-agent</pre><h3>Test the Setup</h3><p>Get the Endpoint IP from the K8s service:</p><pre>API_ENDPOINT_IP=$(kubectl -n mlapp get svc -l app=house-price-service -o json | jq -r 
&#39;.items[].spec.clusterIP&#39;)</pre><p>Test it locally using curl or Postman:</p><pre>curl -X POST &quot;http://${API_ENDPOINT_IP}:80/predict/&quot; \<br>  -H &quot;Content-Type: application/json&quot; \<br>  -d &#39;{&quot;features&quot;: [1200]}&#39;</pre><h3>View Traces in Jaeger UI</h3><p>Open Jaeger UI in your browser.</p><ul><li>Select the house-price-service</li><li>Hit <strong>Find Traces</strong></li><li>Voilà! You can now <strong>trace requests</strong>, <strong>view span timings</strong>, and <strong>debug latency</strong> in style.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*FET85nLZ-cHeuX9o3CG42A.png" /></figure><h4><a href="https://medium.com/@kartikdudeja21/opentelemetry-in-action-on-kubernetes-part-6-tracking-metrics-with-prometheus-and-6ba9e4ca9442">Up Next</a>: From Spans to Stats — Let’s Talk Metrics</h4><p>Now that Jaeger is live and humming — collecting traces and giving us deep insights into our application’s behavior — it’s time to turn our attention to the second pillar of observability: <strong>metrics</strong>.</p><p>Stay tuned as we wire up Prometheus and bring <strong>metrics into the mix</strong>, completing another piece of our observability blueprint.</p><pre>{<br>    &quot;author&quot;   :  &quot;Kartik Dudeja&quot;,<br>    &quot;email&quot;    :  &quot;kartikdudeja21@gmail.com&quot;,<br>    &quot;linkedin&quot; :  &quot;https://linkedin.com/in/kartik-dudeja&quot;,<br>    &quot;github&quot;   :  &quot;https://github.com/Kartikdudeja&quot;<br>}</pre>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[OpenTelemetry in Action on Kubernetes: Part 4 — Deploying OpenTelemetry Collector (Agent Mode)…]]></title>
            <link>https://medium.com/@kartikdudeja21/opentelemetry-in-action-on-kubernetes-part-4-deploying-opentelemetry-collector-agent-mode-df1b060b3a01?source=rss-14d13224d533------2</link>
            <guid isPermaLink="false">https://medium.com/p/df1b060b3a01</guid>
            <category><![CDATA[opentelemetry]]></category>
            <category><![CDATA[kubernetes]]></category>
            <category><![CDATA[observability]]></category>
            <dc:creator><![CDATA[Kartik Dudeja]]></dc:creator>
            <pubDate>Mon, 07 Jul 2025 03:13:40 GMT</pubDate>
            <atom:updated>2025-08-30T09:12:25.802Z</atom:updated>
            <content:encoded><![CDATA[<h3>OpenTelemetry in Action on Kubernetes: Part 4 — Deploying OpenTelemetry Collector (Agent Mode) in Kubernetes</h3><p>Welcome back, observability artisans! So far in our series:</p><ul><li>We trained a simple ML model and wrapped it in a FastAPI app (<a href="https://medium.com/@kartikdudeja21/opentelemetry-in-action-on-kubernetes-part-1-building-a-simple-ml-app-with-fastapi-626ec2fb818d">Part 1</a>).</li><li>We instrumented it with OpenTelemetry to emit traces, metrics, and logs (<a href="https://medium.com/@kartikdudeja21/opentelemetry-in-action-on-kubernetes-part-2-instrument-and-dockerizing-15df04e2319a">Part 2</a>).</li><li>We dockerized and deployed the app in Kubernetes (<a href="https://medium.com/@kartikdudeja21/opentelemetry-in-action-on-kubernetes-part-3-deploying-the-application-on-kubernetes-5cba4bf9d52a">Part 3</a>).</li></ul><p>Now it’s time to <strong>build the telemetry pipeline</strong> by deploying the <strong>OpenTelemetry Collector</strong> in <strong>agent mode</strong>. Think of it as your app’s personal observability sidekick — sitting beside your pod, collecting and forwarding telemetry like a seasoned ops ninja.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/962/1*9NkJmDXB8KkZz4DUu3Bv8Q.png" /></figure><h3>What Is the OpenTelemetry Collector?</h3><p>The <strong>OpenTelemetry Collector</strong> is a vendor-agnostic service that can receive, process, and export telemetry data (metrics, logs, and traces). It acts like a modular observability router.</p><p>In <strong>agent mode</strong>, it’s typically deployed as a <strong>DaemonSet</strong>, meaning one collector pod runs on each node — perfect for scraping local app telemetry.</p><h3>The Collector Pipeline — A Three-Stage Flow</h3><p>The pipeline is made up of:</p><h3>1. 
Receivers</h3><p>These are the collector’s “ears.” They listen for telemetry data from your app.<br><strong>Example:</strong> OTLP receiver listens on port 4317 for gRPC telemetry.</p><p><em>Analogy:</em> Like a parcel dropbox at the post office — it accepts incoming packages (telemetry).</p><h3>2. Processors</h3><p>Processors act like post-office sorters — they batch, sample, or modify telemetry before export.<br><strong>Example:</strong> Batching to reduce load or adding attributes to spans.</p><p><em>Analogy:</em> Sorting parcels by zip code before shipping.</p><h3>3. Exporters</h3><p>Exporters are your delivery trucks. They ship telemetry off to destinations like Prometheus, Jaeger, or Loki.</p><p><em>Analogy:</em> The final delivery van that takes your parcel to your house.</p><h3>Configuration in Kubernetes: The ConfigMap</h3><p>We store our OpenTelemetry pipeline config in a Kubernetes <strong>ConfigMap</strong> — a way to inject config data into pods as files or environment variables.</p><h3>Step-by-Step: Deploying Otel Collector (Agent)</h3><p>We’ll deploy three components:</p><h3>1. ConfigMap (Collector Pipeline)</h3><pre># otel-collector-agent-configmap.yaml<br>apiVersion: v1<br>kind: ConfigMap<br>metadata:<br>  name: otel-collector-agent-config<br>  namespace: observability<br>data:<br>  otel-collector-config.yaml: |<br>    receivers:<br>      otlp:<br>        protocols:<br>          grpc:<br>    processors:<br>      batch:<br>    exporters:<br>      debug:<br>        verbosity: detailed<br>    service:<br>      pipelines:<br>        traces:<br>          receivers: [otlp]<br>          processors: [batch]<br>          exporters: [debug]</pre><blockquote>This simple pipeline receives traces using OTLP (gRPC), batches them, and prints them to stdout using a debug exporter. 
We’ll replace the debug exporter with a Jaeger exporter in Part 5.</blockquote><h4>Deep Dive: Key Components of the OpenTelemetry Pipeline</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*A4eniMI1t_xlmfZeaYUn1w.png" /></figure><h4>Receiver: otlp with grpc Protocol</h4><pre>receivers:<br>  otlp:<br>    protocols:<br>      grpc:</pre><p>What it does:</p><p>The <strong>receiver</strong> is the entry point into the Collector. In this case, we’re telling the collector to accept data over the <strong>OTLP (OpenTelemetry Protocol)</strong> using the <strong>gRPC</strong> transport.</p><ul><li><strong>OTLP</strong> is the default protocol for OpenTelemetry.</li><li><strong>gRPC</strong> is a high-performance, open-source RPC framework — it’s fast, efficient, and used widely in modern telemetry systems.</li></ul><h4>Processor: batch</h4><pre>processors:<br>  batch:</pre><p>What it does:</p><p>Processors manipulate or enhance telemetry <strong>after it’s received but before it’s exported</strong>.</p><p>The batch processor is highly recommended in most pipelines. It collects telemetry data in small batches and sends them together instead of one at a time. 
This improves performance and reduces resource usage.</p><p>Benefits:</p><ul><li>Reduces the number of outgoing requests.</li><li>Improves throughput by sending larger payloads.</li><li>Helps smooth out traffic spikes.</li></ul><h4>Exporter: debug</h4><pre>exporters:<br>  debug:<br>    verbosity: detailed</pre><p>What it does:</p><p>Exporters are responsible for sending telemetry to an external backend (e.g., Jaeger, Prometheus, Datadog).</p><p>In this case, we’re using the <strong>debug exporter</strong> — which doesn’t send data to an external system but <strong>prints it to stdout</strong>.</p><ul><li>verbosity: detailed means it will output detailed telemetry, including span names, attributes, and events.</li><li>This is great for <strong>local testing or debugging</strong>, but not suitable for production.</li></ul><h3>2. Deployment (DaemonSet — Agent Mode)</h3><pre># otel-collector-agent-daemonset.yaml<br>apiVersion: apps/v1<br>kind: DaemonSet<br>metadata:<br>  name: otel-collector-agent<br>  namespace: observability<br>spec:<br>  updateStrategy:<br>    type: RollingUpdate<br>    rollingUpdate:<br>      maxUnavailable: 1  # One pod at a time will be unavailable during update<br>  selector:<br>    matchLabels:<br>      app: otel-collector-agent<br>  template:<br>    metadata:<br>      labels:<br>        app: otel-collector-agent<br>    spec:<br>      containers:<br>        - name: otel-collector<br>          image: otel/opentelemetry-collector-contrib:latest<br>          args: [&quot;--config=/conf/otel-collector-config.yaml&quot;]<br>          resources:<br>            requests:<br>              cpu: 10m<br>              memory: 32Mi<br>            limits:<br>              cpu: 50m<br>              memory: 128Mi          <br>          volumeMounts:<br>            - name: config-volume<br>              mountPath: /conf<br>            - name: varlog<br>              mountPath: /var/log<br>      volumes:<br>        - name: config-volume<br>          
configMap:<br>            name: otel-collector-agent-config<br>        - name: varlog<br>          hostPath:<br>            path: /var/log</pre><h3>3. Service (Internal Communication)</h3><pre># otel-collector-agent-service.yaml<br>apiVersion: v1<br>kind: Service<br>metadata:<br>  name: otel-collector-agent<br>  labels:<br>    app: otel-collector-agent<br>spec:<br>  selector:<br>    app: otel-collector-agent<br>  ports:<br>    - name: otlp-grpc<br>      port: 4317<br>      targetPort: 4317<br>      protocol: TCP<br>    - name: otlp-http<br>      port: 4318<br>      targetPort: 4318<br>      protocol: TCP<br>    - name: prometheus<br>      port: 8889<br>      targetPort: 8889<br>      protocol: TCP    <br>  type: ClusterIP</pre><h3>Deploying with kubectl</h3><pre># create a new namespace<br>kubectl create namespace observability</pre><p>Deploy the Collector:</p><pre>kubectl -n observability apply -f otel-collector-agent-configmap.yaml -f otel-collector-agent-service.yaml -f otel-collector-agent-daemonset.yaml</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Wr1y6m7nMkag64JRBn2gWQ.png" /></figure><p>To check the status:</p><pre>kubectl -n observability get all -l app=otel-collector-agent</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*JG0DWQv-VU6Z3qVlDiUiuA.png" /></figure><h3>Updating the App Deployment</h3><p>We now need to <strong>add the OTLP endpoint to the app</strong> as an environment variable.</p><p>When you create a Kubernetes <strong>Service</strong>, it gets a DNS name like this:</p><pre>&lt;service-name&gt;.&lt;namespace&gt;.svc.cluster.local</pre><p>So our service name otel-collector-agent in namespace observability is reachable at:</p><pre>otel-collector-agent.observability.svc.cluster.local:4317</pre><p>Magic, courtesy of Kubernetes DNS.</p><p>Adding the OTEL_EXPORTER_OTLP_ENDPOINT environment variable in your application deployment tells the OpenTelemetry SDK where to send telemetry data (traces, 
metrics, and logs). This variable connects your instrumented app to the OpenTelemetry Collector, which acts as the central receiver and router for all observability signals within the Kubernetes environment.</p><pre>env:<br>  - name: OTEL_EXPORTER_OTLP_ENDPOINT<br>    value: &quot;http://otel-collector-agent.observability.svc.cluster.local:4317&quot;<br>  - name: OTEL_EXPORTER_OTLP_INSECURE<br>    value: &quot;true&quot;</pre><p>Insert this under the container spec in your <strong>house-price-app.yaml</strong>.</p><p>After adding the above config, apply the changes to the application Deployment:</p><pre>kubectl -n mlapp apply -f house-price-app.yaml</pre><p>You can check the deployment rollout status with the following command:</p><pre>kubectl -n mlapp rollout status deployment house-price-service</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*0V5Lbf8HzJxaUD_vmB-I8A.png" /></figure><h3>What’s Next?</h3><p>In <a href="https://medium.com/@kartikdudeja21/opentelemetry-in-action-on-kubernetes-part-5-tracing-the-lines-sending-spans-from-app-to-9891b1839e32">Part 5</a>, we’ll deploy <strong>Jaeger</strong>, a UI-based distributed tracing tool, and rewire our OTEL pipeline to send trace data there. You’ll get to <em>see</em> spans, visualize your API behavior, and debug latency like a real tracing wizard.</p><pre>{<br>    &quot;author&quot;   :  &quot;Kartik Dudeja&quot;,<br>    &quot;email&quot;    :  &quot;kartikdudeja21@gmail.com&quot;,<br>    &quot;linkedin&quot; :  &quot;https://linkedin.com/in/kartik-dudeja&quot;,<br>    &quot;github&quot;   :  &quot;https://github.com/Kartikdudeja&quot;<br>}</pre>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[OpenTelemetry in Action on Kubernetes: Part 3 — Deploying the Application on Kubernetes]]></title>
            <link>https://medium.com/@kartikdudeja21/opentelemetry-in-action-on-kubernetes-part-3-deploying-the-application-on-kubernetes-5cba4bf9d52a?source=rss-14d13224d533------2</link>
            <guid isPermaLink="false">https://medium.com/p/5cba4bf9d52a</guid>
            <category><![CDATA[observability]]></category>
            <category><![CDATA[opentelemetry]]></category>
            <category><![CDATA[kubernetes]]></category>
            <dc:creator><![CDATA[Kartik Dudeja]]></dc:creator>
            <pubDate>Sat, 28 Jun 2025 13:40:06 GMT</pubDate>
            <atom:updated>2025-08-30T09:11:36.228Z</atom:updated>
            <content:encoded><![CDATA[<h3>OpenTelemetry in Action on Kubernetes: Part 3 — Deploying the Application on Kubernetes</h3><h3>Deploying Our Instrumented ML App to Kubernetes</h3><p>Welcome to Part 3! If you’ve followed along so far, by the end of <a href="https://medium.com/@kartikdudeja21/opentelemetry-in-action-on-kubernetes-part-2-instrument-and-dockerizing-15df04e2319a">Part 2</a> you had:</p><ul><li>A FastAPI-based machine learning app</li><li>Instrumented with OpenTelemetry for full-stack observability</li><li>Dockerized and ready to ship</li></ul><p>Now, it’s time to <strong>bring in the big orchestration guns — Kubernetes</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/962/1*9NkJmDXB8KkZz4DUu3Bv8Q.png" /></figure><h3>Understanding Kubernetes Deployment &amp; Service</h3><p>Before we throw YAML at a cluster, let’s understand what these two crucial building blocks do:</p><h3>Deployment</h3><p>A <strong>Deployment</strong> in Kubernetes manages a set of replicas (identical Pods running our app). It provides:</p><ul><li><strong>Declarative updates</strong>: You describe <em>what</em> you want, K8s makes it so.</li><li><strong>Rolling updates</strong>: Smooth upgrades without downtime.</li><li><strong>Self-healing</strong>: If a Pod dies, K8s spins up a new one.</li></ul><p>Think of it as a smart manager for your app’s pods.</p><h3>Service</h3><p>A <strong>Service</strong> exposes your app inside the cluster (or externally, if needed). 
It:</p><ul><li>Provides a stable DNS name.</li><li>Load balances traffic between pods.</li><li>In our case, exposes:</li><li>Port 80 → App port 8000 (FastAPI HTTP)</li><li>Port 4317 → OTLP gRPC (Telemetry)</li></ul><h3>Kubernetes Manifest Breakdown</h3><p>Let’s break down the configuration:</p><h3>Deployment: house-price-service</h3><pre>apiVersion: apps/v1<br>kind: Deployment<br>metadata:<br>  name: house-price-service</pre><p>We declare a Deployment that manages our app.</p><pre>spec:<br>  replicas: 2</pre><p>We want <strong>2 replicas</strong> of our app running — high availability for the win.</p><pre>strategy:<br>    type: RollingUpdate<br>    rollingUpdate:<br>      maxSurge: 25%<br>      maxUnavailable: 25%</pre><p>Kubernetes will update pods <em>gracefully</em>. It allows some extra pods during rollout and ensures some stay alive.</p><pre>containers:<br>        - name: app<br>          image: house-price-predictor:v2</pre><p>We use the Docker image built in Part 2, deployed as a container.</p><pre>ports:<br>            - containerPort: 8000   # App port<br>            - containerPort: 4317   # OTLP telemetry port</pre><p>Complete Deployment Manifest:</p><pre>apiVersion: apps/v1<br>kind: Deployment<br>metadata:<br>  name: house-price-service<br>  labels:<br>    app: house-price-service<br>spec:<br>  replicas: 2<br>  revisionHistoryLimit: 3<br>  strategy:<br>    type: RollingUpdate<br>    rollingUpdate:<br>      maxSurge: 25%           # Allow 25% more pods than desired during update<br>      maxUnavailable: 25%     # Allow 25% of desired pods to be unavailable during update<br>  selector:<br>    matchLabels:<br>      app: house-price-service<br>  template:<br>    metadata:<br>      labels:<br>        app: house-price-service<br>    spec:<br>      containers:<br>        - name: app<br>          image: house-price-predictor:v2<br>          imagePullPolicy: IfNotPresent<br>          resources:<br>            requests:<br>              cpu: &quot;10m&quot;<br>   
           memory: &quot;128Mi&quot;<br>            limits:<br>              cpu: &quot;20m&quot;<br>              memory: &quot;256Mi&quot;<br>          ports:<br>            - containerPort: 8000   # Application Port<br>            - containerPort: 4317   # OTLP gRPC Port</pre><h3>Service: house-price-service</h3><pre>apiVersion: v1<br>kind: Service<br>metadata:<br>  name: house-price-service<br>  labels:<br>    app: house-price-service</pre><p>This <strong>ClusterIP</strong> Service lets other K8s workloads communicate with our app.</p><pre>ports:<br>    - port: 80<br>      targetPort: 8000<br>    - port: 4317<br>      targetPort: 4317</pre><p>The Service maps:</p><ul><li>Port 80 → App HTTP server</li><li>Port 4317 → For OTLP spans, metrics, logs</li></ul><p>Complete Service Manifest File:</p><pre>apiVersion: v1<br>kind: Service<br>metadata:<br>  name: house-price-service<br>  labels:<br>    app: house-price-service  <br>spec:<br>  selector:<br>    app: house-price-service<br>  ports:<br>    - name: http<br>      protocol: TCP<br>      port: 80<br>      targetPort: 8000<br>    - name: otlp-grpc<br>      protocol: TCP<br>      port: 4317<br>      targetPort: 4317<br>  type: ClusterIP</pre><p>Add both in the one file: house-price-app.yaml</p><h3>Deploying with kubectl</h3><p>Before deploying the app, let’s create a Kubernetes namespace. 
This helps group related resources together.</p><pre>kubectl create namespace mlapp</pre><p>Run the following to deploy your app:</p><pre>kubectl -n mlapp apply -f house-price-app.yaml</pre><p>To check the deployment status:</p><pre>kubectl -n mlapp get deployments<br>kubectl -n mlapp get pods</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/948/1*VzetOdbRM9GYG7lYMuO3ow.png" /></figure><p>To see pod logs (structured JSON + OpenTelemetry info):</p><pre>kubectl -n mlapp logs -f -l app=house-price-service</pre><p>To view the exposed service:</p><pre>kubectl -n mlapp get svc -l app=house-price-service</pre><h3>Testing the App in Kubernetes</h3><p>Get the Endpoint IP from the K8s service:</p><pre>API_ENDPOINT_IP=$(kubectl -n mlapp get svc -l app=house-price-service -o json | jq -r &#39;.items[].spec.clusterIP&#39;)</pre><p>Test it locally using curl or Postman:</p><pre>curl -X POST &quot;http://${API_ENDPOINT_IP}:80/predict/&quot; \<br>  -H &quot;Content-Type: application/json&quot; \<br>  -d &#39;{&quot;features&quot;: [1200]}&#39;</pre><p>You should get a prediction response like:</p><pre>{&quot;predicted_price&quot;: 170000.0}</pre><p>And voilà — telemetry data is flowing.</p><h3>What’s Next: Meet the OpenTelemetry Collector</h3><p>In <a href="https://medium.com/@kartikdudeja21/opentelemetry-in-action-on-kubernetes-part-4-deploying-opentelemetry-collector-agent-mode-df1b060b3a01">Part 4</a>, we’ll introduce the <strong>OpenTelemetry Collector Agent</strong>:</p><ul><li>Deploy it as a DaemonSet alongside your app</li><li>Configure it to collect traces, metrics, and logs</li><li>Route the data to a gateway, and onward to backends like Prometheus, Jaeger, and Loki</li></ul><blockquote><em>TL;DR: It’s where the real observability magic begins.</em></blockquote><pre>{<br>    &quot;author&quot;   :  &quot;Kartik Dudeja&quot;,<br>    &quot;email&quot;    :  &quot;kartikdudeja21@gmail.com&quot;,<br>    &quot;linkedin&quot; :  
&quot;https://linkedin.com/in/kartik-dudeja&quot;,<br>    &quot;github&quot;   :  &quot;https://github.com/Kartikdudeja&quot;<br>}</pre>]]></content:encoded>
        </item>
    </channel>
</rss>