<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Suyash Jain on Medium]]></title>
        <description><![CDATA[Stories by Suyash Jain on Medium]]></description>
        <link>https://medium.com/@jainsuyash2003?source=rss-e3e2d6a0739b------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*H7nc0u519y9P3AUD5-Fz9Q.png</url>
            <title>Stories by Suyash Jain on Medium</title>
            <link>https://medium.com/@jainsuyash2003?source=rss-e3e2d6a0739b------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sat, 30 May 2026 07:54:59 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@jainsuyash2003/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[From 17 Hours to 5 Minutes: Engineering a Production-Grade Reports Microservice]]></title>
            <link>https://medium.com/@jainsuyash2003/from-17-hours-to-5-minutes-engineering-a-production-grade-reports-microservice-b6c79ccd1def?source=rss-e3e2d6a0739b------2</link>
            <guid isPermaLink="false">https://medium.com/p/b6c79ccd1def</guid>
            <category><![CDATA[microservices]]></category>
            <category><![CDATA[software-development]]></category>
            <category><![CDATA[optimisation]]></category>
            <category><![CDATA[reporting]]></category>
            <dc:creator><![CDATA[Suyash Jain]]></dc:creator>
            <pubDate>Thu, 28 May 2026 06:12:57 GMT</pubDate>
            <atom:updated>2026-05-28T06:12:57.456Z</atom:updated>
            <content:encoded><![CDATA[<p><em>A software engineer’s honest account of inheriting a broken reporting system, diagnosing it from first principles, and rebuilding it into something that actually works.</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*r5UoSUkhb2QNszsRYVJuZw.png" /></figure><h3><strong>The Monday Morning Crisis</strong></h3><p>It was a routine Monday when the operations team flagged it first — the nightly reports hadn’t arrived. Again.</p><p>The system was supposed to generate 19 detailed operational reports every day, each aggregating data across 22+ MongoDB collections, powering dashboards that C-level stakeholders used to track field technician performance, work order completion rates, equipment installations, and service delays — nationwide. When a report failed or took too long, blind spots opened up across the entire operation.</p><p>I was six months into working on this Field Service Management platform when I inherited the reporting microservice. It was taking <strong>17+ hours</strong> to generate a single report cycle. The service crashed intermittently. Reports sometimes generated twice. And no one quite knew why.</p><p>This is the story of how I diagnosed, redesigned, and optimised that service — and every lesson that came from it.</p><p><strong>Understanding the Beast First</strong></p><p>Before touching a single line of code, I mapped out what the service was actually doing.</p><p>The reports were built on MongoDB aggregation pipelines — sequences of transformation stages that joined data from 22+ collections: work orders, assignments, technicians, contractors, inventory utilization, serialised equipment, territories, queues, delay reasons, appointment details, and more. Some collections held upward of <strong>7 million records</strong>. 
Every night, the pipeline would start from the primary work orders collection and sequentially join, filter, transform, and project data across all of them.</p><p>The result was a 19-column, deeply enriched flat report that operations teams could open in Excel and act on. Simple concept. Catastrophic in execution.</p><p>Here’s what was wrong — and how each problem was fixed.</p><h3><strong>The Database Was Flying Blind: Indexing</strong></h3><p>The single biggest performance gain — reducing report generation from <strong>17+ hours to under 5 minutes</strong> — came not from clever code, but from something foundational: <strong>indexes</strong>.</p><p>MongoDB, like any database, performs a full collection scan when it has no index to guide it. On a collection with 7 million records, every $match, every $lookup join, every filter condition was scanning millions of documents to find the handful it needed. Multiply that across 22 collections and you understand how 17 hours happens.</p><p>The fix was systematic indexing across every collection involved in the pipelines.</p><pre>// Single-field indexes for frequent filter &amp; sort fields<br>JobSchema.index({ isDeleted: 1, jobReferenceId: 1 });<br>JobSchema.index({ createdAt: -1 });<br>JobSchema.index({ status: 1 });<br><br>// Compound index matching the exact $match condition at the pipeline entry point<br>JobSchema.index({ status: 1, requiresDispatch: 1, origin: 1 });<br><br>// Indexes for every $lookup foreign key<br>JobSchema.index({ jobReferenceId: 1 });<br>JobSchema.index({ regionId: 1 });<br>JobSchema.index({ zoneId: 1 });<br>JobSchema.index({ facilityId: 1 });<br>JobSchema.index({ contractorId: 1 });<br><br>// Compound indexes for better performance on filtered lookups<br>JobSchema.index({ isDeleted: 1, status: 1 });</pre><p><strong>Why compound indexes matter here:</strong> MongoDB can use a compound index for queries that match a prefix of the index fields. 
The { status: 1, requiresDispatch: 1, origin: 1 } index directly services the opening $match stage of every pipeline, which filters on all three fields simultaneously. Without it, MongoDB scans the entire collection just to pick the eligible work orders.</p><p>The same logic applied to the equipment collections. A { serialNumber: 1 } index on the serialised equipment collection — 7 million records — turned every $lookup from a full collection scan into an index seek taking microseconds instead of seconds.</p><p><strong>The indexing rule of thumb:</strong> Every field that appears in a $match, a $lookup&#39;s foreignField, or a $sort in a frequently-run pipeline should have an index. The cost is write-time overhead and storage. On read-heavy reporting workloads, that trade is almost always worth it.</p><h3><strong>Filter Early, Compute Less: </strong><strong>$match at the Top</strong></h3><p>MongoDB’s aggregation pipeline processes documents stage by stage. Every document that enters a stage must be processed by it. The implication is simple but critical: <strong>reduce your dataset as early as possible.</strong></p><p>In the original pipelines, filter conditions — status checks, soft-delete filters, null guards — were scattered throughout the pipeline, sometimes only applied several stages deep after expensive $lookup joins had already bloated the working set.</p><p>The fix was mechanical but impactful: every $match, date range filter, and exclusion condition was hoisted to the very top of the pipeline.</p><pre>// Stage 1: Filter FIRST — before any joins happen<br>{<br>  $match: {<br>    updatedAt: { $gte: new Date(start), $lt: new Date(end) },<br>    status: { $ne: &#39;PENDING&#39; },<br>    requiresDispatch: true,<br>    isDeleted: { $ne: true },<br>    jobReferenceId: { $exists: true, $ne: null }<br>  }<br>}<br>// Only THEN do lookups, unwinds, groups...</pre><p>This isn’t just about speed — it’s about respect for your database’s memory budget. 
MongoDB’s aggregation pipeline has a 100MB memory limit per stage unless allowDiskUse is enabled. Starting with 500,000 work orders and joining them to 7 million equipment records in-memory is a path to crashes. Starting with 3,000 filtered work orders is manageable.</p><p>Additionally, $match stages at the top of a pipeline can leverage indexes. Move them down, and MongoDB may no longer be able to use the index at all.</p><p>Similarly, within each $lookup sub-pipeline, match conditions were pushed as early as possible and only the necessary fields were projected:</p><pre>{<br>  $lookup: {<br>    from: &#39;DeviceInventory&#39;,<br>    localField: &#39;serialNumber&#39;,<br>    foreignField: &#39;serialNumber&#39;,<br>    as: &#39;deviceDetails&#39;,<br>    pipeline: [<br>      { $match: { $expr: { $and: [<br>        { $eq: [&#39;$isDeleted&#39;, false] },<br>        { $ne: [&#39;$serialNumber&#39;, null] }<br>      ]}}},<br>      { $project: { _id: 1, serialNumber: 1, catalogCode: 1 }} // only what you need<br>    ]<br>  }<br>}</pre><p>The $project inside each lookup sub-pipeline is often overlooked. Without it, every matched document from a 7-million-record collection carries its full payload through the rest of the pipeline. With it, only three fields travel through memory per document.</p><h3>Breaking the Monolith Pipeline: Two-Stage Architecture with a Temp Collection</h3><p>One of the most architecturally significant changes was breaking each report’s single monolithic pipeline into two discrete stages, with a temporary MongoDB collection acting as a materialised checkpoint between them.</p><p><strong>Stage 1</strong> handled the heavy lifting: the initial $match, equipment joins, deduplication, and projection of only the fields needed for Stage 2. 
Results were written directly into a temporary collection using MongoDB&#39;s $merge operator.</p><p><strong>Stage 2</strong> read from that temporary collection — a much smaller, pre-filtered dataset — to perform enrichment lookups (customer details, address, appointment, technician, geography) and produce the final projection.</p><pre>// Stage 1: filter + equipment joins → write to temp collection<br>await this.aggregateQuery(&#39;JobOrders&#39;, stage1Pipeline);<br>// stage1Pipeline ends with:<br>// { $merge: { into: &#39;JobReportStaging&#39;, whenMatched: &#39;replace&#39;, whenNotMatched: &#39;insert&#39; } }<br><br>// Stage 2: enrich from temp collection → return final data<br>const results = await this.aggregateQuery(&#39;JobReportStaging&#39;, stage2Pipeline);</pre><p>Why does this matter so much? In a single-stage pipeline, every intermediate document — fully joined with equipment data — had to be kept in memory simultaneously while enrichment lookups were being performed. The working set was enormous.</p><p>With the two-stage split, Stage 2 operates on a compact, pre-resolved dataset. The enrichment lookups in Stage 2 join against much smaller reference collections (technicians, contractors, regions) and the base documents themselves are lean, containing only the fields that Stage 1 explicitly projected.</p><p>This is essentially <strong>materialised view thinking</strong> applied to a pipeline context. You compute the expensive part once, persist it, and then run lighter operations on the result. 
The temp collection also provides a natural debugging checkpoint: if Stage 2 fails, you can inspect exactly what Stage 1 produced without re-running the expensive first half, provided the cleanup-on-failure step shown next is skipped while debugging.</p><p>Cleanup is handled deterministically — before execution starts and after it ends (including on error), ensuring the temp collection never accumulates stale data:</p><pre>async execute(): Promise&lt;ReportRow[]&gt; {<br>  await this.cleanupStagingCollection(); // clear stale data left by any previous run<br>  try {<br>    await this.runStage1();<br>    return await this.runStage2();<br>  } finally {<br>    // runs on success and on failure, so staging data never lingers<br>    await this.cleanupStagingCollection();<br>  }<br>}</pre><h3>When Memory Is the Enemy: Batch Processing</h3><p>Even with the two-stage architecture, Stage 2 of certain reports — particularly those enriching work orders with deeply nested documents — was still capable of exhausting the Node.js heap. The documents were large. There were tens of thousands of them. 
Processing them all at once was simply not viable.</p><p>The solution was batch processing: slicing the staging collection into chunks of 2,000 records and running the enrichment pipeline independently on each chunk.</p><pre>const BATCH_SIZE = 2000;<br>const totalCount = await StagingModel.countDocuments({});<br>let processedCount = 0;<br>const allResults: ReportRow[] = [];<br><br>while (processedCount &lt; totalCount) {<br>  const batchPipeline: PipelineStage[] = [<br>    { $sort: { _id: 1 } },       // stable order so batches never overlap or skip documents<br>    { $skip: processedCount },<br>    { $limit: BATCH_SIZE },<br>    ...this.buildEnrichmentPipeline()<br>  ];<br><br>  const batchResults = await this.aggregateQuery(&#39;JobReportStaging&#39;, batchPipeline);<br>  allResults.push(...batchResults);<br>  processedCount += BATCH_SIZE;<br><br>  // Yield to GC between batches (global.gc is only defined when Node runs with --expose-gc)<br>  if (global.gc) global.gc();<br>  await new Promise(resolve =&gt; setTimeout(resolve, 500));<br>}</pre><p>The $sort on _id at the head of each batch pipeline keeps the batches deterministic: MongoDB does not guarantee document order without an explicit sort, so $skip and $limit alone can return overlapping or missing documents across batches. A sort on _id at the start of the pipeline can be served by the default _id index, so it stays cheap.</p><p>The 500ms pause between batches isn’t arbitrary slowness — it’s a deliberate yield to the garbage collector, allowing V8 to reclaim memory from the previous batch before the next one lands. The explicit global.gc() call only takes effect when Node.js is started with the --expose-gc flag; otherwise the pause alone gives the collector room to run. On large reports, without this pause, memory pressure accumulates batch by batch until the process OOMs.</p><p>The batch size of 2,000 was calibrated empirically. Too large and memory spiked. Too small and the overhead of repeated pipeline setup and round-trips to MongoDB slowed things down. 2,000 hit the sweet spot for the document sizes in this system.</p><h3>The Theoretical Ceiling: Streaming</h3><p>Batching solved the memory problem for most reports. But for the heaviest ones — work order collections where individual documents carried large embedded arrays (workflow history, notes, configuration details) — even a batch of 2,000 could produce a memory spike.</p><p>The architecturally ideal solution for such cases is <strong>streaming</strong>: processing one document at a time as it flows out of MongoDB, transforming it, and writing it to the output (file, response stream, etc.) 
without ever holding the full dataset in memory.</p><pre>import { createWriteStream } from &#39;fs&#39;;<br>import { pipeline } from &#39;stream/promises&#39;;<br>import { Transform } from &#39;stream&#39;;<br><br>const cursor = JobOrderModel.aggregate(stage1Pipeline).cursor({ batchSize: 100 });<br><br>const transformStream = new Transform({<br>  writableObjectMode: true, // documents in, CSV text out<br>  transform(doc, _encoding, callback) {<br>    const mapped = mapToReportRow(doc);<br>    // File streams accept strings or Buffers, not objects, so serialise each row.<br>    // Naive join: assumes the mapped values are already comma-escaped.<br>    callback(null, Object.values(mapped).join(&#39;,&#39;) + &#39;\n&#39;);<br>  }<br>});<br><br>const writableStream = createWriteStream(&#39;report-output.csv&#39;);<br><br>await pipeline(cursor, transformStream, writableStream);</pre><p>With cursor-based streaming, MongoDB sends documents to your application in small batches (batchSize: 100), your transform function processes each one and emits a row, and the write stream flushes it to output — all without the full dataset ever living simultaneously in memory. Peak memory usage becomes a function of batch size, not total record count.</p><p>For a reporting service generating CSV or Excel output, streaming is the gold standard approach. It decouples memory use from dataset size entirely.</p><h3><strong>The Deduplication Redesign: Eliminating </strong><strong>$sort from the Critical Path</strong></h3><p>This one deserves careful attention because it’s both a correctness improvement and a significant performance win.</p><p>The original pipeline faced a challenging deduplication problem. Each job could have multiple equipment items attached to it (a modem, a SIM card, a telephone set, a voice SIM). 
After joining with the equipment collections, the pipeline would produce one document per equipment item per job — meaning a job with four items appeared four times in the working set.</p><p>The original solution was to assign each row a priority score based on the quality of its equipment data, sort by that priority, and then group by job ID, taking the first (highest priority) row:</p><pre>// Old approach<br>assignEquipmentPriorityStage(),         // adds a numeric priority field<br>{ $sort: { equipmentPriority: 1 } },    // sorts entire working set<br>{ $group: { _id: &#39;$jobReferenceId&#39;, doc: { $first: &#39;$$ROOT&#39; } } }  // keep best row</pre><p>The fundamental problem: $sort on a large, unbounded working set is one of the most expensive operations in MongoDB&#39;s aggregation framework. It requires loading the entire intermediate dataset into memory (or spilling to disk), sorting it, and then discarding most of it. You&#39;re paying the full cost of sorting tens of thousands of rows just to pick one per group.</p><p>Worse, this approach had a correctness flaw: by picking a single “best” equipment row per job, you inevitably lost data. A job with both a modem and a telephone set would only surface one of them in the report. 
The other equipment was silently dropped.</p><p>The redesign eliminated $sort entirely and fixed the data loss simultaneously, using a two-pass grouping strategy:</p><pre>// Pass 1: group by (jobReferenceId + deviceCategory) — one row per equipment type per job<br>{<br>  $group: {<br>    _id: { jobReferenceId: &#39;$jobReferenceId&#39;, deviceCategory: &#39;$deviceCatalog.deviceCategory&#39; },<br>    jobRecord: { $first: &#39;$$ROOT&#39; },<br>    equipment: { $first: {<br>      serialNumber: &#39;$deviceInventory.serialNumber&#39;,<br>      catalogCode: &#39;$deviceInventory.catalogCode&#39;,<br>      deviceCategory: &#39;$deviceCatalog.deviceCategory&#39;<br>    }}<br>  }<br>}<br><br>// Pass 2: group by jobReferenceId — collect all equipment types into an array<br>{<br>  $group: {<br>    _id: &#39;$_id.jobReferenceId&#39;,<br>    doc: { $first: &#39;$jobRecord&#39; },<br>    allEquipment: { $push: { deviceCategory: &#39;$_id.deviceCategory&#39;, data: &#39;$equipment&#39; } }<br>  }<br>}<br><br>// Then extract each type by filtering the collected array<br>{<br>  $addFields: {<br>    &#39;doc.modemDevice&#39;: {<br>      $arrayElemAt: [<br>        { $map: {<br>          input: { $filter: { input: &#39;$allEquipment&#39;, cond: { $eq: [&#39;$$this.deviceCategory&#39;, &#39;Modem&#39;] } } },<br>          in: &#39;$$this.data&#39;<br>        }},<br>        0<br>      ]<br>    },<br>    &#39;doc.simDevice&#39;: { /* same pattern for SIM, Handset, Voice SIM */ }<br>  }<br>}</pre><p>Instead of sorting and discarding, the pipeline now explicitly collects one representative row per equipment type per job in the first group, then assembles all equipment types together in the second group. The final $addFields stage simply filters the collected array by category to extract each equipment field.</p><p>No sort. No data loss. 
Each job in the output now correctly surfaces its modem, its SIM, its handset, and its voice SIM as distinct fields — because all four were preserved through the two-pass group, not collapsed away by a sort-and-take-first.</p><h3>From Server Crons to Infrastructure Crons: The Architecture Mistake</h3><p>Perhaps the most consequential design flaw in the original service wasn’t in the pipeline code at all — it was in the scheduling layer.</p><p>The reports ran on a schedule: one at 3 AM, one at 11:30 PM, others at various intervals. The original developers had implemented these schedules as <strong>in-process cron jobs</strong> — setInterval or node-cron timers living inside the Node.js process itself.</p><p>The consequences were severe and interrelated:</p><p>The microservice had to run <strong>24 hours a day, 7 days a week</strong> — consuming CPU and memory allocations continuously — just to fire a cron job for 20 minutes a day. The allocated resources were sized for the report generation workload (which was intensive) but those resources sat idle for 23+ hours daily.</p><p>When the report cron fired and demanded more compute, Kubernetes’ autoscaler would spin up additional pods to handle the load. Each pod had the in-memory cron timer running. So now multiple pods were all attempting to run the same report simultaneously, generating <strong>duplicate reports</strong>. This was the mysterious duplication bug. It wasn’t a code bug — it was an architectural one baked into the scheduling approach itself.</p><p>The fix was to move report scheduling entirely to <strong>Kubernetes CronJobs</strong>. 
Instead of a long-running service with internal timers, each report became a containerised job that Kubernetes would spin up at the scheduled time, run to completion, and terminate.</p><pre># Kubernetes CronJob: Run report at 3 AM daily<br>apiVersion: batch/v1<br>kind: CronJob<br>metadata:<br>  name: job-report-3am<br>spec:<br>  schedule: &quot;0 3 * * *&quot;<br>  concurrencyPolicy: Forbid   # The key: never run two instances simultaneously<br>  jobTemplate:<br>    spec:<br>      template:<br>        spec:<br>          containers:<br>          - name: report-runner<br>            image: report-service:latest<br>            command: [&quot;node&quot;, &quot;run-report.js&quot;, &quot;--report=3am&quot;]<br>          restartPolicy: OnFailure</pre><p>concurrencyPolicy: Forbid is the safeguard that the old design could never provide: if a previous job is still running when the next scheduled time arrives, Kubernetes simply skips it rather than spawning a duplicate. Run-once, generate-once is now enforced at the infrastructure level instead of being left to chance inside each pod. Kubernetes still recommends keeping the job itself idempotent, because in rare circumstances a schedule can fire twice or not at all.</p><p>The broader principle: <strong>scheduling belongs to infrastructure, not application code</strong>. In-process schedulers create hidden state, prevent horizontal scaling, and couple availability requirements to task frequency. Offloading to a job scheduler (Kubernetes CronJobs, AWS EventBridge + Lambda, Cloud Scheduler + Cloud Run) separates these concerns cleanly.</p><h3><strong>Don’t Join What You Already Have</strong></h3><p>In complex aggregation pipelines, there’s a tendency to reach for $lookup reflexively — if you need data, join the collection that has it. 
But joins carry real cost: index lookups, document loading, memory allocation, pipeline execution overhead.</p><p>Before adding any new lookup stage, it’s worth asking: <strong>is this data actually unavailable in what I already have?</strong></p><p>In practice, several enrichment lookups in the original pipelines were unnecessary because the data already existed in an already-joined collection, just under a different field name. In other cases, derived data could be computed from existing fields — a display label from two existing string fields, a boolean from a status enum, a count from an array length.</p><pre>// Instead of joining another collection to get a technician&#39;s display name:<br>technicianDisplayName: {<br>  $concat: [<br>    { $ifNull: [&#39;$assignedAgent.firstName&#39;, &#39;&#39;] },<br>    &#39; &#39;,<br>    { $ifNull: [&#39;$assignedAgent.lastName&#39;, &#39;&#39;] }<br>  ]<br>}<br><br>// Instead of joining for record count when the array is already present:<br>numberOfSiteAddresses: {<br>  $cond: {<br>    if: { $isArray: &#39;$siteDetails.installationRecords&#39; },<br>    then: { $size: &#39;$siteDetails.installationRecords&#39; },<br>    else: 0<br>  }<br>}</pre><p>The discipline of auditing each lookup — asking “do I truly need a separate collection for this?” — trimmed several unnecessary joins from the pipelines. Each eliminated lookup is a round-trip to disk avoided, a sub-pipeline not executed, and a set of joined documents not held in memory.</p><h3>Observability: Structured Logging at Every Stage</h3><p>A pipeline that fails silently is worse than one that crashes loudly. 
In a system generating reports that C-level stakeholders depend on, knowing exactly where and why a failure occurred is non-negotiable.</p><p>Every stage of every report was wrapped with structured logging at entry, at checkpoints, and at failure:</p><pre>logger.info({<br>  functionName: &#39;DailyJobReport&#39;,<br>  message: &#39;Starting Stage 1 (filter + equipment processing)&#39;,<br>  data: { start, end }<br>});<br><br>// ... stage 1 runs ...<br><br>logger.info({<br>  functionName: &#39;DailyJobReport&#39;,<br>  message: &#39;Stage 1 complete, beginning Stage 2 enrichment&#39;<br>});<br><br>// ... stage 2 runs ...<br><br>logger.info({<br>  functionName: &#39;DailyJobReport&#39;,<br>  message: &#39;Pipeline complete&#39;,<br>  data: { resultCount: results.length }<br>});</pre><p>And on failure:</p><pre>logger.error({<br>  functionName: &#39;DailyJobReport&#39;,<br>  message: &#39;Pipeline execution failed&#39;,<br>  data: { error: error.message, stack: error.stack }<br>});</pre><p>Structured logging (objects, not strings) means your log aggregation system (Datadog, ELK, GCP Logging) can index and query on functionName, message, data.resultCount. When a report fails at 3 AM, you don&#39;t scroll through raw text — you query functionName = &quot;DailyJobReport&quot; AND level = &quot;error&quot; and the relevant entry surfaces immediately.</p><p>The resultCount in success logs is particularly useful: a report that completes successfully but returns 0 rows is a data pipeline issue, not a code crash. 
That distinction matters enormously when debugging report discrepancies.</p><h3>Time is Tricky: UTC in the Database, Local Time in the UI</h3><p>For a system generating reports consumed by operations teams in a specific timezone, time handling is a source of subtle, hard-to-debug data discrepancies.</p><p>The principle adopted was straightforward: <strong>the database stores everything in UTC, always.</strong> createdAt, updatedAt, appointment timestamps, delay timestamps — all UTC. The database is the single source of truth, and UTC ensures there&#39;s no ambiguity from daylight saving transitions or regional clock differences.</p><p>Time zone conversion happens exactly once, at the presentation layer — in the pipeline’s projection stage when formatting dates for the report output:</p><pre>// In the aggregation projection: convert UTC to local timezone for display<br>createdAt: formatDateInPipeline(&#39;$createdAt&#39;)<br>// formatDateInPipeline uses $dateToParts with a configured timezone<br>// then formats as MM/DD/YYYY HH:MM:SS AM/PM for the report consumer&#39;s locale</pre><p>The frontend, separately, reads timestamps from the API and formats them using the user’s browser timezone for real-time displays. Two consumers, two formats — but both working from the same UTC source.</p><p>The antipattern to avoid: storing timestamps in application-local time. This makes the database’s timestamps dependent on where the application runs, breaks if the application moves servers, and makes cross-timezone reporting impossible.</p><h3>Eliminating Single Points of Failure: Pipeline Independence</h3><p>The original architecture had a particularly fragile pattern: a single aggregation pipeline that queried all job orders — installations, repairs, and miscellaneous types — and then used in-memory JavaScript filtering afterward to split the results into three separate reports.</p><p>This meant one pipeline failure killed all three reports. 
It also meant the pipeline had to be sized for the union of all three workloads, carrying data for all types through every stage even when only one type needed certain lookups.</p><p>Each report type was separated into its own independent pipeline:</p><ul><li><strong>Installation Report</strong>: $match: { jobType: &#39;INSTALL&#39; } — optimised for install-specific fields</li><li><strong>Repair Report</strong>: $match: { jobType: &#39;REPAIR&#39; } — includes diagnostic codes, not relevant to installs</li><li><strong>General Report</strong>: $match: { jobType: { $nin: [&#39;INSTALL&#39;, &#39;REPAIR&#39;] } }</li></ul><p>This breaks the DRY (Don’t Repeat Yourself) principle in the narrow sense — some lookup stages appear in multiple pipelines. But it adheres to a more important systems principle: <strong>fault isolation</strong>. A data anomaly in repair job orders doesn’t stall the installation report. Each pipeline can be scheduled, monitored, and debugged independently.</p><p>Where genuine duplication is a concern, shared pipeline segments can be extracted as factory functions:</p><pre>function createCustomerLookupStages(collectionName: string): PipelineStage[] {<br>  return [<br>    createBaseLookupStage(collectionName, &#39;jobReferenceId&#39;, &#39;jobReferenceId&#39;, &#39;customerInfo&#39;),<br>    { $unwind: { path: &#39;$customerInfo&#39;, preserveNullAndEmptyArrays: true } }<br>  ];<br>}</pre><p>The factory function pattern gives you reuse at the composition level without coupling pipeline execution. Each report assembles its own pipeline from shared building blocks but runs it independently.</p><h3><strong>Beyond the Mapper: Smarter Data Shaping</strong></h3><p>After the aggregation pipeline completed, the raw output passed through a mapper function that translated MongoDB field names to report column headers. 
It worked, but it was a layer of indirection that added complexity and was easy to break silently — a field name change in the pipeline produced a blank column with no error.</p><pre>// The mapper pattern: verbose, fragile, hard to keep in sync<br>function jobReportMapper(data: Record&lt;string, any&gt;): ReportRow {<br>  return {<br>    &#39;Job Reference ID&#39;: data?.jobReferenceId ?? &#39;&#39;,<br>    &#39;Customer Name&#39;: escapeCommas(data?.customerFirstName ?? &#39;&#39;) + &#39; &#39; + ...,<br>    &#39;Status&#39;: data?.status ?? &#39;&#39;,<br>    // ...60 more fields<br>  };<br>}</pre><p>A more robust approach is to move the column naming into the aggregation projection itself, eliminating the mapper entirely:</p><pre>// Projection with final column names as field names<br>{<br>  $project: {<br>    &#39;Job Reference ID&#39;: { $ifNull: [&#39;$jobReferenceId&#39;, &#39;&#39;] },<br>    // $concat yields null if any operand is null or missing, so guard each part<br>    &#39;Customer Name&#39;: { $concat: [<br>      { $ifNull: [&#39;$customerFirstName&#39;, &#39;&#39;] }, &#39; &#39;, { $ifNull: [&#39;$customerLastName&#39;, &#39;&#39;] }<br>    ] },<br>    &#39;Status&#39;: { $ifNull: [&#39;$status&#39;, &#39;&#39;] }<br>  }<br>}</pre><p>When the projection output IS the report schema, there’s no translation layer to go out of sync. 
The aggregation result can be streamed directly to CSV serialisation.</p><p>For systems where the pipeline and output schema genuinely need to be decoupled, a <strong>Zod schema or TypeScript type-safe mapper</strong> turns mismatches into compile-time errors (with a typed mapper) or explicit, logged validation failures at runtime (with Zod), rather than silent blanks:</p><pre>import { z } from &#39;zod&#39;;<br><br>const ReportRowSchema = z.object({<br>  jobReferenceId: z.string(),<br>  status: z.string(),<br>  customerFirstName: z.string()<br>});<br><br>// Validation at the boundary: know immediately if pipeline output doesn&#39;t match schema<br>const validated = ReportRowSchema.safeParse(rawDoc);<br>if (!validated.success) {<br>  logger.error({ message: &#39;Schema mismatch in report output&#39;, errors: validated.error.issues });<br>}</pre><h3>The Honest Conversation About Database Choice</h3><p>MongoDB was the right choice for this project given the constraints — the wider platform was already built on it, the client had existing infrastructure, and the flexible document model genuinely fit the deeply nested, variable-schema nature of field service job orders.</p><p>But for a reporting microservice specifically, it’s worth being direct: <strong>MongoDB is not the ideal database for analytical reporting workloads.</strong></p><p>If the reporting service had been designed from scratch in isolation, the honest recommendation would be:</p><p><strong>PostgreSQL with read replicas</strong> would have been a strong fit. The report schemas are ultimately flat tabular data. SQL’s declarative joins, window functions, and query planner are purpose-built for the kind of cross-table aggregations this service performs. Read replicas would separate reporting load from transactional load cleanly, preventing report queries from impacting application response times.</p><p><strong>BigQuery or a dedicated data warehouse</strong> would be the right answer at scale. 
For 19 daily reports aggregating millions of records, you want columnar storage (which BigQuery, Redshift, and Snowflake provide), separate compute for analytical queries, and a proper ETL pipeline moving data from the operational MongoDB into the warehouse on a schedule. The reporting service then becomes lightweight query execution against pre-optimised analytical storage, rather than a complex MongoDB aggregation marathon.</p><p><strong>InfluxDB or TimescaleDB</strong> would be specifically appropriate if the reports were primarily time-series in nature, for example monitoring technician response times over rolling windows.</p><p>The pattern to aspire to is <strong>CQRS at the infrastructure level</strong>: the operational system (MongoDB, optimised for writes and transactional reads) feeds an analytical store (PostgreSQL or BigQuery), and the reporting service reads exclusively from the analytical store. The two workloads never compete.</p><p>When you’re handed an existing stack, you optimise within it. But when you have the luxury of design, separate your OLTP and OLAP concerns from day one.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*TtimIi_QQbOOHHya5Xp5Cg.png" /><figcaption>The Numbers, in Summary</figcaption></figure><h3>What I’d Tell Myself at the Start</h3><p>Reporting microservices are deceptively complex. They look like read-only, low-stakes services until the operations team can’t start their day because the overnight report didn’t generate, or it generated three times, or it generated but the equipment serial numbers are all blank.</p><p>The optimisations in this piece aren’t exotic. They’re disciplined application of fundamentals: index what you join, filter before you expand, isolate failures, let infrastructure handle scheduling, and prefer streaming to batching, and batching to loading everything at once.</p><p>The 17-hour pipeline wasn’t slow because MongoDB is slow or Node.js is slow. 
It was slow because no one had thought carefully about what the database needed to do its job. Given the right indexes, the right query structure, and the right execution model, the same stack that crawled for 17 hours finished in under 5 minutes.</p><p>The tools are rarely the problem. The thinking around them usually is.</p><p><em>“If you want momentum, you’ll have to create it yourself, right now, by getting up and getting started.” - Ryan Holiday</em></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Upcoming Revolution in 3D Design: Computational Design for Creativity & BeeGraphy]]></title>
            <link>https://medium.com/@jainsuyash2003/upcoming-revolution-in-3d-design-computational-design-for-creativity-beegraphy-ef6009bdd64b?source=rss-e3e2d6a0739b------2</link>
            <guid isPermaLink="false">https://medium.com/p/ef6009bdd64b</guid>
            <category><![CDATA[computational-design]]></category>
            <category><![CDATA[design]]></category>
            <category><![CDATA[3d]]></category>
            <category><![CDATA[creativity]]></category>
            <category><![CDATA[model]]></category>
            <dc:creator><![CDATA[Suyash Jain]]></dc:creator>
            <pubDate>Wed, 11 Dec 2024 17:38:32 GMT</pubDate>
            <atom:updated>2024-12-11T17:38:32.715Z</atom:updated>
            <content:encoded><![CDATA[<blockquote>Computational Design has grown significantly over the past 10–12 years and is rapidly attracting the attention of the industry’s leading architects and designers. It has considerably transformed the way modern architecture, engineering and art are perceived, opening doors to more efficient, customised and creative solutions. It has not just enhanced existing practices but also enabled entirely new possibilities, giving rise to astounding projects that blend functionality with artistic beauty.</blockquote><p>Have you ever wondered what the most effective ways are to address inefficient workflows and limited design flexibility in 3D modelling, especially when adapting to evolving project requirements?</p><p><strong>Traditional methods of 3D design</strong> often struggle to keep pace with the demands of modern projects. These methods rely on <strong>linear workflows</strong>, which limit the ability to make real-time changes without disrupting the entire design process. Such rigidity becomes a significant hurdle when dealing with complex structures or last-minute client requirements. Additionally, traditional tools often have <strong>high entry barriers</strong>, requiring advanced technical expertise and expensive software setups to handle intricate modelling tasks. This restricts accessibility for independent creators, small firms, or those new to the field. Scaling or customising designs dynamically is another challenge: manual modifications are time-intensive and prone to errors, making the entire process inefficient.</p><p>Enter <strong>Computational Design</strong>, a game-changer in this domain. Leveraging advanced algorithms, real-time simulations, and generative modelling, it allows designers to break free from these constraints. 
Computational Design is like having a super-smart assistant that takes on creative requirements and makes the results even better, all much faster than a person could manage alone. It has reshaped modern architecture, engineering, and digital art by enabling new forms of creativity, precision, and efficiency. Let’s dive deeper into computational design’s contributions to these fields:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_uUgvRLuDb4hKg4ZXSXVjg.png" /><figcaption>Canary Wharf Underground Station, London, United Kingdom</figcaption></figure><h4>Architecture</h4><ol><li><strong>Parametric Design: </strong>It enables architects to work with algorithms that generate diverse and adaptable forms from specific parameters, making it possible to design buildings with intricate geometries, curves &amp; structures that would be impossible or highly labour-intensive using traditional methods.</li><li><strong>Efficiency and Optimisation: </strong>Using computational simulations, architects can optimise material usage and energy efficiency while maintaining structural integrity. 
These tools can simulate environmental conditions such as lighting, airflow, and thermal performance, leading to buildings that are more sustainable and cost-effective.</li><li><strong>Bespoke Designs: </strong>Computational Design has opened the door to highly personalised buildings and facades that align with specific user needs or environmental contexts, producing structures that are contextually relevant.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*h32CQx3fUwUzIvMgULpgqw.png" /><figcaption>Electrical Tower, Sheffield, United Kingdom</figcaption></figure><h4>Engineering</h4><ol><li><strong>Structural Optimisation: </strong>Engineers can use computational tools to design and rigorously test structures for optimal performance. These tools allow for complex load-bearing simulations, enabling the creation of safer and more resilient structures.</li><li><strong>Generative Design: </strong>Generative design tools that use AI algorithms can explore thousands of possible design solutions that satisfy specific functional or material constraints, leading to more innovative solutions, with engineering applications ranging from bridges and skyscrapers to automotive and aerospace design.</li><li><strong>Rapid Prototyping: </strong>Digital models created using computational design can be quickly tested, modified, and fabricated through advanced technologies like 3D printing and CNC machining, drastically reducing development time and costs.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*T6Ax6pqAA2Nr84xK98857g.png" /><figcaption>A modular rotating structure to be docked on a spacecraft or a station in orbit, Designed by Stefan Lehner</figcaption></figure><h4>Digital Art</h4><ol><li><strong>Algorithmic Creativity: </strong>Digital artists can now create complex visual 
art that would be impossible to achieve manually. Pushing the boundaries of traditional art, algorithms can now generate unique, dynamic patterns, 3D forms, and immersive environments.</li><li><strong>Interactive Art: </strong>Artists can now integrate real-time data and user interaction into their designs, creating responsive installations that change based on user input or environmental factors. This has revolutionised the art gallery and public space experience.</li><li><strong>Generative Art: </strong>Artwork that evolves over time or responds to input data can now be created. These pieces can be algorithmically generated, creating an infinite variety of outputs, often with machine learning playing a role in evolving the art beyond traditional means.</li></ol><h4>Interdisciplinary Synergy</h4><ol><li><strong>Collaboration Between Fields:</strong> The convergence of computational design across <strong>AEC (Architecture, Engineering, and Construction)</strong> fosters an interdisciplinary approach to problem-solving. 
Designers, engineers, and artists can collaborate using shared digital platforms, leading to innovative, efficient, and visually captivating solutions that combine functionality with artistic expression.</li></ol><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FPnUi_JL3KYA%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DPnUi_JL3KYA&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FPnUi_JL3KYA%2Fhqdefault.jpg&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/f93262708b39a77173ce9c8942fc9481/href">https://medium.com/media/f93262708b39a77173ce9c8942fc9481/href</a></iframe><p>The real-world impact of Computational Design is best illustrated through platforms like <a href="https://beegraphy.com/"><strong>BeeGraphy</strong></a>, which bring these principles to life.</p><p>Unlike conventional tools that require extensive manual adjustments, BeeGraphy integrates <strong>computational design principles</strong> into an intuitive, cloud-based web platform. With a simple yet profound mission of making 3D design tools <strong>accessible, flexible, and collaborative</strong>, BeeGraphy harnesses <strong>parametric design</strong> at its core. This allows users to create 3D models that can be quickly modified by adjusting parameters and generate <strong>print-ready 2D structures</strong> that can be assembled to form the final 3D creation.</p><p>A key feature of BeeGraphy is its ability to <strong>bridge the gap between traditional 3D design workflows and algorithmic design approaches</strong>. Its <strong>real-time collaboration tools</strong> enable teams to work together seamlessly, fostering innovation and efficiency in design processes. 
By utilising <strong>cloud infrastructure</strong>, the platform ensures scalability and accessibility, allowing designers to focus more on creativity and less on technical constraints. This makes it easier for small teams or individual creators to access advanced design capabilities without the need for costly setups.</p><p>BeeGraphy particularly empowers designers by eliminating the steep learning curve often associated with computational tools. It enables the <strong>customisation of parametric models without requiring programming expertise</strong>, providing a user-friendly interface that ensures flexibility and adaptability. Designers can work with <strong>intelligent, dynamic templates</strong> that respond instantly to changes, saving time and effort. Moreover, the platform’s collaborative features ensure that teams can refine designs in real time, breaking down the silos that often hinder creative projects.</p><p>Looking ahead, BeeGraphy is poised to play a pivotal role in shaping the future of 3D design. By <strong>lowering the entry barrier</strong> for computational design, BeeGraphy encourages more individuals and smaller organisations to adopt advanced design methodologies. This has the potential to <strong>inspire a wave of innovation and creativity </strong>across industries, from architecture to product design. BeeGraphy is not just a tool for today’s designers; it is a platform that redefines how we will design and create in the years to come.</p><p>Computational Design is redefining the world of 3D design, enabling creators to move beyond traditional constraints and embrace limitless possibilities. By blending creativity with advanced algorithms, it empowers architects, engineers, and artists to craft efficient, innovative, and visually stunning solutions. 
Platforms like BeeGraphy are making these tools accessible to everyone, fostering collaboration and creativity without the need for steep learning curves or expensive setups.</p><p>As technology continues to evolve, Computational Design will play a central role in shaping the future of industries, driving sustainability, and making complex designs simple and achievable. It is not just transforming how we create, but also inspiring a new wave of designers to dream big and turn ideas into reality.</p><p><strong><em>The future of design is here, and it’s computational!</em></strong></p>]]></content:encoded>
        </item>
    </channel>
</rss>