<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Aurelijus Banelis on Medium]]></title>
        <description><![CDATA[Stories by Aurelijus Banelis on Medium]]></description>
        <link>https://medium.com/@aurelijus-banelis-home24?source=rss-444ef2c0b995------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*YdeCcsggE6bTreNQIpjiRg.jpeg</url>
            <title>Stories by Aurelijus Banelis on Medium</title>
            <link>https://medium.com/@aurelijus-banelis-home24?source=rss-444ef2c0b995------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sat, 16 May 2026 17:07:46 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@aurelijus-banelis-home24/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Debugging GraphQL schema change in Golang app]]></title>
            <link>https://medium.com/home24-technology/debugging-graphql-schema-change-in-golang-app-64098de9ff06?source=rss-444ef2c0b995------2</link>
            <guid isPermaLink="false">https://medium.com/p/64098de9ff06</guid>
            <category><![CDATA[graphql]]></category>
            <category><![CDATA[monitoring]]></category>
            <category><![CDATA[golang]]></category>
            <category><![CDATA[debugging]]></category>
            <category><![CDATA[lambda]]></category>
            <dc:creator><![CDATA[Aurelijus Banelis]]></dc:creator>
            <pubDate>Thu, 23 Sep 2021 08:24:38 GMT</pubDate>
            <atom:updated>2022-03-09T19:30:38.456Z</atom:updated>
            <cc:license>https://creativecommons.org/licenses/by-sa/4.0/</cc:license>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*yUHPcCXtAY8QrNi6m_Jp5g.png" /></figure><p><a href="https://graphql.org/">GraphQL</a> was created to have <strong>stable APIs</strong> via <a href="https://graphql.org/learn/schema/">strict schemas</a> on a server-side and the more relaxed <a href="https://graphql.org/learn/queries/">query/mutate/subscribe</a> methods from the client-side. You might imagine a feeling when schema validation stops working <strong>randomly after some requests</strong>. And you might imagine that investigation of those kinds of issues are the fun<strong> </strong>ones.</p><p><em>Are you ready for </em><a href="https://medium.com/home24-technology/debugging-node-js-memory-leak-on-production-via-shadowed-traffic-cd8198d3df28"><em>another</em></a><em> debugging story?</em></p><h3>Schema difference tool</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/664/1*6r1cAEOSu0KgYYz8nCPHKA.png" /></figure><p>It all started, when we saw some false positives from <a href="https://github.com/kamilkisiela/graphql-inspector">graphql-inspector</a> tool.</p><p>We are checking schema changes after each GraphQL functionality change (Pull request hook) to prevent breaking changes for GraphQL clients in production.</p><p>From the first impression, it looked like some configuration issue after pre-production/staging infrastructure changes (it was a coincidence that both changes were made within a similar time range).</p><p>While we could easily double-check affected GraphQL queries (or mutations) — it was a low priority task.</p><h3>The obvious: maybe it is already fixed?</h3><p>One day we finally picked that difference checker task, hoping to just upgrade the tool and see the issue gone.</p><p>Unfortunately, the upgrade of the <a href="https://github.com/kamilkisiela/graphql-inspector">graphql-inspector</a> version did not help and we (thanks to a colleague, James) figured out, that pre-production and production environments were returning different results. Those two environments are supposed to run the same code and therefore – supposed to return the same results.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/e7f2d8b6d2913d0a5112799cd4fdf385/href">https://medium.com/media/e7f2d8b6d2913d0a5112799cd4fdf385/href</a></iframe><p>As you can see from the code example, ! means that argument <a href="https://graphql.org/learn/schema/#object-types-and-fields">is mandatory</a>. Therefore input arguments of type Locale! should be validated always<strong>. </strong>But for some docker containers (applications running for some time) — input validation did not work<strong>.</strong></p><h3><strong>Signs of a bigger problem</strong></h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/576/1*kPUwRH7mm_-cEISziu2SmA.png" /></figure><p>Testing environment you cannot trust (or releasing new versions blindly) was not a good thing, so this low-priority task had to be taken more seriously.</p><p>We started from obvious things: searching for some stupid typo mistake by aligning pre-production and production configuration. 
<h3><strong>Signs of a bigger problem</strong></h3>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/576/1*kPUwRH7mm_-cEISziu2SmA.png" /></figure>
<p>A testing environment you cannot trust (in other words, releasing new versions blindly) is not a good thing, so this low-priority task had to be taken more seriously.</p>
<p>We started with the obvious things: searching for a stupid typo by aligning the pre-production and production configuration. To our disappointment, the same Docker image, the same environment variables, and the same AWS roles were still reproducing the same error.</p>
<p>When hope was almost replaced by frustration, we finally saw the same issue in the staging environment.</p>
<p><em>So we were one small step further: the issue was reproducible (the good news), but (the bad news) it had started without a code change.</em></p>
<h3><strong>Managing the randomness</strong></h3>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*kubVsgIRBrtNBq5e90GhfQ.png" /><figcaption>Comparing request types before and after the behavior changed</figcaption></figure>
<p>For GraphQL application development, we have multiple environments (local, staging, pre-production, and production) — some are best for debugging, others are best for catching integration and configuration mistakes.</p>
<p>Ideally, we wanted to reproduce the issue in the <em>Local</em> environment, so the problem could be narrowed down until the real cause was easy to find. Unluckily, the issue appeared randomly, after some time and some number of calls to the application.</p>
<p>Replaying requests did not seem like an option, because not all requests were <a href="https://en.wikipedia.org/wiki/Idempotence">idempotent</a>, and timing or caching were also candidates for the cause of the issue.</p>
<p>So we ended up with a simple Lambda cron (<a href="https://aws.amazon.com/lambda/">a small application on AWS</a>) that kept calling the Staging environment to reproduce the symptoms of the issue. The Staging environment had less traffic, so we hoped it would be easier to narrow things down to fewer examples to double-check.</p>
<p><a href="https://medium.com/media/0b1c753a6a1e7312c65fa5496b82a3ac/href">https://medium.com/media/0b1c753a6a1e7312c65fa5496b82a3ac/href</a></p>
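<p>For the curious, a minimal sketch of such a Lambda cron in Go could look roughly like this (the endpoint, operation name and query below are made up; only the idea of a dedicated OperationName for the synthetic traffic matters):</p>
<pre>// A rough sketch of a scheduled Lambda that keeps poking the Staging GraphQL
// endpoint. The endpoint, operation name and query are hypothetical.
package main

import (
    "bytes"
    "context"
    "encoding/json"
    "fmt"
    "net/http"

    "github.com/aws/aws-lambda-go/lambda"
)

const stagingEndpoint = "https://staging.example.com/graphql"

func handler(ctx context.Context) error {
    body, _ := json.Marshal(map[string]any{
        // A dedicated operation name makes the synthetic calls easy to filter in the logs.
        "operationName": "SchemaDriftCron",
        "query":         `query SchemaDriftCron($locale: Locale!) { product(id: "SKU-1", locale: $locale) { name } }`,
        "variables":     map[string]any{"locale": "de_DE"},
    })
    req, err := http.NewRequestWithContext(ctx, http.MethodPost, stagingEndpoint, bytes.NewReader(body))
    if err != nil {
        return err
    }
    req.Header.Set("Content-Type", "application/json")
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    fmt.Println("staging responded with", resp.StatusCode)
    return nil
}

func main() {
    // Triggered by an EventBridge (CloudWatch Events) schedule rule.
    lambda.Start(handler)
}</pre>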
<p><em>We automated everything we could — now we only needed to wait and gather more data (assuming our monitoring tools were also ready).</em></p>
<h3><strong>Monitoring GraphQL</strong></h3>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/867/1*GU7idEtV-wpcAdM01nf2kg.png" /><figcaption>Because the HTTP URL is always the same, we use the operation name as a short and meaningful identifier</figcaption></figure>
<p>Because GraphQL is flexible regarding queries, logging the full GraphQL input could fill up your storage quite fast. Also, we want to be careful not to store sensitive data (secrets, personal data, etc.). In the company, we have agreed on 2 GraphQL best practices:</p>
<ul><li><strong>Variables</strong> are used as arguments for common calls (this is also beneficial for <a href="https://gqlgen.com/reference/apq/">Automatic persisted queries</a>). Sensitive data in those parameters is obfuscated in the logs.</li><li><strong>OperationName</strong> is an optional parameter that provides a human-readable description of a query (or a mutation). OperationName is the closest thing to the <em>endpoint</em> concept in a <a href="https://swagger.io/resources/open-api/">REST API</a>.</li></ul>
<p>Logging the OperationName and cache-usage metrics was the optimal choice for both privacy and resource usage, even in the Production environment.</p>
<p>Of course, longer monitoring, filtering by container, and a special OperationName for the Lambda cron were useful additions.</p>
<p><em>Script to reproduce the error: ready. Behavior logging: ready. Monitoring to identify when the issue starts: ready. Text difference tools: ready. The only thing still missing was the hope to finally solve it.</em></p>
<h3><strong>Reproducible locally, what’s next</strong></h3>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*8IHURrXovfdX5EEp5WqxoQ.png" /><figcaption>Debugging variable state and looking for anomalies (E.g. NonNull changing from true to false)</figcaption></figure>
<p>The Lambda tool combined with GraphQL monitoring gave a much smaller set of OperationNames to double-check. Luckily, searching the company’s GitHub for a particular OperationName string gave quite good GraphQL query examples to test locally.</p>
<blockquote>I still remember that moment of joy when I loudly said: “<strong><em>yey, I reproduced it locally!</em></strong>”</blockquote>
<p><a href="https://medium.com/media/6771bc506230a002b7539b6096df300d/href">https://medium.com/media/6771bc506230a002b7539b6096df300d/href</a></p>
<p>After all this mystery-solving, we still needed to debug the Go (aka <em>Golang</em>) code:</p>
<ul><li>Check for obvious mistakes in our code</li><li>Make testing easy (E.g. simplify the GraphQL query so it has fewer fields and no special authentication)</li><li>Follow the execution flow and look for unexpected state</li><li>Narrow down the issue by changing the code (dirty, but fast fixes)</li><li>Once the hypothesis is confirmed, finish with a long-term solution: an automated test to reproduce the issue and an actual code fix that makes those tests pass</li></ul>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/795/1*8qv-HMPV8vRJPAM-h5AcgA.png" /><figcaption>A quick fix in the code — probably the best way to narrow down the problem</figcaption></figure>
<p>Regarding debugging in Golang, I was really happy that the source code of all dependencies was downloaded as well. So tools like debugger watchers, conditional breakpoints, and manual code edits were working across our own and the dependencies’ code base.</p>
<p><em>It does not matter where the issue is if it affects our clients or the stability of the service.</em></p>
<h3><strong>Issue in Open Source dependency</strong></h3>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/709/1*k733MtZSyOWCkAQzragbeA.png" /></figure>
<p>At <a href="https://home24.career.softgarden.de/en/">home24</a> our software relies on open source (keeping license terms in mind). So we do not only consume free code but also try to give back to the community. <a href="https://github.com/vektah/gqlparser/pull/161">Pull requests with bug fixes</a> are a perfect example of open source flourishing.</p>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/928/1*aLvvymdpPar9Qsk715u8Mg.png" /><figcaption><a href="https://github.com/vektah/gqlparser/pull/161">Pull request with test cases to reproduce the issue in the upstream library</a></figcaption></figure>
<p>Of course, the maintainers of an open source project are the ones to decide <a href="https://github.com/vektah/gqlparser/pull/158">how the issue should be resolved</a>. Still, notifying them about the issue or providing an alternative way to fix the problem is good practice in the software industry overall.</p>
<blockquote>The Go programming language has powerful tools to save some memory by <a href="https://tour.golang.org/moretypes/1">using references to objects</a>, but those same optimizations (as seen from the bug fixes) can also lead to <strong>unexpected </strong><a href="https://betterprogramming.pub/pass-by-value-and-reference-in-go-94423b6accf1"><strong>updates by reference</strong></a>. And that was the cause for this story to be written.</blockquote>
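<p>To make that class of bug more tangible (this is a made-up minimal example, not the actual gqlparser code): when several parts of a program share a pointer to the same definition, a change made through one reference silently leaks into every other place that holds it, which is exactly the kind of NonNull flipping from true to false that we saw in the debugger:</p>
<pre>// A made-up minimal example of an unexpected update by reference.
// Two queries share a pointer to the same argument definition; a change made
// while handling one request becomes visible to all later requests.
package main

import "fmt"

// ArgumentDefinition loosely mimics a schema node with a NonNull flag.
type ArgumentDefinition struct {
    Name    string
    NonNull bool
}

func relaxValidation(arg *ArgumentDefinition) {
    // Intended as a local tweak, but it mutates the shared definition.
    arg.NonNull = false
}

func main() {
    locale := &ArgumentDefinition{Name: "locale", NonNull: true}

    // Both "operations" reuse the same pointer to save memory.
    queryA := []*ArgumentDefinition{locale}
    queryB := []*ArgumentDefinition{locale}

    relaxValidation(queryA[0])

    // queryB was never touched, yet its argument is no longer mandatory.
    fmt.Println("queryB locale NonNull:", queryB[0].NonNull) // prints: false
}</pre>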
<a href="https://github.com/vektah/gqlparser/pull/161">Pull requests with bug fixes</a> are the perfect example of Open source flourishing.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/928/1*aLvvymdpPar9Qsk715u8Mg.png" /><figcaption><a href="https://github.com/vektah/gqlparser/pull/161">Pull request with test cases to reproduce in the upstream library</a></figcaption></figure><p>Of course, maintainers of Opens source project are the ones to decide, <a href="https://github.com/vektah/gqlparser/pull/158">how the issues could be resolved</a>. Still, notifying about the issue or proving an alternative way to fix the problem — is a good practice in Software industry overall.</p><blockquote>Go programing langue have powerful tools to save some memory by <a href="https://tour.golang.org/moretypes/1">using references to objects</a>, but those same optimizations (as seen from bug fixes) can also lead to <strong>unexpected </strong><a href="https://betterprogramming.pub/pass-by-value-and-reference-in-go-94423b6accf1"><strong>updates by reference</strong></a>. And that was the cause for this story to be written.</blockquote><h3><strong>To sum up</strong></h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*anRSAEopvdTewnQHEl3z7A.png" /></figure><p>I hope this debugging journey was a good example, how someone could approach and think about the problems in Software development. It might be easy to find a quick solution by the error code, but the reality is not always that simple. The initial assumption may differ a lot from the cause of the issue: especially when it is a known issue somewhere deep in the dependency’s issue tracker.</p><p>Therefore practices like raising and testing assumptions, getting a fresh view from colleagues, prioritizing risks and tasks, using multiple tools, optimizing the process itself, and thinking more than you — are the ones that work, at least for me.</p><p><em>I wish you fun programming journeys, as we have here at </em><a href="https://home24.career.softgarden.de/en/"><em>home24</em></a><em>.</em></p><p>P.S. Summary version also available as an <a href="https://app.infinitymaps.io/maps/mbhn6qmq3p7/HLBrGmgjqTB">Infinity map</a>.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=64098de9ff06" width="1" height="1" alt=""><hr><p><a href="https://medium.com/home24-technology/debugging-graphql-schema-change-in-golang-app-64098de9ff06">Debugging GraphQL schema change in Golang app</a> was originally published in <a href="https://medium.com/home24-technology">home24 technology</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Debugging Node.js Memory leak on Production via Shadowed traffic]]></title>
            <link>https://medium.com/home24-technology/debugging-node-js-memory-leak-on-production-via-shadowed-traffic-cd8198d3df28?source=rss-444ef2c0b995------2</link>
            <guid isPermaLink="false">https://medium.com/p/cd8198d3df28</guid>
            <category><![CDATA[nodejs]]></category>
            <category><![CDATA[cloud-native]]></category>
            <category><![CDATA[aws]]></category>
            <category><![CDATA[debugging]]></category>
            <category><![CDATA[home24]]></category>
            <dc:creator><![CDATA[Aurelijus Banelis]]></dc:creator>
            <pubDate>Mon, 25 Jan 2021 08:11:27 GMT</pubDate>
            <atom:updated>2021-01-25T12:55:59.154Z</atom:updated>
            <cc:license>http://creativecommons.org/licenses/by/4.0/</cc:license>
            <content:encoded><![CDATA[<figure><img alt="Room as  background with various charts. Text: Debugging Node.js Memory leak on Production via Shadowed Traffic. At home24" src="https://cdn-images-1.medium.com/max/1024/1*T2P-Y6SjNEkjOvo8Iv8RVA.jpeg" /></figure><p>Memory leaks are hard to debug, especially when using programming languages with <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Memory_Management">garbage collection</a>. At <a href="https://home24.career.softgarden.de/en/">home24</a> we had quite a journey searching for the cause of <a href="https://www.geeksforgeeks.org/exit-codes-in-c-c-with-examples/">139</a> exit codes from crashed docker containers.</p><p>The outcome of the debugging was not only the fixed bug, but also an example how <strong>tools outside of the JavaScript ecosystem can help in debugging JavaScript code</strong>.</p><p>So let’s start the journey…</p><h3>In the beginning, it was just a 5xx status codes</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*F3xAB1UNSYUEkf4E8Sm61g.png" /><figcaption>Number of crashed docker containers via AWS CloudWatch metrics</figcaption></figure><p>In a Web service world, there is a convention to respond with <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500">500–599 HTTP status codes</a> when there is a nonrecoverable problem in the application itself. Therefore most popular <a href="https://nodejs.org/en/about/">Node.js</a> applications (e.g. <a href="https://expressjs.com/en/guide/error-handling.html#the-default-error-handler">Express</a>) and <a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-troubleshooting.html#load-balancer-http-error-codes">AWS Load balancers</a> also implement this behavior.</p><p>What cannot be found and fixed via <a href="https://jestjs.io/">Unit tests</a> while still in development, ends up in <em>5xx</em> exit codes in a live environment. For code logic errors — it is repetitively easy to fix because those are deterministic: with the same input and context, we should get the same error. But there are other issues like failed network, hardware and… insufficient <strong>memory management</strong>. We still get the same <em>5xx</em> error codes but fix range from <em>wait-and-retry</em> to <em>long story ahead</em>.</p><h3>Memory leaks in a docker container world</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/777/1*YAfextYOtiS3R6aaQ7qocQ.png" /><figcaption>Example of memory leak issue delegating to other teams</figcaption></figure><p>If you are coming from <a href="https://www.digitalocean.com/community/tutorials/how-to-use-pm2-to-setup-a-node-js-production-environment-on-an-ubuntu-vps">bare metal Node.js applications</a>, you would probably say “<a href="https://marmelab.com/blog/2018/04/03/how-to-track-and-fix-memory-leak-with-nodejs.html#restart-before-its-too-late">just restart pm2 and it is fixed</a>”. 
But that does not work in the world of containerized applications because:</p><ul><li>There is <a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_agent.html">ecs-agent</a>, <a href="https://kubernetes.io/">kubernetes</a>, <a href="https://www.docker.com/">docker</a> or another <a href="https://en.wikipedia.org/wiki/Hypervisor">hypervisor</a> that would restart the application instead of <em>pm2</em></li><li>Restarting the application mitigates the outcome, but we are still paying for the extra CPU or memory of an unoptimized application</li></ul>
<p>At <em>home24</em> we try to innovate fast and take advantage of the best programming language for the task, so we use <a href="https://www.docker.com/">docker</a> extensively. At first, we tried to increase the memory of the containers (think bigger servers), but it just postponed the issue.</p>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_GjtvJJIEX2EpmkBT9h85A.png" /><figcaption>Advertisement about discounts before Black Friday weekend</figcaption></figure>
<p>Black Friday (the biggest sale of the year) was coming, so “just postpone” was not a viable solution. The <em>Frontend</em> team was already filled with tasks of creating new features and fixing “reproducible” errors, so the memory leak issue was delegated to the <em>Scaling</em> team — a team that was more familiar with AWS infrastructure than with the internals of Node.js/JavaScript.</p>
<h3>Brace yourself, the memory leak hunter is coming</h3>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*L-7My2fq_lFDQNymv-yxyw.png" /><figcaption>Slide from presenting bug hunting internally at home24 Demo night</figcaption></figure>
<p>Coming from other programming languages, the <em>Scaling</em> team assumed that unused memory would be released quite soon. But even with a <a href="https://expressjs.com/en/starter/hello-world.html"><em>Hello world Express application</em></a>, memory usage did not drop after the requests stopped. We realized that <a href="https://felixgerschau.com/javascript-memory-management/">JavaScript memory management is more complex</a> than we had anticipated. So as a Scaling team we went through different debugging tools (E.g. <a href="https://nodejs.org/en/docs/guides/debugging-getting-started/">Chrome inspector</a>, <a href="https://clinicjs.org/">clinic</a>, <a href="https://www.netdata.cloud/">netdata</a>, <a href="https://newrelic.com/">NewRelic</a>) and different debugging methods (E.g. building locally, running production containers, comparing logs of similar services).</p>
<p>As usual in this type of bug hunting, the best ideas came in the middle of the night. <em>Marius</em> saw that a lot of HTML strings were persisted in memory when he tried to reproduce the issue locally. He googled for similar issues in the JavaScript community and guess what — there was a known memory leak in the popular <em>axios</em> library:</p>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/929/1*KB_Iew_XEX_whlDPnUlNlw.png" /><figcaption><a href="https://github.com/axios/axios/issues/1997#issuecomment-463571527">Workaround for axios and HTML content</a></figcaption></figure>
<p>It was used in a 3rd-level dependency, so rebuilding (minifying) all upstream dependent services took some time. We kind of reproduced it locally and saw fewer retained strings via <a href="https://nodejs.org/en/docs/guides/debugging-getting-started/">Chrome inspector</a>, so we were already excited that we had managed to find the bug. 
Finally, we released the application with the fix and…</p>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*YJOJohBXUZoUzOjgN6xWVA.png" /><figcaption>Release date of the new service version, but no visible decrease in errors</figcaption></figure>
<p>there was no significant impact.</p>
<p>We were running out of options, so we had to accept defeat. To be honest, it was the <em>Scaling</em> team, not the <em>Frontend</em> team, that had to.</p>
<p><em>But the story does not end here…</em></p>
<h3>Meanwhile, migration to Fargate</h3>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/913/1*8x5PQUUjFR6lE-RrTOoI-g.png" /><figcaption>The task to migrate the Node.js service to new infrastructure</figcaption></figure>
<p>As part of the Scaling team’s tasks, there was a migration from <a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/getting-started-ecs-ec2.html">AWS EC2 based ECS</a> to <a href="https://aws.amazon.com/fargate/">Fargate based ECS</a>. From practice, we saw that we were not good at choosing the optimal servers (EC2 instances) and auto-scaling parameters. Therefore, for a stateless Node.js application, it made sense to migrate.</p>
<p>To get the right container sizes for the production load we used a technique called <strong>shadowed (mirrored) traffic:</strong></p>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/620/1*KsYPlYCm4Qd3FlspoogAOw.png" /><figcaption>Traffic shadowing</figcaption></figure>
<p>Traffic shadowing is a quite common technique in <a href="https://landscape.cncf.io/">Cloud-native</a> tools like <a href="https://opensource.zalando.com/skipper/tutorials/shadow-traffic/">Skipper</a> or <a href="https://istio.io/latest/docs/tasks/traffic-management/mirroring/">Istio</a>. It is similar to <a href="https://en.wikipedia.org/wiki/Load_testing">load testing</a>, but instead of manually describing requests, copies of real ones are directed to a copy of the service.</p>
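<p>We used ready-made routing tools for this, but the idea itself is small enough to sketch (a made-up example in Go, not our actual setup): a tiny reverse proxy answers every request from the primary service and, in the background, fires a copy of the same request at the shadow service and throws the response away:</p>
<pre>// A made-up sketch of traffic shadowing (in reality we used Skipper/Istio-style
// tooling): every request is served by the primary backend, and a copy of it is
// sent to the shadow backend whose response is discarded.
package main

import (
    "bytes"
    "io"
    "log"
    "net/http"
    "net/http/httputil"
    "net/url"
)

func main() {
    primary, _ := url.Parse("http://primary.internal:8080")
    shadow := "http://shadow.internal:8080"

    proxy := httputil.NewSingleHostReverseProxy(primary)

    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        // Buffer the body so it can be replayed to the shadow copy.
        body, _ := io.ReadAll(r.Body)
        r.Body = io.NopCloser(bytes.NewReader(body))

        // Fire-and-forget copy of the request to the shadow service.
        go func() {
            req, err := http.NewRequest(r.Method, shadow+r.RequestURI, bytes.NewReader(body))
            if err != nil {
                return
            }
            req.Header = r.Header.Clone()
            if resp, err := http.DefaultClient.Do(req); err == nil {
                io.Copy(io.Discard, resp.Body)
                resp.Body.Close()
            }
        }()

        // The client only ever sees the primary backend's response.
        proxy.ServeHTTP(w, r)
    })

    log.Fatal(http.ListenAndServe(":8080", nil))
}</pre>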
<p>While our <em>Node.js</em> applications were <a href="https://medium.com/@rachna3singhal/stateless-over-stateful-applications-73cbe025f07">stateless</a> and the dependent services could handle 2x traffic, the migration plan seemed perfect… until we saw the response time chart:</p>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/912/1*np6eZbHMeiYbV7VljmPT_g.png" /><figcaption>Response times more unstable on Fargate</figcaption></figure>
<blockquote>But wait — why a response time chart, and how is it related to memory usage?.. It is not only the garbage collection calls that make the application slower. The load balancer also does not remove unhealthy containers instantly, so clients (services or browsers) keep waiting for valid connection close packets until the connection times out.</blockquote>
<p>It turned out that <a href="https://aws.amazon.com/fargate/">Fargate</a> was starting all containers at the same time, so memory was also filling up at the same time. To our disappointment, <strong>the memory leak issue was more visible on <em>Fargate</em> than on <em>EC2</em> instances</strong>.</p>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*csgJviR4q_ZFJA27AaEHuQ.png" /><figcaption>Slide from presenting bug hunting internally at home24 <em>Demo night</em></figcaption></figure>
<p>The <em>Scaling</em> team had already failed once to fix the memory leak issue. If other similar Node.js apps also had a similar memory issue, the whole migration-to-Fargate goal would be at risk.</p>
<p><em>It seemed that the Scaling team just had bad luck, unless…</em></p>
<h3>Introducing: Debugging on production</h3>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*LHIZXqDAhIefzi76rJSpvA.png" /><figcaption>Connecting to a remote Node.js container via Chrome Inspector on a shadowed instance</figcaption></figure>
<p>This time the situation was different. We had an environment that did not affect real users but had the best test data possible (mirroring all LIVE traffic). <strong>So we could use all the heavy JavaScript debugging tools</strong>.</p>
<p>It turned out that <a href="https://developers.google.com/web/tools/chrome-devtools">Chrome inspector (DevTools)</a> was adding a lot of overhead: with Heap snapshot and Allocation instrumentation on timeline, containers were crashing within seconds (meaning connection reset errors in Chrome). So connecting to the “real production” would have been catastrophic (or at least not professional).</p>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/722/1*HL2H_bkVVAU2UMYL6Fp4TQ.png" /><figcaption>Chrome Inspector options for memory profiling</figcaption></figure>
<p>Of all the profiling types, only Allocation sampling survived the ~10 seconds until the container was marked as unhealthy (<em>note: we were using container sizes that had the same response time as the original ones</em>). And it was enough to see the real issue: the docCache <a href="https://github.com/apollographql/graphql-tag/search?q=docCache">variable</a>.</p>
<p>Also, this setup allowed us to <strong>remove false positives fairly quickly</strong> (E.g. is it related to the Node.js version, the Apollo library version, hooks, caching configuration, etc.).</p>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*WzGMmskRHWV2Xt2azH8aiQ.png" /><figcaption>Slide from presenting bug hunting internally at home24 <em>Demo night</em></figcaption></figure>
<p><em>Finally, we saw a happy ending in this debugging story…</em></p>
<h3>The real cause: string interpolation instead of Apollo variables</h3>
<p>After the assumption was tested with real-user traffic, we could finally confirm the real cause of the issue:</p>
<p><a href="https://medium.com/media/bafbfc5d0aff875893829442f779a8bc/href">https://medium.com/media/bafbfc5d0aff875893829442f779a8bc/href</a></p>
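<p>The application itself was Node.js with Apollo, but the anti-pattern is language-agnostic, so here is a rough sketch (in Go, with made-up field and variable names): interpolating values straight into the query text produces a new unique query string for every call, so any cache keyed by the query string (such as graphql-tag’s docCache) keeps growing, while passing the values as GraphQL variables keeps a single stable query:</p>
<pre>// A language-agnostic sketch of the anti-pattern (the real code was Node.js with
// Apollo's gql tag). Field and variable names are made up.
package main

import "fmt"

// Anti-pattern: values are baked into the query text, so every product and every
// flag combination yields a brand new query string. Any cache keyed by the query
// string keeps growing.
func interpolatedQuery(productID string, withOptionalBackendParameters bool) string {
    return fmt.Sprintf(
        `query Product { product(id: %q) { name price @include(if: %t) } }`,
        productID, withOptionalBackendParameters)
}

// Fix: one stable query string; the changing values travel as variables.
const stableQuery = `query Product($id: String!, $withOptionalBackendParameters: Boolean!) {
  product(id: $id) { name price @include(if: $withOptionalBackendParameters) }
}`

func variables(productID string, withOptionalBackendParameters bool) map[string]any {
    return map[string]any{
        "id":                            productID,
        "withOptionalBackendParameters": withOptionalBackendParameters,
    }
}

func main() {
    // With interpolation, every call can produce a different query string:
    fmt.Println(interpolatedQuery("SKU-1", true))
    fmt.Println(interpolatedQuery("SKU-2", false))
    // With variables, the query string stays identical; only the values change:
    fmt.Println(stableQuery)
    fmt.Println(variables("SKU-2", false))
}</pre>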
<p>Of course, this is a simplified example. But when you are writing a lot of:</p><pre>) @include(if: $<em>withOptionalBackendParameters</em>)</pre><p><strong>It gets tempting to replace those with some simple generated strings.</strong></p>
<p>To summarize, at the start of debugging:</p>
<ul><li>There was a false assumption <a href="https://github.com/apollographql/graphql-tag/issues/182">that many different query parameters would impact memory usage</a></li><li>The bug was very hard to reproduce locally because not enough unique products/configurations were simulated</li></ul>
<h3>Lessons learned</h3>
<p>We could say that the <em>Scaling</em> team learned some new JavaScript tricks, but personally, I see it from a wider perspective:</p>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*P7okV7R2HYhN22eEBXxoDw.png" /><figcaption>For complex problems, cross-team effort gives more unique tools and views</figcaption></figure>
<p>If not for the unique insights from multiple team members, I doubt I could have written a happy ending for this debugging story.</p>
<p>Therefore I want to publicly give big thanks to the whole <em>Scaling</em> and <em>Frontend</em> teams (including Tomas, Marius, Džiugas, Danny, Olga, Robert, Karolis, Antonella), as well as many other colleagues from <a href="https://home24.career.softgarden.de/en/"><strong>home24</strong></a><strong>.</strong></p>
<hr><p><a href="https://medium.com/home24-technology/debugging-node-js-memory-leak-on-production-via-shadowed-traffic-cd8198d3df28">Debugging Node.js Memory leak on Production via Shadowed traffic</a> was originally published in <a href="https://medium.com/home24-technology">home24 technology</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>