<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[helpshift-engineering - Medium]]></title>
        <description><![CDATA[Engineering blog for Helpshift - Medium]]></description>
        <link>https://medium.com/helpshift-engineering?source=rss----3229f31ca4f4---4</link>
        <image>
            <url>https://cdn-images-1.medium.com/proxy/1*TGH72Nnw24QL3iV9IOm4VA.png</url>
            <title>helpshift-engineering - Medium</title>
            <link>https://medium.com/helpshift-engineering?source=rss----3229f31ca4f4---4</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Mon, 11 May 2026 16:48:45 GMT</lastBuildDate>
        <atom:link href="https://medium.com/feed/helpshift-engineering" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Agile vs. Waterfall QA: A Comparative Guide]]></title>
            <link>https://medium.com/helpshift-engineering/agile-vs-waterfall-qa-a-comparative-guide-dbd45fd2f25a?source=rss----3229f31ca4f4---4</link>
            <guid isPermaLink="false">https://medium.com/p/dbd45fd2f25a</guid>
            <category><![CDATA[waterfall-testing]]></category>
            <category><![CDATA[agile-testing]]></category>
            <dc:creator><![CDATA[Pankaj Dusane]]></dc:creator>
            <pubDate>Fri, 09 Jan 2026 12:06:15 GMT</pubDate>
            <atom:updated>2026-01-09T12:06:14.764Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="Agile vs. Waterfall" src="https://cdn-images-1.medium.com/max/648/1*5s48tJjU6Wio4ELiOU2IXQ.png" /></figure><p>Quality Assurance (QA) is a critical aspect of software development, ensuring that the final product meets the desired standards and functions as intended. While QA practices are integral to all development methodologies, the approach to QA can vary significantly depending on whether a project follows the Agile or Waterfall methodology. In this blog post, we’ll compare these two methodologies, focusing on their impact on QA processes, workflows, and outcomes.</p><h3><strong>What is Waterfall QA?</strong></h3><p>The Waterfall model is a linear and sequential approach to software development. In this methodology, each phase — from requirements gathering to design, development, testing, and deployment — is completed before moving to the next. QA in the Waterfall model is typically conducted during a dedicated testing phase after development is complete.</p><p><strong>Key Characteristics of Waterfall QA:</strong></p><ol><li><strong>Late Involvement:</strong> QA is often involved only after development is finished.</li><li><strong>Fixed Scope:</strong> Testing is based on a predefined set of requirements and test cases.</li><li><strong>Extensive Testing:</strong> Since QA occurs after development, it often involves extensive end-to-end testing.</li><li><strong>Minimal Flexibility:</strong> Changes to requirements or functionality after testing begins can be costly and disruptive.</li><li><strong>Predictability:</strong> The linear nature of the process makes timelines and deliverables more predictable.</li></ol><p><strong>Advantages of Waterfall QA:</strong></p><ol><li>Clear documentation and test plans.</li><li>Structured processes provide clarity on responsibilities and timelines.</li><li>Suitable for projects with well-defined requirements and minimal expected changes.</li></ol><p><strong>Challenges of Waterfall QA:</strong></p><ol><li>Delayed feedback loops can prolong the discovery of critical issues.</li><li>Limited ability to adapt to changes or new requirements.</li><li>Potential for larger defects due to the lack of continuous testing.</li></ol><h3>What is Agile QA?</h3><p>Agile methodology emphasizes flexibility, collaboration, and iterative development. 
Unlike Waterfall, Agile integrates QA throughout the development lifecycle, ensuring continuous testing and feedback.</p><p><strong>Key Characteristics of Agile QA:</strong></p><ol><li><strong>Early Involvement:</strong> QA begins during the initial stages of development, such as planning and requirement gathering.</li><li><strong>Continuous Testing:</strong> QA is performed at every iteration or sprint, often using automated testing tools.</li><li><strong>Collaborative Approach:</strong> QA teams work closely with developers, product owners, and other stakeholders.</li><li><strong>Flexibility:</strong> Agile accommodates changing requirements and evolving priorities.</li><li><strong>Incremental Delivery:</strong> Testing is focused on small, manageable increments of the product.</li></ol><p><strong>Advantages of Agile QA:</strong></p><ol><li>Faster identification and resolution of defects.</li><li>Improved collaboration between teams.</li><li>Greater adaptability to changing requirements.</li><li>Higher product quality due to continuous testing.</li></ol><p><strong>Challenges of Agile QA:</strong></p><ol><li>Requires strong communication and collaboration skills.</li><li>Frequent testing can increase workload if not automated effectively.</li><li>Documentation may be less comprehensive than in Waterfall.</li></ol><h3><strong>How to Transition from Waterfall QA to Agile QA Efficiently</strong></h3><ol><li><strong>Resistance to Change: </strong>Teams accustomed to the structured approach of Waterfall may resist Agile practices. Communicating the benefits of Agile clearly and offering training and workshops builds confidence in the new methodology.</li><li><strong>Lack of Agile Expertise: </strong>Hiring Agile coaches to guide the transition helps teams that struggle to understand Agile principles and practices.</li><li><strong>Start Small:</strong> Initiate Agile QA practices on a smaller project or a single team before scaling to the entire organization.</li><li><strong>Coordination Issues:</strong> Agile requires closer collaboration between teams, which can be challenging. Establish regular communication channels, such as daily stand-ups and cross-team meetings.</li><li><strong>Automation: </strong>Prioritize test automation to handle repetitive manual tasks.</li><li><strong>Transitional Expectations:</strong> Stakeholders may expect immediate results from the transition, which is often unrealistic. Manage expectations by highlighting that the transition is a gradual process with incremental improvements.</li><li><strong>Documentation Needs:</strong> Agile’s lightweight documentation approach may be a cultural shift for teams used to detailed Waterfall documentation. Document critical aspects without compromising Agile’s flexibility.</li></ol><h3>Which Approach is Right for You?</h3><p>The choice between Agile and Waterfall QA depends on various factors, including the nature of your project, team dynamics, and organizational goals. Waterfall QA might be suitable for projects with clearly defined requirements and rigid timelines. 
In contrast, Agile QA is better suited for dynamic projects requiring frequent updates and iterative development.</p><p>By understanding the strengths and challenges of each methodology, we can make informed decisions that align with our project needs and deliver high-quality software efficiently.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=dbd45fd2f25a" width="1" height="1" alt=""><hr><p><a href="https://medium.com/helpshift-engineering/agile-vs-waterfall-qa-a-comparative-guide-dbd45fd2f25a">Agile vs. Waterfall QA: A Comparative Guide</a> was originally published in <a href="https://medium.com/helpshift-engineering">helpshift-engineering</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[From Data to Insight: Helpshift’s Journey with ML Observability]]></title>
            <link>https://medium.com/helpshift-engineering/from-data-to-insight-helpshifts-journey-with-ml-observability-9680e27d1d01?source=rss----3229f31ca4f4---4</link>
            <guid isPermaLink="false">https://medium.com/p/9680e27d1d01</guid>
            <category><![CDATA[analytics]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[observability]]></category>
            <dc:creator><![CDATA[Sujit Singh]]></dc:creator>
            <pubDate>Wed, 26 Nov 2025 14:00:15 GMT</pubDate>
            <atom:updated>2025-11-26T14:00:15.557Z</atom:updated>
<content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*nXRRIPO4gFvMmVTupmCGOA.png" /></figure><h3>Introduction</h3><p>In an age where artificial intelligence (AI) and machine learning (ML) are integral to almost every aspect of our lives, ensuring the effectiveness, fairness, and reliability of ML models is paramount. Observability plays a crucial role in maintaining the performance of these models, allowing us to detect and resolve issues promptly. At Helpshift, we recognized the need for robust ML observability to keep our models running smoothly and efficiently.</p><p>This blog post explores our journey in building a custom ML observability solution tailored to our specific needs. We’ll delve into the concept of ML observability, discuss the limitations of existing tools, and share how we implemented our own solution based on the idea of “Wide Events.”</p><h4>Understanding ML Observability</h4><p>ML observability is the ability to monitor and understand the performance, behavior, and outputs of machine learning models in real time. It enables us to proactively identify potential issues and anomalies, facilitating timely interventions and mitigating risks. ML observability encompasses several key components:</p><ul><li><strong>System Monitoring</strong>: Tracking the performance of the infrastructure where ML services are deployed, including metrics like CPU and memory usage, network traffic, disk space, and service performance.</li><li><strong>Inference Monitoring</strong>: Evaluating and auditing the real-time performance of deployed ML models in production by tracking incoming requests and the accuracy of model predictions.</li><li><strong>Model Monitoring</strong>: Observing the long-term accuracy of ML models by monitoring key metrics such as accuracy, precision, recall, and F1-score, and detecting any drift over time.</li></ul><h4>Why Observability Matters</h4><p>If you’ve ever played <strong>Age of Empires</strong>, you know how crucial it is to explore the map to manage resources proactively and strategize effectively. Similarly, ML observability is about exploring properties and patterns not determined in advance. It allows us to be proactive in debugging and improving our systems, ensuring that our ML models remain reliable and efficient.</p><p>As my colleague @<a href="https://medium.com/u/9f0a9ecd0a2b">Utkarsh</a> aptly put it, observability is like having a comprehensive map of your ML environment, enabling you to navigate challenges and seize opportunities before they impact your system.</p><h4>The Limitations of Existing Observability Tools</h4><p>The market offers a variety of observability tools, ranging from traditional telemetry solutions to the latest large language model (LLM) monitoring systems, as well as system monitoring and logging tools like Kibana. 
While these tools are powerful and feature-rich, they come with complexities and challenges:</p><ul><li><strong>Complexity and Overhead</strong>: Many solutions require significant setup and maintenance, adding complexity to the observability process.</li><li><strong>Cost</strong>: High licensing and operational costs can be a burden, especially for small to mid-sized companies.</li><li><strong>Limited Flexibility</strong>: Off-the-shelf solutions might not align perfectly with specific needs, leading to either overkill or insufficient coverage.</li><li><strong>Data Silos</strong>: Different tools for metrics, logs, and traces can lead to fragmented data, making it difficult to correlate events and gain holistic insights.</li></ul><h3>Embracing Wide Events: A New Approach to Observability</h3><p>Inspired by the concept discussed in the article <a href="https://isburmistrov.substack.com/p/all-you-need-is-wide-events-not-metrics">“All You Need is Wide Events, Not ‘Metrics, Logs and Traces’”</a>, we realized that logging wide events provided the granularity and context needed for effective monitoring and debugging.</p><h4>What Are Wide Events?</h4><p><strong>Wide events</strong> are comprehensive, structured logs that capture a broad set of key-value pairs, encompassing all relevant information about an event or operation in your system. Unlike traditional metrics or logs, wide events are designed to include as much context as possible, making them highly versatile for analysis.</p><p>For example, a wide event for an ML prediction might include:</p><ul><li><strong>Timestamp</strong></li><li><strong>Model Version</strong></li><li><strong>Input Features</strong></li><li><strong>Predicted Output</strong></li><li><strong>Actual Output (if available)</strong></li><li><strong>Latency</strong></li><li><strong>User ID</strong></li><li><strong>Request ID</strong></li><li><strong>System Metrics (CPU, Memory usage at the time)</strong></li></ul><p>By capturing detailed information in each event, we can perform granular analyses, correlate different aspects of our system, and uncover insights that might be missed with traditional observability approaches.</p><h4>Advantages of Wide Events</h4><ul><li><strong>Granularity</strong>: Provides detailed context for each event, enabling deep dives into system behavior.</li><li><strong>Flexibility</strong>: Allows for dynamic querying and filtering without the need for predefined metrics.</li><li><strong>Simplification</strong>: Eliminates the need to manage separate systems for logs, metrics, and traces.</li></ul><h3>Why We Built Our Own Observability Solution</h3><p>Recognizing the limitations of existing tools and the benefits of wide events, we decided to build our own observability solution based on simple logging. This approach provided several key benefits:</p><ul><li><strong>Simplicity and Control</strong>: By leveraging a logging-based approach with wide events, we gained complete control over our observability processes, tailoring the system precisely to our needs without the overhead of complex telemetry tools.</li><li><strong>Focused Use Case</strong>: Our main objective was to efficiently monitor ML observability events. 
A focused, logging-based solution allowed us to design the system specifically for our requirements.</li><li><strong>Cost-Effectiveness</strong>: By utilizing our existing infrastructure and open-source tools, we maintained low costs while achieving robust observability.</li><li><strong>Unified Data</strong>: Storing all observability data as wide events in a single system eliminated data silos, making it easier to correlate events and gain holistic insights.</li></ul><h3>How We Implemented Our Observability Pipeline</h3><h4>Architecture Overview</h4><p>Our observability pipeline consists of three main components:</p><ol><li><strong>Application Services Generating Wide Events</strong>: Our ML services generate detailed observability logs as wide events, capturing all relevant data points about each prediction or operation.</li><li><strong>Logging in a Distributed Event Store and Stream-Processing Platform (Kafka)</strong>: We use Kafka to handle the high volume of logs generated, efficiently ingesting and distributing ML observability events in real-time.</li><li><strong>Consuming and Storing in Cloud Storage (Secor and S3)</strong>: We utilize Secor to consume logs from Kafka, process them as needed, and store them in Amazon S3 for durable, scalable storage.</li></ol><h4>Data Visualization and Analysis</h4><p>To derive insights from the stored logs, we use <strong>Amazon Athena</strong>, which allows us to query and analyze data directly from S3 using standard SQL. This setup eliminates the need for complex processing pipelines and enables us to create dashboards, generate meaningful metrics, and perform deep dives into our observability data.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*EZvfS_TPst8wOtrff81Qgg.png" /><figcaption><em>Figure: High-level architecture of our observability pipeline.</em></figcaption></figure><h4>Implementation Details</h4><ul><li><strong>Kafka Integration</strong>: Our application services publish wide events to Kafka topics dedicated to ML observability. Kafka’s scalability and fault-tolerance ensure that we can handle large volumes of data without loss.</li><li><strong>Secor Processing</strong>: Secor consumes events from Kafka, partitions them appropriately (e.g., by date or event type), and writes them to S3 in a structured format like Parquet or JSON, optimizing for query performance.</li><li><strong>Athena Queries</strong>: We define schemas for our wide events in Athena, allowing us to run ad-hoc SQL queries, create views, and integrate with visualization tools like Amazon QuickSight or Grafana.</li></ul><h4>What We Log: Data and Events</h4><p>Our logging focuses on capturing detailed ML observability events as wide events. 
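As a rough illustration of the producing side (the namespace, topic name, and helper below are a hypothetical sketch, not our production code), a service can publish a wide event using the stock Java Kafka client and the cheshire JSON library:</p><pre>(ns ml-observability.wide-events<br>  (:require [cheshire.core :as json])<br>  (:import (java.util Properties)<br>           (org.apache.kafka.clients.producer KafkaProducer ProducerRecord)))<br><br>;; One shared producer; plain string keys and JSON string values.<br>(def producer<br>  (KafkaProducer.<br>    (doto (Properties.)<br>      (.put &quot;bootstrap.servers&quot; &quot;localhost:9092&quot;)<br>      (.put &quot;key.serializer&quot; &quot;org.apache.kafka.common.serialization.StringSerializer&quot;)<br>      (.put &quot;value.serializer&quot; &quot;org.apache.kafka.common.serialization.StringSerializer&quot;))))<br><br>(defn log-wide-event!<br>  &quot;Serializes the event map to JSON and publishes it, keyed by request id<br>   so all events for one request land on the same partition.&quot;<br>  [event]<br>  (.send producer<br>         (ProducerRecord. &quot;ml-observability-events&quot; ; hypothetical topic name<br>                          (:request_id event)<br>                          (json/generate-string event))))</pre><p>Keying by request id is just one option here; any stable identifier that groups related events works equally well.</p><p>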
Key types of data and events include:</p><ul><li><strong>Prediction Results</strong>: Details of each prediction made by our ML models, including input features, predicted outputs, and confidence scores.</li><li><strong>Performance Metrics</strong>: Metrics such as latency, throughput, and error rates at the time of each prediction.</li><li><strong>Anomalies and Exceptions</strong>: Any detected anomalies, exceptions, or errors encountered during model execution.</li><li><strong>User Feedback</strong>: Feedback from users, such as corrections or satisfaction ratings, which can be correlated with model performance.</li></ul><h4>Example of a Wide Event</h4><pre>{<br>  &quot;timestamp&quot;: &quot;2023-10-01T12:34:56Z&quot;,<br>  &quot;model_name&quot;: &quot;intent_classifier_v2&quot;,<br>  &quot;model_version&quot;: &quot;2.3.1&quot;,<br>  &quot;input_text&quot;: &quot;How do I reset my password?&quot;,<br>  &quot;predicted_intent&quot;: &quot;password_reset&quot;,<br>  &quot;confidence_score&quot;: 0.92,<br>  &quot;actual_intent&quot;: null,<br>  &quot;user_feedback&quot;: null,<br>  &quot;latency_ms&quot;: 45,<br>  &quot;request_id&quot;: &quot;abc123xyz&quot;,<br>  &quot;user_id&quot;: &quot;user_789&quot;,<br>  &quot;cpu_usage&quot;: 55.3,<br>  &quot;memory_usage&quot;: 1024,<br>  &quot;host&quot;: &quot;ml-service-1&quot;<br>}</pre><p>By capturing all relevant information in a single event, we enable comprehensive analysis and debugging capabilities.</p><h4>Benefits of Our Approach</h4><ul><li><strong>Scalability</strong>: Kafka and S3 ensure the pipeline handles large data volumes without bottlenecks.</li><li><strong>Flexibility</strong>: Wide events allow for dynamic querying and analysis without the need for predefined metrics.</li><li><strong>Cost-Effectiveness</strong>: Utilizing open-source tools and existing infrastructure keeps costs low.</li><li><strong>Improved Debugging</strong>: Detailed logs enable us to quickly identify and resolve issues, maintaining model reliability.</li><li><strong>Enhanced Insights</strong>: The ability to slice and dice data in Athena allows us to uncover patterns and insights that inform model improvements.</li></ul><h3>Overcoming Challenges</h3><h4>Versioning and Schema Evolution</h4><p>One of the challenges we faced was managing versioning and schema changes in our logs. 
As our models and logging requirements evolved, we needed a strategy to handle changes without disrupting analysis.</p><p><strong>Solution</strong>:</p><ul><li><strong>Schema Versioning</strong>: We include a schema_version field in each wide event, allowing us to track changes and apply appropriate parsing logic.</li><li><strong>Flexible Schemas</strong>: By using formats like JSON and tools that support schema-on-read (like Athena), we can handle missing or additional fields gracefully.</li></ul><h4>Data Quality and Consistency</h4><p>Ensuring the quality and consistency of the logged data is crucial for accurate analysis.</p><p><strong>Solution</strong>:</p><ul><li><strong>Validation</strong>: Implement validation checks in our application services to ensure that all required fields are present and correctly formatted.</li><li><strong>Monitoring</strong>: Set up alerts for anomalies in the logging pipeline, such as sudden drops in event counts or unexpected field values.</li></ul><h3>How It Benefits Helpshift</h3><p>Our custom observability solution has brought significant benefits to Helpshift:</p><ul><li><strong>Derived Metrics</strong>: We can derive meaningful metrics from logged events, such as model accuracy over time, latency distributions, and error rates.</li><li><strong>Enhanced Debugging</strong>: Detailed, contextual logs enable rapid identification and resolution of issues, minimizing downtime and impact on users.</li><li><strong>Continuous Improvement</strong>: Incorporating user feedback into our logs allows us to refine our models continuously, enhancing user satisfaction and engagement.</li><li><strong>Proactive Monitoring</strong>: Real-time analysis capabilities enable us to detect anomalies and address them before they escalate into larger problems.</li></ul><h3>Conclusion</h3><p>At Helpshift, our journey to building a robust ML observability solution was driven by the need for simplicity, control, and cost-effectiveness. By focusing on logging wide events, we created a scalable and flexible pipeline that meets our needs and helps us maintain the reliability and performance of our ML models.</p><p>Our approach demonstrates that, with thoughtful design and existing open-source tools, small to mid-sized companies can implement effective observability solutions tailored to their specific needs.</p><h3>References</h3><ul><li><a href="https://isburmistrov.substack.com/p/all-you-need-is-wide-events-not-metrics">“All You Need is Wide Events, Not ‘Metrics, Logs and Traces’”</a> by Ivan Burmistrov</li><li><a href="https://jxnl.github.io/blog/writing/2024/02/28/levels-of-complexity-rag-applications/#level-3-observability">“Levels of Complexity in RAG Applications”</a> by Jason Liu</li></ul><h3>Join the Conversation</h3><p>We’d love to hear about your experiences with ML observability. Have you implemented a similar solution or faced challenges in monitoring your ML models? Share your thoughts and let’s learn together.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=9680e27d1d01" width="1" height="1" alt=""><hr><p><a href="https://medium.com/helpshift-engineering/from-data-to-insight-helpshifts-journey-with-ml-observability-9680e27d1d01">From Data to Insight: Helpshift’s Journey with ML Observability</a> was originally published in <a href="https://medium.com/helpshift-engineering">helpshift-engineering</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[The Hidden Layer of Analytics: How QA Builds Trust in Data]]></title>
            <link>https://medium.com/helpshift-engineering/the-hidden-layer-of-analytics-how-qa-builds-trust-in-data-f2dcca8adf56?source=rss----3229f31ca4f4---4</link>
            <guid isPermaLink="false">https://medium.com/p/f2dcca8adf56</guid>
            <category><![CDATA[data-validation-testing]]></category>
            <category><![CDATA[kafka-consumer]]></category>
            <category><![CDATA[analytics]]></category>
            <dc:creator><![CDATA[Gayatri Panganti]]></dc:creator>
            <pubDate>Wed, 26 Nov 2025 13:55:02 GMT</pubDate>
            <atom:updated>2025-11-26T13:55:01.324Z</atom:updated>
<content:encoded><![CDATA[<blockquote><strong>Every accurate metric is backed by countless validations, event checks and integrity tests in the background.</strong></blockquote><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hKEeXfA19IyFXjEUEnXglw.png" /></figure><h3>Introduction</h3><p>Quality Assurance in data-driven systems extends beyond UI validation and backend verification. Such systems rely heavily on data precision and accuracy.</p><p>A recent QA effort focused on validating a productivity analytics framework, ensuring that every event, metric and data flow accurately represented real-world user behaviour. The process was primarily <strong>manual</strong>, involving live simulations, event validation and detailed metric verification across environments, emphasising logic and data accuracy over automation.</p><h3>Simulating Real-World Scenarios</h3><p>Extensive simulations were conducted for multiple user roles such as <strong>Agent</strong>, <strong>Supervisor</strong>, <strong>Admin</strong> and <strong>Super admin</strong>, each governed by specific dashboard permissions and access rules.</p><p>Actions tested included:</p><ul><li>Logging in and logging out from the dashboard</li><li>Marking presence states such as <strong>Available</strong>, <strong>Online,</strong> or <strong>Away</strong></li><li>Switching between workzones</li><li>Navigating through dashboards</li></ul><p>These test conditions generated diverse <strong>event</strong> streams used to verify how accurately the system captured and processed state transitions.</p><h3>Parallel Event Validation Through Kafka</h3><p>Real-time validation was a key aspect of this QA process. Event streams were observed directly through Kafka consoles, enabling verification of generated events and their <strong>payloads, data structures, and JSON fields </strong>as actions were executed.</p><p>Each dashboard or SDK action was simulated while the corresponding Kafka stream was monitored in parallel, confirming that events triggered correctly and carried accurate data. Testing covered <strong>sandbox, staging, and production environments, </strong>ensuring reliability and consistency across all setups.</p><h3>Data Verification with Metabase and Calculations</h3><p>After events were processed by the analytics pipeline, Metabase queries were used to validate computed metrics against expected outcomes from simulations.</p><p>To cross-check results, timestamps (e.g., UTC vs. IST) were analysed and durations recalculated according to the defined formulas. This combination of <strong>query-based data verification with calculated data</strong> confirmed the accuracy of each metric and its alignment with underlying event data.</p><h3>Testing Across Workzones</h3><p>Validation extended to multiple workzones to ensure consistent data aggregation and metric computation globally. 
The process confirmed that analytics logic produced uniform results regardless of context and configuration.</p><h3>Edge-Case and Event-Order Testing</h3><p>The QA process also included detailed validation of edge conditions such as:</p><ul><li>Network interruptions during event capture</li><li>Out-of-order or missing event sequences</li><li>Duplicate triggers or delayed events</li></ul><p>Since event order and timing directly affect metric accuracy, verifying event sequence integrity was essential to maintaining trustworthy analytics output.</p><h3>Understanding and Validating Metric Types</h3><p>Three core categories of metrics were validated through this effort:</p><ol><li><strong>Simple Counter Metrics</strong>: Direct counts of user or system actions.</li><li><strong>Complex Metrics</strong>: Aggregated or state-based computations across multiple events or entities.</li><li><strong>Timer Metrics</strong>: Duration-based calculations between event pairs, used to measure productivity or performance trends.</li></ol><p>Each category demanded tailored validation to confirm that event logic, aggregation and computation aligned with business definitions and customer expectations.</p><h3>Why Accuracy Matters</h3><p>Accurate data builds confidence. Every decision informed by analytics relies on the assumption that metrics reflect reality. When analytics are powered by real-time event tracking and computed metrics, even minor errors in event capture, time-stamping or sequence logic can lead to misleading conclusions.</p><p>Through structured simulations, real-time Kafka validation and detailed data verification, this QA effort ensured that the analytics system produced metrics users could trust.</p><h3>Key Takeaways</h3><ul><li>QA for analytics systems requires <strong>end-to-end validation </strong>from event capture to final metric output.</li><li><strong>Kafka-based event monitoring </strong>provides clear visibility into live data accuracy.</li><li><strong>Role-based and multi-environment </strong>testing ensures consistency across contexts.</li><li><strong>Timestamp precision and event sequencing </strong>are critical for data reliability.</li><li>Thorough QA directly strengthens <strong>business trust and decision quality.</strong></li></ul><h3>Challenges and Limitations in Analytics QA</h3><p>Validating analytics isn’t always straightforward. 
Testing event-based systems involves several obstacles that demand patience and precision.</p><p><strong>Challenges:</strong></p><ul><li><strong>Data Dependency &amp; Delays: </strong>QA must account for delays, as events take time to reflect in the system.</li><li><strong>Environment Differences:</strong> Behaviour can vary across staging, sandbox, and production environments due to distinct configurations or filters.</li><li><strong>Test Data Management:</strong> Managing and maintaining realistic test data across multiple workzones and time zones can be tedious and error-prone.</li><li><strong>Manual Validation Load:</strong> With manual validations, coverage and consistency rely heavily on detailed documentation and repetition.</li></ul><p><strong>Limitations:</strong></p><ul><li><strong>Gaps in Simulations:</strong> Some metrics depend on large-scale or real-time user behaviour, which can’t always be perfectly simulated in test environments.</li><li><strong>Human Factors:</strong> Manual interpretation introduces the risk of human error, especially when analysing large event payloads or long sessions.</li><li><strong>Dynamic Systems:</strong> Frequent updates in data models or computation logic can impact consistency, calling for continuous revalidation.</li></ul><h3>Closing Thoughts</h3><blockquote><strong>Validating analytics pipelines is part science, part investigation. It demands attention to data logic, event behaviour, timing and the patience to trace results back to their source.</strong></blockquote><p>Though complex and time-intensive, such QA ensures that what businesses see on their dashboards truly represents what happens in reality. In the end, that accuracy is what turns analytics from just numbers into trust.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=f2dcca8adf56" width="1" height="1" alt=""><hr><p><a href="https://medium.com/helpshift-engineering/the-hidden-layer-of-analytics-how-qa-builds-trust-in-data-f2dcca8adf56">The Hidden Layer of Analytics: How QA Builds Trust in Data</a> was originally published in <a href="https://medium.com/helpshift-engineering">helpshift-engineering</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Why Mentorship Matters? Beyond Tasks and Deadlines]]></title>
            <link>https://medium.com/helpshift-engineering/why-mentorship-matters-beyond-tasks-and-deadlines-61994c2e3d3b?source=rss----3229f31ca4f4---4</link>
            <guid isPermaLink="false">https://medium.com/p/61994c2e3d3b</guid>
            <category><![CDATA[organizational-culture]]></category>
            <category><![CDATA[mentorship]]></category>
            <category><![CDATA[growth-mindset]]></category>
            <dc:creator><![CDATA[Gayatri Panganti]]></dc:creator>
            <pubDate>Wed, 26 Nov 2025 10:36:42 GMT</pubDate>
            <atom:updated>2025-11-26T10:36:41.613Z</atom:updated>
<content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/822/1*6SKT5BxnKUjSlDloCTeAHA.jpeg" /></figure><p>In every growing organisation, mentorship quietly powers progress. It’s not just about reviewing work or assigning tasks; it’s about helping people discover their potential, learn faster, and build confidence in their expertise.</p><p>Strong mentorship programs help create a culture where learning is continuous, team alignment improves, and quality becomes a shared mindset rather than an individual effort.</p><h3>🎯 Why Mentorship Is Important</h3><h4>1. Skill Growth Happens Faster</h4><p>Learning becomes faster and more meaningful with the right guidance. Mentorship helps bridge gaps that could otherwise take years to overcome, whether it’s technical depth, communication, or accountability.</p><h4>2. Builds Confidence and Accountability</h4><p>When someone knows there’s support and guidance available, they are more likely to take ownership, ask questions, and explore solutions without fear of failure. Mentorship creates a safe environment for experimentation and growth.</p><h4>3. Aligns Individual and Team Goals</h4><p>Individual OKRs (Objectives and Key Results) often feel abstract until someone helps translate them into personal growth plans. Effective mentorship bridges that gap, ensuring skill development aligns with both personal and organizational success metrics.</p><h4>4. Encourages a Culture of Feedback</h4><p>Mentorship promotes open communication. Each session becomes an opportunity to exchange constructive feedback and insights. Over time, this fosters a culture of transparency and continuous improvement across the team.</p><h3>🧭 What Effective Mentorship Looks Like</h3><p>A good mentorship rhythm involves consistent check-ins, clear goals, and space for reflection.</p><ul><li><strong>Regular discussions/sync-ups</strong> on personal OKRs help mentees focus on targeted skill improvement.</li><li><strong>Weekly or biweekly meetings </strong>provide structure for reviewing progress and addressing blockers.</li><li>During high-priority feature or project work, <strong>flexibility is key.</strong> Short async updates or quick syncs can keep the mentorship active without adding pressure.</li></ul><p>The intent is to ensure growth remains continuous, even when schedules become demanding.</p><h3>⚖️ Balancing Mentorship Goals with Organizational Priorities</h3><p>In mentorship, there are times when a proposed learning topic or framework might not align with organisational priorities. For example, when a new tool poses a potential security risk or when leadership decides the focus area isn’t strategically valuable at that time.</p><p>Such situations highlight the importance of adaptability. Effective mentorship is not only about proposing ambitious learning ideas but also helping mentees navigate changes in direction without losing focus and motivation. Finding alternative topics that balance personal development with business direction ensures that learning remains meaningful and sustainable.</p><p>When growth goals are aligned with the organisation’s vision, mentorship efforts translate into outcomes that strengthen both the individual and the team.</p><h3>💡 Lessons from Successful Mentorship Models</h3><ul><li>Mentorship is not always about giving answers. 
It’s about enabling the right questions.</li><li>Progress often comes from small, consistent guidance rather than occasional large sessions.</li><li>Mentorship benefits both sides: mentors refine their leadership, empathy, and clarity, while mentees gain direction and confidence.</li><li>Adapting to feedback and changing circumstances keeps the mentorship relevant and effective.</li></ul><h3>🌱 The Other Side of Mentorship</h3><ul><li>Mentorship is never one-sided. Mentors grow too, gaining new perspectives and refining their own leadership approach.</li><li>Every mentorship exchange offers the mentor fresh insights, patience, and renewed curiosity.</li></ul><h3>🌟 Conclusion</h3><ul><li>Mentorship is not an additional task; it’s an investment in people, culture, and collective success.</li><li>Every conversation, no matter how brief, contributes to a stronger, more self-driven team.</li><li>When mentorship becomes part of the work culture, growth becomes inevitable for individuals, for teams, and for the organisation as a whole.</li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=61994c2e3d3b" width="1" height="1" alt=""><hr><p><a href="https://medium.com/helpshift-engineering/why-mentorship-matters-beyond-tasks-and-deadlines-61994c2e3d3b">Why Mentorship Matters? Beyond Tasks and Deadlines</a> was originally published in <a href="https://medium.com/helpshift-engineering">helpshift-engineering</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How Data Powers Agent Productivity]]></title>
            <link>https://medium.com/helpshift-engineering/how-data-powers-agent-productivity-f310f414872d?source=rss----3229f31ca4f4---4</link>
            <guid isPermaLink="false">https://medium.com/p/f310f414872d</guid>
            <category><![CDATA[apache-spark]]></category>
            <category><![CDATA[analytics]]></category>
            <category><![CDATA[big-data]]></category>
            <category><![CDATA[data-analysis]]></category>
            <dc:creator><![CDATA[Poorva Patil]]></dc:creator>
            <pubDate>Mon, 06 Oct 2025 04:22:31 GMT</pubDate>
            <atom:updated>2025-10-08T06:11:27.019Z</atom:updated>
<content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*kcU_kcYLnpMD33l-8DxYLg.jpeg" /></figure><p>As a data engineer, I used to see metrics as just numbers on a dashboard — until I realized they’re the lens through which customers view and run their operations. In customer support, for example, agent productivity metrics aren’t just figures; they’re actionable insights that drive efficiency, shape staffing decisions, and directly impact customer satisfaction.</p><p>These aren’t just charts — they help customers understand the value we provide, how well things are working, and what decisions to make next. Realizing this changed how I think about building analytics.</p><h3>➡️💡The Question That Shifted Our Perspective</h3><p>In customer support, how well the team works really matters. It affects how much the company spends, how happy the customers are, and how the team feels about their work.</p><p>Support managers often ask:</p><ul><li>Are we staffed correctly for the volume we’re handling?</li><li>Are agents spending their time productively?</li><li>How can we optimize scheduling and performance?</li></ul><p>When we began our Agent Workforce Management project, we already had a few standard metrics in place like <em>online time</em>, <em>login time</em>, and <em>available time</em>. These told us when agents were present — but not what they were actually doing.</p><p>Customers weren’t asking <em>“Are our agents online?”</em></p><p>They were asking <em>“How productive are our agents?”</em></p><p>And truthfully, we didn’t have a good answer.</p><p>There was no visibility into how much time was being spent on real work versus idle time. No way to differentiate between being “available” and being “productive”. This made it hard for teams to identify gaps, support high performers, or spot patterns that needed attention.</p><p>This project was all about answering that question in the right way.</p><h3>🔍📊 Custom Metrics We Built</h3><p>We designed a set of new metrics that give a clearer picture of how agents spend their time. These metrics give us a deeper understanding of how time is actually being spent, helping us move beyond assumptions and focus on what really drives productivity.</p><h4>Engagement Metrics</h4><p>These show how long agents actively engage with different parts of the dashboard, giving insight into focused work and task-level activity.</p><ul><li><strong>Dashboard Interaction Time — </strong>Time spent by the agent actively using the main dashboard, with the browser tab in focus. It helps indicate general engagement during working hours.</li><li><strong>Issues Page Interaction Time — </strong>Measures how much time agents spend in the “Issues” section of the dashboard. This shows how long they are reviewing or managing tickets.</li><li><strong>Issues Interaction Time — </strong>Covers the time an agent has a specific ticket open and is actively working on it. It’s a direct view of task-level engagement.</li></ul><h4>Idle/Availability State Metrics</h4><p>These track time when an agent is technically available but not handling any work.</p><ul><li><strong>Unallocated Available Time — </strong>Tracks when an agent is marked as ‘Available’ but has no pending tickets to reply to. It highlights unused agent capacity.</li><li><strong>Unallocated Unavailable Time — </strong>Represents periods when the agent is not marked as ‘Available’ and also has no active tickets. 
Often includes breaks or idle time.</li></ul><h4><strong>Anomaly &amp; Attention Metrics</strong></h4><p>Even when agents are online or marked as available, it doesn’t always mean they’re engaged or productive. Sometimes they get distracted, stuck, or step away without logging out properly. Anomaly metrics surface these blind spots.</p><ul><li><strong>Untracked Time — </strong>Time during which the agent is online, but the dashboard browser tab is not in focus. It may point to distractions or periods of inactivity.</li></ul><p>Each of these metrics plays a different role, but together they give a fuller, more accurate view of agent productivity.</p><h3>🛠️🕵️‍♂️ Behind the Scenes</h3><p>We knew what we wanted to measure — time spent actively engaging with dashboards, tickets, and actual tasks. But the problem? <strong>We had no way to capture the signals behind those actions.</strong> <strong>We couldn’t tell if their browser tab was in focus, if they opened a ticket, or even if they were looking at the right page.</strong></p><p>At first, it seemed easy 🤷‍♀️: just track what agents are doing and for how long. But as we got into the details, we realized it was more complicated than it looked.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*HAdLUuqnxdXMpYIhu5sSQg.jpeg" /></figure><p>It took us a few 🧠 brainstorming sessions to land on a set of <strong>custom client-side events</strong> that gave us the visibility we needed.</p><h3>🎯 The Events That Made It Possible</h3><p>To track meaningful activity, we created and instrumented <strong>focus events</strong> — signals that fire based on what the agent is doing or which tab/page is in view.</p><p>For example, we started tracking events that captured whether an agent was on the main dashboard or had navigated away, whether they were viewing the Issues page or another section, and when they were focused on a specific ticket.</p><p>To detect whether agents were really present and engaged, we initially used the browser’s visibilitychange event. We set up the visibility tracking by adding an event listener as soon as the HsPage component loaded.</p><ul><li>When the dashboard tab became visible, we logged an Agent Helpshift (HS) In Focus event. It tells us whether the agent actually has the Helpshift dashboard in focus, rather than just having it open in the background.</li><li>When it became hidden, we logged an Agent HS Out Focus event.</li></ul><p><strong>But this came with some challenges:</strong></p><p>While visibilitychange worked well in many scenarios (like switching tabs, minimizing the window, or full-screening another app), it didn’t trigger when:</p><ul><li>The tab was covered by an overlay screen using CMD + Tab</li><li>The user used gestures like three-finger swipe to switch between open applications</li><li>The app was searched via Spotlight (macOS)</li></ul><p>In these cases, the HS tab remained “In Focus”, since technically it was still visible to the browser. As a result, the “Out of Focus” event wasn’t triggered, and we couldn’t reliably track focus loss.</p><p>To make tracking more accurate, we moved to using focus and blur event listeners. 
These events are fired more consistently when:</p><ul><li>A user switches away from or returns to the browser window</li><li>The browser itself loses or gains focus — regardless of how the switch happens</li></ul><pre>componentDidMount: function () {<br>    // handleFocus logs the &quot;In Focus&quot; event; handleBlur logs &quot;Out of Focus&quot;<br>    window.addEventListener(&quot;focus&quot;, handleFocus);<br>    window.addEventListener(&quot;blur&quot;, handleBlur);<br>  }</pre><p>But in cases of session expiry, manual logout, or when the dashboard was closed or refreshed, the blur event wasn’t triggered. So we used the beforeunload event listener to capture the “Out of Focus” event.</p><pre>componentDidMount: function () {<br>    window.addEventListener(&quot;focus&quot;, handleFocus);<br>    window.addEventListener(&quot;blur&quot;, handleBlur);<br>    // covers close, refresh and logout, where blur never fires<br>    window.addEventListener(&quot;beforeunload&quot;, handleBeforeUnload);<br>  }</pre><p>The beforeunload listener fires when the current window and its contained documents are about to be unloaded. We also remove all three listeners when the component unmounts:</p><pre>componentWillUnmount: function () {<br>    // clean up listeners to avoid duplicate handlers and leaks<br>    window.removeEventListener(&quot;focus&quot;, handleFocus);<br>    window.removeEventListener(&quot;blur&quot;, handleBlur);<br>    window.removeEventListener(&quot;beforeunload&quot;, handleBeforeUnload);<br>  }</pre><p>These events gave us <strong>start and end timestamps</strong> for focused interactions — a key ingredient in our metric calculations.</p><h3>🧑‍💻📈 Agents Pipeline Overview</h3><p>Before diving deeper, let’s first understand how the foundation of our data platform is structured.</p><h4><strong>How our Data Platform works</strong></h4><p>Our data platform is built with multiple layers on AWS S3, each serving a specific purpose.</p><p>We capture all events coming in from Kafka and store them in the Silver Layer on S3 as Parquet files. Data from the Silver layer is cleaned, transformed, and aggregated into half-hour intervals and stored in Hudi format in the Gold Layer, still as Parquet files.</p><p>The Gold data feeds into Redshift, making it easier to run fast queries and integrate with BI tools. This layer supports both internal analytics (like Metabase reports) and customer-facing use cases.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*mi6yPu1YNasPPu1ZOwnStQ.png" /></figure><p>👉 In short, <strong>Kafka → Silver → Gold → Redshift → Analytics</strong> is the pipeline that takes raw events and turns them into insights for both our teams and our customers.</p><h4><strong>Turning Events into Metrics</strong></h4><p>With the pipelines set up and events in place, the next task was to implement these agent productivity metrics in the Agents Fact table.</p><p>But defining the metrics wasn’t just about dropping numbers into a table; it was about deciding exactly how we calculate them. Some were straightforward, based on simple “in” and “out” events, while others required more context and logic.</p><p>Calculating something like <strong>Dashboard Interaction Time</strong> sounds simple at first:</p><ul><li><em>In event</em> (agent_hs_in_focus) marks the start</li><li><em>Out event</em> (agent_hs_out_focus) marks the end</li><li>The difference gives you the time spent</li></ul><p>Easy, right? 
Well, not always.</p><p>Because we aggregate metrics in <strong>half-hour windows</strong>, things get tricky when the <em>in</em> and <em>out</em> events don’t fall in the same window.</p><h4>Example</h4><p>Imagine an agent comes <strong>in focus</strong> at <strong>00:20</strong> and goes <strong>out of focus</strong> at <strong>01:12</strong>. Instead of just one number, we need to break this time into chunks, one for each 30-minute window:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/580/1*R9cTAwNd8QIIVZuhVOJBvQ.png" /></figure><ul><li><strong>00:00–00:30 → 10 minutes</strong></li><li><strong>00:30–01:00 → 30 minutes</strong></li><li><strong>01:00–01:30 → 12 minutes</strong></li></ul><p><strong>Total dashboard interaction time → 52 minutes</strong></p><h4>The State Table</h4><p>To handle this, we keep track of the agent’s <strong>last known state</strong> in a <em>state table</em>. That way, when the pipeline runs for the next window, it knows whether the agent was already in focus and can continue the calculation until the corresponding out event arrives.</p><pre>agents_state table<br><br>| agent_id   | version_ts     | is_currently_hs_in_focus |<br>|------------|----------------|--------------------------|<br>| profile_1  | 1757500200000  | 1                        |</pre><p>In the state table, we store the previous half-hour state as a boolean value. This ensures metrics are <strong>accurate across windows</strong>, even if an agent stays focused for hours without switching tabs.</p><h4><strong>Unallocated Time metrics</strong></h4><p>Unallocated time, however, was a whole different story. We had to figure out: <em>Were they available but idle? Were there no issues to pick up? Did they go unavailable without actually working on anything?</em></p><h4>Step 1: Understanding the waiting_for_agent State</h4><p>We relied heavily on a state called <strong>waiting_for_agent count</strong> — a counter representing how many tickets are actionable for a specific agent.</p><p>Tickets can land in this state due to several triggers:</p><ul><li>A <strong>new ticket is assigned</strong> to the agent</li><li>A <strong>customer replies</strong> to an ongoing conversation</li><li>The <strong>status changes</strong>, making the issue actionable for the agent</li></ul><p>This count is crucial:<br> — If it’s <strong>greater than 0</strong>, the agent has work.<br> — If it’s <strong>0</strong>, the agent is idle and “unallocated”.</p><h4>Step 2: Starting With a Clean State Daily</h4><p>Since ticket states fluctuate constantly throughout the day, we need a <strong>stable starting point</strong>.</p><p>Every day at <strong>00:00 hours</strong>, we take a <strong>snapshot of </strong><strong>waiting_for_agent</strong> count for each agent from <strong>Elasticsearch (ES)</strong>. This acts as our baseline.</p><p>This helps avoid compounding errors due to missed or delayed events.</p><h4>Step 3: Calculating the Metric via Spark UDF</h4><p>To compute <strong>Unallocated Time</strong>, we combined:</p><ul><li>The <strong>daily snapshot from ES</strong></li><li>Real-time <strong>state transition</strong> events (like ticket assignments or replies)</li><li>Agent <strong>availability</strong> status</li></ul><p>We built a <strong>Spark UDF (User Defined Function)</strong> for this. 
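In spirit, the per-agent logic resembles the sketch below (plain Clojure rather than our actual Spark code, with a hypothetical event shape, ignoring half-hour window boundaries):</p><pre>;; Each map describes the agent state from :ts (epoch millis) onward.<br>;; Unallocated time accrues while the agent is available with zero<br>;; actionable tickets; end-ts closes the last open interval.<br>(defn unallocated-ms<br>  [events end-ts]<br>  (reduce (fn [acc [{:keys [ts available? waiting-count]} next-state]]<br>            (if (and available? (zero? waiting-count))<br>              (+ acc (- (:ts next-state) ts))<br>              acc))<br>          0<br>          (partition 2 1 (conj (vec events) {:ts end-ts}))))<br><br>;; Matches the Alice timeline below: idle 09:00-09:15,<br>;; then working, then idle again 09:45-10:00.<br>(unallocated-ms<br>  [{:ts 0       :available? true :waiting-count 0}<br>   {:ts 900000  :available? true :waiting-count 1}<br>   {:ts 2700000 :available? true :waiting-count 0}]<br>  3600000)<br>;; =&gt; 1800000 (30 minutes)</pre><p>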
Why a UDF?</p><p><em>Because we needed to apply custom logic across millions of rows of event data, which native Spark functions couldn’t handle flexibly.</em></p><h4>Example: A Day in the Life of an Agent</h4><p>Let’s visualize how this logic plays out for an agent named <strong>Alice</strong>.</p><p><strong>Timeline</strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*pCP5RG9D1IpHkG72yQxw_w.png" /></figure><p>From the above, we can calculate:</p><ul><li>First unallocated window: 09:00–09:15 = 15 mins</li><li>Second unallocated window: 09:45–10:00 = 15 mins</li><li><strong>Total unallocated time for Alice:</strong> <strong>30 minutes</strong></li></ul><h4>How We Manage State Across Runs</h4><p>State management is critical to avoid errors during reruns or failures.</p><pre>agents_state table<br><br>| agent_id   | version_ts     | waiting_for_agent_count  |<br>|------------|----------------|--------------------------|<br>| profile_1  | 1757500200000  | 0                        |</pre><ul><li>We <strong>persist the last known state</strong> (i.e., was the timer running or not) at the end of each run.</li><li>During the <strong>next run</strong>, we pull the ES snapshot for the first day of the run or fetch the last saved state from the previous run</li><li>Continue from where we left off</li></ul><p>It helps maintain consistency, especially in distributed systems where task failures can happen.</p><h3>🔖Final Thoughts</h3><p>Yes, there were plenty of technical hurdles — missing events, tricky edge cases and complex implementation logic. But tackling those challenges helped us build something truly meaningful.</p><p>Ultimately, it’s all about transforming raw data into real insights — that’s where the real impact begins.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=f310f414872d" width="1" height="1" alt=""><hr><p><a href="https://medium.com/helpshift-engineering/how-data-powers-agent-productivity-f310f414872d">How Data Powers Agent Productivity</a> was originally published in <a href="https://medium.com/helpshift-engineering">helpshift-engineering</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Using Clojure channels to increase throughput]]></title>
            <link>https://medium.com/helpshift-engineering/using-clojure-channels-to-increase-throughput-c051cc7f9893?source=rss----3229f31ca4f4---4</link>
            <guid isPermaLink="false">https://medium.com/p/c051cc7f9893</guid>
            <category><![CDATA[multithreading]]></category>
            <category><![CDATA[kafka]]></category>
            <category><![CDATA[channel]]></category>
            <category><![CDATA[clojure]]></category>
            <category><![CDATA[core-async]]></category>
            <dc:creator><![CDATA[Abhinav Dubey]]></dc:creator>
            <pubDate>Wed, 28 May 2025 10:07:12 GMT</pubDate>
            <atom:updated>2025-05-28T10:07:12.660Z</atom:updated>
<content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*eOiA-Mcp69QOP2CfvfIZsw.png" /></figure><p>When building systems that process large volumes of messages synchronously, performance bottlenecks can quickly become a challenge, especially with single-threaded designs. In this post, we’ll look at how leveraging worker threads in a Clojure-based Kafka consumer can significantly boost throughput &amp; reduce total processing time. Using simple concurrency primitives, it’s possible to achieve parallelism &amp; scale gracefully, all while keeping the codebase clean &amp; maintainable. We’ll start with a baseline, introduce worker threads using Clojure’s core.async &amp; measure the impact.</p><h3><strong>Setup &amp; Context</strong></h3><ul><li>Kafka &amp; Zookeeper</li><li>For observability: Grafana</li><li>Kafka producer: A simple script that sends messages to a Kafka topic at a configurable rate (messages per minute) for a fixed duration. After each event is pushed, a counter metric is emitted</li><li>Kafka consumer: A simple script that listens to a topic &amp; consumes messages, simulating processing time by finding the square root of a number (henceforth, assume that it takes <em>~1 second</em> to find the square root). A counter metric is emitted after processing each message</li></ul><h3>The Baseline: Single-Threaded Consumer</h3><p>If each message takes <em>t</em> seconds to process &amp; there are <em>n</em> messages, total processing time becomes <em>n × t </em>seconds. This provides a clean baseline to evaluate the impact of using channels moving forward.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/1*omxZjzU3FQn3vGUsUBuPmA.png" /></figure><h3>Adding workers with core.async</h3><blockquote>Values are conveyed on queue-like channels. By default channels require producer and consumer to rendezvous for the transfer of a value through the channel <br><a href="https://clojuredocs.org/clojure.core.async">https://clojuredocs.org/clojure.core.async</a></blockquote><ul><li>To improve throughput, we introduce parallelism using Clojure’s <em>core.async</em> channels. Messages from Kafka are fed into a channel, &amp; multiple worker threads read from this channel to process messages concurrently</li><li>Here, we used &gt;!! (<em>blocking</em> put) &amp; &lt;!! 
<p>Who gets blocked, &amp; when:</p><ul><li>The thread putting a message into the channel will get blocked when there is no space in the buffer</li><li>The thread consuming a message from the channel will get blocked when there is no message in the buffer</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*90oKwYjUj2THUO4Uv3BECQ.png" /></figure><h3>Benchmarking</h3><p>The producer script runs for 5 minutes, producing 200 messages per minute into a topic.</p><ul><li>Single-threaded consumer: Time taken to process 1k events: <strong>16min 30sec</strong></li><li>With 3 workers using core.async: Time taken to process 1k events: <strong>5min 42sec</strong> (<em>65% faster</em> than single-threaded)</li><li>With 10 workers using core.async: Time taken to process 1k events: <strong>5min</strong> (<em>70% faster</em> than single-threaded &amp; <em>12% faster</em> than the 3-worker setup)</li></ul><blockquote>Please note that we are using just one machine to benchmark the single- &amp; multi-threaded consumer setups; this might not be ideal in production, but it suffices to gauge the impact of adding worker threads on throughput &amp; processing time.</blockquote><figure><img alt="" src="https://cdn-images-1.medium.com/max/980/1*_a-I7t6zrhex_IkZr9aFvw.png" /><figcaption>Kafka consumer : after adding channels</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*KuqY0kdgaN30kqOWskuMlw.png" /><figcaption>The green metric is emitted by kafka-producer. The yellow metric is emitted by the kafka consumer. In the leftmost, we have the single-threaded setup, followed by 3-worker setup &amp; lastly 10-worker setup</figcaption></figure><p>Consider another scenario, where the producer script runs for 5 minutes again, but this time it produces 500 messages per minute into the topic.</p><ul><li><strong>Single-threaded consumer:</strong> Time taken to process 2.5k events: <strong>40min 28sec</strong></li><li><strong>With 3 worker threads:</strong> Time taken to process 2.5k events: <strong>13min 39sec</strong> (<em>66% faster</em> than single-threaded)</li><li><strong>With 10 worker threads:</strong> Time taken to process 2.5k events: <strong>5min 6sec</strong> (<em>87% faster</em> than single-threaded &amp; <em>62% faster</em> than the 3-worker setup)</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*eXZvSSkdtFZR0lCLDtSsug.png" /></figure><h3>Comparing load averages</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*PxbON1YFURmRN98YC3Ea_Q.png" /><figcaption>Comparing 1 minute &amp; 5 minute load-average of single-threaded &amp; 10-worker thread consumer</figcaption></figure><ul><li>One of the primary trade-offs between single-threaded &amp; multi-threaded consumers lies in balancing system load with processing throughput.</li><li>While multi-threaded consumers provide higher parallelism &amp; faster event handling, they also lead to increased CPU usage &amp; a noticeably higher system load average. On the other hand, single-threaded consumers maintain a lower &amp; more predictable system load but process events at a slower rate.</li><li>During the benchmarking, it was observed that transitioning to a multi-threaded model caused the load average to rise significantly. This increase — measured using the <strong><em>uptime</em></strong> command — highlights the computational overhead introduced by concurrency &amp; the need to tune thread count &amp; workload based on the system’s capacity (sample output below).</li></ul>
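<p>For reference, the load averages above are the 1-, 5- &amp; 15-minute figures that <em>uptime</em> prints; the numbers shown here are illustrative, not taken from our benchmark machine:</p><pre>$ uptime<br>10:42:17 up 5 days,  3:14,  2 users,  load average: 2.31, 1.87, 1.02</pre>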
<h3>Analysing the logs &amp; threads</h3><p>Consider the function below, which the Kafka consumer invokes to process each message:</p><pre>(defn process-event<br>  [event]<br>  (let [start-time (System/currentTimeMillis)]<br><br>    ;; simulate work : find square-root<br>    (dotimes [_ (* 10000 10000)]<br>      (Math/sqrt (* 1000 1000)))<br><br>    ;; logging <br>    (let [end-time (System/currentTimeMillis)<br>          elapsed-seconds (/ (- end-time start-time) 1000.0)]<br>      (println {:id (:id event)<br>                :thread (.getName (Thread/currentThread))<br>                :duration elapsed-seconds}))<br><br>    ;; increment metric<br>    (clj-statsd/increment &quot;event-processing.done&quot;)))</pre><p>Below are the logs when run with:</p><p><strong>Single-threaded consumer</strong></p><pre>[INFO] {:id 1 :thread clojure-agent-send-off-pool-6 :duration 1.001}<br>[INFO] {:id 2 :thread clojure-agent-send-off-pool-6 :duration 1.000}<br>[INFO] {:id 3 :thread clojure-agent-send-off-pool-6 :duration 1.002}<br>[INFO] {:id 4 :thread clojure-agent-send-off-pool-6 :duration 1.000}<br>[INFO] {:id 5 :thread clojure-agent-send-off-pool-6 :duration 1.001}<br>... and so on</pre><p><strong>Consumer with 3 worker threads</strong></p><pre>[INFO] {:id 1 :thread clojure-agent-send-off-pool-9 :duration 1.001}<br>[INFO] {:id 2 :thread clojure-agent-send-off-pool-7 :duration 1.000}<br>[INFO] {:id 5 :thread clojure-agent-send-off-pool-8 :duration 1.002}<br>[INFO] {:id 3 :thread clojure-agent-send-off-pool-9 :duration 1.000}<br>[INFO] {:id 4 :thread clojure-agent-send-off-pool-7 :duration 1.001}<br>... and so on</pre><p><strong>Consumer with 10 worker threads</strong></p><pre>[INFO] {:id 1  :thread clojure-agent-send-off-pool-10 :duration 1.001}<br>[INFO] {:id 2  :thread clojure-agent-send-off-pool-11 :duration 1.000}<br>[INFO] {:id 4  :thread clojure-agent-send-off-pool-12 :duration 1.002}<br>[INFO] {:id 3  :thread clojure-agent-send-off-pool-13 :duration 1.000}<br>[INFO] {:id 6  :thread clojure-agent-send-off-pool-14 :duration 1.001}<br>[INFO] {:id 5  :thread clojure-agent-send-off-pool-3  :duration 1.001}<br>[INFO] {:id 7  :thread clojure-agent-send-off-pool-4  :duration 1.000}<br>[INFO] {:id 9  :thread clojure-agent-send-off-pool-7  :duration 1.002}<br>[INFO] {:id 8  :thread clojure-agent-send-off-pool-8  :duration 1.000}<br>[INFO] {:id 10 :thread clojure-agent-send-off-pool-9  :duration 1.001}<br>... and so on</pre>
<p>We can use <em>VisualVM</em> to inspect the JVM threads; doing this for the setup with 10 workers is shown below:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*GnG9vJQ84Hf1dpFDWAjTtA.png" /><figcaption>Selected only the threads printed in the above logs</figcaption></figure><h3>Conclusion</h3><ul><li>By carefully orchestrating channels &amp; worker threads, we can increase throughput &amp; reduce the total processing time.</li><li>If you have a similar use-case, Clojure’s <em>core.async</em> might be just what you need to scale without sacrificing simplicity.</li><li>We can also infer, from the VisualVM thread visualizer above, that this is a scenario of parallel (&amp; not merely concurrent) execution, given the overlapping <strong><em>running</em></strong> status of multiple threads.</li></ul><p>You can check out the implementation details here: <a href="https://github.com/abhinavdubey8989/clj-core-async-poc">https://github.com/abhinavdubey8989/clj-core-async-poc</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=c051cc7f9893" width="1" height="1" alt=""><hr><p><a href="https://medium.com/helpshift-engineering/using-clojure-channels-to-increase-throughput-c051cc7f9893">Using Clojure channels to increase throughput</a> was originally published in <a href="https://medium.com/helpshift-engineering">helpshift-engineering</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Load Testing API’s on Redshift & Snowflake — A  Quick POC]]></title>
            <link>https://medium.com/helpshift-engineering/load-testing-apis-on-redshift-snowflake-a-quick-poc-3cf94104cb98?source=rss----3229f31ca4f4---4</link>
            <guid isPermaLink="false">https://medium.com/p/3cf94104cb98</guid>
            <category><![CDATA[load-testing]]></category>
            <category><![CDATA[data-engineering]]></category>
            <category><![CDATA[snowflake]]></category>
            <category><![CDATA[redshift]]></category>
            <category><![CDATA[performance]]></category>
            <dc:creator><![CDATA[Sameeksha Bhatia]]></dc:creator>
            <pubDate>Fri, 02 May 2025 10:43:54 GMT</pubDate>
            <atom:updated>2025-05-02T10:43:54.236Z</atom:updated>
<content:encoded><![CDATA[<h3>Load Testing APIs on Redshift &amp; Snowflake — A Quick POC</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*8MK1KyYW-su8GjhmiplHWA.jpeg" /></figure><h3>Overview</h3><p>At Helpshift, our data platform follows a <strong>Lakehouse architecture</strong>, combining the best of both <strong>data lakes and data warehouses</strong>. This architecture allows us to store and analyze large amounts of raw data in a structured and organized manner, while also providing the scalability and low-cost storage of a data lake. It consists of three key components:</p><ul><li><strong>Amazon S3</strong> for storing historical data efficiently.</li><li><strong>Amazon EMR</strong> for running Spark pipelines on ephemeral compute.</li><li><strong>Amazon Redshift</strong> as the data warehouse powering <strong>customer-facing analytics</strong>, which includes <strong>Embedded Dashboard analytics</strong>, <strong>Power BI template apps, and Analytics APIs</strong>.</li></ul><p>We serve the analytics data models to our customers using AWS Redshift through the following customer channels:</p><ol><li>Embedded Analytics dashboard with Luzmo as the BI vendor.</li><li>Power BI integration using REST API as a datasource. Power BI apps allow customers to also create customized reports on the data models.</li><li>REST APIs to query/download analytics data models. These APIs allow custom aggregations and filters (e.g., hourly or daily rollups, ad hoc filtering, etc.).</li></ol><p>We have isolated our read path traffic for the above customer-facing analytics channels in a separate Redshift cluster from the write path traffic. However, in recent months, the team has noticed a few issues during peak hours of read traffic. One of the main challenges that the team faces is performance issues with serving analytics data on the above-mentioned customer channels.</p><h3>Why This Matters</h3><p>With increasing adoption of our analytics interfaces — Power BI integrations and REST APIs — the Redshift cluster is under growing pressure from concurrent query workloads. These workloads are latency-sensitive and often power downstream operational or reporting workflows. Failures, timeouts, or inconsistent response times can disrupt customer pipelines, delay business-critical decisions, and increase retry or error-handling complexity on the client side. Ensuring predictable performance and horizontal scalability under load is critical to meeting SLAs, minimizing operational noise, and maintaining a consistent analytics experience across all customer-facing endpoints.</p><h3>Problem</h3><ol><li><strong>Difficulty in Handling Highly Concurrent Traffic</strong><br>Redshift offers Multi-Cluster Concurrency Scaling, but it comes at an additional cost and only partially resolves traffic spikes. During peak loads, we observed:</li></ol><ul><li>API failures and timeouts due to query queuing and resource contention.</li><li>Slow query execution despite concurrency scaling, leading to degraded user experience. Query duration varied from 10 seconds to 3 minutes for similar OLAP queries.</li></ul><h4>2. Vacuum &amp; Sorting Overhead Under High Load</h4><p>During high-concurrency execution, we noticed:</p><ul><li><strong>Table fragmentation and sort order degradation</strong>: As tables are continuously updated with new data, especially under high write throughput, rows become increasingly unsorted.
This leads to fragmentation, which degrades scan performance and increases query latency — particularly for queries relying on sort keys for efficient filtering.</li><li><strong>Increased query execution time due to on-the-fly sorting</strong>: Instead of reading pre-sorted blocks, Redshift spent a substantial portion of query execution time re-sorting fragmented data at runtime, resulting in higher and inconsistent query durations.</li><li><strong>VACUUM operations failing under load</strong>: Redshift requires periodic VACUUM operations to reclaim disk space and restore sort order. However, under heavy read and write workloads, VACUUM processes are deprioritized or delayed in favor of query execution. Since VACUUM is resource-intensive, it competes for the same I/O and CPU resources as live queries. On a saturated cluster, this leads to VACUUM jobs stalling, timing out, or failing entirely — leaving tables fragmented and causing query performance to degrade over time.</li></ul><p>These challenges prompted us to explore alternative solutions for handling high-concurrency workloads efficiently. To evaluate alternatives, we conducted a <strong>performance benchmarking POC between Redshift and Snowflake</strong> using load testing tools like <strong>Newman and k6</strong>.</p><h3>Tools</h3><h4>Newman and Grafana k6</h4><p>To effectively simulate traffic and test the performance of our queries, we used <strong>Newman</strong> and <strong>k6</strong>, the powerful duo that helped us simulate the load.</p><p><strong>Newman</strong> is a command-line tool built by <strong>Postman</strong> to run collections of API requests directly from the terminal. It’s particularly useful for automating and running tests for APIs, making it a great tool for testing query performance. It helps to execute Postman collections via the command line and supports parameterization, i.e., you can specify variables like the number of runs, parallel executions, and delay between requests.</p><p><strong>Grafana k6</strong> is an open-source tool that is optimized for minimal resource consumption and designed for running high-load performance tests.</p><p>Let&#39;s quickly walk through the setup steps for these tools.</p><h4>Setup Steps</h4><h4>I. Newman</h4><p>We already had the Postman collection of our Analytics APIs. To set up Newman, ensure npm is installed and upgraded to version 16. Install Newman and check that it is working correctly.</p><pre>--&gt; npm install -g newman<br>--&gt; newman run Sunbird\ -\ Analytics\ API.postman_collection.json</pre><p>This executes the API requests from the collection and provides important metrics, such as <strong>total run time, average request time</strong>, and other performance stats.</p><h4>II. k6</h4><p>Next, to simulate more <strong>realistic traffic</strong> using <strong>k6</strong>, which can handle a higher load and more complex testing scenarios, install k6.</p><pre>--&gt; brew install k6<br>--&gt; npm install -g postman-to-k6 ;; a tool to convert Postman collections to k6 scripts<br>--&gt; postman-to-k6 Sunbird\ -\ Analytics\ API.postman_collection.json -o scriptIssue.js</pre><p>The last command generates a scriptIssue.js file that contains the necessary k6 script to perform load testing on the API.</p>
<h4>III. Testing Steps</h4><p>We tested the Issue entity API, which requires <em>from</em> and <em>to</em> parameters for date ranges.</p><p><strong>Step 1:</strong> To simulate a realistic dataset, we created a CSV file with random values for the from and to parameters; for example, a file named IssueWeeklyData.csv with sample data.</p><p><strong>Step 2:</strong> We created a new file for the <strong>k6 script</strong>, named IssueScript.js, in the same directory as our CSV file. The sample k6 script below lets us define the stages, http_req_duration thresholds, etc.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/a785030706aa6d7ebd14c9dbb53cb0b1/href">https://medium.com/media/a785030706aa6d7ebd14c9dbb53cb0b1/href</a></iframe>
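<p>In case the embedded script does not render in your reader, here is a simplified sketch of what such a k6 script can look like. The endpoint URL is an illustrative placeholder (authentication is omitted), not our actual API; the stages and threshold mirror the checks visible in the sample output further below:</p><pre>import http from &#39;k6/http&#39;;<br>import { check } from &#39;k6&#39;;<br>import { SharedArray } from &#39;k6/data&#39;;<br>import papaparse from &#39;https://jslib.k6.io/papaparse/5.1.1/index.js&#39;;<br><br>// Load the date ranges once; SharedArray shares them across virtual users<br>const ranges = new SharedArray(&#39;ranges&#39;, function () {<br>  return papaparse.parse(open(&#39;./IssueWeeklyData.csv&#39;), { header: true }).data;<br>});<br><br>export const options = {<br>  stages: [{ duration: &#39;10s&#39;, target: 50 }], // ramp up to 50 virtual users<br>  thresholds: { http_req_duration: [&#39;p(95)&lt;3000&#39;] }, // 95% of requests under 3s<br>};<br><br>export default function () {<br>  // Randomly pick a from/to pair from the CSV<br>  const r = ranges[Math.floor(Math.random() * ranges.length)];<br>  const res = http.get(`https://example.com/v1/analytics/issue?from=${r.from}&amp;to=${r.to}`);<br>  check(res, {<br>    &#39;status is 200&#39;: (res) =&gt; res.status === 200,<br>    &#39;response time &lt; 3000ms i.e 3 secs&#39;: (res) =&gt; res.timings.duration &lt; 3000,<br>  });<br>}</pre>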
<p>This script will:</p><ul><li>Randomly select values from the CSV file for the from and to parameters.</li><li>Make HTTP GET requests to the API endpoint with those parameters.</li><li>Check the response status and time, ensuring the queries run within the expected time limits.</li></ul><pre>k6 run IssueScript.js</pre><p>This will simulate the load, making concurrent requests to the API, and output performance metrics like <strong>response times</strong>, <strong>throughput</strong>, and <strong>error rates</strong>. Sample output looks like —</p><pre>✗ status is 200<br>      ↳  95% — ✓ 97 / ✗ 5<br>     ✗ response time &lt; 3000ms i.e 3 secs<br>      ↳  2% — ✓ 3 / ✗ 99<br><br>     checks.........................: 49.01% ✓ 100      ✗ 104<br>     data_received..................: 14 MB  940 kB/s<br>     data_sent......................: 19 kB  1.3 kB/s<br>     http_req_blocked...............: avg=256.57µs min=2µs      med=26.99µs max=1.52ms   p(90)=669.4µs  p(95)=697.49µs<br>     http_req_connecting............: avg=191.59µs min=0s       med=0s      max=820µs    p(90)=534.3µs  p(95)=555.85µs<br>   ✗ http_req_duration..............: avg=3.43s    min=271.48ms med=2.11s   max=8.08s    p(90)=6.49s    p(95)=7.11s<br>       { expected_response:true }...: avg=3.56s    min=479.34ms med=4.65s   max=8.08s    p(90)=6.57s    p(95)=7.13s<br>     http_req_failed................: 4.90%  ✓ 5        ✗ 97<br>     http_req_receiving.............: avg=82.49ms  min=25µs     med=14.53ms max=549.84ms p(90)=220.58ms p(95)=292.28ms<br>     http_req_sending...............: avg=50.84µs  min=10µs     med=40.5µs  max=161µs    p(90)=101.6µs  p(95)=117.89µs<br>     http_req_tls_handshaking.......: avg=0s       min=0s       med=0s      max=0s       p(90)=0s       p(95)=0s<br>     http_req_waiting...............: avg=3.35s    min=271.42ms med=2.09s   max=7.87s    p(90)=6.3s     p(95)=7.03s<br>     http_reqs......................: 102    6.851256/s<br>     iteration_duration.............: avg=4.43s    min=1.27s    med=3.11s   max=9.08s    p(90)=7.49s    p(95)=8.11s<br>     iterations.....................: 102    6.851256/s<br>     vus............................: 20     min=3      max=50<br>     vus_max........................: 50     min=50     max=50<br><br><br>running (14.9s), 00/50 VUs, 102 complete and 0 interrupted iterations<br>default ✓ [======================================] 00/50 VUs  10s<br>ERRO[0016] thresholds on metrics &#39;http_req_duration&#39; have been crossed</pre><h3>Approach</h3><h4>Key variables tested:</h4><p>To evaluate the performance of both Redshift and Snowflake under varying loads, we simulated API queries under different scenarios. Several <strong>key parameters</strong> were used to assess how each platform performs under different levels of demand and customer data sizes:</p><ol><li><strong>Number of concurrent queries:</strong> We tested with different numbers of concurrent queries being executed within <strong>1 second</strong> to simulate various load scenarios. This helped us understand how each platform scales under high traffic and whether it can handle bursty traffic spikes.</li><li><strong>Snowflake Warehouse Size:</strong> We used different warehouse sizes on Snowflake — <strong>XS (Extra Small)</strong>, <strong>S (Small)</strong>, and <strong>M (Medium)</strong> — to simulate performance across various configurations. The warehouse size plays a significant role in determining how much compute power is available to handle queries, particularly during peak usage.</li><li><strong>Snowflake Cluster’s Maximum Capacity:</strong> Snowflake’s ability to scale automatically by adding or removing clusters was tested to see how well it handled increases in concurrent queries. We simulated workloads pushing Snowflake’s cluster capacity to its limit to evaluate its auto-scaling capabilities.</li><li><strong>Customer Data Size:</strong> We simulated queries for customers with <strong>small</strong>, <strong>medium</strong>, and <strong>large</strong> datasets to assess how Snowflake handles different data volumes.</li></ol><h3>Results &amp; Findings</h3><p>For our Analytics and Power BI API load test, we selected two heavily used APIs: the Issues Entity API and the Agents Entity API. We ran these APIs against both Redshift and Snowflake, evaluating total execution time under different conditions. Overall, Snowflake delivered much better query execution times under heavy workloads for our use cases than Redshift.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Phvl8q50ZkUxrscPtOhJOQ.png" /></figure><ul><li>These preliminary performance tests showed ~30 to 50% improved query performance on the Snowflake read path compared to our current Redshift database.</li><li>Cost of Operation: Extrapolating the numbers from the POC runs, Snowflake has a 2x to 3x jump in cost as compared to Redshift.</li><li>Scalability: Snowflake is very easy to scale up or down automatically and can cater to high concurrency with high performance.</li></ul><h3>Conclusion</h3><p>Snowflake offered significant performance improvements in handling high-concurrency workloads, especially in terms of query latency and scalability. However, these benefits came at a higher operational cost, highlighting the trade-offs between performance and budget.</p><p>This POC was a valuable exercise in defining the right architectural decisions to support the growing demand on our customer-facing analytics platform.
It validated the need for more elastic and maintenance-free architectures as concurrency continues to scale.</p><p>Future work will focus on <strong>evaluating cost optimization strategies within Snowflake</strong>, including compute sizing, auto-suspend configurations, and query tuning, to ensure that the improved performance aligns with our operational efficiency goals.</p><p><em>Thanks to Abhishek and Aqeel</em> <em>for the valuable support.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=3cf94104cb98" width="1" height="1" alt=""><hr><p><a href="https://medium.com/helpshift-engineering/load-testing-apis-on-redshift-snowflake-a-quick-poc-3cf94104cb98">Load Testing API’s on Redshift &amp; Snowflake — A  Quick POC</a> was originally published in <a href="https://medium.com/helpshift-engineering">helpshift-engineering</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Form Validation Tips Every Web Developer Should Know!]]></title>
            <link>https://medium.com/helpshift-engineering/form-validation-tips-every-web-developer-should-know-9d966d8fd571?source=rss----3229f31ca4f4---4</link>
            <guid isPermaLink="false">https://medium.com/p/9d966d8fd571</guid>
            <category><![CDATA[javascript]]></category>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[technology]]></category>
            <category><![CDATA[software-development]]></category>
            <category><![CDATA[ux]]></category>
            <dc:creator><![CDATA[Hritik Jaiswal]]></dc:creator>
            <pubDate>Fri, 20 Dec 2024 04:18:55 GMT</pubDate>
            <atom:updated>2024-12-20T04:18:54.951Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*dkmWBcaNAUuep2cHoZV2mA.png" /></figure><p>Forms are everywhere online, from signing up for newsletters to making purchases. But let’s be honest — nothing’s more frustrating than a form that’s hard to fill out or riddled with unclear error messages. In this post, we’ll dive into practical tips and tricks to make your form validation seamless, user-friendly, and maybe even enjoyable!</p><p>We’ll walk through tips for using built-in HTML features and creating custom validation with JavaScript. No complicated jargon — just practical steps to improve your forms.</p><h3># Tip 01: Use the Correct HTML Input Types</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hKH8uDnjlTUG7_tf7fiwYA.gif" /><figcaption>Before applying correct HTML Input Types</figcaption></figure><p>If you set the input type as “<strong>text</strong>” for a password field, the password will not be obscured as you type. Similarly, if you use the “<strong>text</strong>” input type for an email field, the browser’s default email pattern check will not be triggered.</p><p>When you use the correct input type, the password field will obscure the characters being typed, which is the desired behavior. Additionally, if you type an incorrect email address in the email field, the browser will notify you with a pop-up because of the built-in validation it provides.</p><ul><li>Email should use type=&quot;email&quot;.</li><li>Password should use type=&quot;password&quot;.</li><li>Confirm Password should also be used type=&quot;password&quot;.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_3ibbHKwrpeD49T0v4KHbw.gif" /><figcaption>After applying correct HTML Input Types</figcaption></figure><p><strong>Code</strong>:</p><p><a href="https://github.com/hritik5102/Form-Validation-Tips/commit/502e4162ab6bf4c19d50763e116d516617c74e85">Add Form Validation Tip #1 · hritik5102/Form-Validation-Tips@502e416</a></p><h3># Tip 02: Built-in HTML Validators</h3><p>If the username is the required field in your form. To ensure that the username field contains valid data, mark it as “<strong>required</strong>.” If the user tries to submit the form without filling in this field, the browser will prompt them to complete it. 
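<p>As a minimal sketch, such a field can look like this (the field name and length are illustrative):</p><pre>&lt;input type=&quot;text&quot; id=&quot;username&quot; name=&quot;username&quot; required minlength=&quot;5&quot; /&gt;</pre>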
<ul><li>For Username, mark it as “required.” If empty upon submission, the browser will ask: &quot;Please fill out this field.&quot;</li><li>Use minlength=&quot;5&quot; to require a minimum of five characters.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*F5ct--r9F-4MtasYVwJV1g.gif" /><figcaption>Tip 02: Built-in HTML Validators</figcaption></figure><p><strong>Code</strong>:</p><p><a href="https://github.com/hritik5102/Form-Validation-Tips/commit/cd8abbbeab98abd62b59dc8e293563b92fd346df">Add Form Validation Tip #2 · hritik5102/Form-Validation-Tips@cd8abbb</a></p><h3># Tip 03: Using Regex for Pattern Matching</h3><ul><li>For instance, to require that the username consist only of digits, use the pattern attribute: pattern=&quot;^[0-9]+$&quot;.</li><li>If a string is entered instead of a number, an HTML tooltip will appear, prompting: “<strong>Please match the requested format</strong>.”</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*VHyAvGuPqAnb0pPl49_TjQ.gif" /><figcaption>Tip 03: Using Regex for Pattern Matching</figcaption></figure><p><strong>Code</strong>:</p><p><a href="https://github.com/hritik5102/Form-Validation-Tips/commit/3b3230b6c7f0fa525b2718b46e9df404a913c20f">Add Form Validation Tip #3 · hritik5102/Form-Validation-Tips@3b3230b</a></p><h3># Tip 04: Custom Tooltip Validation Messages</h3><ul><li>The tooltip message “<strong>Please match the requested format</strong>” doesn’t clearly explain the specific format we expect from the user.</li><li>We can use the title attribute to display custom tooltips. For example, on the username field, set title=&quot;Username must be at least 5 characters and include a number.&quot;</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*OV6AV5bdZ_ePoxeShiGUIQ.gif" /><figcaption>Tip 04: Custom Tooltip Validation Messages</figcaption></figure><p><strong>Code</strong>:</p><p><a href="https://github.com/hritik5102/Form-Validation-Tips/commit/9e64865950a4ac9b16683bc4e5d576ee574c067c">Add Form Validation Tip #4 · hritik5102/Form-Validation-Tips@9e64865</a></p><h3># Tip 05: Create Custom Tooltip Messages with JavaScript</h3><ul><li>reportValidity() is a method available on form elements (&lt;form&gt;, &lt;input&gt;, &lt;select&gt;, &lt;textarea&gt;, etc.) that triggers the browser&#39;s built-in form validation.</li><li>setCustomValidity(message) is a method on input elements that lets you set a custom validation message, overriding the browser’s default message for specific constraints.</li><li>Together, they provide a robust mechanism for client-side form validation in web applications.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZpddbGSlsJrbCQklMulmhQ.gif" /><figcaption>Tip 05: Create Custom Tooltip Messages with JavaScript</figcaption></figure><p><strong>Code</strong>:</p><p><a href="https://github.com/hritik5102/Form-Validation-Tips/commit/73c63bc18491db636b52663faa6cd35be4134c15">Add Form Validation Tip #5 · hritik5102/Form-Validation-Tips@73c63bc</a></p><h3># Tip 06: Avoid Early Validation</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hDrA1-3-PhDUp5ZeZHc9Mw.gif" /><figcaption>When onInput event handler is used</figcaption></figure><ul><li>Wait until the user moves to the next input or loses focus on the current input before validating. Replace the onInput event handler with onChange for this purpose (see the sketch below).</li></ul>
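<p>A minimal sketch of this deferred validation, combining it with the APIs from Tip 05 (the field id and message text are illustrative):</p><pre>const username = document.querySelector(&#39;#username&#39;);<br><br>// &#39;change&#39; fires when the field loses focus after its value changed,<br>// not on every keystroke like &#39;input&#39;<br>username.addEventListener(&#39;change&#39;, () =&gt; {<br>  username.setCustomValidity(<br>    username.validity.valueMissing ? &#39;Please enter a username.&#39; : &#39;&#39;);<br>  username.reportValidity();<br>});</pre>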
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*c1eTpXKvYFnVzzPShMqdYA.gif" /><figcaption>Tip 06: Avoid Early Validation using onChange event handler</figcaption></figure><p>Notice how the error message is displayed only when the end user shifts focus to the next input field.</p><p><strong>Code</strong>:</p><p><a href="https://github.com/hritik5102/Form-Validation-Tips/commit/1ed19a1b3a9070e43c248689a8681a87e9c9e345">Add Form Validation Tip #6 · hritik5102/Form-Validation-Tips@1ed19a1</a></p><h3># Tip 07: Show a Specific Validation Message at a Time</h3><ul><li>Assume the validation accepts both letters and numbers and requires a minimum of 8 characters. If the user enters ‘james,’ they will see an error indicating that a number is required. If they enter ‘james12,’ which is still too short, they will see an error stating that the input must be at least 8 characters long. Display the errors when the user moves to the next field.</li><li>Instead of displaying a tooltip for an error message, we can show a custom validation message in red. You can choose either approach based on the UI/UX design of your application.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*I-UTjcyuOiDAkhewzw8UOg.gif" /><figcaption>Tip 07: Show Specific Validation Message At A Time</figcaption></figure><p><strong>Code</strong>:</p><p><a href="https://github.com/hritik5102/Form-Validation-Tips/commit/a83908ca2d4325d9a224744f43e3bf1ff888c001#diff-58417e0f781b6656949d37258c8b9052ed266e2eb7a5163cad7b0863e6b2916aR45">Add Form Validation Tip #7 · hritik5102/Form-Validation-Tips@a83908c</a></p><h3># Tip 08: Static Height for Error Messages</h3><ul><li>Notice that when an error message appears, the UI shifts slightly, which can look unappealing.</li><li>To prevent this, set a fixed height for error messages to ensure the form’s layout remains consistent and visually appealing.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*xS_PmLbO5N4XIGXjNM2yVw.gif" /><figcaption>Tip 08: Static Height for Error Messages</figcaption></figure><p><strong>Code</strong>:</p><p><a href="https://github.com/hritik5102/Form-Validation-Tips/commit/138ae47af33d40839cb6e20445a38592728e3797">Add Form Validation Tip #8 · hritik5102/Form-Validation-Tips@138ae47</a></p><h3># Tip 09: Highlight Errors with Danger Indicators</h3><ul><li>Use a red color, danger icon, or tooltip for errors if it fits your UI guidelines. Ensure the tooltip is accessible for screen readers.</li><li>Highlight the input field with a red border when there’s an error.</li></ul><h3># Tip 10: Remove Error Messages When Fixed</h3><ul><li>Clear error messages as soon as the user meets the required conditions.</li></ul><h3># Tip 11: Indicate Required Fields With Red asterisk *️⃣</h3><ul><li>How do you know if a field is required?
A visual clue, right?</li><li>The red asterisk (*) is one of the most common visual patterns used to indicate that a field is required.</li><li>It allows users to quickly identify mandatory fields, reducing trial and error and speeding up form completion.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*k45y7Hc3Xg4qt9dBJmr8Uw.png" /><figcaption>Tip 11: Indicate Required Fields With Red asterisk *️⃣</figcaption></figure><p><strong>Code</strong>:</p><p><a href="https://github.com/hritik5102/Form-Validation-Tips/commit/068226bec1179111ddd069eccf56d636671575cd">Add Form Validation Tip #9 · hritik5102/Form-Validation-Tips@068226b</a></p><h3># Tip 12: Always Perform Server-side Validation</h3><ul><li>Frontend validation can be bypassed if someone tampers with the form or submits it via Postman. Server-side validation is crucial for security.</li><li>I would highly recommend reading the following; it explains why both client- and server-side validations are important.</li></ul><p><a href="https://stackoverflow.com/questions/63459923/why-should-one-validate-data-if-it-is-already-sanitized/63463433#63463433">Why should one validate data if it is already sanitized?</a></p><h3># Tip 13: Ensure your form is not only usable but also accessible</h3><ul><li>Adding validation to improve a form’s usability is only half the challenge. The other half is ensuring the form is accessible, meaning that individuals with disabilities can understand whether a field is invalid and know how to correct it.</li></ul><blockquote>Accessibility ensures everyone can access the content, while usability focuses on how easy it is to use the website. Together, they create the best possible user experience.</blockquote><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Y0ZC54dOUS1XCsONBDiBcg.png" /></figure><p>From an accessibility perspective, we must ensure that everyone not only knows the field is invalid but also understands the error message.</p><p>Below is a demonstration of the form running on VoiceOver, the built-in screen reader for macOS.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fplayer.vimeo.com%2Fvideo%2F1039772599%3Fapp_id%3D122963&amp;dntp=1&amp;display_name=Vimeo&amp;url=http%3A%2F%2Fvimeo.com%2F1039772599&amp;image=https%3A%2F%2Fi.vimeocdn.com%2Fvideo%2F1962212233-e4e06572045c2d66da32ed53cc10eda5d8eb8a4d76c14e2500e99c99e52845cd-d_1280&amp;type=text%2Fhtml&amp;schema=vimeo" width="1920" height="969" frameborder="0" scrolling="no"><a href="https://medium.com/media/8800939f538b4eacabf22a4c6e406d64/href">https://medium.com/media/8800939f538b4eacabf22a4c6e406d64/href</a></iframe><p><strong>Code</strong>:</p><p><a href="https://github.com/hritik5102/Form-Validation-Tips/commit/3d96464861780689e50690823acae5c87e82c160">Form Validation Tip #10 · hritik5102/Form-Validation-Tips@3d96464</a></p><p><strong>You can find the source code for the above tips and tricks in the repository below</strong>:</p><p><a href="https://github.com/hritik5102/Form-Validation-Tips">GitHub - hritik5102/Form-Validation-Tips: Form Validation Tips Every Web Developer should know!</a></p><p>Thank you for taking the time to read the post until the end.
Your attention and interest are greatly appreciated.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Ftenor.com%2Fembed%2F12813575&amp;display_name=Tenor&amp;url=https%3A%2F%2Ftenor.com%2Fview%2Fcats-cool-cat-cool-sunglasses-deal-with-it-gif-12813575&amp;image=https%3A%2F%2Fmedia.tenor.com%2FUKccvryc7-YAAAAF%2Fcats-cool-cat.png&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=tenor" width="600" height="400" frameborder="0" scrolling="no"><a href="https://medium.com/media/fceae43fd9c499f9d20584444f4fe641/href">https://medium.com/media/fceae43fd9c499f9d20584444f4fe641/href</a></iframe><p>Please 👏🏻 if you like this post. It will motivate me to continue creating high-quality content like this one.</p><h4>Support Me</h4><p>Thank you for taking the time to read my blog post! If you found it valuable, I would greatly appreciate it if you could share the post on Twitter and LinkedIn, etc. Your support in spreading the word about my content means a lot to me. Thank you again!</p><h4>Follow me</h4><p>I hope you found this post helpful. If you want to stay up-to-date with my latest work, be sure to follow me on <a href="https://twitter.com/imhritik_dj"><strong><em>Twitter</em></strong></a><strong><em>, </em></strong><a href="https://www.linkedin.com/in/hritik-jaiswal"><strong><em>LinkedIn</em></strong></a>, and <a href="https://github.com/hritik5102"><strong><em>GitHub</em></strong></a>.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=9d966d8fd571" width="1" height="1" alt=""><hr><p><a href="https://medium.com/helpshift-engineering/form-validation-tips-every-web-developer-should-know-9d966d8fd571">Form Validation Tips Every Web Developer Should Know!</a> was originally published in <a href="https://medium.com/helpshift-engineering">helpshift-engineering</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Breaking the Ice: Integrating Snowflake with Power BI]]></title>
            <link>https://medium.com/helpshift-engineering/breaking-the-ice-integrating-snowflake-with-power-bi-44bfafa9b05e?source=rss----3229f31ca4f4---4</link>
            <guid isPermaLink="false">https://medium.com/p/44bfafa9b05e</guid>
            <category><![CDATA[reporting]]></category>
            <category><![CDATA[pipeline]]></category>
            <category><![CDATA[snowflake]]></category>
            <category><![CDATA[analytics]]></category>
            <category><![CDATA[power-bi]]></category>
            <dc:creator><![CDATA[Mithil Oswal]]></dc:creator>
            <pubDate>Tue, 30 Apr 2024 12:06:48 GMT</pubDate>
            <atom:updated>2024-04-30T12:06:48.736Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*A6wr4c1f1qktTF2T" /><figcaption>Photo by <a href="https://unsplash.com/@aaronburden?utm_source=medium&amp;utm_medium=referral">Aaron Burden</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><h4>A simple guide on how to connect Snowflake data in Power BI to create reports, publish them, and schedule refreshes.</h4><h3>Pre-requisites</h3><p>Well, since you’ve already reached this page, I’m assuming that you know of, and have access to both the tools — Snowflake as well as Power BI.</p><p>In case you do not have both / any of these accounts, you can try signing up below:</p><ul><li><a href="https://signup.snowflake.com/"><strong>Snowflake</strong></a></li><li><a href="https://app.powerbi.com/singleSignOn?ru=https%3A%2F%2Fapp.powerbi.com%2F%3FnoSignUpCheck%3D1"><strong>Power BI</strong></a></li></ul><p>Once you have completed the setup and have things up &amp; running, you can proceed to the next steps.</p><h3>Creating some dummy data in Snowflake</h3><p>For our demo purposes, we will create some dummy data.</p><ol><li>Using the list of SQL queries below, we will create an <strong><em>employee database</em></strong>, an <strong><em>employee schema</em></strong>, &amp; an <strong><em>employee table</em></strong>.</li></ol><pre>CREATE DATABASE employee_database; <br><br>CREATE SCHEMA employee_database.employee_schema; <br><br>CREATE TABLE employee_database.employee_schema.employee_table<br>(<br>    id NUMBER, <br>    name VARCHAR,<br>    salary NUMBER<br>); </pre><figure><img alt="Create a table in Snowflake" src="https://cdn-images-1.medium.com/max/1024/1*6PqkfdRg2jzh4ZME7EnpgQ.png" /><figcaption>Create a table in Snowflake</figcaption></figure><p>2. Our newly created Snowflake structure for “employee” data should look like this —</p><figure><img alt="Employee data structure" src="https://cdn-images-1.medium.com/max/1024/1*mWVCHZOh04SsJSVjhzkgXA.png" /><figcaption>Employee data structure</figcaption></figure><p>3. Let’s add some test data to the table &amp; see how it looks. Also, you can “describe” the table to view the schema.</p><pre>DESC TABLE employee_database.employee_schema.employee_table; <br><br>INSERT INTO employee_database.employee_schema.employee_table(id, name, salary) <br>    VALUES (111, &#39;Max&#39;, 50000), <br>           (222, &#39;Irene&#39;, 55000),<br>           (333, &#39;Tim&#39;, 70000), <br>           (444, &#39;Harris&#39;, 41000), <br>           (555, &#39;Isabelle&#39;, 97000), <br>           (666, &#39;Lennon&#39;, 87000); <br><br>SELECT * FROM employee_database.employee_schema.employee_table;</pre><figure><img alt="Insert data into table" src="https://cdn-images-1.medium.com/max/1024/1*IavpUr2M4YX2lsAdGwUw5w.png" /><figcaption>Insert data into table</figcaption></figure><blockquote>→ Now that we have the required data set up in Snowflake, we can move to Power BI Desktop for the next section of this process. ✅</blockquote><h3>Setting up the Snowflake connection in Power BI Desktop and creating reports</h3><ol><li>Open Power BI Desktop → Get Data → Snowflake → Connect. Click “Advanced options” on the dialog box that appears next.</li></ol><figure><img alt="Snowflake connector in Power BI" src="https://cdn-images-1.medium.com/max/1024/1*fXfqK5uFbAYmIeCKMH8hLw.png" /><figcaption>Snowflake connector in Power BI</figcaption></figure><p>2. 
You will now need to add some credentials for the connection details. Although some of these may be optional, it is recommended to include them.</p><ul><li>Server: &lt;your server name&gt; [Note: It will be in the format of &lt;account_name/locator&gt;.&lt;region_id&gt;.snowflakecomputing.com and remember to EXCLUDE the https://]</li><li>Warehouse: &lt;your warehouse name&gt;</li><li>Role: &lt;your role name&gt;</li><li>Database: &lt;your database name&gt; [Note: This should be in CAPITAL LETTERS only]</li><li>SQL statement: &lt;your SQL query&gt;</li></ul><p>In the next step, I will explain where to find this information in your Snowflake account.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*d0-CzMaIXyy_Ezv_uh3naA.png" /><figcaption>Setting up Snowflake connection in Power BI</figcaption></figure><p>For our use case, we can add the following details —</p><ul><li>Server: &lt;your server name&gt;</li><li>Warehouse: COMPUTE_WH</li><li>Role: ACCOUNTADMIN</li><li>Database: EMPLOYEE_DATABASE</li><li>SQL statement: SELECT * FROM employee_database.employee_schema.employee_table</li></ul><p>3. Now, let’s go back to our Snowflake account to fetch some credentials.</p><blockquote>For the <strong>Server name</strong>, click on your initials on the bottom right → Account → ID → Copy account URL. Don’t forget to remove the “https://” when pasting it.</blockquote><p><strong>Role and Warehouse names</strong> can be taken from the top right. For a new account, they are usually the same default names.</p><p>In a larger organization, you will be provided with these details during your Snowflake account setup, based on your restricted access.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rlZiimej5mkjeglZdzCYDg.png" /><figcaption>Getting credentials for the connection</figcaption></figure><p>4. Back to Power BI now. After saving the credential details, you will be asked for a “<strong>Username</strong>” and “<strong>Password</strong>”.</p><p>👉 This is your login information from your Snowflake account.</p><figure><img alt="Enter Snowflake credentials" src="https://cdn-images-1.medium.com/max/1024/1*SgsPTQYchkbOZ5rVfZIZfQ.png" /><figcaption>Enter Snowflake credentials</figcaption></figure><p>5. If everything works out as expected, you should be able to see the result of your SQL query. Simply click “Load”.</p><figure><img alt="Snowflake data in Power BI" src="https://cdn-images-1.medium.com/max/1024/1*e4hCyhiIk0vgqUE5Xpw1pA.png" /><figcaption>Snowflake data in Power BI</figcaption></figure><p>6. Finally, to wrap up the connection side, choose a mode. Pretty self-explanatory here.</p><ul><li>Import: <br>- Data from the source is loaded into Power BI and stored within the file itself. <br>- Faster for small to medium-sized datasets. Can perform extensive data transformations, modeling, and calculations on the imported data. <br>- Data can be refreshed based on a schedule.</li><li>Direct Query: <br>- Power BI connects directly to the data source in real-time and queries the data on-demand. <br>- Allows you to work with the most current data available in the source system. <br>- Can be slower for large datasets or complex queries because it relies on the performance of the source system.</li></ul><p>Personally, I prefer to use the “Import” mode for my use case since I need faster reporting.</p><figure><img alt="Connection mode" src="https://cdn-images-1.medium.com/max/1013/1*1es5pN8HP0mGnhYd31R2hQ.png" /><figcaption>Connection mode</figcaption></figure><p>7.
Time to create reports! To demonstrate, I’ve just made a simple bar chart based on “employee_name” and “employee_salary” using our sample data.</p><figure><img alt="Power BI Report" src="https://cdn-images-1.medium.com/max/1024/1*kTOPVe1y5BEvSfflEgG1zw.png" /><figcaption>Power BI Report</figcaption></figure><p>8. Let’s save the report &amp; then publish it to “My Workspace” in the Power BI Service online.</p><blockquote>Note: When you publish a Power BI report to your workspace, it also publishes the corresponding dataset automatically.</blockquote><figure><img alt="Publishing the Power BI Report" src="https://cdn-images-1.medium.com/max/1024/1*bNF8pRQU4tZSR63O6IM_2w.png" /><figcaption>Publishing the Power BI Report</figcaption></figure><p>9. Voila, your report &amp; dataset aka semantic model have been published! 🥳</p><figure><img alt="Power BI Service workspace" src="https://cdn-images-1.medium.com/max/1024/1*FT_WkkTSvs1fGQAjIeXhgQ.png" /><figcaption>Power BI Service workspace</figcaption></figure><h3>Scheduling Snowflake data refresh in Power BI Service</h3><ol><li>Now that we can access our report in the Power BI Service online, it is time to connect Power BI Service with our Snowflake account, so that if we make any changes to the database, it is reflected in our reports as well as the PBI semantic model.</li></ol><p>For this, Click on the Semantic Model → Settings</p><figure><img alt="Semantic model settings" src="https://cdn-images-1.medium.com/max/1024/1*DclK1Kgen9shNwd04wlHaQ.png" /><figcaption>Semantic model settings</figcaption></figure><p>2. Under “Data source credentials”, you will notice an error. That’s okay, we’ll fix it now. Click on “Edit credentials”</p><figure><img alt="Add data source credentials" src="https://cdn-images-1.medium.com/max/1024/1*1bknscwAXIoEg8Hxs6H7bg.png" /><figcaption>Add data source credentials</figcaption></figure><p>3. Now enter your Snowflake account login details / credentials, and click “Sign in”.</p><p>You can choose any Privacy level setting that fits your purpose.</p><figure><img alt="Enter Snowflake account credentials" src="https://cdn-images-1.medium.com/max/1024/1*FYZ2kgTIBQrtFZWFMEYdYw.png" /><figcaption>Enter Snowflake account credentials</figcaption></figure><p>4. You will notice that the error has disappeared because we have connected our Power BI Service account with our Snowflake account.</p><p>Moving on to scheduling a data refresh automatically.</p><figure><img alt="Snowflake credentials added" src="https://cdn-images-1.medium.com/max/1024/1*ViFVK9fhhFVvbwJCkzukSQ.png" /><figcaption>Snowflake credentials added</figcaption></figure><p>5. Turn on the toggle under “Refresh” and select a frequency, time, and notification (if required). Click “Apply”.</p><p><strong>Done! 
👍</strong></p><figure><img alt="Schedule data refresh" src="https://cdn-images-1.medium.com/max/1024/1*sstdApH21uEzP5XksZTSWw.png" /><figcaption>Schedule data refresh</figcaption></figure><h3>Testing the connection in Power BI Service</h3><ol><li>To check if our “refresh” would work as expected, let’s make some changes in our “employee_table” in Snowflake and see if it is reflected in our Power BI report.</li></ol><p>I will just add some additional records to the table.</p><pre>INSERT INTO employee_database.employee_schema.employee_table(id, name, salary) <br>    VALUES (777, &#39;Onyx&#39;, 100000), <br>           (888, &#39;Selena&#39;, 115000), <br>           (999, &#39;Wren&#39;, 120000), <br>           (123, &#39;Ariana&#39;, 175000),  <br>           (456, &#39;Linda&#39;, 142000); <br><br>SELECT * FROM employee_database.employee_schema.employee_table; </pre><figure><img alt="Inserting new records in the table" src="https://cdn-images-1.medium.com/max/1024/1*RZ71ctQPasMjKF7zcl1pTA.png" /><figcaption>Inserting new records in the table</figcaption></figure><p>2. Instead of waiting for the scheduled refresh to kick in, we’ll perform a “manual” refresh in Power BI to check the updated data.</p><p>Go back to the Semantic Model in Power BI Service → Refresh → Refresh now.</p><figure><img alt="Refreshing the semantic model in Power BI Service" src="https://cdn-images-1.medium.com/max/1024/1*lmEhiqQaf-gMiA5ym6JtyA.png" /><figcaption>Refreshing the semantic model in Power BI Service</figcaption></figure><p>3. Wait for the refresh to complete, and then open the “Snowflake Employee Report” that we had published earlier.</p><p>You should see the new additional records in the chart!</p><figure><img alt="Updated report with new records" src="https://cdn-images-1.medium.com/max/1024/1*dgc56WKmcgxUMxIqLK2jvA.png" /><figcaption>Updated report with new records</figcaption></figure><h3>Conclusion</h3><p>There you have it. A simple but detailed guide on connecting Snowflake and Power BI.</p><p>Cheers!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*KiaD5v9GBRRpFdYq" /><figcaption>Photo by <a href="https://unsplash.com/@aaronburden?utm_source=medium&amp;utm_medium=referral">Aaron Burden</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=44bfafa9b05e" width="1" height="1" alt=""><hr><p><a href="https://medium.com/helpshift-engineering/breaking-the-ice-integrating-snowflake-with-power-bi-44bfafa9b05e">Breaking the Ice: Integrating Snowflake with Power BI</a> was originally published in <a href="https://medium.com/helpshift-engineering">helpshift-engineering</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[JavaScript Loading Strategies: Normal vs Async vs Defer]]></title>
            <link>https://medium.com/helpshift-engineering/javascript-loading-strategies-normal-vs-async-vs-defer-930285016803?source=rss----3229f31ca4f4---4</link>
            <guid isPermaLink="false">https://medium.com/p/930285016803</guid>
            <category><![CDATA[performance]]></category>
            <category><![CDATA[web-development]]></category>
            <category><![CDATA[html]]></category>
            <category><![CDATA[user-experience]]></category>
            <category><![CDATA[javascript]]></category>
            <dc:creator><![CDATA[Hritik Jaiswal]]></dc:creator>
            <pubDate>Wed, 24 Apr 2024 03:13:39 GMT</pubDate>
            <atom:updated>2024-04-24T03:13:39.246Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*KutaPS3pBue3CDR37JO2dA.png" /></figure><p>When it comes to JavaScript loading strategies, it’s all about optimizing how and when your Javascript files are downloaded and executed in the browser. This directly affects your web page’s <strong><em>performance</em></strong>, <strong><em>user experience</em></strong>, and <strong><em>overall efficiency of a web application</em></strong>.</p><p><strong>Here’s a breakdown of some common strategies:</strong></p><ol><li>Synchronous Loading ( aka Blocking Script )</li><li>Asynchronous Loading</li><li>Defer Loading</li><li>Dynamic Script Loading</li><li>Lazy Loading</li></ol><p>By choosing the appropriate loading strategy, developers can optimize the loading of JavaScript resources to ensure fast page load times, smooth user interactions, and overall better performance.</p><h4>Let’s understand the basics</h4><p>When you load a webpage there are 2 major things happen in your browser:</p><ol><li>HTML parsing</li><li>Loading of the script (two ways)<br>a. Fetching it from the network<br>b. Actually executing script line by line</li></ol><h4>1️⃣ How HTML is parsed with image attributes?</h4><p>HTML parsing starts from the top and goes on until it ends, if in between it finds any image tag, it will send a request for downloading that image in the background and continue parsing the HTML even if the image is not downloaded.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fplayer.vimeo.com%2Fvideo%2F932210382%3Fapp_id%3D122963&amp;dntp=1&amp;display_name=Vimeo&amp;url=https%3A%2F%2Fvimeo.com%2F932210382&amp;image=https%3A%2F%2Fi.vimeocdn.com%2Fvideo%2F1829814494-b43d0162a202a30bce0732a3cc0d70b3e529a009179eb98ece51b11ff53dfc64-d_1280&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=vimeo" width="1920" height="1183" frameborder="0" scrolling="no"><a href="https://medium.com/media/bb874877348825394baf07fa1e1ffd28/href">https://medium.com/media/bb874877348825394baf07fa1e1ffd28/href</a></iframe><p>But this not the case with script tag. In modern websites, scripts are often “<strong>heavier</strong>” than HTML, their download size is larger, and processing time is also longer.</p><p>When the browser loads HTML and comes across a</p><pre>&lt;script&gt;…&lt;/script&gt; </pre><p>It can’t continue building the DOM. It must execute the script right now. The same happens for external scripts <strong><em>&lt;script src=”…”&gt;&lt;/script&gt; </em></strong>the browser must wait for the script to download, execute the downloaded script, and only then can it process the rest of the page.</p><h4>2️⃣ Normal script ( without any attribute )</h4><pre>&lt;!-- is fetched and run immediately --&gt;<br>&lt;script src=&quot;index.js&quot;&gt;&lt;/script&gt;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Ew2akJ7R96mw-IyXQi9J0A.png" /><figcaption>Normal script</figcaption></figure><p><strong>Black</strong> : HTML Parsing<br><strong>Green</strong> : Fetching JS file from the network<br><strong>Red</strong> : Executing that JS file</p><p>When browser is loading a webpage, from the above example. 
<p>When the browser loads the webpage from the example above, it parses the HTML first; if it encounters a script tag in between, it stops parsing the HTML at that point, fetches the script from the network, and executes it then and there.</p><p>After the script is fully executed, HTML parsing continues from where it stopped earlier.</p><blockquote>This is the reason scripts are added at the end of the body tag: so that a normal script does not block rendering of the page.</blockquote><p><strong>Example</strong>:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/cf23e3dd37081ba003cc722280fb7760/href">https://medium.com/media/cf23e3dd37081ba003cc722280fb7760/href</a></iframe><p><strong>Output</strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*T0hZyjE1GhF16iE3L05sHA.png" /></figure><blockquote>What is DOMContentLoaded? 🤔</blockquote><blockquote>The DOMContentLoaded event fires when <strong>non-async</strong> scripts have arrived and executed and the initial HTML document has been completely loaded and parsed, without waiting for stylesheets and images to finish loading.</blockquote><p><strong>non-async scripts</strong>: JS code present inside &lt;script&gt;…&lt;/script&gt; tags.</p><p><strong>async scripts</strong>: scripts loaded using the async or defer attribute.</p><h4>3️⃣ Async (load-first order)</h4><pre>// script might run anytime, before OR after HTML is parsed<br>&lt;script src=&quot;index.js&quot; async&gt;&lt;/script&gt;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*sqITDJzTpI-LHmBYRhad-A.png" /><figcaption>Async (load-first order)</figcaption></figure><p>With the async attribute, HTML parsing and fetching the script from the network happen in parallel. Once the async script is downloaded and its execution starts, HTML parsing stops; when the script is done executing, HTML parsing continues from where it had stopped earlier.</p><p>Here, if the script download is in progress, HTML parsing won’t stop. The browser will keep parsing the HTML, and parsing might even complete before the script download finishes and its execution starts. So, in this case, downloading won’t block HTML parsing.</p>
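<p>Since the embedded example may not render in every reader, here is a minimal page that makes this behaviour visible (the file names are illustrative). The paragraph renders immediately, while the order of the console logs varies from load to load:</p><pre>&lt;!DOCTYPE html&gt;<br>&lt;html&gt;<br>  &lt;body&gt;<br>    &lt;p&gt;Page content renders immediately.&lt;/p&gt;<br>    &lt;script&gt;<br>      document.addEventListener(&#39;DOMContentLoaded&#39;, () =&gt;<br>        console.log(&#39;DOMContentLoaded fired&#39;));<br>    &lt;/script&gt;<br>    &lt;!-- async: downloaded in parallel, executed as soon as it arrives --&gt;<br>    &lt;script async src=&quot;long.js&quot;&gt;&lt;/script&gt;<br>    &lt;script async src=&quot;small.js&quot;&gt;&lt;/script&gt;<br>  &lt;/body&gt;<br>&lt;/html&gt;</pre>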
<h4>3️⃣ Async (load-first order)</h4><pre>&lt;!-- script might run anytime, before OR after HTML is parsed --&gt;<br>&lt;script src=&quot;index.js&quot; async&gt;&lt;/script&gt;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*sqITDJzTpI-LHmBYRhad-A.png" /><figcaption>Async (load-first order)</figcaption></figure><p>With the async attribute, HTML parsing and fetching the script from the network happen in parallel. Once the async script is downloaded, its execution starts and HTML parsing is paused; when the script finishes executing, HTML parsing continues from the point where it stopped.</p><p>While the script download is in progress, HTML parsing does not stop. Parsing may even complete before the script has finished downloading and started executing, in which case the script blocks nothing at all.</p><p><strong>Independent Execution:</strong></p><blockquote>Once the async script is downloaded, the browser executes it as soon as possible, regardless of where the &lt;script&gt; tag is positioned in your HTML or whether other scripts have finished running.</blockquote><blockquote>This asynchronous loading allows the browser to continue rendering the page content while the script loads in the background.</blockquote><p>If we have multiple script tags:</p><pre>&lt;script src=&quot;index.js&quot; async&gt;&lt;/script&gt;<br>&lt;script src=&quot;main.js&quot; async&gt;&lt;/script&gt;<br>&lt;script src=&quot;handler.js&quot; async&gt;&lt;/script&gt;</pre><p>In the above case, the scripts execute in whichever order their downloads finish.</p><blockquote>💡 Async does not guarantee the order of execution of scripts</blockquote><p><strong>Point to be noted</strong></p><ul><li>DOMContentLoaded and async scripts don’t wait for each other:</li><li>DOMContentLoaded may happen before an async script (if the script finishes loading after the page is parsed)…</li><li>…or after an async script (if the script is short or was in the HTTP cache)</li></ul><p><strong>Example</strong> (Index.html): <a href="https://medium.com/media/10f888870404bca67d288a97e7ec0a29/href">https://medium.com/media/10f888870404bca67d288a97e7ec0a29/href</a></p><p>Output:</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fplayer.vimeo.com%2Fvideo%2F932240907%3Fapp_id%3D122963&amp;dntp=1&amp;display_name=Vimeo&amp;url=https%3A%2F%2Fvimeo.com%2F932240907&amp;image=http%3A%2F%2Fi.vimeocdn.com%2Fvideo%2F1829859409-fad4f0f76a102ee569b1755428789563a985b18552a8b0d7aba7f1283c49e0d9-d_1280&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=vimeo" width="1920" height="963" frameborder="0" scrolling="no"><a href="https://medium.com/media/a7af11db524024677c84dde476505a80/href">https://medium.com/media/a7af11db524024677c84dde476505a80/href</a></iframe><p><strong>Summary</strong></p><ul><li>The page content usually shows up immediately: async doesn’t block it.</li><li>DOMContentLoaded may fire before or after an async script; there are no guarantees here.</li><li>Async scripts run in “load-first” order: whichever script loads first runs first. Here, long.js runs first, probably because it was cached; on another run, small.js may run first because it finishes loading before long.js.</li></ul><p>Async scripts are great for integrating independent third-party scripts into the page (counters, ads, and so on), as they don’t depend on our scripts and our scripts shouldn’t wait for them:</p><pre>&lt;!-- Google Analytics is usually added like this --&gt;<br>&lt;script async src=&quot;https://google-analytics.com/analytics.js&quot;&gt;&lt;/script&gt;</pre>
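<p>Putting the async behaviour together, a minimal demo page might look like the sketch below. This is a reconstruction in the spirit of the embedded Index.html example above, not the original code; long.js and small.js are the javascript.info sample scripts also used in the defer section, and the log message is illustrative.</p><pre>&lt;!-- both downloads start in parallel; whichever finishes first runs first --&gt;<br>&lt;script&gt;<br>  document.addEventListener(&quot;DOMContentLoaded&quot;, () =&gt; console.log(&quot;DOM ready!&quot;));<br>&lt;/script&gt;<br>&lt;script async src=&quot;https://javascript.info/article/script-async-defer/long.js&quot;&gt;&lt;/script&gt;<br>&lt;script async src=&quot;https://javascript.info/article/script-async-defer/small.js&quot;&gt;&lt;/script&gt;<br>&lt;p&gt;Page content: usually visible before either async script runs.&lt;/p&gt;</pre>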
<h4>4️⃣ Defer</h4><p>The <strong>defer</strong> attribute tells the browser not to wait for the script. Instead, the browser continues to process the HTML and build the DOM. The script loads “in the background”, and then runs once the DOM is fully built.</p><p>As the name suggests, defer means delaying or postponing something; here, it is the script’s execution that gets delayed.</p><pre>&lt;!-- blocks DOMContentLoaded, but the page might display early --&gt;<br>&lt;!-- defer means the DOM will be ready before the script runs --&gt;<br>&lt;script src=&quot;index.js&quot; defer&gt;&lt;/script&gt;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*NSppc0l9oxtrbUlJNmyqow.png" /><figcaption>Defer attribute</figcaption></figure><p>With the <strong>defer</strong> attribute, HTML parsing and fetching the script from the network happen in parallel; once HTML parsing is complete, script execution starts.</p><p>Important points to mention:</p><ul><li>Defer is non-blocking, in the sense that scripts with defer never block the building of the DOM.</li><li>Scripts with defer always execute when the DOM is ready (but before the DOMContentLoaded event).</li></ul><blockquote>💡 <strong>Note</strong>: Unlike async, defer guarantees the order of execution: scripts run in the order they are defined.</blockquote><p><strong>Example</strong>: <a href="https://medium.com/media/668b262b636fbdec3b089c192e4e4034/href">https://medium.com/media/668b262b636fbdec3b089c192e4e4034/href</a></p><p>Output:</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fplayer.vimeo.com%2Fvideo%2F932439476%3Fapp_id%3D122963&amp;dntp=1&amp;display_name=Vimeo&amp;url=https%3A%2F%2Fvimeo.com%2F932439476&amp;image=https%3A%2F%2Fi.vimeocdn.com%2Fvideo%2F1830394555-516c19ea85efcc6d619c57214e9a3db61dfc72bb3b0c4308163f64b3190969b4-d_1280&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=vimeo" width="1920" height="431" frameborder="0" scrolling="no"><a href="https://medium.com/media/447d48c6358fefecf3cc6f3af633fd9a/href">https://medium.com/media/447d48c6358fefecf3cc6f3af633fd9a/href</a></iframe><ol><li>The page content shows up immediately.</li><li>The DOMContentLoaded event handler waits for the deferred script; it triggers only after the script is downloaded and executed.</li></ol><p><strong>Deferred scripts keep their relative order, just like regular scripts.</strong></p><p>Let’s say we have two deferred scripts: long.js and then small.js:</p><pre>&lt;!-- long script --&gt;<br>&lt;script defer src=&quot;https://javascript.info/article/script-async-defer/long.js&quot;&gt;&lt;/script&gt;<br><br>&lt;!-- small script --&gt;<br>&lt;script defer src=&quot;https://javascript.info/article/script-async-defer/small.js&quot;&gt;&lt;/script&gt;</pre><p>Browsers scan the page for scripts and download them in parallel to improve performance. So in the example above, both scripts download in parallel, and small.js probably finishes first.</p><p>…But the defer attribute, besides telling the browser “<strong>not to block</strong>”, ensures that the relative order is kept. So even though small.js loads first, it still waits and runs only after long.js executes.</p><p>That may be important when we need to load a JavaScript library and then a script that depends on it.</p><p><strong>The defer attribute is only for external scripts</strong></p><blockquote>💡 The defer attribute is ignored if the &lt;script&gt; tag has no src.</blockquote>
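<p>A minimal sketch of what that means in practice (the file name <em>app.js</em> and the log message are hypothetical):</p><pre>&lt;!-- no src, so defer is ignored: this runs immediately during parsing --&gt;<br>&lt;script defer&gt;<br>  console.log(&quot;runs right away, not deferred&quot;);<br>&lt;/script&gt;<br><br>&lt;!-- with src, defer applies: this runs after the DOM is fully built --&gt;<br>&lt;script defer src=&quot;app.js&quot;&gt;&lt;/script&gt;</pre>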
<p>I’m not covering the dynamic script and lazy loading strategies in this post, as it is already fairly long. If you want to read about them, here are two articles you can check out:</p><ul><li><a href="https://javascript.info/script-async-defer#dynamic-scripts">javascript.info, “Dynamic scripts” (Aug 26, 2021)</a></li><li><a href="https://blog.logrocket.com/understanding-lazy-loading-javascript/">LogRocket, “Understanding lazy loading in JavaScript” (Jun 7, 2023)</a></li></ul><p>Thank you for taking the time to read this post until the end. Your attention and interest are greatly appreciated.</p><p>Please 👏🏻 if you like this post. It will motivate me to keep creating high-quality content like this.</p><h4>Support Me</h4><p>If you found this post valuable, I would greatly appreciate it if you could share it on Twitter, LinkedIn, etc. Your support in spreading the word about my content means a lot to me. Thank you!</p><h4>Follow me</h4><p>I hope you found this post helpful. If you want to stay up to date with my latest work, be sure to follow me on <a href="https://twitter.com/imhritik_dj"><strong><em>Twitter</em></strong></a>, <a href="https://www.linkedin.com/in/hritik-jaiswal"><strong><em>LinkedIn</em></strong></a>, and <a href="https://github.com/hritik5102"><strong><em>GitHub</em></strong></a>.</p><hr><p><a href="https://medium.com/helpshift-engineering/javascript-loading-strategies-normal-vs-async-vs-defer-930285016803">JavaScript Loading Strategies: Normal vs Async vs Defer</a> was originally published in <a href="https://medium.com/helpshift-engineering">helpshift-engineering</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>