<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Akshayprabhu on Medium]]></title>
        <description><![CDATA[Stories by Akshayprabhu on Medium]]></description>
        <link>https://medium.com/@akshayprabhu396?source=rss-8783643de3bb------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/0*9lXi_nMLxbL1wsq7</url>
            <title>Stories by Akshayprabhu on Medium</title>
            <link>https://medium.com/@akshayprabhu396?source=rss-8783643de3bb------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sun, 17 May 2026 19:36:31 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@akshayprabhu396/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Building a Serverless Sports Event Ingestion Pipeline]]></title>
            <link>https://medium.com/@akshayprabhu396/building-a-serverless-sports-event-ingestion-pipeline-70942ebdafe3?source=rss-8783643de3bb------2</link>
            <guid isPermaLink="false">https://medium.com/p/70942ebdafe3</guid>
            <dc:creator><![CDATA[Akshayprabhu]]></dc:creator>
            <pubDate>Fri, 15 May 2026 05:30:26 GMT</pubDate>
            <atom:updated>2026-05-15T05:30:26.662Z</atom:updated>
            <content:encoded><![CDATA[<h3>Initial Architecture / Approach to Design</h3><p>One of the first design problems I encountered while building the application was determining how bets should be validated and represented internally. Rather than allowing users to submit arbitrary bet information, each bet was tied to a specific sporting event stored within the database. At a basic level, this prevented invalid or incorrectly entered bets, such as users mistyping teams or referencing games that did not actually exist on a given date. However, the relationship quickly became much more valuable than simple validation. Associating bets with normalized event records enabled the application to support analytics and future insight systems, such as tracking which teams or sports users frequently bet on, analyzing how close to game time bets were being placed, and eventually comparing betting behavior across sports books or markets. It also introduced consistency into the relational model, since multiple bets from different users or sports books could reference the same underlying event.</p><p>Initially, the simplest approach would have been allowing the frontend to query third-party sports APIs directly whenever game information was needed. However, this introduced several architectural and operational concerns, including API rate limiting, increased frontend latency, exposure of external API keys, repeated duplicate requests across clients, and tight coupling between frontend logic and external payload formats. To address this, the application introduced an internal events table that stored normalized game data fetched from external providers. While this initially functioned as a lightweight cache to reduce API calls, it gradually evolved into a central source of truth for the application. By maintaining an internal representation of sporting events, the system gained control over normalization, validation, and consistency while insulating downstream application logic from external API changes. As the project expanded toward supporting additional leagues and potentially prediction markets in the future, this separation became increasingly important.</p><p>Once the events table became foundational infrastructure, keeping it synchronized with real-world game data became its own engineering problem. Games may be postponed, delayed, rescheduled, or updated dynamically during playoff scenarios where future matchups are not known in advance. As a result, the system required some form of scheduled ingestion pipeline capable of periodically fetching, transforming, and updating event data automatically.</p><h3>Local Prototype</h3><p>The initial synchronization architecture for the application followed a relatively straightforward ingestion pipeline that lived directly within the backend application itself. The flow consisted of sending a request to an external sports API to retrieve event information, normalizing the returned payload into the application’s internal schema, and inserting or updating records within the events table. Since sporting events change continuously throughout a season due to delays, postponements, updated start times, and playoff scheduling, the database required periodic synchronization in order to remain reliable as the application’s source of truth. To automate this process, I introduced a local cron job that executed the synchronization flow once every day. Cron jobs are time-based scheduling utilities commonly used in operating systems to automate repetitive background tasks, and this approach provides a simple way to keep the database updated without requiring manual intervention.</p>
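<p>As a rough illustration of that prototype, the sketch below shows the fetch, normalize, and upsert flow as a standalone script triggered by a daily crontab entry. It is a minimal sketch rather than the project’s actual code: the provider endpoint, environment variable names, and column names are placeholder assumptions, and it assumes a supabase-py client performing an upsert keyed on an external event ID with a unique constraint.</p>
<pre># sync_events.py - a minimal sketch of the local prototype flow (not the project's actual code).<br># The provider endpoint, env var names, and column names are illustrative assumptions.<br>#<br># Example crontab entry to run the sync once a day at 10:00:<br>#   0 10 * * * /usr/bin/python3 /path/to/sync_events.py<br><br>import os<br>from datetime import datetime, timezone<br><br>import requests<br>from supabase import create_client<br><br><br>def fetch_events(game_date: str) -&gt; list[dict]:<br>    &quot;&quot;&quot;Fetch raw games for one date from a third-party sports API (hypothetical endpoint).&quot;&quot;&quot;<br>    resp = requests.get(<br>        &quot;https://api.example-sports-provider.com/v1/games&quot;,<br>        params={&quot;date&quot;: game_date},<br>        headers={&quot;Authorization&quot;: f&quot;Bearer {os.environ['SPORTS_API_KEY']}&quot;},<br>        timeout=30,<br>    )<br>    resp.raise_for_status()<br>    return resp.json()[&quot;games&quot;]<br><br><br>def normalize(raw: dict) -&gt; dict:<br>    &quot;&quot;&quot;Map an external payload into the internal events schema (assumed columns).&quot;&quot;&quot;<br>    return {<br>        &quot;external_event_id&quot;: str(raw[&quot;id&quot;]),<br>        &quot;league&quot;: &quot;NBA&quot;,<br>        &quot;home_team&quot;: raw[&quot;home&quot;][&quot;name&quot;],<br>        &quot;away_team&quot;: raw[&quot;away&quot;][&quot;name&quot;],<br>        &quot;starts_at&quot;: raw[&quot;start_time_utc&quot;],  # assumed to arrive as ISO-8601 UTC<br>    }<br><br><br>def main() -&gt; None:<br>    client = create_client(os.environ[&quot;SUPABASE_URL&quot;], os.environ[&quot;SUPABASE_SERVICE_KEY&quot;])<br>    today = datetime.now(timezone.utc).strftime(&quot;%Y-%m-%d&quot;)<br>    rows = [normalize(game) for game in fetch_events(today)]<br>    # Upsert so re-running the job updates existing rows instead of duplicating them.<br>    client.table(&quot;events&quot;).upsert(rows, on_conflict=&quot;external_event_id&quot;).execute()<br><br><br>if __name__ == &quot;__main__&quot;:<br>    main()</pre>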
<h3>Operational Limitations</h3><p>While the local cron job architecture worked well as an initial prototype, it quickly introduced several operational limitations. Since the synchronization workflow was running locally, ingestion reliability became tied directly to my personal development environment and required my machine to remain online in order for scheduled updates to execute successfully. As the project continued to grow, deployment and scaling also became more difficult because the synchronization pipeline was tightly coupled to the application runtime rather than existing as an independently managed service. Additionally, the scheduled jobs themselves were physically separated from the hosted database infrastructure, creating a less centralized architecture for secrets management, API credentials, and deployment configuration. Although the system functioned correctly, it became clear that a more production-oriented synchronization architecture was needed.</p><h3>Edge Function Development and Deployment</h3><h4>Supabase Edge Functions</h4><p>To address the operational limitations of the local synchronization architecture, I transitioned the ingestion workflow into a dedicated Supabase Edge Function. Edge functions are lightweight serverless backend functions that execute on demand in response to HTTP requests or scheduled events. Instead of maintaining a continuously running backend service solely for periodic synchronization, the ingestion logic could now execute independently within managed cloud infrastructure located near the hosted database itself. This significantly simplified the architecture by separating event synchronization responsibilities from the main application runtime. The edge function became solely responsible for fetching external API data, normalizing payloads, and updating the events table, while the frontend and backend only consumed the already-normalized internal data model.</p><pre>// Edge Function Logic<br>// logic for each imported module lives in files within the same function<br><br>import &quot;jsr:@supabase/functions-js/edge-runtime.d.ts&quot;<br>import { formatNBAEvents } from &quot;./formatNBAEvents.ts&quot;<br>import { getNBAEvents } from &quot;./getNBAEvents.ts&quot;<br>import { writeNBAEvents } from &quot;./writeNBAEvents.ts&quot;<br><br>Deno.serve(async (request) =&gt; {<br><br>  if (request.method !== &quot;POST&quot;) {<br>    return new Response(<br>      JSON.stringify({ error: &quot;Method not allowed. Use POST.&quot; }),<br>      {<br>        status: 405,<br>        headers: { &quot;Content-Type&quot;: &quot;application/json&quot; },<br>      },<br>    )<br>  }<br>  try {<br>    console.log(&quot;sync-events edge function invoked&quot;)<br>    const nbaResponses = await getNBAEvents()<br>    const events = nbaResponses.flatMap((item) =&gt; formatNBAEvents(item.payload))<br>    const writtenEvents = await writeNBAEvents(events)<br><br>    return new Response(<br>      JSON.stringify({<br>        success: true,<br>        message: &quot;sync-events fetched, formatted, and upserted events&quot;,<br>        gameDates: nbaResponses.map((event) =&gt; event.gameDate),<br>        totalResponses: nbaResponses.length,<br>        totalEvents: events.length,<br>        upsertedEvents: writtenEvents.length,<br>        events: writtenEvents,<br>      }),<br>      {<br>        status: 200,<br>        headers: { &quot;Content-Type&quot;: &quot;application/json&quot; },<br>      },<br>    )<br>  } catch (error) {<br>    console.error(&quot;sync-events failed&quot;, error)<br><br>    return new Response(<br>      JSON.stringify({<br>        success: false,<br>        error: error instanceof Error ? error.message : &quot;Unknown error&quot;,<br>      }),<br>      {<br>        status: 500,<br>        headers: { &quot;Content-Type&quot;: &quot;application/json&quot; },<br>      },<br>    )<br>  }
})</pre><p>This also centralized API credential and secrets management within Supabase while removing the dependency on my local machine remaining online for scheduled updates.</p><h4>Local Development and Testing</h4><p>Before deploying the synchronization pipeline into the hosted Supabase environment, I first wanted to validate the ingestion workflow locally against a controlled database environment. To do this, I took the following steps:</p><ol><li>I created a dedicated supabase folder within my project directory by running supabase init. This initialized local configuration files, migrations, and edge function scaffolding.</li><li>To launch the local database services and the edge function runtime used for testing, I ran supabase start.</li><li>In the functions folder within the supabase directory, I created a new directory that contained the edge function using supabase functions new sync-events. The synchronization logic was stored in the index.ts file. The edge function was responsible for fetching external API data, normalizing payloads, and updating the events table.</li><li>Local testing requires keys and secrets to be stored on the machine. I configured environment variables and API credentials locally using an .env file or exported shell variables.</li><li>Since my edge function must be able to communicate with the database, I pulled the database schema from my remote project in order to test whether my function was able to insert the data in the required format.</li><li>To test the edge function, I ran the command supabase functions serve sync-events. This creates a temporary backend environment for Supabase edge functions. In a separate terminal I called the function manually using a curl request to validate the ingestion workflow end-to-end (an equivalent request is sketched below).</li></ol>
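<p>For reference, here is a minimal sketch of that manual invocation, written with Python’s requests library instead of curl. The URL and port follow the Supabase CLI’s documented local defaults, and the anon key is the one printed by supabase start; both are assumptions that may differ per setup.</p>
<pre># invoke_local_sync.py - a sketch of manually invoking the locally served function.<br># The port/path follow Supabase CLI defaults (an assumption); the anon key comes<br># from the `supabase start` output.<br>import os<br><br>import requests<br><br>LOCAL_URL = &quot;http://localhost:54321/functions/v1/sync-events&quot;<br>anon_key = os.environ[&quot;SUPABASE_ANON_KEY&quot;]<br><br>resp = requests.post(<br>    LOCAL_URL,<br>    headers={<br>        &quot;Authorization&quot;: f&quot;Bearer {anon_key}&quot;,<br>        &quot;Content-Type&quot;: &quot;application/json&quot;,<br>    },<br>    json={},<br>    timeout=60,<br>)<br>print(resp.status_code)<br>print(resp.json())  # expect success, totalEvents, and upsertedEvents fields from the function</pre>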
<h4>Remote Deployment</h4><p>Once the synchronization workflow was validated locally, the edge function could be deployed into the hosted Supabase environment and integrated with remote scheduling infrastructure.</p><ol><li>I first deployed the edge function to the remote environment that hosts my Supabase project using supabase functions deploy sync-events.</li><li>In my local environment, my .env file held third-party API keys and other secrets that my edge function uses. To store my secrets, I ran the command supabase secrets set API_KEY=... for each variable.</li><li>I followed the same testing process as with local development, but called the URL endpoint of my deployed edge function instead of the local one.</li><li>To automate the calls to the edge function, I attached a cron job using the pg_cron integration on Supabase to the deployed edge function, running it every day at 10 a.m.</li><li>I then verified that the function was actually being called by viewing the logs associated with both the integration and the edge function.</li></ol><h3>Key Takeaways</h3><p>One of the biggest lessons from this project was realizing that there are many valid ways to approach synchronization and ingestion workflows. The system could have been implemented using traditional backend servers, cloud functions, external schedulers, or containerized workers. For my particular use case, Supabase Edge Functions combined with pg_cron provided a strong balance between simplicity, reliability, and operational overhead. The architecture allowed the ingestion pipeline to run independently from the main application while remaining close to the hosted database infrastructure. More importantly, the project reinforced that infrastructure decisions should evolve naturally alongside application requirements rather than starting with unnecessary complexity.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=70942ebdafe3" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Classifying NBA Contenders]]></title>
            <link>https://medium.com/@akshayprabhu396/classifying-nba-contenders-b10759c8c0f6?source=rss-8783643de3bb------2</link>
            <guid isPermaLink="false">https://medium.com/p/b10759c8c0f6</guid>
            <dc:creator><![CDATA[Akshayprabhu]]></dc:creator>
            <pubDate>Thu, 14 May 2026 06:19:24 GMT</pubDate>
            <atom:updated>2026-05-14T06:19:24.128Z</atom:updated>
            <content:encoded><![CDATA[<p>Originally Posted 2/27/2026</p><h3><strong>Framing the Problem</strong></h3><h4><strong>League Context</strong></h4><p>The NBA regular season spans 6 months and consists of 1230 total games (82 per team). Each team competes to snag a spot in the playoffs, where they play a bracket-style tournament to win the NBA championship. Given the length of the regular season, hot streaks, flukey stretches, injuries, and trades continuously prompt new storylines in the NBA regarding which teams have a real shot at the NBA championship.</p><h4><strong>Core Problem Statement</strong></h4><p>Can regular-season data identify true championship contenders?</p><h4><strong>Why It’s Hard</strong></h4><p>Championship teams rarely follow a single blueprint. Over the past several seasons, the league has produced a wide range of champions with distinct identities, some built around elite offensive firepower, others anchored by dominant defense, depth, or a transcendent superstar. No team has returned to the Finals in consecutive years in the past 7 years, underscoring how fragile sustained dominance can be in a league shaped by parity, matchup dynamics, and roster volatility.</p><p>Compounding this challenge is the nonlinear nature of an NBA season. Teams evolve. Early-season performance may be distorted by injuries, experimentation with rotations, midseason trades, or the gradual development of chemistry and identity. A contender in April may look fundamentally different from that same team in November. Any model built purely on aggregate regular-season statistics must therefore account for timing, trajectory, and context.</p><h4><strong>What is a Contender</strong></h4><p>Defining a “contender” is more nuanced than simply identifying the eventual champion. A single playoff outcome can obscure how competitive a team truly was. For instance, in the 2017–2018 season, the Warriors ultimately swept the Cavaliers in the Finals, yet were pushed to a 3–2 deficit in the Western Conference Finals against Houston. If the Rockets had converted a few late-game possessions differently, the championship narrative may have changed entirely.</p><p>Relying solely on champions or Finals appearances therefore risks collapsing meaningful competitive tiers into a binary outcome. Instead, this project treats contention as a spectrum of playoff performance rather than a single result.</p><p>To operationalize this idea, I use playoff statistics and outcomes from 2010–2025 and apply an unsupervised clustering approach to group teams based on postseason success profiles. Rather than imposing a predefined label, the model allows historical playoff data to reveal natural tiers of performance, which can then be interpreted as championship-level, near-contender, or non-contender archetypes.</p><p><strong>Unsupervised Learning to Determine Contenders</strong></p><p>Within the playoffs I extracted 5 features that best characterized a team’s dominance without having to scale by season and that accounted for strong contenders losing in the semifinals or conference finals. These 5 features were:</p><p>1. Win Percentage — the percentage of playoff games the team won</p><p>2. Total Playoff Wins — more wins means a deeper playoff run</p><p>3. Average +/- per game</p><p>4. Average wins per series</p><p>5. Average +/- per series</p><p>These features were scaled and fit to a KMeans model with n_clusters=3 and init=“k-means++”. The goal was to split the teams into categories of play-in / non-contender, decent playoff team, and championship contender, and these features separated those tiers well. For visualization, I applied Principal Component Analysis (PCA) to reduce the five-dimensional feature space into two principal components, enabling a 2D representation of the clustering structure.</p>
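<p>A minimal sketch of that clustering and projection step is below. It assumes a per-team-per-postseason table with the five features under placeholder column names; the actual feature engineering is not shown here.</p>
<pre># Sketch of the scaling + KMeans + PCA step (column names and input file are placeholder assumptions).<br>import pandas as pd<br>from sklearn.cluster import KMeans<br>from sklearn.decomposition import PCA<br>from sklearn.preprocessing import StandardScaler<br><br>FEATURES = [<br>    &quot;win_pct&quot;,<br>    &quot;total_playoff_wins&quot;,<br>    &quot;avg_plus_minus_per_game&quot;,<br>    &quot;avg_wins_per_series&quot;,<br>    &quot;avg_plus_minus_per_series&quot;,<br>]<br><br># One row per team per postseason, 2010-2025.<br>playoffs = pd.read_csv(&quot;playoff_team_stats_2010_2025.csv&quot;)<br><br>X = StandardScaler().fit_transform(playoffs[FEATURES])<br><br>kmeans = KMeans(n_clusters=3, init=&quot;k-means++&quot;, n_init=10, random_state=42)<br>playoffs[&quot;cluster&quot;] = kmeans.fit_predict(X)<br><br># Project the five scaled features onto two principal components for the 2D plot.<br>coords = PCA(n_components=2).fit_transform(X)<br>playoffs[&quot;pc1&quot;], playoffs[&quot;pc2&quot;] = coords[:, 0], coords[:, 1]</pre>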
<figure><img alt="" src="https://cdn-images-1.medium.com/max/971/1*61AQMtuuieqeBoMS66kIrg.png" /></figure><p>The rightmost cluster (yellow) captures championship-caliber teams, including archetypes such as the 2013–2014 Spurs and 2016–2017 Warriors. The central cluster (dark purple) represents strong but not dominant playoff teams, such as the 2023–2024 Los Angeles Lakers or the 2016–2017 Celtics. The leftmost cluster (pink) consists of lower-tier playoff participants and early exits.</p><h4><strong>Modeling Perspective</strong></h4><p>Although playoff performance naturally forms tiers, I ultimately frame this as a binary classification problem: contender vs. non-contender. After identifying championship-level teams through clustering, I collapse all other teams — including solid playoff teams and non-playoff teams — into a single non-contender class. This simplifies the prediction task to a clearer decision boundary: Does this regular-season profile resemble that of a historically championship-caliber team or not?</p><p>While a ranking or probability-based framework could provide more granularity, the binary setup aligns with the practical objective of identifying true title threats rather than differentiating between varying levels of non-championship outcomes. This framing also improves interpretability and stability, as it reduces noise introduced by marginal playoff differences and focuses the model on learning the structural characteristics unique to elite teams.</p><h4><strong>Hypothesized Key Factors</strong></h4><p>At a foundational level, championship-caliber teams tend to separate themselves through strong net rating, which serves as a holistic indicator of two-way performance. Teams that consistently outscore opponents at a high margin over large samples generally possess the structural balance required for postseason success.</p><p>Beyond baseline efficiency, more granular factors may further distinguish true contenders. These include lineup-level performance metrics (to capture rotational stability and adaptability), star player impact and usage concentration, clutch-time efficiency, prior playoff experience, and health continuity throughout the season and entering the postseason. Together, these variables attempt to move beyond surface-level win totals and instead capture the underlying traits that historically define championship-level teams.</p><h3><strong>Exploratory Data Analysis</strong></h3><h4><strong>Season Aggregate Stats</strong></h4><p>This section establishes a high-level baseline for team strength using season-long advanced metrics aggregated at the team level. By combining indicators such as PIE, usage rate, and availability-adjusted contributions, the goal is to capture overall roster quality beyond traditional box score totals. These features help contextualize which teams consistently generate positive on-court impact across their rotation, rather than relying on raw win totals alone.
Below we see two advanced stats that allow us to visualize some rough decision boundaries to classify championship contenders.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/998/1*FviDKpfieDPe4jkRzPwtvQ.png" /></figure><h4><strong>Best Player Analysis</strong></h4><p>This analysis isolates each team’s top contributors using a custom rating metric derived from PIE scaled by minutes played, along with a season impact adjustment that incorporates games played. By combining efficiency, workload, and availability, the metric captures not just how effective a player is, but how much that effectiveness translates over a full season. The motivation behind this step is the consistent historical pattern that championship teams are anchored by at least one elite, high-impact player, such as Shai Gilgeous-Alexander during the 2024–2025 season or Nikola Jokić in 2022–2023, whose individual production meaningfully elevates team performance. Rather than analyzing depth, this feature is explicitly constructed to encode the concentration of star talent on a roster, providing a quantitative proxy for elite player impact that can later be incorporated into the classification model.</p><p>Although the table tracks the top 3 players from each team for every season, below is a visual of the top 10 most frequently occurring best players.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/939/1*-_rfTzcqurBc2A6cOmkJcA.png" /><figcaption>10 Most Frequently Occurring “Best Players”</figcaption></figure><h4><strong>Clutch Performance Analysis</strong></h4><p>This analysis captures team performance in high-leverage situations using full-season clutch net rating, defined as the difference between offensive and defensive rating in games within five points during the final five minutes. Clutch net rating provides a compact measure of late-game execution, reflecting how efficiently a team scores and defends under pressure. Championship-level teams often separate themselves in close games, where possession value and decision-making are magnified. By encoding clutch net rating as a single variable, this feature serves as a quantitative proxy for composure and execution in decisive moments for use in the classification model.</p><p>The top 10 teams by clutch net rating are displayed below.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/691/1*DPylUC3iiy8fyis8ug75Og.png" /><figcaption>Dataframe of Best Clutch Teams within Dataset</figcaption></figure><h4><strong>Lineup Analysis</strong></h4><p>This analysis evaluates team performance at the lineup level using a composite Four Factors score rather than raw net rating. While net rating captures overall scoring margin, it can become noisy at the lineup level due to small sample sizes and short-term shooting swings. The Four Factors framework breaks performance into the core drivers of winning, including shooting efficiency, turnover control, offensive rebounding, and free throw rate, on both offense and defense. By aggregating the top minute-weighted lineups per team, this feature encodes how consistently effective a team’s primary combinations are and provides a more stable, process-oriented measure of lineup strength for the classification model.</p>
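<p>The exact construction is not shown in this post, but a minimal sketch of one way to build such a composite is below, using Dean Oliver’s commonly cited 40/25/20/15 Four Factors weights as an assumption and placeholder column names for the lineup data.</p>
<pre># Sketch of a composite Four Factors score per lineup, aggregated to the team level.<br># Weights follow Dean Oliver's commonly cited 40/25/20/15 split (an assumption);<br># column names and the input file are placeholders.<br>import pandas as pd<br><br>W = {&quot;efg&quot;: 0.40, &quot;tov&quot;: 0.25, &quot;reb&quot;: 0.20, &quot;ftr&quot;: 0.15}<br><br><br>def composite_four_factors(df: pd.DataFrame) -&gt; pd.Series:<br>    # Offense rewards shooting, offensive rebounding, and free throws and penalizes turnovers;<br>    # defense is the mirror image against opponent factors.<br>    offense = (W[&quot;efg&quot;] * df[&quot;off_efg_pct&quot;] - W[&quot;tov&quot;] * df[&quot;off_tov_pct&quot;]<br>               + W[&quot;reb&quot;] * df[&quot;off_oreb_pct&quot;] + W[&quot;ftr&quot;] * df[&quot;off_ft_rate&quot;])<br>    defense = (-W[&quot;efg&quot;] * df[&quot;def_efg_pct&quot;] + W[&quot;tov&quot;] * df[&quot;def_tov_pct&quot;]<br>               + W[&quot;reb&quot;] * df[&quot;def_dreb_pct&quot;] - W[&quot;ftr&quot;] * df[&quot;def_ft_rate&quot;])<br>    return offense + defense<br><br><br>lineups = pd.read_csv(&quot;lineup_four_factors.csv&quot;)  # one row per lineup per season<br>lineups[&quot;ff_score&quot;] = composite_four_factors(lineups)<br><br># Keep each team's five highest-minute lineups and take a minutes-weighted average.<br>top = (lineups.sort_values(&quot;minutes&quot;, ascending=False)<br>              .groupby(&quot;team_id&quot;)<br>              .head(5))<br>team_lineup_strength = (<br>    top.assign(weighted=top[&quot;ff_score&quot;] * top[&quot;minutes&quot;])<br>       .groupby(&quot;team_id&quot;)<br>       .apply(lambda g: g[&quot;weighted&quot;].sum() / g[&quot;minutes&quot;].sum())<br>       .rename(&quot;lineup_ff_score&quot;)<br>)</pre>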
<h3><strong>Model Building</strong></h3><h4><strong>Pre Modeling Observations</strong></h4><p>Before training classifiers, several structural properties of the dataset were addressed. First, the target variable is highly imbalanced, with 62 contenders and 388 non-contenders, meaning a naïve model could achieve high accuracy by simply predicting the majority class; therefore evaluation is centered on F1 score to better balance precision and recall, ensuring the model meaningfully identifies true contenders rather than defaulting to the dominant class. Second, features were scaled within each season rather than across seasons, since league context shifts over time, for example an elite offensive rating in 2012–2013 may correspond numerically to a below-average rating in 2022–2023 due to pace and scoring inflation; seasonal normalization preserves relative strength within each era. Finally, with 15 engineered features spanning lineup metrics, Four Factors aggregates, and player impact measures, Principal Component Analysis was explored to reduce dimensionality and mitigate multicollinearity while retaining the majority of variance, allowing for more stable and interpretable downstream modeling.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/421/1*3bGJ7jrZ8kwbdYe3KBeOYA.png" /><figcaption>Principal Components that Maximize Explained Variance</figcaption></figure><h4><strong>Logistic Regression</strong></h4><p>Given the class imbalance and moderately high-dimensional feature space, Logistic Regression was expected to perform well due to its ability to learn a stable linear decision boundary while incorporating class weights and probability threshold tuning. Using a pipeline with PCA and grid search cross-validation scored on F1, the best model retained 90% explained variance, reducing 15 features to 8 principal components. The optimal configuration used C = 0.1, a class weight ratio of 1:6.8, and a tuned prediction threshold of 0.866, producing an F1 score of 0.88, substantially outperforming other classical approaches.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/308/1*iUpYh83okovhrd6LMf6HtA.png" /><figcaption>Logistic Regression Confusion Matrix</figcaption></figure>
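<p>For concreteness, a minimal sketch of this kind of pipeline and search is shown below. The season-scaled feature matrix and contender labels are assumed to already exist, and the grid values simply mirror the reported configuration rather than reproducing the full search.</p>
<pre># Sketch of the PCA + class-weighted logistic regression pipeline with threshold tuning.<br># X (season-scaled features) and y (1 = contender, 0 = otherwise) are assumed to exist.<br>from sklearn.decomposition import PCA<br>from sklearn.linear_model import LogisticRegression<br>from sklearn.model_selection import GridSearchCV, train_test_split<br>from sklearn.pipeline import Pipeline<br><br>X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)<br><br>pipe = Pipeline([<br>    (&quot;pca&quot;, PCA()),<br>    (&quot;clf&quot;, LogisticRegression(max_iter=5000)),<br>])<br><br>param_grid = {<br>    &quot;pca__n_components&quot;: [0.90, 0.95],  # fraction of explained variance to retain<br>    &quot;clf__C&quot;: [0.01, 0.1, 1.0],<br>    &quot;clf__class_weight&quot;: [{0: 1, 1: 6.8}, &quot;balanced&quot;],<br>}<br><br>search = GridSearchCV(pipe, param_grid, scoring=&quot;f1&quot;, cv=5)<br>search.fit(X_train, y_train)<br><br># Tune the decision threshold on predicted probabilities instead of the default 0.5<br># (the post reports a best threshold of 0.866).<br>probs = search.predict_proba(X_test)[:, 1]<br>preds = (probs &gt;= 0.866).astype(int)</pre>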
<h4><strong>Support Vector Classifier</strong></h4><p>SVC was expected to perform reasonably well given its ability to construct flexible decision boundaries in high-dimensional spaces. However, despite tuning kernel and gamma parameters, performance lagged behind Logistic Regression. The best configuration used a polynomial kernel with gamma = 0.1 and PCA retaining 90% explained variance, but achieved an F1 score of only 0.44, suggesting limited separability even under nonlinear transformations.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/318/1*KHZoglTQEsXHeX8r7WQ8Dg.png" /><figcaption>Support Vector Classifier Confusion Matrix</figcaption></figure><h4><strong>K Nearest Neighbors</strong></h4><p>KNN was not expected to perform strongly due to the curse of dimensionality and class imbalance, which can distort distance-based methods. Even after PCA retaining 95% variance and tuning neighbors, distance weighting, and power parameters, the best model achieved an F1 score of 0.50, reinforcing the expectation that local neighborhood methods struggle in this setting.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/305/1*n_8TEq9bmE_U3Ep7bLjL1Q.png" /><figcaption>KNN Confusion Matrix</figcaption></figure><h4><strong>Decision Tree Classifiers</strong></h4><p>Decision Trees were expected to suffer from instability and high variance, particularly given the heterogeneous makeup of contending teams, where strong star-driven teams and balanced lineup-driven teams may not share simple rule-based splits. Using PCA at 90% explained variance, the best configuration achieved an F1 score of 0.50, indicating limited generalization and supporting the hypothesis that a single tree struggles to capture contender structure.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/301/1*podA6MnnmcYDikoL3w5h8A.png" /><figcaption>Decision Tree Confusion Matrix</figcaption></figure><h3><strong>Inference on 2025–2026 Season</strong></h3><p>A few important limitations should be noted before interpreting results. First, Four Factors metrics were not directly available at the lineup level, requiring custom feature engineering to approximate lineup impact. Second, season availability data is slightly skewed since a full 82-game sample is not yet complete, which may introduce small distortions in cumulative impact metrics.</p><p>Using Logistic Regression provides a key advantage: probabilistic outputs rather than just binary labels. Based on the selected threshold, the Oklahoma City Thunder, Denver Nuggets, and Cleveland Cavaliers are classified as contenders, with the San Antonio Spurs narrowly missing the cutoff by just 0.004, effectively placing four teams in the top tier. The second tier, based on probability magnitude, includes the New York Knicks, Minnesota Timberwolves, Los Angeles Lakers, and Boston Celtics. A third tier emerges with lower but still notable probabilities, including the Charlotte Hornets, Toronto Raptors, Detroit Pistons, and Houston Rockets. These tier distinctions reflect relative contender strength rather than strict playoff predictions, offering a probabilistic view of competitive positioning across the league.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/776/1*84cC17nUa6cZsUaY4Znx9A.jpeg" /><figcaption>NBA Contenders Classification + Prediction Threshold Score</figcaption></figure><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=b10759c8c0f6" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Designing a Sports Betting Bankroll Manager: Database Design, RLS, and RPC with Supabase]]></title>
            <link>https://medium.com/@akshayprabhu396/designing-a-sports-betting-bankroll-manager-database-design-rls-and-rpc-with-supabase-59ef4ceaca86?source=rss-8783643de3bb------2</link>
            <guid isPermaLink="false">https://medium.com/p/59ef4ceaca86</guid>
            <category><![CDATA[database-design]]></category>
            <category><![CDATA[backend-engineering]]></category>
            <category><![CDATA[system-design-project]]></category>
            <category><![CDATA[sports-betting]]></category>
            <dc:creator><![CDATA[Akshayprabhu]]></dc:creator>
            <pubDate>Thu, 07 May 2026 06:35:29 GMT</pubDate>
            <atom:updated>2026-05-07T06:38:22.404Z</atom:updated>
            <content:encoded><![CDATA[<h3>Motivation + Initial Architecture</h3><p>The goal of this system is to provide users with a structured way to track and analyze their betting activity across multiple sports books, while enabling deeper insights into betting behavior over time. Rather than simply storing raw bet data, the system is designed to capture relationships between users, bets, and events so that meaningful patterns can emerge that allow users to better understand their habits and make more informed, responsible decisions. This foundation supports both deterministic metrics (e.g., profit/loss, win rates) and more advanced behavioral analysis, allowing the system to surface trends like overuse of parlays or inconsistent stake sizing, ultimately helping users better understand and refine their betting decisions.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Ky-Q861eY6fxqsDP-NHZiQ.png" /><figcaption>Bankroll Manager UI</figcaption></figure><p><strong>Core Entities</strong></p><p>The system is built around four core entities that model the relationships needed to track and analyze betting activity. The <strong>User</strong> entity identifies each bettor and stores metadata such as unit size, which standardizes bet sizing and promotes responsible gambling, since the same dollar amount can represent different levels of risk depending on a user’s bankroll. The <strong>Event</strong> entity represents a game along with its start time, enabling time-based analysis and helping identify patterns in team or league preferences. The <strong>Bet</strong> entity serves as the central record, tying a wager to a user and associated events through foreign keys while storing key details such as stake, payout, and sportsbook for performance tracking. Finally, the <strong>Bet Leg</strong> entity captures granular information for each wager, including market type, line, and selection, and supports both single bets and multi-leg parlays through multiple associated records per bet.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fnnLl_TBf46ZHJc7Lr2DEg.png" /><figcaption>Entity Relationships and Schema</figcaption></figure><p><strong>Design Influences</strong></p><p>The design of the schema was heavily influenced by both ingestion patterns and query requirements. On the ingestion side, repeatedly syncing external sports data required idempotent writes, which led to the use of stable business keys such as a combination of game information (home team, away team, start time, league) and an external event ID rather than solely internal identifiers. Time normalization also became important, as incoming data from multiple sources needed to be standardized into a single starts_at timestamp to support consistent querying. On the query side, access patterns drove the decision to store commonly filtered fields such as home_team, away_team, and starts_at directly on the event record, avoiding the need for reconstruction or parsing at runtime. This ensured efficient lookups and simplified range-based queries, particularly when filtering by local calendar days.</p>
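<p>A minimal sketch of the two time-handling ideas above is shown below: collapsing provider timestamps into a single UTC starts_at value, and computing the UTC bounds of a user’s local calendar day for range queries. The time zones, formats, and names here are illustrative assumptions rather than the application’s actual conventions.</p>
<pre># Sketch of starts_at normalization and local-calendar-day filtering (names and zones are assumptions).<br>from datetime import datetime, time, timedelta, timezone<br>from zoneinfo import ZoneInfo<br><br><br>def to_utc_starts_at(raw_start: str, source_tz: str = &quot;UTC&quot;) -&gt; str:<br>    &quot;&quot;&quot;Normalize a provider's start time into a single ISO-8601 UTC starts_at value.&quot;&quot;&quot;<br>    dt = datetime.fromisoformat(raw_start)<br>    if dt.tzinfo is None:  # naive timestamps are interpreted in the provider's zone<br>        dt = dt.replace(tzinfo=ZoneInfo(source_tz))<br>    return dt.astimezone(timezone.utc).isoformat()<br><br><br>def local_day_bounds_utc(day: str, tz: str = &quot;America/New_York&quot;) -&gt; tuple[str, str]:<br>    &quot;&quot;&quot;Return the UTC range covering one local calendar day, for a starts_at range query.&quot;&quot;&quot;<br>    start = datetime.combine(datetime.fromisoformat(day).date(), time.min, tzinfo=ZoneInfo(tz))<br>    end = start + timedelta(days=1)<br>    return start.astimezone(timezone.utc).isoformat(), end.astimezone(timezone.utc).isoformat()<br><br><br># e.g. all events on the user's local &quot;2026-05-15&quot;, regardless of UTC offsets:<br>lo, hi = local_day_bounds_utc(&quot;2026-05-15&quot;)<br># query = client.table(&quot;events&quot;).select(&quot;*&quot;).gte(&quot;starts_at&quot;, lo).lt(&quot;starts_at&quot;, hi)</pre>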
<p>The structure of bets and bet legs was shaped by the need to support flexible wagering types while maintaining clean data modeling. Since a single bet can contain multiple legs, a parent-child relationship between bets and bet_legs allows for normalization without duplicating shared data. This also influenced the write path, where creating a bet and its associated legs required an atomic, transactional approach, leading to the use of RPC functions to handle multi-table inserts. Additionally, read and analytics patterns informed selective denormalization at the bet level, with frequently queried fields like bet_type, placed_at, and result stored directly to support efficient filtering and metric computation. Finally, event resolution requirements — matching bets to real-world games using team names and timestamps — led to storing both human-readable and structured identifiers, enabling reliable mapping for both ingestion and downstream analysis.</p><h3>Initial Implementation + Limitations</h3><p>While building the logic for how I wanted my backend to interact with the database, I started by adding basic API endpoints that allowed me to connect to my database and write the necessary information to the correct tables, but problems arose as I built out the application.</p><p><strong>Supabase Service Key</strong></p><p>Supabase is a Backend as a Service (BaaS), meaning my application connects to the database and related backend services through a Supabase client configured with an API key. In my initial implementation, I used the Supabase service role key for most database operations. This worked during development, but the service role key effectively has administrative access and bypasses Row Level Security, meaning database reads and writes were trusted entirely to my backend code rather than enforced by the database itself.</p><pre># Querying DB using Service Key<br>user = self._supabase_service.get_user_from_access_token(access_token)<br>user_id = user[&quot;id&quot;]<br>client = self._supabase_service.get_client()<br><br>try:<br>    query = (<br>        client.table(&quot;bets&quot;)<br>        .select(&quot;*&quot;)<br>        .eq(&quot;user_id&quot;, user_id)<br>        .order(&quot;placed_at&quot;, desc=True)<br>    )<br>except Exception as exc:<br>    raise RuntimeError(f&quot;Failed to fetch user bets: {exc}&quot;) from exc</pre><p>For an application used only by myself, this was functional, but it would not be a safe design for a multi-user system. If simple operations like querying public events or retrieving a user’s bet history rely on root-level access, then a bug in the backend query logic could expose or modify data across users. A better approach is to reserve the service role key for trusted backend jobs and use the anon key with JWT-based authentication and RLS for user-facing queries.</p><p><strong>REST API POST Requests</strong></p><p>In the initial implementation, most database access was handled through standard REST-style operations from the backend. This worked well for simple reads and writes, but it became more fragile once a single user action required changes across multiple related tables. Creating a bet, for example, required inserting a record into the bets table and then inserting one or more related records into bet_legs.
If these were handled as separate REST operations, the application would need to manually ensure that both writes succeeded together.</p><pre># API call with multiple post requests to add a bet<br>def create_bet_with_legs(self, access_token: str, payload: BetCreate) -&gt; dict[str, Any]:<br>  client = self._supabase_service.get_user_client(access_token)<br>  normalized_payload = self.prepare_bet_payload(payload)<br>  <br>  payload_json = normalized_payload.model_dump(mode=&quot;json&quot;)<br>  legs_payload = payload_json.pop(&quot;legs&quot;, [])<br>  <br>  # 1. Insert parent bet first<br>  bet_response = client.table(&quot;bets&quot;).insert(payload_json).execute()<br>  bet_rows = getattr(bet_response, &quot;data&quot;, None) or []<br>  if not bet_rows:<br>      raise RuntimeError(&quot;Bet insert returned no data.&quot;)<br>  <br>  bet_id = bet_rows[0][&quot;bet_id&quot;]<br>  <br>  # 2. Attach bet_id to each leg<br>  leg_rows = [<br>      {<br>          **leg,<br>          &quot;bet_id&quot;: bet_id,<br>      }<br>      for leg in legs_payload<br>  ]<br>  <br>  # 3. Insert legs second<br>  legs_response = client.table(&quot;bet_legs&quot;).insert(leg_rows).execute()<br>  inserted_legs = getattr(legs_response, &quot;data&quot;, None) or []</pre><p>In a multi-user system, this becomes harder to reason about because a partial failure could leave the database in an inconsistent state, such as a bet being created without its legs or legs being written without the expected parent data. The fix was to move this flow into an RPC function, where Postgres handles the multi-table write as one database-side operation.</p><p>The second issue was that REST queries placed too much responsibility on the backend application to enforce ownership rules. For example, a get_bets function might query the bets table using select * and then filter by user_id based on the authenticated user from the JWT. While this works, the security depends on every query being written correctly every time. If a filter is forgotten or incorrectly implemented, the database itself would not prevent access to another user’s rows. Row Level Security fixed this by moving the ownership rule into the database through policies, allowing the application to use an authenticated Supabase client while Postgres enforces which rows the user can read, insert, update, or delete.</p><h3>Improved Architecture</h3><p><strong>Supabase Anonymous Key + Row Level Security</strong></p><p>In the improved architecture, the system shifts from application-enforced logic to database-enforced guarantees by fully leveraging features provided by Supabase. 
Instead of relying on a privileged service key and manual query filters, all user-facing interactions are performed through the anonymous key paired with an authenticated JWT.</p><pre># Query db with anonymous key<br>client = self._supabase_service.get_user_client(access_token)<br><br>try:<br>    query = (<br>        client.table(&quot;bets&quot;)<br>        .select(&quot;*&quot;)<br>        .order(&quot;placed_at&quot;, desc=True)<br>    )<br>except Exception as exc:<br>    raise RuntimeError(f&quot;Failed to fetch user bets: {exc}&quot;) from exc</pre><p>This allows Row Level Security (RLS) policies to govern access at the database level, ensuring that users can only read and modify data they own, while still enabling public access patterns where appropriate (such as getting information for a user’s betting history).</p><pre>-- Row Level Security Policy<br>CREATE POLICY &quot;Users can manage their own bets&quot;<br>ON public.bets<br>FOR ALL<br>TO authenticated<br>USING (user_id = auth.uid())<br>WITH CHECK (user_id = auth.uid());</pre><p>This change simplifies the application layer and centralizes authorization logic within Postgres, making it consistent and harder to bypass.</p><p><strong>RPC Functions</strong></p><p>RPC functions are used to encapsulate more complex operations that involve multiple tables or derived logic. Rather than issuing multiple REST calls for related writes, these functions execute server-side within a single transaction, enabling atomic writes, ensuring that creating a bet and its associated legs cannot result in partial or inconsistent data.</p><pre># Creating a bet using RPC function<br>def create_bet_with_legs(self, access_token: str, payload: BetCreate) -&gt; dict[str, Any]:<br>  client = self._supabase_service.get_user_client(access_token)<br>  <br>  normalized_payload = self.prepare_bet_payload(payload)<br>  payload_json = normalized_payload.model_dump(mode=&quot;json&quot;)<br>  legs_payload = payload_json.pop(&quot;legs&quot;, [])<br>  <br>  rpc_params = {<br>      &quot;p_bet&quot;: payload_json,<br>      &quot;p_legs&quot;: legs_payload,<br>  }<br>  <br>  # rpc function is titled &quot;rls_create_bet_with_legs&quot; in Supabase<br>  try:<br>      response = client.rpc(&quot;rls_create_bet_with_legs&quot;, rpc_params).execute()<br>  except Exception as exc:<br>      raise RuntimeError(f&quot;Failed to create bet with legs via RPC: {exc}&quot;) from exc</pre><p>This pattern aligns naturally with RLS, as functions can derive the current user from auth.uid() and operate within the same security context as standard queries. Together, RLS and RPC functions form a cleaner boundary between the application and the database, where the database is responsible not just for storing data, but also for enforcing correctness and access control.</p><h4>Key Takeaways</h4><p>Designing this system reinforced the importance of pushing critical logic, especially authorization and data integrity, down to the database layer rather than relying on application code. Using Row Level Security with authenticated clients simplifies access control and ensures consistent enforcement across all queries. RPC functions provide a clean way to handle multi step operations through atomic writes, reducing the risk of partial or inconsistent data. 
More broadly, structuring the schema around real access patterns and relationships leads to a more maintainable, secure, and scalable system.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=59ef4ceaca86" width="1" height="1" alt="">]]></content:encoded>
        </item>
    </channel>
</rss>