<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by William Sovine on Medium]]></title>
        <description><![CDATA[Stories by William Sovine on Medium]]></description>
        <link>https://medium.com/@wcsovine?source=rss-7fd0b1fba6f2------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/0*zL1osoeC_mv3nXTW</url>
            <title>Stories by William Sovine on Medium</title>
            <link>https://medium.com/@wcsovine?source=rss-7fd0b1fba6f2------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sun, 24 May 2026 02:29:16 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@wcsovine/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[A Master Clase in Fixed Pitch Significance]]></title>
            <link>https://medium.com/@wcsovine/a-master-clase-in-fixed-pitch-significance-804b80864cec?source=rss-7fd0b1fba6f2------2</link>
            <guid isPermaLink="false">https://medium.com/p/804b80864cec</guid>
            <category><![CDATA[statistics]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[mlb]]></category>
            <category><![CDATA[sports-betting]]></category>
            <dc:creator><![CDATA[William Sovine]]></dc:creator>
            <pubDate>Fri, 14 Nov 2025 14:53:37 GMT</pubDate>
            <atom:updated>2025-11-14T14:53:37.636Z</atom:updated>
            <content:encoded><![CDATA[<p>See what I did there? Was Emmanuel Clase’s alleged pitch fixing statistically significant? And are there others that we can identify?</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fS2MwknubDtm3y8VV3kvIw.jpeg" /></figure><p>In recent news releases, the public has been made aware that two pitchers have allegedly been throwing rigged pitches in order to profit themselves and their co-conspirators.</p><p>The pitchers would tip off the bettors they were working with about what type of pitch they were going to throw. The bettors, now armed with insider information, would then place bets on those pitches that were all but guaranteed to win. The inside bettors would win several thousand dollars and the pitchers would get a kickback. It’s a win-win, until it isn’t…</p><p>The two pitchers facing the rigged or fixed pitch charges are Emmanuel Clase and Luis Ortiz from the Cleveland Guardians. Clase has allegedly been involved in the scheme since 2023 and Ortiz has allegedly been involved as of 2025.</p><h4>The Scheme</h4><p>One of the bets that Clase and his partners would take advantage of is the outcome of the next pitch. Clase would let the bettors know that he would throw a ball on his first pitch. The informed bettors would then bet that the next pitch would be a ball when Clase came in to pitch. Clase would spike the ball into the ground on his first pitch and, voilà, easy money for those in on the scheme.</p><h4>Obvious Rigged Pitch — In Hindsight</h4><p>Was it obvious that Clase was throwing a ball on purpose? Watching the pitches armed with the power of hindsight, maybe? 
It is easy to look at the videos of the pitches brought up in the accusations and, knowing what we know now, find it evident.</p><iframe src="https://cdn.embedly.com/widgets/media.html?type=text%2Fhtml&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;schema=reddit&amp;url=https%3A//www.reddit.com/r/mlb/comments/1osvuis/every_single_emmanuel_clase_pitch_referenced_in/&amp;image=" width="600" height="400" frameborder="0" scrolling="no"><a href="https://medium.com/media/6768a740cd469c7cefa2a9d5dd2296f1/href">https://medium.com/media/6768a740cd469c7cefa2a9d5dd2296f1/href</a></iframe><p>Sometimes pitchers throw bad pitches, so in the moment, it may seem like the pitcher simply missed his mark.</p><h3>Following the Data</h3><p>I wanted to take a data-driven approach to see:</p><p><strong>Was Emmanuel Clase’s pitch rigging something that we could pick up on by tracking the data?</strong></p><p>In the following data, I have tracked pitches starting with the 2023 season (when Clase was alleged to have begun the scheme).</p><h4>League Averages (excluding Clase)</h4><ul><li>First-Pitch Ball Rate: 37.74%</li><li>Non-First-Pitch Ball Rate: 35.53%</li></ul><p>Right off the bat (pun intended), we see that on average, pitchers do tend to throw balls on their first pitch more often than on non-first pitches. This means that if Clase is also throwing more balls on his first pitch, then it may be hard to detect as an anomaly.</p><h4>Clase Averages</h4><ul><li>First-Pitch Ball Rate: 40.10%</li><li>Non-First-Pitch Ball Rate: 29.71%</li></ul><p>Well, that’s interesting. Clase’s non-first-pitch ball rate is LOWER than the league average. This could indicate he has better control of the ball than most pitchers, but for some reason his first-pitch ball rate is HIGHER than the league average.</p><p>Not only that, we see that the league average first-pitch ball rate is 2.2 percentage points higher than the non-first-pitch rate. 
Yet the difference for Clase’s first-pitch ball rate is <strong>10.4 percentage points higher</strong>!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/700/1*32YIqxAVdm1MERdNRkgocw.png" /></figure><h3>Significance</h3><p>While this may or may not look like damning evidence to you, I now want to see if this rate difference is statistically significant or if it could fly under the radar as pure chance.</p><p>I’ve conducted three tests to see if anything stands out as an anomaly. For this, I am utilizing pairwise z-tests to compare two proportions at a time.</p><p><em>We’ll use α=0.05 (i.e. p-value must be less than 0.05 for us to consider it a statistically significant difference), but we’ll be a bit more conservative and utilize the Bonferroni adjustment. With this adjustment, since we have 3 tests, we will utilize α=0.0167.</em></p><h4>Test 1: Clase First-Pitch (40.1%) vs. League Average Overall (35.6%)</h4><ul><li>Difference: 4.5 percentage points</li><li>p-value: .186673 — NOT SIGNIFICANT at <em>α</em>=0.0167</li></ul><p>First test reveals that Clase’s first-pitch ball rate of 40.1% is not really an anomaly when we consider that the overall ball rate for the league is 35.6%.</p><h4>Test 2: Clase First-Pitch (40.1%) vs. League Average First-Pitch (37.74%)</h4><ul><li>Difference: 2.36 percentage points</li><li>p-value: .495593 — NOT SIGNIFICANT at <em>α</em>=0.0167</li></ul><p>Clase’s first-pitch ball rate of 40.1% is also not statistically significant when compared to the league average first-pitch ball rate of 37.74%.</p><h4>Test 3: Clase First-Pitch (40.1%) vs. Clase Non-First Pitch (29.71%)</h4><ul><li>Difference: 10.39 percentage points</li><li>p-value: .002233 —<strong> SIGNIFICANT at <em>α</em>=0.0167</strong></li></ul><p><strong>Now we see the truth come to light. 
When we compare Clase’s first-pitch ball rate of 40.1% to his non-first-pitch ball rate of 29.71%, we see that the elevated number of first-pitch balls is a statistically significant anomaly.</strong></p><h3>Conclusion</h3><p>Emmanuel Clase has better control of the ball than the average MLB pitcher. When we compare his first-pitch ball rate to his non-first-pitch ball rate, the difference is too wide to be chalked up to chance. The gap is statistically significant, which is consistent with the first-pitch balls being intentional, although a significance test on its own cannot prove intent.</p><p>This makes me wonder, is this a widespread issue? I’m thinking I’ll automate this type of analysis across all pitchers ahead of next season. Maybe I’ll uncover some widespread pitch rigging scandal, but hopefully this type of behavior is limited to the two pitchers named in the recent allegations.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=804b80864cec" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Any Way The Wind Blows]]></title>
            <link>https://medium.com/@wcsovine/any-way-the-wind-blows-3d69d5980c1e?source=rss-7fd0b1fba6f2------2</link>
            <guid isPermaLink="false">https://medium.com/p/3d69d5980c1e</guid>
            <category><![CDATA[statistics]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[baseball]]></category>
            <category><![CDATA[sports-betting]]></category>
            <dc:creator><![CDATA[William Sovine]]></dc:creator>
            <pubDate>Sat, 29 Jun 2024 18:24:18 GMT</pubDate>
            <atom:updated>2025-01-25T02:44:16.577Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*p9En4yYKUzdicx2oaI9c5A.png" /></figure><p>Can the wind help us with betting on MLB totals? In this article, I’m going to take a look at historical total points scored in MLB games compared to the wind direction. Finally, we’ll see if the findings can be used as an edge when betting on the MLB totals market.</p><h3>The Premise</h3><p>This is something that I’ve wanted to investigate for quite some time. The wind has to have an impact on the number of points scored, right? It makes sense logically.</p><ol><li>Baseball games theoretically have higher scores when balls are hit further.</li><li>Wind blows baseballs and most MLB stadiums are exposed to the outdoors.</li><li>So it makes sense that the wind can have an impact on the score of a baseball game, either positively (wind blowing balls out of the park) or negatively (wind keeping balls in the park).</li></ol><p>I’ve collected data starting from the 2022 season up to the time of writing this article (June 28, 2024) for the purposes of this analysis.</p><h3>How many points can be scored in a baseball game?</h3><p>At a minimum — 1 point, but this outcome occurs in less than 2% of games.</p><p>At most — 95% of games have 17 points or less.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/896/1*rSixkZJYnNo1vV6FIskG7w.png" /></figure><h3>Which ways DOES the wind blow?</h3><p>No direction or “None” is the most common and this is typically what you would see with an indoor stadium, but there can be no wind during an outdoor game as well.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/896/1*5JZx_gyTWuHJl4T-cMmFJg.png" /></figure><h3>The Model</h3><p><strong>Model Type<br></strong>Through a bit of trial and error, I found that the most appropriate model to use for this situation is a <strong>Zero-Truncated Negative Binomial Regression</strong>.</p><p>Negative Binomial models are useful when 
we are trying to model discrete count outcomes (e.g. 1, 2, 3, etc.) rather than continuous ones (e.g. 1.2, 1.5, 2.3, etc.), which is the case with baseball scores. You can’t score half a point.</p><p>The “Zero-Truncated” part accounts for the fact that a game’s total can’t be zero points.</p><p><strong>Model Results<br></strong>The intercept, average points with no wind or “None”, came out to 8.78 points and looking back at the total points histogram above, that tracks.</p><p>The following wind directions lead to <strong>lower scores </strong>than no wind on average:<br>- Calm<br>- In From CF<br>- In From RF<br>- In From LF<br>- L To R<br>- Varies</p><p>And the following wind directions lead to <strong>higher scores</strong> than no wind on average:<br>- Out To CF<br>- Out To LF<br>- Out To RF<br>- R To L</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*i6ZDTJaMqwKDgq5t6Ug-ng.png" /></figure><p>Most of these make sense intuitively. Wind blowing out to the outfield shows higher scores on average than wind blowing in. However, there were only 2 directions that were statistically significant in this model:<br><strong>R To L</strong> — higher score than no wind by a factor of 1.05 or 5% on average.<br><strong>Out To LF</strong> — higher score than no wind by a factor of 1.04 or 4% on average.</p><p><em>This raises a question that I may investigate later: why is wind going left more significant than wind going right? Could it be something to do with handedness of the hitter?</em></p><h3>Let’s Bet</h3><p>Alright, the model has found 2 wind directions that are associated with statistically significantly higher scores. Let’s get to betting. In the below profit/loss graph, we bet on every game where the wind is either “Out To LF” or “R To L”.</p><p>We are going to bet the over, meaning we expect the total points for these games to go over the threshold set by the sportsbook. 
We bet the over because our model has shown that these 2 wind directions lead to the highest scores.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*nK5h52v-hFNTQ-dkSrmOkw.png" /></figure><p>And things don’t go so well… we end up with a -30 unit loss.</p><p><strong>What went wrong?</strong></p><p>There are 2 things that we need to consider:</p><ol><li>What is the total points line set at?</li><li>What is the implied probability based on the set price?</li></ol><h3>Total Points Line</h3><p>The sportsbook does not simply set the total points line for every game at 8.5 points and call it a day. They could set the line at 10.5 points, meaning the total points scored in a game must be at least 11 in order for a bet on the over to win and that is well above the 9.2 average that we see for “R To L” above.</p><p>Let’s look at the averages along with the opening and closing total points for the sportsbook:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*IY9bDcobzRFrRdHPPJg5RQ.png" /></figure><p>There are a couple of fairly interesting things we see in this graph:</p><ol><li>Average lines are lower or higher for certain wind directions. No wind has the lowest lines and Out To LF has the highest.</li><li>The total points line moves up from open to close on average for the wind directions with the highest scores, yet it moves down from open to close on average for the rest of the wind directions.</li></ol><p>Seems like we might be onto something.</p><h3><strong>Implied Probability</strong></h3><p>The over and the under bets don’t always come at a fair price. If you need a refresher on why that is, take a look back at <a href="https://medium.com/@wcsovine/how-a-sportsbook-makes-money-59d56f5eec64">How a Sportsbook Makes Money</a>.</p><p>We aren’t going to get 50% odds on the over and 50% odds on the under. 
Instead it is more likely going to be something like 52% on the over and 52% on the under so the sportsbook can make their 4% profit.</p><p>The probability plus the vig, or that 52% figure, is what we call the <em>implied probability</em>.</p><p>This is important because if we think that games with a certain wind direction go over 9 total points 50% of the time, do we really want to bet when the implied probability says it is a 52% probability that the game goes over 9 total points? The answer is “No, we do not.” and that’s because we will lose roughly 2% over the long run.</p><h3>Probability Distribution</h3><p>There is actually a third piece to the puzzle that we need to factor in and that’s the probability distribution from the model we built. It would look something like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*8ncDvM-KMxcljeJeLq74Cg.png" /></figure><p>We can now take a hypothetical example of when we should bet the over:<br>- Wind direction is “R To L”<br>- Sportsbook is offering Over 7.5 points for -110 (-110 comes out to an implied probability of 52.4%)</p><p>To obtain the probability of the game going over 7.5 points based on the wind direction, we would add up all of the bars greater than 7.5 and get 51.9%. Since 51.9% is lower than the 52.4% implied probability offered by the sportsbook, we would NOT place a bet on the over.</p><h3>Let’s Bet Again</h3><p>Equipped with a smarter way of betting, let’s see how we do. We are again going to bet the over for our 2 big score wind directions, but only if our model says there is a higher chance than what the sportsbook is saying.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rjJnu8T9U8me6S6metsGnw.png" /></figure><p>Things look a bit better now. 
This comes out to a -10 unit loss, or a 67% improvement based on this model instead of naively betting the over.</p><h3>What would you do next?</h3><p>We could easily apply the same logic to bet on the under instead of just the over. We could also add in other predictors like the wind speed or maybe certain parks have higher scores based on certain wind conditions. We’ve only just scratched the surface here, but I think we have shown that wind direction does have an impact on the total points scored in an MLB game. We have also shown that you can use that information to your advantage when placing bets on the over or the under.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=3d69d5980c1e" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[If All of Your Friends Jumped Off a Bridge, Would You? … Should You?]]></title>
            <link>https://medium.com/@wcsovine/if-all-of-your-friends-jumped-off-a-bridge-would-you-should-you-12d1135fb480?source=rss-7fd0b1fba6f2------2</link>
            <guid isPermaLink="false">https://medium.com/p/12d1135fb480</guid>
            <category><![CDATA[wisdom-of-the-crowd]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[sports-betting]]></category>
            <dc:creator><![CDATA[William Sovine]]></dc:creator>
            <pubDate>Sat, 11 May 2024 11:37:33 GMT</pubDate>
            <atom:updated>2025-01-25T02:38:58.113Z</atom:updated>
            <content:encoded><![CDATA[<h3>If All of Your Friends Jumped Off a Bridge, Would You? … Should You?</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*Jtn2u77xy1sawAt6" /><figcaption>Photo by <a href="https://unsplash.com/@martinirc?utm_source=medium&amp;utm_medium=referral">José Martín Ramírez Carrasco</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>“If all of your friends jumped off of a bridge, would you?” The words of my father still echo in my head. This is the question he would ask every time I did something dumb and gave the excuse that someone else did it. In my youth the answer was always, “Of course not!”. Now though, I think the correct answer would be, “Do they know something that I don’t?”</p><h3>What are you talking about? What does this have to do with sports betting?</h3><p>To connect the friends jumping off a bridge example back to sports betting, imagine you are checking the odds for your favorite team to win a game and all of a sudden the odds change from -110 to -150 right in front of your eyes. You must now risk $150 instead of $110 if you want to make $100 on a win. What happened?</p><p>As discussed in my previous post (<a href="https://medium.com/@wcsovine/how-a-sportsbook-makes-money-59d56f5eec64">How a Sportsbook Makes Money</a>), there is a reason the line moved. There were a lot of people (your friends) betting on your favorite team, so the sportsbook had to lower the odds to encourage more bets on the other team. People were making the leap betting on your favorite team because they think there is a greater chance of them winning than what the -110 (52%) odds reflect.</p><p>Because people already jumped off the bridge, right into the safe arms of your favorite team, you must now risk more for the same payout. 
Your potential return of this bet takes an immediate hit, down from 91% to 67%.</p><h3>Market Efficiency &amp; Wisdom of The Crowds</h3><p>Market efficiency suggests that prices in financial markets reflect all available information, implying that assets are accurately priced based on collective knowledge. The wisdom of the crowds concept posits that the aggregated opinions of a diverse group often yield more accurate predictions or assessments than those of individual experts.</p><p>A common example that we hear in finance is the Challenger spacecraft explosion. In the aftermath of the explosion, the market speculated on which company was at fault before the official announcement. Many different investors analyzed various factors such as past performance, contract details, and industry reputation to make informed guesses. Morton Thiokol, the contractor responsible for the faulty O-rings, experienced a significant drop in stock prices compared to the other companies, indicating market suspicion. This drop in price demonstrates how the market as a whole can make informed decisions, even with incomplete information held by each investor.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*Uk5KRWywz8EY9F-J.png" /></figure><p>This same logic can be applied to sports betting. Even though everyone is making their betting decision based on different information, we would expect that with this combined information, the market price will stabilize close to the actual value. In the example given above of betting on your favorite team, according to the latest market price, their chance of winning is actually closer to 60% instead of 52%. 
Maybe that’s why you felt inclined to make that bet in the first place: your knowledge of this team told you that they had a better than 52% chance of winning.</p><h3>Closing Line Value</h3><p>Closing Line Value (CLV) is a common way to determine if you are making smart bets.</p><p>Let’s take the example above of the line moving from -110 (52%) to -150 (60%) and let’s say the line stays at -150 until the game starts. The -150 line would be considered the closing line. Based on the theory of market efficiency, this should be very close to the team’s actual probability of winning. The market has had as much time as possible to reflect the knowledge of all the actors in the market.</p><p>CLV is calculated as follows:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/358/1*qj4fpM3RaPLfivGThsB0Dw.png" /></figure><h3>Real World Example</h3><p>You may be saying “Whoa, whoa, whoa! Even though I place a bet that the market agrees with, that’s only half the story. Don’t we still have to win the bet?” Fair enough, let’s put this to the test and look at an example for the MLB 2023 season.</p><p>In these hypothetical examples, using my superpower of 20/20 hindsight, I have placed a $1 bet on every moneyline bet.</p><h4>All My Friends Jumped, Should I Jump?</h4><p>In this situation, I will place the bet AFTER the line has moved. In other words, I have seen everyone else jump and decided it is wise to join in.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/965/1*GUA8MTxrWTvjrxthjk_BWg.png" /></figure><p>Over the course of the season I placed 2,141 bets and ended up with a loss of $42, giving an ROI of -2%. My father would be pleased to hear that just because all of my friends jumped, it would not be wise for me to jump.</p><h4>All My Friends Are Going to Jump, Should I Jump?</h4><p>In this situation, I will place the bet BEFORE the line has moved. 
Everyone else is going to jump and I decide that I should go ahead and jump before they do.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/965/1*gcfTLXfjk_pqVIgNzU3l0A.png" /></figure><p>Over the course of the season I placed 2,141 bets for a profit of $22, giving me an ROI of 1%. It’s nothing to write home about, but I’m in the positive.</p><p><em>Perhaps in a later post, I’ll delve into ways to make this more profitable. For now it is important to note that jumping before others is the key to profit over loss in these scenarios.</em></p><h3>The Takeaway</h3><p>What have we learned? Should we jump just because all of our friends did? No, we should not because the market has already adjusted the price based on their decision to jump.</p><p>Right, so we should predict the outcome of the game and jump before everyone else does? This is getting closer to the big idea, but we are not quite there yet. This would work great if you have some sort of damning evidence that nobody else in the market has. I would imagine that most of the time, for most of us, this is not the case given the widespread availability of information that exists today. The outcome of trying to do this will be that you are simply contributing to the information that the rest of the market has.</p><p>We need a humbler approach. We don’t need to be some powerful sports guru that can predict the outcome of a game. All we need to do is understand what others are going to do.</p><blockquote>“When people talk, listen completely. Most people never listen.”</blockquote><blockquote>— Ernest Hemingway</blockquote><p>I would argue that we need to understand what everyone else thinks instead of trying to predict the outcome of a given game. Aggregate information from others to determine what they are going to do, and make your bet accordingly.</p><p><strong>Don’t jump because your friends have already jumped. 
Jump because you know they are going to jump.</strong></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=12d1135fb480" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[A Bird In Hand]]></title>
            <link>https://medium.com/@wcsovine/a-bird-in-hand-a42cc54c0455?source=rss-7fd0b1fba6f2------2</link>
            <guid isPermaLink="false">https://medium.com/p/a42cc54c0455</guid>
            <dc:creator><![CDATA[William Sovine]]></dc:creator>
            <pubDate>Sun, 05 May 2024 02:16:47 GMT</pubDate>
            <atom:updated>2024-05-05T02:16:47.916Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*erLDO_Nu3pPOph_R" /><figcaption>Photo by <a href="https://unsplash.com/@the_real_napster?utm_source=medium&amp;utm_medium=referral">Dominik Lange</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>This will be quick, and I promise I’ll have many more interesting articles to follow. My goal of paying for my degree through sports betting is still alive and well. However, I’ve received interest from third parties to utilize my skills to build and improve sports betting models outside of my own. This has taken much of my time away from the weekly updates that I initially wanted to publish.</p><p>Now, I’m just a businessman, doing business, and the guarantee from helping others is a better time investment if I have to choose between that and building models for my own use. For this reason, my own personal betting has slowed down. Yet, the knowledge that I’ve acquired on sports betting has accelerated way quicker than anticipated. For this reason, I have a ton of ideas in my backlog that I want to write and share with you all.</p><p><strong>Going forward, my goal progress will include my work for building models for third parties.</strong> I wrestled with this idea for a bit and I think I need to include this as progress towards my goal for a few reasons:</p><ol><li><strong>The purpose of this experiment is to show that education can be funded solely through using the knowledge obtained in said education.</strong> This is very much still the case as I would have no way of obtaining or delivering on these types of third-party projects without the education I received in my Master’s degree.</li><li><strong>Sports betting was chosen for the reason that it is something that I enjoy working on.</strong> This still holds up. 
Even though I am helping make money for others, it does not change the subject matter of making money through sports betting.</li><li><strong>Lastly, it is an opportunity to share more with you all.</strong> In the same way that I learned Data Science from professors with years of experience at Notre Dame, I am now learning about sports betting from professionals with years of experience in the industry. Not all of the detailed information can be shared, but I do hope to share the lessons learned and any insights that I can.</li></ol><p>So finally an update on the goal!</p><h4>Goal Update — 4.97%</h4><p>$2,885 / $58,000</p><p><a href="/@wcsovine/paying-for-my-data-science-degree-with-sports-betting-0d0376aa4550">What’s the goal?</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=a42cc54c0455" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[NHL Home Ice Advantage Isn’t Real]]></title>
            <link>https://medium.com/@wcsovine/nhl-home-ice-advantage-isnt-real-34a64a73f35b?source=rss-7fd0b1fba6f2------2</link>
            <guid isPermaLink="false">https://medium.com/p/34a64a73f35b</guid>
            <category><![CDATA[nhl]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[sports-betting]]></category>
            <dc:creator><![CDATA[William Sovine]]></dc:creator>
            <pubDate>Sat, 16 Mar 2024 17:27:21 GMT</pubDate>
            <atom:updated>2025-01-25T02:34:42.124Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/576/1*nZHA_MJiIQY70wN5tuX9lg.png" /></figure><p>Home ice advantage in the NHL is a social construct. It’s a fugazi, fairy dust, it doesn’t exist and I’m going to prove it.</p><h3>Setting the Stage</h3><p>I am going to be using NHL data from the start of the 2021 season up through the most recent games in 2024 (March 14).</p><p>Betting lines will be the consensus closing line across multiple online sportsbooks.</p><h3>Home Ice Advantage</h3><p>In ice hockey, goals are how games are won. Through regression analysis, we are able to identify how many more goals we can expect the home team to score on average.</p><p>As it turns out, on average, we expect a given team to score 3 goals. The home team is expected to score ~7% more or 3.2 goals, on average.</p><p><strong>How does a 0.2 goal advantage translate to win percentage?</strong></p><p>By running the above expected goals through a simulation we would expect the following outcomes:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/840/1*ZL0zhEfndDJPUFXxTTiBnw.png" /></figure><ul><li>45% home wins in regulation</li><li>38% away wins in regulation</li><li>16% go into overtime</li></ul><p>For the sake of simplicity, let’s assume there is no advantage in overtime and that home and away teams split the overtime victories evenly.</p><p><strong>We conclude home ice advantage is 0.2 goals, or a 7 percentage point edge in win probability, on average.</strong></p><h3>Betting Based On Home Ice Advantage</h3><p>What if we wanted to use this home ice advantage when betting on a given game? 
We need to look at the implied win probability from Vegas and see how well they predict the actual win probability.</p><p><a href="https://medium.com/@wcsovine/how-a-sportsbook-makes-money-59d56f5eec64">We know from previous research that the implied win probability should be higher than the actual win probability because of the vig.</a> The next question is, does the home ice advantage cause any situations where the actual win probability exceeds the implied win probability? If so, these would be profitable bets.</p><p>We can model the home team’s win probability based on the win probability assigned by Vegas using logistic regression.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/700/1*1iI7pjyLoWY2fJHvRBm5Sg.png" /></figure><p>Anything above the Break Even line would indicate a profitable situation when betting on the home team. Based on this model, the only time we might be profitable is when Vegas is giving less than a 10% chance to the home team. Unfortunately, those kinds of odds never happen in real life.</p><p>We could create the same model for the away team and then compare how much area is beneath the curve in each model. More Area Under the Curve (AUC) would indicate better opportunities since more area means we are closer to the Break Even line. Here are the results:</p><ul><li>Home Team AUC: 44.9</li><li>Away Team AUC: 46.5</li></ul><p>Neither one of these models would prove profitable as both areas are less than 50. However, we can make the observation that Vegas odds for the home team are less favorable than Vegas odds for the away team.</p><h3>In Summary</h3><p>I began noticing that any model I created for NHL rarely ever picked the home team. This research shows why. Vegas seems to charge a higher price for the home team on average. Is Vegas overcompensating for home ice advantage? Maybe. 
But I think it is more likely that they are factoring in that their customers (bettors like you and me) are overcompensating for home ice advantage.</p><p><strong>Tread carefully when betting on the home team in the NHL. Yes, they have an advantage, but that advantage is typically overpriced.</strong></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=34a64a73f35b" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Which Sportsbook is Best? (NHL Edition)]]></title>
            <link>https://medium.com/@wcsovine/which-sportsbook-is-best-nhl-edition-5926b62cb587?source=rss-7fd0b1fba6f2------2</link>
            <guid isPermaLink="false">https://medium.com/p/5926b62cb587</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[sports-betting]]></category>
            <dc:creator><![CDATA[William Sovine]]></dc:creator>
            <pubDate>Sat, 02 Mar 2024 13:47:16 GMT</pubDate>
            <atom:updated>2025-01-25T02:29:37.046Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*wfN2r5fVXcqo3CLG" /><figcaption>Photo by <a href="https://unsplash.com/@vjpedro?utm_source=medium&amp;utm_medium=referral">Pedro Bariak</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>For our purposes of defining which sportsbook is best, we want to know where we are getting the best bang for our buck. As discussed in the <a href="https://medium.com/@wcsovine/how-a-sportsbook-makes-money-59d56f5eec64">last post</a>, a sportsbook makes money by charging a certain percentage (called the vig) and we would like to minimize that cost.</p><h3>Price of Online Sportsbooks</h3><p>I picked three of the popular online sportsbooks (at least where I’m located in TN) and calculated the vig (price) charged for several NHL moneyline pairs from 2021–2024.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/917/1*7PtEh1HMJaZk9851juBKSQ.png" /><figcaption>The dotted line inside the box is the average vig and the solid line inside the box is the median vig.</figcaption></figure><p>The lower the vig the better, and it looks like DraftKings offers the lowest price on average compared to the other two. BetMGM charges the most and I think it is safe to say that extra profit margin isn’t going towards fixing their buggy app.</p><p>There are even better options if you are interested in using offshore online sportsbooks. By using LowVig, you are guaranteeing yourself roughly 2% more profit on each wager compared to most other sportsbooks.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/917/1*lCXygpBKdF7t1sOsILLBog.png" /></figure><p>A measly 2% may not sound like much, but that can certainly add up. 
Let’s look at 10 winning bets with each of these sportsbooks at even odds (minus the average vig that is charged).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/315/1*hyWlvqX5Zmj5JBBSfiAX6A.png" /></figure><p>By betting with LowVig instead of BetMGM, you would have made an extra $21. This is only 10 bets, so you can imagine how much that adds up over the long run.</p><h3>Getting the Best Price</h3><p>So what’s the best strategy to get the best price?</p><p><strong>When betting on a particular NHL game, it is best to shop around between a few online sportsbooks to see which one has the best price.</strong></p><p>Some are worth paying more for, in my opinion, especially if you are just using a single sportsbook and want a pleasant user experience. However, you should at least be aware of how much you are paying for that user experience.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=5926b62cb587" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How a Sportsbook Makes Money]]></title>
            <link>https://medium.com/@wcsovine/how-a-sportsbook-makes-money-59d56f5eec64?source=rss-7fd0b1fba6f2------2</link>
            <guid isPermaLink="false">https://medium.com/p/59d56f5eec64</guid>
            <category><![CDATA[sports-betting]]></category>
            <category><![CDATA[data-science]]></category>
            <dc:creator><![CDATA[William Sovine]]></dc:creator>
            <pubDate>Sat, 24 Feb 2024 13:49:14 GMT</pubDate>
            <atom:updated>2025-01-25T02:26:10.248Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*XlZWlkBZkf8Ygjpc" /><figcaption>Photo by <a href="https://unsplash.com/@grantcaiphoto?utm_source=medium&amp;utm_medium=referral">Grant Cai</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><h3>We need to understand how Vegas or the Sportsbook that we use intends to make money off of us if we want to turn the tides in our favor.</h3><blockquote>“If you know the enemy and know yourself, you need not fear the result of a hundred battles. If you know yourself but not the enemy, for every victory gained you will also suffer a defeat. If you know neither the enemy nor yourself, you will succumb in every battle.” — Sun Tzu, The Art of War</blockquote><h3>How Sportsbooks Ensure a Profit</h3><p>Let’s say that there is a matchup that is completely balanced, each team has a 50% chance of winning. If we were to convert this probability to American odds you would see something like:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/132/1*xhHwTb0TesCz39umzTal8A.png" /><figcaption>Odds Scenario 1</figcaption></figure><p>But you will never see this because the sportsbook would never make a profit by offering these odds (and you will understand why shortly). In reality you will likely see odds like this offered:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/124/1*2ZCmGDp5ip5KExgggI5mhw.png" /><figcaption>Odds Scenario 2</figcaption></figure><p>These odds imply that each team has a 52.4% chance of winning, for a total of 104.8%. That extra 4.8% is called the vigorish (vig for short) or the juice, and that is the sportsbook’s profit.</p><p>To see how this works, let’s suppose we have 2 people betting on this particular game. The first person bets on Team A while the second person bets on Team B. 
Both are placing $100 bets with the online sportsbook.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*W9z3Oq1giepcclsKbNf-dg.png" /></figure><p>Let’s assume that Team A wins and the first person should be paid out. In Scenario 1 from the above odds scenarios, there would be no money left for the sportsbook. In Scenario 2, we see that the winner is paid out according to their odds (-110) and that leaves the vig left over as the sportsbook’s revenue.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*pBhqi6KW8c_294a3Japw7Q.png" /></figure><p><em>I think something that is also fun to note is that the loser essentially gets to play for free (albeit they are now down $100).</em></p><h3>Impact of Vig on Our Bets</h3><p>Because of the vig that sportsbooks tack on, there are plenty of games that we will not bet on. In the above example, we would not bet on the game because we give each team a 50% chance of winning and the lines imply that they each have a 52.4% chance of winning. If we bet on games like this over the long run, we would expect to lose money.</p><p>Therefore, we need to bet on outcomes where the implied probability (i.e. the odds that are offered to us) is less than the probability that we expect:</p><p><strong>Implied Probability = Book’s Assigned Probability + Vigorish</strong></p><h3>There Isn’t Always a Right Pick</h3><p>As discussed in the <a href="https://medium.com/@wcsovine/what-should-i-bet-on-and-why-its-ok-to-bet-on-the-expected-loser-08d7bd9be086">last post</a>, we don’t always need to pick the expected winner, but we do always need to factor in the vig. <strong>Sometimes, when you factor in that vig, the best bet is no bet.</strong></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=59d56f5eec64" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[What Should I Bet On? (And Why It’s OK to Bet on the Expected Loser)]]></title>
            <link>https://medium.com/@wcsovine/what-should-i-bet-on-and-why-its-ok-to-bet-on-the-expected-loser-08d7bd9be086?source=rss-7fd0b1fba6f2------2</link>
            <guid isPermaLink="false">https://medium.com/p/08d7bd9be086</guid>
            <category><![CDATA[sports-betting]]></category>
            <category><![CDATA[data-science]]></category>
            <dc:creator><![CDATA[William Sovine]]></dc:creator>
            <pubDate>Sat, 17 Feb 2024 13:09:04 GMT</pubDate>
            <atom:updated>2025-01-25T02:20:00.262Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*8mXsLvg3mant-FSQ" /><figcaption>Photo by <a href="https://unsplash.com/@jeshoots?utm_source=medium&amp;utm_medium=referral">JESHOOTS.COM</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><h3>What to bet on?</h3><p>Unless I come across something interesting in my analysis that can give an edge, the plan is to bet on the moneyline.</p><h3>What’s the moneyline?</h3><p>Betting the moneyline is betting that a team will win. As simple as that. This is in contrast to betting the spread, where you can bet a team to win by or lose within a certain number of points. The tradeoff with the moneyline is that you are paid more for underdogs winning and less for the favorites winning.</p><h3>Why the moneyline?</h3><p>With the moneyline, we just need the team that we are betting on to win. Our interests and the team’s are aligned, as opposed to betting the spread, where we want each team to score a certain number of points while in reality both teams are simply trying to win.</p><p><strong>Keep it simple until complexity is justified.</strong></p><h3>Ok great so we just need to predict the winner and we’ll be profitable?</h3><p>Absolutely not.</p><p>We don’t have to predict a team to win, we simply need our model to be closer to the probability of them winning than what Vegas has them priced at <strong><em>even if that means we expect them to lose.</em></strong></p><p>Without getting too much into the math just yet, Vegas makes the most money by setting their lines where they have equal money bet on both teams, not by setting the lines exactly at a team’s win probability. 
This conflict can open up opportunity when money is being placed or is expected to be placed more heavily on one side than it should be from a probability perspective.</p><p><strong>Simple Example:</strong></p><ol><li>Vegas has the moneyline on Team A set at +233. These odds imply that team A has a 30% chance of winning and would pay out $2.33 for every $1 bet.</li><li>Our model is predicting that Team A has a 40% chance of winning.</li></ol><p>If this game plays out 100 times, Team A will still lose most of those games. Both we and Vegas can agree that Team A losing is what is most probable. The crucial point is that Vegas has them priced to lose 70/100 games whereas we believe they will only lose 60/100, which gives us an opportunity to make profit over the long run.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/578/1*bTDVWEuhxUr7G5hkyYcEtQ.png" /></figure><h3>Summary</h3><p>In order to make money sustainably, the goal isn’t to pick the winner every time. We’re in it for the long haul, and because of that, we’ll be picking the undervalued.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=08d7bd9be086" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Paying for My Data Science Degree with Sports Betting]]></title>
            <link>https://medium.com/@wcsovine/paying-for-my-data-science-degree-with-sports-betting-0d0376aa4550?source=rss-7fd0b1fba6f2------2</link>
            <guid isPermaLink="false">https://medium.com/p/0d0376aa4550</guid>
            <category><![CDATA[sports-betting]]></category>
            <category><![CDATA[data-science]]></category>
            <dc:creator><![CDATA[William Sovine]]></dc:creator>
            <pubDate>Sat, 10 Feb 2024 14:42:38 GMT</pubDate>
            <atom:updated>2024-02-10T17:52:22.955Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*5upjUpzluzW8aGUU" /><figcaption>Photo by <a href="https://unsplash.com/@drench777?utm_source=medium&amp;utm_medium=referral">Dylan LaPierre</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>A little over two years ago, I started on the journey towards obtaining my Master’s degree in Data Science. Through my early career I picked up skills in both the realm of data and the realm of computer science because they interested me and why not make a career out of what interests me?</p><p>Perhaps as important as my passions, the college I had always dreamed of attending offered such a degree, <strong>Notre Dame</strong>. Cue the Rudy chant.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/320/1*VW5QoI-Ul3tarc-hLOKe8g.gif" /><figcaption>Rudy! Rudy! Rudy!</figcaption></figure><p>Would I be accepted? How would I pay for it?</p><p>These were all questions for future me and the future is now: I got in, I graduate in May 2024, and sports betting is going to recoup the costs (hopefully).</p><h3>The Mission</h3><p>Make enough money from sports betting to pay the total cost of my Master’s in Data Science from the University of Notre Dame.</p><p><strong>Profit Goal: $58,000</strong></p><p>This is the latest figure from Notre Dame’s website for total cost of the MS in Data Science. It may have been a bit cheaper when I enrolled, but let’s use the most up-to-date figures for future readers.</p><p><strong>Starting Point: $0</strong></p><p>I will keep track of how much money I put in as a minus and profit from successful wagers as a plus. That means we could go negative, so everybody buckle up.</p><h3>Updates</h3><p>I will post updates as I have things to share. It will take time to bring in data and develop the models to use for wagering on each different sport. 
However, I will create posts as I find interesting insights along the way. Would love to create at least 1 profitable model for every major sport, so we can keep the action going year round.</p><h3>My Picks</h3><p>I’ll also post what my models suggest for anyone that wants to follow along. However, the actual bets that I place towards the goal will be determined by what odds are available at the time that I place my wagers.</p><p>I’ll post all of those suggestions here for anyone that wants to join in on the fun:</p><p><a href="https://st-cajetan.ghost.io/">https://st-cajetan.ghost.io/</a></p><h3><strong>Goal Progress — February 10, 2024</strong></h3><p>$0 / $58,000 (0%)</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=0d0376aa4550" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Opponent Adjusted Strength]]></title>
            <link>https://medium.com/@wcsovine/opponent-adjusted-strength-e2d69826bbfa?source=rss-7fd0b1fba6f2------2</link>
            <guid isPermaLink="false">https://medium.com/p/e2d69826bbfa</guid>
            <category><![CDATA[statistical-modeling]]></category>
            <category><![CDATA[ncaa-football]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[William Sovine]]></dc:creator>
            <pubDate>Wed, 18 Oct 2023 15:28:32 GMT</pubDate>
            <atom:updated>2023-10-18T15:28:32.245Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*s9gW1zE1aGeLpqQz.png" /></figure><p><em>Audience: Basic knowledge about football and machine learning models.</em></p><h3>Introduction</h3><p><em>I am going to employ many of the same techniques found in this </em><a href="https://blog.collegefootballdata.com/opponent-adjusted-stats-ridge-regression/"><em>blog post</em></a><em> from collegefootballdata.com.</em></p><p>Evaluating a team’s offensive and defensive capabilities is pivotal for assessing a team’s success on the field. Expected Points Added (EPA), a metric calculated from Expected Points that we saw in my <a href="https://wsovine.github.io/expected_points.html">previous experiment</a>, stands as a powerful tool for quantifying the impact of each play on a team’s potential to score or prevent points. This experiment seeks to leverage EPA, employing a Ridge Regression model, to comprehensively measure the offensive and defensive strengths of NCAA football teams.</p><p><em>How is EPA different from Expected Points?</em><br> EPA is simply how much EP increases or decreases on each play, i.e. EP from the current play minus EP from the previous play. A positive EPA would indicate a successful play, as the team has improved its position in terms of scoring the next points.</p><p>Ridge Regression, a proven technique in predictive modeling, adds a valuable layer of sophistication to our analysis. By addressing issues of multicollinearity and overfitting, it gives a more stable evaluation of team performance that accounts for the strength of their opponents.</p><p>By scrutinizing play-by-play data and applying Ridge Regression with each team as a predictor for EPA, we aim to offer a generalized assessment of a team’s offensive and defensive strength. 
This methodology allows us to unearth nuanced insights, revealing not only how effectively a team scores, but also how effectively they prevent their opponents from doing so.</p><h3>Data</h3><p>The data that we are using comes from collegefootballdata.com. We are also leveraging the expected points data previously calculated. Cleanup and transformations have already been performed on the dataset and it only includes standard non-special teams plays, think passing and rushing plays.</p><h3>EPA</h3><pre>import pandas as pd <br><br># A GitHub blob URL serves an HTML page, so ?raw=true requests the raw CSV <br>df_epa = pd.read_csv(&#39;https://github.com/wsovine/data/blob/main/ncaaf_epa_pbp.csv?raw=true&#39;)</pre><h3>Modeling with Ridge Regression</h3><p>We use the team on offense, the team on defense, and homefield advantage (hfa) as our predictor variables (X) and Standard Expected Points Added (SEPA) as our dependent variable (y).</p><pre>X = df_epa[[&#39;offense&#39;, &#39;defense&#39;, &#39;hfa&#39;]] <br>X = pd.get_dummies(X) <br><br>y = df_epa.sepa</pre><p>We’ll try out a few different values for alpha in the below code to find the one that minimizes RMSE. 
Alpha determines the strength of the penalty term, and the penalty term is what keeps our coefficients at a reasonable scale.</p><pre>import numpy as np <br>from sklearn.linear_model import RidgeCV <br><br>alphas = np.arange(10, 500, 10) <br># fit_intercept=False, so reg.intercept_ below prints as 0.0 <br>reg = RidgeCV(alphas=alphas, fit_intercept=False) <br>reg.fit(X, y) <br><br>print(f&#39;{reg.alpha_ = }&#39;) <br>print(f&#39;{reg.intercept_ = }&#39;) <br>print(f&#39;{reg.best_score_ = }&#39;)</pre><p>The next block of code simply takes the coefficients calculated in the above model for each team’s offense and defense, averages out EPA for each team’s offense and defense, and puts it all in tabular form.</p><pre>df_results = pd.DataFrame(<br>{ &#39;coef_name&#39;: X.columns.values, &#39;ridge_reg_coef&#39;: reg.coef_ }<br>) <br><br># Offense coefficients <br>df_off = df_results[df_results.coef_name.str.startswith(&#39;offense&#39;)].copy() <br>df_off[&#39;coef_name&#39;] = df_off[&#39;coef_name&#39;].str.replace(&#39;offense_&#39;, &#39;&#39;) <br>df_off.rename(columns={&#39;ridge_reg_coef&#39;: &#39;adj_off_epa&#39;}, inplace=True) <br><br># Defense coefficients <br>df_def = df_results[df_results.coef_name.str.startswith(&#39;defense&#39;)].copy() <br>df_def[&#39;coef_name&#39;] = df_def[&#39;coef_name&#39;].str.replace(&#39;defense_&#39;, &#39;&#39;) <br>df_def.rename(columns={&#39;ridge_reg_coef&#39;: &#39;adj_def_epa&#39;}, inplace=True) <br><br># Extract each unique team to a dataframe <br>df_team_stats = ( <br>  df_epa<br>  .rename(columns={&#39;home_team&#39;: &#39;team&#39;})<br>  [[&#39;team&#39;]]<br>  .sort_values(&#39;team&#39;)<br>  .drop_duplicates()<br>  .set_index(&#39;team&#39;) <br>) <br><br># Tidy stats up in tabular form <br>df_team_stats = ( <br>  df_team_stats<br>  .join(df_epa.groupby(&#39;offense&#39;).sepa.mean())<br>  .rename(columns={&#39;sepa&#39;: &#39;avg_off_epa&#39;})<br>) <br><br>df_team_stats = (<br>  df_team_stats<br>  .join(df_off.set_index(&#39;coef_name&#39;))<br>) <br><br>df_team_stats = (<br>  df_team_stats<br>  .join(df_epa.groupby(&#39;defense&#39;).sepa.mean() 
* -1)<br>  .rename(columns={&#39;sepa&#39;: &#39;avg_def_epa&#39;})<br>) <br><br>df_team_stats = (<br>  df_team_stats<br>  .join(df_def.set_index(&#39;coef_name&#39;) * -1)<br>)</pre><h3>Review the Output</h3><h3>Homefield Advantage</h3><p>Before we dive into each team’s strengths, let’s first look at the extra predictor variable we added to the model to determine homefield advantage.</p><pre>df_results[df_results.coef_name == &#39;hfa&#39;]</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/200/1*XZvLgsKiZcfNVtkg2VwWtA.png" /></figure><p>At around 0.007 points per play, we can start to understand how big of an impact homefield advantage has.</p><p>Let’s say there are 150 plays in a game, 75 plays for each team. We would then expect about 1 point (0.007 × 150 ≈ 1.05) for the home team to be attributed to homefield advantage on average.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/434/1*biRoc0P9Shbgw1MoL9z2rA.png" /></figure><h3>Offense</h3><p>Now let’s check out the offense by taking a glance at the top 5. It seems to make sense with Georgia at the top, considering they did win the national championship.</p><pre>(<br>  df_team_stats[[&#39;avg_off_epa&#39;, &#39;adj_off_epa&#39;]]<br>  .sort_values(&#39;adj_off_epa&#39;, ascending=False)<br>  .head()<br>)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/476/0*fV5Ga_RE7PgnTrEZ.png" /></figure><h3>Defense</h3><p>Next we check out the top 5 defenses. Annnnd things start to look a little fishy. 
Did James Madison of the Sun Belt Conference have a defense better than the national champions and the entire SEC for that matter?</p><pre>(<br>  df_team_stats[[&#39;avg_def_epa&#39;, &#39;adj_def_epa&#39;]]<br>  .sort_values(&#39;adj_def_epa&#39;, ascending=False)<br>  .head()<br>)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/546/0*WkOXKZrFL0FgKTrr.png" /></figure><p>The outcome above reveals a big problem with this ridge regression method for calculating the opponent adjusted team strength. We are trying to train this model with very sparse categorical data. Each game only allows us to compare two teams at a time, but that isn’t even the biggest problem. Each team typically plays the majority of their games against opponents in their own conference, and that doesn’t allow us to adjust their strength relative to teams in other conferences very well.</p><p>In the above defensive strength output, there are 4 different conferences represented in the top 5 (Sun Belt, SEC, Big 12, Big Ten). 
While it might make sense that JMU had the best defense in the SBC, you can’t convince me that their defense would outperform Alabama’s or Georgia’s.</p><h3>Adding Context</h3><p>Let’s add in the conference for each of our teams to at least give a bit of extra context around the team and their likely opponents.</p><pre># A GitHub blob URL serves an HTML page, so ?raw=true requests the raw CSV <br>teams_conf = pd.read_csv(&#39;https://github.com/wsovine/data/blob/main/ncaaf_fbs_team_conferences_2022.csv?raw=true&#39;) <br><br># join aligns on the index, so teams_conf must be indexed by team name <br>df_team_stats = df_team_stats.join(teams_conf)</pre><p>Now we can plot our data using our 3 different dimensions (offense, defense, and conference) to really start to compare teams to one another.</p><pre>import plotly.express as px <br><br>fig = px.scatter(<br>  df_team_stats.sort_values(&#39;Conference&#39;).reset_index(), <br>  x=&#39;adj_off_epa&#39;, <br>  y=&#39;adj_def_epa&#39;, <br>  color=&#39;Conference&#39;, <br>  hover_data=[ &#39;team&#39;, &#39;Conference&#39;, &#39;adj_off_epa&#39;, &#39;adj_def_epa&#39; ]<br>) <br><br>fig.add_hline(df_team_stats.adj_def_epa.median(), line_dash=&#39;dash&#39;) <br>fig.add_vline(df_team_stats.adj_off_epa.median(), line_dash=&#39;dash&#39;) <br><br>fig.update_xaxes(title=&#39;Offensive Efficiency&#39;, zeroline=False) <br>fig.update_yaxes(title=&#39;Defensive Efficiency&#39;, zeroline=False) <br><br>fig.add_annotation(<br>  x=df_team_stats.adj_off_epa.min(), <br>  y=df_team_stats.adj_def_epa.max(), <br>  text=&quot;Bad Offense | Good Defense&quot;, <br>  showarrow=False, <br>  yshift=20 <br>) <br><br>fig.add_annotation(<br>  x=df_team_stats.adj_off_epa.max(), <br>  y=df_team_stats.adj_def_epa.max(), <br>  text=&quot;Good Offense | Good Defense&quot;, <br>  showarrow=False, <br>  yshift=20 <br>) <br><br>fig.add_annotation(<br>  x=df_team_stats.adj_off_epa.max(), <br>  y=df_team_stats.adj_def_epa.min(), <br>  text=&quot;Good Offense | Bad Defense&quot;, <br>  showarrow=False, <br>  yshift=-20<br>) <br><br>fig.add_annotation(<br>  x=df_team_stats.adj_off_epa.min(), <br>  
y=df_team_stats.adj_def_epa.min(), <br>  text=&quot;Bad Offense | Bad Defense&quot;, <br>  showarrow=False, <br>  yshift=-20 <br>)</pre><p><a href="https://wsovine.github.io/posts/opponent_adjusted_strength.html#chart"><em>View interactive plot</em></a><em>.</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/789/1*GG7xAEjaTQf8k-pw2JcK_A.png" /></figure><h3>Conclusion</h3><p>Ridge regression gives us a simple yet robust way to estimate opponent adjusted team strength for all teams at once. It falls short in certain areas because of the sparsity seen in our dataset; however, we are able to overcome these challenges by adding in a bit more context to our interpretations.</p><p><em>Originally published at </em><a href="https://wsovine.github.io/posts/opponent_adjusted_strength.html"><em>https://wsovine.github.io</em></a><em>.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=e2d69826bbfa" width="1" height="1" alt="">]]></content:encoded>
        </item>
    </channel>
</rss>