I’m “off” for a bit. Here’s why. Also, some stats

Dashiell Nusbaum
Published in Push The Pace
Mar 9, 2018

I haven’t written an article in a while.

It’s not that I haven’t been working towards creating content — well, actually, I haven’t been. I mean, I was. You’ll see.

I was working on a project, trying to learn at what point in the season, for each of the four major American sports leagues (MLB, NBA, NFL, NHL), a team’s winning percentage becomes strongly correlated with its winning percentage at the end of the season. As with most articles where I have to collect lots of data… it’s been a slog. There’s a lot of time involved: typing something into Basketball Reference, clicking to get to the right page, scrolling down, copying the right table, repeating this cycle for hours, then putting it all in a Google spreadsheet (Google’s Excel). From there, I have to sort the data, plug in formulas that will spit out numbers, make graphs to produce regression values and find patterns, etc.

I know those past few sentences may have been a lot to read, but trust me, reading it is a lot easier than doing it. I love doing it, but it takes a while. Also, because it’s such a time-consuming process, I’m not able to collect as large a sample size as I might want to. I need to make that process quicker, so that I can not only have more accurate data but also write more articles.

So in the meantime, I’m learning the programming language Python. I’m using the website Codecademy, which is free. Python will allegedly make “scraping” data quicker and help streamline this whole process. Hopefully it will. If not, this may be a huge waste of time. I’m 42% of the way through learning it.
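As a rough idea of what the number-crunching half of this could look like once it moves out of a spreadsheet, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the idea of storing a season as a string of W’s and L’s, and the four toy teams, whose results are made up and are not real NBA data.

```python
def win_pct_after(results, n):
    """Winning percentage over the first n games of a 'W'/'L' string."""
    played = results[:n]
    return played.count("W") / len(played)


def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Assumes neither list is constant (otherwise this divides by zero).
    return cov * cov / (var_x * var_y)


def r_squared_at_game(seasons, n):
    """r squared between win pct after game n and win pct at season's end."""
    early = [win_pct_after(results, n) for results in seasons.values()]
    final = [win_pct_after(results, len(results)) for results in seasons.values()]
    return r_squared(early, final)


# Made-up 10-game "seasons" for four imaginary teams (NOT real data).
toy_seasons = {
    "Team A": "WWWWLWWLWW",
    "Team B": "WLWLWWLWLL",
    "Team C": "LLWLLWLLWL",
    "Team D": "LWLLLLWLLL",
}

# How well does the record after 5 games track the final record?
print(r_squared_at_game(toy_seasons, 5))
```

With real game-by-game results in place of the toy strings, calling `r_squared_at_game` for each game number would give the same kind of number the spreadsheet formulas produce, one per point in the season.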

Anyway, I wouldn’t want to leave you with nothing. Here’s what I have so far regarding the “how far into the season do we know stuff” question.

So far, I only have data for the NBA, and only for the 2016–2017 season.

It was at game 27 (as close as integer-ly possible to exactly 1/3 of the way through the 82-game season) that there was a strong correlation (r squared = 0.7) between a team’s winning percentage at that point and its winning percentage at the end of the season.
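The arithmetic behind “game 27” and the idea of looking for the point where the correlation becomes strong can be sketched in a few lines of Python. The function names are hypothetical, and treating r squared of 0.7 as the cut-off for “strong” is an assumption based on the figure above. The r squared values in the usage note are placeholders apart from the 0.7 at game 27.

```python
def one_third_mark(games_in_season):
    """Whole game number closest to one third of the season."""
    return round(games_in_season / 3)


def first_strong_game(r2_by_game, threshold=0.7):
    """First game number whose r squared reaches the threshold, else None."""
    for game in sorted(r2_by_game):
        if r2_by_game[game] >= threshold:
            return game
    return None


# An NBA regular season is 82 games; 82 / 3 is about 27.33.
print(one_third_mark(82))  # prints 27
```

For example, `first_strong_game({10: 0.4, 20: 0.6, 27: 0.7})` returns 27. The values for games 10 and 20 are invented purely to show the shape of the input.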

(Hopefully) once I learn Python, I’ll be able to gather data from more than just 30 teams in one season, and we’ll get a more accurate result. I’ll be able to get data from every team across multiple leagues and multiple seasons, in much less time. We’ll see. So again, bye, probably, for another period of time. At least for stats articles. I think.
