DigitalCrafts Week 12: Indy Project

Ryan Leon’s Bootcamp
DigitalCrafts
Published in
2 min readJul 21, 2017

Aggregating to:

Boom, individual project: https://goatfinder.herokuapp.com/

It’s an app that lets the user select a year range, some stat weights, and any current MLB team (or all of MLB), and it’ll return the greatest players of that era by that given criteria. I’m aware that there are dashboarding solutions that can do this in a matter of hours, but I wanted to further tighten my SQL skills. My classmate told me to just drop my main aggregation script on this blog post and drop the mic. So, this isn’t optimized, but it was fun to write:

SELECT fullnames.namefirst, fullnames.namelast, avghr.avghomer, position.pos
FROM
— Get player full names from master table
(SELECT
baseball.batting.playerid as idall,
baseball.master.namefirst,
baseball.master.namelast
FROM baseball.batting
JOIN baseball.master
ON baseball.master.playerid = baseball.batting.playerid)
as fullnames
— Get homer averages and constrain by year
JOIN
(SELECT
avg(baseball.batting.hr) as avghomer,
baseball.batting.playerid as idavg
FROM baseball.batting
WHERE baseball.batting.yearid >= 1800
AND baseball.batting.yearid <= 2016
GROUP BY idavg)
as avghr
ON fullnames.idall = avghr.idavg
— Get primary player positions
JOIN ( SELECT poslist.playerid, poslist.pos
FROM ( SELECT fielding.pos,
sum(fielding.g) AS position_occurence,
fielding.playerid
FROM baseball.fielding
GROUP BY fielding.pos, fielding.playerid
) AS poslist
LEFT JOIN ( SELECT fielding.pos,
sum(fielding.g) AS position_occurence,
fielding.playerid
FROM baseball.fielding
GROUP BY fielding.pos, fielding.playerid
) AS primarypos
ON primarypos.playerid=poslist.playerid AND primarypos.position_occurence >
poslist.position_occurence
WHERE primarypos.playerid IS NULL)
as position
ON avghr.idavg = position.playerid
— WHERE position.pos = ‘’
GROUP BY fullnames.namefirst, fullnames.namelast, avghr.avghomer, position.pos
ORDER BY avghomer DESC
LIMIT 10;

It was a process, first getting the data, then massaging it to a queryable database, then digging around and understanding the structure.

It sounds like complaining, but I had a great time. The script above took 3.5 seconds to execute. Thanks to implementing materialized views and indexes, I got it down to 200 milliseconds.

I think I worked waaaay smarter on this. I took regular breaks, 1.5 hours on, 30 minutes off, and I ran in to almost no major frustrations. I mean, I was challenged at every step, but once the 1.5 hours were up and I had a chance to rest and re-attach the challenge, my brain was ready for the challenge, my attention to detail was continuously sharp, and bug-causing mistakes were minimal.

Now to bone up on React, since I’m gonna refactor this so you don’t even need a submit button.

--

--

Ryan Leon’s Bootcamp
DigitalCrafts

Software developer student and aspiring data wrangler documenting the learning process of the development bootcamp