From Mainframe COBOL into the Object Oriented Abyss: The Dinosaur Roars
You Are What You Eat
Is anyone else hungry? I am nearing the end of the third and final month of the ChiPy Spring Mentorship program, through which I am learning Python (and a great deal more), and it seems the more I dig in the more I want to consume! My skill set in these modern technologies is growing every day, and I am having a fun time learning new things and putting them to use.
Since posting my last blog I have been pouring myself into my project, and as a result I not only have a working web app, I have also been learning even more about Python as well as other prominent technologies such as the AWS Cloud. My website can be used to retrieve real-time statistics from the MLB Gameday Data server. To make this happen I wrote my own logic to extract this data from a huge collection of JSON dictionaries on their server, as the average fan cannot access the MLB API. This was a blessing in disguise, as I learned to write APIs in Python and how to call them from the web server. With that, let’s get into the nerdy details! 🤓
I chose to put this site on the AWS Cloud because important entities in my life are moving in that direction, and it is definitely another great skill to have in my arsenal nonetheless. AWS Elastic Beanstalk is a tool for deploying and scaling web applications; it requires Python 3.4, so the virtual environment I created to develop this website had to downshift slightly from my base 3.6 installation.
I created the web app using the Python Flask framework. This was a hoot to figure out, even with the many Flask tutorials I found on the web. I easily understood the routing method, but passing data to and from the form, and referencing those fields properly, was tricky to figure out, especially given the nuances of one multi-purpose page I designed! But after many hours of reading, trial and error, and the occasional F-bomb 💣, I prevailed, and data flowed into the page like water over Niagara Falls.
Shortcuts To The Data
After studying the MLB server, I chose to create three files that supply the web app with much of this information and also direct it to the real-time data on the MLB server when requested. Two key MLB files drove this setup…
Master Scoreboard: There is one occurrence of this file for each day, and buried deep inside is the schedule of all games for that day. Not only does it contain the game information, it also has the directory where the files for each game can be found! This proved crucial to how I designed the back-end and web processes to be driven from this data. #thankyouMLB
Below is just a tiny, tiny piece of the Master Scoreboard file from June 20th, and I’ve highlighted an area referencing the Cubs vs Padres game. Notice the dictionary keyword “game_data_directory”: its value is a long directory name which I use to find the boxscore file for that specific game.
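For reference, here is a tiny hand-built fragment shaped like that entry. Only the “game_data_directory” key comes from the description above; the other field names, the directory value, and the server hostname are illustrative, from memory, not copied from the actual file:

```python
# Hypothetical fragment shaped like one game's entry in the Master Scoreboard.
# Only "game_data_directory" is the key discussed above; everything else here
# is illustrative.
game_entry = {
    "home_team_name": "Cubs",
    "away_team_name": "Padres",
    "game_data_directory":
        "/components/game/mlb/year_2017/month_06/day_20/"
        "gid_2017_06_20_sdnmlb_chnmlb_1",
}

# The boxscore for that specific game lives inside that directory.
boxscore_url = ("http://gd2.mlb.com"
                + game_entry["game_data_directory"]
                + "/boxscore.json")
print(boxscore_url)
```

That one value is all the app needs to jump straight from the day’s schedule to a specific game’s files.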
Boxscore: One of these files exists for each game played on any given day, in a directory specific to that game. If teams play a double header there will be separate directories and boxscores for each game. This is where I found player stats specific to that game as well as their season stats. The one caveat that really forced my hand to create a player master file was that the Boxscore dictionary only lists players who played in that game. So if I wanted to look for Tommy La Stella’s stats, and he did not play today, I’d have to search through boxscore dictionaries going back in time, day by day, until I found the last game he played in, so I could retrieve his correct season stats. Hence the need for a player master. #uuggghhh
Necessary Backend Processes
So based on the above, I wrote backend processes to create and update three files: a daily schedule file containing links to all of today’s games, and two master files (players and teams) which use the schedule file to find their way through the myriad of data. These files are built from the boxscore files in the background throughout the day and became the source data for much of the web app, since most requests are for team and season data. Only when a request is made to retrieve today’s stats do I need to access the boxscore on the MLB server again to pull those up-to-the-minute numbers.
It seems logical that a master file update process like this would only need to run once per day, but thanks to how the data is updated on the Gameday server, information is not available all at once. Game and lineup information trickles in and is not complete until about an hour before the game starts. Schedule information is there early in the day, but the directories might not have been created yet; even if they have, the boxscore dictionary for each game might not exist yet; and even if those are ready, the lineups might not have been entered. This really necessitated an ongoing update process throughout the day so I would have information available to display at all times. #thanksagainMLB
I created the master dictionaries using JSON. To the left is the entry for Madison Bumgarner of the San Francisco Giants. Every MLB player is assigned a unique “code” which tends to be their last name, unless multiple players have the same last name, in which case part of the first name is added to make it unique. From there you see some basic information at the next level, along with two nested dictionaries, one for hitting and one for pitching.
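To make that shape concrete, here is a minimal sketch of such an entry. The structure (the player’s code at the top, basic info below it, then nested hitting and pitching dictionaries) follows the description above, but the individual field names and values are my guesses, not the file’s actual ones:

```python
import json

# Sketch of one player master entry -- the top-level key is the player's
# unique code, with basic info and two nested stat dictionaries below it.
# Field names and values here are illustrative only.
player_master = {
    "bumgarner": {
        "name": "Madison Bumgarner",
        "team": "sf",
        "hitting":  {"avg": ".400", "hr": "2"},
        "pitching": {"era": "3.28", "so": "82"},
    }
}

# Persisted to disk as JSON text, just like the real master files.
text = json.dumps(player_master, indent=2)
print(text)
```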
The Web App
In using Flask I modeled the scripts on what I learned from the Mega Flask Tutorial. There’s a web server that waits for HTTP requests from an outside port and routes each request internally to a specific location in the main module. This was great, as I could easily see how the actions taken on the web page (click, submit, etc.) get directed to the corresponding function to handle the request, and, if necessary, redirect to another location within the module appropriate for the next part of the request. Below is a snapshot of the routing and code for the Home and Display By Team pages.
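The routing pattern itself looks roughly like this. This is a bare-bones sketch of the Flask idiom, not the site’s actual views (those render templates and read from the master files), and the route names are placeholders:

```python
from flask import Flask

app = Flask(__name__)

@app.route('/')
def home():
    # Landing page with the By Team and By Last Name buttons.
    return 'home page'

@app.route('/team/<team_code>')
def display_by_team(team_code):
    # Would look up team_code in the team master file and render
    # that team's players with their season stats.
    return 'stats for ' + team_code
```

Each @app.route decorator is what ties an incoming URL to the function that handles it, which is why it’s so easy to trace a click on the page to the code behind it.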
With web apps come APIs, and I had to write several of them for this site. Most of the APIs retrieve data from the master files described earlier, but some go against the real-time data available on the MLB server to get that all-important up-to-the-minute statistic. The architecture is interesting, as a similar but separate server gets started to handle the API requests coming from the web server.
A funny thing happened on the way to the API forum. I found that no matter which data type I returned from an API, I had to decode it first, converting the result from a byte string to a character string (Unicode). Then I could reload the string into a JSON dictionary to feed my forms… which is what the API passed back to the calling module in the first place. 😑
Everything is returned as a byte string data type. In the end it was no big deal to add two more lines of Python magic to decode the data and reload the dictionary, but imagine my frustration as every tutorial and blog I read failed to mention this crucial little piece of information! IPython was my saving grace for this and many other gotchas I had to work through.
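For anyone who hits the same wall, those two lines of magic look something like this (the payload here is made up for illustration):

```python
import json

# What comes back from the API call is a byte string, e.g.:
raw = b'{"player": "bumgarner", "hr": "2"}'

# Line 1: decode the byte string into a unicode character string.
decoded = raw.decode('utf-8')

# Line 2: reload that string into the JSON dictionary the API
# built in the first place.
stats = json.loads(decoded)
print(stats['hr'])  # prints 2
```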
Hey Web App Junction, What’s Your Function?
Simply put, the website is a mechanism to retrieve and display MLB player statistics through two predefined methods, each taking the user down a slightly different path to the desired data. Both paths lead to a list of players being displayed along with their stats for the season. If a player is selected at that point, the system retrieves and displays the up-to-the-minute statistics from today’s boxscore file.
By Team: Click on the By Team button and all 30 MLB teams will be shown in alphabetical order along with their record, today’s opponent, and the local time they play (both game times are shown if there’s a double header). Then click on a team and all players for that team will be displayed, grouped into Hitting and Pitching categories, and sorted by last name within each category showing the appropriate statistics.
By Last Name: Click on the By Last Name button and a box opens up to enter a portion of the player’s last name (a minimum of 1 letter is required). Click on Search Name and season statistics for any player whose last name matches the beginning pattern entered will be displayed, grouped into Hitting and Pitching categories, and sorted by last name within each category showing the appropriate statistics.
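Under the hood, that beginning-pattern match can be as simple as a prefix test against the player codes in the master file. This is a sketch with made-up data, not the site’s actual code (which also splits the results into hitting and pitching groups):

```python
def search_by_last_name(player_master, pattern):
    # Return every player whose code (essentially the last name)
    # starts with the pattern entered, sorted by last name.
    pattern = pattern.lower()
    return sorted(code for code in player_master
                  if code.startswith(pattern))

# Illustrative player codes only.
players = {"bumgarner": {}, "bryant": {}, "rizzo": {}, "russell": {}}
print(search_by_last_name(players, "b"))   # ['bryant', 'bumgarner']
print(search_by_last_name(players, "ru"))  # ['russell']
```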
Regardless of the method used to display season stats for a list of players, today’s stats for games completed or in progress are just one click away. I wrote some very cool HTML to highlight a player’s row and make it clickable to submit the request for today’s stats. I can get to the real-time data very quickly: the selected player has an associated team code, which points to that team’s entry in the Team master file, which holds the directories for today’s games, where I’ll find the boxscore file containing today’s statistics. A little up-front design work can go a very long way in developing efficient systems.
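That lookup chain can be sketched in a few lines. The field names and sample paths here are illustrative, not the actual master file layout:

```python
def todays_boxscores(player_code, player_master, team_master):
    # Player entry -> team code -> team master entry -> today's game
    # directories -> boxscore file in each one (two for a double header).
    team_code = player_master[player_code]["team"]
    directories = team_master[team_code]["game_directories"]
    return [d + "/boxscore.json" for d in directories]

# Illustrative data shaped like the chain described above.
players = {"bumgarner": {"team": "sf"}}
teams = {"sf": {"game_directories":
                ["/mlb/year_2017/month_06/day_20/gid_example"]}}
print(todays_boxscores("bumgarner", players, teams))
```

Because every hop is a direct dictionary lookup, no searching is needed to get from a clicked row to the real-time file.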
The Watch List: This is an epic function I hope to have completed by the big dance on July 13th. Through this feature you select players to be monitored, enter your phone number, and any time one of them hits a home run the system will text you. This process will make use of yet another MLB dictionary, the Game Events log, as it scans for home runs hit by your favorite players. Thank goodness we just got a $20 credit from our great sponsor Twilio, good for 2,500 free messages! Now that we have an extra week, I am hopeful I can put those 2,500 messages to great use!
Here is the link to the project folder in my GitHub account. Feel free to browse and reach out with any questions. Keep in mind the site is still under construction, but it’s pretty darn close to completion!
I Digress, Sort Of…
Monday night I sat in the left field bleachers at Wrigley Field watching the Cubs game with my daughter Ashley. Sunday was Father’s Day and we continued the celebration into Monday, which we dubbed Padres Day. There was a late-game comeback by our hometown heroes and some 9th-inning drama, and the Cubs held on to beat the Padres, which brought much joy to the Mudville crowd. But during the game, when not involved in conversation or engaged in crazy chants with the Bleacher Bums, the back of my mind was working out a problem amidst the lulls between innings.
Earlier in the day I had seen a problem posted in a new slack channel (#code-talk) that originated from an MIT course, and it seemed like a fun challenge. I couldn’t resist the temptation to figure it out, as I love to solve problems. Almost immediately I knew this could be easily solved using a powerful technique called recursion, and I made a few notes. My daughter glanced over at what I was scribbling on my phone’s notepad and asked, “What the heck is that?” To which I replied, “Well…it’s Python!”
Don’t ask me why, but my mind just does these things, and when it gets close to figuring something out it keeps churning through ideas in the background until I have a resolution. What does this have to do with my web app? Not a darn thing, but in the previous two blogs I compared the mainframe COBOL world from which this dinosaur came to the Python / object-oriented world through which I now travel, and I am going to do the same again here. This time, however, I am going to compare code and show you solutions I wrote in both Python and COBOL so you can see the differences.
Today’s Title Bout, Python vs COBOL!
The problem was this: using the string ‘azcbobobegghakl’ as input, write Python code to find the longest substring of characters in ascending alphabetical order, and ties count. The letters don’t have to be alphabetically consecutive, just in order, and if the same letter is repeated next to itself, that counts as being in order. Given the input string cited, the correct answer is beggh.
First, the epic Python solution…
Notice the use of the recursive functions and how they both call themselves to shorten the string currently being examined until it runs out of letters. The search_next_letter_in_s function looks at each letter in s (catchy name, eh?) one at a time and calls search_rest_of_s to look at the rest of s to find the longest string in alphabetical order from that point on.
One interesting thing to note about this solution is that no for or while loops were used to drive the process. It is a self-driving loop powered only by recursion. Also, no math was performed to manipulate indexes pointing to the current slice of the string being examined. In fact, the only operation that added anything was the concatenation of the next letter of s to the current_long string. You can change string s to anything, even a single letter, or an empty string, and it still works!
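In case the screenshot is hard to read, here is a small reconstruction of the same idea. The two function names are the ones described above, but the bodies are my own sketch of the approach, not a copy of my exact code:

```python
def search_rest_of_s(s, current_long):
    # Keep concatenating letters onto current_long as long as the next
    # letter of s stays in alphabetical order (a repeated letter counts).
    if s and s[0] >= current_long[-1]:
        return search_rest_of_s(s[1:], current_long + s[0])
    return current_long

def search_next_letter_in_s(s, best=''):
    # Look at each letter of s one at a time; for each one, let
    # search_rest_of_s find the in-order run starting there, and
    # keep the longest run seen so far (first one wins a tie).
    if not s:
        return best
    candidate = search_rest_of_s(s[1:], s[0])
    if len(candidate) > len(best):
        best = candidate
    return search_next_letter_in_s(s[1:], best)

print(search_next_letter_in_s('azcbobobegghakl'))  # prints beggh
```

No loops, no index math: each call just slices one letter off the front of s and recurses until the string is empty.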
And now the moment you’ve all been waiting for…the COBOL solution!
What you see to the left is just the preface to the actual logic. In COBOL, we must tell the program it’s running on IBM 370 architecture (the IBM System/370 mainframe was introduced in 1970 and has undergone vast improvements since). We also have to define every variable used in the program: its size, its data type, and any initial value. There’s a lot of work to be done before the real work begins, and this is just a tiny example!
So now the actual logic to solve the problem…
Here, loops are aplenty, whereas in Python I was able to code without them, not that there’s anything wrong with that. I could not use recursion… well, I could, but let me tell you just how much work that would entail.
First, I would have to write two other programs: one for the paragraph that deals with s and one for the rest of s. The programs would call themselves, much like the equivalent Python functions I wrote. Each program would need all of the boring stuff repeated that tells COBOL about the IBM 370 environment and other fun stuff it’s dying to know.
Next, I’d have to define what we call a comm area (communications area). This is where you define all fields being passed back and forth between two programs. To go along with this, I’d also have to code a Linkage Section, which is where the comm area definition resides (just before the Procedure Division). In accordance with these changes, the Perform statements you see would change to Call subroutine statements, which would have to specify the comm area being passed to the other program.
Putting the notion of COBOL recursion aside and just looking at the two solutions I created, which language would you choose to develop your own solution? Both work and produce the correct results. But just counting the lines of code, the COBOL logic alone is 50% longer than the Python solution, and then there’s all that boring stuff to be added.
Winner by technical knockout…Python!!
Be assured no dinosaurs were injured in the making of this comparison!
Which Way Is Up?
There is still work to be done to get this site running properly in the cloud and to finish some additional functionality, and I have three weeks left to get there. I am determined, though, and am positive you will see a robust web app serving up real-time MLB data your way very soon.
I have had a blast with this mentorship program and have learned much, and I am ever thankful for the ChiPy User Group, its members, and most of all my mentor, Allan. Thank you for guiding me down a great path from which I have learned a ton, and for putting up with the barrage of questions and a little stubbornness at times. #thankyouAllan!!
Where do I go from here? Who knows, but new doors will open, and they’re starting to crack already. A couple of small opportunities have been discussed at my office where I can get involved, apply some of the things I’ve learned, and gain real working experience with these newer technologies.
I don’t know where this new path will eventually lead this dinosaur, but the times are a-changing and things are certainly looking up!