CS373 Fall 2020: Week 6
What Did I Do This Week?
This week, my team got straight to work on Phase 2. With much of our front end done in Phase 1, we moved on to the harder part: data collection. With a couple of team meetings and sub-team meetings out of the way, the rest of the week is dedicated to collecting data on our Texas politicians.
What’s In My Way?
Right now, data collection is a lot of work. Our three RESTful sources cover only part of what we need; much of our data has to be scraped from CSVs, Excel spreadsheets, PDFs, and HTML pages. Sourcing everything will be a team effort and will definitely take a lot of time.
What Will I Do Next Week?
In the upcoming days, I’m collecting the harder parts of the politician model. While one of my teammates pulls elected official information from the Google Civic Information API, my job is to gather campaign finance data from OpenSecrets and fill in missing challenger information from HTML pages on Ballotpedia. I’ll do this with Python’s requests library, plus the csv and html.parser modules from the standard library. We’ll store the results in another CSV using pandas.
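To give a rough idea of what the HTML part could look like, here is a minimal sketch that uses only html.parser and csv. Everything in it is an assumption made for illustration: the class name TableRowParser, the helper html_table_to_csv, and the sample table are made up, and real Ballotpedia pages are structured differently and would need their own handling. In practice the HTML string would come from a requests call, and the rows could just as easily be handed to pandas instead of the csv writer.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row.

    Hypothetical helper for illustration; not tied to any real site's markup.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by malformed HTML


def html_table_to_csv(html_text):
    """Return the table rows found in html_text as CSV text."""
    parser = TableRowParser()
    parser.feed(html_text)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Made-up sample input standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>Candidate</th><th>Party</th></tr>
  <tr><td>Jane Doe</td><td>Independent</td></tr>
</table>
"""

csv_text = html_table_to_csv(SAMPLE_HTML)
print(csv_text)
```

Keeping the parsing separate from the downloading means the parser can be tested on saved HTML files without hitting the site on every run.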
Thoughts on “Why Is Silicon Valley So Awful to Women?”
The stories in the article are quite depressing, and it’s pretty awful to think about how rampant this still is within the industry. To fight it, simply not being sexist isn’t enough; we must be actively anti-sexist, calling out messed-up power structures and creating an inclusive environment for all genders.
What is it Like Working in a Group?
I’ve had bad experiences with groups in the past, but so far my group is pretty on point. One of our members is pretty experienced with AWS and has become the de facto pipeline guy and Git master. The atmosphere has been super chill, and we have really smart people on all ends. Overall, this has been my favorite group project so far.
Experience with Iterators, Reduce, and Tuple
Experience with Team Contract and Peer Review
The team contract and peer reviews were easy to do. Since our group cooperates well, crafting the team contract was straightforward, and giving my teammates glowing reviews was a no-brainer.
What Made Me Happy This Week
This week was relatively busy, but seeing our site deployed did put a smile on my face.
Tip Of The Week
If any other team has to scrape through a table within a PDF, the easiest way to do so is to convert the table inside the PDF into a CSV. Luckily, an open source project exists for this! Tabula was made to scrape table data in PDFs for news publications like ProPublica and the New York Times, and it’s maintained by an amazing developer community. All you have to do is run the local server, click a few buttons, and export your new CSV to your computer! Very simple.
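Once the CSV is exported, a little cleanup might still be needed before it is usable. Here is a minimal sketch, assuming the export contains stray whitespace around cells and the occasional empty row; that is an assumption about what the output could look like, not a description of what Tabula always produces. The function name clean_rows and the sample data are made up for illustration.

```python
import csv
import io


def clean_rows(csv_file):
    """Strip whitespace from every cell and drop rows that are entirely empty."""
    cleaned = []
    for row in csv.reader(csv_file):
        cells = [cell.strip() for cell in row]
        if any(cells):  # skip blank lines and rows of empty cells
            cleaned.append(cells)
    return cleaned


# Made-up sample standing in for an exported file; with a real export,
# pass open("export.csv", newline="") instead (the file name is hypothetical).
sample = io.StringIO("Name , Amount\n\n Jane Doe ,100\n , \n")
rows = clean_rows(sample)
print(rows)
```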