Using PyPDF For PDF → CSV Conversion To Find Missing Groceries

Convert PDFs to CSVs in an unusual but practical piece of real-life data engineering problem-solving.

Zach Quinn
Pipeline: Your Data Engineering Resource



There’s an epidemic plaguing my neighborhood. Thankfully, it’s not biological. And because my neighborhood is an apartment complex, it’s rarely a target for “porch pirate” package thieves.

Our current issue: grocery deliveries that go missing or arrive at the wrong address.

Though the errors span the delivery-service spectrum, from Kroger to Instacart, my own experience is only with Walmart Plus (it’s included in my cell phone plan). After several… incidents, I’ve become vigilant (or paranoid) about the accuracy of my household’s deliveries. When I shopped for groceries in person, I could check that I had purchased, and made it home with, everything on my list.

Since I don’t want to tap through the Walmart app or scroll the website to check every order, and because I’m a nerd, I’ve been looking for a way to automate grocery tracking.

Walmart offers an API, but only to vendors and e-commerce partners. Given the multi-factor authentication (MFA) in the current sign-in flow, and the discussions on the web scraping subreddit, I’m not going to attempt a web scraper that depends on an ever-changing proxy.
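As a minimal sketch of the PDF → CSV idea in the title (not necessarily the exact approach used here), the snippet below assumes each order has been saved as a text-based PDF receipt and that item lines follow a hypothetical `<item name>  Qty <n>  $<price>` layout; Walmart’s real receipt format may differ, so the regex is an assumption to adjust. It uses pypdf’s `PdfReader` and `page.extract_text()` to pull the text, keeps only lines that match the assumed layout, writes them to CSV, and compares the ordered items against a hand-entered set of items that actually arrived. The helper names (`parse_items`, `find_missing`, etc.) are illustrative, not from any library.

```python
import csv
import io
import re

# ASSUMPTION: hypothetical receipt line layout, e.g.
#   "Great Value Whole Milk 1 gal  Qty 2  $6.48"
ITEM_LINE = re.compile(r"^(?P<item>.+?)\s+Qty\s+(?P<qty>\d+)\s+\$(?P<price>\d+\.\d{2})$")


def pdf_to_text(path: str) -> str:
    """Extract raw text from every page of a text-based PDF with pypdf."""
    from pypdf import PdfReader  # imported lazily so the parsing helpers run without pypdf

    reader = PdfReader(path)
    # extract_text() may return an empty string for image-only pages
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def parse_items(text: str) -> list[dict]:
    """Keep only lines that match the assumed item layout; skip headers and totals."""
    rows = []
    for line in text.splitlines():
        match = ITEM_LINE.match(line.strip())
        if match:
            rows.append({
                "item": match["item"],
                "qty": int(match["qty"]),
                "price": float(match["price"]),
            })
    return rows


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize parsed rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["item", "qty", "price"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def pdf_to_csv(pdf_path: str, csv_path: str) -> list[dict]:
    """Hypothetical end-to-end helper: order PDF in, CSV file out."""
    rows = parse_items(pdf_to_text(pdf_path))
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        f.write(rows_to_csv(rows))
    return rows


def find_missing(ordered: list[dict], received: set[str]) -> list[str]:
    """Return ordered item names that are not in the set of items that arrived."""
    return [row["item"] for row in ordered if row["item"] not in received]


if __name__ == "__main__":
    # Made-up sample text standing in for pdf_to_text("order.pdf")
    sample = (
        "Order #12345\n"
        "Great Value Whole Milk 1 gal  Qty 2  $6.48\n"
        "Bananas  Qty 6  $1.62\n"
        "Subtotal $8.10"
    )
    rows = parse_items(sample)
    print(rows_to_csv(rows))
    print(find_missing(rows, {"Bananas"}))  # → ['Great Value Whole Milk 1 gal']
```

Keeping extraction (`pdf_to_text`) separate from parsing (`parse_items`) means the fragile part, the layout-specific regex, can be tested against plain strings without a PDF on hand, and swapped out if the receipt format turns out to be different.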


Journalist → Sr. Data Engineer; helping you target, land and excel in data-driven roles.