Locked In and Busy: Week 1

Aaron Chen
Mar 19, 2020


Photo by Sebastian Kurpiel on Unsplash

Hey folks!

Last week, I wrote a quick article on the things that are helping me learn new data science and data engineering skills (you can find that here), and I wanted to give a quick update on how I've been doing!

Distributed Computing Cluster

TL;DR: This did actually turn out to be a cluster f. But I have alternate options to explore!

So I had Allison’s and Karen’s writings in mind as I was doing the actual physical labor of spinning up a server.

Basically, I was trying to take a computer from over 10 years ago, get it up and running, and put Linux, Spark, and Hadoop on it to use as a local cluster. The goal was to take some of the workload off my laptop while also getting practice with data engineering tasks.

I had been using said computer as a 1080p gaming/home theater PC up until maybe 2015, when I just didn’t need it anymore and unplugged it. So, I had to literally dust it off to try to get things up and going. After eradicating the dust bunnies, checking all the connections, redoing cable management, cleaning and changing the liquid cooling supply (which had mold in it!), I was rewarded with a computer that wouldn’t display a video out.

I removed one of the video cards (it was running two in a CrossFire configuration) and tried again. Nope. Different cable? Still no signal. Different monitor? Nope.

After a few days, I ended up finding out a few things:

  1. Said old computer’s machine spirit appears to no longer want to work with me
  2. Same can be said of an even older VGA monitor I had that was working last time I checked in like…2008, haha
  3. Getting older hardware to run (or even just maintaining it) without manuals or documentation can be time consuming and tedious

As such, I have decided to give up on fixing the computer and have set it aside to be handled as e-waste once the coronavirus quarantine ends. Normally, I'd try to sell the parts at a swap meet or something, but those events are definitely canceled, and eBay listings for my video card show that it's worth anywhere from $50 (yay!) down to $3… with the sold listings seemingly weighted heavily towards $3 (boo).

I would really like to get maybe two Raspberry Pi 4s and rig them together like the 8-blade unit I referenced before, but I don't consider that an absolutely necessary purchase right now. Orders are going crazy, and delivery staff should be focused on delivering necessary supplies instead of toys.

I’ll figure something out. I still have my loaner computer from Flatiron School, so maybe I can configure that to be a Spark server? I’m not sure since it had other stuff installed by Flatiron and WeWork.

Progress on Other Projects

As I mentioned before, I decided to help out on a friend's project! It does character recognition, but for another language, so it's interesting for me to see. One slight negative, though, is that the documentation needs some work. When I forked and cloned the repo to my computer, I didn't even know which Python version I needed, let alone which packages were being used. I had to open the scripts individually in Visual Studio Code to figure that out. While I was working out how the logic worked and where those imports came from, I realized it wasn't even clear whether the code targeted Python 3 or Python 2.

I've been trying to resolve dependency problems (which, for some reason, take a while to resolve on my computer running Ubuntu in WSL2) and marking up the existing scripts with comments and explanations. I'll likely write a clearer README for the repo soon and create a complete list of required packages.
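For the "which packages does this even need?" problem, a rough script could stand in for opening every file by hand. This is a minimal sketch under my own assumptions, not something the project actually uses: it relies only on Python's standard `ast` module to collect the top-level module names each script imports, filters out the standard library with `sys.stdlib_module_names` (which needs Python 3.10+), and lists any files that fail to parse under Python 3 as a hint that they might be Python 2 code. The helper names `scan_imports` and `third_party_candidates` are hypothetical.

```python
import ast
import sys
from pathlib import Path


# Hypothetical helper, not part of the friend's repo.
def scan_imports(repo_root):
    """Collect top-level module names imported by .py files under repo_root.

    Files that don't parse under the running Python 3 interpreter are
    returned separately. A SyntaxError is only a hint (e.g. a Python 2
    print statement), not proof that a file targets Python 2.
    """
    modules = set()
    unparseable = []
    for path in sorted(Path(repo_root).rglob("*.py")):
        source = path.read_text(encoding="utf-8", errors="replace")
        try:
            tree = ast.parse(source, filename=str(path))
        except SyntaxError:
            unparseable.append(str(path))
            continue
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                # "import numpy.linalg as la" -> "numpy"
                for alias in node.names:
                    modules.add(alias.name.split(".")[0])
            elif isinstance(node, ast.ImportFrom):
                # Skip relative imports like "from . import utils" (level > 0);
                # those point at the repo's own modules.
                if node.level == 0 and node.module:
                    modules.add(node.module.split(".")[0])
    return modules, unparseable


# Hypothetical helper: drop standard-library names so mostly third-party ones remain.
def third_party_candidates(modules):
    # sys.stdlib_module_names exists on Python 3.10+; older versions skip filtering.
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return sorted(m for m in modules if m not in stdlib)


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    mods, bad = scan_imports(root)
    print("Possible third-party packages:")
    for name in third_party_candidates(mods):
        print("  ", name)
    if bad:
        print("Files that did not parse as Python 3 (maybe Python 2?):")
        for p in bad:
            print("  ", p)
```

Saved as, say, `scan_imports.py`, it would run as `python scan_imports.py path/to/repo`. The output is only a starting point for a requirements list: import names don't always match pip package names (for example, `cv2` is installed as `opencv-python`), and the repo's own modules will also show up if the scripts import them by absolute name.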

Since I didn't get the Spark server up and running, I didn't try to download other datasets to play with. My current laptop could probably hold a (small) big dataset… but I'd prefer not to keep loading this thing up with more and more. It slows down my computer and isn't really representative of how the work would be done in an industry job.

Containerization

I was following along with Aaron's walkthrough on containerization, but it turns out that this is a little more complicated than I expected for my use case. His instructions used macOS, and I'm on Windows 10 Home with WSL2 (Insider Slow Ring). I think I'll actually write a very brief article on what I did to get things working.

Outside of Data Science

My area recently started a shelter-in-place order, and things have changed quite a bit. Non-essential trips outside can technically get you arrested or fined, although walks and errand runs are acceptable. Most public places have closed: the restaurant a block away from me won't be open for ~3 weeks, my local Pokemon Go community has locked down, and my gym has closed its doors until April.

This is a big change to everyday life. While I don't frequent the restaurant, I do realize that tons of small shops and food spots will be closed or going take-out only. Pokemon Go was a nice way to see people I wouldn't otherwise see, and that's on pause. Most importantly: it's been almost 10 years since I last went without a gym, not counting the recovery time from my appendectomy and broken foot. I have to be more careful now about eating right, doing bodyweight exercises, and even hitting my step goals! By the way, I'm on Fitbit and Garmin, so if you're on either or both of those networks, lemme know!

I’m going to do my best to stay positive, healthy, and diligent in my efforts! Hopefully, this whole mess will end sooner rather than later, and I’m definitely hoping to be working in a new spot as a data scientist =D
