2019 Project Preclude Part II

Hank Yun
4 min read · Jan 8, 2019

At 9 AM on a brisk fall morning in September, when I should have been enjoying my walk to work in the city, I found myself still seated on the New Jersey Transit bus to Port Authority, waiting to get through the Lincoln Tunnel. I grew so bored of looking out the window that I decided to use the time to work on my course project.

It was a fairly simple project. My instructor for the Machine Learning course at the New York City Data Science Academy, Reece Heineke, had helped me choose my dataset from Kaggle. The objective was to accurately predict the insurance costs of 1,338 patients using a simple linear regression model. He said all I had to do was remove a few variables, replace a couple of text variables with their corresponding numeric labels, and run a linear regression using scikit-learn. At the time, though, I still had difficulty figuring it out. Nothing made sense, and nothing I tried seemed to work. I remember feeling very frustrated, but I will never forget the experience of struggling through the problems, or the turning point when I was finally able to figure it out.

https://www.kaggle.com/mirichoi0218/insurance
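The three steps described above (drop a few variables, turn text variables into numeric labels, fit a linear regression) might look roughly like the sketch below. It is not the code from the project. The column names are assumed to match the Kaggle insurance dataset, the rows are invented purely so the example runs on its own, and the choice to drop `region` and hold out 25% of the rows is an illustrative assumption.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Invented rows for illustration only; these are NOT records from the
# Kaggle dataset. Column names are assumed to match that dataset.
SAMPLE = pd.DataFrame({
    "age": [23, 41, 35, 58, 29, 47, 52, 31, 64, 38, 45, 27],
    "sex": ["female", "male", "female", "male", "male", "female",
            "male", "female", "female", "male", "female", "male"],
    "bmi": [24.5, 31.2, 27.8, 29.4, 22.1, 33.6,
            26.9, 30.3, 28.7, 25.4, 32.0, 23.8],
    "children": [0, 2, 1, 0, 0, 3, 1, 2, 0, 1, 2, 0],
    "smoker": ["no", "yes", "no", "no", "yes", "no",
               "no", "no", "yes", "no", "yes", "no"],
    "region": ["northeast", "southeast", "southwest", "northwest"] * 3,
    "charges": [3200.0, 24500.0, 5100.0, 11800.0, 17900.0, 9400.0,
                10300.0, 5600.0, 29700.0, 6200.0, 26100.0, 3500.0],
})


def prepare(df, drop_cols=("region",)):
    """Drop unwanted columns and replace text labels with integers."""
    # Which columns to drop is an assumption; the post does not say.
    out = df.drop(columns=list(drop_cols))
    out["sex"] = out["sex"].map({"female": 0, "male": 1})
    out["smoker"] = out["smoker"].map({"no": 0, "yes": 1})
    return out


def fit_and_score(df, target="charges", seed=0):
    """Fit a linear regression and return it with its held-out score."""
    data = prepare(df)
    X = data.drop(columns=[target])
    y = data[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    # LinearRegression.score returns R^2 on the data it is given.
    return model, model.score(X_test, y_test)


model, score = fit_and_score(SAMPLE)
print(f"held-out score: {score:.3f}")
```

With the real dataset, `SAMPLE` would be replaced by something like `pd.read_csv(...)` pointed at the downloaded file. The score printed here says nothing about the real data, since the rows are made up.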

As I’m a recent graduate, my budget doesn’t extend to the best hardware on the market. A month before I started my project, I had bought a $200 Chromebook and a $200 Samsung phone. I had an older Windows laptop, but it wasn’t up to the tasks I needed a computer for. The machine learning models from class ran too slowly on it, and the laptop itself was too heavy to carry around the city. I wanted a cheap, lighter laptop that I could code on, so I chose a Chromebook. I didn’t look into it too much, thinking it would work like any other laptop. It turned out to be somewhat more complicated than I had anticipated.

If you’ve never worked with a Chromebook, here is the short background. It’s essentially just a Chrome browser, and you can’t install Anaconda on it, or more specifically Jupyter Notebook or RStudio. That means you can’t code on it the same way you would on a Windows or an Apple laptop. For a beginner, there are really only two ways to code on a Chromebook:

  1. Go into developer mode and hack your way into Chrome OS, ultimately installing Anaconda on the local machine.
  2. Use a cloud service.

I tried the first option, something I do not recommend to anyone, especially a beginner. It was simply not a viable option for me because I had to reinstall Jupyter Notebook every time developer mode was turned off, which happened every single time I turned off my laptop. I had more success with a cloud computing service. I chose Microsoft Azure Notebooks because I have Outlook as my primary e-mail and liked OneDrive as my cloud backup. I found it relatively easy to use once I got the hang of it.

I found that running Jupyter Notebook in just the Chrome browser was as fast as my old laptop, if not faster. However, it’s possible to run into trouble when you try to build your own datasets and want to run more ‘advanced’ machine learning libraries like Keras or TensorFlow, but more on that later. For scikit-learn and traditional machine learning techniques, Azure’s free tier was good enough for me, and it should be good enough for any beginner.

Now back to my experience while I was stuck on the bus in traffic. I connected my Chromebook to the internet through my phone’s mobile hotspot and went to work. The project was due in a couple of days, and I had been hacking away at it for weeks with no results. While I labored under the dim lights in the Lincoln Tunnel, though, everything I did seemed to click. I was in a state of “flow”. Completing the project itself didn’t take very long; I finished right before the bus reached Port Authority. My first reaction was a feeling of awe at my accomplishment: I was able to run my very first machine learning model from scratch on the bus!

My second reaction was a feeling of some skepticism. The project wasn’t supposed to take this long. It was supposed to be a short, easy project, and here I was, stressed that it wasn’t working until a couple of days before it was due. But I was able to finish, although the model was not very accurate: it scored 0.7507371027994937. Looking back, I should have been proud of myself at that moment, but I don’t blame myself. It’s just who I am; I want to be the best version of myself, which often means underplaying my accomplishments and focusing too much on my faults.
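The post does not say how that score was computed. If it came from scikit-learn’s `LinearRegression.score`, which is an assumption, then it is the coefficient of determination, R², and a value of about 0.75 would mean the model explains roughly three quarters of the variance in the target. A minimal sketch of that formula in plain Python, with made-up numbers:

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean of the targets.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up targets and predictions, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
perfect = [1.0, 2.0, 3.0, 4.0]
baseline = [2.5, 2.5, 2.5, 2.5]  # always predicts the mean

print(r2_score(actual, perfect))   # perfect predictions
print(r2_score(actual, baseline))  # no better than the mean
```

Perfect predictions give a score of 1, and a model that only ever predicts the mean gives 0, so a score near 0.75 sits well above the naive baseline.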

And I should be proud. A biology major who started learning Python just two months earlier, and whose highest math courses are Intro Statistics and Calculus 1, should not be able to run his first machine learning model on a bus with 0.78 Mbps download and 0.67 Mbps upload speeds, let alone on a Chromebook with 4 GB of RAM and a 16 GB hard drive that can’t even install Anaconda.

But I had done it, and I got confirmation at the end of the last class. Reece gave a slight nod and a smile as he looked at my project. Maybe he was surprised at how simple the project was, or just at seeing a completed project, as none of the other students had theirs done yet. He said I had done a good job, and I was ecstatic to hear it. Still, I knew I could complete a better project with a better score with the confidence I had now. So I signed up for the next course…

Deep Learning with Jon Krohn.

Hank Yun

Research Assistant at Weill Cornell Medicine, Deep Learning Hobbyist, Sneaker Enthusiast!