[DSI] Week 5: NLP, Image Featurization, and Web Scraping Oh My

[insert excitement here]

If you don’t or didn’t know me, this was… EASILY the most anticipated part of the course for me.

To provide a little bit of back story, I started my journey into coding a little over 2 years ago by diving into a book about R. As with any first-time coder, my naivete and inner wide-eyed child took over. I didn’t understand much of the entry-level code, but the possibilities entranced me. Probably the first passage that really got me excited about coding involved image featurization. The idea that black and white images were stored as matrices of integers representing intensities.. blew.. my.. mind.

Each number represents an intensity for a pixel in the image

Having finished up this week, I understand that you can flatten these matrices into vectors and compare them with other image vectors to measure similarity. I understand that color images are just three such matrices stacked together, one each for red, green, and blue intensities, but I still can’t help but nerd out on the idea that you can accomplish such things.
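To make the flatten-and-compare idea concrete, here is a minimal sketch in plain Python. The three tiny 3x3 "images" are made up for illustration, and cosine similarity is just one reasonable choice of similarity measure that I'm assuming here; it isn't necessarily the one used in the course.

```python
import math


def flatten(image):
    """Flatten a 2-D grid (a list of rows) into one flat list of pixel values."""
    return [pixel for row in image for pixel in row]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-black image has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical grayscale images: 0 is black, 255 is white.
bright_cross = [[0, 255, 0], [255, 255, 255], [0, 255, 0]]
dim_cross = [[0, 128, 0], [128, 128, 128], [0, 128, 0]]
diagonal = [[255, 0, 0], [0, 255, 0], [0, 0, 255]]

same_shape = cosine_similarity(flatten(bright_cross), flatten(dim_cross))
different_shape = cosine_similarity(flatten(bright_cross), flatten(diagonal))

print(same_shape)
print(different_shape)
```

The two crosses score essentially 1.0 because they have the same pattern at different brightness, while the cross and the diagonal only overlap at the center pixel and score much lower. A color image would work the same way, just with three times as many numbers in the flattened vector.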

Having built up the week’s subject matter so much to myself going in, I was not let down. The topics that built up all of the glitz and glamour around “data scientists” were wrapped up in one week: neural nets (deep learning), web scraping, image featurization, and natural language processing.

For me, these topics defined the difference between data science and traditional data analytics. Companies have been building models for many years. Data science gives you the tools to quantify likeness in more abstract mediums. We’re capable of determining that a YouTube video is a cat video using the image features it shares with cat photos. We’re capable of quantifying the positivity or negativity of your speech as a single score. And finally, we’re capable of acquiring this data from nearly anywhere with web scraping.
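As a rough illustration of boiling the positivity or negativity of text down to a single score, here is a toy sketch in plain Python. The word lists and the scoring formula are my own assumptions for the example; real sentiment tools use far larger lexicons or trained models.

```python
import re

# Tiny, made-up word lists, purely for illustration.
POSITIVE_WORDS = {"love", "great", "amazing", "good", "happy"}
NEGATIVE_WORDS = {"hate", "awful", "terrible", "bad", "sad"}


def sentiment_score(text):
    """Return a score from -1.0 (all negative) to 1.0 (all positive).

    Only words found in the two lists count; text with no such
    words scores 0.0.
    """
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    matched = positives + negatives
    if matched == 0:
        return 0.0
    return (positives - negatives) / matched


print(sentiment_score("I love this amazing course"))
print(sentiment_score("This was a terrible, awful week"))
print(sentiment_score("Some good parts and some bad parts"))
```

The first sentence scores 1.0, the second -1.0, and the mixed one lands at 0.0 because its one positive word and one negative word cancel out.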