Interview with the Data Scientist
It’s pitch-dark inside the office at QHacks (keep dreaming, right?) as I usher in my guest. Nevertheless, I sidle right up to the blinds and peer — through the cracks of light — down at the bustling metropolis below.
“I know what you are,” I begin, fully aware that this wasn’t the best way to begin. “You’re impossibly fast on the keyboard and an astute logician. Your skin; undertones so pallid, like you never go out into the sunlight. You’re a slave to the algorithm-”
“Say it out loud. I dare you to say it.”
“You,” I whirl around to face them, “are a Data Scientist!”
“And are you afraid?”
“No. Yes. I don’t know.”
“Then ask me the most basic question: what do we eat?”
“PEOPLE! SOYLENT GREEN IS PEOPLE!” My screams echo down the empty corridors until they dissipate into nothing.
Okay, maybe that was a dramatic misrepresentation of one of the sexiest jobs on the market right now. It’s everywhere, though. According to The Economist’s May 2017 issue, data is the new oil. Naturally, where there is honey, there are bees.
I went and interviewed some data scientists from RBC, one of QHacks’ partners at last year’s hackathon. RBC currently has fewer than 20 data scientists on board company-wide. However, over the next few years, they wish to expand this number to over 500. I was lucky enough to speak with some bona fide data wranglers. Here’s a top-down explanation of some common terms that get thrown around a lot during a Data Scientist’s day.
HUGE DISCLAIMER: I am not an expert on the technical bits of data science. Any concept explained here is meant to give an intuitive understanding and is extremely simplified. Unfortunately, I don’t have a “Margot-Robbie-in-a-bathtub” to explain it all, but I’ll try my best.
Machine Learning: Algorithms that detect patterns in data in order to make predictions/decisions for future data.
Deep Learning: A set of machine learning algorithms that learn by discovering layers of increasingly abstract representations of the data they’re trained on.
Neural Nets: Again, more algorithms. A computing system modelled on the architecture of a biological brain, with a highly complex network of neurons (nodes to you, data scientists).
Natural Language Processing: A sub-field of data science that concentrates on the interpretation of the nuances of human language.
Semantic Annotation: A branch of Natural Language Processing that lets computers attach context/background information to concepts extracted from text, in a form machines can process.
This semantic annotation topic is precisely what one of RBC’s data scientists, Mehrnaz, is working on. After achieving a master’s degree in computer science, her first “real job” was at a Toronto-based startup called Tiny Hearts. Tiny Hearts is a digital product atelier that has excelled in creating products like games, apps, keyboards and chatbots. So much so, it has recently been acquired by Shopify. Mehrnaz says she is still very interested in startups and entrepreneurship but appreciates the work environment that RBC provides. For example, RBC Research Team members hold a weekly reading circle in which they are intrinsically motivated to learn and explore new concepts together.
Another data scientist, Diane Renton, has her undergrad and master’s degrees in both pure math and physics. On top of this, she’s got two papers published in a journal and a conference publication. At RBC, Diane works with supervised learning, the most common type of machine learning. Supervised learning trains a model by providing it with labelled examples: inputs paired with the outputs the model is supposed to produce. Unsupervised learning, on the other hand, gives the model no labels at all; the patterns in the data are assumed to be caused by latent (hidden) variables. In other words, the machine learning model must work out for itself how all the inputs relate to each other. For a better understanding, here is a great example.
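To make the contrast a little more concrete, here is a tiny sketch in plain Python. It is my own toy illustration, not the example referred to above and not anything RBC uses: the numbers, the "small"/"large" labels and the function names are all made up. The supervised half learns one average per label from labelled examples (a nearest-centroid classifier); the unsupervised half gets the same numbers with no labels and has to discover the two groups itself (a two-cluster k-means).

```python
# Toy 1-D data: hypothetical amounts, invented purely for illustration.
amounts = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
labels = ["small", "small", "small", "large", "large", "large"]


def mean(values):
    return sum(values) / len(values)


def fit_supervised(xs, ys):
    """Supervised: learn one centroid per label from labelled examples."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {label: mean(members) for label, members in groups.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def fit_unsupervised(xs, iterations=10):
    """Unsupervised: two-cluster k-means, no labels given.

    Assumes xs contains at least two distinct values.
    """
    low, high = min(xs), max(xs)  # initial guesses for the two centres
    for _ in range(iterations):
        near_low = [x for x in xs if abs(x - low) <= abs(x - high)]
        near_high = [x for x in xs if abs(x - low) > abs(x - high)]
        low, high = mean(near_low), mean(near_high)
    return low, high


centroids = fit_supervised(amounts, labels)
print(predict(centroids, 2.0))    # prints "small"
print(fit_unsupervised(amounts))  # two cluster centres, found without labels
```

Notice that the unsupervised version ends up with roughly the same two centres as the supervised one, but it has no idea that one group is called "small" and the other "large". Naming the groups is exactly the information that labels provide.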
Luyu Wang, having recently graduated from UWaterloo with a master’s in both math and electrical engineering, is now living the dream. At RBC Research, he works with unsupervised learning and data visualisation while creating a software tool called “Kaleidoscope”. Essentially, Kaleidoscope is an interactive data visualisation tool that aims to “improve work efficiency for data scientists and quants at the bank”. Check out the video below:
Sure enough, data science is becoming a rising star at hackathons worldwide. Whether you are using its powerful data-mining, analysis and visualisation capabilities or prototyping data-driven software, the applications are endless. For all you out there thinking about where to place your efforts for QHacks 2018, I have one word for you: *EXTERMINATE!*