Loopio Tech

The team behind the product, comprising Engineering, Tech, and Design

Why Being A Data Scientist In Tech Is Everything I Hoped It Would Be


Photo by Marius Masalar on Unsplash

I came to Loopio as a Data Scientist after 10 years as a Mechanical Engineer in the aerospace and automotive industries. It's an understatement to say this was a pretty big change for me. Just one year ago, I transitioned from:

  • A hardware company to a SaaS company
  • A customer-facing technical resource role to an individual contributor role
  • A travel-intensive role to working at a remote-first company
  • The United States to Canada

It has been a constant learning process across all aspects of work. In this blog post, I'll share the lessons I learned from this period of change (and why it was everything I hoped it would be).

Ability to Work Across the Entire Machine Learning Pipeline

The first project I worked on at Loopio had me working through the entire Machine Learning (ML) pipeline: data gathering > data cleaning > exploratory data analysis (EDA) > feature engineering > feature selection > model-building > model evaluation > model optimization (hyperparameter tuning). It was glorious! I had been told, during my Data Science bootcamp, that in a ‘real job’ we would focus only on one aspect of the pipeline and were warned that it might get repetitive and boring. Yet here I was, with the opportunity to work through the entire process. Being able to go through the entire flow in a work setting gave me a good look at the similarities and differences between working on my own project and working on a shippable company project.
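The flow above can be sketched end to end in a few lines. This is a minimal illustration using scikit-learn and a small synthetic dataset; the column names, model choice, and parameter grid are my own assumptions for the example, not the actual project's stack.

```python
# Hypothetical end-to-end sketch: gather -> clean -> engineer features ->
# build, evaluate, and tune a model. All names here are illustrative.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Data gathering: a stand-in for pulling records from a warehouse.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature_a": rng.normal(size=200),
    "feature_b": rng.normal(size=200),
    "label": rng.integers(0, 2, size=200),
})
df.loc[::25, "feature_a"] = np.nan  # dirty data, waiting to be cleaned

X_train, X_test, y_train, y_test = train_test_split(
    df[["feature_a", "feature_b"]], df["label"], random_state=0
)

# Cleaning, feature engineering, and model-building chained as one
# pipeline; GridSearchCV covers evaluation plus hyperparameter tuning.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])
search = GridSearchCV(pipe, {"model__C": [0.1, 1.0, 10.0]}, cv=3)
search.fit(X_train, y_train)
print(f"held-out accuracy: {search.score(X_test, y_test):.2f}")
```

Wrapping the stages in one `Pipeline` keeps the cleaning and scaling steps inside the cross-validation loop, which is what makes the evaluation honest.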

The biggest similarity is that data is dirty and the 80/20 rule holds: 80% of the time spent by a data scientist is on gathering, cleansing, and storing the data, while 20% of the time is spent on analyzing the data. All this uncertainty, however, was a fantastic opportunity to ask questions and learn the intricacies of the data. In order to trust the data source and to trust my analysis, I had to be able to run reality checks on the numbers and their distributions, to confirm that basic expectations were met. If not, I had to go back and resolve the mismatches.
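The reality checks described above can be as simple as a handful of assertions run before any analysis. Here is a sketch with hypothetical column names and thresholds; the point is encoding basic expectations so mismatches surface early.

```python
# Hypothetical sanity checks on a dataset before trusting the analysis.
# Column names and thresholds are made up for illustration.
import pandas as pd

df = pd.DataFrame({
    "response_time_days": [1.2, 0.5, 3.1, 2.0, 0.9],
    "word_count": [120, 85, 300, 210, 95],
})

# Basic expectations: no negative durations, no missing values,
# and a plausible range for word counts.
assert (df["response_time_days"] >= 0).all(), "negative durations found"
assert df.notna().all().all(), "unexpected missing values"
assert df["word_count"].between(1, 10_000).all(), "implausible word counts"

# Eyeball the distributions: a quick summary catches obvious mismatches
# between the numbers and what the data source is supposed to contain.
print(df.describe())
```

When a check fails, the fix is rarely in the code: it usually means going back to the data source to resolve the mismatch, exactly as described above.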

Learning to Focus on Actual Business Questions

Data Scientists are always counselled on the importance of approaching problems from the business question angle. First, figure out what the product problem is that needs answering, then dive into the data. This is sound advice. I recently made the mistake of trying to develop a solution first. That resulted in raised hackles and an entire meeting lost to debating the feasibility of my suggestion, because we did not yet have a collective understanding of the actual problem we were trying to solve. What I should have done instead was work with all stakeholders to drill down, agree on the product problems we wanted to solve, and prioritize them, before solutioning.

The Importance of Tailored Communication

Communication is important; there are many avenues for it, and it is crucial to tailor it to the audience. When I am stuck on a problem or need a second pair of eyes to evaluate my work, I need to communicate with my technical teammates about what the problem is, what I've tried, why I did it, and what seems wrong, so that they can help me out. Jargon is OK. At company-wide technical review meetings such as Sprint Reviews, being able to present the what, why, how, and wow concisely facilitates efficient information transfer to other engineers and also folks from other divisions. When I'm presenting to the leadership team, technical details don't matter at all. What they really care about are the "so-what's": the advantages, drawbacks, amount of effort, cost savings, resulting efficiencies, pain points solved, and other business-related issues.

Exposure to the Tradeoffs of Embedded vs Centralized Data Science Team Modes

Over the last year, I’ve already had the opportunity to work in two different team modes, centralized and decentralized, both with their own advantages and disadvantages.

In the centralized mode, we had an ML Product Manager who surveyed our users’ requests, evaluated platform challenges, and looked for potential user workflow efficiencies. These were funnelled into potential ML opportunities. As a Data Scientist, I was downstream of this process. I worked on the proof of concept and passed this along to the Machine Learning Engineers who productionized the model so that it could be used for beta testing and ultimately released to general availability (GA). In this working model, I enjoyed the fact that the problems were prioritized and predefined when they got to me, and I could focus on solutioning. However, I did not enjoy being a step removed from the platform.

In the decentralized mode, I am embedded within a specific platform team and am closer to our users and their problem space. The goal is to work closely with the platform Product Manager to uncover opportunities where ML can provide additional value by automating repetitive tasks and processes. The advantage of this model is that by being closer to the problem space, there is opportunity to influence product release prioritization earlier in the process, and to better understand all the intricacies and uncertainties of the platform.

Learning Brand New Meanings for Regular Words

I don’t think I will ever forget the time a Director asked me to “t-shirt size the effort”. It was after a hackathon presentation where I had outlined some of the challenges we have with our data sources. I was dumbfounded. I was pretty sure that he wasn’t asking me about my actual t-shirt size, but I could not figure out what the question was really about. It took me quite a few seconds to finally realize that he was asking how much effort it would take to address the data problem. Thankfully my Product Manager gracefully jumped to my rescue.

Here are more examples of my now-expanded vocabulary:

  • “We can decide on this async” — let’s discuss this on a chat (Slack, in our case) where folks can provide input in their own time instead of all deciding right now
  • “Let’s take this discussion offline” — let’s discuss this outside of this meeting (in a smaller group meeting or async ;))
  • “There are no blockers to my work today” — there is nothing preventing my work from advancing today
  • “Just to double-click on this” — to drill down into this one particular thing
  • “Good thing we dogfood our own product” — we are our own customers and use our own product and this helps us uncover gaps and potential improvements
  • “I’m happy to give you back 10 mins of your time” — when a meeting ends early and you miraculously get 10 mins to go grab a snack
  • “We want to surface this to customers” — this is the piece of information that we want to display to customers
  • “This is our security posture” — these are the controls and processes we have in place to protect our software platform from cyber attacks

I am now happy to give folks back their time so they can remove blockers, carry on conversations async and offline as they relate to dogfooding specific platform features and t-shirt-sizing efforts, and especially the items we decided to double-click on and surface to customers without compromising our security posture.

Kickstarting My Next Career

I had read somewhere that on average, there is enough time to go through 4–5 careers in a lifetime. Based on my experience so far, I am very happy to have made this leap into a completely different line of work. If you’re interested in making a career transition and want to chat, I’m happy to connect!

Check out the multiple opportunities currently available across Loopio’s Engineering, Product, and Design teams.

Published in Loopio Tech

Written by Aida Rahim
Data scientist with a background in mechanical and biological engineering.