My First Quarter in the World of Data

Musings of a Newbie in a Corporate Data Science Team

Sukriti Paul
Mar 5, 2020
Source: iStock, referenced via MIT EECS

Switching careers comes with a pinch of uncertainty, exhilaration, and changes that we Homo sapiens can quickly adapt to. Embracing one such change wholeheartedly, I recently moved from being a Research Assistant at a premier research college in India to a role in the Enterprise Customer Data Science Team of a multinational financial services corporation. My journey has been enriching so far: there is scope for learning something new every day.

Through interactions with my team and senior leaders, I’ve learned important lessons which, I believe, could benefit every newbie pursuing a Data Analytics/Data Science role. So I’ve compiled a list of learnings; if you breathe data, this might be of use!

Note: This article does not contain any company-specific or confidential information. My views do not represent those of my workplace.

Brush up Your Tech Basics

As you progress into the role, you will have more and more complex use cases to solve for. Imagine your plight if you shoddily revise your basics, only to realise that you’re taking 3X the time to produce results after a year into the role!

Source: A sample data science tech stack taken from here.

Utilize the first few months to strengthen your technical skills. Some skills can be learned on the go; however, there are a few theoretical concepts that need to be revisited now and then. Usually, every company specifies certain technical skills required for the role, based on its Data Science Tech Stack. The following list is generic:

  1. SQL Knowledge is Sacrosanct: Remember, one needs to manage and work with data (possibly big data), irrespective of the application. Hence, it becomes imperative to revise your SQL basics. Fortunately, SQL is something that you can pick up rather quickly, irrespective of whether you come from a technical background.
  2. SQL Query Tools (or Querying Engines): If you’re working with Big Data, you’ll need query tools depending on what your company uses for pulling data. The tools include Apache Impala/Hive/Spark, Presto, what have you.
  3. Software Frameworks like MapReduce: Similar frameworks can help in processing large amounts of data.
  4. Implement those Learning Models: Familiarize yourself with Python libraries such as Keras, PyTorch, Theano, and TensorFlow (TF), and/or deep learning frameworks like Caffe.
  5. Working with Data: While pre-processing data or using it for a meaningful task, some corporates use Python/R. I also recommend getting accustomed to the shortcuts in MS Excel for drawing simple insights.
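To make the first point concrete, here is a minimal sketch of the kind of SQL basics worth revising (grouping, aggregating, and sorting), run through Python's built-in sqlite3 module. The `transactions` table, its columns, and its rows are hypothetical and invented purely for illustration; a real enterprise setup would more likely sit on one of the query engines listed above.

```python
import sqlite3

# Hypothetical table and rows, made up for this example only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (customer_id INTEGER, merchant TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, "books", 20.0), (1, "grocery", 35.5), (2, "books", 12.0), (3, "grocery", 8.5)],
)

# Transaction count and total spend per customer, highest spenders first.
query = """
    SELECT customer_id, COUNT(*) AS n_txn, SUM(amount) AS total_spend
    FROM transactions
    GROUP BY customer_id
    ORDER BY total_spend DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

conn.close()
```

The same GROUP BY pattern carries over to most SQL dialects, though function names and syntax details can differ between engines.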

Understand the Business (Consumer) Landscape

As a Data Scientist/Analyst, your findings will be used in business-related decision-making. To draw actionable insights from the data or study impact, you need to understand how things fit into the bigger picture! Figuring out solutions without any context can be likened to preparing a stand-up comedy script without knowing your target audience. You must connect the dots to envision what the entire picture will look like. Take a minute to comprehend the stakes involved if your team decides whether a product/feature should continue or not, based on data!

Embrace the Gargantuan Databases

One of the best pieces of advice I got was to spend a good number of weeks playing around with different corporate data sets. Defining your own problem statements brings a different thrill: that of discovering something new while drawing inferences. Explore the gazillion attributes and data sources, and find out which ones are important to your team. You grow with the databases: as new tables are updated, you keep discovering new trends and links! While designing a learning model, you should be able to provide a well-thought-out data set. Creating attributes requires a solid, meticulous understanding of the existing data!

What is Your Team’s Workflow?

As a Data Scientist/Analyst, it’s important to find out the synergy between different teams (i.e., how different teams work with yours). Understanding this pipeline is crucial in realising where you fit in, and how your team’s efforts are contributing to a larger cause. For example, let us consider that you’re working for an online E-commerce firm. The Marketing and Product teams may design campaigns spanning different durations. After launching these campaigns, data may be collected and analysed by the Data Science team, who further design probabilistic or deterministic models for certain business use cases. Once the learning models are designed and implemented, the findings may be discussed with the Marketing/Product teams. Subsequently, the finalized model may be forwarded to the Tech/Dev team for production or optimization. See how things make more sense when you have a defined purpose in the pipeline?

Be a Content Master

Design Credits: Dennis Salvatier

For starters, I’m quoting the “content master” phrase from a 1x1 session with a senior leader (phrase credits). Know your content inside out for a particular task. While analysing data in the initial stages, chalk out the flow of steps that you wish to carry out, on paper. Avoid repeating executions and looping back to previous steps due to flawed logic or ambiguities in the initial problem statement. Big Data tasks can take hours or even days; teeny changes to your code will not show up in the results in a jiffy!

Clarify the Data Flow at the Beginning

If you’re tasked with creating new features for a data set, discuss the requirements and database design flow with your manager: the immediate goal should be crystal clear. When it comes to feature engineering, one must consider that campaigns can run for months. If your algorithm was trained and is being tested on select attributes, you cannot simply decide to pop in an extra attribute on the control group, as and when! Modifying the data provided to a model is far easier, timewise, in academia.

Be Mindful of the Testing Process (Especially for Online Learning Tasks)

Just like the Dev team, Data Science teams also have extensive testing performed over prolonged periods. It is worthwhile to enquire about the different test cells, including the control. How are data points divided into these cells? How long does each cell run? What if a model is not performing up to the mark, midway? Which cells should the model be designed for? How do you avoid overfitting on the optimal test cell? In addition, gain a general idea about performance metrics and their values for the existing model(s). On getting the feel of ML for enterprise right after my experience in academia, I observed a considerable difference in what was deemed to be a well-performing algorithm: I experienced what I call a ‘reality versus expectation’ scenario with respect to metric values and thresholds. 😛
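On the question of how data points are divided into cells, one common approach (not necessarily what any particular team uses) is to hash a stable identifier so that the same customer always lands in the same cell. The sketch below assumes string customer IDs; the function name `assign_cell`, the cell names, and the salt are all hypothetical.

```python
import hashlib
from collections import Counter


def assign_cell(customer_id, cells=("control", "test_a", "test_b"), salt="campaign_2020"):
    """Deterministically map a customer ID to one test cell.

    Hashing (salt + ID) keeps the assignment stable across runs, and
    changing the salt reshuffles customers for a new campaign.
    """
    digest = hashlib.sha256(f"{salt}:{customer_id}".encode("utf-8")).hexdigest()
    return cells[int(digest, 16) % len(cells)]


# Assign a batch of made-up IDs and look at how the cells fill up.
ids = [f"cust_{i}" for i in range(3000)]
assigned = [assign_cell(c) for c in ids]
print(Counter(assigned))
```

Because the assignment depends only on the ID and the salt, it can be recomputed at any point in a long-running campaign without storing a lookup table.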

Learn How to Communicate Findings Effectively

Source: A Nature jobs blog article.

In my second review meeting, I presented a few results to my leaders. Little did I know that every word would shape how the team interpreted my results. Fortunately, my Team Lead corrected me when there was an ambiguity between what I had done and what I had verbally expressed. From then on, I realised the importance of clearly communicating the data flow to a large group of members. A good way to get the terms right is via mock discussions before the meeting, where you could explain the flow to your teammates and ask them if there’s a better way of organizing your content!

Count on Your Intuition (Not Fully Though)

Oftentimes, managers can view the data and assess if the numbers are what they expected, or if there are far too many outliers. Unless you’ve been a part of your corporate ecosystem for a good amount of time, it is challenging to develop such intuition. However, glaring results will play on your intuition early on. If you feel that the numbers are not right, run your logic through a more experienced team member. Let’s revisit the E-commerce firm example. If the number of customer transactions is far lower than the number of merchants, then this could indicate that you’ve gone wrong somewhere (or the product/feature has flopped, which is unlikely). Common sense goes a long way, sometimes.
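A common-sense check like the one in the E-commerce example can also be written down as a tiny guard that runs before results are shared. This is only a sketch: the function name `sanity_check` and the rules in it are assumptions made for illustration, and real thresholds would depend on the business context.

```python
def sanity_check(n_transactions, n_merchants):
    """Return a list of warnings for counts that look implausible."""
    warnings = []
    if n_transactions < 0 or n_merchants < 0:
        warnings.append("negative count: check the aggregation logic")
    if n_transactions < n_merchants:
        # Fewer transactions than merchants suggests a bad join or filter
        # (or, less likely, a product/feature that has flopped).
        warnings.append("fewer transactions than merchants: recheck joins and filters")
    return warnings


# Made-up numbers for illustration.
for message in sanity_check(n_transactions=120, n_merchants=4000):
    print("WARNING:", message)
```

A warning here is a prompt to walk through the logic with a more experienced teammate, not proof that the numbers are wrong.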

Note: This is my first attempt at writing a corporate job-related article. Do comment below, if you have any feedback! Thanks for reading the article. :)

Sukriti Paul

RA @ the Indian Institute of Science (ML/CV) || Founder @ The One in Asankhya Project || Google WTM Scholar || ACM-W Best Officer Awardee || GHCI Scholar.