How to answer a coding question in an interview

Deena Blumenkrantz

Published in

Deena Does Data Science

4 min read · Feb 23, 2019

Question: “Using Pandas, how would you find text in a column?”

TLDR

After being asked a coding question at an interview, you should follow these steps:

Take a breath and note your assumptions
Ask questions
Keep it simple
Address mistakes
Go further

Introduction

During an interview at a genetic testing company, I was asked the question: “how would you find some specific text in a column of a dataframe using Pandas?” This blog is about how I should have interacted with my interviewer to find the right answer to his question.

Background

For the sake of this article, I’ll call the genetic company GenesRUs and the interviewer, who asked me the question, Harold.

My interview consisted of a 30-minute presentation on a published paper with about 15 minutes for questions. I then met with two members of the team for six 45-minute sessions.

GenesRUs seems to have a great company culture. My interviewers had a nice rapport with one another and described their roles with balanced enthusiasm. Furthermore, they told me stories of colleagues who had moved from benchwork and become bioinformaticians within the company.

Harold’s demeanor was representative of the company culture. He described how teams interacted with each other at GenesRUs and his questions were delivered in a way that showed he wanted to hear how I think. Nevertheless, my nervousness made me jump to an answer when I should have asked more questions.

My ideal response

Step 1. Take a breath and note assumptions

The assumptions that jumped into my head were:

The text would show up multiple times.
The original dataframe would be big! There would be too many rows for a human to view them all.
The original data would be complex! There would be too many columns for a human to view them all.

Step 2. Ask questions

Questions I should have asked:

Is the specific text exactly equal to entries in a dataframe field/cell or do cells contain the text along with other text?
What do you want the output to look like — should it be a smaller dataframe or a number of instances?

Step 3. Keep it simple

If the specific text is exactly equal to entries in the dataframe’s fields of interest and Harold wanted the output to be a smaller dataframe, then the == operator, which means ‘is equal to’, would work. Below is an example dataframe, expression, and output.

# General pattern: df[df['column_name'] == 'specific_text']
# Here df is an existing dataframe with a 'Month' column.
df[df['Month'] == 'June']

The above expression says: subset the dataframe “df” and return all rows where the entry in the column “Month” (of dataframe “df”) is equal to “June”.

Image 2. New dataframe that only contains rows where the month is June
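To make the exact-match expression above something you can run, here is a minimal sketch. The dataframe contents, the "Samples" column, and the variable names is_june and june_rows are assumptions made up for illustration; they are not the data from the interview.

```python
import pandas as pd

# Hypothetical example data; column names and values are assumptions.
df = pd.DataFrame({
    "Month": ["May", "June", "June", "July"],
    "Samples": [12, 7, 9, 4],
})

# Boolean mask: True for rows whose Month is exactly "June".
is_june = df["Month"] == "June"

# Indexing with the mask keeps only the rows where the mask is True.
june_rows = df[is_june]
print(june_rows)
```

Splitting the mask from the indexing step is only a readability choice; df[df["Month"] == "June"] does the same thing in one line.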

If the specific text is a string that could be found in a longer string in the dataframe’s fields of interest, then use the str.contains() method.

df[df['Month'].str.contains('June')]

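The above expression says: subset the dataframe “df” and return all rows where the entry in the column “Month” contains the string “June”. This would also lead to the dataframe shown in Image 2. Note that within Pandas, string methods are not called on a column directly; you reach them through the column’s .str accessor. The accessor does not turn anything into a string. It applies contains() to each entry in the column “Month” and asks whether that entry contains “June”. Two defaults are worth knowing. First, the pattern is treated as a regular expression. Second, depending on the Pandas version and the column’s dtype, a missing entry can produce a missing value instead of True or False, which can make the row selection raise an error; passing na=False avoids that.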
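A short sketch of the substring case follows, including the options I would mention to an interviewer. The dataframe "notes" and its values are hypothetical; case, regex, and na are existing parameters of str.contains().

```python
import pandas as pd

# Hypothetical data with longer strings, mixed case, and a missing value.
notes = pd.DataFrame({
    "Month": ["June 2018", "early june", "July 2018", None, "Mid-June"],
})

# Default behaviour: case-sensitive, pattern treated as a regular expression.
# na=False counts missing entries as "no match", so the mask is all booleans.
mask = notes["Month"].str.contains("June", na=False)

# Case-insensitive match on the literal text (no regular expression).
mask_any_case = notes["Month"].str.contains(
    "june", case=False, regex=False, na=False
)

print(notes[mask])
print(notes[mask_any_case])
```

Using regex=False is a reasonable habit when the search text comes from a user, because characters such as "." or "(" would otherwise be interpreted as regular expression syntax.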

Step 4. Address mistakes

During my interview, my first answer to the question of how I would “find text in a column” was that I would use the sort function. I said this because I thought there would be multiple entries with the value (too many for a human to digest) and I wanted to see if that was true before I found them all. The confused look on Harold’s face told me that this answer didn’t make sense to him. The simple question of “what do you want the output to be?” could have saved me the embarrassment of blurting out this answer. Furthermore, after some reflection, I now realize that counting the number of instances or graphing the data would be better outputs than a sorted list of the column entries.
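If the answer to “what do you want the output to be?” had been a count, something like the sketch below would have worked. The data and the names june_count and month_counts are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical example data (assumed for illustration).
df = pd.DataFrame({"Month": ["May", "June", "June", "July", "June"]})

# Count rows whose Month is exactly "June": True counts as 1 when summed.
june_count = int((df["Month"] == "June").sum())

# Frequency of every distinct value in the column, most common first.
month_counts = df["Month"].value_counts()

print(june_count)
print(month_counts)
```

The value_counts() result also answers my original worry about whether the text shows up many times, without sorting the column. If matplotlib is installed, month_counts.plot(kind="bar") is one way to turn the same counts into a graph.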

Step 5. Go further

To go through a full thought process on data munging I created a scenario and challenge that could be reflective of something that might come up at GenesRUs. To see it, check out my blog article “Data munging with pseudo genetic data”. (If you can’t find this blog, it is because I am in the process of making it!)

Please comment below with feedback — thanks in advance :)