Approach to Answering AI/Data Science Interview Coding Questions

Most AI/Data Science companies are hiring Data Scientists who can code as data engineers.

Vimarsh Karbhari
5 min readOct 23, 2018

Looking at the trend in the last few years around hiring Data Scientists, it is increasingly becoming evident that Data Scientists not only analyze data and build models but they also program to deploy the algorithms at scale. Data Engineering is an important aspect of a Data Scientist’s day to day life. This article aims to provide an approach to answer coding questions asked during a data science interview.

The interviewer provides a problem and wants to see how you get to the solution. Problem solving ability is also at test here. The approach which the interviewee takes to find a solution to the problem is as important as the solution itself. Hence, directly jumping to the finest solution may not be the best approach. Let us understand the approach to answer coding questions by going through a step by step solution. The below question has been asked in many Data Science Coding interviews before.

Given an array of integers, return indices of the two numbers such that they add up to a specific target.

You may assume that each input would have exactly one solution, and you may not use the same element twice.

Example: Given nums = [2, 7, 11, 15], target = 18,

nums[1] + nums[2] = 7+ 11= 18, return [1, 2].

A good way to answer a coding question is to start with a Brute force solution which gives an unoptimized performance. During the course of the interview, the interviewer expects the interviewee to optimize the solution by thinking of other solutions.

Let us take the same approach towards finding a solution.

Brute force approach:

Brute force approach would be to have two For loops. The first For loop, we loop from first element to the last element in the array while the second For loop we are looping from the second element to the last one in the array. Then we just try to find the elements by subtracting them from the target.

public int[] twoSumBruteForce(int[] numbers, int target) {
for (int i = 0; i < numbers.length; i++) {
for (int j = i + 1; j < numbers.length; j++) {
if (numbers[j] == target — numbers[i]) {
return new int[] { i, j };
}
}
}
}

Complexity Analysis

  • Time complexity : O(n²). For each element, we try to find its complement by looping through the rest of array which takes O(n) time. Therefore, the time complexity is O(n²).
  • Space complexity : O(1).

Optimization:

Our time complexity needs improvement. The time complexity is poor in this case as we go through the loop twice. At this point we should be thinking of data structures which will reduce the lookup of the elements. Let us consider using a Hash table as the time to lookup an element in the Hash table O(1) for the most part. If there was a collision the lookup time would go to O(n) but the lookup to the hash table should be amortized to O(1) time as long as the hash function was chosen carefully.

One way to use the hash table is to store the elements into the hashmap indexed by their array index. Then using a For loop, we can lookup if each element’s complement (target -element value) exists. This would be O(n) as we are going through the elements twice, once while adding them to the table and second while we get the complement. This will require two passes. Lets take one step further to optimize having just one pass. We can iterate and insert elements into the table and at the same time, also look back to check if current element’s complement already exists in the table. If it exists we return the value.

public int[] twoSum(int[] numbers, int target) {
Map<Integer, Integer> map = new HashMap<>();
for (int i = 0; i < numbers.length; i++) {
int complement = target — numbers[i];
if (map.containsKey(complement)) {
return new int[] { map.get(complement), i };
}
map.put(numbers[i], i);
}
}

Complexity Analysis:

  • Time complexity : O(n). We traverse the list containing n elements only once. Each look up in the table costs only O(1) time.
  • Space complexity : O(n). The extra space required depends on the number of items stored in the hash table, which stores at most n elements.

From this example, we see that we have traded time complexity from O(n²) to O(n) to increase space complexity from O(1) to O(n). For any algorithm which performs on data, such trade offs have to be accounted for. In the Data Science world it is completely up to the person building the model to optimize the GPUs vs the time. GPUs have a cost value associated to them. So it is important to have a time-cost ratio which is optimal. In most situations, time is always optimized but there are scenarios where we cannot have infinite GPUs or adding more GPUs will not drastically reduce the time. A prudent data scientist will always see what is available and balance this trade off between time and cost.

During reader engagement, there was feedback around how to approach a coding interview in general. Hopefully this article equips the readers with a better approach.

Acing AI, the aim is to help you to get into Data Science and AI. The goal of the new Top AI Interview Questions Series is to provide top interview questions, answers and approach in AI/Data Science Interview. So far, Part 1, Part 2 and Part 3 have been very helpful to readers. I have profiled some of the best technology companies and written articles about AI interviews at Microsoft, Google, Amazon, Netflix, LinkedIn, Ebay, Twitter, Walmart, Apple, Facebook, Zillow, Salesforce, Uber, Intel, Adobe, Tesla and most recently IBM. This has led me to become the top writer in Artificial Intelligence on Medium. The AI interview preparation guides Part 1, Part 2 go over the details which help you prepare for any AI interview. Acing AI Portfolios helps you to showcase your AI work. Expert interviews and analyses gives you a sneak peak into the lives of AI/Data Science Leaders and analyses of AI tech companies.

Subscribe to our Acing AI newsletter, I promise not to spam and its FREE!

Thanks for reading! 😊 If you enjoyed it, test how many times can you hit 👏 in 5 seconds. It’s great cardio for your fingers AND will help other people see the story.

The sole motivation of this blog article is to provide answers to some Data Science Interview Questions. I aim to make this a living document, so any updates and suggested changes can always be included. Please provide relevant feedback.

--

--