Q#32: Fraudulent Retail Accounts

Below is a daily table for an active acount at Shopify (an online ecommerce, retail platform). The table is called store_account and the columns are:

Here’s some additional information about the table:

  • The granularity of the table is store_id and day
  • Assume “closed” and “fraud” are permanent labels
  • Active = daily revenue > 0
  • Accounts get labeled by Shopify as fraudulent and they no longer can sell product
  • Every day of the table has every store_id that has ever used Shopify

Given the above, write code using Python (Pandas library) to show what percent of active stores were fraudulent by day.
-Credit to:
team@interviewqs.com

TRY IT YOURSELF

ANSWER

This question tests our ability to use the Pandas library in Python. Pandas is a library that allows us to organize and interact with data and tables in a dataframe format, which is quite similar to a table with column and row indices.

To answer this question, let’s break it down into chunks. First lets get a count of the total number of active stores. To do this we can use indexers [] to extract the rows by some conditional statement. Then we can use the .count() method to count the number of rows remaining. Here active stores are anything with daily revenue > 0 and our dataframe is called store_account.

active_stores = store_account[store_account['revenue'] > 0]
active_stores_count = active_stores.count()

Next, lets obtain the number of fraudulent stores within the active_stores dataframe again using indexers and count. Finally, lets divide by the active_stores_count to get the final answer.

fraud_stores = active_stores[active_stores['status' == 'fraud'] 
fraud_stores_count = fraud_stores.count()
fraud_stores_count/active_stores_count

--

--