Q#73: Categorizing foods

You are given the following dataframe and are asked to categorize each food into 1 of 3 categories: meat, fruit, or other.

Given this, write code to add a new column categorizing each row.

TRY IT YOURSELF

ANSWER

Categorizing data is a common preliminary task for a data scientist, often needed to gain insights before a later model-building step. This question tests our ability to work with pandas DataFrames.

Let's set up the data first.

import pandas as pd

data = {'food': ['bacon', 'STRAWBERRIES', 'Bacon', 'STRAWBERRIES', 'BACON', 'strawberries', 'Strawberries', 'pecans'],
        'pounds': [4.0, 3.5, 7.0, 3.0, 6.0, 9.0, 1.0, 3.0]}

df = pd.DataFrame(data)

Our task is to add a new column, let’s call it ‘category’, to the DataFrame. This column will categorize each food item into one of three categories: meat, fruit, or other.

To do this, we define a small function that holds the categorization logic and apply it to the 'food' column. Here's the code:

# Define a function to categorize food items
def categorize_food(item):
    # Lowercase the item so the keyword checks are case-insensitive
    item = item.lower()
    if 'bacon' in item:
        return 'meat'
    elif 'strawberries' in item:
        return 'fruit'
    else:
        return 'other'

# Apply the categorization function to create the 'category' column
df['category'] = df['food'].apply(categorize_food)

Explanation:

  1. We define a function, categorize_food, that takes a single food item (a string) and returns its category.
  2. Inside the function, we convert the item to lowercase with the lower() method so that the matching is case-insensitive: 'bacon', 'Bacon', and 'BACON' are all treated the same.
  3. We then use conditional statements (if, elif, else) with the in operator to check whether the item contains a keyword, here 'bacon' or 'strawberries'.
  4. If the item contains 'bacon', it is categorized as 'meat'; if it contains 'strawberries', as 'fruit'; anything else, such as 'pecans', falls through to 'other'.
  5. Finally, we apply categorize_food to each value in the 'food' column using the apply() method and assign the result to a new column named 'category'.
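
The same logic can also be written without calling a Python function once per row, which may matter on larger DataFrames. The following is a minimal sketch of that alternative, not the method used in the answer above: it assumes NumPy is installed and uses np.select together with the pandas str.contains method. The variable names food, conditions, and choices are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'food': ['bacon', 'STRAWBERRIES', 'Bacon', 'STRAWBERRIES',
             'BACON', 'strawberries', 'Strawberries', 'pecans'],
    'pounds': [4.0, 3.5, 7.0, 3.0, 6.0, 9.0, 1.0, 3.0],
})

# Lowercase the whole column once so the keyword checks are case-insensitive
food = df['food'].str.lower()

# Conditions are evaluated in order and the first match wins,
# mirroring the if / elif chain in categorize_food
conditions = [
    food.str.contains('bacon', regex=False),
    food.str.contains('strawberries', regex=False),
]
choices = ['meat', 'fruit']

# Rows matching no condition receive the default, like the else branch
df['category'] = np.select(conditions, choices, default='other')

print(df)
```

Both versions use substring matching, so they should assign the same categories to this data. For a DataFrame this small, the apply() approach is perfectly adequate and arguably easier to read.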
