LeetCode Problem 2882 Drop Duplicate Rows — LeetCode Introduction to Pandas
2 min read · Oct 12, 2023
Solving LeetCode Introduction to Pandas study plan problems
Problem:
DataFrame customers
+-------------+--------+
| Column Name | Type |
+-------------+--------+
| customer_id | int |
| name | object |
| email | object |
+-------------+--------+
There are some duplicate rows in the DataFrame based on the email column.
Write a solution to remove these duplicate rows and keep only the first occurrence.
The result format is in the following example.
Example 1:
Input:
+-------------+---------+---------------------+
| customer_id | name | email |
+-------------+---------+---------------------+
| 1 | Ella | emily@example.com |
| 2 | David | michael@example.com |
| 3 | Zachary | sarah@example.com |
| 4 | Alice | john@example.com |
| 5 | Finn | john@example.com |
| 6 | Violet | alice@example.com |
+-------------+---------+---------------------+
Output:
+-------------+---------+---------------------+
| customer_id | name | email |
+-------------+---------+---------------------+
| 1 | Ella | emily@example.com |
| 2 | David | michael@example.com |
| 3 | Zachary | sarah@example.com |
| 4 | Alice | john@example.com |
| 6 | Violet | alice@example.com |
+-------------+---------+---------------------+
Explanation:
Alice (customer_id = 4) and Finn (customer_id = 5) both use john@example.com, so only the first occurrence of this email is retained.
Solution:
import pandas as pd

def dropDuplicateEmails(customers: pd.DataFrame) -> pd.DataFrame:
    # Keep the first row for each email; later rows with the same email are dropped.
    return customers.drop_duplicates(subset=['email'])
The drop_duplicates method is called on the customers DataFrame. It returns a new DataFrame with duplicate rows removed, and the subset parameter names the columns used to decide whether two rows count as duplicates. Here subset=['email'], so any rows sharing an email are treated as duplicates. Because the keep parameter defaults to 'first', the earliest occurrence of each email is retained and the later ones are dropped.
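As a quick sanity check, the sketch below rebuilds the Example 1 input by hand and runs the same function on it. The DataFrame construction is my own reconstruction of the table above, and keep="first" is written out explicitly only to make the default visible.

```python
import pandas as pd

def dropDuplicateEmails(customers: pd.DataFrame) -> pd.DataFrame:
    # keep="first" is the pandas default; spelled out here for clarity.
    return customers.drop_duplicates(subset=["email"], keep="first")

# Example 1 input, rebuilt by hand from the problem statement.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "name": ["Ella", "David", "Zachary", "Alice", "Finn", "Violet"],
    "email": [
        "emily@example.com", "michael@example.com", "sarah@example.com",
        "john@example.com", "john@example.com", "alice@example.com",
    ],
})

result = dropDuplicateEmails(customers)
print(result["customer_id"].tolist())  # [1, 2, 3, 4, 6]
```

Note that drop_duplicates leaves the original customers DataFrame unchanged and keeps the surviving rows' original index labels. If a fresh 0-based index is wanted, chaining .reset_index(drop=True) onto the result is one option.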