Joins in Pandas

Lavanya Geetha
4 min readFeb 16, 2023

--

Introduction to Joins in Pandas

Pandas is a popular Python library for data analysis that provides powerful tools for manipulating and analyzing data. One of the most common operations when working with data is joining two or more data sets together.

In Pandas, joins can be performed using the merge function. The merge function combines two DataFrames based on the values of one or more columns. There are four types of joins in Pandas: inner join, left join, right join, and outer join.

Image source — Google

Inner Join

An inner join returns only the rows that have matching values in both Data Frames. Here’s some sample code that demonstrates an inner join:

import pandas as pd

# create the first DataFrame
df1 = pd.DataFrame({
'employee_id': ['1', '2', '3', '4'],
'name': ['John', 'Mary', 'David', 'Sarah']
})

# create the second DataFrame
df2 = pd.DataFrame({
'employee_id': ['2', '3', '4', '5'],
'department': ['HR', 'Finance', 'IT', 'Marketing']
})

# perform an inner join on the 'employee_id' column
result = pd.merge(df1, df2, on='employee_id', how='inner')

print(result)

Output:

employee_id name department
0 2 Mary HR
1 3 David Finance
2 4 Sarah IT

In this example, the merge function performs an inner join on the employee_id column. The resulting DataFrame contains only the rows that have matching values in both DataFrames.

Left Join

A left join returns all the rows from the left DataFrame and the matching rows from the right DataFrame. If there are no matching rows in the right DataFrame, the result will contain NaN values. Here's some sample code that demonstrates a left join:

import pandas as pd

# create the first DataFrame
df1 = pd.DataFrame({
'employee_id': ['1', '2', '3', '4'],
'name': ['John', 'Mary', 'David', 'Sarah']
})

# create the second DataFrame
df2 = pd.DataFrame({
'employee_id': ['2', '3', '4', '5'],
'department': ['HR', 'Finance', 'IT', 'Marketing']
})

# perform a left join on the 'employee_id' column
result = pd.merge(df1, df2, on='employee_id', how='left')

print(result)

Output:

employee_id name department
0 1 John NaN
1 2 Mary HR
2 3 David Finance
3 4 Sarah IT

In this example, the merge function performs a left join on the employee_id column. The resulting DataFrame contains all the rows from the left DataFrame and the matching rows from the right DataFrame. The row with employee_id equal to '1' doesn't have a matching row in the right DataFrame, so the department value is NaN.

Right Join

A right join returns all the rows from the right DataFrame and the matching rows from the left DataFrame. If there are no matching rows in the left DataFrame, the result will contain NaN values. Here's some sample code that demonstrates a right join:

import pandas as pd

# create the first DataFrame
df1 = pd.DataFrame({
'employee_id': ['1', '2', '3', '4'],
'name': ['John', 'Mary', 'David', 'Sarah']
})

# create the second DataFrame
df2 = pd.DataFrame({
'employee_id': ['2', '3', '4', '5'],
'department': ['HR', 'Finance', 'IT', 'Marketing']
})

# perform a right join on the 'employee_id' column
result = pd.merge(df1, df2, on='employee_id', how='right')

print(result)

Output:

employee_id name department
0 2 Mary HR
1 3 David Finance
2 4 Sarah IT
3 5 NaN Marketing

In this example, the merge function performs a right join on the employee_id column. The resulting DataFrame contains all the rows from the right DataFrame and the matching rows from the left DataFrame. The row with employee_id equal to '5' doesn't have a matching row in the left DataFrame, so the name value is NaN.

Outer Join

An outer join returns all the rows from both DataFrames. If there are no matching rows, the result will contain NaN values. Here's some sample code that demonstrates an outer join:

import pandas as pd

# create the first DataFrame
df1 = pd.DataFrame({
'employee_id': ['1', '2', '3', '4'],
'name': ['John', 'Mary', 'David', 'Sarah']
})

# create the second DataFrame
df2 = pd.DataFrame({
'employee_id': ['2', '3', '4', '5'],
'department': ['HR', 'Finance', 'IT', 'Marketing']
})

# perform an outer join on the 'employee_id' column
result = pd.merge(df1, df2, on='employee_id', how='outer')

print(result)

Output:

employee_id name department
0 1 John NaN
1 2 Mary HR
2 3 David Finance
3 4 Sarah IT
4 5 NaN Marketing

In this example, the merge function performs an outer join on the employee_id column. The resulting DataFrame contains all the rows from both DataFrames. The row with employee_id equal to '1' doesn't have a matching row in the right DataFrame, so the department value is NaN. Similarly, the row with employee_id equal to '5' doesn't have a matching row in the left DataFrame, so the name value is NaN.

Joins are a powerful tool in Pandas that allow you to combine multiple DataFrames based on the values of one or more columns. In this article, I covered the four types of joins in Pandas: inner join, left join, right join, and outer join. I also provided sample code to illustrate how each type of join works.

If you find this blog useful, then you can share a few claps, comment, and follow me for more data science stuff.

--

--

Lavanya Geetha

Data Science Professional | Microsoft Certified Data Scientist