SQL-style joins using Pandas

Mahbub Alam
Data Science Daily

--

Photo by Rubaitul Azad on Unsplash

If you learned SQL you know that joining two or more tables is one of the delicate tasks you’ll do on a daily basis because of how relational databases work. You can do similar operations in pandas .

Below is an overview of the different types of joins in Pandas and how to use them. Depending on the specific requirements of your analysis, you may need to use one or more of these join types.

Let’s import pandas and create a toy dataset:

import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})
  1. Inner Join: An inner join returns only the matching rows from both DataFrames.
inner_join = pd.merge(df1, df2, on='key', how='inner')
print(inner_join)

2. Left Join: A left join returns all the rows from the left DataFrame and the matching rows from the right DataFrame. If there is no matching row in the right DataFrame, NaN values are returned.

left_join = pd.merge(df1, df2, on='key', how='left')
print(left_join)

--

--

Data Science Daily
Data Science Daily

Published in Data Science Daily

This publication will publish brief articles on data science theory, algorithms, tools and applications.

No responses yet