SQL-style joins using Pandas
If you learned SQL you know that joining two or more tables is one of the delicate tasks you’ll do on a daily basis because of how relational databases work. You can do similar operations in pandas
.
Below is an overview of the different types of joins in Pandas and how to use them. Depending on the specific requirements of your analysis, you may need to use one or more of these join types.
Let’s import pandas and create a toy dataset:
import pandas as pd
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})
- Inner Join: An inner join returns only the matching rows from both DataFrames.
inner_join = pd.merge(df1, df2, on='key', how='inner')
print(inner_join)
2. Left Join: A left join returns all the rows from the left DataFrame and the matching rows from the right DataFrame. If there is no matching row in the right DataFrame, NaN values are returned.
left_join = pd.merge(df1, df2, on='key', how='left')
print(left_join)