Pandas >> Data Combination(1): merge()

NextGenTechDawn
8 min read · May 3, 2022

In this tutorial, we will explain how to combine (join) multiple DataFrames using merge().

Table of Contents

  • Introduction to merge()
  • Data Preparation
  • JOIN
    - INNER JOIN
    - LEFT JOIN
    - RIGHT JOIN
    - OUTER JOIN
    - CROSS JOIN
    - LEFT EXCLUSIVE/ANTI JOIN
    - RIGHT EXCLUSIVE/ANTI JOIN
    - FULL OUTER EXCLUSIVE/ANTI JOIN
  • JOIN by multiple columns
  • Removing duplicate columns
  • Specify suffixes for columns with the same name
  • Conclusion

Introduction to merge()

First, let’s look at the definition of the merge() function.

pandas.merge: Merge DataFrame or named Series objects with a database-style join.
https://pandas.pydata.org/docs/reference/api/pandas.merge.html

pandas.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
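
To make the signature concrete, here is a minimal sketch of a call. The two DataFrames and their column names (key, left_val, right_val) are made up for illustration and are not the data used later in this tutorial. It shows the default how='inner' next to how='left'.

```python
import pandas as pd

# Hypothetical sample data (an assumption for illustration only).
left = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

# Default how="inner": keep only the keys present in both DataFrames.
inner = pd.merge(left, right, on="key")

# how="left": keep every row of `left`; keys missing from `right`
# get NaN in the columns that come from `right`.
left_join = pd.merge(left, right, how="left", on="key")

print(inner)
print(left_join)
```

With this sample data, the inner join keeps only the keys "b" and "c", while the left join keeps all three rows of left and fills right_val with NaN for "a".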

Let’s explain the main parameters.

  • left: The DataFrame on the left.
  • right: The DataFrame on the right.
  • how: How to join the two DataFrames. There are…
