How to Union Pandas DataFrames by Vertical & Horizontal

Naveen Nelamali
SparkByExamples
Published in
3 min readJan 12, 2024

In pandas, you can use the concat() function to concatenate or union the DataFrames along with a particular axis (either rows or columns). To union the Pandas DataFrames use the concat() function, by either vertical concatenation (concatenating along rows) or horizontal concatenation (concatenating along columns).

In this article, I will explain how to union two pandas DataFrames by rows and columns with examples.

Create Two DataFrames

Let’s create two DataFrames with a few rows and columns.

# Create DataFrame
import pandas as pd
technologies = [ ["Spark",20000,"30days"],
["Pandas",25000,"40days"],
]
column_names=["Courses","Fee",'Duration']
df=pd.DataFrame(technologies, columns=column_names)
print("First DataFrame:\n",df)

# Create second DataFrame
technologies = [ ["Hadoop",25000,"50days"],
["Java",30000,"40days"],
]
column_names=["Courses","Fee",'Duration']
df1=pd.DataFrame(technologies, columns=column_names)
print("Second DataFrame:\n",df1)

Yields below output

Vertical Union of Pandas DataFrames (Concatenation along Rows)

To concatenate (union) pandas DataFrames using the concat() method to concat two DataFrames by rows meaning appending two DataFrames. By default, it performs append operations similar to a union where it bright all rows from both DataFrames to a single DataFrame.

In the below example, concatenates the DataFrames df and df1 along the rows (axis=0), performing a union operation. The resulting DataFrame (df2) contains all the rows from both original DataFrames, and the index is not reset, meaning it retains the original indices.

# Union pandas dataframes using concat
df2 = pd.concat([df, df1])
print("Union pandas dataframes using concat:\n", df2)

Yields below output.

Concatenating Along Rows (axis=0)

If you want to perform a union operation on two Pandas DataFrames using concat(), you can concatenate them along the rows (axis=0). For instance, ignore_index=True is used to reset the index of the resulting DataFrame.

# Concatenate along rows (performing a union)
df2 = pd.concat([df, df1], ignore_index=True)
print("Union pandas DataFrames and reset index:\n", df2)

# Output:
#Union pandas DataFrames and reset index:
# Courses Fee Duration
#0 Spark 20000 30days
#1 Pandas 25000 40days
#2 Hadoop 25000 50days
#3 Java 30000 40days

As you can see, the resulting DataFrame includes all unique rows from both original DataFrames, performing a union operation. If there are duplicate rows, they will be retained in the result.

5. Horizontal Union (Concatenation along Columns)

Alternatively, if you want to concatenate DataFrames along columns, you can use axis=1 as an argument in the pd.concat() function.

# Concatenate pandas DataFrames along columns
df2 = pd.concat([df, df1], axis=1)
print("Concatenate pandas DataFrames along columns:\n", df2)

# Output
#Concatenate pandas DataFrames along columns:
# Courses Fee Duration Courses Fee Duration
#0 Spark 20000 30days Hadoop 25000 50days
#1 Pandas 25000 40days Java 30000 40days

6. Concatenate with keys

When you use the keys parameter with pd.concat(), it allows you to create a hierarchical index based on the provided keys.

In the below example, the resulting DataFrame has a multi-level index where the first level corresponds to the keys (‘df1’ and ‘df2’), and the second level corresponds to the original index of the DataFrames.

# Concatenate with keys
result = pd.concat([df, df1], keys=['df', 'df1'])
print("Concatenate with keys:\n",result)

# Union with keys
result = pd.concat([df, df1], keys=['df', 'df1']).drop_duplicates()
print("Union with keys:\n",result)

# Output:
# Concatenate with keys:
# Courses Fee Duration
# df 0 Spark 20000 30days
# 1 Pandas 25000 40days
# df1 0 Hadoop 25000 50days
# 1 Java 30000 40days

Hope you like it.

Happy Learning !!

--

--

Naveen Nelamali
SparkByExamples

I am seasoned Data Engineer Architect with extensive knowledge in Apache Spark, Hive, Kafka, Azure Databricks, AWS and GenAI