3 Simple Steps to Merge a Large Number of Files using Python

Kelly Szutu
Published in Analytics Vidhya · 2 min read · Apr 10, 2020
Photo by Luca Bravo on Unsplash

After extracting data, we usually need to merge the resulting files for further analysis. There may be only a few files, or there may be hundreds. With a handful of files, it's easy to copy and paste the data we need into one sheet. With a large number of files, however, there's no point in doing it manually. So here I'm going to share the three-step approach I used to put the data together using Python.

1. Import Library

import pandas as pd
import os

To merge the files, we need two modules: pandas for reading the CSV files and os for interacting with the operating system (here, the file system).

2. Define Variables

# Folder that holds the input files (adjust to your own path)
Folder_Path = r'C://Users//kelly//Desktop//data'
# Names of all entries in that folder
file_list = os.listdir(Folder_Path)
SaveFile_Path = r'C://Users//kelly//Desktop//data'
SaveFile_Name = r'merged.csv'

Then I create four variables to keep the code simple. Only file_list is a list (it holds the names of all the files in the folder path); the other three are strings.
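Since os.listdir returns every entry in the folder, including sub-folders and non-CSV files, it can help to filter the list before reading anything. Here is a minimal sketch of that idea. The helper name list_csv_files and its exclude parameter are hypothetical, made up for illustration, and not part of the original code.

```python
import os

def list_csv_files(folder_path, exclude=()):
    """Return the sorted names of the .csv files in folder_path.

    list_csv_files and exclude are illustrative names (assumptions).
    """
    names = []
    for name in os.listdir(folder_path):
        full_path = os.path.join(folder_path, name)
        # Skip sub-folders, even if their name ends in .csv.
        if not os.path.isfile(full_path):
            continue
        # Skip anything that is not a CSV file.
        if not name.lower().endswith('.csv'):
            continue
        # Skip explicitly excluded names, e.g. an earlier merged.csv.
        if name in exclude:
            continue
        names.append(name)
    # os.listdir does not guarantee an order, so sort for repeatability.
    return sorted(names)
```

Under these assumptions, something like file_list = list_csv_files(Folder_Path, exclude=('merged.csv',)) could stand in for the plain os.listdir call when the folder holds more than just the input files.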

3. Run a For Loop

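# Write the first file with its header row, then append the rest without headers.
df = pd.read_csv(Folder_Path + '//' + file_list[0])
df.to_csv(SaveFile_Path + '//' + SaveFile_Name, index = False)
for i in range(1, len(file_list)):
    df = pd.read_csv(Folder_Path + '//' + file_list[i])
    df.to_csv(SaveFile_Path + '//' + SaveFile_Name, index = False, header = False, mode = 'a')

Lastly, I read the files one by one and save them into another file (in this case, merged.csv). The first file is written with its header row; inside the for loop, each remaining file is appended (mode = 'a') without a header, so merged.csv grows row by row. Because merged.csv is saved in the same folder as the data, move or delete it before rerunning the code, or os.listdir will pick it up and merge it as well. You can also change the parameters of to_csv if needed.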
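If the files are small enough to fit in memory together, another option is to read them all and combine them in one call with pd.concat. This is a different technique from the row-by-row append described here, shown only as a sketch. The helper name merge_csv_files is hypothetical, and it assumes every file has the same columns and that the list of file names is not empty (pd.concat raises an error on an empty list).

```python
import os
import pandas as pd

def merge_csv_files(folder_path, file_names, save_path):
    """Read each CSV named in file_names and write one merged CSV.

    merge_csv_files is an illustrative helper (an assumption), and it
    expects all files to share the same column layout.
    """
    # Read every file into its own DataFrame.
    frames = [pd.read_csv(os.path.join(folder_path, name)) for name in file_names]
    # Stack the rows; ignore_index=True renumbers them from 0 across all files.
    merged = pd.concat(frames, ignore_index=True)
    # Write the result once, with a single header row and no index column.
    merged.to_csv(save_path, index=False)
    return merged
```

The trade-off is memory: this version holds all the data at once, while appending file by file only ever holds one file.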

About me

Hey, I’m Kelly, a business analytics graduate student with a journalism and communication background who likes to share the life of exploring data and interesting findings. If you have any questions, feel free to contact me at kelly.szutu@gmail.com.

Journalist x Data Visualization | Data Analyst x Machine Learning | Python, SQL, Tableau | LinkedIn: www.linkedin.com/in/szutuct/