How to Delete Duplicate Files With Different Names

With a Python script

Pietro Colombo
Geek Culture

Photo by Nana Smirnova on Unsplash

We often make copies of files and folders to make sure we don’t lose anything. In the end, we are left with a lot of duplicate files and don’t know which ones to keep.

In this article, I will explain how to delete duplicate files in a directory and all of its subdirectories. The method works even when the copies have different names.

For example, I have multiple directories of photos with a lot of duplicate images and videos, so I found a Python script that finds those duplicate files and writes them to a CSV file.

If you have a lot of duplicates, it is not feasible to read the CSV file and delete every duplicate by hand.
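As a minimal sketch of how that deletion step could be automated, the block below reads a CSV file and removes the listed duplicates. The CSV layout is an assumption, since the script's real output format is not shown here: each row is taken to be one group of identical files, with the copy to keep in the first cell and the duplicates in the remaining cells. The function name `delete_listed_duplicates` and the `dry_run` parameter are hypothetical.

```python
import csv
import os


def delete_listed_duplicates(csv_path, dry_run=True):
    """Delete every path after the first in each CSV row.

    Assumed (hypothetical) layout: one row per group of identical files,
    first cell is the copy to keep, remaining cells are duplicates.
    With dry_run=True nothing is deleted; the paths are only collected.
    """
    removed = []
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            for path in row[1:]:
                # Skip empty cells and paths that no longer exist
                if not path or not os.path.isfile(path):
                    continue
                if not dry_run:
                    os.remove(path)
                removed.append(path)
    return removed
```

Running it once with `dry_run=True` and reviewing the returned list before passing `dry_run=False` is a cautious way to use it, because deleted files cannot be recovered by the script.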

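One way to find duplicate files is to calculate an MD5 checksum of each file’s contents: files with identical contents produce the same checksum, whatever they are named. The function that calculates the MD5 checksum is: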

import hashlib


def generate_md5(fname, chunk_size=1024):
    """
    Function which takes a file name and returns md5 checksum of the file
    """
    hash = hashlib.md5()
    with open(fname, "rb") as f:
        # Read the 1st block of the file
        chunk = f.read(chunk_size)
        # Keep reading the file until the end and update hash
        while chunk:
            hash.update(chunk)
            chunk = f.read(chunk_size)
    # Return the hex checksum
    return…
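
The listing above is cut off at its return statement. As a minimal sketch of how checksum-based detection could work end to end (not necessarily the script this article uses), the block below hashes every file under a directory with `hashlib.md5` and groups the paths that share a digest. The names `file_md5` and `find_duplicates` are assumptions made for illustration, and returning `hexdigest()` is a guess based on the "hex checksum" comment.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Map each checksum to the sorted paths under `root` that share it.

    Only checksums seen on more than one file are returned.
    """
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_hash[file_md5(path)].append(path)
    return {h: sorted(p) for h, p in by_hash.items() if len(p) > 1}
```

Because the grouping key is the content checksum rather than the file name, two copies of the same photo end up in the same group even if one of them was renamed. Reading in chunks keeps memory use flat for large video files.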
