Comparing Files and Folders in Linux using diff command

Vamsi Penmetsa
Published in itversity
5 min read · Sep 5, 2022

How to compare files and folders in Linux using the diff command?

This article will teach you how to compare files and folders in Linux using the diff command.


Let’s get started.

Introduction to comparing files and folders in Linux using diff

diff stands for difference. The command compares files line by line and displays the lines that differ.


diff is a very important command in Linux when it comes to troubleshooting issues related to data and code.

Introduction about diff command in Linux

Overview of diff command

You can get the details of the diff command by using diff --help or man diff from the Linux terminal.

NAME
diff - compare files line by line
SYNOPSIS
diff [OPTION]... FILES
man diff command in Linux terminal
diff command overview in Linux

Prepare Dataset to explore the diff command in Linux

In this section, you will use the following data set to get hands-on practice with the diff command. You can clone the repo by using the git command below.

👆🏻Clone the data set for hands-on practice.👆🏻

Once you clone the repo, use the following command to create a copy of the retail_db data set.

cp -r retail_db retail_db_csv

You will see the new copy of retail_db as retail_db_csv in the picture below.👇🏻

Create a copy of the data set using cp command in linux
Create a copy of the directory by using the cp command in Linux
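Before changing anything, you can confirm that the copy really matches the original. The following is a minimal sketch, not the article's exact setup: it builds a small stand-in for retail_db in a temporary directory (the sample rows are hypothetical) and relies on the exit status of diff, which is 0 when nothing differs, 1 when something differs, and 2 when diff runs into trouble.

```shell
# Hypothetical stand-in for the cloned data set; the real retail_db has more folders.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p retail_db/departments
printf '2,Fitness\n3,Footwear\n4,Apparel\n' > retail_db/departments/part-00000

# Copy the whole directory tree, as in the article.
cp -r retail_db retail_db_csv

# diff exits with 0 when there are no differences, so it can drive an if statement.
if diff -r retail_db retail_db_csv > /dev/null; then
    copy_check="identical"
else
    copy_check="different"
fi
echo "The two directories are $copy_check"
```

Because the exit status carries the result, the same pattern works in scripts where you do not want to read the diff output at all.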

You can go through the following video and make changes in the retail_db_csv/departments/part-00000 file.

Change text in part-00000 file to use diff command
Make the changes in the newly created directory

Understand the output of diff command in Linux

You can see the size difference between the two copies by using the ls -ltr command.

The difference in the size of directories

/data/retail_db/departments/part-00000 and

/data/retail_db_csv/departments/part-00000

Now you can use the diff command to check the difference between the files above.

diff retail_db/departments/part-00000 retail_db_csv/departments/part-00000
How to use the diff command in Linux?

You can go through the below video to understand the output of the diff command in detail.

Understand the output of the diff command in Linux.
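If you cannot watch the video, here is a small sketch of how to read the default diff output. The file names original.txt and changed.txt and the sample rows are assumptions made for this illustration; they are not the actual contents of part-00000.

```shell
workdir=$(mktemp -d)
printf '2,Fitness\n3,Footwear\n4,Apparel\n5,Golf\n' > "$workdir/original.txt"
printf '2,Fitness\n3,Shoes\n4,Apparel\n5,Golf\n6,Outdoors\n' > "$workdir/changed.txt"

# diff exits with 1 when the files differ, so capture the status explicitly.
status=0
output=$(diff "$workdir/original.txt" "$workdir/changed.txt") || status=$?
printf '%s\n' "$output"
echo "exit status: $status"

# Output for these sample files:
# 2c2            line 2 of the first file was changed into line 2 of the second
# < 3,Footwear   "<" marks a line from the first file
# ---
# > 3,Shoes      ">" marks a line from the second file
# 4a5            after line 4 of the first file, line 5 of the second was added
# > 6,Outdoors
```

The letters in the middle of each header stand for change (c), add (a), and delete (d); the numbers on the left refer to the first file and the numbers on the right to the second file.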

Compare Files Ignoring Blank Lines and White Spaces using diff in Linux

You can ignore the blank lines and white spaces by using the relevant control arguments along with the diff command in Linux.

diff -B -w retail_db/departments/part-00000 retail_db_csv/departments/part-00000

-w  --ignore-all-space
Ignore all white space.
-B  --ignore-blank-lines
Ignore changes whose lines are all blank.

The output for the above command will look something like the below 👇🏻

Using the diff command with control arguments.
A detailed explanation of how to use control arguments with diff command
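As a rough illustration of what -B and -w change, the sketch below compares two hypothetical files (left.txt and right.txt are made-up names) that differ only by an extra space and an extra blank line. It assumes GNU diff, as shipped on Ubuntu.

```shell
workdir=$(mktemp -d)
printf '2,Fitness\n3,Footwear\n' > "$workdir/left.txt"
# Same data, but with a space after the comma and an extra blank line.
printf '2, Fitness\n\n3,Footwear\n' > "$workdir/right.txt"

# Plain diff treats the white space and the blank line as real differences.
strict_status=0
diff "$workdir/left.txt" "$workdir/right.txt" > /dev/null || strict_status=$?

# With -B and -w both kinds of difference are ignored.
relaxed_status=0
diff -B -w "$workdir/left.txt" "$workdir/right.txt" > /dev/null || relaxed_status=$?

echo "plain diff exit status: $strict_status"
echo "diff -B -w exit status: $relaxed_status"
```

An exit status of 1 means differences were reported and 0 means none were, so the two echo lines show whether the white space was the only thing that changed.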

Compare Files Ignoring case using diff in Linux

You can ignore case differences by adding -i to the diff command in Linux.

diff -i retail_db/departments/part-00000 retail_db_csv/departments/part-00000

-i  --ignore-case
Ignore case differences in file contents.
--ignore-file-name-case
Ignore case when comparing file names.
--no-ignore-file-name-case
Consider case when comparing file names.
ignore the case in diff command output

Now you can use more control arguments along with -i to get the output.

diff -B -w -i retail_db/departments/part-00000 retail_db_csv/departments/part-00000
Using multiple control arguments along with the diff command
A detailed explanation about the diff command along with multiple control arguments.
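A quick way to see the effect of -i is to compare two files that differ only in letter case. The file names lower.txt and upper.txt and their rows are hypothetical, chosen just for this sketch.

```shell
workdir=$(mktemp -d)
printf '2,Fitness\n3,Footwear\n' > "$workdir/lower.txt"
printf '2,FITNESS\n3,footwear\n' > "$workdir/upper.txt"

# Without -i both lines are reported as changed.
case_sensitive_status=0
diff "$workdir/lower.txt" "$workdir/upper.txt" > /dev/null || case_sensitive_status=$?

# With -i the files are treated as identical.
ignore_case_status=0
diff -i "$workdir/lower.txt" "$workdir/upper.txt" > /dev/null || ignore_case_status=$?

echo "diff exit status:    $case_sensitive_status"
echo "diff -i exit status: $ignore_case_status"
```

The -i option only affects file contents; file names are still compared with their case unless you add --ignore-file-name-case.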

Unified and Side By Side Comparison using diff in Linux

You can use the -u control argument with the diff command to get unified output, whose header shows each file's name and modification time stamp. Lines that appear only in the first file start with -, and lines that appear only in the second file start with +.

diff -u retail_db/departments/part-00000 retail_db_csv/departments/part-00000

-u  -U NUM  --unified[=NUM]
Output NUM (default 3) lines of unified context.
--label LABEL
Use LABEL instead of file name.
unified comparison using diff command in linux
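The sketch below shows unified output on two small hypothetical files (original.txt and changed.txt are assumed names). It uses --label so that the header shows fixed labels instead of file names and time stamps, which keeps the output the same on every run.

```shell
workdir=$(mktemp -d)
printf '2,Fitness\n3,Footwear\n4,Apparel\n5,Golf\n' > "$workdir/original.txt"
printf '2,Fitness\n3,Shoes\n4,Apparel\n5,Golf\n6,Outdoors\n' > "$workdir/changed.txt"

status=0
unified=$(diff -u --label original --label changed \
    "$workdir/original.txt" "$workdir/changed.txt") || status=$?
printf '%s\n' "$unified"

# Output for these sample files:
# --- original
# +++ changed
# @@ -1,4 +1,5 @@
#  2,Fitness
# -3,Footwear
# +3,Shoes
#  4,Apparel
#  5,Golf
# +6,Outdoors
```

The @@ line says that the hunk covers 4 lines starting at line 1 of the first file and 5 lines starting at line 1 of the second file. Lines that start with a space are unchanged context.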

If you want a side-by-side comparison, you can use the -y control argument.

diff -y retail_db/departments/part-00000 retail_db_csv/departments/part-00000

-y  --side-by-side
Output in two columns.
-W NUM  --width=NUM
Output at most NUM (default 130) print columns.
--left-column
Output only the left column of common lines.
--suppress-common-lines
Do not output common lines.
Using -y control argument along with diff command in Linux

You can use -y with multiple control arguments to get the required output for the diff command.

diff -B -w -i -y retail_db/departments/part-00000 retail_db_csv/departments/part-00000
diff command in Linux with multiple control arguments.
A detailed explanation of diff command in Linux
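Here is a small sketch of the side-by-side view that combines -y with -W and --suppress-common-lines from the list above. The two files and their rows are hypothetical, and the exact column spacing depends on the width you choose, so no expected output is shown.

```shell
workdir=$(mktemp -d)
printf '2,Fitness\n3,Footwear\n4,Apparel\n5,Golf\n' > "$workdir/original.txt"
printf '2,Fitness\n3,Shoes\n4,Apparel\n5,Golf\n6,Outdoors\n' > "$workdir/changed.txt"

# -W 40 keeps the two columns narrow; --suppress-common-lines hides unchanged rows.
status=0
side_by_side=$(diff -y -W 40 --suppress-common-lines \
    "$workdir/original.txt" "$workdir/changed.txt") || status=$?
printf '%s\n' "$side_by_side"
```

In this view a | between the columns marks a changed line, a > marks a line that exists only in the second file, and a < marks a line that exists only in the first file.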

Compare Folders in Linux using the diff command

You can also compare folders in Linux by using the diff command. To get hands-on practice, use the following command to copy the README.md file into retail_db_csv in the data directory.

cp README.md retail_db_csv

You can see the difference between the two directories by using ls -ltr retail_db retail_db_csv. You can see one extra README.md file in the retail_db_csv directory.

You can compare two directories by using ls -ltr command in Linux

Now, you can use the -r control argument along with the diff command to compare two directories in Linux.

diff -r retail_db retail_db_csv

-r  --recursive
Recursively compare any subdirectories found.

The output will look like the following image.

The output of the diff command with -r control argument in Linux.

You can use the -rq control argument along with the diff command to report only which files differ, without showing the differing lines.

diff -rq retail_db retail_db_csv

-r  --recursive
Recursively compare any subdirectories found.
-q  --brief
Output only whether files differ.
Using -rq control argument along with the diff command in Linux
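To try the directory comparison without the full data set, the sketch below builds two tiny directory trees in a temporary location. The rows and the README.md text are made up for the example, and the messages shown are what GNU diff prints in an English locale.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p retail_db/departments retail_db_csv/departments
printf '2,Fitness\n3,Footwear\n' > retail_db/departments/part-00000
printf '2,Fitness\n3,Shoes\n' > retail_db_csv/departments/part-00000
printf 'sample readme\n' > retail_db_csv/README.md

# -r walks the subdirectories, -q prints one line per difference.
status=0
report=$(diff -rq retail_db retail_db_csv) || status=$?
printf '%s\n' "$report"

# The report contains these two lines (their order depends on the locale):
# Only in retail_db_csv: README.md
# Files retail_db/departments/part-00000 and retail_db_csv/departments/part-00000 differ
```

Files that exist on only one side are reported with "Only in", while files that exist on both sides but have different contents are reported with "differ". Drop the -q to see the line-by-line differences as well.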

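You can also combine several control arguments in one command. The following example uses -r, -i, -w, and -b (--ignore-space-change, which ignores changes in the amount of white space) together.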

diff -riwb retail_db retail_db_csv
Using diff command in Linux along with -riwb control arguments.
A detailed explanation about the diff command with multiple control arguments in Linux.

