Best ways to compare Excel files in 2024

Cortese Francesco
10 min readSep 9, 2024

--

Let’s be honest: Excel is here to stay for a long time.

It is THE SOFTWARE that soon or later everyone started to use, either for personal or professional purposes.

What I’ve noticed in over 15 years of experience as an IT consultant, particularly supporting finance departments, is that one crucial yet often underestimated task is comparing two Excel files.

It can be a very time-consuming activity, and depending on your skills and the quality of the files, you or your team can easily spend hours of valuable time.

That’s why I want to share my expertise and offer you three immediate solutions:

Solution 1 👉 In this article, I’ll guide you step-by-step with an example, showing you everything you need to know in Excel to minimize effort and avoid mistakes when comparing files. No matter your experience level, you’ll be able to complete the task efficiently. You’ll learn how to effectively use formulas like VLOOKUP, both in a position-based and label-based (dynamic) context, as well as the new XLOOKUP.

Solution 2 👉 If you prefer to learn by working directly with a pre-built Excel template, you can do that too. Check out what I’ve created here: ExcelTemplate-SMlite. For a 20% discount, use the coupon MEDIUM2420.

Solution 3 👉 If you’re short on time or building Excel expertise isn’t in your plan, and you need the fastest way to compare two files, I’ve got you covered. I developed an online tool that allows you to compare files in seconds. Not only will you save time and money, but you’ll also be 100% confident in the results. Don’t miss out — check at Sheetmatcher.com

To introduce the topic, let’s first explore some scenarios where you might need to compare two files:

  • The common case: You need to compare two different versions of the same file.
  • System migration: Your company has implemented a new system, and someone needs to verify that the data from the legacy system is also present in the new one. While specific tools are better suited for this, business teams often need to perform solid validations as well.
  • Cross-system comparison: You need to compare the same dataset stored in two different systems. For example, when a leading system manages master data that is shared with subsystems, you’ll want to ensure that the data propagation is working correctly.

In all of these cases, you’ll likely opt for a data comparison in Excel. Why? Because you’re familiar with the tool and already know you can manage it… more or less. 😉

Now, let me guide you through Solution 1.

As mentioned earlier, there are several important details to consider, so let’s walk through an example together. I’ll show you exactly what to do and how to ensure you’re on the right track. The best part? These concepts can be applied to files of any complexity!

Let’s imagine this is our first file:

Leading file

We will call it from now on the leading file, as it is our trusted file and it represents the truth.

Our goal is to know, if the information contained in it is also found in the next file, the one whose quality is not yet verified.

We will call it the subordinate file. It will look like this:

Subordinate file

As you can see, I put together several problems we want to be able to spot:

  • Different values (yellow)
  • Missing records (horizontal orange lines)
  • Unwanted columns (grey)
  • Missing columns (vertical orange line)
  • Columns having different names (purple)

Of course, this situation is kind of a worst-case scenario, but it represents all that you can eventually encounter.

The first concrete step is to mark or build the Key column on both files: it is the column that contains the unique information to identify each of the records on both files. We will need it soon.

Pay attention!

If your files represent a list of items, like a product catalog, the Key will be very likely the item-id/product-id.

But if your files contain for example time-dependent information, like a list of sales events of your items, then the Key will be very likely the concatenation of the item-id and the timestamp of the sale (if you don't have a sale transaction-id). E.g:

composite key

The latest is a concrete example where you would need a so-called composite key, which is nothing but a concatenation of more fields into one to make it unique within the file. You get this by doing the following:

=Concat(Value1&Value2..)

HINT: Once you defined the key, double-check that it is unique by marking the column and using a handy function to highlight duplicate values under conditional formatting:

conditional formatting

Once you do that, check with a filter that no rows have been colored.

At this point, we have two methods to compare the data:

  • Position-based method: A more straightforward approach, using fewer formulas. This is useful if the column order is mostly the same, though any differences will need to be adjusted manually.
  • Dynamic method: A more advanced approach, using complex formulas. It’s particularly helpful when the column order differs significantly between the files.

Let's try first to complete the job using the position-based method.

If you encounter any deviations, begin by manually adjusting the order of the columns in the subordinate file to match those in the leading file. You may also need to add empty columns if any are missing. I recommend moving all non-relevant columns (those not present in the leading file) to the end for easier comparison. Like this:

subordinate file now with the same column order as the leading file

Once you are done with it, you will prepare the leading file in the following way: just add a row at the very top and add a counter starting from 1 until the last column -> This will let you save much time and manual effort using the VLOOKUP function!

Then copy the newly added counter as well as the heading of the columns and paste them into a new Sheet called for example “results”. Copy-paste also the whole Key column, as we will need it.

Sheet Results is prepared in the leading file

Now we can start applying the VLOOKUP formula to this sheet to have a first overview of the data. Please follow my example and read the explanations:

apply VLOOKUP
VLOOKUP explained:

=VLOOKUP(the ID to search for; the columns range containing the IDs and the
data to find; the column number in the range of the second parameter containing
the value to return; 0 or FALSE to match properly).

This is the formula of cell B3 of the sheet Results:

=VLOOKUP($A3;'[Subordinate file.xlsx]Sheet1'!$A:$E;B$1;0)
  • Please add the dollar symbol in the first parameter before the letter of the column ($A3) meaning, if you copy the formula to the right, the key will stay fixed as it should
  • Please add the dollar symbol in the third parameter (B$1) to block reading the counter in the first row. The counter prevents you from typing all the column numbers manually!

You can now simply copy the formula horizontally until the end and vertically as well, this lets you reach the first half-result:

first feedback on the results

Let's now rework this first status, as we can do better:

  • Different values -> are not immediately recognized. When we don't have differences, we don't want to see any value. This will not disturb your eyes when looking for the actual differences
  • Missing records -> ok, are recognized with #N/A
  • Unwanted columns -> ok, we already moved them away manually
  • Missing columns -> not clear enough, as we want more precise info than a zero
  • Columns having different names -> not a problem as we are working with the position of the columns and not with their names

To have a better result, you need to apply these changes to the formula:

Add the IF formula before the VLOOKUP so that the VLOOKUP will be included
in the first parameter of the IF statement.
Add an Equal sign (meaning compare) and take the value coming from the
leading file as a comparison.

The second parameter of the IF is used when the two values match,
so we return just a "" for an empty result.

The third parameter is used when the two values don't match, so we
take again the value from the leading file followed by a " | " and the value
coming from the VLOOKUP:

=IF(VLOOKUP($A3;'[Subordinate file.xlsx]Sheet1'!$A:$E;B$1;0)=Data!B3;"";
Data!B3&" | "&VLOOKUP($A3;'[Subordinate file.xlsx]Sheet1'!$A:$E;B$1;0))

After applying these changes, we have a much better understanding of what is happening:

final improved results
  • Different values -> ok, they are now immediately recognized. When we don’t have differences, we don’t see anything
  • Missing records -> ok recognised with #N/A
  • Unwanted columns -> ok, we already moved them away manually
  • Missing columns -> ok, as the whole colored column is signaling an issue
  • Columns having different names -> not a problem, as we are working with the position of the columns and not with their names

Great!

Now, let me introduce you to the dynamic method, as mentioned earlier. With this approach, the position of the columns becomes irrelevant, offering a significant advantage. However, the requirement is that the columns must have the same names in both files.

To prove that, I have changed the position of characteristics 2 and 3 in the subordinate file:

Subordinate file for method 2

Similarly as done before, create a new sheet called “Results” and copy and paste the headings of the leading file in the second row. In the first row, you can apply a formula, that looks for each of the column names and returns the position found in the subordinate file.

using the MATCH formula to match the cols names
The function match allows you to search for a text string in an array of cells.
It will then return the position of the text found.

=MATCH(A2;'[Subordinate file.xlsx]Sheet2'!$A$1:$E$1;0)

Once you have done this, you can simply reuse the formula of the first method to execute the comparison:

We apply the IF formula before the VLOOKUP so that the VLOOKUP will be included
in the first parameter of the IF statement.
Add an equal sign and compare the values coming from the leading file.

The second parameter of the IF is for the case when the two values match, so we return
just a "" for an empty result.

The third parameter is for the case when the two values don't match, so we
take again the value from the leading file followed by a " | " and the value
coming from the VLOOKUP:

=IF(VLOOKUP($A3;'[Subordinate file.xlsx]Sheet2'!$A:$E;B$1;0)=Data!B3;"";
Data!B3&" | "&VLOOKUP($A3;'[Subordinate file.xlsx]Sheet2'!$A:$E;B$1;0))

If you did both steps correctly, you should see this result:

Result comparison dynamic method

Note that I also inserted some useful statistics on the side that allow you to gather valuable insights:

Count matches:
=COUNTIF(B3:E12;"")

Count differences:
=COUNTIF(B3:E12;"* | *")-COUNTIF(B3:E12;"* | not found*")

Missing records:
=COUNTIF(B:B;#N/A)

Missing cols:
=COUNTIF(1:1;"#N/A")

Great! One last topic before closing with the first solution: as Microsoft implemented the XLOOKUP function starting from Excel 2016, and a lot of people don't know about it, I think it makes sense to see also an example. You can use it both in a position-based or a dynamic scenario as well:


XLOOKUP explained:

=XLOOKUP(the ID to search for; the columns range containing the IDs to find;
the columns range containing the data to return; value if_not_found).

This is the formula you can apply in the sheet Results starting cell B2:

=XLOOKUP($A2;'[Subordinate file.xlsx]Sheet2'!$A:$A;
'[Subordinate file.xlsx]Sheet2'!B:B;"not found";0;)

As you can see, the main differences to VLOOKUP are:

  • Second parameter: XLOOKUP needs only the column you want to return while VLOOKUP wants you first to specify the whole column range, and then the number of the column to return-> you don't need the counter!
  • Third parameter: you will only find it in XLOOKUP and it is very useful, as you can return a specific value when the dataset is not found. VLOOKUP in the same case can only return #N/A
Results with XLOOKUP position based

I hope you found my solution 1 useful and encourage you to exercise.

I know building stuff in Excel requires a lot of time and energy, this is why I will provide you also two more solutions that might fit your needs:

  • Solution 2: is a ready-made Excel Template that provides you with everything you saw before and much more to bring you up to speed and learn with it. You can grab it here: ExcelTemplate-SMlite. For a 20% Coupon discount use MEDIUM2420.
  • Solution 3: Let me present you Sheetmatcher, my baby :) a SAAS solution made for everyone that assists you in comparing Excel files. At its core is like the Excel Template but it outperforms it in every aspect and gives you a more comprehensive experience. Check it out here at Sheetmatcher.com for a 10% Coupon discount use MEDIUM2410

If you enjoyed the educational content in this article, don't forget to follow me.

I would really appreciate a feedback in the comments, I'm also curious to hear some topics from you for a new article!

--

--

Cortese Francesco

Senior SAP consultant | Excel and Access expert | Founder of Sheetmatcher - match your Excels in seconds