Don’t use Excel’s VLOOKUP. It’s a trap!

Bediako George
Advanced Spreadsheet Computing
6 min readOct 6, 2022

What’s wrong with VLOOKUP? A whole lot!

Even if you are just starting out with Excel, you probably already encountered the VLOOKUP function. It serves a good purpose, allowing you to find and pull data in cells in other worksheets into the current working cell. Many Excel influencers do a fantastic job explaining how to use VLOOKUP; my goal today is to show you why you should avoid using it at all costs.

But first, let’s review how to use VLOOKUP. Its function syntax is as follows:

=VLOOKUP (lookup_value, table_array, column_index_num, [range_lookup])

Where the table_array is a rectangular range the VLOOKUP will work on; the lookup_value is the value you want to find in the first column of the table_array; the column_index_num is an integer value that indicates the column in the table_array that holds the VLOOKUP’s returned value; and the optional range_lookup which specifies whether VLOOKUP should find an exact match or an approximate match of the lookup_value.

Get all that?

Using VLOOKUP’s exact-match option to correctly return Kwame’s age as 72.

This verbosity in its description is the first reason you should avoid using VLOOKUP. Its usage is confusing. Novice users are often intimidated by how VLOOKUP works, but even more concerning, many experienced users mistakenly believe they understand VLOOKUP, but they don’t.

Here are the critical VLOOKUP considerations that confuse spreadsheet users.

VLOOKUP is a very brittle function

VLOOKUP functions depend on hard coding the column of the value to be retrieved. Whenever a column is inserted or removed from the table range, the VLOOKUP function will no longer work as expected. Excel, for instance, will not warn you of this, resulting in a bug in your worksheet.

Adding a new column “Experience” causes VLOOKUP to return the age as 45 instead of the originally intended 72.

Even if you are aware of this limitation, it still means that changing worksheets that use VLOOKUP can be time confusing.

VLOOKUP can only search on one column at a time

VLOOKUP’s range-based search can only search one column at a time. A user that wants to match two columns cannot simply specify that with VLOOKUP. Instead, Excel forces the user to rely on workarounds. A common trick is concatenating two column values into one column, then applying VLOOKUP on that new column. This trick works but adds additional complexity to worksheets, making them harder for users to understand.

The exact-match option for range_lookup is a CPU hog

The last parameter of VLOOKUP is the optional range_lookup. The range_lookup allows the user to specify whether or not VLOOKUP should find either the precise lookup_value or a value that is close enough. Most users resort to setting this value to false in order to activate exact-match. Exact-matching is convenient for most users since finding the exact match is often the desired behavior. However, most users don’t realize this choice has a terrible cost.

When VLOOKUP looks for values using exact-match, it visits each cell in the range one after the other and returns the cell value only when finding the first match. Looking at each cell one at a time seems harmless when you have one VLOOKUP matching against ten rows; at most, ten comparisons are made. But when a user drags down a VLOOKUP to one thousand rows, VLOOKUP can make up to ten thousand (1000x 10) comparisons. And if the matching range grows over time to a thousand rows instead, VLOOKUP is now making as many as one million (1000 x 1000) comparisons! And this is just one column. This behavior is the most significant factor in user complaints about slow Excel spreadsheets.

VLOOKUP will only find the first matching value in its search column

Suppose your search column has more than one matching value. In that case, VLOOKUP will return the first one it finds, and complicating this more, the range_lookup option (remember approximate-match and exact-match?) determines the order in which it finds any of those matching values. This lack of specificity can result in hidden bugs in a user’s spreadsheet model.

How about approximate-match searching?

Now we know that Excel’s VLOOKUP exact-match implementation is slow. Is there a better option? There may be, as Excel offers an approximate match feature to overcome this. Approximate-match implements a binary search algorithm when looking for values. This algorithm is far more efficient than the exact-match algorithm, primarily when used to search large numbers of rows. In the above example, searching a 1000-row range with 1 VLOOKUP would take only 10 x 1 (ten) comparisons, and searching one million rows with 1 VLOOKUP would take just 20 comparisons. Approximate-match is a more powerful search algorithm than exact-match.

Because the search column A is not sorted, VLOOKUP returns Kwame’s age as 34 instead of 72!

Then you should use approximate-match more often, right? Here is the problem. One requirement for successful approximate-match use is that the user must first sort the search column. If the user does not sort the search column keys, approximate-match will fail to find the correct values. The worst part? Excel will not inform you of this error! Instead, it will dutifully return the wrong result. This failure to warn the user is a significant source of bugs, and misunderstanding this requirement is the primary reason most users eschew using VLOOKUP’s approximate-match option in the first place.

Karla’s age isn’t in the list. This causes VLOOKUP’s approximate-match option to return Karen’s age instead!

We have shown that spreadsheet users should avoid using VLOOKUP. What are alternative solutions? Well, there is INDEX-MATCH and XLOOKUP. These Excel functions are not as brittle as VLOOKUP; however, they both suffer from the approximate-match, exact-match problem. These alternative lookups do not provide the perfect solution.

Pebble Stream users use SQL joins instead of lookups

Pebble Stream users (Pebblers) have a great alternative to lookups. Pebblers don’t specify arcane search algorithms, concatenate columns, or worry about changing worksheet column orders. Instead, Pebblers apply relational technology to join worksheets into a single worksheet that is easier to compute, obviating the need for searches altogether because the data the user is trying to match now presents on a single row.

In addition, Pebble Stream joins are fast, consistent, and simple to understand. They work on single or multiple search columns and will naturally match more than one value, and when running on our Lake servers, they operate at scales much larger than Excel spreadsheets can.

Contact us. We are here to help

Does your team have Excel spreadsheet models suffering from poor performance? Improper use of VLOOKUP could be the culprit. Contact us at Pebble Stream; we can show your team how to address these and other computation speed issues in your spreadsheet models. Curious about how we enable SQL features in Excel? We can also show you our relational-based alternatives to lookups that will boost your model executions beyond what is capable on the desktop. Or, try using Zen, our offering that layers relational capabilities on Google Sheets.

--

--

Bediako George
Advanced Spreadsheet Computing

Bediako is the founder of Georgetown Software House. When not developing software, he runs and plays a lot of soccer.