# How to Extract Unique Values in Excel

## Traditional and modern methods

Extracting unique values in Excel has always been possible, but it’s not always been easy. Only since the implementation of the new calculation engine has it become a doddle — thanks to the UNIQUE function.

Having said that, for many years you’ve been able to use the Remove Duplicates feature, an advanced filter or Power Query. However, nothing compares to the seamless nature of a dynamically updating formula, so that’s what I will be concentrating on.

The word *unique* is often abused in everyday life. Technically it means one instance of something — and only one. However, I’m sure you’ve heard someone describe something as ‘very unique’, ‘more unique’ or the ‘most unique’.

There is also another word called *distinct*. The two are used interchangeably by many people, yet there is a subtle difference.

Let’s just clear up what each means in data terms:

- distinct: a value that appears
once*at least* - unique: a value that appears
*only*

I’ve included a workbook, so please download it and follow along:

🔗 **Extracting**-**Unique-Values.xlsb**

*A Microsoft 365 desktop copy of Excel is required for full functionality.*

At the top is **DataTable**, and this contains the data used by each example. The first column (**DataTable[1]**) contains a full set of numbers, whereas the other (**DataTable[2]**) adds blank cells and errors to the mix. This has been done so you can see how the formulas respond to different scenarios.

Following this are two sections: **Traditional Methods **and **Modern Methods**. There are tables in each that have been split in half. The left side focuses on distinct values, whilst the right concentrates on unique ones.

To understand what’s behind a formula, a useful tip is to click on the argument name in the formula pop-up box and press F9. This will convert the contents to a value or array. Press Ctrl + Z to undo it.

**For those running an older copy of Excel:**

Aside from the modern examples (which you can’t use anyway), the traditional ones are array formulas that will only display if you press Ctrl + Shift + Return when confirming them. Microsoft 365 versions of Excel don’t require this action as they run natively.

Sorry, but I haven’t done this for you—deliberately.

Firstly, when developing the workbook, I did what worked on my version because I don’t have an older copy to test on.

Secondly, I hope seeing an error-ridden workbook convinces you to upgrade!

# Traditional Methods

## Distinct Values

Example **T1.DV1** shows the most basic way of extracting distinct values from a range.

`=IFNA(`

INDEX(DataTable[1],

MATCH(0,

IF(DataTable[1]<>"",COUNTIF($H$23:$H23,DataTable[1]),""),

0)

),

"")

The `array`

argument of the INDEX function references the range that the final value comes from, which is **DataTable[1]**. Then it’s determined which row should be returned in `row_num`

.

`MATCH(0,`

IF(DataTable[1]<>"",COUNTIF($H$23:$H23,DataTable[1]),""),

0)

This part uses a combination of MATCH, IF and COUNTIF to search for the first 0 array value that the COUNTIF statement produces in `lookup_array`

.

The range `$H$23:$H23`

is an expanding reference, meaning it is absolute on the left but row relative on the right. As the fill handle is dragged down from cell H24, the range will continue to start from H23 but will grow incrementally.

It always lags one cell above because only values already in the list need to be counted. In any case, you can’t reference the current cell without causing a circular reference.

When a range with multiple values is given for the `criteria`

argument, the COUNTIF function returns multiple results in an array.

In cells H24 to H33, the COUNTIF formula converts to:

**H24:** COUNTIF("T1.DV1 Unsorted",{7;3;2;3;9;3;8;9;2;5})

**H25:** COUNTIF({"T1.DV1 Unsorted";7},{7;3;2;3;9;3;8;9;2;5})

**H26:** COUNTIF({"T1.DV1 Unsorted";7;3},{7;3;2;3;9;3;8;9;2;5})

**H27:** COUNTIF({"T1.DV1 Unsorted";7;3;2},{7;3;2;3;9;3;8;9;2;5})

**H28:** COUNTIF({"T1.DV1 Unsorted";7;3;2;9},{7;3;2;3;9;3;8;9;2;5})

**H29:** COUNTIF({"T1.DV1 Unsorted";7;3;2;9;8},{7;3;2;3;9;3;8;9;2;5})

**H30:** COUNTIF({"T1.DV1 Unsorted";7;3;2;9;8;5},{7;3;2;3;9;3;8;9;2;5})

**H31:** COUNTIF({"T1.DV1 Unsorted";7;3;2;9;8;5;""},{7;3;2;3;9;3;8;9;2;5})

**H32:** COUNTIF({"T1.DV1 Unsorted";7;3;2;9;8;5;"";""},{7;3;2;3;9;3;8;9;2;5})

**H33:** COUNTIF({"T1.DV1 Unsorted";7;3;2;9;8;5;"";"";""},{7;3;2;3;9;3;8;9;2;5})

COUNTIF will use the figures in **DataTable[1] **to check whether they appear in the expanding reference. If it finds a number, 1 (TRUE) is returned. If not, then it’s 0 (FALSE).

Here is what the `lookup_array`

generates for each cell:

**H24:** {0;0;0;0;0;0;0;0;0;0}

**H25:** {1;0;0;0;0;0;1;0;0;0}

**H26:** {1;1;0;1;0;1;0;0;0;0}

**H27:** {1;1;1;1;0;1;0;0;1;0}

**H28:** {1;1;1;1;1;1;0;1;1;0}

**H29:** {1;1;1;1;1;1;1;1;1;0}

**H30:** {1;1;1;1;1;1;1;1;1;1}

**H31:** {1;1;1;1;1;1;1;1;1;1}

**H32:** {1;1;1;1;1;1;1;1;1;1}

**H33:** {1;1;1;1;1;1;1;1;1;1}

The MATCH `lookup_value`

is 0, so the first array index with this value will be chosen. Even when numbers are repeated, this never affects the desired outcome because they will always be represented by a 1.

It’s also important to set the `[match_type]`

to 0 for an exact match, so the formula produces the correct result.

Take cell H27 for example:

`=IFNA(`

INDEX({7;3;2;3;9;3;8;9;2;5},

MATCH(0,{1;1;1;1;0;1;0;0;1;0},

0)

),

"")

As the first 0 occurs in the fifth index of the array, the `row_num`

value is 5. This means the fifth row of **DataTable[1]** is returned, which is 9.

IFNA has also been wrapped around the rest of the formula so all #N/A errors in non-matching cells are replaced with empty strings.

**T1.DV2** sorts the same group of numbers in ascending order.

`=IFERROR(`

AGGREGATE(15,6,

DataTable[1]/ISNA(MATCH(DataTable[1],$I$23:$I23,0))/ISNUMBER(DataTable[1]),

1),

"")

Key to this method is the AGGREGATE function, which is often described as the ‘Swiss Army knife of Excel’ due to its versatility! It allows many types of calculations to take place, with the added benefit of having greater customisation over error handling and hidden rows.

The `function_num`

has been set to 15, which is the SMALL function. And as 6 is stated for `options`

, error values will be ignored.

Housed in the `array`

argument is a conditional statement that ensures **DataTable[1]** is filtered correctly. The calculation starts with the range first, and then two conditions are included— each separated by a division symbol (‘/’).

The first condition checks for #N/A! errors in the MATCH statement, which evaluates each value in **DataTable[1] **to determine if it has already appeared in the expanding range. If so, the first instance of that number’s position is returned. If not, an #N/A! error occurs.

Formula

MATCH(DataTable[1],$I$23:$I28,0)DataTable[1]

{7;3;2;3;9;3;8;9;2;5}Expanding range

{"T1.DV2 Ascending Order";2;3;5;7;8}Position of DataTable[1] value in expanding range

{5;3;2;3;#N/A;3;6;#N/A;2;4}Cell is #N/A!

{FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;TRUE;FALSE;FALSE}

In the second condition, the ISNUMBER statement does a simple check of **DataTable[1]** to verify each cell contains a number.

FormulaISNUMBER(DataTable[1])DataTable[1]{7;3;2;3;9;3;8;9;2;5}Cell contains a number{TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE}

Both conditions produce an array of TRUE and FALSE values, but for a successful calculation to take place, the same array index for each must be TRUE. The TRUE and FALSE values will automatically coerce into 1s and 0s — and dividing anything by 0 returns a #DIV/0! error.

This is how the `array`

argument calculates for the first three cells:

I24:Formula

DataTable[1]

/

ISNA(MATCH(DataTable[1],$I$23:$I23,0))

/

ISNUMBER(DataTable[1])DataTable[1]

{7;3;2;3;9;3;8;9;2;5}

/MATCH statement equals #N/A!{TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE}

/Cell contains a number{TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE}

=

{7;3;2;3;9;3;8;9;2;5}I25:Formula

DataTable[1]

/

ISNA(MATCH(DataTable[1],$I$23:$I24,0))

/

ISNUMBER(DataTable[1])DataTable[1]{7;3;2;3;9;3;8;9;2;5}

/MATCH statement equals #N/A!{TRUE;TRUE;FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE}

/Cell contains a number{TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE}

=

{7;3;#DIV/0!;3;9;3;8;9;#DIV/0!;5}I26:Formula

DataTable[1]

/

ISNA(MATCH(DataTable[1],$I$23:$I25,0))

/

ISNUMBER(DataTable[1])DataTable[1]

{7;3;2;3;9;3;8;9;2;5}

/MATCH statement equals #N/A!{TRUE;FALSE;FALSE;FALSE;TRUE;FALSE;TRUE;TRUE;FALSE;TRUE}

/Cell contains a number{TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE}

=

{7;#DIV/0!;#DIV/0!;#DIV/0!;9;#DIV/0!;8;9;#DIV/0!;5}

The final argument of the AGGREGATE function is `[k]`

, and this has been set to 1 so the smallest value is extracted from `array`

.

To conclude the formula, IFERROR’s `value_if_error`

argument returns a blank string for the redundant cells. ISNA couldn’t be used again as it does not detect #NUM! errors.

**T1.DV3** is similar to **T1.DV1**, except it will only extract distinct values that are greater than 5. It does this due to a small change in the IF `logical_test `

argument.

`=IFNA(`

INDEX(DataTable[1],

MATCH(0,

IF(**DataTable[1]>5**,COUNTIF($J$23:$J23,DataTable[1]),""),

0)

),

"")

## Unique Values

Unfortunately, extracting unique values isn’t a simple case of making minor alterations to the formulas you’ve just seen.

Proof of this is shown in **T1.UV1**, where a** **more complex formula is required compared to its distinct equivalent in **T1.DV1**.

`=IFERROR(`

INDEX(DataTable[1],

SMALL(

IF(FREQUENCY(IFERROR(MATCH(DataTable[1],DataTable[1],0),""),

ROW(DataTable[1])-ROW(DataTable[[#Headers],[1]]))=1,

ROW(DataTable[1])-ROW(DataTable[[#Headers],[1]])),

ROWS($K$24:$K24))

),

"")

Let’s focus on the `array`

argument of the SMALL function.

Nested inside an IF statement is the FREQUENCY function, which allows data to be grouped through ‘binning’.

It has two arguments:

`data_array`

: generates an array specifying the first-row instance of each value in **DataTable[1]**. In other words, repeated numbers have the same value as their first occurrence.

FREQUENCY data_arrayFormulaIFERROR(MATCH(DataTable[1],DataTable[1],0),"")Result

{1;2;3;2;5;2;7;5;3;10}

`bins_array`

: subtracting the heading row from the whole of **DataTable[1] **produces an array that is the same size as the range itself. These are the 10 bins that house the `data_array`

indexes.

FREQUENCY bins_arrayFormula

ROW(DataTable[1])-ROW(DataTable[[#Headers],[1]])Result{1;2;3;4;5;6;7;8;9;10}

When the two arguments are put together, the FREQUENCY statement will dump each `data_array`

index into a matching bin in `bins_array`

. Then, `bins_array`

is checked for bins that contain a single value. As these are 1, 7 and 10, their respective indexes are declared TRUE.

IF logical_testFREQUENCY({1;2;3;2;5;2;7;5;3;10},{1;2;3;4;5;6;7;8;9;10})=1Bin has one value only

{TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;TRUE;FALSE}

The `[value_if_true]`

argument then converts these TRUE values into row numbers, whilst the others remain FALSE.

**IF [value_if_true]**{1;FALSE;FALSE;FALSE;FALSE;FALSE;7;FALSE;FALSE;10;FALSE}

Finally, the expanding reference `ROWS($K$24:$K24)`

in the `k`

argument ensures that each calculation will return the next smallest row number. Once this figure exceeds the quantity of array values, a #NUM! error occurs. However, the IFERROR function traps this and replaces it with an empty string.

Using the row numbers 1, 7 and 10 generated by the INDEX `row_num`

argument, the values extracted from **DataTable[1]** are 7, 8, and 5, respectively.

**T1.UV2** employs a similar method to the previous example, but it’s actually a shorter formula because the numbers are sorted in ascending order. There’s no need for the INDEX function, as SMALL takes care of the ordering.

`=IFERROR(`

SMALL(

IF(FREQUENCY(IFERROR(MATCH(DataTable[1],DataTable[1],0),""),

ROW(DataTable[1])-ROW(DataTable[[#Headers],[1]]))=1,

DataTable[1]),

ROWS($L$24:$L24)),

"")

**T1.UV3** returns unique values greater than 5.

The formula’s MATCH statement is structured a little differently from what’s found in **T1.DV3**. This time the IF function does not get used—replaced by a COUNTIFS statement instead that’s added to the original COUNTIF.

`=IFNA(`

INDEX(DataTable[1],

MATCH(0,

COUNTIF($M$23:$M23,DataTable[1])

**+**

**(COUNTIFS(DataTable[1],DataTable[1],DataTable[1],">5")<>1)**,0)),

"")

As described earlier, the first COUNTIF statement uses the values in the expanding range and searches for them in `DataTable[1]`

. Where a match is found, its array index becomes 1.

Added to this is a COUNTIFS statement, which is necessary to allow for two conditions. The `criteria_range1`

and `criteria1`

arguments both contain **DataTable[1]**, as this will force the function to produce an array instead of a single value. After this, the second condition ensures only numbers over 5 are checked in **DataTable[1]**.

An array is produced from this with the total count for each: `{1;0;0;0;2;0;1;2;0;0}`

.

Affixing <>1 to the end of the statement converts this array into a set of TRUE and FALSE values: `{FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;TRUE}`

.

When the contents of `lookup_array`

is summed, another array is formed comprising the totals of the two parts: `{0;1;1;1;1;1;0;1;1;1}`

.

What the statement is looking for is 0, as this is the sum of two FALSE values.

M24:Values in both expanding range and DataTable[1]

COUNTIF($M$23:$M23,DataTable[1])

{0;0;0;0;0;0;0;0;0;0}

+Values over 5 in DataTable[1]COUNTIFS(DataTable[1],DataTable[1],DataTable[1],">5"

{1;0;0;0;2;0;1;2;0;0}Counts not equal to 1 (<>1){FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;TRUE}

=

{0;1;1;1;1;1;0;1;1;1}M25:Values in both expanding range and DataTable[1]

COUNTIF($M$23:$M24,DataTable[1])

{1;0;0;0;0;0;0;0;0;0}

+Values over 5 in DataTable[1]COUNTIFS(DataTable[1],DataTable[1],DataTable[1],">5"

{1;0;0;0;2;0;1;2;0;0}Counts not equal to 1 (<>1){FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;TRUE}

=

{1;1;1;1;1;1;0;1;1;1}

The first 0 is picked out from the array and becomes the value of `row_num`

, allowing the correct number from **DataTable[1]** to be returned.

**M24:** MATCH(0,{0;1;1;1;1;1;0;1;1;1} **=** **row 1 of DataTable[1] = 7**

**M25:** MATCH(0,{1;1;1;1;1;1;0;1;1;1} = **row 7 of DataTable[1] = 8**

Empty cells result in a clean sweep of ones: `{1;1;1;1;1;1;1;1;1;1}`

. These produce #N/A! errors, which IFNA's `value_if_na`

argument cleans up with blank strings.

## Handling Blanks and Errors

Before I move on to the modern methods, I wanted to show you how these formulas respond when there are blank cells and errors in the source data.

**Table 2** makes use of the same formulas as **Table 1**, except it’s based on **DataTable[2]**. As you can see, the results are as clean as **Table 1**.

# Modern Methods

Firstly, make yourself fully aware of the UNIQUE function given its prominence throughout this section.

## Syntax

`=UNIQUE (array, [by_col], [exactly_once])`

## Arguments

`array`

: range to extract unique values from.`[by_col]`

(optional; default is FALSE): determines whether column values should be returned instead of rows. By default, rows are returned.`[exactly_once]`

(optional; default is FALSE): determines whether unique values should be returned. By default, distinct values are returned.

**T3.DV1** demonstrates the most basic use of the UNIQUE function. The range **DataTable[1]** has been referenced in the `array`

argument and the distinct values spill down from cell H24.

`=UNIQUE(DataTable[1])`

{7;3;2;9;8;5}

**T3.DV2** builds on this by wrapping the SORT function around the original formula. This will place the numbers in ascending order. Use -1 in the `[sort_order]`

argument for descending order.

`=SORT(UNIQUE(DataTable[1]))`

{2;3;5;7;8;9}

**T3.DV3** shows how distinct values over 5 are extracted by wrapping a FILTER statement inside the UNIQUE function and using the condition `DataTable[1]>5`

in the `include`

argument.

`=UNIQUE(FILTER(DataTable[1],DataTable[1]>5))`

{7;9;8}

Examples **T3.UV1**, **T3.UV3** and **T3.UV3** are constructed very similarly, except they extract unique values instead. This is done by setting the `[exactly_once]`

arguments to TRUE.

K52:=UNIQUE(DataTable[1],,TRUE)

{7;8;5}L52:=SORT(UNIQUE(DataTable[1],,TRUE))

{5;7;8}M52:=UNIQUE(FILTER(DataTable[1],DataTable[1]>5),,TRUE)

{7;8}

## Handling Blanks and Errors

That was all straightforward, wasn’t it? Certainly compared to the traditional methods anyway. However, as shown in **Table 4—**which is based on **DataTable[2]**—these methods don’t ignore blank cells and error values. Blanks are counted as 0, and the #N/A and #DIV/0! errors are considered values in their own right.

Although the others still return a result, examples **T4.DV3** and **T4.UV3** completely break down because of the error values in the data. Even if you take out the #N/A from **DataTable[2]**, the result will be #DIV/0! instead.

**Table 5** corrects these by extending the formulas to strip out errors and blanks.

All six examples feature this additional statement in the `include`

argument of the FILTER function:

`NOT(ISBLANK(DataTable[2]))*NOT(ISERROR(DataTable[2]))`

For the first part, FALSE is returned for the empty cells. The ISBLANK function itself will return TRUE, but as it’s nested in NOT, the opposite occurs. This is necessary as the `include`

argument requires TRUE values for inclusion.

`NOT(ISBLANK(DataTable[2])){TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE;FALSE;TRUE;TRUE}`

The asterisk (*) represents AND logic and allows a new condition to be introduced.

This statement returns FALSE for the #N/A! and #DIV/0! errors:

`NOT(ISERROR(DataTable[2])){TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE}`

Examples **T5.DV3** and **T5.UV3** have an additional FILTER statement to weed out the errors and blanks. Whilst this could be done with just one, it would involve repeating part of the formula.

The LET function allows names to be defined and assigned values. These variables can then be used multiple times in the same formula. This is great for constructing more readable formulas and reducing processing time.

Let’s focus on **T5.DV3**:

=LET(FilterBlanksandErrors,UNIQUE(

FILTER(DataTable[2],

NOT(ISBLANK(DataTable[2]))*NOT(ISERROR(DataTable[2]))

)),FILTER(FilterBlanksandErrors,FilterBlanksandErrors>5))

The formula opens with LET, and inside it the `name1`

argument contains the name of the value: ‘BlanksAndErrorsRemoved’.

Next, the calculation is defined in `name_value1`

. This consists of the UNIQUE function encompassing a FILTER statement that removes blanks and errors from the range.

`UNIQUE(`

FILTER(DataTable[2],

NOT(ISBLANK(DataTable[2]))*NOT(ISERROR(DataTable[2]))

))

The argument `calculation_or_name2`

houses the second FILTER condition, but inside—the defined name is referenced instead.

`FILTER(BlanksAndErrorsRemoved,BlanksAndErrorsRemoved>5)`

{7;9;8}

Before the advent of the LET function, you had to repeat part of the formula to achieve this. No longer!

# Final Words

When I started this article, the original plan was to show you only the modern techniques to extract unique values from a range. Then I had an epiphany.

I was driven by a desire to convince those struggling with their antiquated versions of Excel to upgrade to Microsoft 365. By demonstrating the traditional methods as well, it knew it would enable you to gain a sense of perspective—to show you how far Excel has come in a short period.

If you don’t have Microsoft 365, you cannot enjoy the pleasure of creating these new kinds of formulas. You are resigned to the clunky, cumbersome, and archaic approaches. It’s like running through treacle: you work your arse off, but the progress you make is never representative of your effort.

Even if you construct a formula that mimics a modern one, I can bet my bottom dollar that—not only has it taken you considerably longer—but it’s bound to cause issues further down the line.

When I was tinkering with the traditional formulas, sometimes I would forget to double-press the fill handle on the first cell to update the others as well. I think it must be my mind that is so accustomed to a dynamic array way of thinking.

This is a classic error people make anyway. Thankfully, in modern Excel it isn’t a problem. The one formula = many cells approach mitigates the risk of making these kinds of mistakes.

If you’re not dynamic array-ready and still using formulaic approaches of the 1990s, it’s time to change…or get left behind.