The UNIQUE()
function in Excel offers a quick and versatile method to eliminate duplicates from a range or table, though it occasionally produces unexpected results. Consider the following example: I have a simple dataset containing people’s names, their locations, and their scores. I aim to identify any potential duplicates within this dataset, making UNIQUE()
an apt choice.
The peculiar results found in cells F6
and G4
, where expected blank values are instead returned as zeros, can lead to confusion. For instance, Bob from Chicago did not necessarily score a zero; his result might simply be unknown. This ambiguity complicates the interpretation of the data, especially when determining if certain rows, such as the one where Bob actually scores a 9, are duplicates. Similarly, Clara’s location being listed as 0 further muddies the clarity of the dataset, making it challenging to discern potential duplicate entries.
This issue arises from Excel’s treatment of blank cells within array functions and calculations. In Excel, empty cells are often automatically interpreted as zeros during numerical operations, including those involving dynamic array functions. This default behavior aims to maintain consistency in calculations that include blank cells.
To address this inconvenient and potentially misleading feature of Excel’s backend operations, let’s explore a few strategies to mitigate its impact. You can follow these suggestions using the provided workbook as a guide:
Option A: Use FILTER()
to remove blank values
One approach here is to first eliminate any missing values prior to employing the UNIQUE()
function, which can be achieved through the use of another dynamic array function, FILTER()
.
By specifying multiple criteria within FILTER()
, it’s possible to exclude any rows that contain missing values in the specified columns. Subsequently, the unique values from this filtered dataset are returned. This method ensures that the data processed by UNIQUE()
is clean and devoid of any missing entries, resulting in more accurate and reliable outcomes.
I’m not particularly fond of this approach because I still wish to account for rows with blank values. It’s not my intention to simply write these off as duplicates or incomplete data. Instead, I aim to delve into them further and fully grasp the array combinations in the data.
Option B: Change the appearance of zero values in the workbook
To adjust the appearance of zeros in Excel, an alternative approach involves modifying their display settings. Begin by navigating to the File tab on the ribbon, then proceed to Options and select Advanced.
Scroll down to the section labeled “Display options for this worksheet.” Here, you will find a setting titled “Show a zero in cells that have zero value.” Disable this option to change how zeros are displayed in your worksheet. This method offers a straightforward way to customize the visual representation of zeros, enhancing the readability or aesthetic of your Excel document.
This approach will ensure we obtain all the distinct rows, but with a clearer presentation: instead of displaying confusing zeros, those cells will be left blank.
This approach ensures that no rows are arbitrarily removed due to potential data integrity issues, preserving the data in a more authentic format without resorting to converting missing values to zeros in Excel.
Which method do you prefer for handling missing values in UNIQUE()
? Share your thoughts in the comments below.
To learn more about the UNIQUE()
function and other dynamic array functions, check out my book Modern Data Analytics in Excel:
Jon Peltier
“Show a zero in cells that have zero value” is not a good option, because it turns zero values into blanks. What if Bob really had a score of zero? And if subsequent cells had formulas that included the zero/blank cells, the result may be confusing: the average of 2 and 0 is obviously 1, but not so the average of 2 and blank. Excel ignores actual blanks when doing most calculations, but not a zero that is displayed as a blank.
George Mount
Thanks Jon for pointing out those contingencies — Over on LinkedIn Rick Rothstein suggested another option that could avoid these downsides:
> If you don’t mind your numbers displaying as text, you can suppress the zeros that occur due to blanks using this formula instead of hiding all zeros…
`=UNIQUE(“”&names)`
I suppose the problem there is if you want to aggregate those numeric results.
Jon Peltier
I strenuously object to storing numbers as text. I would use something like this to ensure that what looks blank contains no numerical data:
=UNIQUE(IF(names=””,””,names))
Text in a numeric column, including “”, is ignored in most numerical functions.
George Mount
Makes sense, thanks Jon for another great contender here.