Blog Viewer

Nifty New Feature in Excel 2007 - Remove Duplicates
Created By: Herman, Marie On Fri, Mar 12, 2010 07:24 AM

So you have inherited data that you know includes duplicate rows (or maybe you are merging a couple of lists together that have overlapping data) and you sure would like to find a way to remove those duplicates without having to manually delete the rows. To accomplish this action previously in prior Excel versions, you had to create a somewhat complicated Visual Basic macro. Enter Excel 2007 and the very nifty (I will repeat that: VERY NIFTY) Remove Duplicates feature that has been added. 

HOWEVER... DISCLAIMER!!! CAUTION!!! PROCEED WITH CAUTION BEFORE TRYING THIS!!! DATA DOES GET DELETED!!!! SAVE A COPY OF YOUR DATA FIRST JUST IN CASE!!! HAVE I MADE MYSELF CLEAR????

 

OK, back to the cool Remove Duplicates feature now. There are a couple of things you need to know about this feature so that it functions correctly for you. To do that, we're going to set up a quick sample spreadsheet so you can see the consequences of your choices immediately.

 

Type (or just copy and paste) the white cells below in a new blank spreadsheet:

 

 

A

B

C

1

First

Last

Fruit

2

John

Smith

Apple

3

Mary

Smith

Orange

4

John

Smith

Watermelon

5

Mary

Smith

Lemon

6

Allan

Smith

Lime

 

Got it?

Now click the Data Ribbon, then Remove Duplicates. Excel automatically selects your cell range. A window pops up and shows you a list of your columns with checkmarks in front of all the column names.

If you were to leave all three columns selected at this point, Excel would check for duplicates in all cells. That means that ALL THREE CELLS have to match ALL THREE CELLS in another row in order to be considered a duplicate. Thus, you would find NO duplicates if you ran it right now. (Feel free to go ahead and try it anyway).

If you were to uncheck Column C but leave columns A and B selected, Excel would remove the duplicate rows where there were exact matches in BOTH CELLS on each row and leave you with one row each of John Smith, Mary Smith, Allan Smith. Voila, right? Except for the minor detail that you don’t get to pick which rows stay or go. Excel goes in order from top to bottom, keeping the first occurrence and deleting any duplicate rows that follow. Also, one more thing to note. Excel doesn't actually delete the full rows in your spreadsheet when it removes duplicates. It deletes the cells in the row within the selected data. Data outside your range is not affected.  (So if you had data in cells E1-E6, it wouldn’t be affected by the remove duplicates process taking place in cells A1-C6.)

If you were to uncheck Columns C and A, but leave Column B selected, Excel would remove ALL duplicates of Smith, which means you end up with one entry: John Smith.

Be aware that if you have preselected cells and you choose “continue with that selection” when the window pops up after you click Remove Duplicates, it will delete the duplicate cells (but only from what you have selected) and thus may leave blank cells in your data (which may or may not be what you want).

Another option for you to consider… Conditional Formatting in Excel 2007 now allows you to apply formatting on duplicate values and will show you a visual representation (i.e. certain cell color, text color, bold, whatever you want) of the matching cells. I am not explaining in detail here, but you can see my past post on conditional formatting to get started with it. However, this only checks each cell against other cells in the same column (not multiple cells against multiple cells), so it’s more limited, but it’s a bit of a doublecheck before you go randomly deleting rows.

That Remove Duplicates feature is pretty cool, isn’t it? Obviously you can see why you need to be cautious in how you run it or you may end up with more or less data deleted than you wanted or expected. But this can certainly save you a ton of time!

Go forth and play!!! (Just remember that caution to save your data first!)

 

Score 2.6/5
1Ratings
1Comments
100Views
Permalink

To link to this Blog, copy and paste the URL below into your blog or homepage. Using this link will ensure access to the Blog, even if it is archived.

Comment  
Thanks Marie, I have indeed used this feature in 2007. Works great. Although our office hasn't migrated to 2007 yet, I've used this at home and I believe this was one of the questions on the MCAS exam for Excel.
No Related Links
Copyright © 2009. All rights reserved.