• August 29th, 2010 @ 5:07 PM by Jean-Francois LEBON | Be the 1st to comment

    Wonder how to display or delete duplicate records ? Sometimes, under certain circumstances duplicate data may occur. Then it should be deleted.

    In this example, we are going to create a dummy table with an identity column as a unique key, and with two varchar columns. Then we are going to insert data and duplicates manually. The script for the test comes as follow:

    create table dummytable(
    keycolumn   numeric(5,0) identity,
    column1     varchar(10),
    column2     varchar(10)
    )
    go
    
    insert dummytable(column1,column2) values("AAA","AAA")
    go 10
    (1 row affected)
    10 xacts:
    
    insert dummytable(column1,column2) values("AAA","BBB")
    go
    (1 row affected)
    
    insert dummytable(column1,column2) values("BBB","BBB")
    go
    (1 row affected)
    
    insert dummytable(column1,column2) values("BBB","AAA")
    go
    (1 row affected)

    The result should looks like this:

    select * from dummytable
    go
     keycolumn    column1    column2
     ------------ ---------- --------------------
                1 AAA        AAA
                2 AAA        AAA
                3 AAA        AAA
                4 AAA        AAA
                5 AAA        AAA
                6 AAA        AAA
                7 AAA        AAA
                8 AAA        AAA
                9 AAA        AAA
               10 AAA        AAA
               11 AAA        BBB
               12 BBB        BBB
               13 BBB        AAA
    
    (13 rows affected)
    1°) how to select duplicate rows:

    To look for duplicates rows and display them, you can execute the sql statement below. It will return one row for each duplicate values in the table.

    select keycolumn from dummytable A
    group by column1, column2 having count(*) > 1
    go

    Or you can look for legitimate rows (the ones that are not duplicates) by executing the sql statement below. Here it will display one row for each non duplicates values in the table.

    select max(keycolumn) from dummytable
    group by column1,column2 order by 1
    go
    
     --------
           10
           11
           12
           13
    
    (4 rows affected)
    2°) delete duplicate rows and keep one copy of the row:

    Once we have identified the duplicates we want to remove from the table, we can delete them. It’s wise to use begin tran to create a transaction in your statement and check the result before to commit. If the result is not what you expect, you can still rollback your transaction safely.

    begin tran
    delete dummytable where keycolumn not in (select max( keycolumn )
    from dummytable group by column1,column2)
    go
    select * from dummytable
    go
     keycolumn    column1    column2
     ------------ ---------- --------------------
               10 AAA        AAA
               11 AAA        BBB
               12 BBB        BBB
               13 BBB        AAA
    
    (4 rows affected)
    commit tran
    go

    Tags: , , , , , ,