Here’s How to Delete Duplicate Rows in SQL with Syntax and Examples | Simplilearn

Simplilearn
Simplilearn Engineering
8 min readMay 7, 2021

In SQL tables, there may be duplicate rows which often leads to data inconsistency. These problems can be overcome with a primary key, but sometimes when these rules are not followed or an exception occurs, the problem becomes more daunting. It is a better practice to use the relevant keys and constraints to remove the risk of duplicate rows. To clean up duplicate data, you must use some special procedures.

Deleting Duplicate Rows in SQL

In an SQL Server table, duplicate records can be a serious problem. Duplicate data can result in orders being handled several times, incorrect reporting results, and more. In SQL Server, there are many options for dealing with duplicate records in a table, depending on the circumstances. They are:

  • Unique constraints in Table
  • No Unique constraints in Table

Unique Constraints in Table

According to Delete duplicate rows in SQL, a table with a unique index can use the index to identify duplicate data, and then delete the duplicate records. Self-joins, ordering the data by the maximum value, using the RANK function, or using NOT IN logic is used to perform identification.

No Unique Constraints in Table

Under Delete duplicate rows in SQL, it’s a little difficult for tables without a special index. The ROW NUMBER() function can be used in connection with a common table expression (CTE) to sort the data and then remove the duplicate records.

SQL Delete Duplicate Rows using Group By and Having Clause

According to Delete Duplicate Rows in SQL, for finding duplicate rows, you need to use the SQL GROUP BY clause. The COUNT function can be used to verify the occurrence of a row using the Group by clause, which groups data according to the given columns.

Code:

Creating and inserting data into the table:

First, you need to create a table as follows:

CREATE TABLE stud(Regno integer, Name text,Marks integer);

/* Create few records in this table */

INSERT INTO stud VALUES(1,’Tom’,77);

INSERT INTO stud VALUES(2,’Lucy’,78);

INSERT INTO stud VALUES(3,’Frank’,89);

INSERT INTO stud VALUES(4,’Jane’,98);

INSERT INTO stud VALUES(5,’Robert’,78);

INSERT INTO stud VALUES(3,’Frank’,89);

INSERT INTO stud VALUES(5,’Robert’,78);

INSERT INTO stud VALUES(4,’Jane’,98);

COMMIT;

Fetching and Identifying the duplicate rows in SQL

/* Display all the records from the table */

SELECT * FROM stud;

Input

Output

The table depicted above consisting of data repetition, that is duplicate data, can be deleted using the group by clause as follows:

Deleting Duplicate Rows From the Table Using Group by and Having Clause

Under SQL, delete duplicate Rows in SQL is done with the Group by and Having clause. It is done as follows:

Code:

select Name,Marks,grade,count(*) as cnt from stud group by Name,Marks,grade having count(*) > 1;

Input:

Output:

SQL Delete Duplicate Rows Using Common Table Expressions (CTE)

Common Table Expression

In delete duplicate rows in SQL, the acronym CTE stands for “common table expression.” It’s a named temporary result set created by a simple query and specified within the scope of a single SELECT, INSERT, UPDATE, or DELETE expression. You can write complex recursive queries using CTE. It has a lot more traction than temporary tables.

Syntax

WITH [CTEName]

As

( Select col1,col2,col3 from [tablename] where [condition]

Select col1,col2,col3 from [CTEName]

Procedure for Removing the Duplicate Rows Using CTE

First, you need to make a table Employ_DB table script to SQL Server and run it in the required database.

Creating the Table

Create table Employ_DB(emp_no number(10),emp_name varchar(20),emp_address varchar(25), emp_eoj date);

Input:

Output

Inserting Data Into the Table

After the table has been established, you must now insert some records into it, including some duplicates.

Code:

Insert into Employ_DB values(11,’Mohith’,’tokya’,’12-may-2000');

Input:

Output:

(8) rows inserted

Searching Data From the Table

Next, execute the preceding script and use the following query to search for the data in the table.

To print the records from the Employ_EB in a sorted list, use the field name and address of an employee.

Code:

Select * from Employ_DB order by emp_name,emp_address;

Input:

Output:

Deleting Duplicate Rows in SQL Using CTE

According to Delete Duplicate Rows in SQL, the above contains many of the duplicated records that have been fixed onto the table. For numbering duplicate city records by state, you’ll use CTE’s row number () feature. CTE can generate a temporary result set that you can use to remove redundant records from the actual table with a single question.

Code:

With CTE as (Select emp_no,emp_name,row_number()

Over (partition by emp_no order by emp_no) as number_of_employ

From Employ_DB)Select * from CTE order by emp_name,emp_no;

Finally, delete the duplicate record using the Common Type expression is as follows:

Code:

With CTE as (Select emp_no,emp_name,row_number()

Over (partition by emp_no order by emp_no) as number_of_employ

From Employ_DB)

Select * from CTE where number of employ >1 order by emp_no;

According to Delete Duplicate Rows in SQL, in the above table, only two of the records are duplicated based on the emp_no. So, now you are going to delete that duplicate record from the table Employ_DB using the following code.

Rank Function to SQL Delete Duplicate Rows

According to Delete Duplicate Rows in SQL, you can also use the SQL RANK feature to get rid of the duplicate rows. Regardless of duplicate rows, the SQL RANK function returns a unique row ID for each row.

You need to use aggregate functions like Max, Min, and AVG to perform calculations on data. Using these functions, you get a single output row. The SQL RANK functions in SQL Server allow you to specify a rank for individual fields based on categorizations. For each participating line, it returns an aggregated value. RANK functions in SQL are often referred to as Window Functions.

You can implement Rank functions in four ways.

  1. ROW_NUMBER()
  2. RANK()
  3. DENSE_RANK()
  4. NTILE()

According to Delete Duplicate Rows in SQL, the PARTITION BY clause is used with the RANK function in the following query. The PARTITION BY clause divides the data into subsets for the listed columns and assigns a rank to each partition.

Code:

Select *,

RANK() OVER ( PARTITION BY Animal_id,Animal_name ORDER BY sno DESC )rank

From Animals ) T on A.sno=T.sno

Where T.Rank>1

Alter table Animals

Drop column sno

Select * from Animals

You can see in the screenshot that the row with a rank greater than one needs to be removed. Let’s use the following question to get rid of those rows.

Use SSIS Package to SQL Delete Duplicate Rows

The SQL Server integration service includes several transformations and operators that aid administrators and developers in reducing manual labor and optimizing tasks. The SSIS package can also be used to delete duplicate rows from a SQL table.

Use Sort Operator in an SSIS Package for Removing Duplicating Rows

A Sort operator can be used to sort the values in a SQL table. You may wonder how data sorting will help you get rid of duplicate rows. Here is how it is.

To demonstrate this challenge, build an SSIS kit.

First, build a new Integration kit in SQL Server Data Tools. Add an OLE DB source link to the new bundle.

Gain expertise in the latest Business analytics tools and techniques with the Business Analyst Certification Program. Enroll now!

Conclusion

In this article, you looked at how to remove duplicate rows in SQL using T-SQL, CTE, and the SSIS kit, among other methods. You are free to use whatever approach makes you feel most at ease. However, it is advised against directly implementing these procedures and packaging on output results. You should conduct your tests in a less demanding setting.

To become an expert in SQL programming language, join our Simplilearn’s SQL Certification Training Course. This SQL certification course provides you with everything you need to get started working with SQL databases and incorporating them into your applications. Learn how to organize your database correctly, write effective SQL statements and clauses, and maintain your SQL database for scalability. This course includes comprehensive coverage of SQL basics, comprehensive coverage of all relevant query tools and SQL commands, an industry-recognized course completion certificate, and lifetime access to self-paced learning.

Despite the fact that SQL is an old language, it is still very useful today as companies all over the world collect massive amounts of data. SQL is one of the most in-demand engineering skills, and mastering it will vastly improve your resume.

Database and relationship management, Query tools and SQL commands, Aggregate functions, Group by clause, Tables and joins, Subqueries, Data manipulation, Transaction control, Views, and procedures are among the skills covered.

Start learning for free today’s most in-demand skills. This course emphasize the development of good core skills for future career advancement. Experts in the field will teach you. Get access to 300+ job-ready skills in today’s most in-demand fields. Learn from anywhere, on any laptop, while working or studying. Explore free courses here. You can find free guides on various career paths, wages, interview tips, and more.

Have any questions for us? Leave them in the comments section of this article, and our experts will get back to you on them, as soon as possible!

Originally published at https://www.simplilearn.com on May 7, 2021.

--

--

Simplilearn
Simplilearn Engineering

The world’s leading digital skills provider & an Onalytica Top 20 brand. Our 400+ courses have helped 1,000,000+ professionals advance their careers