Last Updated : 15 Jul, 2025
In PostgreSQL, handling duplicate rows is a common task, especially when working with large datasets. Fortunately, PostgreSQL provides several techniques to efficiently delete duplicate rows, and one of the most effective approaches is using subqueries.
In this article, we will demonstrate how to identify and remove duplicate rows while keeping the row with either the lowest or highest ID, depending on your requirements.
Setting Up a Sample Table
For demonstration, let's set up a sample table named 'basket' that stores fruits:
PostgreSQL
CREATE TABLE basket(
id SERIAL PRIMARY KEY,
fruit VARCHAR(50) NOT NULL
);
INSERT INTO basket(fruit) values('apple');
INSERT INTO basket(fruit) values('apple');
INSERT INTO basket(fruit) values('orange');
INSERT INTO basket(fruit) values('orange');
INSERT INTO basket(fruit) values('orange');
INSERT INTO basket(fruit) values('banana');
SELECT * FROM basket;
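On a freshly created table, this should return six rows: two 'apple' rows (ids 1 and 2), three 'orange' rows (ids 3 to 5), and one 'banana' row (id 6).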
Now that the sample table is set up, we can find the duplicated values with the following query.
Query:
SELECT fruit, COUNT( fruit ) FROM basket GROUP BY fruit HAVING COUNT( fruit )> 1 ORDER BY fruit;
This should return two rows: 'apple' with a count of 2 and 'orange' with a count of 3. 'banana' appears only once, so it is not listed.
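The grouped query reports which values are duplicated but not which rows carry them. As a minimal sketch, assuming the 'basket' table created above, the following query lists every row that belongs to a duplicate group together with its id, which makes it easier to see which rows a later DELETE would keep or remove.

```sql
-- List each row whose fruit value occurs more than once in basket.
SELECT id, fruit
FROM basket
WHERE fruit IN (
    -- Same duplicate check as above, reduced to the fruit names.
    SELECT fruit
    FROM basket
    GROUP BY fruit
    HAVING COUNT(*) > 1
)
ORDER BY fruit, id;
```

With the sample data, and assuming the ids were assigned 1 through 6 in insertion order, this should list the two 'apple' rows and the three 'orange' rows.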
Deleting Duplicate Rows with a Subquery
To delete the duplicate rows while keeping the row with the lowest ID, you can use a subquery with the 'ROW_NUMBER()' window function. This method ensures that only one row per fruit is retained, and all other duplicates are removed.
Query:
DELETE FROM basket WHERE id IN (SELECT id FROM (SELECT id, ROW_NUMBER() OVER( PARTITION BY fruit ORDER BY id ) AS row_num FROM basket ) t WHERE t.row_num > 1 );
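Explanation:
The inner query numbers the rows within each group of identical 'fruit' values, ordered by 'id', so the row with the lowest ID in each group gets 'row_num' 1. The outer subquery selects every ID whose 'row_num' is greater than 1, and the DELETE statement removes exactly those rows.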
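Because a DELETE cannot be undone once committed, it can help to preview its effect first. The sketch below is one possible way to do that, not part of the original walkthrough: it runs the same statement inside a transaction and uses PostgreSQL's RETURNING clause to show the removed rows, then rolls the change back.

```sql
BEGIN;

DELETE FROM basket
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY fruit ORDER BY id) AS row_num
        FROM basket
    ) t
    WHERE t.row_num > 1
)
RETURNING id, fruit;  -- shows the rows this statement removed

-- After inspecting the returned rows, undo the deletion.
-- Replace ROLLBACK with COMMIT to make it permanent instead.
ROLLBACK;
```

With the sample data, and assuming ids 1 through 6, the returned rows should be ids 2, 4, and 5. After the ROLLBACK the table is unchanged.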
If you want to keep the row with the highest ID instead, sort each partition in descending order by adding DESC to the ORDER BY in the subquery:
DELETE FROM basket WHERE id IN (SELECT id FROM (SELECT id, ROW_NUMBER() OVER( PARTITION BY fruit ORDER BY id DESC ) AS row_num FROM basket ) t WHERE t.row_num > 1 );
This query will retain the row with the highest ID for each duplicate group and delete all other duplicates.
Deleting Duplicates Based on Multiple Columns
To delete duplicates based on the values of multiple columns, use the following query template.
Query:
DELETE FROM table_name WHERE id IN (SELECT id FROM (SELECT id, ROW_NUMBER() OVER( PARTITION BY column_1, column_2 ORDER BY id ) AS row_num FROM table_name ) t WHERE t.row_num > 1 );
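For a concrete instance of this template, the sketch below uses a hypothetical table 'basket_items' with a 'color' column. Neither the table nor the column appears in the original example; they are assumptions made only for illustration.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE basket_items(
    id SERIAL PRIMARY KEY,
    fruit VARCHAR(50) NOT NULL,
    color VARCHAR(50) NOT NULL
);

INSERT INTO basket_items(fruit, color) VALUES
    ('apple', 'red'),
    ('apple', 'red'),      -- duplicate of the first row
    ('apple', 'green'),    -- same fruit, different color: not a duplicate
    ('orange', 'orange');

-- A row is a duplicate only if both fruit and color match an earlier row.
DELETE FROM basket_items
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY fruit, color ORDER BY id) AS row_num
        FROM basket_items
    ) t
    WHERE t.row_num > 1
);

SELECT * FROM basket_items ORDER BY id;
```

Only the second ('apple', 'red') row should be removed here, because the green apple differs in one of the partitioning columns.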
Explanation:
The PARTITION BY clause includes multiple columns ('column_1', 'column_2'), so duplicates are identified by the combination of those columns. The statement deletes every row whose 'column_1' and 'column_2' values repeat those of another row, keeping only the row with the lowest ID in each group.
To verify that the duplicates in the 'basket' table are gone, run the duplicate check again.
Query:
SELECT fruit, COUNT( fruit ) FROM basket GROUP BY fruit HAVING COUNT( fruit )> 1 ORDER BY fruit;
Output:
If the deletion was successful, this query should return an empty result set, indicating no duplicates remain.