patternsqlMinor
Deleting all duplicates
Viewed 0 times
deletingduplicatesall
Problem
I am trying to delete all the duplicates but keeping single record only (shorter id).
Following query deletes duplicates but take lot of iterations to delete all copies and keeping original ones.
Its MySQL.
DDL
Following query deletes duplicates but take lot of iterations to delete all copies and keeping original ones.
DELETE FROM emailTable WHERE id IN (
SELECT * FROM (
SELECT id FROM emailTable GROUP BY email HAVING ( COUNT(email) > 1 )
) AS q
)Its MySQL.
DDL
CREATE TABLE `emailTable` (
`id` mediumint(9) NOT NULL auto_increment,
`email` varchar(200) NOT NULL default '',
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=298872 DEFAULT CHARSET=latin1Solution
Try this:
The above worked for my test of 50 emails (5 different emails duplicated 10 times).
You might need to add an index on the 'email' column:
It might be a bit slow fro 250,000 rows. It was slow for me on a table that had 1.5million rows (properly indexed), which is how I came up with this strategy:
The benefit of the memory table is there's an index that is used (primary key on minID) that speeds up the process over a normal temporary table.
DELETE FROM emailTable WHERE NOT EXISTS (
SELECT * FROM (
SELECT MIN(id) minID FROM emailTable
GROUP BY email HAVING COUNT(*) > 0
) AS q
WHERE minID=id
)The above worked for my test of 50 emails (5 different emails duplicated 10 times).
You might need to add an index on the 'email' column:
ALTER TABLE emailTable ADD INDEX ind_email (email);It might be a bit slow fro 250,000 rows. It was slow for me on a table that had 1.5million rows (properly indexed), which is how I came up with this strategy:
/* CREATE MEMORY TABLE TO HOUSE IDs of the MIN */
CREATE TABLE email_min (minID INT, PRIMARY KEY(minID)) ENGINE=Memory;
/* INSERT THE MINIMUM IDs */
INSERT INTO email_min SELECT id FROM email
GROUP BY email HAVING MIN(id);
/* MAKE SURE YOU HAVE RIGHT INFO */
SELECT * FROM email
WHERE NOT EXISTS (SELECT * FROM email_min WHERE minID=id)
/* DELETE FROM EMAIL */
DELETE FROM email
WHERE NOT EXISTS (SELECT * FROM email_min WHERE minID=id)
/* IF ALL IS WELL, DROP MEMORY TABLE */
DROP TABLE email_min;The benefit of the memory table is there's an index that is used (primary key on minID) that speeds up the process over a normal temporary table.
Code Snippets
DELETE FROM emailTable WHERE NOT EXISTS (
SELECT * FROM (
SELECT MIN(id) minID FROM emailTable
GROUP BY email HAVING COUNT(*) > 0
) AS q
WHERE minID=id
)ALTER TABLE emailTable ADD INDEX ind_email (email);/* CREATE MEMORY TABLE TO HOUSE IDs of the MIN */
CREATE TABLE email_min (minID INT, PRIMARY KEY(minID)) ENGINE=Memory;
/* INSERT THE MINIMUM IDs */
INSERT INTO email_min SELECT id FROM email
GROUP BY email HAVING MIN(id);
/* MAKE SURE YOU HAVE RIGHT INFO */
SELECT * FROM email
WHERE NOT EXISTS (SELECT * FROM email_min WHERE minID=id)
/* DELETE FROM EMAIL */
DELETE FROM email
WHERE NOT EXISTS (SELECT * FROM email_min WHERE minID=id)
/* IF ALL IS WELL, DROP MEMORY TABLE */
DROP TABLE email_min;Context
StackExchange Database Administrators Q#5859, answer score: 8
Revisions (0)
No revisions yet.