patternsqlMinor
Remove duplicates in sequence based on keys
Viewed 0 times
sequenceremovekeysbasedduplicates
Problem
I am looking to remove duplicates on a row by row comparison based on keys Member and Group and data sorted ASC on Member, Group and StartDate. Just return the records in the select query and not delete in the table.
I am basically looking at a row by row comparison to eliminate the duplicates based on the 2 keys and value of the Status column with the data sorted by Member, Group and Start_date.
Sample data:
Expected Output:
DDL and DML:
``
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Insert into temp1(Member_id, Groupie_id, Status,Start_date, End_Date )
values (1,2,'Active', '2011-04-12 01:19:51 PM', NULL);
Insert into temp1(Member_id, Groupie_id, Status,Start_date, End_Date )
values (1,2,'Active', '2011-04-12 02:19:51 PM', NULL);
Insert into temp1(Member_id, Groupie_id, Status,Start_date, End_Date )
values (1,2,'InActive',
I am basically looking at a row by row comparison to eliminate the duplicates based on the 2 keys and value of the Status column with the data sorted by Member, Group and Start_date.
Sample data:
Member Group Status Start_date End Date
1 2 Active 4/12/2011 01:19 NULL
1 2 Active 4/12/2011 02:19 NULL
1 2 Inactive 4/12/2011 03:19 NULL
1 2 Inactive 4/15/2011 04:19 NULL
1 2 Inactive 4/16/2011 05:19 NULL
1 2 Pending 4/16/2011 11:23 NULL
1 2 Active 4/16/2011 11:25 NULL
1 2 Pending 4/16/2011 11:27 NULL
11 22 Inactive 4/16/2011 11:27 NULLExpected Output:
Member Group Status Start_date End Date
1 2 Active 4/12/2011 01:19 NULL
1 2 Inactive 4/12/2011 03:19 NULL
1 2 Pending 4/16/2011 11:23 NULL
1 2 Active 4/16/2011 11:25 NULL
1 2 Pending 4/16/2011 11:27 NULL
11 22 Inactive 4/16/2011 11:27 NULLDDL and DML:
``
CREATE TABLE temp1 (
MEMBER_ID int(10) unsigned NOT NULL,
GROUPIE_ID int(10) unsigned NOT NULL,
STATUS varchar(12) CHARACTER SET utf8 NOT NULL,
START_DATE datetime DEFAULT NULL,
end_date` datetime DEFAULT NULL) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Insert into temp1(Member_id, Groupie_id, Status,Start_date, End_Date )
values (1,2,'Active', '2011-04-12 01:19:51 PM', NULL);
Insert into temp1(Member_id, Groupie_id, Status,Start_date, End_Date )
values (1,2,'Active', '2011-04-12 02:19:51 PM', NULL);
Insert into temp1(Member_id, Groupie_id, Status,Start_date, End_Date )
values (1,2,'InActive',
Solution
This works on consecutive
Sample SQL Fiddle.
The behavior of this query is similar to the
Notes on Partitions:
In term of performance, an index matching the order and columns used in the
Output:
Start_date for each partition (on Member):SELECT `Member`, `Group`, `Status`, `Start_date`, `End Date`
FROM (
SELECT @row := CASE WHEN @status=Status AND @member=Member AND @group = `Group`
THEN @row + 1 ELSE 1 END as row
, @member:=Member as Member
, @group:=`Group` as `Group`
, @status:=Status as Status
, `Start_date`, `End Date`
FROM data d
, (SELECT @row := 0, @status := '', @member := 0, @group := 0) v
ORDER BY Member, `Group`, Start_date
) as n
WHERE row = 1
ORDER BY Member, Start_date
;Sample SQL Fiddle.
The behavior of this query is similar to the
ROW_NUMBER() available in Oracle (>= 10g), PostgreSQL (>= 8.4) and SQL Server (>= 2012).Notes on Partitions:
- This query and SQL Fiddle partition by
Membersand orderStatusbyMembersandStart_date.
- Partition can be remove by removing
@member=MemberfromCASE
- Partition can be added or set by both
MembersandGrouplike this (SQL Fiddle):
- Add a test on
@group = 'group'to theCASE
- Add
@group := 'Group' as 'Group'in the innerSELECT[n]
- Add
@group := 0in the variableSELECT[v]
- Add
GrouptoORDER BYclauses
In term of performance, an index matching the order and columns used in the
ORDER BY and CASE is very likely needed.Output:
Member | Group | Status | Start_date | End Date
1 | 2 | Active | April, 12 2011 03:19:00 | (null)
1 | 2 | Inactive | April, 12 2011 05:19:00 | (null)
1 | 2 | Pending | April, 16 2011 13:23:00 | (null)
1 | 2 | Active | April, 16 2011 13:25:00 | (null)
1 | 2 | Pending | April, 16 2011 13:27:00 | (null)
11 | 22 | Inactive | April, 16 2011 13:27:00 | (null)Code Snippets
SELECT `Member`, `Group`, `Status`, `Start_date`, `End Date`
FROM (
SELECT @row := CASE WHEN @status=Status AND @member=Member AND @group = `Group`
THEN @row + 1 ELSE 1 END as row
, @member:=Member as Member
, @group:=`Group` as `Group`
, @status:=Status as Status
, `Start_date`, `End Date`
FROM data d
, (SELECT @row := 0, @status := '', @member := 0, @group := 0) v
ORDER BY Member, `Group`, Start_date
) as n
WHERE row = 1
ORDER BY Member, Start_date
;Member | Group | Status | Start_date | End Date
1 | 2 | Active | April, 12 2011 03:19:00 | (null)
1 | 2 | Inactive | April, 12 2011 05:19:00 | (null)
1 | 2 | Pending | April, 16 2011 13:23:00 | (null)
1 | 2 | Active | April, 16 2011 13:25:00 | (null)
1 | 2 | Pending | April, 16 2011 13:27:00 | (null)
11 | 22 | Inactive | April, 16 2011 13:27:00 | (null)Context
StackExchange Database Administrators Q#128298, answer score: 2
Revisions (0)
No revisions yet.