HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Check if DISTINCT is really necessary

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
distinctcheckreallynecessary

Problem

I have a query in mysql as follows.

SELECT DISTINCT `tableA`.`ColA`, `tableA`.`ColB`, `tableB`.`ColA`, ...
FROM `tableA`
LEFT JOIN `tableB` ON `tableA`.`colC` = `tableB`.`colA`
WHERE 
ORDER BY `tableA`.`ColA` DESC `tableB`.`ColA` ASC
LIMIT 0,20


Now executing the query takes approximately 13 seconds due to the size of the respective tables. However this query is a lot faster (i.e. 2 seconds)

SELECT `tableA`.`ColA`, `tableA`.`ColB`, `tableB`.`ColA`, ...
FROM `tableA`
LEFT JOIN `tableB` ON `tableA`.`colC` = `tableB`.`colA`
WHERE 
ORDER BY `tableA`.`ColA` DESC `tableB`.`ColA` ASC
LIMIT 0,20


I'm aware that this may result in non-distinct rows but I'm not sure if this situation has been taken care of by the php code on the front end. So I actually want to compare the data of the two result sets both of which result in approximately 83,000 rows if not limited.

Can anyone think of a way to do this in SQL instead of doing it via a php script. I've compared the first 100 rows by eye without any issues but as we all know it only takes one exception at the 101st row to mess everything up. So I need a query that I can execute a few times every so often to see if there are rows that are in one result set that are not in the other.

Solution

OK, first things first, talk to the PHP devs to see if this situation is taken care of for you, which would then save a lot of pain and heartache!

Other than that, try with a group by instead of a distinct, see what the performance differences are - it may be that the group by is faster based on the indexes you have on the tables.

With the GROUP BY option, you can add a count(*) then talk to the php devs to ensure they know that if that result is 2 or more, then they need to handle it differently than if the result is a 1.

Context

StackExchange Database Administrators Q#5131, answer score: 5

Revisions (0)

No revisions yet.