patternsqlMinor
MySQL: Using DISTINCT and GROUP BY together?
Viewed 0 times
distinctgrouptogethermysqlusingand
Problem
I saw the following MySQL query that that uses both DISTINCT and GROUP BY together:
Here is a scenario to go along with the query: Each user has a unique id,
I found this confusing(after coming from Oracle DBs) and had below questions:
SELECT DISTINCT user_id, post_id, post_content
FROM some_table
GROUP BY post_id, user_id
HAVING post_content LIKE '%abc%';Here is a scenario to go along with the query: Each user has a unique id,
user_id, and can make multiple posts which are identified by a unique id, post_id. Each post would contain some text. I found this confusing(after coming from Oracle DBs) and had below questions:
- What is the meaning of using
GROUP BYwithout doing any aggregation?
- What is the significance of switching the order of columns in
SELECTvs inGROUP BY?
- What is the meaning of omitting the third column from
GROUP BY?
- Why is
DISTINCTused along withGROUP BY? Does distinct operation run after all the groupings are done on the final result or before?
Solution
The insane ability to allow partial group by in older versions of MySQL, has to be one top contender for most caused confusion in the it industry.
Given the table:
The statement
could mean (1,1) or (1,2) and MySQL would randomly return one of these. DISTINCT does not matter in this case, the result is still in-deterministic.
SQL92 required that all columns in the select clause (except aggregated columns, and constants) is part of the GROUP BY clause.
SQL99 loosened this restriction a bit and allowed us to leave out columns from the GROUP BY that are functionally dependent of the remaining columns. I.e.
would be valid since y is f.d. of x
Surprisingly enough (for me) later version of MySQL is best in class when it comes to implement the SQL99 version. I haven't checked it lately, but when I did MySQL handled fairly complicated scenarios well, where as PostgreSQL only handled trivial ones.
To answer your questions
1)
means that the combination of x, y is a group. In all possible situations I can think of this is the same as:
Since they are logically evaluated at different times, there might be some corner-case where they would actually differ (I cant think of one though)
2) None, in this regard they are a set of columns, so there is no order
3) See above.
4) The logical order of evaluation of an SQL query is:
so GROUP BY is supposed to be evaluated before DISTINCT. I can not think of a situation where this would matter.
In your query I suspect that someone got confusing results, and tried to get another result using DISTINCT. They probably where lucky (or unlucky) to get the result they expected, so the DISTINCT stayed. The bug is still there though
Given the table:
CREATE TABLE t
( x int not null primary key
, y int not null
);
INSERT INTO t (x,y) VALUES (1,1),(1,2);The statement
SELECT x, y FROM t GROUP BY xcould mean (1,1) or (1,2) and MySQL would randomly return one of these. DISTINCT does not matter in this case, the result is still in-deterministic.
SQL92 required that all columns in the select clause (except aggregated columns, and constants) is part of the GROUP BY clause.
SQL99 loosened this restriction a bit and allowed us to leave out columns from the GROUP BY that are functionally dependent of the remaining columns. I.e.
CREATE TABLE t
( x int not null primary key
, y int not null
);
SELECT x, y FROM t GROUP by xwould be valid since y is f.d. of x
Surprisingly enough (for me) later version of MySQL is best in class when it comes to implement the SQL99 version. I haven't checked it lately, but when I did MySQL handled fairly complicated scenarios well, where as PostgreSQL only handled trivial ones.
To answer your questions
1)
SELECT x, y FROM t GROUP BY x, ymeans that the combination of x, y is a group. In all possible situations I can think of this is the same as:
SELECT DISTINCT x, y FROM tSince they are logically evaluated at different times, there might be some corner-case where they would actually differ (I cant think of one though)
2) None, in this regard they are a set of columns, so there is no order
3) See above.
4) The logical order of evaluation of an SQL query is:
FROM, JOIN
WHERE
GROUP BY
HAVING
SELECT
DISTINCT
ORDER BY
FETCH FIRSTso GROUP BY is supposed to be evaluated before DISTINCT. I can not think of a situation where this would matter.
In your query I suspect that someone got confusing results, and tried to get another result using DISTINCT. They probably where lucky (or unlucky) to get the result they expected, so the DISTINCT stayed. The bug is still there though
Code Snippets
CREATE TABLE t
( x int not null primary key
, y int not null
);
INSERT INTO t (x,y) VALUES (1,1),(1,2);SELECT x, y FROM t GROUP BY xCREATE TABLE t
( x int not null primary key
, y int not null
);
SELECT x, y FROM t GROUP by xSELECT x, y FROM t GROUP BY x, ySELECT DISTINCT x, y FROM tContext
StackExchange Database Administrators Q#262408, answer score: 5
Revisions (0)
No revisions yet.