patternsqlMinor

MySQL: Using DISTINCT and GROUP BY together?

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

distinctgrouptogethermysqlusingand

Problem

I saw the following MySQL query that that uses both DISTINCT and GROUP BY together:

SELECT  DISTINCT user_id, post_id, post_content
    FROM  some_table
    GROUP BY  post_id, user_id
    HAVING  post_content LIKE '%abc%';

Here is a scenario to go along with the query: Each user has a unique id, user_id, and can make multiple posts which are identified by a unique id, post_id. Each post would contain some text.

I found this confusing(after coming from Oracle DBs) and had below questions:

What is the meaning of using GROUP BY without doing any aggregation?

What is the significance of switching the order of columns in SELECT vs in GROUP BY?

What is the meaning of omitting the third column from GROUP BY?

Why is DISTINCT used along with GROUP BY? Does distinct operation run after all the groupings are done on the final result or before?

Solution

The insane ability to allow partial group by in older versions of MySQL, has to be one top contender for most caused confusion in the it industry.

Given the table:

CREATE TABLE t
( x int not null primary key
, y int not null 
);

INSERT INTO t (x,y) VALUES (1,1),(1,2);

The statement

SELECT x, y FROM t GROUP BY x

could mean (1,1) or (1,2) and MySQL would randomly return one of these. DISTINCT does not matter in this case, the result is still in-deterministic.

SQL92 required that all columns in the select clause (except aggregated columns, and constants) is part of the GROUP BY clause.

SQL99 loosened this restriction a bit and allowed us to leave out columns from the GROUP BY that are functionally dependent of the remaining columns. I.e.

CREATE TABLE t
( x int not null primary key
, y int not null 
);

SELECT x, y FROM t GROUP by x

would be valid since y is f.d. of x

Surprisingly enough (for me) later version of MySQL is best in class when it comes to implement the SQL99 version. I haven't checked it lately, but when I did MySQL handled fairly complicated scenarios well, where as PostgreSQL only handled trivial ones.

To answer your questions

1)

SELECT x, y FROM t GROUP BY x, y

means that the combination of x, y is a group. In all possible situations I can think of this is the same as:

SELECT DISTINCT x, y FROM t

Since they are logically evaluated at different times, there might be some corner-case where they would actually differ (I cant think of one though)

2) None, in this regard they are a set of columns, so there is no order

3) See above.

4) The logical order of evaluation of an SQL query is:

FROM, JOIN 
WHERE
GROUP BY
HAVING
SELECT
DISTINCT
ORDER BY
FETCH FIRST

so GROUP BY is supposed to be evaluated before DISTINCT. I can not think of a situation where this would matter.

In your query I suspect that someone got confusing results, and tried to get another result using DISTINCT. They probably where lucky (or unlucky) to get the result they expected, so the DISTINCT stayed. The bug is still there though

Code Snippets

CREATE TABLE t
( x int not null primary key
, y int not null 
);

INSERT INTO t (x,y) VALUES (1,1),(1,2);

SELECT x, y FROM t GROUP BY x

CREATE TABLE t
( x int not null primary key
, y int not null 
);

SELECT x, y FROM t GROUP by x

SELECT x, y FROM t GROUP BY x, y

SELECT DISTINCT x, y FROM t

Context

StackExchange Database Administrators Q#262408, answer score: 5

Revisions (0)

No revisions yet.