HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Does the SQL Spec require a GROUP BY in EXISTS ()

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
thegroupsqlexistsrequiredoesspec

Problem

Microsoft currently permits this syntax.

SELECT *
FROM ( VALUES (1) ) AS g(x)
WHERE EXISTS (
  SELECT *
  FROM ( VALUES (1),(1) )
    AS t(x)
  WHERE g.x = t.x
  HAVING count(*) > 1
);


Notice that there is no GROUP BY in the EXISTS clause, is that valid ANSI SQL. Or is it merely exposing an implementation detail.

For reference, this same syntax isn't permitted in PostgreSQL.


ERROR: column "t.x" must appear in the GROUP BY clause or be used in an aggregate function

But this syntax is permitted..

SELECT *
FROM ( VALUES (1) ) AS g(x)
WHERE EXISTS (
  SELECT 1  -- This changed from the first query
  FROM ( VALUES (1),(1) )
    AS t(x)
  WHERE g.x = t.x
  HAVING count(*) > 1
);


And this syntax is permitted.

SELECT *
FROM ( VALUES (1) ) AS g(x)
WHERE EXISTS (
  SELECT *
  FROM ( VALUES (1),(1) )
    AS t(x)
  WHERE g.x = t.x
  GROUP BY t.x  -- This changed from the first query
  HAVING count(*) > 1
);


Question arises from a conversation with @ErikE in chat

Solution

I found it in the SQL 2011 spec...


If the ` “*” is simply contained in a that is immediately contained in an , then the is equivalent to a that is an arbitrary .

This confirms that by
* not being equivalent to an arbitrary literal in this context that it is in fact PostgreSQL breaking the spec.

Keep in mind this is a distinct problem from

SELECT *
FROM ( VALUES (1),(2),(3) ) AS t(x)
HAVING count(*) > 1


Which both databases reject.

PostgreSQL,


ERROR: column "t.x" must appear in the GROUP BY clause or be used in an aggregate function

SQL Server,


Column 't.x' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

Why this bug persists in PostgreSQL

Thanks goes to RhodiumToad on irc.freenode.net/#PostgreSQL for his help trouble shooting this. He also points out the difficulty in resolving this situation


20:33 the one problem is that in pg you can do exists(select func() from ... where the func() is an SRF that might return 0 rows

An SRF is a set returning function.

In PostgreSQL, we can do for instance use an SRF to generate a series from 1-10 (
generate_series` is in core)

SELECT * FROM generate_series(1,10);


And, we likewise can put it right here.

SELECT generate_series(1,10);


Two of them together give us a cross-join (cartesian product)

SELECT generate_series(1,10), generate_series(1,2);


But, if either of those return 0-rows you get nothing.. Effectually the same as this

SELECT * FROM ( VALUES (1) ) AS t(x)
CROSS JOIN ( SELECT 1 LIMIT 0 ) AS g;


And, that's the problem with optimizing this out entirely. You can have an SRF in a select-list inside of an EXIST statement that returns 0-rows, and forces the EXISTS to evaluate to false.

Code Snippets

SELECT *
FROM ( VALUES (1),(2),(3) ) AS t(x)
HAVING count(*) > 1
SELECT * FROM generate_series(1,10);
SELECT generate_series(1,10);
SELECT generate_series(1,10), generate_series(1,2);
SELECT * FROM ( VALUES (1) ) AS t(x)
CROSS JOIN ( SELECT 1 LIMIT 0 ) AS g;

Context

StackExchange Database Administrators Q#168047, answer score: 11

Revisions (0)

No revisions yet.