debugsqlModerate
ERROR: column "field" must appear in the GROUP BY clause or be used in an aggregate function
Viewed 0 times
fielderrorthemustcolumngroupusedfunctionclauseappear
Problem
I've hit an SQL problem that I'm a bit mystified by. I have two different
questions regarding this problem:
The following query:
...produces the following error:
That field is a
or (more important to my understanding) why one is necessary.
As I said, I'm not sure I understand why this occurs. I'm assuming that I
don't understand
This isn't my query, I'm translating a system prototyped in MySQL to
PostgreSQL. This query does work in MySql. Does that constitute a
bug in MySql, or a shortcoming of PostgreSQL, or just a difference of
interpretation?
And when I added this column in the
Working query
```
SELECT subs.date_added, users.userid, users.username, users.email,
users.avatar, users.fbuid, users.level, users.avatar_url,
users.sex, users.dob, users.profile_hits, users.total_videos,
users.subscribers, users.doj, users.extras, users.first_name,
users.last_name, users.ban_status, users.usr_status,
users.last_logged, users.country, users.user_filter_level,
users.signup_ip, subs.us
questions regarding this problem:
- Why?
- How do I work around it?
The following query:
SELECT subs.date_added,subs.subscribed_to,users.userid, users.username, users.email, users.avatar, users.fbuid, users.level, users.avatar_url, users.sex, users.dob, users.profile_hits, users.total_videos, users.subscribers, users.doj, users.extras, users.first_name, users.last_name, users.ban_status, users.usr_status, users.last_logged, users.country, users.user_filter_level, users.signup_ip, subs.userid as subscriber
FROM cb_subscriptions as subs
LEFT JOIN cb_users AS users
ON subs.userid=users.userid
WHERE subs.subscribed_to ='2960'
GROUP BY subs.subscribed_to ,subs.userid
ORDER BY subs.date_added ASC
LIMIT 8;...produces the following error:
ERROR: column "users.userid" must appear in the GROUP BY clause or be used in an aggregate functionThat field is a
CHAR, so I'm not sure what kind of aggregate to use,or (more important to my understanding) why one is necessary.
As I said, I'm not sure I understand why this occurs. I'm assuming that I
don't understand
group by as well as I thought I did ;)This isn't my query, I'm translating a system prototyped in MySQL to
PostgreSQL. This query does work in MySql. Does that constitute a
bug in MySql, or a shortcoming of PostgreSQL, or just a difference of
interpretation?
And when I added this column in the
group by clause it's working, but I don't understand why?Working query
```
SELECT subs.date_added, users.userid, users.username, users.email,
users.avatar, users.fbuid, users.level, users.avatar_url,
users.sex, users.dob, users.profile_hits, users.total_videos,
users.subscribers, users.doj, users.extras, users.first_name,
users.last_name, users.ban_status, users.usr_status,
users.last_logged, users.country, users.user_filter_level,
users.signup_ip, subs.us
Solution
This isn't my query, I'm translating a system prototyped in MySQL to PostgreSQL. This query does work in MySql.
Does that constitute a bug in MySql, or a shortcoming of PostgreSQL, or just a difference of interpretation?
More like all of the above ;)
Does that constitute a bug in MySql?
It's a "feature" that many consider a bug. Mainly because it can be misused, much like you (or whoever wrote the first query) misused it.
The correct use of the feature is to have a "shorter"
So, if there is a unique constraint on
Your DDL definitions do not have such
It's also possible that your data do not conform to such a constraint, in which case the query has been giving you wrong results - and you just haven't noticed! Check your data and the output of the query (in MySQL). Do you have many subscriptions for the same (magazine or whatever) and the same user? In those case, which
The rest of the columns (from
Please also note that MySQL does not really check all these things (at least until version 5.6). It's the developers' responsibility that the feature is correctly used. And since it is so complicated to get it right, it's no wonder that it is widely misused, giving incorrect or inconsistent or even plainly inexistent results in many cases. That's why many people considered it a bug. (The feature has been widely improved in version 5.7 of MySQL.)
Does that constitute a shortcoming of PostgreSQL.
Not really. Postgres has implemented the same feature, but slightly differently - more restrictedly than MySQL. Postgres only uses it when the
Does that constitute a difference of interpretation?
As explained above, yes. The two DBMS have implemented the feature differently, so the query is interpreted differently.
MySQL (up to 5.6) says "I don't care, I'll give you some results. You are responsible if they are not correct)"
Postgres says "I'll only give you results if I'm 100% sure that they are consistent and correct."
So, what to do so the query works correctly?
We have two issues:
-
For the
-
For the
The query becomes - and works correctly in all MySQL and Postgres versions:
Does that constitute a bug in MySql, or a shortcoming of PostgreSQL, or just a difference of interpretation?
More like all of the above ;)
Does that constitute a bug in MySql?
It's a "feature" that many consider a bug. Mainly because it can be misused, much like you (or whoever wrote the first query) misused it.
The correct use of the feature is to have a "shorter"
GROUP BY clause. Any column that is functionally dependent on the GROUP BY columns can be omitted from the GROUP BY list and freely used in SELECT and ORDER BY without aggregates. So, if there is a unique constraint on
subs (subscribed_to, userid), then your 1st query is valid SQL - according to some (2011 or 2013) added feature for GROUP BY). Your DDL definitions do not have such
UNIQUE constraint, so it's misuse from your developers. The reason that it produces correct results (if it does), is likely that the data conform to such a UNIQUE constraint, although it is not enforced. It seems plausible that a user can have only one subscription to a magazine (or whatever that subscribed_to refers).It's also possible that your data do not conform to such a constraint, in which case the query has been giving you wrong results - and you just haven't noticed! Check your data and the output of the query (in MySQL). Do you have many subscriptions for the same (magazine or whatever) and the same user? In those case, which
date_added is returned? And ask the business users (whoever knows the requirements of the query), which date_added should be returned? The last one (of the many from a user and magazine)? The first one? A random one?The rest of the columns (from
users table) in the SELECT list are actually correctly shown, as the join between the two tables is on userid which appears on the GROUP BY list and has a UNIQUE constraint on users.Please also note that MySQL does not really check all these things (at least until version 5.6). It's the developers' responsibility that the feature is correctly used. And since it is so complicated to get it right, it's no wonder that it is widely misused, giving incorrect or inconsistent or even plainly inexistent results in many cases. That's why many people considered it a bug. (The feature has been widely improved in version 5.7 of MySQL.)
Does that constitute a shortcoming of PostgreSQL.
Not really. Postgres has implemented the same feature, but slightly differently - more restrictedly than MySQL. Postgres only uses it when the
GROUP BY columns have the PRIMARY KEY of one (or more) tables. In that case, you can use the freely rest of the columns (of those tables) in the SELECT and ORDER BY lists without aggregates. It doesn't work with UNIQUE constraints and it doesn't work with (provable) constraints from foreign keys.Does that constitute a difference of interpretation?
As explained above, yes. The two DBMS have implemented the feature differently, so the query is interpreted differently.
MySQL (up to 5.6) says "I don't care, I'll give you some results. You are responsible if they are not correct)"
Postgres says "I'll only give you results if I'm 100% sure that they are consistent and correct."
So, what to do so the query works correctly?
We have two issues:
-
For the
users table, it's simple: add users.userid in the GROUP BY list.-
For the
subs table and the date_added column, it's important to know what results your users want/expect. If they want the earliest date (per user and subscribed_to), then use MIN(date_added). If they want the latest, use MAX(date_added).The query becomes - and works correctly in all MySQL and Postgres versions:
SELECT MIN(subs.date_added) AS min_date_added,
users.userid, users.username, --- any users column you need
---
subs.userid AS subscriber
FROM cb_subscriptions AS subs
LEFT JOIN cb_users AS users
ON subs.userid = users.userid
WHERE subs.subscribed_to = '2960'
GROUP BY users.userid,
-- subs.subscribed_to, -- not needed really, it's fixed in WHERE
subs.userid
ORDER BY min_date_added ASC
LIMIT 8 ;Code Snippets
SELECT MIN(subs.date_added) AS min_date_added,
users.userid, users.username, --- any users column you need
---
subs.userid AS subscriber
FROM cb_subscriptions AS subs
LEFT JOIN cb_users AS users
ON subs.userid = users.userid
WHERE subs.subscribed_to = '2960'
GROUP BY users.userid,
-- subs.subscribed_to, -- not needed really, it's fixed in WHERE
subs.userid
ORDER BY min_date_added ASC
LIMIT 8 ;Context
StackExchange Database Administrators Q#209371, answer score: 15
Revisions (0)
No revisions yet.