gotchaMajor
Why does ANSI SQL define SUM(no rows) as NULL?
Viewed 0 times
rowswhysqlnullansidefinedoessum
Problem
The ANSI SQL standard defines (chapter 6.5, set function specification) the following behaviour for aggregate functions on empty result sets:
Returning NULL for AVG, MIN and MAX makes perfect sense, since the average, minimum and maximum of an empty set is undefined.
The last one, however, bothers me: Mathematically, the SUM of an empty set is well-defined:
Defining
Is there some obvious advantage of the choice that was made (SUM being NULL) that I have missed?
COUNT(...) = 0
AVG(...) = NULL
MIN(...) = NULL
MAX(...) = NULL
SUM(...) = NULLReturning NULL for AVG, MIN and MAX makes perfect sense, since the average, minimum and maximum of an empty set is undefined.
The last one, however, bothers me: Mathematically, the SUM of an empty set is well-defined:
0. Using 0, the neutral element of addition, as the base case makes everything consistent:SUM({}) = 0 = 0
SUM({5}) = 5 = 0 + 5
SUM({5, 3}) = 8 = 0 + 5 + 3
SUM({5, NULL}) = NULL = 0 + 5 + NULLDefining
SUM({}) as null basically makes "no rows" a special case that does not fit in with the others:SUM({}) = NULL = NULL
SUM({5}) = 5 != NULL + 5 (= NULL)
SUM({5, 3}) = 8 != NULL + 5 + 3 (= NULL)Is there some obvious advantage of the choice that was made (SUM being NULL) that I have missed?
Solution
I'm afraid that the reason is simply that the rules were set in an adhoc fashion (like quite many other "features" of the ISO SQL standard) at a time when SQL aggregations and their connection with mathematics were less understood than they are now (*).
It's just one of the extremely many inconsistencies in the SQL language. They make the language harder to teach, harder to learn, harder to understand, harder to use, harder to whatever you want, but that's just the way things are. The rules cannot be changed "cold" and "just like that", for obvious reasons of backward compatibility (If the ISO committee publishes a final version of the standard, and vendors then set out to implement that standard, then those vendors will not appreciate it very much if in a subsequent version, the rules are changed such that existing (compliant) implementations of the former version of the standard "automatically fail to comply" the new version ...)
(*) It is now better understood that aggregations over an empty set behave more consistently if they systematically return the identity value (= what you call the 'neutral element') of the underlying binary operator at hand. That underlying binary operator for COUNT and SUM is addition, and its identity value is zero. For MIN and MAX, that identity value is the highest and lowest value of the type at hand, respectively, if the concerned types are finite. Cases like averaging, harmonic means, medians, etc. are extremely intricate and exotic in this respect, though.
It's just one of the extremely many inconsistencies in the SQL language. They make the language harder to teach, harder to learn, harder to understand, harder to use, harder to whatever you want, but that's just the way things are. The rules cannot be changed "cold" and "just like that", for obvious reasons of backward compatibility (If the ISO committee publishes a final version of the standard, and vendors then set out to implement that standard, then those vendors will not appreciate it very much if in a subsequent version, the rules are changed such that existing (compliant) implementations of the former version of the standard "automatically fail to comply" the new version ...)
(*) It is now better understood that aggregations over an empty set behave more consistently if they systematically return the identity value (= what you call the 'neutral element') of the underlying binary operator at hand. That underlying binary operator for COUNT and SUM is addition, and its identity value is zero. For MIN and MAX, that identity value is the highest and lowest value of the type at hand, respectively, if the concerned types are finite. Cases like averaging, harmonic means, medians, etc. are extremely intricate and exotic in this respect, though.
Context
StackExchange Database Administrators Q#25435, answer score: 20
Revisions (0)
No revisions yet.