Why are queries parsed in such a way that disallows the use of column aliases in most clauses?
Problem
While trying to write a query, I found out (the hard way) that SQL Server parses WHEREs in a query long before parsing the SELECTs when executing a query.
The MSDN docs say that the general logical parsing order is such that SELECT is parsed nearly last (thus resulting in "no such object [alias]" errors when trying to use a column alias in other clauses). There was even a suggestion to allow for aliases to be used anywhere, which was shot down by the Microsoft team, citing ANSI standards compliance issues (which suggests that this behavior is part of the ANSI standard).
As a programmer (not a DBA), I found this behavior somewhat confusing. It seems to largely defeat the purpose of having column aliases, since the only other place you can actually use them is ORDER BY; at the very least, aliases could be significantly more powerful if they were resolved earlier in query processing. It looks like a missed opportunity to make queries more powerful, convenient, and DRY.
It looks like it's such a glaring issue that it stands to reason, then, that there are other reasons for deciding that column aliases shouldn't be allowed in anything other than SELECT and ORDER BY, but what are those reasons?
Solution
Summary
There's no logical reason it couldn't be done, but the benefit is small and there are some pitfalls that may not be immediately apparent.
Research Results
I did some research and found some good information. The following is a direct quote from a reliable primary source (that wishes to remain anonymous) at 2012-08-09 17:49 GMT:
When SQL was first invented, it had no aliases in the SELECT clause.
This was a serious shortcoming that was corrected when the language
was standardized by ANSI in about 1986.
The language was intended to be "non-procedural"--in other words, to
describe the data that you want without specifying how to find it. So,
as far as I know, there's no reason why an SQL implementation couldn't
parse the whole query before processing it, and allow aliases to be
defined anywhere and used everywhere. For example, I don't see any
reason why the following query shouldn't be valid:
```
select name, salary + bonus as pay
from employee
where pay > 100000
```
Although I think this is a reasonable query, some SQL-based systems
may introduce restrictions on the use of aliases for some
implementation-related reason. I'm not surprised to hear that SQL
Server does this.
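For comparison, here is a minimal sketch of how the quoted query is usually written for SQL Server today. The table variable `@employee` and its sample rows are assumptions made up for illustration; only the column names come from the quote. The expression is repeated in WHERE, while ORDER BY can use the alias because it is resolved after the SELECT list.

```sql
-- Hypothetical sample data standing in for the employee table in the quote.
DECLARE @employee TABLE (name varchar(50), salary int, bonus int);
INSERT @employee VALUES
    ('Ann', 90000, 20000),
    ('Bob', 60000, 5000),
    ('Cy', 100000, 15000);

-- WHERE is resolved before the SELECT list, so the expression is repeated here.
-- ORDER BY is resolved after the SELECT list, so the alias pay is visible to it.
SELECT name, salary + bonus AS pay
FROM @employee
WHERE salary + bonus > 100000
ORDER BY pay DESC;
```

Writing `WHERE pay > 100000` instead makes SQL Server reject the statement with an "Invalid column name" error, which is the behavior the question describes.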
I am interested in further research into the SQL-86 standard and why modern DBMSes don't support alias reuse, but haven't had the time to get very far with it yet. For starters, I don't know where to get the documentation or how to find out who exactly made up the committee. Can anyone help out? I also would like to know more about the original Sybase product that SQL Server came from.
From this research and some further thought, I have come to suspect that using aliases in other clauses, while quite possible, has simply never been a high priority for DBMS manufacturers compared to other language features. Since it is not much of an obstacle, being easily worked around by the query writer, putting effort into it over other advancements is not optimal. Additionally, it would be proprietary, as it is evidently not part of the SQL standard (though I'm waiting to find out more on that for sure), and thus would be a minor improvement that breaks SQL compatibility between DBMSes. By comparison, CROSS APPLY (which is really nothing more than a derived table allowing outer references) is a huge change that, while proprietary, offers incredible expressive power not easily achieved in other ways.
Problems With Using Aliases Everywhere
If you allow SELECT items to be used in the WHERE clause, you not only explode the complexity of the query (and thus the complexity of finding a good execution plan), you also make it possible to write completely illogical things. Try:
```
SELECT X + 5 Y FROM MyTable WHERE Y = X
```
What if MyTable already has a column Y? Which one is the WHERE clause referring to? The solution is to use a CTE or a derived table, which in most cases should cost nothing extra and achieves the same end result. CTEs and derived tables at least enforce the resolution of ambiguity by allowing an alias to be used only once.
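As a rough illustration of the derived-table approach, the sketch below assumes a table variable `@MyTable` that really does have both an X and a Y column. The sample rows and the name `OriginalY` are invented for the example. Because a derived table may expose each column name only once, the writer has to decide which value gets to be called Y.

```sql
-- Hypothetical table that already has a column named Y.
DECLARE @MyTable TABLE (X int, Y int);
INSERT @MyTable VALUES (1, 6), (5, 5), (10, 15);

-- Inside the derived table the stored column is renamed, so d.Y can only
-- mean the computed X + 5. Exposing two columns named Y would be rejected.
SELECT d.X, d.OriginalY, d.Y
FROM (
    SELECT X, Y AS OriginalY, X + 5 AS Y
    FROM @MyTable
) AS d
WHERE d.Y = d.OriginalY;   -- compares the computed value with the stored one
```

The same query can be written with a CTE (`WITH d AS (...) SELECT ... FROM d`). Either way, every reference in the outer query resolves to exactly one definition.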
Also, not using aliases in the FROM clause makes eminent sense. You can't do this:
```
SELECT
    T3.ID + (SELECT Min(Interval) FROM Intervals WHERE IntName = 'T') CalcID
FROM
    Table1 T
    INNER JOIN Table2 T2
        ON T2.ID = CalcID
    INNER JOIN Table3 T3
        ON T2.ID = T3.ID
```
That's a circular reference (in the sense that T2 is secretly referring to a value from T3, before that table has been presented in the JOIN list), and darn hard to see. How about this one:
```
INSERT dbo.FinalTransaction
SELECT
    newid() FinalTransactionGUID,
    'GUID is: ' + Convert(varchar(50), FinalTransactionGUID) TextGUID,
    T.*
FROM
    dbo.MyTable T
```
How much do you want to bet that the newid() function is going to be put into the execution plan twice, completely unexpectedly making the two columns show different values? What about when the above query is used N levels deep in CTEs or derived tables? I guarantee that the problem is worse than you can imagine. There are already serious inconsistency problems about when things are evaluated only once or at what point in a query plan, and Microsoft has said it will not fix some of them because they are expressing query algebra properly; if one gets unexpected results, the advice is to break the query up into parts. Allowing chained references and detecting circular references through potentially very long such chains are quite tricky problems. Introduce parallelism and you've got a nightmare in the making.
Note: Using the alias in WHERE or GROUP BY isn't going to make a difference to the problems with functions like newid() or rand().
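One way to read the "break the query up into parts" advice is to store the generated value before anything else refers to it. The sketch below is only an illustration under assumptions: the table variable `@MyTable`, its ID column, and the temp table `#Staged` are hypothetical stand-ins for dbo.MyTable and dbo.FinalTransaction.

```sql
-- Hypothetical source rows standing in for dbo.MyTable.
DECLARE @MyTable TABLE (ID int);
INSERT @MyTable VALUES (1), (2), (3);

-- Part 1: generate one GUID per row and store it.
SELECT newid() AS FinalTransactionGUID, T.ID
INTO #Staged
FROM @MyTable T;

-- Part 2: read the stored value as often as needed. It is now ordinary
-- column data, so both output columns are derived from the same GUID.
SELECT
    FinalTransactionGUID,
    'GUID is: ' + CONVERT(varchar(50), FinalTransactionGUID) AS TextGUID,
    ID
FROM #Staged;

DROP TABLE #Staged;
```

The cost is an extra write to tempdb, which is the trade-off for not depending on how many times the optimizer chooses to evaluate newid().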
A SQL Server way to create reusable expressions
CROSS APPLY/OUTER APPLY is one way in SQL Server to create expressions that can be used anywhere else in the query (just not earlier in the FROM clause):
```
SELECT
    X.CalcID
FROM
    Table1 T
    INNER JOIN Table3 T3
        ON T.ID = T3.ID
    CROSS APPLY (
        SELECT
            T3.ID + (SELECT Min(Interval) FROM Intervals WHERE IntName = 'T') CalcID
    ) X
    INNER JOIN Table2 T2
        ON T2.ID = X.CalcID
```
Context
StackExchange Database Administrators Q#21965, answer score: 19