patternMinor
"SELECT *" why is it an antipattern
Viewed 0 times
antipatternselectwhy
Problem
On multiple questions here and on stack overflow i saw people saying in the comments and answers that
So here is my question, why people say that
select * from table is almost always an antipattern without any explaination why. Althrough I can sort of deduce why it is an antipattern. I may be looking over a detail that someone else with better understanding of the issue noticed.So here is my question, why people say that
select * is an antipattern.Solution
The two reasons that I find the most compelling not to use
Memory Grants
When queries need to Sort, Hash, or go Parallel, they ask for memory for those operations. The size of the memory grant is based on the size of the data, both row and column wise.
String data especially has an impact on this, since the optimizer guesses half of the defined length as the 'fullness' of the column. So for a VARCHAR 100, it's 50 bytes * the number of rows.
Using Stack Overflow as an example, if I run these queries against the Users table:
DisplayName is NVARCHAR 40, and Location is NVARCHAR 100.
Without an index on Reputation, SQL Server needs to sort the data on its own.
But the memory it nearly doubles.
DisplayName:
DisplayName, Location:
This gets much worse with
It does this to cope with the larger amount of data it needs to pass through the Sort operator, including the AboutMe column, which has a MAX length.
Index Usage
If I have this index on the Users table:
And I have this query, with a WHERE clause that matches the index, but doesn't cover/include all the columns the query is selecting...
The optimizer may choose not to use the narrow index with a key lookup, in favor of just scanning the clustered index.
You would either have to create a very wide index, or experiment with rewrites to get the narrow index chosen, even though using the narrow index results in a much faster query.
CX:
NC:
SELECT * in SQL Server are- Memory Grants
- Index usage
Memory Grants
When queries need to Sort, Hash, or go Parallel, they ask for memory for those operations. The size of the memory grant is based on the size of the data, both row and column wise.
String data especially has an impact on this, since the optimizer guesses half of the defined length as the 'fullness' of the column. So for a VARCHAR 100, it's 50 bytes * the number of rows.
Using Stack Overflow as an example, if I run these queries against the Users table:
SELECT TOP 1000 DisplayName
FROM dbo.Users AS u
ORDER BY u.Reputation;
SELECT TOP 1000 DisplayName, u.Location
FROM dbo.Users AS u
ORDER BY u.Reputation;DisplayName is NVARCHAR 40, and Location is NVARCHAR 100.
Without an index on Reputation, SQL Server needs to sort the data on its own.
But the memory it nearly doubles.
DisplayName:
DisplayName, Location:
This gets much worse with
SELECT *, asking for 8.2 GB of memory:It does this to cope with the larger amount of data it needs to pass through the Sort operator, including the AboutMe column, which has a MAX length.
Index Usage
If I have this index on the Users table:
CREATE NONCLUSTERED INDEX ix_Users
ON dbo.Users
(
CreationDate ASC,
Reputation ASC,
Id ASC
);And I have this query, with a WHERE clause that matches the index, but doesn't cover/include all the columns the query is selecting...
SELECT u.*,
p.Id AS [PostId]
FROM dbo.Users AS u
JOIN dbo.Posts AS p
ON p.OwnerUserId = u.Id
WHERE u.CreationDate > '20171001'
AND u.Reputation > 100
AND p.PostTypeId = 1
ORDER BY u.IdThe optimizer may choose not to use the narrow index with a key lookup, in favor of just scanning the clustered index.
You would either have to create a very wide index, or experiment with rewrites to get the narrow index chosen, even though using the narrow index results in a much faster query.
CX:
SQL Server Execution Times:
CPU time = 6374 ms, elapsed time = 4165 ms.NC:
SQL Server Execution Times:
CPU time = 1623 ms, elapsed time = 875 ms.Code Snippets
SELECT TOP 1000 DisplayName
FROM dbo.Users AS u
ORDER BY u.Reputation;
SELECT TOP 1000 DisplayName, u.Location
FROM dbo.Users AS u
ORDER BY u.Reputation;CREATE NONCLUSTERED INDEX ix_Users
ON dbo.Users
(
CreationDate ASC,
Reputation ASC,
Id ASC
);SELECT u.*,
p.Id AS [PostId]
FROM dbo.Users AS u
JOIN dbo.Posts AS p
ON p.OwnerUserId = u.Id
WHERE u.CreationDate > '20171001'
AND u.Reputation > 100
AND p.PostTypeId = 1
ORDER BY u.IdSQL Server Execution Times:
CPU time = 6374 ms, elapsed time = 4165 ms.SQL Server Execution Times:
CPU time = 1623 ms, elapsed time = 875 ms.Context
StackExchange Database Administrators Q#220216, answer score: 8
Revisions (0)
No revisions yet.