HiveBrain v1.2.0
Get Started
← Back to all entries
principlesqlModerate

Select * Order by vs Select column Order by performance

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
selectorderperformancecolumn

Problem

I have the following tables

Create Table dbo.product
(
productId varchar(100) primary key,
productStatus varchar(100),
productRegion varchar(100),
productCreated  datetime,
productUpdated datetime
)
Go

declare @id int = 1
while @id <= 100
Begin

Insert Into dbo.product values ('product'+cast(@id as varchar(10)),'Active','North',getdate(),getdate())
    set @id = @id + 1
End

set @id = 1
while @id <= 100
Begin

Insert Into dbo.product values ('inprod'+ cast(@id as varchar(10)),'InActive','South',getdate(),getdate())
    set @id = @id + 1
End
Go

Create Table dbo.productRef
(
productRef int Identity(1,1) primary key,
productId varchar(100),
productName varchar(100)
)
Go

Insert Into dbo.productRef (productId)
Select top 20 productId
from dbo. product 
Go

declare @id int = 1
while @id <= 20
Begin
    update dbo.productRef
    set productName = 'productName'+convert(varchar(10),@id)
    where productRef = @id

    set @id = @id + 1
End
Go

Create nonclustered index idx_productRef1 On dbo.productRef(productId)


Which one of these selects will perform better?

select p.*
from dbo.product p
join dbo.productref pr
on p.productid = pr.productid
order by p.productUpdated

select p.productId
from dbo.product p
join dbo.productref pr
on p.productid = pr.productid
order by p.productUpdated


Here's the query plans for both the select statements:

select * plan:

https://www.brentozar.com/pastetheplan/?id=SyY21P0Jo

select productId plan:

https://www.brentozar.com/pastetheplan/?id=BkK-gwA1i

From what I see both the plans are same. Is there a performance difference when doing select * order by vs select column order by?

Solution

From what I see both the plans are same.

Indeed in this case you are getting the same shaped query plans. This won't always be the case.

Is there a performance difference when doing select * order by vs select column order by?

Yes, even in your case with the same shaped query plans, there's a performance difference (though perhaps negligible in this specific case). Forget about the ORDER BY clause for a second, and let's just talk about SELECT * vs SELECT OneSingleColumn, since the following is true regardless:

-
Selecting more columns than needed (in this case 7 extra columns) via SELECT * requires more data to be located, loaded off disk into Memory, processed, and transferred across the network to the consumer. In your schema, those 7 extra columns can contain up to about an extra half a KB of data per row. In a decent sized table, say 100 million rows big, that would be an extra 50 GB of data undergoing all the aforementioned steps.

-
It can / will result in more resources being allocated to serve the query to support all the aforementioned steps for the additional columns. This reduces resources that could've been available to other queries running concurrently on the server.

-
It can result in different shaped plans in a number of different ways. One of the most common ways being getting scan operations for the SELECT * version of the query instead of efficient seeks for the SELECT OneSingleColumn version, when the query surpasses the tipping point. Or another way is when one query plan uses a completely different index than what would've been the most efficient one to use in the case where you only need the OneSingleColumn.

-
One way the shape of the plan can vary (as pointed out by maple_shaft) that affects performance is that even in the best case with getting an index seek on the most appropriate index to serve the query, it likely won't contain all fields, i.e. *. So you'll end up with an additional key lookups operator causing extra work to lookup the remaining fields from the clustered index. Or in a less than optimal outcome, you'll end up with the clustered index being scanned to satisfy the query, instead of seeking on a more optimal index that would've contained only the columns you're interested in SELECTing (assuming your indexes are catered to your queries appropriately). When you only SELECT the columns you need, and there's an index that covers those columns, then the most optimal index can be used to serve you query without additional operations / work needing to occur, and there'll be a good chance you'll get a seek operation against that index too, being quite fast.

SELECT * is also an anti-pattern for a multitude of reasons, not just relating to performance, but also for readability and maintainability:

  • Why is SELECT * considered harmful?



  • "SELECT *" why is it an antipattern



  • Why is "Select * from table" considered bad practice

Context

StackExchange Database Administrators Q#316360, answer score: 11

Revisions (0)

No revisions yet.