HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Reducing Key Lookups

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
lookupsreducingkey

Problem

I am using SQL server, and I have been looking closely at the concept of key lookups,

http://blog.sqlauthority.com/2009/10/07/sql-server-query-optimization-remove-bookmark-lookup-remove-rid-lookup-remove-key-lookup/

So if you have a key lookup you can create an index with the 'include' columns to cover the non index columns you have in the select statement.

For instance,

SELECT ID, FirstName FROM OneIndex WHERE City = 'Las Vegas'
GO


This index will include a key lookup,

CREATE NONCLUSTERED INDEX [IX_OneIndex_City] ON [dbo].[OneIndex]
(
[City] ASC
) ON [PRIMARY]
GO


But this one will remove the key lookup,

CREATE NONCLUSTERED INDEX [IX_OneIndex_Include] ON [dbo].[OneIndex]
(
City
) INCLUDE (FirstName,ID) ON [PRIMARY]
GO


I mean how much of an impact will this have on performance? The key lookup has an operator cost of 0.295969 (99%), but what does that really mean?

How do you know that you need the second index there, and at what point does it become the case that you are trying to add too many indexes and it is not worth it?

It seems to me that some queries can include index scans, key lookups, and still seem to perform very fast.

Solution

Background

In the worst case, a query containing a lookup has to go to physical storage for rows that require column data not covered by the nonclustered index. In the very worst of worst cases, each lookup will require a separate I/O, and execution will have to wait for that single row's worth of data to come back before proceeding. This scenario usually has severe performance implications if the lookup has to process a significant number of rows.

That is why lookups get such a bad press. On the other hand, consider that the ability to do lookups was introduced in SQL Server 2000. In SQL Server 7.0 the query processor could only use a nonclustered index if it contained all the information necessary to satisfy the query; in all other cases, it had to access data via a clustered index (if present, or a heap scan otherwise). If lookups were always so very bad, SQL Server would surely never have introduced them.

In SQL Server 2000+ then, where we have a nonclustered index that provides useful ordering and/or (most of) the columns required by a query, and where the number of lookups is likely to be relatively small, using the nonclustered index and performing a limited number of lookups on the base table is likely to be the cheapest available access method (though a fully-covering nonclustered index might be cheaper still, of course).

In many cases, it is just not practical to create as many nonclustered indexes as would be needed to avoid scanning the base table for all common queries. One reason might be that INSERT/UPDATE/DELETE/MERGE performance is more important that querying speed (remember that data modification operations also have to maintain all affected nonclustered indexes). Another reason might be space; each nonclustered index represents a copy of a subset of the columns of the base table (or expressions thereon) just sorted differently. More copies of the data means more storage space, and more things competing for space in SQL Server's in-memory data cache.

Other times, we can create just a few extra indexes (perhaps filtered in SQL Server 2008+) with just enough INCLUDE columns to satisfy the vast majority of performance-critical queries, without compromising data modification performance too much, and without using too much extra disk space. Balancing the competing considerations is what makes index tuning more art than science.
Cost

You ask what the 99% cost for the lookup operator really means in the query plan. The costing component of the query optimizer produces an estimated cost for that operation that is 99% of the total estimated for the query. The number itself (0.29) does not mean very much at all; for all practical purposes, you should consider it as a unit-less number used internally by the optimizer when comparing alternative strategies for that specific query.

The estimated cost takes no account of your hardware, configuration, application needs, or very much of anything else. The cost model used by the optimizer includes a significant number of heuristics and simplifying assumptions that happen to produce reasonable plans most of the time, for most queries, on most hardware. That's not to say that there is no correlation between high-cost operators in plans and performance; rather the link is often much weaker than commonly expected. By all means check the reasons for high-estimated-cost plan operators first, but don't treat the information as anything other than a very possibly flawed estimate.
Impact

I also want to mention a couple of factors that can ameliorate the impact of lookups. First, I mentioned right at the start that the worst case involves row-by-row physical I/O. This will obviously be avoided if the data pages (clustered index or heap) needed to satisfy the lookups are already in memory (data cache). Where this is the case, the execution time difference between a plan with a lookup versus a covering index may well be immeasurable. Even where physical I/O is required, if the number of reads is small, you still may not care. (How likely data pages for a table are to be in the data cache depends on many factors, and will be specific to your hardware and circumstances).

Where more that a little physical I/O is needed, the impact of the lookups may still be reduced by optimizations present in the query plan. If SQL Server expects the number of lookups to be significant, it may choose to explicitly sort the rows entering the nested loops join driving the lookup in order of the non-clustered keys. This reordering promotes sequential reading of the nonclustered index, which may be very much faster than random I/O on your hardware.

With or without an explicit sort, the nested loops join driving the lookup may have the WithOrderedPrefetch or WithUnorderedPrefetch attributes present. In either case, the query execution engine 'looks ahead' in the index key stream driving the lookups and issues read-ahead reads. The idea is to issue asy

Context

StackExchange Database Administrators Q#9124, answer score: 13

Revisions (0)

No revisions yet.