HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

How Selective Should Index Be

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
indexshouldselectivehow

Problem

Is there a general Selectivity rule of when to apply Nonclustered index?

We know not to create an index on a bit column, 50/50. "Rows with 50/50 distribution, it might buy you very little performance gain " Index Bit Field in SQL Server

So how selective should a query be in SQL Server, before an index should be applied? Is there a general rule in SQL Server guidelines? At 25% average selectivity distribution in a column? 10% Selectivity?

This article is stating around 31%? How Selective Should Index Be?

Solution

Considering column selectivity only when deciding which columns to index ignores quite a bit of what indexes can do, and what they're generally good for.

For instance, you may have an identity or guid column that's incredibly selective -- unique, even -- but never gets used. In that case, who cares? Why index columns that queries don't touch?

Much less selective indexes, even BIT columns, can make useful, or useful parts of indexes. In some scenarios, very un-selective columns on large tables can benefit quite a bit from indexing when they need to be sorted on, or grouped by.

Joins

Take this query:

SELECT COUNT(*) AS records
FROM dbo.Users AS u
JOIN dbo.Posts AS p
ON u.Id = p.OwnerUserId;


Without a helpful index on OwnerUserId, this is our plan with a Hash Join -- which spills -- but that's secondary to the point.

With a helpful index -- CREATE INDEX ix_yourmom ON dbo.Posts (OwnerUserId); -- our plan changes.

Aggregates

Likewise, grouping operations can benefit from indexing.

SELECT   p.OwnerUserId, COUNT(*) AS records
FROM     dbo.Posts AS p
GROUP BY p.OwnerUserId;


Without an index:

With an index:

Sorts

Sorting data can be another sticking point in queries that indexes can help.

Without an index:

With our index:

Blocking

Indexes can also help avoid blocking pile-ups.

If we try to run this update:

UPDATE p
SET p.Score += 100
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656;


And concurrently run this select:

SELECT *
FROM   dbo.Posts AS p
WHERE  p.OwnerUserId = 8;


They'll end up blocking:

With our index in place, the select finishes instantly without being blocked. SQL Server has a way to access the data it needs efficiently.

In case you're wondering (using the equation Kumar provided) the OwnerUserId column's selectivity is 0.0701539878296839478

Wrap it up

Don't just blindly index columns based on how selective they are. Design indexes that help your workload run efficiently. Using more selective columns as leading key columns is generally a good idea when you're searching for equality predicates, but can be less helpful when searching on ranges.

Code Snippets

SELECT COUNT(*) AS records
FROM dbo.Users AS u
JOIN dbo.Posts AS p
ON u.Id = p.OwnerUserId;
SELECT   p.OwnerUserId, COUNT(*) AS records
FROM     dbo.Posts AS p
GROUP BY p.OwnerUserId;
UPDATE p
SET p.Score += 100
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 22656;
SELECT *
FROM   dbo.Posts AS p
WHERE  p.OwnerUserId = 8;

Context

StackExchange Database Administrators Q#208852, answer score: 10

Revisions (0)

No revisions yet.