patternsqlMinor
Indexing Referenced Table Columns
Viewed 0 times
indexingcolumnsreferencedtable
Problem
I would like to do a search across columns of joined tables. What would the best index configuration be for the query below?
At the moment, I have clustered indexes on the Primary Keys and non-clustered indexes (which don't include any columns) for columns
Update
Table A is likely to run into the low millions while Table B will only have low tens of thousands of entries.
In the real world Table A will be a list of transactions, where the value is some sort of a transaction reference. Table B will be a list of users where the value could be anything such as a name or any other descriptive information.
At the moment we use an ORM which will essentially do a
Table A
------------
Id
ValueA
-- Table A has many more columns
Table B
------------
Id
AId
ValueB
-- Table B has many more columnsAt the moment, I have clustered indexes on the Primary Keys and non-clustered indexes (which don't include any columns) for columns
AId, ValueA, and ValueB.SELECT *
FROM TableA a
INNER JOIN TableB b ON a.Id = b.AId
WHERE
a.ValueA LIKE 'SearchTerm%' OR
b.ValueB LIKE 'SearchTerm%'Update
Table A is likely to run into the low millions while Table B will only have low tens of thousands of entries.
In the real world Table A will be a list of transactions, where the value is some sort of a transaction reference. Table B will be a list of users where the value could be anything such as a name or any other descriptive information.
At the moment we use an ORM which will essentially do a
SELECT *. Naturally this could be changed, but there would be a fairly large impact on the application side.Solution
Your query is a tricky one for the optimizer, for two reasons.
Together, these limitations mean there is very little indexing can do to assist the query as written (ORM generated), leaving aside the possibility of a (nonclustered) column store index.
If performance is inadequate, you may have to bite the bullet and write a custom query. In that case, modifying the existing index on
The target plan shape is:
How much better this is in practice depends on several factors, including how selective the
Guessed schema:
- The optimizer doesn't currently have logic to transform a disjunction (OR) across tables into a union (related Q & A).
- The optimizer doesn't reason about deferring lookups (dealing with keys only until the last moment). This relates to the
SELECT *component of the given query.
Together, these limitations mean there is very little indexing can do to assist the query as written (ORM generated), leaving aside the possibility of a (nonclustered) column store index.
If performance is inadequate, you may have to bite the bullet and write a custom query. In that case, modifying the existing index on
ValueB to include AId would be useful, coupled with a rewrite like the following:SELECT TA.*, TB.*
FROM
(
-- Disjunction as union
SELECT A.Id, b.Id
FROM TableA a
INNER JOIN TableB b
ON b.AId = a.Id
WHERE
a.ValueA LIKE 'SearchTerm%'
UNION
SELECT A.Id, b.Id
FROM TableB b
INNER JOIN TableA a
ON a.Id = b.AId
WHERE
b.ValueB LIKE 'SearchTerm%'
) AS M (A_Id, B_Id)
-- Lookups
JOIN dbo.TableA AS TA ON TA.Id = M.A_Id
JOIN dbo.TableB AS TB ON TB.Id = M.B_Id;The target plan shape is:
How much better this is in practice depends on several factors, including how selective the
LIKE predicates are (and that they are always prefix searches).Guessed schema:
DROP TABLE IF EXISTS dbo.TableB, dbo.TableA;
CREATE TABLE dbo.TableA
(
Id integer NOT NULL PRIMARY KEY,
ValueA varchar(256) NULL INDEX iValueA,
Padding char(4000) NOT NULL DEFAULT '', -- other columns
/*INDEX i (ValueA) INCLUDE (Id)*/
);
CREATE TABLE dbo.TableB
(
Id integer NOT NULL PRIMARY KEY,
AId integer NOT NULL REFERENCES dbo.TableA (Id) INDEX iAId,
ValueB varchar(256) NULL /*INDEX iValueB*/,
Padding char(2000) NOT NULL DEFAULT '', -- other columns
INDEX i (ValueB) INCLUDE (AId)
);
UPDATE STATISTICS dbo.TableA WITH ROWCOUNT = 3000000, PAGECOUNT = 3000000;
UPDATE STATISTICS dbo.TableB WITH ROWCOUNT = 30000, PAGECOUNT = 30000;Code Snippets
SELECT TA.*, TB.*
FROM
(
-- Disjunction as union
SELECT A.Id, b.Id
FROM TableA a
INNER JOIN TableB b
ON b.AId = a.Id
WHERE
a.ValueA LIKE 'SearchTerm%'
UNION
SELECT A.Id, b.Id
FROM TableB b
INNER JOIN TableA a
ON a.Id = b.AId
WHERE
b.ValueB LIKE 'SearchTerm%'
) AS M (A_Id, B_Id)
-- Lookups
JOIN dbo.TableA AS TA ON TA.Id = M.A_Id
JOIN dbo.TableB AS TB ON TB.Id = M.B_Id;DROP TABLE IF EXISTS dbo.TableB, dbo.TableA;
CREATE TABLE dbo.TableA
(
Id integer NOT NULL PRIMARY KEY,
ValueA varchar(256) NULL INDEX iValueA,
Padding char(4000) NOT NULL DEFAULT '', -- other columns
/*INDEX i (ValueA) INCLUDE (Id)*/
);
CREATE TABLE dbo.TableB
(
Id integer NOT NULL PRIMARY KEY,
AId integer NOT NULL REFERENCES dbo.TableA (Id) INDEX iAId,
ValueB varchar(256) NULL /*INDEX iValueB*/,
Padding char(2000) NOT NULL DEFAULT '', -- other columns
INDEX i (ValueB) INCLUDE (AId)
);
UPDATE STATISTICS dbo.TableA WITH ROWCOUNT = 3000000, PAGECOUNT = 3000000;
UPDATE STATISTICS dbo.TableB WITH ROWCOUNT = 30000, PAGECOUNT = 30000;Context
StackExchange Database Administrators Q#278633, answer score: 7
Revisions (0)
No revisions yet.