debugsqlModerate
Cannot tune database any further; what next?
Viewed 0 times
cannottunewhatanyfurthernextdatabase
Problem
We have a vendor-supplied application. It is under support and we are talking to their developers (we have their largest database size by an order of magnitude), but in the meantime, we have the following query that is run thousands of times a day, with the only parts changing the
There are about 500,000 records in the dJobs table, and similar numbers of records in the joined tables:
```
select * from (
select top 20 * from (
select top 20
( --Start Count
select count(*) from dJobs
left join dClients on cltClientID = jobClientId
left join dJobStatus ON jbsID = jobJobStatus
where ((jobSupervisor= -1.00000000 or jobCoordinator= -1.00000000 ) OR 1=1 )
and
(
(jobjobStatus=0 OR (0=0 AND 0=0))
AND (jbsType=0 OR (0=0 AND 0=0))
AND (jobPriority=0 OR (0=0 AND 0=0))
AND (jobEventID=0 OR (0=0 AND 0=0))
AND (jobSiteCode='' OR (''='' AND ''=''))
AND (jobjobStatus IN (
select jbsid from djobstatusgroupmapping where jsgid = 0
)
OR (0=0 AND 0=0))
AND jobDateCreated BETWEEN '2004-06-10' AND '2014-06-10' AND jobBookedDate BETWEEN '1901-01-01 00:00:00' AND '2024-06-10 23:59:00'
and
(('ABCDEFG12345'<>''
AND
(
(1 = 1 AND
(
jobWorkOrderNo like '%ABCDEFG12345%' OR
jobSiteName like '%ABCDEFG12345%' OR
jobSiteLocationBuilding like '%ABCDEFG12345%' OR
jobSiteAddress like '%ABCDEFG12345%' OR
ABCDEFG12345 in the where clause. Almost everything in the where's are customisable, as it is the query generated by their master job search.There are about 500,000 records in the dJobs table, and similar numbers of records in the joined tables:
```
select * from (
select top 20 * from (
select top 20
( --Start Count
select count(*) from dJobs
left join dClients on cltClientID = jobClientId
left join dJobStatus ON jbsID = jobJobStatus
where ((jobSupervisor= -1.00000000 or jobCoordinator= -1.00000000 ) OR 1=1 )
and
(
(jobjobStatus=0 OR (0=0 AND 0=0))
AND (jbsType=0 OR (0=0 AND 0=0))
AND (jobPriority=0 OR (0=0 AND 0=0))
AND (jobEventID=0 OR (0=0 AND 0=0))
AND (jobSiteCode='' OR (''='' AND ''=''))
AND (jobjobStatus IN (
select jbsid from djobstatusgroupmapping where jsgid = 0
)
OR (0=0 AND 0=0))
AND jobDateCreated BETWEEN '2004-06-10' AND '2014-06-10' AND jobBookedDate BETWEEN '1901-01-01 00:00:00' AND '2024-06-10 23:59:00'
and
(('ABCDEFG12345'<>''
AND
(
(1 = 1 AND
(
jobWorkOrderNo like '%ABCDEFG12345%' OR
jobSiteName like '%ABCDEFG12345%' OR
jobSiteLocationBuilding like '%ABCDEFG12345%' OR
jobSiteAddress like '%ABCDEFG12345%' OR
Solution
Fulltext isn't going to help without refactoring to use the full text functions ( CONTAINS, FREETEXT or their table equivalents ). It also doesn't really work with leading wildcard. Hacks are available, but basically you're going to struggle to write a semantically equivalent query for fulltext. For the future consider redesigning for fulltext which has stemming ( run, runner, running ) and thesaurus ( jogger ) which could serve your searches much better than two wildcards.
SSD is unlikely to help you unless you are memory bound. Your tables (at only 500k records) are probably in-memory most of the time. Can you confirm the size of the dJobs table, and server RAM?
Enterprise Edition could help where the limitation of 64GB RAM / lesser of 4 sockets or 16 cores goes up to 8, but you're going to need a really powerful box to notice a difference. For example, the 4 really means you could have something like 4 quad-core processors totalling 16 cores, with HT enabled, you're already at 32 logical processors. The general recommended server maxdop for this type of OLTP machine would be 8 anyway. I think this unlikely to benefit because your query has more fundamental problems but you never know.
Non-clustered indexes (particularly on dJobs) are unlikely to help because the query has so many columns from this table in the SELECT and many criteria in the WHERE clause. A non-clustered would have to be so wide to cover it would be practically a duplicate of the clustered index, therefore overly expensive to maintain. As the query sorts by jobID DESC, I considered a descending index but haven't trialled this.
Partitioning, (Enterprise only) is really a great feature, but again is unlikely to help you. I did a quick investigation of partitioning on dbo.dJobs.jobJobStatus column, eg I imagine you only have a small percentage of Jobs 'active' at any one time, eg a few hundred, even a few thousand from the 500,000 records. Partition elimination would probably be cancelled out by the OR OR OR approach. Parallel scans of multiple partitions are also an Enterprise feature:
This would probably work:
This probably won't work:
This leads me into the query. The OR OR OR approach basically means 'always get the whole table'. The TOP 20 masks this design problem. The TOP also probably pushed the plan towards Nested Loops which Jon suggested was suspect. What also stands out to me about this nightmareish "scan all columns" constructed query is that you bascially have two copies of the same query (and therefore tables), one to count, one for the resultset. This might be more efficient if the data went into an intermediate table and the count was done from there for example.
Finally, this brings me to the only only thing that would actually help you (without a large-scale refactor of the code): data deletion or archiving. As mentioned, I imagine you only have a small percentage of Jobs 'active' at any one time. Carve off the 'inactive' ones into a different table. Create a view over the top of the two tables for reporting. Set up a nightly job to copy out the old records.
Having only a few thousand active jobs in your main table will most likely transform your query performance.
Some recommended reading:
Erland Sommarskog's article on these "search all columns" queries
Dynamic Search Conditions in T-SQL
http://www.sommarskog.se/dyn-search-2008.html
Querying Multiple Columns (Full-Text Search)
http://technet.microsoft.com/en-us/library/ms142488(v=sql.105).aspx
I hope that helps!
SSD is unlikely to help you unless you are memory bound. Your tables (at only 500k records) are probably in-memory most of the time. Can you confirm the size of the dJobs table, and server RAM?
Enterprise Edition could help where the limitation of 64GB RAM / lesser of 4 sockets or 16 cores goes up to 8, but you're going to need a really powerful box to notice a difference. For example, the 4 really means you could have something like 4 quad-core processors totalling 16 cores, with HT enabled, you're already at 32 logical processors. The general recommended server maxdop for this type of OLTP machine would be 8 anyway. I think this unlikely to benefit because your query has more fundamental problems but you never know.
Non-clustered indexes (particularly on dJobs) are unlikely to help because the query has so many columns from this table in the SELECT and many criteria in the WHERE clause. A non-clustered would have to be so wide to cover it would be practically a duplicate of the clustered index, therefore overly expensive to maintain. As the query sorts by jobID DESC, I considered a descending index but haven't trialled this.
Partitioning, (Enterprise only) is really a great feature, but again is unlikely to help you. I did a quick investigation of partitioning on dbo.dJobs.jobJobStatus column, eg I imagine you only have a small percentage of Jobs 'active' at any one time, eg a few hundred, even a few thousand from the 500,000 records. Partition elimination would probably be cancelled out by the OR OR OR approach. Parallel scans of multiple partitions are also an Enterprise feature:
This would probably work:
SELECT TOP 20 *
FROM dJobs
LEFT JOIN dClients on cltClientID = jobClientId
LEFT JOIN dUsers on regUserId = jobCoordinator
LEFT JOIN dJobStatus ON jbsID = jobJobStatus
WHERE
(
jobjobStatus IN ( SELECT jbsid FROM djobstatusgroupmapping WHERE jsgid = 0 )
)
ORDER BY jobID DESCThis probably won't work:
SELECT TOP 20 *
FROM dJobs
LEFT JOIN dClients on cltClientID = jobClientId
LEFT JOIN dUsers on regUserId = jobCoordinator
LEFT JOIN dJobStatus ON jbsID = jobJobStatus
WHERE
(
jobjobStatus IN ( SELECT jbsid FROM djobstatusgroupmapping WHERE jsgid = 0 )
OR ( 0=0 ) OR ( 0=0 ) --<< this 'OR always true' means 'get the whole table'
)
ORDER BY jobID DESCThis leads me into the query. The OR OR OR approach basically means 'always get the whole table'. The TOP 20 masks this design problem. The TOP also probably pushed the plan towards Nested Loops which Jon suggested was suspect. What also stands out to me about this nightmareish "scan all columns" constructed query is that you bascially have two copies of the same query (and therefore tables), one to count, one for the resultset. This might be more efficient if the data went into an intermediate table and the count was done from there for example.
Finally, this brings me to the only only thing that would actually help you (without a large-scale refactor of the code): data deletion or archiving. As mentioned, I imagine you only have a small percentage of Jobs 'active' at any one time. Carve off the 'inactive' ones into a different table. Create a view over the top of the two tables for reporting. Set up a nightly job to copy out the old records.
Having only a few thousand active jobs in your main table will most likely transform your query performance.
Some recommended reading:
Erland Sommarskog's article on these "search all columns" queries
Dynamic Search Conditions in T-SQL
http://www.sommarskog.se/dyn-search-2008.html
Querying Multiple Columns (Full-Text Search)
http://technet.microsoft.com/en-us/library/ms142488(v=sql.105).aspx
I hope that helps!
Code Snippets
SELECT TOP 20 *
FROM dJobs
LEFT JOIN dClients on cltClientID = jobClientId
LEFT JOIN dUsers on regUserId = jobCoordinator
LEFT JOIN dJobStatus ON jbsID = jobJobStatus
WHERE
(
jobjobStatus IN ( SELECT jbsid FROM djobstatusgroupmapping WHERE jsgid = 0 )
)
ORDER BY jobID DESCSELECT TOP 20 *
FROM dJobs
LEFT JOIN dClients on cltClientID = jobClientId
LEFT JOIN dUsers on regUserId = jobCoordinator
LEFT JOIN dJobStatus ON jbsID = jobJobStatus
WHERE
(
jobjobStatus IN ( SELECT jbsid FROM djobstatusgroupmapping WHERE jsgid = 0 )
OR ( 0=0 ) OR ( 0=0 ) --<< this 'OR always true' means 'get the whole table'
)
ORDER BY jobID DESCContext
StackExchange Database Administrators Q#67917, answer score: 10
Revisions (0)
No revisions yet.