HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Sql server full text search performance with additional conditions

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
fullsearchsqlwithtextperformanceconditionsserveradditional

Problem

We have a performance problem with SQL Server (2008 R2) Full text search. When we have additional where conditions to full-text search condition, it gets too slow.

Here is my simplified query:

SELECT * FROM Calls C
WHERE (C.CallTime BETWEEN '2013-08-01 00:00:00' AND '2013-08-07 00:00:00')
AND CONTAINS(CustomerText, '("efendim")')


Calls table's primary key is CallId (int, clustered index) and also Calls table indexed by CallTime. We have 16.000.000 rows and CustomerText is about 10KB for each row.

When I see execution plan, first it finds full-text search resultset and then joins with Calls table by CallId. Because of that, if first resultset has more rows, query gets slower (over a minute).

This is the execution plan:

When I run where conditions seperately, it returns 360.000 rows for CallTime condition:

SELECT COUNT(*) FROM Calls C
WHERE (C.CallTime BETWEEN '2013-08-01 00:00:00' AND '2013-08-07 00:00:00')


and 1.200.000 rows for Contains condition:

SELECT COUNT(*) FROM Calls C
WHERE CONTAINS(AgentText, '("efendim")')


What can I do to increase performance of my query?

Solution

Consider using a hint to remove nested loops as a choice for the optimizer, eg

-- Hint means Nested Loop join will never be chosen, but still leaves the Optimizer some choices
SELECT * FROM Calls C
WHERE (C.CallTime BETWEEN '2013-08-01 00:00:00' AND '2013-08-07 00:00:00')
AND CONTAINS(CustomerText, '("warning")')
OPTION ( MERGE JOIN, HASH JOIN )


Be aware of the potential negative impact of using this hint, eg some of your queries with lower rowcount may actually perform better with Nested Loops, so test before implementing.

To make better use of your non-clustered index you could try a rewrite. I experimented with the options below, ie a more prescriptive query but still found the above query quicker. YMMV.

-- CONTAINSTABLE; as already suggested by others
SELECT * 
FROM dbo.Calls C
    INNER JOIN CONTAINSTABLE(dbo.Calls, CustomerText, '("warning")') ft ON c.callId = ft.[KEY]
WHERE (C.CallTime BETWEEN '2013-08-01 00:00:00' AND '2013-08-07 00:00:00')

-- This query is more likely to use the non-clustered index as it is covering for CallTime 
-- and callId only (not SELECT *); you could then join back to main table
SELECT callId
FROM Calls C
WHERE (C.CallTime BETWEEN '2013-08-01 00:00:00' AND '2013-08-07 00:00:00')
INTERSECT
SELECT callId
FROM Calls C
WHERE CONTAINS(CustomerText, '("warning")')

-- Or explicitly persist the results to a temp table; this may be slower on occasions than the 
--'all-in-one' queries but would probably make for more consistent results
IF OBJECT_ID('tempdb..#tmp') IS NOT NULL DROP TABLE #tmp 
CREATE TABLE #tmp ( callId INT PRIMARY KEY )
GO

INSERT INTO #tmp
SELECT callId
FROM Calls C
WHERE (C.CallTime BETWEEN '2013-08-01 00:00:00' AND '2013-08-07 00:00:00')

SELECT * 
FROM Calls C
WHERE CONTAINS(CustomerText, 'warning')
AND EXISTS ( SELECT * FROM #tmp t WHERE c.callId = t.callId )

Code Snippets

-- Hint means Nested Loop join will never be chosen, but still leaves the Optimizer some choices
SELECT * FROM Calls C
WHERE (C.CallTime BETWEEN '2013-08-01 00:00:00' AND '2013-08-07 00:00:00')
AND CONTAINS(CustomerText, '("warning")')
OPTION ( MERGE JOIN, HASH JOIN )
-- CONTAINSTABLE; as already suggested by others
SELECT * 
FROM dbo.Calls C
    INNER JOIN CONTAINSTABLE(dbo.Calls, CustomerText, '("warning")') ft ON c.callId = ft.[KEY]
WHERE (C.CallTime BETWEEN '2013-08-01 00:00:00' AND '2013-08-07 00:00:00')


-- This query is more likely to use the non-clustered index as it is covering for CallTime 
-- and callId only (not SELECT *); you could then join back to main table
SELECT callId
FROM Calls C
WHERE (C.CallTime BETWEEN '2013-08-01 00:00:00' AND '2013-08-07 00:00:00')
INTERSECT
SELECT callId
FROM Calls C
WHERE CONTAINS(CustomerText, '("warning")')


-- Or explicitly persist the results to a temp table; this may be slower on occasions than the 
--'all-in-one' queries but would probably make for more consistent results
IF OBJECT_ID('tempdb..#tmp') IS NOT NULL DROP TABLE #tmp 
CREATE TABLE #tmp ( callId INT PRIMARY KEY )
GO

INSERT INTO #tmp
SELECT callId
FROM Calls C
WHERE (C.CallTime BETWEEN '2013-08-01 00:00:00' AND '2013-08-07 00:00:00')

SELECT * 
FROM Calls C
WHERE CONTAINS(CustomerText, 'warning')
AND EXISTS ( SELECT * FROM #tmp t WHERE c.callId = t.callId )

Context

StackExchange Database Administrators Q#50346, answer score: 4

Revisions (0)

No revisions yet.