patternsqlMinor
Simple yet problematic update query
Viewed 0 times
simpleupdatequeryyetproblematic
Problem
I have a rather simple update/query, that has been causing me a lot of grief over the years.
in simplest form it is :
However, in some stored procs, the query also include this "OR VillageID in (...)" subquery
Note that this OR, can return many villageIDs, not just one, @Z. This version of the query, sometimes runs a very long time. No index rebuild, stats rebuild helps. It runs slowly when the contents of Villages table is deleted and repopulated. In this case, the row count would be just a few hundred rows. I never figured out why this is, and always just lived with it.
However, recently I was looking at the query plan:
It seems that estimated number of rows (4000) is huge in comparison with the actual number of rows (2).
I created this statistic but it does not help
SO MY QUESTION : any suggestions why this could be and what I could do to improve this ?
for reference, the table looks like this :
UPDATE: Trying out this version of the query as suggested by srutzky
```
CREATE TABLE #VillagesToLock (VillageID INT NOT NULL);
insert into #VillagesToLock values (@X)
insert into #VillagesToLock values (@Y)
insert into #VillagesToLock select VSU.SupportingVillageID
from VillageSupportUnits VSU
where SupportedVillageID = @Z
a
in simplest form it is :
update VillageSemaphore
set TimeStamp = getdate()
where VillageID in (@X, @Y)However, in some stored procs, the query also include this "OR VillageID in (...)" subquery
update VillageSemaphore
set TimeStamp = getdate()
where VillageID in (@X, @Y)
OR VillageID in ( -- this subquery can return many rows, many different VillageIDs
select VSU.SupportingVillageID
from VillageSupportUnits VSU
where SupportedVillageID = @Z
and VSU.UnitCount <> 0
)Note that this OR, can return many villageIDs, not just one, @Z. This version of the query, sometimes runs a very long time. No index rebuild, stats rebuild helps. It runs slowly when the contents of Villages table is deleted and repopulated. In this case, the row count would be just a few hundred rows. I never figured out why this is, and always just lived with it.
However, recently I was looking at the query plan:
It seems that estimated number of rows (4000) is huge in comparison with the actual number of rows (2).
I created this statistic but it does not help
CREATE STATISTICS [stat_x] ON [VillageSU]([UnitCount], [VillageID])SO MY QUESTION : any suggestions why this could be and what I could do to improve this ?
for reference, the table looks like this :
CREATE TABLE VillageSemaphore(
VillageID int NOT NULL,
TimeStamp datetime NOT NULL,
CONSTRAINT PK97 PRIMARY KEY CLUSTERED (VillageID)
)UPDATE: Trying out this version of the query as suggested by srutzky
```
CREATE TABLE #VillagesToLock (VillageID INT NOT NULL);
insert into #VillagesToLock values (@X)
insert into #VillagesToLock values (@Y)
insert into #VillagesToLock select VSU.SupportingVillageID
from VillageSupportUnits VSU
where SupportedVillageID = @Z
a
Solution
While I'm not convinced this is a problem with the query itself (did you check for blocking when it runs slow? did you check the wait type(s) occurring while it was running),
This may solve the estimation problem, but I agree with Max, a statistic with a leading column of
IN and OR can be a problematic pattern to optimize for. Have you considered breaking this into multiple statements?UPDATE dbo.VillageSemaphoreset
SET [TimeStamp] = GETDATE() -- TimeStamp is a terrible column name btw
WHERE VillageID = @X;
UPDATE dbo.VillageSemaphoreset
SET [TimeStamp] = GETDATE()
WHERE VillageID = @Y;
IF (whatever condition leads you to "sometimes add this OR")
BEGIN
UPDATE v
SET [TimeStamp] = GETDATE()
FROM dbo.VillageSemaphoreset AS v
WHERE VillageID = @Z
AND EXISTS
(
SELECT 1 FROM dbo.VillageSU AS vs
WHERE vs.VillageID = v.VillageID
);
ENDThis may solve the estimation problem, but I agree with Max, a statistic with a leading column of
UnitCount won't help the estimates for these queries anyway.Code Snippets
UPDATE dbo.VillageSemaphoreset
SET [TimeStamp] = GETDATE() -- TimeStamp is a terrible column name btw
WHERE VillageID = @X;
UPDATE dbo.VillageSemaphoreset
SET [TimeStamp] = GETDATE()
WHERE VillageID = @Y;
IF (whatever condition leads you to "sometimes add this OR")
BEGIN
UPDATE v
SET [TimeStamp] = GETDATE()
FROM dbo.VillageSemaphoreset AS v
WHERE VillageID = @Z
AND EXISTS
(
SELECT 1 FROM dbo.VillageSU AS vs
WHERE vs.VillageID = v.VillageID
);
ENDContext
StackExchange Database Administrators Q#118898, answer score: 6
Revisions (0)
No revisions yet.