HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Simple yet problematic update query

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
simpleupdatequeryyetproblematic

Problem

I have a rather simple update/query, that has been causing me a lot of grief over the years.

in simplest form it is :

update VillageSemaphore
set TimeStamp = getdate() 
        where VillageID in (@X, @Y)


However, in some stored procs, the query also include this "OR VillageID in (...)" subquery

update VillageSemaphore
set TimeStamp = getdate() 
        where VillageID in (@X, @Y)

        OR VillageID in  ( -- this subquery can return many rows, many different VillageIDs
        select VSU.SupportingVillageID 
        from VillageSupportUnits VSU
        where SupportedVillageID = @Z       
            and VSU.UnitCount <> 0
            )


Note that this OR, can return many villageIDs, not just one, @Z. This version of the query, sometimes runs a very long time. No index rebuild, stats rebuild helps. It runs slowly when the contents of Villages table is deleted and repopulated. In this case, the row count would be just a few hundred rows. I never figured out why this is, and always just lived with it.

However, recently I was looking at the query plan:

It seems that estimated number of rows (4000) is huge in comparison with the actual number of rows (2).

I created this statistic but it does not help

CREATE STATISTICS [stat_x] ON [VillageSU]([UnitCount], [VillageID])


SO MY QUESTION : any suggestions why this could be and what I could do to improve this ?

for reference, the table looks like this :

CREATE TABLE VillageSemaphore(
    VillageID    int         NOT NULL,
    TimeStamp    datetime    NOT NULL,
    CONSTRAINT PK97 PRIMARY KEY CLUSTERED (VillageID)
)


UPDATE: Trying out this version of the query as suggested by srutzky

```
CREATE TABLE #VillagesToLock (VillageID INT NOT NULL);
insert into #VillagesToLock values (@X)
insert into #VillagesToLock values (@Y)
insert into #VillagesToLock select VSU.SupportingVillageID
from VillageSupportUnits VSU
where SupportedVillageID = @Z
a

Solution

While I'm not convinced this is a problem with the query itself (did you check for blocking when it runs slow? did you check the wait type(s) occurring while it was running), IN and OR can be a problematic pattern to optimize for. Have you considered breaking this into multiple statements?

UPDATE dbo.VillageSemaphoreset 
  SET [TimeStamp] = GETDATE() -- TimeStamp is a terrible column name btw 
  WHERE VillageID = @X;

UPDATE dbo.VillageSemaphoreset 
  SET [TimeStamp] = GETDATE()
  WHERE VillageID = @Y;

IF (whatever condition leads you to "sometimes add this OR")
BEGIN
  UPDATE v 
    SET [TimeStamp] = GETDATE()
    FROM dbo.VillageSemaphoreset AS v
    WHERE VillageID = @Z
    AND EXISTS 
    (
      SELECT 1 FROM dbo.VillageSU AS vs
      WHERE vs.VillageID = v.VillageID
    );
END


This may solve the estimation problem, but I agree with Max, a statistic with a leading column of UnitCount won't help the estimates for these queries anyway.

Code Snippets

UPDATE dbo.VillageSemaphoreset 
  SET [TimeStamp] = GETDATE() -- TimeStamp is a terrible column name btw 
  WHERE VillageID = @X;

UPDATE dbo.VillageSemaphoreset 
  SET [TimeStamp] = GETDATE()
  WHERE VillageID = @Y;

IF (whatever condition leads you to "sometimes add this OR")
BEGIN
  UPDATE v 
    SET [TimeStamp] = GETDATE()
    FROM dbo.VillageSemaphoreset AS v
    WHERE VillageID = @Z
    AND EXISTS 
    (
      SELECT 1 FROM dbo.VillageSU AS vs
      WHERE vs.VillageID = v.VillageID
    );
END

Context

StackExchange Database Administrators Q#118898, answer score: 6

Revisions (0)

No revisions yet.