patternsqlMinor
Slow While Loop, Query Improvment Assistance
Viewed 0 times
assistancewhilequeryloopslowimprovment
Problem
I am working on creating a Datawarehouse.
I have created a Time Dimension (Dim_Time), at 5-minute intervals. Hour Aggregations will have [Minutes] = NULL.
For the purpose of this example:
Then I have an Incoming Table, which is updated every 5 minutes from the OLTP Database.
I then have the following While loop. It's purpose is to get the correct 5-Minute time slot (TimeID) that relates to a particular incoming row:
It's such a simple process, and yet I cannot figure out a simpler way to update the TimeID. As per the CTE SELECT in the loop, I need to get the MAX(TimeID) where the StartDateTime is less then the rows InsertTime.
Because time is the only relationship, I am struggling with all options to do this in 1 query without the loop, but I feel it is possible
Please can someone help me out here either with a better option or confirming that this is the simplest way.
Thank you
I have created a Time Dimension (Dim_Time), at 5-minute intervals. Hour Aggregations will have [Minutes] = NULL.
For the purpose of this example:
CREATE TABLE [dbo].[Dim_Time](
[TimeID] [int] IDENTITY(1,1) NOT NULL,
[StartDateTime] [datetime] NULL,
[Hour] [int] NULL,
[Minute] [int] NULL,
CONSTRAINT [PK_Dim_Time] PRIMARY KEY CLUSTERED
([TimeID] ASC)
) ON [PRIMARY]
GOThen I have an Incoming Table, which is updated every 5 minutes from the OLTP Database.
CREATE TABLE [dbo].[Stg_IncomingQueue](
[IncomingID] [int] IDENTITY(1,1) NOT NULL,
[CustomerID] [int] NOT NULL,
[TimeID] [int] NULL,
[InsertTime] [datetime] NULL,
CONSTRAINT [PK_IncomingQueueMonitor] PRIMARY KEY CLUSTERED
([IncomingID] ASC)
) ON [PRIMARY]
GOI then have the following While loop. It's purpose is to get the correct 5-Minute time slot (TimeID) that relates to a particular incoming row:
WHILE 0 < (SELECT COUNT(*) FROM [dba_local].[dbo].[Stg_IncomingQueue] WHERE TimeID IS NULL)
BEGIN
SELECT TOP 1 @IncomingID = IncomingID, @RowInserTime = InsertTime
FROM [dba_local].[dbo].[Stg_IncomingQueue] WHERE TimeID IS NULL
;WITH DimTime
AS (
SELECT MAX(TimeID) AS MaxTimeID FROM [dba_local].[dbo].[Dim_Time]
WHERE StartDateTime < @RowInserTime AND [Minute] IS NOT NULL
)
UPDATE [dba_local].[dbo].[Stg_IncomingQueue]
SET TimeID = (SELECT MaxTimeID FROM DimTime)
WHERE IncomingID = @IncomingID
ENDIt's such a simple process, and yet I cannot figure out a simpler way to update the TimeID. As per the CTE SELECT in the loop, I need to get the MAX(TimeID) where the StartDateTime is less then the rows InsertTime.
Because time is the only relationship, I am struggling with all options to do this in 1 query without the loop, but I feel it is possible
Please can someone help me out here either with a better option or confirming that this is the simplest way.
Thank you
Solution
I created the following minimally complete and verifiable example, based on the two tables in your original question. It uses the LEAD T-SQL statement to obtain a time range from the dbo.Dim_Time table, which can be compared to the incoming rows quite easily.
This piece replaces your entire
The results, compared side-by-side with the Dim_Time table:
The output looks like:
╔════════════╦════════════╦════════╦═════════════════════════╦════════╦═══════════════╗
║ IncomingID ║ CustomerID ║ TimeID ║ InsertTime ║ TimeID ║ StartDateTime ║
╠════════════╬════════════╬════════╬═════════════════════════╬════════╬═══════════════╣
║ 1 ║ 1 ║ 271 ║ 1875-06-30 22:31:49.000 ║ 271 ║ 22:30:00 ║
║ 2 ║ 2 ║ 116 ║ 1857-07-01 09:38:59.000 ║ 116 ║ 09:35:00 ║
║ 3 ║ 3 ║ 218 ║ 1854-09-18 18:08:39.000 ║ 218 ║ 18:05:00 ║
║ 4 ║ 4 ║ 221 ║ 1860-05-31 18:22:25.000 ║ 221 ║ 18:20:00 ║
╚════════════╩════════════╩════════╩═════════════════════════╩════════╩═══════════════╝
Assuming there isn't a massive amount of incoming rows, this may work fairly well. Be aware, I'm using
Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)>=[dt].[StartDateTime]) may affect "SeekPlan" in query plan choice, Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)<[Expr1002]) may affect "SeekPlan" in query plan choice.
If you need to avoid the type-conversion during the update, you can move that workload to the insert operation by updating the definition of
The update statement then becomes:
IF OBJECT_ID(N'dbo.Stg_IncomingQueue', N'U') IS NOT NULL
DROP TABLE dbo.Stg_IncomingQueue;
IF OBJECT_ID(N'dbo.Dim_Time', N'U') IS NOT NULL
DROP TABLE dbo.Dim_time;
CREATE TABLE dbo.Dim_Time(
TimeID int IDENTITY(1,1) NOT NULL,
StartDateTime time(0) NULL,
CONSTRAINT PK_Dim_Time PRIMARY KEY CLUSTERED
(TimeID ASC)
) ON [PRIMARY]
GO
;WITH src AS
(
SELECT TOP (10) sv.number
FROM master.dbo.spt_values sv
WHERE sv.type = N'P'
ORDER BY sv.number
)
INSERT INTO dbo.Dim_Time (StartDateTime)
SELECT TOP(289) CONVERT(time(0), DATEADD(minute, (s3.number * 100 + s2.number * 10 + s1.number) * 5, CONVERT(time(0), '00:00:00')))
FROM src s1
CROSS JOIN src s2
CROSS JOIN src s3
ORDER BY s3.number * 100 + s2.number * 10 + s1.number
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
INSERT INTO dbo.Stg_IncomingQueue (CustomerID, InsertTime)
VALUES (1, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (2, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (3, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (4, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'));This piece replaces your entire
WHILE loop, with a single UPDATE statement, which is both more efficient, and easier to understand.UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON CONVERT(time(0), iq.InsertTime) >= t.StartDateTime AND CONVERT(time(0), iq.InsertTime) < t.EndDateTime;The results, compared side-by-side with the Dim_Time table:
SELECT *
FROM dbo.Stg_IncomingQueue iq
INNER JOIN dbo.Dim_Time dt ON iq.TimeID = dt.TimeID;The output looks like:
╔════════════╦════════════╦════════╦═════════════════════════╦════════╦═══════════════╗
║ IncomingID ║ CustomerID ║ TimeID ║ InsertTime ║ TimeID ║ StartDateTime ║
╠════════════╬════════════╬════════╬═════════════════════════╬════════╬═══════════════╣
║ 1 ║ 1 ║ 271 ║ 1875-06-30 22:31:49.000 ║ 271 ║ 22:30:00 ║
║ 2 ║ 2 ║ 116 ║ 1857-07-01 09:38:59.000 ║ 116 ║ 09:35:00 ║
║ 3 ║ 3 ║ 218 ║ 1854-09-18 18:08:39.000 ║ 218 ║ 18:05:00 ║
║ 4 ║ 4 ║ 221 ║ 1860-05-31 18:22:25.000 ║ 221 ║ 18:20:00 ║
╚════════════╩════════════╩════════╩═════════════════════════╩════════╩═══════════════╝
Assuming there isn't a massive amount of incoming rows, this may work fairly well. Be aware, I'm using
CONVERT() to convert the incoming datetime column into a time(0) value, which comes at a cost of the query optimizer not being able to use available statistics to help create a great plan. The "actual" query plan for the insert statement shows this warning:Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)>=[dt].[StartDateTime]) may affect "SeekPlan" in query plan choice, Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)<[Expr1002]) may affect "SeekPlan" in query plan choice.
If you need to avoid the type-conversion during the update, you can move that workload to the insert operation by updating the definition of
dbo.Stg_IncomingQueue to include a persisted computed column, as in:CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
InsertTime0 AS CONVERT(TIME(0), InsertTime) PERSISTED
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GOThe update statement then becomes:
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON iq.InsertTime0 >= t.StartDateTime AND iq.InsertTime0 < t.EndDateTime;Code Snippets
IF OBJECT_ID(N'dbo.Stg_IncomingQueue', N'U') IS NOT NULL
DROP TABLE dbo.Stg_IncomingQueue;
IF OBJECT_ID(N'dbo.Dim_Time', N'U') IS NOT NULL
DROP TABLE dbo.Dim_time;
CREATE TABLE dbo.Dim_Time(
TimeID int IDENTITY(1,1) NOT NULL,
StartDateTime time(0) NULL,
CONSTRAINT PK_Dim_Time PRIMARY KEY CLUSTERED
(TimeID ASC)
) ON [PRIMARY]
GO
;WITH src AS
(
SELECT TOP (10) sv.number
FROM master.dbo.spt_values sv
WHERE sv.type = N'P'
ORDER BY sv.number
)
INSERT INTO dbo.Dim_Time (StartDateTime)
SELECT TOP(289) CONVERT(time(0), DATEADD(minute, (s3.number * 100 + s2.number * 10 + s1.number) * 5, CONVERT(time(0), '00:00:00')))
FROM src s1
CROSS JOIN src s2
CROSS JOIN src s3
ORDER BY s3.number * 100 + s2.number * 10 + s1.number
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
INSERT INTO dbo.Stg_IncomingQueue (CustomerID, InsertTime)
VALUES (1, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (2, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (3, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (4, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'));UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON CONVERT(time(0), iq.InsertTime) >= t.StartDateTime AND CONVERT(time(0), iq.InsertTime) < t.EndDateTime;SELECT *
FROM dbo.Stg_IncomingQueue iq
INNER JOIN dbo.Dim_Time dt ON iq.TimeID = dt.TimeID;CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
InsertTime0 AS CONVERT(TIME(0), InsertTime) PERSISTED
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GOUPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON iq.InsertTime0 >= t.StartDateTime AND iq.InsertTime0 < t.EndDateTime;Context
StackExchange Database Administrators Q#230890, answer score: 7
Revisions (0)
No revisions yet.