patternsqlMinor
Calculating stock quantity based on change log
Viewed 0 times
logquantitystockcalculatingbasedchange
Problem
Imagine that you have the following table structure:
Some position ID:s have special meaning, for example
This table currently holds around 5.5 million rows. We calculate the stock value for each product and position into a cache table on a schedule using a query that looks something like this:
Even though this completes in a reasonable amount of time (around 20 seconds), I feel like this is a pretty inefficient way of calculating the stock values.
We rarely do anything but
I had an idea of creating "checkpoints" in a separate table, calculating the value up to a specific point in time and using that as a start value when creating our stock quantity cache table:
```
LogId | ProductId | FromPositionId | ToPositionId | Date | Quantity
-----------------------------------------------------------------------------------
1 | 123 | 0 | 10002 | 2018-01-01 08:10:22 | 5
2 | 123 | 0 | 10003 | 2018-01-03 15:15:10 | 9
3 | 123 | 10002 | 10004 | 2018-01-07 21:08:56 | 3
4 | 123 | 10004 | 0 | 2018-02-09 10:03:23 | 1FromPositionId and ToPositionId are stock positions.Some position ID:s have special meaning, for example
0. An event from or to 0 means that stock was created or removed. From 0 could be stock from a delivery and to 0 could be a shipped order.This table currently holds around 5.5 million rows. We calculate the stock value for each product and position into a cache table on a schedule using a query that looks something like this:
WITH t AS
(
SELECT ToPositionId AS PositionId, SUM(Quantity) AS Quantity, ProductId
FROM ProductPositionLog
GROUP BY ToPositionId, ProductId
UNION
SELECT FromPositionId AS PositionId, -SUM(Quantity) AS Quantity, ProductId
FROM ProductPositionLog
GROUP BY FromPositionId, ProductId
)
SELECT t.ProductId, t.PositionId, SUM(t.Quantity) AS Quantity
FROM t
WHERE NOT t.PositionId = 0
GROUP BY t.ProductId, t.PositionId
HAVING SUM(t.Quantity) > 0Even though this completes in a reasonable amount of time (around 20 seconds), I feel like this is a pretty inefficient way of calculating the stock values.
We rarely do anything but
INSERT:s in this table, but sometimes we go in and adjust the quantity or remove a row manually due to mistakes by the people generating these rows.I had an idea of creating "checkpoints" in a separate table, calculating the value up to a specific point in time and using that as a start value when creating our stock quantity cache table:
```
Solution
Sometimes you can improve query performance just by doing a little bit of tuning instead of changing your entire query. I noticed in your actual query plan that your query spills to tempdb in three places. Here's one example:
Resolving those tempdb spills may improve performance. If
If you wish to improve performance further I recommend trying out indexed views instead of building and maintaining your own checkpoint table. Indexed views are significantly easier to get right than a custom solution involving your own materialized table or triggers. They will add a small amount of overhead to all DML operations but it may allow you to remove some of the nonclustered indexes that you currently have. Indexed views appear to be supported in the web edition of the product.
There are some restrictions on indexed views so you'll need to create a pair of them. Below is an example implementation, along with the fake data that I used for testing:
Without the indexed views the query takes about 2.7 seconds to finish on my machine. I get a similar plan to yours except mine runs in serial:
I believe that you'll need to query the indexed views with the
This query has a simpler plan and finishes in under 400 ms on my machine:
The best part is that you won't have to change any of the application code that loads data into the
Resolving those tempdb spills may improve performance. If
Quantity is always non-negative then you can replace UNION with UNION ALL which will likely change the hash union operator to something else that doesn't require a memory grant. Your other tempdb spills are caused by issues with cardinality estimation. You're on SQL Server 2014 and using the new CE so it may be difficult to improve the cardinality estimates because the query optimizer won't use multi-column statistics. As a quick fix, consider using the MIN_MEMORY_GRANT query hint made available in SQL Server 2014 SP2. Your query's memory grant is only 49104 KB and the max available grant is 5054840 KB so hopefully bumping it up won't impact concurrency too much. 10% is a reasonable starting guess but you may need to adjust it up and done depending on your hardware and data. Putting that all together, this is what your query might look like:WITH t AS
(
SELECT ToPositionId AS PositionId, SUM(Quantity) AS Quantity, ProductId
FROM ProductPositionLog
GROUP BY ToPositionId, ProductId
UNION ALL
SELECT FromPositionId AS PositionId, -SUM(Quantity) AS Quantity, ProductId
FROM ProductPositionLog
GROUP BY FromPositionId, ProductId
)
SELECT t.ProductId, t.PositionId, SUM(t.Quantity) AS Quantity
FROM t
WHERE NOT t.PositionId = 0
GROUP BY t.ProductId, t.PositionId
HAVING SUM(t.Quantity) > 0
OPTION (MIN_GRANT_PERCENT = 10);If you wish to improve performance further I recommend trying out indexed views instead of building and maintaining your own checkpoint table. Indexed views are significantly easier to get right than a custom solution involving your own materialized table or triggers. They will add a small amount of overhead to all DML operations but it may allow you to remove some of the nonclustered indexes that you currently have. Indexed views appear to be supported in the web edition of the product.
There are some restrictions on indexed views so you'll need to create a pair of them. Below is an example implementation, along with the fake data that I used for testing:
CREATE TABLE dbo.ProductPositionLog (
LogId BIGINT NOT NULL,
ProductId BIGINT NOT NULL,
FromPositionId BIGINT NOT NULL,
ToPositionId BIGINT NOT NULL,
Quantity INT NOT NULL,
FILLER VARCHAR(20),
PRIMARY KEY (LogId)
);
INSERT INTO dbo.ProductPositionLog WITH (TABLOCK)
SELECT RN, RN % 100, RN % 3999, 3998 - (RN % 3999), RN % 10, REPLICATE('Z', 20)
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
) q;
CREATE INDEX NCI1 ON dbo.ProductPositionLog (ToPositionId, ProductId) INCLUDE (Quantity);
CREATE INDEX NCI2 ON dbo.ProductPositionLog (FromPositionId, ProductId) INCLUDE (Quantity);
GO
CREATE VIEW ProductPositionLog_1
WITH SCHEMABINDING
AS
SELECT ToPositionId AS PositionId, SUM(Quantity) AS Quantity, ProductId, COUNT_BIG(*) CNT
FROM dbo.ProductPositionLog
WHERE ToPositionId <> 0
GROUP BY ToPositionId, ProductId
GO
CREATE UNIQUE CLUSTERED INDEX IDX_V1
ON ProductPositionLog_1 (PositionId, ProductId);
GO
CREATE VIEW ProductPositionLog_2
WITH SCHEMABINDING
AS
SELECT FromPositionId AS PositionId, SUM(Quantity) AS Quantity, ProductId, COUNT_BIG(*) CNT
FROM dbo.ProductPositionLog
WHERE FromPositionId <> 0
GROUP BY FromPositionId, ProductId
GO
CREATE UNIQUE CLUSTERED INDEX IDX_V2
ON ProductPositionLog_2 (PositionId, ProductId);
GOWithout the indexed views the query takes about 2.7 seconds to finish on my machine. I get a similar plan to yours except mine runs in serial:
I believe that you'll need to query the indexed views with the
NOEXPAND hint because you aren't on enterprise edition. Here's one way to do that:WITH t AS
(
SELECT PositionId, Quantity, ProductId
FROM ProductPositionLog_1 WITH (NOEXPAND)
UNION ALL
SELECT PositionId, Quantity, ProductId
FROM ProductPositionLog_2 WITH (NOEXPAND)
)
SELECT t.ProductId, t.PositionId, SUM(t.Quantity) AS Quantity
FROM t
GROUP BY t.ProductId, t.PositionId
HAVING SUM(t.Quantity) > 0;This query has a simpler plan and finishes in under 400 ms on my machine:
The best part is that you won't have to change any of the application code that loads data into the
ProductPositionLog table. You simply need to verify that the DML overhead of the pair of indexed views is acceptable.Code Snippets
WITH t AS
(
SELECT ToPositionId AS PositionId, SUM(Quantity) AS Quantity, ProductId
FROM ProductPositionLog
GROUP BY ToPositionId, ProductId
UNION ALL
SELECT FromPositionId AS PositionId, -SUM(Quantity) AS Quantity, ProductId
FROM ProductPositionLog
GROUP BY FromPositionId, ProductId
)
SELECT t.ProductId, t.PositionId, SUM(t.Quantity) AS Quantity
FROM t
WHERE NOT t.PositionId = 0
GROUP BY t.ProductId, t.PositionId
HAVING SUM(t.Quantity) > 0
OPTION (MIN_GRANT_PERCENT = 10);CREATE TABLE dbo.ProductPositionLog (
LogId BIGINT NOT NULL,
ProductId BIGINT NOT NULL,
FromPositionId BIGINT NOT NULL,
ToPositionId BIGINT NOT NULL,
Quantity INT NOT NULL,
FILLER VARCHAR(20),
PRIMARY KEY (LogId)
);
INSERT INTO dbo.ProductPositionLog WITH (TABLOCK)
SELECT RN, RN % 100, RN % 3999, 3998 - (RN % 3999), RN % 10, REPLICATE('Z', 20)
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
) q;
CREATE INDEX NCI1 ON dbo.ProductPositionLog (ToPositionId, ProductId) INCLUDE (Quantity);
CREATE INDEX NCI2 ON dbo.ProductPositionLog (FromPositionId, ProductId) INCLUDE (Quantity);
GO
CREATE VIEW ProductPositionLog_1
WITH SCHEMABINDING
AS
SELECT ToPositionId AS PositionId, SUM(Quantity) AS Quantity, ProductId, COUNT_BIG(*) CNT
FROM dbo.ProductPositionLog
WHERE ToPositionId <> 0
GROUP BY ToPositionId, ProductId
GO
CREATE UNIQUE CLUSTERED INDEX IDX_V1
ON ProductPositionLog_1 (PositionId, ProductId);
GO
CREATE VIEW ProductPositionLog_2
WITH SCHEMABINDING
AS
SELECT FromPositionId AS PositionId, SUM(Quantity) AS Quantity, ProductId, COUNT_BIG(*) CNT
FROM dbo.ProductPositionLog
WHERE FromPositionId <> 0
GROUP BY FromPositionId, ProductId
GO
CREATE UNIQUE CLUSTERED INDEX IDX_V2
ON ProductPositionLog_2 (PositionId, ProductId);
GOWITH t AS
(
SELECT PositionId, Quantity, ProductId
FROM ProductPositionLog_1 WITH (NOEXPAND)
UNION ALL
SELECT PositionId, Quantity, ProductId
FROM ProductPositionLog_2 WITH (NOEXPAND)
)
SELECT t.ProductId, t.PositionId, SUM(t.Quantity) AS Quantity
FROM t
GROUP BY t.ProductId, t.PositionId
HAVING SUM(t.Quantity) > 0;Context
StackExchange Database Administrators Q#215772, answer score: 6
Revisions (0)
No revisions yet.