patternsqlMinor
Best way of aggregating, storing and using data in SQL Server (triggers, scheduled jobs, SSAS?)
Viewed 0 times
serversqlaggregatingwaydatatriggersjobsusingandscheduled
Problem
We have a table with spatial data and some attributes and need to aggregate it so it could be used in various queries. We have the mechanism to do the aggregation on-the-fly via CLR stored procedures but I was wondering what would be the best way of caching this aggregated data in SQL Server 2008 R2 as the aggregation is very slow.
So we have table like this:
FeatureId, MeasureFrom and MeasureTo basically define an interval and Rating is the value of this interval. Note that FeatureId actually references a real lines (shapes) from a different table. We have about 250k rows in this table currently with overlapping intervals and various timestamps (historical records). Our aggregation process tries to find the latest Value across all intervals so it needs to cut and join intervals based on timestamp and keeps the related value.
As the on-the-fly querying is slow and takes a lot of resources, we thought we might need to create another database storing the results of this aggregation. The source table is constantly updated (but not too often, once in a couple of days).
What would be the best way of creating this database and keeping the aggregated values up-to-date? I can think of using triggers, or scheduled task. I have no experience with SSAS but would that be more suitable?
Just to note, we have a few similar tables storing slightly different values (some with more that one Value column) and hence the reason for a separate database rather than another table in the original database.
How quick would be the querying of this separate database from the original database using cross-database joins?
Edited:
To demonstrate what our "aggregation" does, here is some sample data:
```
FeatureId | MeasureFrom | MeasureTo | Value | Timestamp
1 | 1 | 20 | 2
So we have table like this:
[TableId] [int] NOT NULL,
[FeatureId] [int] NOT NULL,
[MeasureFrom] [float] NOT NULL,
[MeasureTo] [float] NOT NULL,
[Value] [smallint] NOT NULL,
[Timestamp] [datetime2](7) NOT NULLFeatureId, MeasureFrom and MeasureTo basically define an interval and Rating is the value of this interval. Note that FeatureId actually references a real lines (shapes) from a different table. We have about 250k rows in this table currently with overlapping intervals and various timestamps (historical records). Our aggregation process tries to find the latest Value across all intervals so it needs to cut and join intervals based on timestamp and keeps the related value.
As the on-the-fly querying is slow and takes a lot of resources, we thought we might need to create another database storing the results of this aggregation. The source table is constantly updated (but not too often, once in a couple of days).
What would be the best way of creating this database and keeping the aggregated values up-to-date? I can think of using triggers, or scheduled task. I have no experience with SSAS but would that be more suitable?
Just to note, we have a few similar tables storing slightly different values (some with more that one Value column) and hence the reason for a separate database rather than another table in the original database.
How quick would be the querying of this separate database from the original database using cross-database joins?
Edited:
To demonstrate what our "aggregation" does, here is some sample data:
```
FeatureId | MeasureFrom | MeasureTo | Value | Timestamp
1 | 1 | 20 | 2
Solution
The aggregate table can probably stay in the same DB. You can however create it on a separate filegroup and disk.
Full update of a new aggregate table
One way of doing it using pure SQL and your sample:
With this really small sample, I am not sure it covers all your needs. It may help to add a bigger sample with more data.
It does 4 table scans. Since you have 250k rows, it may not perform so well. It will probablibly better to run it in batch of X consecutive FeatureIds.
This could be done using a job and a SSIS package with either a single update or batch update. You would have to truncate the aggregate table first.
Trigger on each new row in main table
For new rows, using a trigger, this query could be used:
You must replace the new CTE by the values of the new inserted row in the trigger and use it to merge with the aggregate table.
Scheduled update of newly added rows
If you can find a way to get a list of all newly added rows to the main table since the last update of the aggregate table, you could schedule a job every x minutes or hours and only update what is necessary based on what has been recently added.
The trigger query will work as well with scheduled updates.
This can be run as a schedueled job and within a SSIS package.
Full update of a new aggregate table
One way of doing it using pure SQL and your sample:
WITH list ([Type], [FeatureId], [Measure], [Value], [Timestamp], [ID]) as (
SELECT *
, ROW_NUMBER() OVER(PARTITION BY [FeatureId] ORDER BY [Measure])
FROM (
SELECT 0, [FeatureId], [MeasureFrom], [Value], [Timestamp] FROM data
UNION ALL
SELECT 1, [FeatureId], [MeasureTo], [Value], [Timestamp] FROM data
) l([Type], [FeatureId], [Measure], [Value], [Timestamp])
)
SELECT l1.FeatureId, [MeasureFrom] = l1.Measure, [MeasureTo] = l2.Measure
, [Value] = CASE WHEN l1.Type = 0 THEN l1.Value ELSE l2.Value END
, [Timestamp] = CASE WHEN l1.Type = 0 THEN l1.Timestamp ELSE l2.Timestamp END
FROM list l1
INNER JOIN list l2 ON l1.FeatureId = l2.FeatureId AND l1.ID+1 = l2.ID
;With this really small sample, I am not sure it covers all your needs. It may help to add a bigger sample with more data.
It does 4 table scans. Since you have 250k rows, it may not perform so well. It will probablibly better to run it in batch of X consecutive FeatureIds.
This could be done using a job and a SSIS package with either a single update or batch update. You would have to truncate the aggregate table first.
Trigger on each new row in main table
For new rows, using a trigger, this query could be used:
WITH new([FeatureId], [MeasureFrom], [MeasureTo], [Value], [Timestamp]) as (
-- SELECT 1, 1, 20, 2, '2015-01-01'
-- SELECT 1, 5, 15, 3, '2015-01-02'
SELECT 1, 9, 10, 8, '2015-01-03'
), gap([FeatureId], mn, mx) as (
SELECT n.[FeatureId], mn.mn, mx.mx
FROM new n
CROSS APPLY (SELECT mn = MAX([MeasureFrom]) FROM data3 WHERE [FeatureId] = n.[FeatureId] AND [MeasureFrom] n.MeasureTo) mx
), list (x,[FeatureId], [Measure], [Value], [Timestamp], [ID]) as (
SELECT *
, ROW_NUMBER() OVER(PARTITION BY [FeatureId] ORDER BY [Measure])
FROM (
SELECT 0, d.[FeatureId], d.[MeasureFrom], d.[Value], d.[Timestamp]
FROM data3 d
INNER JOIN gap g ON d.[FeatureId] = g.[FeatureId] AND (d.MeasureFrom >= g.mn AND d.MeasureFrom < g.mx)
UNION ALL
SELECT 1, d.[FeatureId], d.[MeasureTo], d.[Value], d.[Timestamp]
FROM data3 d
INNER JOIN gap g ON d.[FeatureId] = g.[FeatureId] AND d.MeasureTo = g.mx
UNION ALL
SELECT 2, [FeatureId], [MeasureFrom], [Value], [Timestamp] FROM new
UNION ALL
SELECT 3, [FeatureId], [MeasureTo], [Value], [Timestamp] FROM new
) l(x, [FeatureId], [Measure], [Value], [Timestamp])
)
MERGE data3 AS target
USING (
SELECT l1.FeatureId, [MeasureFrom] = l1.Measure, [MeasureTo] = l2.Measure
, [Value] = COALESCE(d.[Value], l1.[Value])
, [Timestamp] = COALESCE(d.[Timestamp], l1.[Timestamp])
FROM list l1
INNER JOIN list l2 ON l1.FeatureId = l2.FeatureId AND l1.ID+1 = l2.ID
LEFT JOIN data3 d ON d.MeasureFrom = l1.Measure OR d.MeasureTo = l2.Measure
) as source(FeatureId, [MeasureFrom], [MeasureTo], [Value], [Timestamp])
ON (target.FeatureId = source.FeatureId AND target.[MeasureFrom] = source.[MeasureFrom])
WHEN MATCHED THEN
UPDATE SET target.[MeasureTo] = source.[MeasureTo]
WHEN NOT MATCHED BY target THEN
INSERT (FeatureId, [MeasureFrom], [MeasureTo], [Value], [Timestamp])
VALUES (source.FeatureId, source.[MeasureFrom], source.[MeasureTo], source.[Value], source.[Timestamp])
;You must replace the new CTE by the values of the new inserted row in the trigger and use it to merge with the aggregate table.
Scheduled update of newly added rows
If you can find a way to get a list of all newly added rows to the main table since the last update of the aggregate table, you could schedule a job every x minutes or hours and only update what is necessary based on what has been recently added.
The trigger query will work as well with scheduled updates.
This can be run as a schedueled job and within a SSIS package.
Code Snippets
WITH list ([Type], [FeatureId], [Measure], [Value], [Timestamp], [ID]) as (
SELECT *
, ROW_NUMBER() OVER(PARTITION BY [FeatureId] ORDER BY [Measure])
FROM (
SELECT 0, [FeatureId], [MeasureFrom], [Value], [Timestamp] FROM data
UNION ALL
SELECT 1, [FeatureId], [MeasureTo], [Value], [Timestamp] FROM data
) l([Type], [FeatureId], [Measure], [Value], [Timestamp])
)
SELECT l1.FeatureId, [MeasureFrom] = l1.Measure, [MeasureTo] = l2.Measure
, [Value] = CASE WHEN l1.Type = 0 THEN l1.Value ELSE l2.Value END
, [Timestamp] = CASE WHEN l1.Type = 0 THEN l1.Timestamp ELSE l2.Timestamp END
FROM list l1
INNER JOIN list l2 ON l1.FeatureId = l2.FeatureId AND l1.ID+1 = l2.ID
;WITH new([FeatureId], [MeasureFrom], [MeasureTo], [Value], [Timestamp]) as (
-- SELECT 1, 1, 20, 2, '2015-01-01'
-- SELECT 1, 5, 15, 3, '2015-01-02'
SELECT 1, 9, 10, 8, '2015-01-03'
), gap([FeatureId], mn, mx) as (
SELECT n.[FeatureId], mn.mn, mx.mx
FROM new n
CROSS APPLY (SELECT mn = MAX([MeasureFrom]) FROM data3 WHERE [FeatureId] = n.[FeatureId] AND [MeasureFrom] < n.MeasureFrom) mn
CROSS APPLY (SELECT mx = MIN([MeasureTo]) FROM data3 WHERE [FeatureId] = n.[FeatureId] AND [MeasureTo] > n.MeasureTo) mx
), list (x,[FeatureId], [Measure], [Value], [Timestamp], [ID]) as (
SELECT *
, ROW_NUMBER() OVER(PARTITION BY [FeatureId] ORDER BY [Measure])
FROM (
SELECT 0, d.[FeatureId], d.[MeasureFrom], d.[Value], d.[Timestamp]
FROM data3 d
INNER JOIN gap g ON d.[FeatureId] = g.[FeatureId] AND (d.MeasureFrom >= g.mn AND d.MeasureFrom < g.mx)
UNION ALL
SELECT 1, d.[FeatureId], d.[MeasureTo], d.[Value], d.[Timestamp]
FROM data3 d
INNER JOIN gap g ON d.[FeatureId] = g.[FeatureId] AND d.MeasureTo = g.mx
UNION ALL
SELECT 2, [FeatureId], [MeasureFrom], [Value], [Timestamp] FROM new
UNION ALL
SELECT 3, [FeatureId], [MeasureTo], [Value], [Timestamp] FROM new
) l(x, [FeatureId], [Measure], [Value], [Timestamp])
)
MERGE data3 AS target
USING (
SELECT l1.FeatureId, [MeasureFrom] = l1.Measure, [MeasureTo] = l2.Measure
, [Value] = COALESCE(d.[Value], l1.[Value])
, [Timestamp] = COALESCE(d.[Timestamp], l1.[Timestamp])
FROM list l1
INNER JOIN list l2 ON l1.FeatureId = l2.FeatureId AND l1.ID+1 = l2.ID
LEFT JOIN data3 d ON d.MeasureFrom = l1.Measure OR d.MeasureTo = l2.Measure
) as source(FeatureId, [MeasureFrom], [MeasureTo], [Value], [Timestamp])
ON (target.FeatureId = source.FeatureId AND target.[MeasureFrom] = source.[MeasureFrom])
WHEN MATCHED THEN
UPDATE SET target.[MeasureTo] = source.[MeasureTo]
WHEN NOT MATCHED BY target THEN
INSERT (FeatureId, [MeasureFrom], [MeasureTo], [Value], [Timestamp])
VALUES (source.FeatureId, source.[MeasureFrom], source.[MeasureTo], source.[Value], source.[Timestamp])
;Context
StackExchange Database Administrators Q#125691, answer score: 3
Revisions (0)
No revisions yet.