Is there a better way to create dynamic time averaged series over a dynamic frequency?
Problem
I've got a series of tables with lots of high-precision data collected from various devices. The intervals they were collected on vary, and even wander, over the time series. My users want to be able to pick a date range and get an average / min / max of these variables at a specific frequency. This is my second stab at this, and it works, but I wonder if there is a better/faster way to accomplish it?
Right now, over the 32721 rows in this 3-day period, the query takes about 43 seconds and returns the 577 rows I expect, but I'd like it to be faster. The big hit comes from the nested loop used for the inner join.
declare @start datetime
declare @end datetime
set @start = '3/1/2012'
set @end = '3/3/2012'
declare @interval int
set @interval = 300
declare @tpart table(
dt datetime
);
with CTE_TimeTable
as
(
select @start as [date]
union all
select dateadd(ss, @interval, [date])
from CTE_TimeTable
where dateadd(ss, @interval, [date]) <= @end
)
insert into @tpart
select [date]
from CTE_TimeTable
option (maxrecursion 0);
select t.dt, avg(c.x1), min(c.x1), max(c.x1), avg(c.x2), min(c.x2), max(c.x2)
from clean.data c
inner join @tpart t
on t.dt = ... -- the original join wraps c.Date in a rounding function; truncated in the source
where c.Date >= @start
and c.Date <= @end
group by t.dt
Solution
Your join between the tables is embedded in a function making it really hard for the optimizer to do anything smart with it. I guess it has to compare every row in one table against every other row in the other table.
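To illustrate the difference (dbo.RoundToInterval below is a hypothetical stand-in, since the original function-based join is not shown in full):

```sql
-- Non-sargable: the column is wrapped in a function, so the optimizer cannot
-- seek the index on clean.data(Date) and ends up comparing row pairs.
-- ... inner join #tpart t on t.dt = dbo.RoundToInterval(c.Date, @interval)

-- Sargable: bare-column range comparisons let the optimizer seek one index
-- range per bucket.
-- ... inner join #tpart t
--     on c.Date >= t.dt and c.Date < dateadd(second, @interval, t.dt)
```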
Rewriting your join with a range check should be a lot faster. I also added a primary key to your table variable to remove a sort operation from the query plan and I turned your table variable into a temp table instead. The difference in my tests was that the query plan started to use Parallelism.
Note: This query does not return exactly the same intervals as your query does. It will divide the date range into equally sized parts, where your query had one half interval at the start and one half interval at the end of the range. It is of course possible to modify the query to be equivalent to yours if that is desired.
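If reproducing the original half-shifted buckets matters, one possible adjustment (my assumption, not part of the answer) is to move the start point back by half an interval before the helper table is built, so each original grid point sits in the middle of a bucket:

```sql
-- Sketch (assumption): shift the grid back half an interval so buckets are
-- centered on the original boundaries; bucket labels then mark bucket starts
-- rather than centers.
set @start = dateadd(second, -@interval / 2, @start);
```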
declare @start datetime;
declare @end datetime;
set @start = '20120301';
set @end = '20120303';
declare @interval int;
set @interval = 300;
create table #tpart
(
dt datetime primary key
);
with CTE_TimeTable
as
(
select @start as [date]
union all
select dateadd(second, @interval, [date])
from CTE_TimeTable
where dateadd(second, @interval, [date]) <= @end
)
insert into #tpart
select [date]
from CTE_TimeTable
option (maxrecursion 0);
select t.dt, avg(c.x1), min(c.x1), max(c.x1), avg(c.x2), min(c.x2), max(c.x2)
from clean.data c
inner join #tpart t
on c.Date >= t.dt and
c.Date < dateadd(second, @interval, t.dt)
group by t.dt;
drop table #tpart;
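As a further variation (my own sketch, not from the answer): if the helper table itself is unwanted, each row's bucket can be computed directly with integer division on datediff, grouping in a single scan; edge handling at the end of the range may differ slightly from the helper-table version.

```sql
-- Sketch (assumption, not part of the answer): bucket rows by integer
-- division instead of joining to a helper table of bucket starts.
declare @start datetime;
declare @end datetime;
set @start = '20120301';
set @end = '20120303';
declare @interval int;
set @interval = 300;

select
    -- reconstruct the bucket start from the integer bucket number
    dateadd(second, (datediff(second, @start, c.Date) / @interval) * @interval, @start) as dt,
    avg(c.x1), min(c.x1), max(c.x1),
    avg(c.x2), min(c.x2), max(c.x2)
from clean.data c
where c.Date >= @start and c.Date <= @end
group by datediff(second, @start, c.Date) / @interval;
```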
Update
I tested on a table with a total of 1036801 rows, 34560 of which fall in the interval 2012-03-01 to 2012-03-03. In my tests the original query takes 4.1 seconds; the query above takes 0.1 seconds.
Script to generate the test data:
create table clean.data
(
Date datetime primary key,
x1 int,
x2 int
);
go
with C as
(
select cast('20120201' as datetime) as D
union all
select dateadd(second, 5, D)
from C
where D < '20120401'
)
insert into clean.data(Date, x1, x2)
select D, checksum(newid()) % 1000, checksum(newid()) % 1000
from C
option (maxrecursion 0);
Code Snippets
declare @start datetime;
declare @end datetime;
set @start = '20120301';
set @end = '20120303';
declare @interval int;
set @interval = 300;
create table #tpart
(
dt datetime primary key
);
with CTE_TimeTable
as
(
select @start as [date]
union all
select dateadd(second ,@interval, [date])
from CTE_TimeTable
where dateadd(second, @interval, [date]) <= @end
)
insert into #tpart
select [date]
from CTE_TimeTable
option (maxrecursion 0);
select t.dt, avg(c.x1), min(c.x1), max(c.x1), avg(c.x2), min(c.x2), max(c.x2)
from clean.data c
inner join #tpart t
on c.Date >= t.dt and
c.Date < dateadd(second, @interval, t.dt)
group by t.dt;
drop table #tpart;
create table clean.data
(
Date datetime primary key,
x1 int,
x2 int
);
go
with C as
(
select cast('20120201' as datetime) as D
union all
select dateadd(second, 5, D)
from C
where D < '20120401'
)
insert into clean.data(Date, x1, x2)
select D, checksum(newid()) % 1000, checksum(newid()) % 1000
from C
option (maxrecursion 0);
Context
StackExchange Database Administrators Q#17837, answer score: 4