patternsqlMinor
Populating missing data based on previous month-end values
Viewed 0 times
previousvaluespopulatingbasedmissingmonthenddata
Problem
Given the following data:
... which represents when a given user was assigned to an account.
I am looking to establish who owned a given account on the last day of each month (the assigned date is the date that the account transferred ownership), with any missing month-ends populated (possibly created from a handy
The desired results would be:
Doing the initial part of this with a windowing function is trivial, it's adding the "missing" rows that I'm struggling with.
create table #histories
(
username varchar(10),
account varchar(10),
assigned date
);
insert into #histories
values
('PHIL','ACCOUNT1','2017-01-04'),
('PETER','ACCOUNT1','2017-01-15'),
('DAVE','ACCOUNT1','2017-03-04'),
('ANDY','ACCOUNT1','2017-05-06'),
('DAVE','ACCOUNT1','2017-05-07'),
('FRED','ACCOUNT1','2017-05-08'),
('JAMES','ACCOUNT1','2017-08-05'),
('DAVE','ACCOUNT2','2017-01-02'),
('PHIL','ACCOUNT2','2017-01-18'),
('JOSH','ACCOUNT2','2017-04-08'),
('JAMES','ACCOUNT2','2017-04-09'),
('DAVE','ACCOUNT2','2017-05-06'),
('PHIL','ACCOUNT2','2017-05-07') ;... which represents when a given user was assigned to an account.
I am looking to establish who owned a given account on the last day of each month (the assigned date is the date that the account transferred ownership), with any missing month-ends populated (possibly created from a handy
dates table that I have available, with useful columns DateKey, Date and LastDayOfMonth, [courtesy of @AaronBertrand])1.The desired results would be:
PETER, ACCOUNT1, 2017-01-31
PETER, ACCOUNT1, 2017-02-28
DAVE, ACCOUNT1, 2017-03-31
DAVE, ACCOUNT1, 2017-04-30
FRED, ACCOUNT1, 2017-05-31
FRED, ACCOUNT1, 2017-06-30
FRED, ACCOUNT1, 2017-07-31
JAMES, ACCOUNT1, 2017-08-31
PHIL, ACCOUNT2, 2017-01-31
PHIL, ACCOUNT2, 2017-02-28
PHIL, ACCOUNT2, 2017-03-31
JAMES, ACCOUNT2, 2017-04-30
PHIL, ACCOUNT2, 2017-05-31Doing the initial part of this with a windowing function is trivial, it's adding the "missing" rows that I'm struggling with.
Solution
One approach to this problem is to do the following:
I modified your test data a little bit to make the results deterministic. Also added an index:
Here's the laziest date dimension table of all time:
For step 1, there are plenty of ways to emulate
For step 2, we need to change the NULL values to something else. You want to include the final month for each account, so adding one month to the starting date suffices:
For step 3, we can join to the date dimension table. The column from the dimension table is exactly the column you need for the result set:
I didn't like the query that I got when I put it all together. There can be issues with join order when combining
I don't know how much data you have so it might not matter for you. But the plan looks how I want it to:
The results match yours:
- Emulate
LEADon SQL Server 2008. You can useAPPLYor a suquery for this.
- For rows without a next row, add one month to their account date.
- Join to a dimension table that contains month end dates. This eliminates all rows that don't span at least a month and adds rows to fill in the gaps as necessary.
I modified your test data a little bit to make the results deterministic. Also added an index:
create table #histories
(
username varchar(10),
account varchar(10),
assigned date
);
insert into #histories
values
('PHIL','ACCOUNT1','2017-01-04'),
('PETER','ACCOUNT1','2017-01-15'),
('DAVE','ACCOUNT1','2017-03-04'),
('ANDY','ACCOUNT1','2017-05-06'),
('DAVE','ACCOUNT1','2017-05-07'),
('FRED','ACCOUNT1','2017-05-08'),
('JAMES','ACCOUNT1','2017-08-05'),
('DAVE','ACCOUNT2','2017-01-02'),
('PHIL','ACCOUNT2','2017-01-18'),
('JOSH','ACCOUNT2','2017-04-08'), -- changed this date to have deterministic results
('JAMES','ACCOUNT2','2017-04-09'),
('DAVE','ACCOUNT2','2017-05-06'),
('PHIL','ACCOUNT2','2017-05-07') ;
-- make life easy
create index gotta_go_fast ON #histories (account, assigned);Here's the laziest date dimension table of all time:
create table #date_dim_months_only (
month_date date,
primary key (month_date)
);
-- put 2500 month ends into table
INSERT INTO #date_dim_months_only WITH (TABLOCK)
SELECT DATEADD(DAY, -1, DATEADD(MONTH, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), '20000101'))
FROM master..spt_values;For step 1, there are plenty of ways to emulate
LEAD. Here's one method:SELECT
h1.username
, h1.account
, h1.assigned
, next_date.assigned
FROM #histories h1
OUTER APPLY (
SELECT TOP 1 h2.assigned
FROM #histories h2
WHERE h1.account = h2.account
AND h1.assigned < h2.assigned
ORDER BY h2.assigned ASC
) next_date;For step 2, we need to change the NULL values to something else. You want to include the final month for each account, so adding one month to the starting date suffices:
ISNULL(next_date.assigned, DATEADD(MONTH, 1, h1.assigned))For step 3, we can join to the date dimension table. The column from the dimension table is exactly the column you need for the result set:
INNER JOIN #date_dim_months_only dd ON
dd.month_date >= h1.assigned AND
dd.month_date < ISNULL(next_date.assigned, DATEADD(MONTH, 1, h1.assigned))I didn't like the query that I got when I put it all together. There can be issues with join order when combining
OUTER APPLY and INNER JOIN. To get the join order I wanted I rewrote it with a subquery:SELECT
hist.username
, hist.account
, dd.month_date
FROM
(
SELECT
h1.username
, h1.account
, h1.assigned
, ISNULL(
(SELECT TOP 1 h2.assigned
FROM #histories h2
WHERE h1.account = h2.account
AND h1.assigned = hist.assigned AND
dd.month_date < hist.next_assigned;I don't know how much data you have so it might not matter for you. But the plan looks how I want it to:
The results match yours:
╔══════════╦══════════╦════════════╗
║ username ║ account ║ month_date ║
╠══════════╬══════════╬════════════╣
║ PETER ║ ACCOUNT1 ║ 2017-01-31 ║
║ PETER ║ ACCOUNT1 ║ 2017-02-28 ║
║ DAVE ║ ACCOUNT1 ║ 2017-03-31 ║
║ DAVE ║ ACCOUNT1 ║ 2017-04-30 ║
║ FRED ║ ACCOUNT1 ║ 2017-05-31 ║
║ FRED ║ ACCOUNT1 ║ 2017-06-30 ║
║ FRED ║ ACCOUNT1 ║ 2017-07-31 ║
║ JAMES ║ ACCOUNT1 ║ 2017-08-31 ║
║ PHIL ║ ACCOUNT2 ║ 2017-01-31 ║
║ PHIL ║ ACCOUNT2 ║ 2017-02-28 ║
║ PHIL ║ ACCOUNT2 ║ 2017-03-31 ║
║ JAMES ║ ACCOUNT2 ║ 2017-04-30 ║
║ PHIL ║ ACCOUNT2 ║ 2017-05-31 ║
╚══════════╩══════════╩════════════╝Code Snippets
create table #histories
(
username varchar(10),
account varchar(10),
assigned date
);
insert into #histories
values
('PHIL','ACCOUNT1','2017-01-04'),
('PETER','ACCOUNT1','2017-01-15'),
('DAVE','ACCOUNT1','2017-03-04'),
('ANDY','ACCOUNT1','2017-05-06'),
('DAVE','ACCOUNT1','2017-05-07'),
('FRED','ACCOUNT1','2017-05-08'),
('JAMES','ACCOUNT1','2017-08-05'),
('DAVE','ACCOUNT2','2017-01-02'),
('PHIL','ACCOUNT2','2017-01-18'),
('JOSH','ACCOUNT2','2017-04-08'), -- changed this date to have deterministic results
('JAMES','ACCOUNT2','2017-04-09'),
('DAVE','ACCOUNT2','2017-05-06'),
('PHIL','ACCOUNT2','2017-05-07') ;
-- make life easy
create index gotta_go_fast ON #histories (account, assigned);create table #date_dim_months_only (
month_date date,
primary key (month_date)
);
-- put 2500 month ends into table
INSERT INTO #date_dim_months_only WITH (TABLOCK)
SELECT DATEADD(DAY, -1, DATEADD(MONTH, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), '20000101'))
FROM master..spt_values;SELECT
h1.username
, h1.account
, h1.assigned
, next_date.assigned
FROM #histories h1
OUTER APPLY (
SELECT TOP 1 h2.assigned
FROM #histories h2
WHERE h1.account = h2.account
AND h1.assigned < h2.assigned
ORDER BY h2.assigned ASC
) next_date;ISNULL(next_date.assigned, DATEADD(MONTH, 1, h1.assigned))INNER JOIN #date_dim_months_only dd ON
dd.month_date >= h1.assigned AND
dd.month_date < ISNULL(next_date.assigned, DATEADD(MONTH, 1, h1.assigned))Context
StackExchange Database Administrators Q#183087, answer score: 9
Revisions (0)
No revisions yet.