patternMinor
Fill in the gaps between rows using the previous row's values
Viewed 0 times
rowsthepreviousbetweenusingvaluesrowfillgaps
Problem
I have a
I can visualize road condition over time like this (
However, as noted in red, there are time gaps between the inspections.
Inspections for
The gaps exist because not all of the roads in the City are inspected every year. While I wouldn't consider the gaps to be errors in the data, I am concerned that when I visualize
road_insp table:create table road_insp
(
insp_id int,
road_id int,
insp_date date,
condition number(38,2)
) ;
insert into road_insp (insp_id, road_id, insp_date, condition) values (1,100,to_date('01-APR-04','DD-MON-YY'),.9);
insert into road_insp (insp_id, road_id, insp_date, condition) values (2,100,to_date('01-APR-11','DD-MON-YY'),.7);
insert into road_insp (insp_id, road_id, insp_date, condition) values (3,100,to_date('01-MAR-12','DD-MON-YY'),.7);
insert into road_insp (insp_id, road_id, insp_date, condition) values (4,100,to_date('01-MAR-17','DD-MON-YY'),.6);
insert into road_insp (insp_id, road_id, insp_date, condition) values (5,100,to_date('01-MAR-18','DD-MON-YY'),.6);
insert into road_insp (insp_id, road_id, insp_date, condition) values (6,200,to_date('01-JUN-10','DD-MON-YY'),.4);
insert into road_insp (insp_id, road_id, insp_date, condition) values (7,200,to_date('01-JUN-12','DD-MON-YY'),.3);
commit;
select
insp_id,
road_id,
extract(year from insp_date) as insp_year,
condition as condition
from
road_insp
order by
road_id,
insp_date
;
INSP_ID ROAD_ID INSP_YEAR CONDITION
---------- ---------- ---------- ----------
1 100 2004 .9
2 100 2011 <-gap .7
3 100 2012 .7
4 100 2017 <-gap .6
5 100 2018 .6
6 200 2010 .4
7 200 2012 <-gap .3I can visualize road condition over time like this (
road #100):However, as noted in red, there are time gaps between the inspections.
Inspections for
road #100 are missing for these years:- 2005
- 2006
- 2007
- 2008
- 2009
- 2010
- 2013
- 2014
- 2015
- 2016
The gaps exist because not all of the roads in the City are inspected every year. While I wouldn't consider the gaps to be errors in the data, I am concerned that when I visualize
Solution
with
date_range as (select min(insp_date) mind, max(insp_date) maxd from road_insp),
insp_dates as (select add_months(trunc(mind, 'YYYY'), (level - 1)*12) as insp_date
from date_range connect by level = trunc(ri.insp_date, 'YYYY')
and id.insp_date < trunc(ri.next_insp_date, 'YYYY')
order by insp_id, road_id, insp_date
;Generate dates between min and max insp_date.
Find the next insp_date for each inspection to have them in 1 row.
Join the generated dates to the original dataset on date ranges based on year, excluding the upper bound, because that is the next inspection with existing data. Outer join, so the last inspection for a road (where next inspection is null) is included as well.
If there is generated insp_date is in the same year as the original data, use the original data, otherwise use the generated insp_date.
INSP_ID ROAD_ID INSP_DATE CONDITION
---------- ---------- ---------- ----------
1 100 2004-04-01 .9
1 100 2005-01-01 .9
1 100 2006-01-01 .9
1 100 2007-01-01 .9
1 100 2008-01-01 .9
1 100 2009-01-01 .9
1 100 2010-01-01 .9
2 100 2011-04-01 .7
3 100 2012-03-01 .7
3 100 2013-01-01 .7
3 100 2014-01-01 .7
3 100 2015-01-01 .7
3 100 2016-01-01 .7
4 100 2017-03-01 .6
5 100 2018-03-01 .6
6 200 2010-06-01 .4
6 200 2011-01-01 .4
7 200 2012-06-01 .3Code Snippets
with
date_range as (select min(insp_date) mind, max(insp_date) maxd from road_insp),
insp_dates as (select add_months(trunc(mind, 'YYYY'), (level - 1)*12) as insp_date
from date_range connect by level <= extract(year from maxd) - extract(year from mind))
select
ri.insp_id, ri.road_id,
nvl(case when extract(year from ri.insp_date) = extract(year from id.insp_date)
then ri.insp_date else id.insp_date end, ri.insp_date) as insp_date,
ri.condition
from
insp_dates id right join
(select insp_id, road_id, insp_date, condition as condition,
lead(insp_date) over (partition by road_id order by insp_date) as next_insp_date
from road_insp) ri
on id.insp_date >= trunc(ri.insp_date, 'YYYY')
and id.insp_date < trunc(ri.next_insp_date, 'YYYY')
order by insp_id, road_id, insp_date
;INSP_ID ROAD_ID INSP_DATE CONDITION
---------- ---------- ---------- ----------
1 100 2004-04-01 .9
1 100 2005-01-01 .9
1 100 2006-01-01 .9
1 100 2007-01-01 .9
1 100 2008-01-01 .9
1 100 2009-01-01 .9
1 100 2010-01-01 .9
2 100 2011-04-01 .7
3 100 2012-03-01 .7
3 100 2013-01-01 .7
3 100 2014-01-01 .7
3 100 2015-01-01 .7
3 100 2016-01-01 .7
4 100 2017-03-01 .6
5 100 2018-03-01 .6
6 200 2010-06-01 .4
6 200 2011-01-01 .4
7 200 2012-06-01 .3Context
StackExchange Database Administrators Q#190680, answer score: 3
Revisions (0)
No revisions yet.