HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Sorting multiple columns in a full outer join, possibly containing NULLs

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
sortingcontainingfullcolumnspossiblyjoinmultiplenullsouter

Problem

I am trying to compare a plan against a history to see if there are steps omitted or if unplanned steps have been included. The regular, planned history items as well as the anomalies are intended for an elaborate report. The problem is, I have a column in each joined table that I need for sorting, but due to the full outer join I cannot guarantee them to be not null, which breaks sorting. I'll continue with a simplified example, where the plan corresponds to a recipe:

Table: RecipeSteps

 Id | RecipeId | Position | Message
----+----------+----------+----------------------------
 50 |      123 |        1 | Wash vegetables
 51 |      123 |        2 | Peel vegetables
 52 |      123 |        3 | Cut meat
 53 |      123 |        4 | Turn on stove
 54 |      123 |        5 | Cook and stir it
 55 |      123 |        6 | Turn off stove
----+----------+----------+----------------------------


... and the history to the actual steps while cooking:

Table: History

 Id | TimeStamp | Session | StepId | Message
----+-----------+---------+--------+-----------------------------
 90 |  00:01:00 |     321 |   NULL | Chef entered kitchen
 91 |  00:02:00 |     321 |     51 | Chef peeled vegetables
 92 |  00:03:00 |     321 |     52 | Chef cut meat
 93 |  00:04:00 |     321 |   NULL | Chef picked his nose
 94 |  00:05:00 |     321 |     53 | Chef turned on stove
 95 |  00:06:00 |     321 |     54 | Chef started cooking
 96 |  00:10:00 |     321 |   NULL | Chef left kitchen
 97 |  01:00:00 |     321 |   NULL | FIRE!
----+-----------+---------+--------+-----------------------------


Now, I want to compare these tables in a result table with a full outer join similar to this:

```
Table: Result

Sorting | TimeStamp | Position | Message | Comment
--------+-----------+----------+------------------------+---------------
1 | 00:01:00 | NULL | Chef entered kitchen | unplanned
2 | NULL | 1 | Wash vegetables | omitt

Solution

Please note: The query below doesn't take into account partitioning by Session to keep it simple.

The main idea is to split the data set into two groups and process them separately. History goes first. I decided to make History more important than Steps. If some steps are swapped, they will be presented in the history order.

So, the first group is all rows from History plus possible details from the corresponding Step. This is a simple LEFT JOIN.

The second group is omitted Steps. Another LEFT JOIN with a filter.

If you run the first CTE_History ordering by timestamp you'll see that unplanned events have NULL Position, but they will be in correct position in the result set, because of their timestamp. All we need to do is fill in these blanks.

+-------------+---------------------+----------+------------------------+-----------+
| NewPosition |         ts          | Position |       NewMessage       |  Comment  |
+-------------+---------------------+----------+------------------------+-----------+
|           0 | 2017-01-01 00:01:00 | NULL     | Chef entered kitchen   | unplanned |
|           2 | 2017-01-01 00:02:00 | 2        | Chef peeled vegetables | planned   |
|           3 | 2017-01-01 00:03:00 | 3        | Chef cut meat          | planned   |
|           3 | 2017-01-01 00:04:00 | NULL     | Chef picked his nose   | unplanned |
|           4 | 2017-01-01 00:05:00 | 4        | Chef turned on stove   | planned   |
|           5 | 2017-01-01 00:06:00 | 5        | Chef started cooking   | planned   |
|           5 | 2017-01-01 00:10:00 | NULL     | Chef left kitchen      | unplanned |
|           5 | 2017-01-01 01:00:00 | NULL     | FIRE!                  | unplanned |
+-------------+---------------------+----------+------------------------+-----------+


One way to do it is to take the last non-null value from the previous rows. Unfortunately, SQL Server doesn't implement IGNORE NULLS clause in the LAST_VALUE function, so we have to use a workaround with MAX and rely on the fact that Position grows as ts grows. In a more general case it will be a bit more complicated, as Itzik Ben-Gan showed in his The Last non NULL Puzzle.

The very first NULL is changed to 0 (or some number that is less than any possible Position).

Once we have values for all Positions, it is trivial to union this result with all omitted steps and order them.

In this approach all omitted steps go after unplanned steps in case of ambiguity, so Turn off stove will appear last. If you choose to calculate MIN of following rows instead of MAX of previous, this rule will be swapped.

Sample data

DECLARE @RecipeSteps TABLE (Id int, RecipeId int, Position int, Message nvarchar(255));
INSERT INTO @RecipeSteps (Id, RecipeId, Position, Message) VALUES
(50, 123, 1, 'Wash vegetables '),
(51, 123, 2, 'Peel vegetables '), --
(52, 123, 3, 'Cut meat        '), --
(53, 123, 4, 'Turn on stove   '), --
(54, 123, 5, 'Cook and stir it'), --
(55, 123, 6, 'Turn off stove  ');

DECLARE @History TABLE (Id int, ts datetime2(0), Session int, StepId int, Message nvarchar(255));
INSERT INTO @History (Id, ts, Session, StepId, Message) VALUES
(90, '2017-01-01 00:01:00', 321, NULL, 'Chef entered kitchen  '),
(91, '2017-01-01 00:02:00', 321,   51, 'Chef peeled vegetables'),
(92, '2017-01-01 00:03:00', 321,   52, 'Chef cut meat         '),
(93, '2017-01-01 00:04:00', 321, NULL, 'Chef picked his nose  '),
(94, '2017-01-01 00:05:00', 321,   53, 'Chef turned on stove  '),
(95, '2017-01-01 00:06:00', 321,   54, 'Chef started cooking  '),
(96, '2017-01-01 00:10:00', 321, NULL, 'Chef left kitchen     '),
(97, '2017-01-01 01:00:00', 321, NULL, 'FIRE!                 ');


Query

WITH
CTE_History
AS
(
    SELECT
        ISNULL(
            MAX(Position) 
            OVER (ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
            , 0) AS NewPosition
        ,H.ts
        ,S.Position
        ,H.Message AS NewMessage
        ,CASE WHEN S.Id IS NULL THEN 'unplanned' ELSE 'planned' END AS Comment
    FROM
        @History AS H
        LEFT JOIN @RecipeSteps AS S ON S.Id = H.StepId
)
,CTE_OmittedSteps
AS
(
    SELECT
        Position AS NewPosition
        ,H.ts
        ,S.Position
        ,S.Message AS NewMessage
        ,'omitted' AS Comment
    FROM
        @RecipeSteps AS S
        LEFT JOIN @History AS H ON S.Id = H.StepId
    WHERE
        H.Id IS NULL
)
SELECT * FROM CTE_History
UNION ALL
SELECT * FROM CTE_OmittedSteps
ORDER BY NewPosition, ts;


Result

```
+-------------+---------------------+----------+------------------------+-----------+
| NewPosition | ts | Position | NewMessage | Comment |
+-------------+---------------------+----------+------------------------+-----------+
| 0 | 2017-01-01 00:01:00 | NULL | Chef entered kitchen | unplanned |
| 1 | NULL | 1 | Wash vegetables | omitted |
|

Code Snippets

+-------------+---------------------+----------+------------------------+-----------+
| NewPosition |         ts          | Position |       NewMessage       |  Comment  |
+-------------+---------------------+----------+------------------------+-----------+
|           0 | 2017-01-01 00:01:00 | NULL     | Chef entered kitchen   | unplanned |
|           2 | 2017-01-01 00:02:00 | 2        | Chef peeled vegetables | planned   |
|           3 | 2017-01-01 00:03:00 | 3        | Chef cut meat          | planned   |
|           3 | 2017-01-01 00:04:00 | NULL     | Chef picked his nose   | unplanned |
|           4 | 2017-01-01 00:05:00 | 4        | Chef turned on stove   | planned   |
|           5 | 2017-01-01 00:06:00 | 5        | Chef started cooking   | planned   |
|           5 | 2017-01-01 00:10:00 | NULL     | Chef left kitchen      | unplanned |
|           5 | 2017-01-01 01:00:00 | NULL     | FIRE!                  | unplanned |
+-------------+---------------------+----------+------------------------+-----------+
DECLARE @RecipeSteps TABLE (Id int, RecipeId int, Position int, Message nvarchar(255));
INSERT INTO @RecipeSteps (Id, RecipeId, Position, Message) VALUES
(50, 123, 1, 'Wash vegetables '),
(51, 123, 2, 'Peel vegetables '), --
(52, 123, 3, 'Cut meat        '), --
(53, 123, 4, 'Turn on stove   '), --
(54, 123, 5, 'Cook and stir it'), --
(55, 123, 6, 'Turn off stove  ');

DECLARE @History TABLE (Id int, ts datetime2(0), Session int, StepId int, Message nvarchar(255));
INSERT INTO @History (Id, ts, Session, StepId, Message) VALUES
(90, '2017-01-01 00:01:00', 321, NULL, 'Chef entered kitchen  '),
(91, '2017-01-01 00:02:00', 321,   51, 'Chef peeled vegetables'),
(92, '2017-01-01 00:03:00', 321,   52, 'Chef cut meat         '),
(93, '2017-01-01 00:04:00', 321, NULL, 'Chef picked his nose  '),
(94, '2017-01-01 00:05:00', 321,   53, 'Chef turned on stove  '),
(95, '2017-01-01 00:06:00', 321,   54, 'Chef started cooking  '),
(96, '2017-01-01 00:10:00', 321, NULL, 'Chef left kitchen     '),
(97, '2017-01-01 01:00:00', 321, NULL, 'FIRE!                 ');
WITH
CTE_History
AS
(
    SELECT
        ISNULL(
            MAX(Position) 
            OVER (ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
            , 0) AS NewPosition
        ,H.ts
        ,S.Position
        ,H.Message AS NewMessage
        ,CASE WHEN S.Id IS NULL THEN 'unplanned' ELSE 'planned' END AS Comment
    FROM
        @History AS H
        LEFT JOIN @RecipeSteps AS S ON S.Id = H.StepId
)
,CTE_OmittedSteps
AS
(
    SELECT
        Position AS NewPosition
        ,H.ts
        ,S.Position
        ,S.Message AS NewMessage
        ,'omitted' AS Comment
    FROM
        @RecipeSteps AS S
        LEFT JOIN @History AS H ON S.Id = H.StepId
    WHERE
        H.Id IS NULL
)
SELECT * FROM CTE_History
UNION ALL
SELECT * FROM CTE_OmittedSteps
ORDER BY NewPosition, ts;
+-------------+---------------------+----------+------------------------+-----------+
| NewPosition |         ts          | Position |       NewMessage       |  Comment  |
+-------------+---------------------+----------+------------------------+-----------+
|           0 | 2017-01-01 00:01:00 | NULL     | Chef entered kitchen   | unplanned |
|           1 | NULL                | 1        | Wash vegetables        | omitted   |
|           2 | 2017-01-01 00:02:00 | 2        | Chef peeled vegetables | planned   |
|           3 | 2017-01-01 00:03:00 | 3        | Chef cut meat          | planned   |
|           3 | 2017-01-01 00:04:00 | NULL     | Chef picked his nose   | unplanned |
|           4 | 2017-01-01 00:05:00 | 4        | Chef turned on stove   | planned   |
|           5 | 2017-01-01 00:06:00 | 5        | Chef started cooking   | planned   |
|           5 | 2017-01-01 00:10:00 | NULL     | Chef left kitchen      | unplanned |
|           5 | 2017-01-01 01:00:00 | NULL     | FIRE!                  | unplanned |
|           6 | NULL                | 6        | Turn off stove         | omitted   |
+-------------+---------------------+----------+------------------------+-----------+
WITH
CTE_HistoryRaw
AS
(
    SELECT
        H.ts
        ,S.Position
        ,H.Message AS NewMessage
        ,CASE WHEN S.Id IS NULL THEN 'unplanned' ELSE 'planned' END AS Comment
    FROM
        @History AS H
        LEFT JOIN @RecipeSteps AS S ON S.Id = H.StepId
)
,CTE_History
AS
(
    SELECT
        ISNULL(A.NewPosition, 0) AS NewPosition
        ,ts
        ,Position
        ,NewMessage
        ,Comment
    FROM
        CTE_HistoryRaw AS Curr
        OUTER APPLY
        (
            SELECT TOP(1)
                Prev.Position AS NewPosition
            FROM CTE_HistoryRaw AS Prev
            WHERE
                Prev.ts <= Curr.ts
                AND Prev.Position IS NOT NULL
            ORDER BY Prev.ts DESC
        ) AS A
)
,CTE_OmittedSteps
AS
(
    SELECT
        Position AS NewPosition
        ,H.ts
        ,S.Position
        ,S.Message AS NewMessage
        ,'omitted' AS Comment
    FROM
        @RecipeSteps AS S
        LEFT JOIN @History AS H ON S.Id = H.StepId
    WHERE
        H.Id IS NULL
)
SELECT * FROM CTE_History
UNION ALL
SELECT * FROM CTE_OmittedSteps
ORDER BY NewPosition, ts;

Context

StackExchange Database Administrators Q#188858, answer score: 3

Revisions (0)

No revisions yet.