patternsqlMinor
Managing CTE execution order under uncertainty
Viewed 0 times
ordermanagingcteunderuncertaintyexecution
Problem
I am writing a large, multi-step CTE for performance reasons.
In one query, data must be moved from one table to another, but the quantity of rows moved is uncertain and could be zero.
In a subsequent table, the origin from the previous query is deleted but must be after the previous query is completed.
Finally, rows must be written in place of the deleted rows after the second query above is completed.
In the first two queries, I am using RETURNING to enforce execution order.
In the second query, I'm determining that the first query is completed by this subquery
In the the third query, I'm determining that the second query is completed by this subquery
Is the subquery to determine that the first query has completed correct?
Is the subquery to determine that the second query, which must return rows, has completed optimal for accuracy, precision, and performance?
Using the above subqueries to enforce execution order is giving duplicate key value violations.
Query subsection
It is the final query that is violating the unique constraint.
In one query, data must be moved from one table to another, but the quantity of rows moved is uncertain and could be zero.
In a subsequent table, the origin from the previous query is deleted but must be after the previous query is completed.
Finally, rows must be written in place of the deleted rows after the second query above is completed.
In the first two queries, I am using RETURNING to enforce execution order.
In the second query, I'm determining that the first query is completed by this subquery
(SELECT COUNT(*) FROM first_query) >= 0In the the third query, I'm determining that the second query is completed by this subquery
SELECT EXISTS (SELECT 1 FROM second_query)Is the subquery to determine that the first query has completed correct?
Is the subquery to determine that the second query, which must return rows, has completed optimal for accuracy, precision, and performance?
Using the above subqueries to enforce execution order is giving duplicate key value violations.
Query subsection
WITH copy_to_other_table AS (
INSERT INTO other_table (column_a, column_b)
SELECT column_a, column_b
FROM main_table
WHERE column_a = $1::bigint
RETURNING *
),
main_table_deleted AS (
DELETE FROM main_table WHERE column_a = $1::bigint
AND (SELECT COUNT(*) FROM copy_to_other_table) >= 0
RETURNING *
)
INSERT INTO main_table (column_a, column_b)
SELECT column_a, column_b
FROM another_table WHERE column_a = $1::bigint
AND EXISTS (SELECT 1 FROM main_table_deleted)It is the final query that is violating the unique constraint.
Solution
This should work but I'm not really sure if it's the best regarding efficiency:
But what is wrong with the original query?
-
First, the
-
Second, there is no need at all to have any condition at all there, because all the rows from
-
The third part is more tricky. It might seem at first glance, that no checking is needed either for the "interaction" between the 2nd (delete) and the 3rd (insert) part. Both are to the same table but if the 2nd cte is performed before the main query then all should be well.
Alas, the order of execution is not consecutive. From Postgres docs:
Data-modifying statements in WITH are executed exactly once, and always to completion, independently of whether the primary query reads all (or indeed any) of their output. Notice that this is different from the rule for SELECT in WITH: as stated in the previous section, execution of a SELECT is carried only as far as the primary query demands its output.
The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot (see Chapter 13), so they cannot "see" each others' effects on the target tables. This alleviates the effects of the unpredictability of the actual order of row updates, and means that
-
As a test, you could altering the order of the 3 sub-statements. The result will be the same:
-
The related issue is when the unique constraints are checked. I'm not 100% sure about the detail of those checks in combination with CTEs but unique constraints should be checked at the end of statements. It appears that they are also checked concurrently for each modifying cte.
(Note: This behaviour seems like a bug to be honest - unless I missed something in the documentation.)
-
Regarding your last question, setting isolation level to
Another way - that is more close to your original query - would be to use the
```
WITH copy_to_other_table AS (
INSERT INTO other_table (column_a, column_b)
SELECT column_a, column_b
FROM main_table
WHERE column_a = 1
),
main_table_deleted AS (
DELETE FROM main_table WHERE column_a = 1
RETURNING *
)
INSERT INTO main_table (column_a, column_b)
SELECT column_a, column_b
FROM another_table WHERE column_a = 1
EXCEPT
WITH copy_to_other_table AS (
INSERT INTO other_table (column_a, column_b)
SELECT column_a, column_b
FROM main_table
WHERE column_a = 1
),
main_table_deleted AS (
DELETE
FROM main_table
WHERE column_a = 1
AND NOT EXISTS (SELECT 1 FROM another_table
WHERE column_a = 1
AND column_b = main_table.column_b)
)
INSERT INTO main_table (column_a, column_b)
SELECT column_a, column_b
FROM another_table WHERE column_a = 1
EXCEPT
SELECT column_a, column_b
FROM main_table WHERE column_a = 1 ;But what is wrong with the original query?
-
First, the
(SELECT COUNT(*) ...) >= 0 is completely redundant. A count aggregate will always return a values of 0 or more so that condition is always true.-
Second, there is no need at all to have any condition at all there, because all the rows from
main that you want copied to the other table, you also want them tobe deleted from main. There is no reason to "check" if they have copied before you delete them . All 3 subqueries, (the 2 CTEs and the main query) will "see" the same tables with the same exact number of rows and data.-
The third part is more tricky. It might seem at first glance, that no checking is needed either for the "interaction" between the 2nd (delete) and the 3rd (insert) part. Both are to the same table but if the 2nd cte is performed before the main query then all should be well.
Alas, the order of execution is not consecutive. From Postgres docs:
Data-modifying statements in WITH are executed exactly once, and always to completion, independently of whether the primary query reads all (or indeed any) of their output. Notice that this is different from the rule for SELECT in WITH: as stated in the previous section, execution of a SELECT is carried only as far as the primary query demands its output.
The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot (see Chapter 13), so they cannot "see" each others' effects on the target tables. This alleviates the effects of the unpredictability of the actual order of row updates, and means that
RETURNING data is the only way to communicate changes between different WITH sub-statements and the main query. -
As a test, you could altering the order of the 3 sub-statements. The result will be the same:
WITH main_table_deleted AS (
DELETE
FROM main_table
WHERE column_a = 1
AND NOT EXISTS (SELECT 1 FROM another_table
WHERE column_a = 1
AND column_b = main_table.column_b)
),
copy_to_other_table AS (
INSERT INTO other_table (column_a, column_b)
SELECT column_a, column_b
FROM main_table
WHERE column_a = 1
)
INSERT INTO main_table (column_a, column_b)
SELECT column_a, column_b
FROM another_table WHERE column_a = 1
EXCEPT
SELECT column_a, column_b
FROM main_table WHERE column_a = 1 ;-
The related issue is when the unique constraints are checked. I'm not 100% sure about the detail of those checks in combination with CTEs but unique constraints should be checked at the end of statements. It appears that they are also checked concurrently for each modifying cte.
(Note: This behaviour seems like a bug to be honest - unless I missed something in the documentation.)
-
Regarding your last question, setting isolation level to
SERIALIZABLE would not have solved the issue as the whole operation is one statement, with 3 sub-statements. You could however, split the actions into 2 or 3 statements and then they would be executed one after the other. So, the 2nd would see the results of the 1st and the 3rd, the results of the first two. (If you do that, put the 2 or 3 statements inside a transaction, to isolate the operation from other executing statements.)Another way - that is more close to your original query - would be to use the
RETURNING clause to force the execution of the sub-statements in a specific order, i.e. the 3rd after the 2nd (The 1st can stay without RETURNING and executed concurently). Test in SQLFIddle-3: ```
WITH copy_to_other_table AS (
INSERT INTO other_table (column_a, column_b)
SELECT column_a, column_b
FROM main_table
WHERE column_a = 1
),
main_table_deleted AS (
DELETE FROM main_table WHERE column_a = 1
RETURNING *
)
INSERT INTO main_table (column_a, column_b)
SELECT column_a, column_b
FROM another_table WHERE column_a = 1
EXCEPT
Code Snippets
WITH copy_to_other_table AS (
INSERT INTO other_table (column_a, column_b)
SELECT column_a, column_b
FROM main_table
WHERE column_a = 1
),
main_table_deleted AS (
DELETE
FROM main_table
WHERE column_a = 1
AND NOT EXISTS (SELECT 1 FROM another_table
WHERE column_a = 1
AND column_b = main_table.column_b)
)
INSERT INTO main_table (column_a, column_b)
SELECT column_a, column_b
FROM another_table WHERE column_a = 1
EXCEPT
SELECT column_a, column_b
FROM main_table WHERE column_a = 1 ;WITH main_table_deleted AS (
DELETE
FROM main_table
WHERE column_a = 1
AND NOT EXISTS (SELECT 1 FROM another_table
WHERE column_a = 1
AND column_b = main_table.column_b)
),
copy_to_other_table AS (
INSERT INTO other_table (column_a, column_b)
SELECT column_a, column_b
FROM main_table
WHERE column_a = 1
)
INSERT INTO main_table (column_a, column_b)
SELECT column_a, column_b
FROM another_table WHERE column_a = 1
EXCEPT
SELECT column_a, column_b
FROM main_table WHERE column_a = 1 ;WITH copy_to_other_table AS (
INSERT INTO other_table (column_a, column_b)
SELECT column_a, column_b
FROM main_table
WHERE column_a = 1
),
main_table_deleted AS (
DELETE FROM main_table WHERE column_a = 1
RETURNING *
)
INSERT INTO main_table (column_a, column_b)
SELECT column_a, column_b
FROM another_table WHERE column_a = 1
EXCEPT
(TABLE main_table_deleted EXCEPT TABLE another_table) ;WITH main_table_deleted AS (
DELETE FROM main_table WHERE column_a = 1
RETURNING *
),
copy_to_other_table AS (
INSERT INTO other_table (column_a, column_b)
TABLE main_table_deleted
)
INSERT INTO main_table (column_a, column_b)
SELECT column_a, column_b
FROM another_table WHERE column_a = 1
EXCEPT
(TABLE main_table_deleted EXCEPT TABLE another_table) ;Context
StackExchange Database Administrators Q#86263, answer score: 7
Revisions (0)
No revisions yet.