gotchasqlMinor
Is it a good or bad idea to use "ON UPDATE CASCADE ON DELETE CASCADE" for foreign keys? Why does this mechanism exist at all?
Viewed 0 times
thiswhyupdateideadeleteallbadforeignexistkeys
Problem
I understand what foreign keys are, and have made a point of including them wherever they make sense for all my database tables that I design.
However, something which has always confused me is whether or not I should be explicitly setting the "ON UPDATE" and "ON DELETE" features (in lack of a better term). Example:
This code goes out of its way to explicitly add the technically "unnecessary" part: "ON UPDATE CASCADE ON DELETE CASCADE".
Since this is not done by default, there must be a reason for this! After all, the default behaviour is always (or at least should always be) the most commonly needed behaviour:
In the test1 table, as I understand it, if the "othertable" either changes the "id" column values, or deletes any record(s), that means that the referenced records in the test1 table will either be updated or deleted. This seems, on the surface, like what should be the default behaviour.
In the test2 table, again as I understand it, if the "othertable" either changes the "id" column values, or deletes any record(s), that means that PostgreSQL will refuse to perform the query if there are records in test2 which reference the ones being modified.
I'm basically confused about the entire concept of "ON UPDATE" and "ON DELETE". Why would one ever want a query to be refused like that? And "CASCADE" isn't even the only option (besides none); there are multiple other values you can use which cause various behaviour (which I don't understand).
Since there is a stated relationship between the tables (through the foreign keys), isn't the whole point that you want them to remain consistent? So why you not want it to "CASCADE"
However, something which has always confused me is whether or not I should be explicitly setting the "ON UPDATE" and "ON DELETE" features (in lack of a better term). Example:
CREATE TABLE "test1"
(
id serial,
referenceid integer,
FOREIGN KEY (referenceid) REFERENCES "othertable" (id) ON UPDATE CASCADE ON DELETE CASCADE
)This code goes out of its way to explicitly add the technically "unnecessary" part: "ON UPDATE CASCADE ON DELETE CASCADE".
Since this is not done by default, there must be a reason for this! After all, the default behaviour is always (or at least should always be) the most commonly needed behaviour:
CREATE TABLE "test2"
(
id serial,
referenceid integer,
FOREIGN KEY (referenceid) REFERENCES "othertable" (id)
)In the test1 table, as I understand it, if the "othertable" either changes the "id" column values, or deletes any record(s), that means that the referenced records in the test1 table will either be updated or deleted. This seems, on the surface, like what should be the default behaviour.
In the test2 table, again as I understand it, if the "othertable" either changes the "id" column values, or deletes any record(s), that means that PostgreSQL will refuse to perform the query if there are records in test2 which reference the ones being modified.
I'm basically confused about the entire concept of "ON UPDATE" and "ON DELETE". Why would one ever want a query to be refused like that? And "CASCADE" isn't even the only option (besides none); there are multiple other values you can use which cause various behaviour (which I don't understand).
Since there is a stated relationship between the tables (through the foreign keys), isn't the whole point that you want them to remain consistent? So why you not want it to "CASCADE"
Solution
I'm not sure about
The existence of
Furthermore, it might be considered safer than implementing this in other logic because the database is taking care of transactional consistency, deadlocks, and so forth, so the operation should (bugs permitting) be guaranteed atomic. If you implement your own "find children, delete, then delete parent," which may have to be nested, you have to do some legwork[!] to ensure that if there is an error part way through there is no way that you delete the great-great-great-grand-children of a row but leave the rest standing (leaving a partly deleted entity which could cause difficult to diagnose issues later).
[!] Taking appropriate locks, preferably not by locking whole tables, ensuring transaction isolation settings are right isn't as simple as it might first look.
Why Not Cascade?
As I said above, I consider a need to cascade updates routinely to be a bit of a design smell. You shouldn't need to change a primary key value during normal operations.
I'm very wary of cascaded deletes, despite the danger of bugs in more manually deleting complex structured entities. Too often you see inexperienced people perform
Also, in a lot of cases with real data you don't actually want a cascaded delete. For example, if a manager leaves a company, you don't want to delete their sub-ordinates because assigning a new manager first was forgotten or prevented by a bug.
[*] in postgres via
[^] either because they aren't aware of the available syntax, or are avoiding it in order to be cross-DB compatible
[†]
[‡] mySQL also supports
ON UPDATE CASCADE. If you find yourself needing this sort of cascaded update then that is perhaps a "code smell" in your database design. In theory your primary key should be static so changes that need cascading shouldn't need to happen. Perhaps it was added as a logical step along from ON DELETE CASCADE. It is at least safer than cascading deletes.The existence of
ON DELETE CASCADE makes more sense: while PKs shouldn't really change, things do often get deleted. The cascading is simply a convenience. It saves you from having to write code to drop child entities manually when getting rid of a parent.Furthermore, it might be considered safer than implementing this in other logic because the database is taking care of transactional consistency, deadlocks, and so forth, so the operation should (bugs permitting) be guaranteed atomic. If you implement your own "find children, delete, then delete parent," which may have to be nested, you have to do some legwork[!] to ensure that if there is an error part way through there is no way that you delete the great-great-great-grand-children of a row but leave the rest standing (leaving a partly deleted entity which could cause difficult to diagnose issues later).
[!] Taking appropriate locks, preferably not by locking whole tables, ensuring transaction isolation settings are right isn't as simple as it might first look.
Why Not Cascade?
As I said above, I consider a need to cascade updates routinely to be a bit of a design smell. You shouldn't need to change a primary key value during normal operations.
I'm very wary of cascaded deletes, despite the danger of bugs in more manually deleting complex structured entities. Too often you see inexperienced people perform
UPSERT operations[*] using a DELETE-then-re-INSERT method, even when the DB supports single-statement upsert operations[^], which damages your data if cascaded deletes are enabled: the delete removes the children too, and they don't get put back by the subsequent insert.Also, in a lot of cases with real data you don't actually want a cascaded delete. For example, if a manager leaves a company, you don't want to delete their sub-ordinates because assigning a new manager first was forgotten or prevented by a bug.
[*] in postgres via
INSERT ... ON CONFLICT ... but this is not standard and quite different syntax[†] is used elsewhere[^] either because they aren't aware of the available syntax, or are avoiding it in order to be cross-DB compatible
[†]
MERGE can be used in Microsoft's TSQL for the same effect, mySQL[‡] supports INSERT ... ON DUPLICATE KEY ...[‡] mySQL also supports
REPLACE INTO, but IIRC that is just syntactic sugar for delete+insert so has the same dangersContext
StackExchange Database Administrators Q#254605, answer score: 7
Revisions (0)
No revisions yet.