patternsqlModerate
Modelling constraints on subset aggregates?
Viewed 0 times
subsetconstraintsmodellingaggregates
Problem
I am using PostgreSQL but I figure most of the top-end db's must have some similar capabilities, and moreover, that solutions for them may inspire solutions for me, so don't consider this PostgreSQL-specific.
I know I am not the first to try to solve this problem so I figure it is worth asking here but I am trying to evaluate the costs of modelling accounting data such that every transaction is fundamentally balanced. The accounting data is append-only. The overall constraint (written in pseudo-code) here might look roughly like:
Obviously such a check constraint will never work. It operates per row and might check over the entire db. So it will always fail and be slow doing it.
So my question is what is the best way to model this constraint? I have basically looked at two ideas so far. Wondering if these are the only ones, or if someone has a better way (other than leave it to the app level or a stored proc).
I know I am not the first to try to solve this problem so I figure it is worth asking here but I am trying to evaluate the costs of modelling accounting data such that every transaction is fundamentally balanced. The accounting data is append-only. The overall constraint (written in pseudo-code) here might look roughly like:
CREATE TABLE journal_entry (
id bigserial not null unique, --artificial candidate key
journal_type_id int references journal_type(id),
reference text, -- source document identifier, unique per journal
date_posted date not null,
PRIMARY KEY (journal_type_id, reference)
);
CREATE TABLE journal_line (
entry_id bigint references journal_entry(id),
account_id int not null references account(id),
amount numeric not null,
line_id bigserial not null unique,
CHECK ((sum(amount) over (partition by entry_id) = 0) -- this won't work
);Obviously such a check constraint will never work. It operates per row and might check over the entire db. So it will always fail and be slow doing it.
So my question is what is the best way to model this constraint? I have basically looked at two ideas so far. Wondering if these are the only ones, or if someone has a better way (other than leave it to the app level or a stored proc).
- I could borrow a page from the accounting world's concept of the difference between a book of original entry and a book of final entry (general journal vs general ledger). In this regard I could model this as an array of journal lines attached to the journal entry, enforce the constraint on the array (in PostgreSQL terms, select sum(amount) = 0 from unnest(je.line_items). A trigger could expand and save these to a line items table, where individual column constraints could more easi
Solution
As we have to span multiple rows it cannot be implemented with a simple
We can also rule out exclusion constraints. Those would span multiple rows, but only check for inequality. Complex operations like a sum over multiple rows are not possible.
The tool that seems to best fit your case is a
So that's your option 2.
Once we can rely on the constraint being enforced at all times, we need not check the whole table any more. Checking only rows inserted in the current transaction - at the end of the transaction - is sufficient. Performance should be ok.
Also, as
The accounting data is append-only.
... we only need to care about newly inserted rows. (Assuming
I use the system column
To compare the types, casting is needed ...
This should be reasonably safe. Consider this related, later answer with a safer method:
Demo
Deferred, so it is only checked at the end of the transaction.
Tests
Works.
Fails:
ERROR: Entries not balanced!
Works. :)
If you need to enforce your constraint before the end of the transaction, you can do so at any point in the transaction, even at the start:
Faster with plain trigger
If you operate with multi-row
Constraint triggers can only be specified
Use a plain trigger instead and fire
DELETE possible
In reply to your comment: If
CHECK constraint.We can also rule out exclusion constraints. Those would span multiple rows, but only check for inequality. Complex operations like a sum over multiple rows are not possible.
The tool that seems to best fit your case is a
CONSTRAINT TRIGGER (Or even just a plain TRIGGER - the only difference in the current implementation is that you can adjust the timing of the trigger with SET CONSTRAINTS.So that's your option 2.
Once we can rely on the constraint being enforced at all times, we need not check the whole table any more. Checking only rows inserted in the current transaction - at the end of the transaction - is sufficient. Performance should be ok.
Also, as
The accounting data is append-only.
... we only need to care about newly inserted rows. (Assuming
UPDATE or DELETE are not possible.)I use the system column
xid and compare it to the function txid_current() - which returns the xid of the current transaction.To compare the types, casting is needed ...
This should be reasonably safe. Consider this related, later answer with a safer method:
- How to view tuples changed in a PostgreSQL transaction?
Demo
CREATE TABLE journal_line(amount int); -- simplistic table for demo
CREATE OR REPLACE FUNCTION trg_insaft_check_balance()
RETURNS trigger AS
$func$
BEGIN
IF sum(amount) <> 0
FROM journal_line
WHERE xmin::text::bigint = txid_current() -- consider link above
THEN
RAISE EXCEPTION 'Entries not balanced!';
END IF;
RETURN NULL; -- RETURN value of AFTER trigger is ignored anyway
END;
$func$ LANGUAGE plpgsql;
CREATE CONSTRAINT TRIGGER insaft_check_balance
AFTER INSERT ON journal_line
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW
EXECUTE PROCEDURE trg_insaft_check_balance();Deferred, so it is only checked at the end of the transaction.
Tests
INSERT INTO journal_line(amount) VALUES (1), (-1);Works.
INSERT INTO journal_line(amount) VALUES (1);Fails:
ERROR: Entries not balanced!
BEGIN;
INSERT INTO journal_line(amount) VALUES (7), (-5);
-- do other stuff
SELECT * FROM journal_line;
INSERT INTO journal_line(amount) VALUES (-2);
-- INSERT INTO journal_line(amount) VALUES (-1); -- make it fail
COMMIT;Works. :)
If you need to enforce your constraint before the end of the transaction, you can do so at any point in the transaction, even at the start:
SET CONSTRAINTS insaft_check_balance IMMEDIATE;Faster with plain trigger
If you operate with multi-row
INSERT it is more effective to trigger per statement - which is not possible with constraint triggers:Constraint triggers can only be specified
FOR EACH ROW.Use a plain trigger instead and fire
FOR EACH STATEMENT to ...- lose the option of
SET CONSTRAINTS.
- gain performance.
DELETE possible
In reply to your comment: If
DELETE is possible you might add similar trigger doing a whole-table balance check after a DELETE has happened. This would be much more expensive, but won't matter much as it rarely happens.Code Snippets
CREATE TABLE journal_line(amount int); -- simplistic table for demo
CREATE OR REPLACE FUNCTION trg_insaft_check_balance()
RETURNS trigger AS
$func$
BEGIN
IF sum(amount) <> 0
FROM journal_line
WHERE xmin::text::bigint = txid_current() -- consider link above
THEN
RAISE EXCEPTION 'Entries not balanced!';
END IF;
RETURN NULL; -- RETURN value of AFTER trigger is ignored anyway
END;
$func$ LANGUAGE plpgsql;
CREATE CONSTRAINT TRIGGER insaft_check_balance
AFTER INSERT ON journal_line
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW
EXECUTE PROCEDURE trg_insaft_check_balance();INSERT INTO journal_line(amount) VALUES (1), (-1);INSERT INTO journal_line(amount) VALUES (1);BEGIN;
INSERT INTO journal_line(amount) VALUES (7), (-5);
-- do other stuff
SELECT * FROM journal_line;
INSERT INTO journal_line(amount) VALUES (-2);
-- INSERT INTO journal_line(amount) VALUES (-1); -- make it fail
COMMIT;SET CONSTRAINTS insaft_check_balance IMMEDIATE;Context
StackExchange Database Administrators Q#23475, answer score: 13
Revisions (0)
No revisions yet.