patternsqlMinor
Postgres Transaction OOM on 100k DDL statements
Viewed 0 times
oompostgresstatementsddl100ktransaction
Problem
We execute approximately 100k DDL statements in a single transaction in PostgreSQL. During execution, the respective Postgres connection gradually increases on its memory usage and once it can't acquire more memory (increasing from 10MB to 2.2GB usage on 3GB ram), OOM killer hits it with 9 which results in Postgres being gone to recovery mode.
Including the create schema statement, approximately 94304 DDL statements were meant to be executed.
As per Transactional DDL in PostgreSQL
Like several of its commercial competitors, one of the more advanced features of PostgreSQL is its ability to perform transactional DDL via its Write-Ahead Log design. This design supports backing out even large changes to DDL, such as table creation. You can't recover from an add/drop on a database or tablespace, but all other catalog operations are reversible.
We have even imported approximately 35GB of data into PostgreSQL in a single transaction without any problem, but why does the Postgres connection requires huge memory when executing thousands of DDL statements in single transaction?
We can temporarily resolve it by increasing the RAM or allocating swap, but we can say that the number of schema creation in a single transaction can increase up to 50 - 60 (Approx 1M DDL statements) which would require 100+ Gigs of RAM or swap which isn't feasible right now.
PostgreSQL version: 9.6.10
Is there any reason why executing lots of DDL statements requir
BEGIN;
CREATE SCHEMA schema_1;
-- create table stmts - 714
-- alter table add pkey stmts - 714
-- alter table add constraint fkey stmts - 34
-- alter table add unique constraint stmts - 2
-- alter table alter column set default stmts - 9161
-- alter table alter column set not null stmts - 2405
-- alter table add check constraint stmts - 4
-- create unique index stmts - 224
-- create index stmts - 213
CREATE SCHEMA schema_2;
-- same ddl statements as schema_1 upto schema_7
-- ...
-- ...
-- ...
CREATE SCHEMA schema_7;
COMMITIncluding the create schema statement, approximately 94304 DDL statements were meant to be executed.
As per Transactional DDL in PostgreSQL
Like several of its commercial competitors, one of the more advanced features of PostgreSQL is its ability to perform transactional DDL via its Write-Ahead Log design. This design supports backing out even large changes to DDL, such as table creation. You can't recover from an add/drop on a database or tablespace, but all other catalog operations are reversible.
We have even imported approximately 35GB of data into PostgreSQL in a single transaction without any problem, but why does the Postgres connection requires huge memory when executing thousands of DDL statements in single transaction?
We can temporarily resolve it by increasing the RAM or allocating swap, but we can say that the number of schema creation in a single transaction can increase up to 50 - 60 (Approx 1M DDL statements) which would require 100+ Gigs of RAM or swap which isn't feasible right now.
PostgreSQL version: 9.6.10
Is there any reason why executing lots of DDL statements requir
Solution
A better idea entirely is to use SQL Server FDW which actually has the logic to pull in Microsoft SQL Server into PostgreSQL format (for example,
Then every three minutes,
What do you gain?
Bit gets mapped to Bool). From this pointThen every three minutes,
- you import the foreign schema into
last_fetch_schema
- if the
last_fetch_schemais different fromlocal_schema
- you resync schemas
- you copy all of the data over with a
INSERT INTO ... SELECT ON CONFLICT DO UPDATE, and you can select only the newest data.
- you drop the foreign schema
last_fetch_schema
What do you gain?
- On first load, you can simply use
CREATE TABLE local.foo ( LIKE foreign.foo)
- You can easily compare meta-data differences
- CSVs loses types and leave you to infer things, FDW can read meta-data catalog.
- Grabbing only the newest stuff is very simply if the rows are versioned/ you don't have to send the entire database anymore.
Context
StackExchange Database Administrators Q#220053, answer score: 3
Revisions (0)
No revisions yet.