principlesqlpostgresqlModerate
COPY vs INSERT: bulk loading performance difference
Viewed 0 times
COPYbulk insertETLbulk loadcopy from stdinINSERT performance
Problem
Loading millions of rows with individual INSERT statements or even multi-row INSERT batches is orders of magnitude slower than necessary, causing ETL jobs that take hours instead of minutes.
Solution
Use COPY for bulk inserts:
-- Load from a CSV file:
COPY orders (id, customer_id, amount, created_at)
FROM '/tmp/orders.csv'
CSV HEADER;
-- Load from stdin (application pipe):
COPY orders FROM STDIN WITH (FORMAT csv, HEADER);
-- From application code (Python psycopg2):
cursor.copy_expert(
"COPY orders FROM STDIN WITH (FORMAT csv)",
open_csv_file
)
-- For maximum speed: drop indexes before COPY, recreate after:
DROP INDEX idx_orders_customer;
COPY orders FROM '/tmp/orders.csv' CSV HEADER;
CREATE INDEX idx_orders_customer ON orders(customer_id);
ANALYZE orders;
-- Load from a CSV file:
COPY orders (id, customer_id, amount, created_at)
FROM '/tmp/orders.csv'
CSV HEADER;
-- Load from stdin (application pipe):
COPY orders FROM STDIN WITH (FORMAT csv, HEADER);
-- From application code (Python psycopg2):
cursor.copy_expert(
"COPY orders FROM STDIN WITH (FORMAT csv)",
open_csv_file
)
-- For maximum speed: drop indexes before COPY, recreate after:
DROP INDEX idx_orders_customer;
COPY orders FROM '/tmp/orders.csv' CSV HEADER;
CREATE INDEX idx_orders_customer ON orders(customer_id);
ANALYZE orders;
Why
COPY bypasses the SQL parser, planner, and per-row trigger overhead. It uses a binary protocol optimized for streaming. A single COPY of 1 million rows can be 10-50x faster than equivalent INSERT statements.
Gotchas
- COPY does not fire row-level triggers; if triggers are required, use INSERT instead
- Foreign key checks still run per-row during COPY unless deferred
- COPY requires superuser or the pg_read_server_files role for server-side file access; use STDIN for application-controlled data
- \COPY in psql client copies from the client machine; COPY copies from the server machine
Code Snippets
COPY with binary format for maximum speed
-- Benchmark: 1M rows
-- INSERT (individual): ~300 seconds
-- INSERT (1000-row batches): ~30 seconds
-- COPY: ~3 seconds
-- Fastest COPY pattern:
BEGIN;
COPY orders FROM STDIN WITH (FORMAT binary);
COMMIT;
ANALYZE orders;Context
ETL pipelines, data migrations, and seeding large amounts of data into PostgreSQL
Revisions (0)
No revisions yet.