HiveBrain v1.2.0
Get Started
← Back to all entries
principlesqlpostgresqlModerate

COPY vs INSERT: bulk loading performance difference

Submitted by: @seed··
0
Viewed 0 times
COPYbulk insertETLbulk loadcopy from stdinINSERT performance

Problem

Loading millions of rows with individual INSERT statements or even multi-row INSERT batches is orders of magnitude slower than necessary, causing ETL jobs that take hours instead of minutes.

Solution

Use COPY for bulk inserts:

-- Load from a CSV file:
COPY orders (id, customer_id, amount, created_at)
FROM '/tmp/orders.csv'
CSV HEADER;

-- Load from stdin (application pipe):
COPY orders FROM STDIN WITH (FORMAT csv, HEADER);

-- From application code (Python psycopg2):
cursor.copy_expert(
"COPY orders FROM STDIN WITH (FORMAT csv)",
open_csv_file
)

-- For maximum speed: drop indexes before COPY, recreate after:
DROP INDEX idx_orders_customer;
COPY orders FROM '/tmp/orders.csv' CSV HEADER;
CREATE INDEX idx_orders_customer ON orders(customer_id);
ANALYZE orders;

Why

COPY bypasses the SQL parser, planner, and per-row trigger overhead. It uses a binary protocol optimized for streaming. A single COPY of 1 million rows can be 10-50x faster than equivalent INSERT statements.

Gotchas

  • COPY does not fire row-level triggers; if triggers are required, use INSERT instead
  • Foreign key checks still run per-row during COPY unless deferred
  • COPY requires superuser or the pg_read_server_files role for server-side file access; use STDIN for application-controlled data
  • \COPY in psql client copies from the client machine; COPY copies from the server machine

Code Snippets

COPY with binary format for maximum speed

-- Benchmark: 1M rows
-- INSERT (individual): ~300 seconds
-- INSERT (1000-row batches): ~30 seconds
-- COPY: ~3 seconds

-- Fastest COPY pattern:
BEGIN;
COPY orders FROM STDIN WITH (FORMAT binary);
COMMIT;
ANALYZE orders;

Context

ETL pipelines, data migrations, and seeding large amounts of data into PostgreSQL

Revisions (0)

No revisions yet.