Optimize PostgreSQL for a lot of INSERTS and bytea updates
Problem
What we have (software):
- PostgreSQL 9.3 with base configuration (no changes in postgresql.conf)
- Windows 7 64-bit
Hardware:
- Intel Core i7-3770 3.9 GHz
- 32 GB RAM
- WDC WD10EZRX-00L4HB ATA drive (1000 GB, SATA III)
So, we have to load into the DB approx. 100,000,000 rows with a bytea column, and approx. 500,000,000 simpler rows (without LOBs). There are 2 varchar indexes on the 1st table (of lengths 13 and 19) and 2 varchar indexes on the 2nd table (of lengths 18 and 10). There are also sequences for id generation for each table.

These operations are currently done with 8 connections in parallel and a JDBC batch size of 50. The picture below demonstrates the system load: there is zero load on the postgresql processes. After 24 hours of loading we have loaded only 10,000,000 rows, which is a very slow result.

We are asking for help in tuning the PostgreSQL configuration for:

1) ultra-fast loading of this amount of data; it is a once-only operation, so it could be a temporary configuration;
2) production mode, running a moderate number of SELECTs on these 2 tables by their indexes, without joins and without sorting.
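For concreteness, the schema described above might look roughly like the sketch below. All table and column names are invented; only the shape (a bytea column, two varchar indexes per table, sequence-backed ids) comes from the question.

```sql
-- Hypothetical shape of the two tables described in the question.
CREATE TABLE blob_table (
    id      bigserial PRIMARY KEY,   -- backed by a sequence, per the question
    key_a   varchar(13),
    key_b   varchar(19),
    payload bytea                    -- the LOB column; ~100,000,000 rows
);
CREATE INDEX ON blob_table (key_a);
CREATE INDEX ON blob_table (key_b);

CREATE TABLE plain_table (
    id      bigserial PRIMARY KEY,   -- ~500,000,000 rows, no LOBs
    key_c   varchar(18),
    key_d   varchar(10)
);
CREATE INDEX ON plain_table (key_c);
CREATE INDEX ON plain_table (key_d);
```

Note that every index present during the load must be maintained on each insert, which is part of why bulk loading into indexed tables is slow.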
Solution
For insert performance, see speeding up insert performance in PostgreSQL and bulk insert in PostgreSQL.

You're wasting your time with JDBC batching for insert. PgJDBC doesn't do anything useful with insert batches, it just runs each statement. <-- This is no longer true in newer PgJDBC versions, which can now batch prepared statements to reduce round-trip times considerably. But it's still better to use COPY instead; see PgJDBC batch copy and the CopyManager.

As for the number of concurrent loaders: aim for a couple per disk if the operations are disk I/O bound. Eight is probably the most you'll want.

For your "production mode" I suggest loading a sample of data, setting up the queries you expect to run, and using explain analyze to investigate performance. For testing purposes only, use the enable_ params to explore different plan selections. Set the query planner cost parameters (random_page_cost, seq_page_cost, effective_cache_size, etc.) appropriately for your system, and make sure shared_buffers is set appropriately. Continue to monitor as you add a simulated production workload, using the auto_explain module, the log_min_duration_statement setting, the pg_stat_statements extension, etc.

For details, see the PostgreSQL user manual. I suggest popping back here when you have a more concrete problem with explain analyze query execution details, etc.
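A minimal sketch of the COPY approach recommended above, using PgJDBC's CopyManager. The table and column names are hypothetical, and the actual database call is shown only in comments since it needs a live server; the runnable part demonstrates formatting a row (including the bytea value as a hex literal) for COPY's tab-separated text format, where a literal backslash must be doubled.

```java
// Sketch: load rows with COPY via PgJDBC's CopyManager instead of batched
// INSERTs. Names like blob_table/key_a/payload are invented for illustration.
public class CopyLoader {

    // Format one row for COPY ... FROM STDIN (text format, tab-separated).
    // The bytea column is sent as a hex literal \x...; since backslash is
    // COPY's escape character, it is written doubled in the stream.
    static String toCopyRow(long id, String key, byte[] blob) {
        StringBuilder sb = new StringBuilder();
        sb.append(id).append('\t').append(key).append('\t').append("\\\\x");
        for (byte b : blob) sb.append(String.format("%02x", b & 0xff));
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        // With a live connection and the org.postgresql driver on the
        // classpath, the load itself would look roughly like:
        //   CopyManager cm =
        //       conn.unwrap(org.postgresql.PGConnection.class).getCopyAPI();
        //   cm.copyIn("COPY blob_table (id, key_a, payload) FROM STDIN",
        //             new java.io.StringReader(rows));
        // Here we only demonstrate the row formatting, which needs no database:
        String row = toCopyRow(1L, "abc", new byte[] {0x01, (byte) 0xff});
        System.out.print(row); // the line: 1<TAB>abc<TAB>\\x01ff
    }
}
```

Streaming many such rows through one copyIn call avoids the per-statement round trip that makes row-by-row INSERT so slow.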
Context
StackExchange Database Administrators Q#58155, answer score: 14
Revisions (0)
No revisions yet.