Postgresql 9 speeding up indexed inserts (JPA)
Problem
I have an application that generates a lot of data which needs to be inserted quickly (around 13 million records). I use JPA 2.0/Hibernate with PostgreSQL 9.1, and I managed to achieve quite good performance (around 25k inserts per second) with multi-threading and batching of inserts every few thousand records, completing a whole run in around 8 minutes.
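The batching mentioned above is driven by Hibernate configuration; as a sketch, the relevant properties look like this (the batch size of 1000 is illustrative, not taken from the question):

```properties
# Illustrative values -- tune the batch size to your workload.
hibernate.jdbc.batch_size=1000
# Group inserts/updates per entity type so batches are not broken up.
hibernate.order_inserts=true
hibernate.order_updates=true
```

Together with periodically calling `flush()` and `clear()` on the `EntityManager`, these settings let Hibernate send inserts to the server in JDBC batches instead of one round trip per row.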
However, I noticed that a few of the foreign keys were missing an index, which I would really like to have, both to drill down in the data for analysis and to delete the data belonging to a specific run. Unfortunately, when I added these 3 indexes to the table that receives most of the inserts, performance dropped drastically to around 3k inserts per second.
Is there any way to avoid this performance slowdown? I know that one option is to drop the indexes before a run and recreate them at the end. Another, clumsier option is to write the data of the biggest table to a file instead and load it with COPY. I guess I could only do this for the largest table in the relation, due to the foreign key values which I would need to know (generated through sequences).

Both alternatives seem like hacks. Is there any other solution, maybe a bit less intrusive on the application? Some setting to tell PostgreSQL to defer indexing or something of that sort?
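As a sketch, the drop-and-recreate approach would look like this (table, column, and index names here are hypothetical, not from the question):

```sql
-- 1. Drop the secondary indexes before the bulk run.
DROP INDEX IF EXISTS idx_measurement_run_fk;
DROP INDEX IF EXISTS idx_measurement_sensor_fk;

-- 2. Bulk-load the data; COPY is considerably faster than row-by-row INSERTs.
COPY measurement (run_id, sensor_id, value)
    FROM '/tmp/measurement.csv' WITH (FORMAT csv);

-- 3. Recreate the indexes in a single pass at the end of the run.
CREATE INDEX idx_measurement_run_fk    ON measurement (run_id);
CREATE INDEX idx_measurement_sensor_fk ON measurement (sensor_id);
```

Building each index once over the finished table is much cheaper than maintaining it incrementally for every inserted row.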
Any ideas welcome.
Solution
In addition to Craig's advice I would like to advise you to examine the storage parameters of the affected tables.
I am currently in a similar situation to yours. The largest table in my system contains ~200 million records and the performance was really bad.
Tune the storage parameters of your tables and indexes
Besides adding several indexes to the database, I changed the storage parameters of some tables and specified a custom value for the fillfactor of the table itself and the indexes.
Setting a custom value for the fillfactor allows you to instruct PostgreSQL how much space in each page should be reserved for further updates. The same applies to indexes.
See the documentation on CREATE TABLE and the description of the available storage parameters for details.
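For example, a fillfactor can be set when a table or index is created, or changed later with ALTER (the table and the values below are illustrative):

```sql
-- Reserve 30% free space per heap page for future updates.
CREATE TABLE measurement (
    id     bigserial PRIMARY KEY,
    run_id bigint NOT NULL,
    value  double precision
) WITH (fillfactor = 70);

-- Indexes accept a fillfactor as well (the B-tree default is 90).
CREATE INDEX idx_measurement_run_fk
    ON measurement (run_id) WITH (fillfactor = 80);

-- An existing table can be changed; the new value applies to newly written pages.
ALTER TABLE measurement SET (fillfactor = 70);
```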
Monitor your infrastructure
Monitor and analyze your infrastructure. The PostgreSQL wiki lists a lot of useful tools.
Find long running statements
Enable statement logging by altering the following values in your postgresql.conf file:

- log_min_duration_statement = x to log all statements which run longer than x milliseconds
- log_min_messages set to a level that helps you understand the statements generated by JPA

See the description of the runtime logging configuration for details.
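A minimal postgresql.conf fragment for this could look as follows (the 250 ms threshold is an arbitrary example):

```
# Log every statement that takes longer than 250 ms
# (0 logs all statements, -1 disables duration-based logging).
log_min_duration_statement = 250

# Raise server log verbosity enough to see what the JPA provider sends.
log_min_messages = info
```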
Install pgFouine to analyse your PostgreSQL log file easily.
Be picky
Besides altering the storage parameters, I also gained a lot of performance by optimizing all frequently executed statements. In some cases I won only 50 or 100 milliseconds per execution, but in total I gained more than 5 seconds for complex operations.
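When hunting for those savings, EXPLAIN ANALYZE is the usual tool; a sketch against a hypothetical query:

```sql
-- Compare the planner's estimates with actual row counts and timings.
-- A sequential scan on a large table here suggests a missing or unused index.
EXPLAIN (ANALYZE, BUFFERS)
SELECT m.run_id, avg(m.value)
FROM measurement m
WHERE m.run_id = 42
GROUP BY m.run_id;
```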
Context
StackExchange Database Administrators Q#28751, answer score: 16