HiveBrain v1.2.0

How to reduce the restore time?

Submitted by: @import:stackexchange-dba

Problem

I'm using Postgres 9.2, and I would like to know how to reduce the time of the restore process for a big dump (400 GB).

We have a database that takes weeks to restore due to the single-threaded nature of index creation in PostgreSQL. How can we restore it more quickly, perhaps by disabling index creation at load time and adding the indexes back later, or by some better trick?

Sample pg_dump command:
pg_dump --compress=0 -bo -F c --lock-wait-timeout=1500 -h $HOST -p $PORT $DBNAME | lbzip2 > $DB-$TIMESTAMP.bz2


Sample pg_restore command:
pg_restore -Ov -j 2 -h $HOST -p $PORT --dbname=$DBNAME $RESTOREFILE


The -j option does not help us, since it speeds up the part that doesn't take long (loading the data) but not the part that does (index generation). Is there a way to streamline the process so that index generation is done separately, on an already working database, or to speed up index generation itself?

I'd like a clear procedure for removing index generation from the restore process and doing it afterwards, so as not to block usage of the DB.
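The two-phase restore requested here can be sketched with pg_restore's --section option, which is present in PostgreSQL 9.2. This is an illustrative sketch, not part of the original question: the dump must be a custom-format file, and $HOST, $PORT, $DBNAME, and dump.custom are placeholders.

```shell
# Phase 1: restore table definitions and data only -- no indexes or
# constraints yet, so the database becomes usable as soon as data is loaded.
restore_schema_and_data() {
  pg_restore --section=pre-data --section=data \
    -h "$HOST" -p "$PORT" --dbname="$DBNAME" dump.custom
}

# Phase 2: build indexes and constraints afterwards, in parallel.
# -j needs a seekable regular file, which dump.custom already is.
restore_indexes() {
  pg_restore --section=post-data -j 8 \
    -h "$HOST" -p "$PORT" --dbname="$DBNAME" dump.custom
}
```

Run the first phase, open the database to users, then run the second; -j 8 is an arbitrary starting point, roughly one job per server core.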

Solution

pg_restore has an option to run the time-consuming parts of the restore, such as the index rebuild process, with multiple "jobs".

From the pg_restore documentation for PostgreSQL 9.2:

-j number-of-jobs
--jobs=number-of-jobs

Run the most time-consuming parts of pg_restore — those which load data, create indexes, or create constraints — using multiple concurrent jobs. This option can dramatically reduce the time to restore a large database to a server running on a multiprocessor machine.

Each job is one process or one thread, depending on the operating system, and uses a separate connection to the server.

The optimal value for this option depends on the hardware setup of the server, of the client, and of the network. Factors include the number of CPU cores and the disk setup. A good place to start is the number of CPU cores on the server, but values larger than that can also lead to faster restore times in many cases. Of course, values that are too high will lead to decreased performance because of thrashing.

Only the custom archive format is supported with this option. The input file must be a regular file (not, for example, a pipe). This option is ignored when emitting a script rather than connecting directly to a database server. Also, multiple jobs cannot be used together with the option --single-transaction.
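Given those constraints (a regular file in custom format), the question's own commands could be adapted along these lines. The decompression step is needed because -j cannot read from a pipe; the file names and job count are illustrative, not from the original answer.

```shell
# Decompress the bz2 dump back into a regular custom-format file
# (-d decompress, -k keep the compressed copy).
prepare_dump() {
  lbzip2 -dk "$DB-$TIMESTAMP.bz2"    # yields $DB-$TIMESTAMP
}

# Restore with multiple jobs (8 is an assumed starting point, about one
# per server CPU core); data load, index creation, and constraint
# creation all run in parallel.
parallel_restore() {
  pg_restore -O -j 8 -h "$HOST" -p "$PORT" \
    --dbname="$DBNAME" "$DB-$TIMESTAMP"
}
```

Because index and constraint creation are included in the parallelized work, this addresses the slow part directly, without splitting the restore into separate phases.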

Context

StackExchange Database Administrators Q#184621, answer score: 2
