HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

What is the "last" data that get's backed up, when running pg_dump on a live server?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
lastthepg_dumpwhatlivegetthatrunningbackedwhen

Problem

I'm wondering: If I start doing a pg_dump of a very large database (it would take hours), that is still running actively receiving writes, what is then the last data that goes into the backup? Is it:

  • The data as-it-was, at the point in time where the pg_dump command was initiated.



  • The last changes that it encountered at some point where it was dumping that individual record.



  • Something else.



Bonus question: If I'm trying to dump a database as part of trying to rescue it from corrupt data, will it then make any difference whether I use the directory format or the custom format?

Solution

It's the data at the start of the command for an entire database. According to the manpage:


It makes consistent backups even if the database is being used
concurrently. pg_dump does not block other users accessing the
database (readers or writers).

and in SQL Dump:


Dumps created by pg_dump are internally consistent, meaning, the dump
represents a snapshot of the database at the time pg_dump began
running

Dumping in parallel (--jobs) may be problematic with changing data, but only when targeting less recent versions:


For a consistent backup, the database server needs to support
synchronized snapshots, a feature that was introduced in PostgreSQL
9.2

I don't think the output format makes any difference in the rescue operation. Note that for a parallel dump, directory is the only possible format.

Context

StackExchange Database Administrators Q#62857, answer score: 2

Revisions (0)

No revisions yet.