patternsqlMinor
Storage snapshots for consistent backup of postgresql - different data and log volumes
Viewed 0 times
postgresqlvolumesconsistentlogsnapshotsstoragedifferentforanddata
Problem
We are running many Linux VM's in a vmware/shared storage environment, each running its own instance of postgreSQL (a mix of 9.0 and 9.3). Currently, the entire VM sits on a single root partition/volume, and we've had great success (~8 years) using storage-based snapshots of the underlying VMFS volumes for backup/restore process (and replication to our DR site).
Due to the architecture of our storage, it would be advantageous to separate postgres WAL files to a non-cached, mostly-write volume to give us less cache churn on the storage side. With our storage (Nimble Storage), we can assign both volumes to a single protection/snapshot group, but I haven't been able to elicit from our vendor that the snapshots will happen at EXACTLY the same time across all volumes in the protection group - it likely will, but there's always that chance that its milliseconds apart.
To that end, we ran some experiments, all while writing data to the DB as fast as possible using pg_bench. After the experiments, we restored our snapshot'ed volumes and started the VM+postgres
So testing seem to tell us as long as both snapshots are consistent at the volume level, and relatively close together, you get a consistent copy of the DB, based on the time of the WAL/Log volume snapshot.
My question: Is this safe? What are the corner cases we are missing in our testing, and what could go wrong?
Postgres' doc indicates this is not safe, but testing seems to indicate its pretty robust:
http://www.postgresql.org/docs/9.1/static/backup-file.html
If your database is spread across multiple file systems, there might not be
Due to the architecture of our storage, it would be advantageous to separate postgres WAL files to a non-cached, mostly-write volume to give us less cache churn on the storage side. With our storage (Nimble Storage), we can assign both volumes to a single protection/snapshot group, but I haven't been able to elicit from our vendor that the snapshots will happen at EXACTLY the same time across all volumes in the protection group - it likely will, but there's always that chance that its milliseconds apart.
To that end, we ran some experiments, all while writing data to the DB as fast as possible using pg_bench. After the experiments, we restored our snapshot'ed volumes and started the VM+postgres
- Snapshot both data and log volumes close to simultaneously - result: DB recovered
- Snapshot data volume first, log volume ~1 minute later - result: DB recovered
- Snapshot log volume first, data volume ~1 minute later - result: DB recovered
- Snapshot log volume first, data volume ~3 minutes later, after a WAL checkpoint wrote new data to datafiles: result: DB recovered
So testing seem to tell us as long as both snapshots are consistent at the volume level, and relatively close together, you get a consistent copy of the DB, based on the time of the WAL/Log volume snapshot.
My question: Is this safe? What are the corner cases we are missing in our testing, and what could go wrong?
Postgres' doc indicates this is not safe, but testing seems to indicate its pretty robust:
http://www.postgresql.org/docs/9.1/static/backup-file.html
If your database is spread across multiple file systems, there might not be
Solution
The documentation you cited says it all, but I wouldn't blame you if you want to try to verify the claims of the vendor regarding snapshots taken at the same time. Perhaps a way of uncovering something could be to stress test the WAL system more specifically.
e.g. In addition to your pgbench-based tests, try adding random calls to
Unless there's something I'm missing, for your snapshots not taken at the same time, I would attribute your recovered DBs perhaps to a bit of lucky timing. In the last case, imagine you took your log snapshot while the current xlog location was, say,
e.g. In addition to your pgbench-based tests, try adding random calls to
pg_switch_xlog() to force log rotation, shorter and longer checkpoint intervals (shortening and lengthening checkpoint_timeout and checkpoint_timeout) and even using small or large wal file sizes. Unless there's something I'm missing, for your snapshots not taken at the same time, I would attribute your recovered DBs perhaps to a bit of lucky timing. In the last case, imagine you took your log snapshot while the current xlog location was, say,
0/A1C0FFEE. Then you have 3 minutes of particularly heavy load on the system, that causes a full cycle through the WAL files, and your DB is now at 0/DEADBEEF when the data snapshot is taken. When you attempt to restore, the WAL files being written to at the time of the data snapshot are long gone, and recovery will fail.Context
StackExchange Database Administrators Q#94319, answer score: 2
Revisions (0)
No revisions yet.