HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

pg_dump does not finish

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
pg_dumpfinishdoesnot

Problem

I am trying to backup our postgres database (8.4.17), which is about 25GB in size.

pg_dump database_name > database_db_dump_2014-05-05.sql


unfortunately, the backup stops after about 600MB in, and does not continue. It always stops when trying to dump a specific table (fb_crawler_event). I was able to successfully dump all the other tables using the --table swtich. I have already stopped any other process that could interact with the database.

server log after a restart of the DB (looks ok to me):

2014-05-05 14:34:46 CEST LOG:  all server processes terminated; reinitializing
2014-05-05 14:34:46 CEST LOG:  database system was interrupted; last known up at 2014-05-05 14:32:50 CEST
2014-05-05 14:34:46 CEST LOG:  database system was not properly shut down; automatic recovery in progress
2014-05-05 14:34:46 CEST LOG:  record with zero length at 1A1/AD6A78C0
2014-05-05 14:34:46 CEST LOG:  redo is not required
2014-05-05 14:34:47 CEST LOG:  database system is ready to accept connections
2014-05-05 14:34:47 CEST LOG:  autovacuum launcher started


verbose output from pg_dump (nothing suspicious, but the obvious table, which doesn't let the database continue to dump)

... lots of lines ...
pg_dump: restoring data for table "django_site"
pg_dump: dumping contents of table django_site
pg_dump: restoring data for table "fb_crawler_event"
pg_dump: dumping contents of table fb_crawler_event


And then it just halts.

output from the pg_locks table, which seems to be huge to me (294 entries all together):

```
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted
------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+-----------------+---------
relation | 16384 | 2674 | | | | | | | | 1/23

Solution

I was able to figure out which file in the database storage was the culprit, by copying all the files to /dev/null.

cp -vR /usr/lib/postgresql/8.4 /dev/null


(The path to your DB files might differ)

The currupt file could not be copied, but there was nothing I could do to change that. (so it was most probably a FS error or hardware failure)

So I restarted the server with a forced fsck (e.g. touch /forcefsck), to make sure the FS would do the best to fix itself. This might not be the way you'll want to go, since it is possible to have a total data loss afterwards, but I was able to preserve the most precious data already beforehand, so I took the risk.

After reboot I could finally access the inaccessable table again, but I am not sure, if the data contained is corrupted or not. Anyway, I do have a backup now, which I can disect to find out, and my server can go back online for now...

I recommend reading the wiki of postgres about corruption and the slides of this FOSDEM presentation for some more info on DB corruption

Code Snippets

cp -vR /usr/lib/postgresql/8.4 /dev/null

Context

StackExchange Database Administrators Q#64481, answer score: 4

Revisions (0)

No revisions yet.