HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Recover PostgreSQL database from WAL errors on startup?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqldatabaserecovererrorswalfromstartup

Problem

I'm trying to set up an OpenStreetMap server on an Ubuntu 12.04 machine using the Ubuntu packages listed at switch2osm.org. I initially installed and set up everything using a Northeast US-only map extract, but now I want to install the entire planet of maps. I downloaded planet-latest.osm.bz2 and ran osm2pgsql --slim -C 60000 planet-latest.osm.bz2 as a user with write permission to the database; this was the same command that worked to install us-northeast.osm.pbf earlier. I came back the next day to find this command appeared to finish successfully, but for some reason the rendering daemon wasn't generating new tiles from the new data. I tried restarting renderd, and when that had no effect I tried restarting the PostgreSQL server with sudo /etc/init.d/postgresql restart. However, server startup failed with the following errors in the log:

2012-07-13 18:54:59 UTC WARNING:  page 1525147 of relation base/16385/477861 was uninitialized
2012-07-13 18:54:59 UTC WARNING:  page 2247965 of relation base/16385/477861 was uninitialized
...500 more lines like this...
2012-07-13 18:54:59 UTC WARNING:  page 2262926 of relation base/16385/477861 was uninitialized
2012-07-13 18:54:59 UTC PANIC:  WAL contains references to invalid pages
2012-07-13 18:55:00 UTC LOG:  startup process (PID 22826) was terminated by signal 6: Aborted


(Pastebin of entire log here).

There isn't much information on these kinds of errors on the Internet, but from what I can find it seems to mean that either my indexes are corrupted or my Write-Ahead-Log is. The only way to fix corrupted indexes, though, is to start the database in single-user mode and rebuild them, and I can't even do that because I get the same fatal errors even when I start in single-user mode with indexing disabled.

Is there any way for me to delete the Write-Ahead Log and force the server to start up "from scratch", or a fix for this kind of corruption that doesn't require first starting the database successfully?

Alt

Solution

UPDATE: it looks like this is a bug in the Debian/Ubuntu packaging of PostgreSQL, where the init scripts - extremely unsafely - kill -9 the postmaster and remove postmaster.pid. See this post on pgsql-general.

See:

  • https://bugs.launchpad.net/debian/+source/postgresql-common/+bug/1042556



  • http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=686060



Personally, I've gone and edited my init scripts to get rid of this rather hairy and dangerous code.

The original answer

Please go back in the logs to before the restart and see if you can find any errors. WAL corruption absolutely should not happen, so if it has it's important to look into why. If you can upload a copy of the whole log to a pastebin or something that'd be really handy.

The only time where WAL corruption is an accepted possibility with PostgreSQL is if you are running with fsync=off set in PostgreSQL.conf and your system crashes or unexpectedly loses power. If that's not the cause, it'd be really good to look into what happened.

Please do not use pg_resetxlog without some idea why your xlogs are damaged. If the transaction logs become damaged something is badly wrong and you need to find out what. If you band-aid it now, you might be bitten by it later when you care about the data.

The transaction logs exist for a reason and just removing them can leave your tables and indexes in an inconsistent, damaged state. After a pg_resetxlog it's a very good idea to pg_dumpall, drop your cluster, re-initdb, and reload the DB. As I said, though, this should not happen and you should look back in the logs for clues about what could've happened.

Now read the comments

Context

StackExchange Database Administrators Q#20959, answer score: 5

Revisions (0)

No revisions yet.