Postgres is stuck in recovery mode
Problem
I have a stand-alone instance of PostgreSQL which is in recovery mode. It has been saying
```
2014-03-24 18:45:57 MDT FATAL: the database system is starting up
```
for many hours, and ps shows:
```
postgres 2637 0.1 0.1 116916 4420 ? Ds 15:43 0:18 postgres: startup process recovering 00000001000000040000007E
```
Is there any way to watch the progress of the recovery, ideally with an ETA? How can I get this process "un-stuck"?
I can stop postgres using the standard start/stop scripts, but when I start it again, it's still stuck in recovery mode.
Debian 7.4, Linux kernel 3.2.0, PostgreSQL 9.1.12
Output of strace -p 2637:
```
Process 2637 attached - interrupt to quit
close(154) = 0
getppid() = 2600
open("pg_clog/0003", O_RDWR|O_CREAT, 0600) = 154
lseek(154, 221184, SEEK_SET) = 221184
write(154, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
fsync(154) = 0
```
The above output repeats seemingly infinitely.
A gdb backtrace (three subsequent bts were all practically identical):
```
(gdb) bt
#0 0x00007f0c8f917d70 in fsync () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f0c9169f255 in pg_fsync_no_writethrough (fd=) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/storage/file/fd.c:286
#2 0x00007f0c9169f265 in pg_fsync (fd=) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/storage/file/fd.c:274
#3 0x00007f0c9151917b in SlruPhysicalWritePage (ctl=ctl@entry=0x7f0c91b856c0, pageno=pageno@entry=123, slotno=slotno@entry=3, fdata=fdata@entry=0x0) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/access/transam/slru.c:801
#4 0x00007f0c91519925 in SlruInternalWritePage (ctl=ctl@entry=0x7f0c91b856c0, slotno=3, fdata=fdata@entry=0x0) at /home/cbe/projects/postgresql/9.1/postgresql-9.1-9.1.12/build/../src/backend/access/t
```
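The strace and gdb output agree with each other, which a little arithmetic can confirm. The following is only a sketch: it assumes the default 8192-byte block size and 32 pages per SLRU segment file, and `clog_location` is a hypothetical helper name, not a PostgreSQL tool.

```shell
#!/bin/sh
# Map an SLRU page number (e.g. pageno=123 from the gdb backtrace) to the
# pg_clog segment file name and the byte offset inside that file.
# Assumptions: 8192-byte pages (default BLCKSZ) and 32 pages per segment.
PAGE_SIZE=8192
PAGES_PER_SEGMENT=32

clog_location() {
    pageno=$1
    segment=$((pageno / PAGES_PER_SEGMENT))
    offset=$(( (pageno % PAGES_PER_SEGMENT) * PAGE_SIZE ))
    # Segment files are named with four uppercase hex digits.
    printf '%04X %d\n' "$segment" "$offset"
}

clog_location 123
```

For pageno=123 this prints `0003 221184`, which matches the `open("pg_clog/0003", ...)` and `lseek(154, 221184, SEEK_SET)` calls in the strace output: the startup process keeps rewriting the same 8 kB page of that one file.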
Solution
After googling for hours, I stumbled across a thread that wasn't really related to my issue, best I could tell, but it seemed harmless enough to try, and voilà!
I looked at /var/lib/postgresql/9.1/main/pg_clog, and saw:
```
drwx------ 2 postgres postgres 4096 Mar 15 15:20 .
drwx------ 13 postgres postgres 4096 Mar 25 12:15 ..
-rw------- 1 postgres postgres 262144 Feb 4 19:39 0000
-rw------- 1 postgres postgres 262144 Feb 13 11:10 0001
-rw------- 1 postgres postgres 262144 Mar 15 15:20 0002
-rw------- 1 postgres postgres 229376 Mar 25 14:51 0003
```
Noticing that /var/lib/postgresql/9.1/main/pg_clog/0003 was the file being opened/seeked in the strace output, and that this was the file in question in the forum post, I tried the suggested action in the forum:
```
dd if=/dev/zero bs=8k count=1 >> /var/lib/postgresql/9.1/main/pg_clog/0003
```
And now postgres starts again.
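Since this edits a file inside the data directory by hand, it may be worth keeping a copy of the file first. Below is a minimal sketch of the same dd step with a backup; `append_zero_page` is a hypothetical helper, the `.bak` suffix is an arbitrary choice, and the demonstration runs against a scratch file rather than a real pg_clog directory. It assumes PostgreSQL is stopped while the real file is touched.

```shell
#!/bin/sh
# Append one zero-filled 8 kB page to a file, keeping a backup copy first.
# Hypothetical helper; intended to be run only while PostgreSQL is stopped.
append_zero_page() {
    file=$1
    # Keep the original next to it so the change can be undone.
    cp -p "$file" "$file.bak" || return 1
    # Same effect as: dd if=/dev/zero bs=8k count=1 >> "$file"
    dd if=/dev/zero bs=8192 count=1 2>/dev/null >> "$file"
}

# Demonstration on a scratch file with the same size as pg_clog/0003 above
# (229376 bytes = 28 pages of 8192 bytes), not on a real data directory.
scratch=$(mktemp)
dd if=/dev/zero of="$scratch" bs=8192 count=28 2>/dev/null
append_zero_page "$scratch"
wc -c < "$scratch"
```

After the call the scratch file is one page longer (237568 bytes, i.e. 29 pages), and the untouched 229376-byte copy sits beside it with the `.bak` suffix.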
Context
StackExchange Database Administrators Q#61650, answer score: 13