Postgres stuck with `FATAL: the database system is starting up` for hours. Should we wait or is this a sign of a corrupted DB?
Problem
I found answers to this problem suggesting that it might take a while to restart postgres. But we have now been waiting for over 2 hours, which seems extremely long compared to the 5 minutes it usually takes.
(https://stackoverflow.com/questions/54922433/postgresql-fatal-the-database-system-is-starting-up-windows-10)
There are no other logs than `FATAL: the database system is starting up`. Furthermore, the CPU and RAM utilization is minimal. It looks like nothing is happening.
We also checked disk space, which is not a problem.
How long should we wait before trying something else? And do you have any idea what could have caused this and how to fix it?
Some additional information:
- Postgres version is 13.8
- the size of the database is in the range of 1.5 TB
- if our monitoring works correctly, the last checkpoint was around 20 minutes before the shutdown
- WAL settings:
  max_wal_size = 8GB
  min_wal_size = 512MB
  checkpoint_completion_target = 0.9

After starting postgres with the `-d 3` flag we got the following logs:
```
2022-12-02 15:17:10.471 UTC [1] LOG: starting PostgreSQL 13.8 (Ubuntu 13.8-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0, 64-bit
2022-12-02 15:17:10.472 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2022-12-02 15:17:10.472 UTC [1] LOG: listening on IPv6 address "::", port 5432
2022-12-02 15:17:10.480 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-12-02 15:17:10.683 UTC [14] LOG: database system was interrupted; last known up at 2022-12-02 10:01:04 UTC
2022-12-02 15:17:10.683 UTC [14] DEBUG: removing all temporary WAL segments
2022-12-02 15:17:10.914 UTC [1] DEBUG: forked new backend, pid=15 socket=9
2022-12-02 15:17:10.915 UTC [15] LOG: connection received: host=10.255.0.181 port=46576
2022-12-02 15:17:10.918 UTC [15] FATAL: the database system is starting up
```
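To tell whether the server itself is reporting anything, or whether the log is mostly rejected client connections, it can help to tally the log lines by level. The following is only a minimal sketch: the function name `summarize` and the regular expression are illustrative assumptions, and the pattern assumes the timestamp/pid prefix seen in the log excerpt above (a different `log_line_prefix` would need a different pattern).

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt in the question:
# 2022-12-02 15:17:10.918 UTC [15] FATAL: the database system is starting up
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) UTC "
    r"\[(?P<pid>\d+)\] (?P<level>[A-Z0-9]+):\s+(?P<msg>.*)$"
)

STARTING_UP = "the database system is starting up"


def summarize(lines):
    """Return (count per log level, number of rejected client connections)."""
    levels = Counter()
    rejected = 0
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed prefix
        levels[match["level"]] += 1
        if match["level"] == "FATAL" and STARTING_UP in match["msg"]:
            rejected += 1
    return levels, rejected


SAMPLE = """\
2022-12-02 15:17:10.683 UTC [14] LOG: database system was interrupted; last known up at 2022-12-02 10:01:04 UTC
2022-12-02 15:17:10.683 UTC [14] DEBUG: removing all temporary WAL segments
2022-12-02 15:17:10.914 UTC [1] DEBUG: forked new backend, pid=15 socket=9
2022-12-02 15:17:10.915 UTC [15] LOG: connection received: host=10.255.0.181 port=46576
2022-12-02 15:17:10.918 UTC [15] FATAL: the database system is starting up
"""

if __name__ == "__main__":
    levels, rejected = summarize(SAMPLE.splitlines())
    print(dict(levels), "rejected connections:", rejected)
```

If nearly every line after the initial startup messages is a rejected connection, the log says little about what the startup process itself is doing, which matches the situation described here.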
Solution
The database is running again. We are not entirely sure why, but this is what we did:
We figured out that while Postgres was starting we got a lot of `2022-12-02 15:17:10.918 UTC [15] FATAL: the database system is starting up` messages, which were errors returned to the many client requests trying to reach the database. We started the database on another port so that it could start without all the clients trying to reach it. For some reason this fixed it (at least we think so). The database started normally, and after shutting it down again and restarting it on the original port it is back in operation.
A scary experience ...
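The answer does not say how the port was changed, so the following is only a sketch of one way the workaround might be scripted. It assumes the server is managed with `pg_ctl` and that `pg_isready` is on the `PATH`; the data directory path, the port 5433, and the helper names are hypothetical. The exit-code meanings are those documented for `pg_isready`.

```python
import subprocess
import time

# Exit codes documented for pg_isready.
PG_ISREADY_STATUS = {
    0: "accepting connections",
    1: "rejecting connections (for example, during startup)",
    2: "no response",
    3: "no attempt made (for example, invalid parameters)",
}


def start_command(data_dir, port):
    """Build a pg_ctl call that starts the server on a non-default port."""
    # -o passes options to the postgres server process; -p sets its port.
    return ["pg_ctl", "-D", data_dir, "-o", f"-p {port}", "start"]


def isready_command(host, port):
    """Build a pg_isready call for the given host and port."""
    return ["pg_isready", "-h", host, "-p", str(port)]


def wait_until_ready(host, port, timeout_s=600, interval_s=5, run=subprocess.run):
    """Poll pg_isready until it reports success or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        code = run(isready_command(host, port), capture_output=True).returncode
        if code == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    # Hypothetical values: adjust the data directory and port to your setup.
    subprocess.run(start_command("/var/lib/postgresql/13/main", 5433), check=False)
    ready = wait_until_ready("localhost", 5433)
    print("ready" if ready else "still not accepting connections")
```

Once the server accepts connections on the temporary port, it can be shut down cleanly and started again on the original port, as described above. Whether isolating the server from client traffic was really what fixed the startup remains unconfirmed.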
Code Snippets
2022-12-02 15:17:10.918 UTC [15] FATAL: the database system is starting up
Context
StackExchange Database Administrators Q#320487, answer score: 5