Postgres stuck with `FATAL: the database system is starting up` for hours. Should we wait, or is this a sign of a corrupted DB?

Submitted by: @import:stackexchange-dba

Problem

I found answers to this problem suggesting that restarting Postgres can take a while (for example https://stackoverflow.com/questions/54922433/postgresql-fatal-the-database-system-is-starting-up-windows-10). But we have now been waiting for over 2 hours, which seems extremely long compared to the 5 minutes it usually takes.

The logs show nothing other than `FATAL: the database system is starting up`.
Furthermore, CPU and RAM utilization is minimal. It looks like nothing is happening.

We also checked disk space, which is not a problem.

How long should we wait before trying something else? And do you have any idea what could have caused this and how to fix it?

Some additional information:

  • Postgres version is 13.8
  • the size of the database is something in the range of 1.5 TB
  • if our monitoring works correctly, the last checkpoint was around 20 minutes before the shutdown
  • WAL settings:

max_wal_size = 8GB
min_wal_size = 512MB
checkpoint_completion_target = 0.9

After starting Postgres with the `-d 3` flag, we got the following logs:

```
2022-12-02 15:17:10.471 UTC [1] LOG: starting PostgreSQL 13.8 (Ubuntu 13.8-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0, 64-bit
2022-12-02 15:17:10.472 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2022-12-02 15:17:10.472 UTC [1] LOG: listening on IPv6 address "::", port 5432
2022-12-02 15:17:10.480 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-12-02 15:17:10.683 UTC [14] LOG: database system was interrupted; last known up at 2022-12-02 10:01:04 UTC
2022-12-02 15:17:10.683 UTC [14] DEBUG: removing all temporary WAL segments
2022-12-02 15:17:10.914 UTC [1] DEBUG: forked new backend, pid=15 socket=9
2022-12-02 15:17:10.915 UTC [15] LOG: connection received: host=10.255.0.181 port=46576
2022-12-02 15:17:10.918 UTC [15]
```

Solution

The database is running again. We are not entirely sure why, but this is what we did:

We noticed that while Postgres was starting, the log filled up with many lines like this:

2022-12-02 15:17:10.918 UTC [15] FATAL:  the database system is starting up


These were error responses to the many client requests trying to reach the database. We started the database on another port so that it could start without all the clients trying to reach it. For some reason this fixed it (at least we think so). The database started normally, and after shutting it down again and restarting it on the original port, it is back in operation.
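
A start on a temporary port could look roughly like the sketch below. It is an illustration under assumptions, not necessarily the exact commands used here: it assumes `pg_ctl` and `pg_isready` are on the PATH and manage this cluster, and the function names, the data directory, the port and the one-hour timeout are all placeholders. `-o "-p …"` hands the port to the server, and `pg_isready` exits with code 1 while the server is still rejecting connections during startup.

```python
import subprocess

# Exit codes documented for pg_isready.
PG_ISREADY_STATUS = {
    0: "accepting connections",
    1: "rejecting connections (for example, still starting up)",
    2: "no response",
    3: "no attempt made (for example, invalid parameters)",
}


def build_start_command(data_dir, port, timeout_s=3600):
    """Build a pg_ctl call that starts the server on a temporary port."""
    # -o passes options through to the postgres binary; -p sets its port.
    # -w waits for startup to finish; -t raises the wait timeout, which
    # defaults to 60 seconds. The 3600 here is an arbitrary assumption.
    return [
        "pg_ctl", "start",
        "-D", data_dir,
        "-o", f"-p {port}",
        "-w", "-t", str(timeout_s),
    ]


def build_probe_command(port):
    """Build a pg_isready call against the temporary port."""
    return ["pg_isready", "-p", str(port)]


def describe_probe(exit_code):
    """Translate a pg_isready exit code into a readable status."""
    return PG_ISREADY_STATUS.get(exit_code, f"unexpected exit code {exit_code}")


def start_and_probe(data_dir, port):
    """Start the server on `port`, then report what pg_isready sees."""
    subprocess.run(build_start_command(data_dir, port), check=True)
    probe = subprocess.run(build_probe_command(port))
    return describe_probe(probe.returncode)
```

Calling `start_and_probe("/path/to/data", 5433)` (both values hypothetical) would start the server where the regular clients are not looking for it. Once it reports "accepting connections", the server can be stopped and started again on the original port.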

A scary experience ...

Code Snippets

2022-12-02 15:17:10.918 UTC [15] FATAL:  the database system is starting up
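
To judge how much of a startup log consists of rejected client connections like the line above, a small parser can help. This is a minimal sketch, assuming the line layout seen in this log (timestamp, time zone, PID in brackets, level, message); `summarize` and the regular expression are illustrative names, and the sample lines are copied from the log in the question.

```python
import re
from collections import Counter

# Matches lines such as:
# 2022-12-02 15:17:10.918 UTC [15] FATAL:  the database system is starting up
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (?P<tz>\S+) "
    r"\[(?P<pid>\d+)\] (?P<level>[A-Z0-9]+):\s+(?P<msg>.*)$"
)
STARTING_UP = "the database system is starting up"


def summarize(lines):
    """Count log lines per level and count rejected client connections."""
    levels = Counter()
    rejected = 0
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m is None:
            continue  # truncated or continuation line, skip it
        levels[m["level"]] += 1
        if m["level"] == "FATAL" and m["msg"] == STARTING_UP:
            rejected += 1
    return levels, rejected


SAMPLE = [
    "2022-12-02 15:17:10.683 UTC [14] LOG: database system was interrupted; last known up at 2022-12-02 10:01:04 UTC",
    "2022-12-02 15:17:10.683 UTC [14] DEBUG: removing all temporary WAL segments",
    "2022-12-02 15:17:10.915 UTC [15] LOG: connection received: host=10.255.0.181 port=46576",
    "2022-12-02 15:17:10.918 UTC [15]",
    "2022-12-02 15:17:10.918 UTC [15] FATAL:  the database system is starting up",
]

levels, rejected = summarize(SAMPLE)
print(dict(levels), "rejected connections:", rejected)
```

A high `rejected` count next to very few other lines suggests the FATAL entries come from clients knocking on the port while the server is starting, not from the recovery itself.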

Context

StackExchange Database Administrators Q#320487, answer score: 5
