MongoDB crashes with out-of-memory errors or is killed by the oom-killer

Submitted by: @import:stackexchange-dba

Problem

A two-shard MongoDB cluster regularly crashes with an out-of-memory error or is killed by the oom-killer. The system runs on GCE with Debian 9.4, MongoDB v3.6.5 and the WiredTiger storage engine, without swap (as is the practice on GCE). The servers are n1-highmem-4 instances (4 vCPUs, 26 GB memory). Only mongod runs on each server, with no other services; the mongos routers run on separate servers.

The process usually exits or crashes about once a day. When mongod is killed by the oom-killer, the kernel log shows:

```
Jun 15 14:45:17 server4 kernel: [1731430.432189] Out of memory: Kill process 13130 (mongod) score 980 or sacrifice child
Jun 15 14:45:17 server4 kernel: [1731430.441717] Killed process 13130 (mongod) total-vm:28280536kB, anon-rss:26174876kB, file-rss:0kB, shmem-rss:0kB
```

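As a rough check on the figures in that kernel line, the minimal Python sketch below parses it and converts anon-rss to GiB. It assumes the exact field layout shown in this log (other kernel versions may format the line differently); the function name `parse_killed_line` is hypothetical.

```python
import re
from typing import Any, Dict, Optional

# Pattern for the kernel's "Killed process ..." line in the format shown above.
# The kernel's "kB" here means units of 1024 bytes.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\) "
    r"total-vm:(?P<total_vm_kb>\d+)kB, anon-rss:(?P<anon_rss_kb>\d+)kB, "
    r"file-rss:(?P<file_rss_kb>\d+)kB, shmem-rss:(?P<shmem_rss_kb>\d+)kB"
)

KIB_PER_GIB = 1024 * 1024


def parse_killed_line(line: str) -> Optional[Dict[str, Any]]:
    """Extract pid, process name and memory figures from an oom-killer line."""
    m = KILLED_RE.search(line)
    if m is None:
        return None
    info: Dict[str, Any] = {"pid": int(m.group("pid")), "comm": m.group("comm")}
    for key in ("total_vm_kb", "anon_rss_kb", "file_rss_kb", "shmem_rss_kb"):
        info[key] = int(m.group(key))
    # Resident anonymous memory converted from KiB to GiB.
    info["anon_rss_gib"] = info["anon_rss_kb"] / KIB_PER_GIB
    return info


line = ("Jun 15 14:45:17 server4 kernel: [1731430.441717] Killed process 13130 "
        "(mongod) total-vm:28280536kB, anon-rss:26174876kB, file-rss:0kB, shmem-rss:0kB")
info = parse_killed_line(line)
print(info["pid"], f'{info["anon_rss_gib"]:.2f} GiB')  # 13130 24.96 GiB
```

For this log it reports an anon-rss of about 24.96 GiB, so mongod on its own held nearly all of the machine's 26 GB when it was killed.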

At other times mongod exits on its own, leaving this in mongod.log:

```
2018-06-15T02:14:32.456+0200 F - [rsSync] out of memory.

0x55cbc8535751 0x55cbc8534d84 0x55cbc8623b4b 0x55cbc86c665c 0x55cbc70fccff 0x55cbc70f8b02 0x55cbc707b3f1 0x55cbc86449b0 0x7fbbf3507494 0x7fbbf3249acf
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"55CBC6305000","o":"2230751","s":"_ZN5mongo15printStackTraceERSo"},{"b":"55CBC6305000","o":"222FD84","s":"_ZN5mongo29reportOutOfMemoryErrorAndExitEv"},{"b":"55CBC6305000","o":"231EB4B"},{"b":"55CBC6305000","o":"23C165C","s":"_Znam"},{"b":"55CBC6305000","o":"DF7CFF","s":"_ZN5mongo4repl8SyncTail7OpQueueC1Ev"},{"b":"55CBC6305000","o":"DF3B02","s":"_ZN5mongo4repl8SyncTail16oplogApplicationEPNS0_22ReplicationCoordinatorE"},{"b":"55CBC6305000","o":"D763F1","s":"_ZN5mongo4repl10RSDataSync4_runEv"},{"b":"55CBC6305000","o":"233F9B0"},{"b":"7FBBF3500000","o":"7494"},{"b":"7FBBF3161000","o":"E8ACF","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.6.5", "gitVersion" : "a20ecd3e3a174162052ff99913bc2ca9a839d618", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.9.0-6-amd64", "version" : "#1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07)", "machine" : "x86_64" }, "so
```

Solution

It turned out that we had a long-running daily query, and chunks that had been migrated away were still retained in memory because the query's cursor was still using them. So with every chunk migration, more chunk data stayed in memory, until at some point all memory was consumed.
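The accumulation described above can be illustrated with a toy model: while the cursor stays open, each migration adds a fixed amount of retained memory, so usage grows linearly until it exceeds RAM. This is a minimal sketch; all sizes are hypothetical placeholders rather than measurements from these servers, and `migrations_until_oom` is an illustrative name, not a MongoDB API.

```python
from typing import Optional


def migrations_until_oom(
    ram_bytes: int,
    baseline_bytes: int,
    retained_per_migration: int,
    cursor_open: bool = True,
    max_migrations: int = 10_000,
) -> Optional[int]:
    """Toy model: return the migration count at which usage exceeds RAM, or None.

    Assumption (hypothetical): while the long-running cursor is open, every
    migrated chunk leaves `retained_per_migration` bytes pinned in memory.
    """
    used = baseline_bytes
    for n in range(1, max_migrations + 1):
        if cursor_open:
            # The open cursor still references the migrated data, so it is kept.
            used += retained_per_migration
        # With no cursor holding it, the migrated data is released and usage stays flat.
        if used > ram_bytes:
            return n
    return None


GIB = 1024 ** 3
MIB = 1024 ** 2

print(migrations_until_oom(26 * GIB, 20 * GIB, 64 * MIB))                     # 97
print(migrations_until_oom(26 * GIB, 20 * GIB, 64 * MIB, cursor_open=False))  # None
```

With 26 GiB of RAM, a hypothetical 20 GiB baseline and 64 MiB retained per migration, the model runs out of memory at migration 97; with the cursor closed, usage never grows.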

After we removed this long-running query, the crashes stopped.
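One way to spot such a query is to look for operations that have been running for a long time. The minimal Python sketch below filters the document returned by the `currentOp` command, assuming its `inprog` array whose entries carry `opid`, `ns` and `secs_running` fields; the helper name `long_running_ops` and the sample values are hypothetical. It only sees operations that are active at the moment of the call.

```python
from typing import Any, Dict, List


def long_running_ops(current_op: Dict[str, Any], min_secs: int) -> List[Dict[str, Any]]:
    """Return operations from a currentOp result that have run for at least min_secs.

    Assumes the result has an "inprog" list whose entries may carry "secs_running".
    """
    ops = current_op.get("inprog", [])
    slow = [op for op in ops if op.get("secs_running", 0) >= min_secs]
    # Longest-running first, so the most likely culprit is at the top.
    return sorted(slow, key=lambda op: op.get("secs_running", 0), reverse=True)


# Hypothetical sample shaped like a currentOp reply (values are made up).
sample = {
    "inprog": [
        {"opid": 101, "op": "query", "ns": "app.events", "secs_running": 7200},
        {"opid": 102, "op": "insert", "ns": "app.events", "secs_running": 0},
        {"opid": 103, "op": "getmore", "ns": "app.logs", "secs_running": 900},
    ],
    "ok": 1.0,
}

for op in long_running_ops(sample, min_secs=600):
    print(op["opid"], op["ns"], op["secs_running"])
```

With PyMongo, the input could come from `client.admin.command("currentOp")` run against the mongod in question, and an operation identified this way can be terminated with `client.admin.command("killOp", op=opid)`.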

Context

StackExchange Database Administrators Q#209761, answer score: 2
