HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

PostgreSQL pg_stat_activity shows COMMIT

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlcommitshowspg_stat_activity

Problem

We recently replaced our database server with an upgraded machine with 4 x quad core CPUs and 32Gb of ram. We also repurposed our old box to serve as a slave with streaming replication. Both boxes are running CentOS 6.3 and PostgreSQL 9.2. Postgres is the only thing running on each of the boxes.

This configuration has been in place for about a month or so, when suddenly we started running into some issues as traffic started ramping up. What we have started to see is an extremely high CPU load at times (top shows a load average of 270), and when we can look at pg_stat_activity we will see most of our connections are in the COMMIT state. When left alone, this will eventually finish and the system will become responsive with the connections becoming IDLE. We have tried disabling replication to see if that might be the issue, but the problem still persists.

We have tried diagnosing what is happening, and are a bit lost. The output from running perf shows something similar to below, and I have no idea what 0x347ba9 represents.

+  41.40%       48154  postmaster  0x347ba9         f 0x347ba9                                   ◆
+   9.55%       10956  postmaster  0x2dc820         f set_config_option                          ▒
+   8.64%        9946  postmaster  0x5a3d4          f writeListPage     
+   5.75%        6609  postmaster  0x5a2b0          f ginHeapTupleFastCollect                    ▒
+   2.68%        3084  postmaster  0x192483         f build_implied_join_equality                ▒
+   2.61%        2990  postmaster  0x187a55         f build_paths_for_OR                         ▒
+   1.86%        2131  postmaster  0x794aa          f get_collation_oid                          ▒
+   1.56%        1822  postmaster  0x5a67e          f ginHeapTupleFastInsert                     ▒
+   1.53%        1766  postmaster  0x1929bc         f distribute_qual_to_rels                    ▒
+   1.33%        1558  postmaster  0x249671         f cmp_numerics


None

Solution

After further diagnostics and some Googling, we came across this article that described many of the same symtoms we were experiencing. The root cause of their problem (and from what we can tell, ours too) was related to the Transparent Huge Pages implementation.

After disabling Transparent Huge Pages with this command:

echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled


The problem appears to have been resolved. We've been running under an increased workload for the last two weeks and the issue has not resurfaced. The system's contexts and interrupts are consistently 1/10th of what they had been and average system time has decreased as well.

Not sure if it is the solution for everyone, but I post it here as a possible cause in case it can help anyone else resolve a similar issue.

Code Snippets

echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled

Context

StackExchange Database Administrators Q#32890, answer score: 12

Revisions (0)

No revisions yet.