patternsqlModerate
PostgreSQL pg_stat_activity shows COMMIT
Viewed 0 times
postgresqlcommitshowspg_stat_activity
Problem
We recently replaced our database server with an upgraded machine with 4 x quad core CPUs and 32Gb of ram. We also repurposed our old box to serve as a slave with streaming replication. Both boxes are running CentOS 6.3 and PostgreSQL 9.2. Postgres is the only thing running on each of the boxes.
This configuration has been in place for about a month or so, when suddenly we started running into some issues as traffic started ramping up. What we have started to see is an extremely high CPU load at times (top shows a load average of 270), and when we can look at
We have tried diagnosing what is happening, and are a bit lost. The output from running
None
This configuration has been in place for about a month or so, when suddenly we started running into some issues as traffic started ramping up. What we have started to see is an extremely high CPU load at times (top shows a load average of 270), and when we can look at
pg_stat_activity we will see most of our connections are in the COMMIT state. When left alone, this will eventually finish and the system will become responsive with the connections becoming IDLE. We have tried disabling replication to see if that might be the issue, but the problem still persists. We have tried diagnosing what is happening, and are a bit lost. The output from running
perf shows something similar to below, and I have no idea what 0x347ba9 represents. + 41.40% 48154 postmaster 0x347ba9 f 0x347ba9 ◆
+ 9.55% 10956 postmaster 0x2dc820 f set_config_option ▒
+ 8.64% 9946 postmaster 0x5a3d4 f writeListPage
+ 5.75% 6609 postmaster 0x5a2b0 f ginHeapTupleFastCollect ▒
+ 2.68% 3084 postmaster 0x192483 f build_implied_join_equality ▒
+ 2.61% 2990 postmaster 0x187a55 f build_paths_for_OR ▒
+ 1.86% 2131 postmaster 0x794aa f get_collation_oid ▒
+ 1.56% 1822 postmaster 0x5a67e f ginHeapTupleFastInsert ▒
+ 1.53% 1766 postmaster 0x1929bc f distribute_qual_to_rels ▒
+ 1.33% 1558 postmaster 0x249671 f cmp_numericsNone
Solution
After further diagnostics and some Googling, we came across this article that described many of the same symtoms we were experiencing. The root cause of their problem (and from what we can tell, ours too) was related to the
After disabling
The problem appears to have been resolved. We've been running under an increased workload for the last two weeks and the issue has not resurfaced. The system's contexts and interrupts are consistently 1/10th of what they had been and average system time has decreased as well.
Not sure if it is the solution for everyone, but I post it here as a possible cause in case it can help anyone else resolve a similar issue.
Transparent Huge Pages implementation. After disabling
Transparent Huge Pages with this command:echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabledThe problem appears to have been resolved. We've been running under an increased workload for the last two weeks and the issue has not resurfaced. The system's contexts and interrupts are consistently 1/10th of what they had been and average system time has decreased as well.
Not sure if it is the solution for everyone, but I post it here as a possible cause in case it can help anyone else resolve a similar issue.
Code Snippets
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabledContext
StackExchange Database Administrators Q#32890, answer score: 12
Revisions (0)
No revisions yet.