HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Postgres 4x Slower Than It Was

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
wasthanslowerpostgres

Problem

Our Postgres performance has gone down to 1/4 of what it was, and we can't figure out why.

We have two machines with identical hardware (let's call them A and B):

Intel(R) Xeon(R) CPU E5-4640 0 @ 2.40GHz (64 cores)
384 GB RAM
15k SAS, 16 disk RAID 10 array


Each machine has essentially identical Postgres clusters with about 100 GB databases, with the following settings:

version:   PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11), 64-bit
bytea_output:   escape
checkpoint_completion_target:   0.7
checkpoint_segments:   256
checkpoint_timeout:   30min
client_encoding:   UTF8
cpu_index_tuple_cost:   0.001
cpu_operator_cost:   0.0005
cpu_tuple_cost:   0.003
DateStyle:   ISO, MDY
default_text_search_config:   pg_catalog.english
dynamic_shared_memory_type:   posix
effective_cache_size:   128GB
from_collapse_limit:   4
hot_standby:   on
join_collapse_limit:   4
lc_messages:   en_US.UTF-8
lc_monetary:   en_US.UTF-8
lc_numeric:   en_US.UTF-8
lc_time:   en_US.UTF-8
listen_addresses:   *
log_destination:   stderr
log_directory:   pg_log
log_filename:   postgresql-%Y-%m-%d_%H%M%S.log
log_line_prefix:   
log_rotation_age:   1d
log_rotation_size:   0
log_timezone:   US/Eastern
log_truncate_on_rotation:   on
logging_collector:   on
maintenance_work_mem:   1GB
max_connections:   256
max_replication_slots:   3
max_stack_depth:   2MB
max_standby_streaming_delay:   350min
max_wal_senders:   5
shared_buffers:   24GB
temp_buffers:   8MB
TimeZone:   US/Eastern
wal_buffers:   4MB
wal_keep_segments:   5000
wal_level:   hot_standby
work_mem:   96MB


Linux settings:

```
CentOS 6.6
/sys/kernel/mm/redhat_transparent_hugepage/enabled: Always
/sys/kernel/mm/redhat_transparent_hugepage/enabled: Always
/sys/kernel/mm/redhat_transparent_hugepage/defrag: Always
/proc/sys/vm/dirty_background_ratio: 10
/sys/block/sda/queue/scheduler: cfq
/sys/block/sda/queue/read_ahead_kb: 128

blockdev --report:
RO RA SSZ BSZ

Solution

Based on your configurations and output, the only suggestions I can make for you to try are the following.

-
Disable Transparent Huge Pages. Evidence from the PostgreSQL mailing lists, and Red Hat themselves recommend disabling THP for database workloads here

-
Set vm.zone_reclaim_mode=0. Discussion here

-
Set your elevator to deadline from cfq. Red Hat recommends deadline for enterprise storage, which it sounds like you have. Discussion here

-
Change from setting vm.dirty_background_ratio to vm.dirty_background_bytes and vm.dirty_bytes.

The defaults are 10% which given your 384GB of RAM is 38.4 GB of RAM that needs to be dirty before the kernel starts writing it out in the background. I would set the values to 64MB and 50% of the controller RAM, respectively, though that's based on my own anecdotal experience.

Hopefully some of these suggestions work out for you.

Context

StackExchange Database Administrators Q#108454, answer score: 2

Revisions (0)

No revisions yet.