Recent Entries 9
- principle minor 112d agoWhat's the advantage of sharding in AWS Aurora?AWS' RDS docs on multi-master Aurora state the following: In an Aurora multi-master cluster, each shard is managed by a specific DB instance, and a DB instance can be responsible for multiple shards. Later in the same document, we read: You can avoid resharding operations because all DB instances in a cluster can access all databases and tables through the shared storage volume. So, a multi-master Aurora instance can be responsible for multiple shards. This is made possible in part because all DB instances access the same shared storage volume. If multi-master Aurora instances can manage multiple shards, what's the advantage of using sharding at all? Why not just configure all instances to manage all shards? (Essentially obviating the need for sharding) A theory My suspicion is that not using sharding would lead to more deadlocks within Aurora's internals if different masters write to the same page at the same time. This, in turn, would lead to increased latency while Aurora retries the contentious write. Or, perhaps Aurora would generate an error and the application itself would have to retry the query. (I'm clearly not familiar enough with Aurora to know what happens, hence this question )
- pattern minor 112d agoWhy SQL inserts are slower in AWS Aurora Serverless than in AWS RDSWe had an AWS MySQL RDS t3.xlarge instance with 2000 IOPS. Also, I have a cron job written in java which inserts data in a table: it can insert up to 12M rows daily. Since AWS RDS cost is quite high we decided to migrate to AWS Aurora Serverless v1. But what was surprising for us: the job executed on AWS RDS in ~3 hours but on Aurora Serverless it took ~16 hours. According to my searches, it is suggested to change parameter values: `innodb_flush_log_at_trx_commit (1 -> 0)` and `sync_binlog (1 -> 0)`. For Aurora Serverless you can modify `innodb_flush_log_at_trx_commit` but `sync_binlog` - no. Probably that's why changing `innodb_flush_log_at_trx_commit` didn't give any results. My questions are: - Why on Aurora Serverless the same job executes more than 5x times slower than on AWS RDS (Although, I understand that t3.xlarge instance with 2000 IOPS is quite powerful)? - Is there a solution to accelerate Aurora Serverless inserts? EDIT: Also, there is a similar question
- snippet moderate 112d agoAWS Aurora PostgreSQL Serverless: How do you pre-warm the shared buffer after scaling?I'm using `AWS Aurora PostgreSQL Serverless` with autoscaling. It appears as though scaling clears the shared buffer, so right when we want to crank out the performance, we are forced to face-plant with an I/O bottleneck. After we get warmed up, we see a great performance improvement. However, if we run back-to-back once scaled, the second run goes much faster. While I haven't seen anything specific on whether the shared buffer gets cleared on scaling, I'm almost positive that it is. Aurora Serverless is currently using `PostgreSQL 10.14` and it supports the `pg_prewarm` extension. It looks like the newest documentation suggests that prewarm supports auto pre-warm after a server restart, but this is serverless and a version that doesn't appear to mention auto pre-warming in the documentation. I found this post that works great for PostgreSQL when restarting the server or recovering from a crash. - If we could at least retain the contents of the shared buffer of the lower ACU node after scaling, that'd be fine. - If we could pre-warm exactly what needs to be in memory ahead of time, that would be awesome! - There are certain tables that are quite large and we would want to selectively pre-warm the pieces that we want. `pg_prewarm` supports `first_block` and `last_block` block numbers for a table/index, but how would one know what values to put in there? We know ahead of time when our peak is and tell RDS to scale right before, so we have a window of time where we could prepare. What are my options?
- pattern moderate 112d agoUPDATE on big table in PostgreSQL randomly takes too longI'm trying to figure out why an `UPDATE` statement takes too long (>30 sec). This is random, i.e. in most cases it finishes in under 100msec. However, sometimes (randomly) it takes > 30sec to complete. Some specifics: - I'm using PostgreSQL 12 (actually, AWS Aurora) - I'm trying this in a database with no traffic, so it's not being affected by any other queries running in the same time. I'm also monitoring the logs to see if anything else is running, and I don't see anything. - I've tried `REINDEX` ing, `VACUUM`ing (and `VACUUM ANALYZE`), with no improvement - I've checked for locks, (`log_lock_waits`) and I don't see anything. - The query is performed in a loop (from a Python app). It performs ~5000 queries, and at some point some of them, they don't seem to follow a pattern, take a huge time to complete. - I've tried running them in batches, but again, some batches randomly take too long. - The size of the table is kinda big, ~10000000 rows and ~25 indexes. The query: ``` UPDATE "my_table" SET "match_request_id" = 'c607789f-4816-4a38-844b-173fa7bf64ed'::uuid WHERE "my_table"."id" = 129624354; ``` The output of `EXPLAIN (ANALYZE VERBOSE BUFFERS COSTS)` ``` Update on public.my_table (cost=0.56..8.58 rows=1 width=832) (actual time=34106.965..34106.966 rows=0 loops=1) Buffers: shared hit=431280 read=27724 I/O Timings: read=32469.021 -> Index Scan using my_table_pkey on public.my_table (cost=0.56..8.58 rows=1 width=832) (actual time=0.100..0.105 rows=1 loops=1) Output: (...) Index Cond: (my_table.id = 130561719) Buffers: shared hit=7 Planning Time: 23.872 ms Execution Time: 34107.047 ms ``` Note that this is `EXPLAIN ANALYZE`. I'm baffled since while the cost is really low, the actual running time is huge! I'm trying to understand if this is expected and if I can improve the situation somehow. Any ideas would be welcome, I'm kinda running out! EDIT: Adding some more info requested by comments: A query plan for a
- snippet minor 112d agoHow to prevent read replica restarts during high replication lagWe're running an Aurora PostgreSQL cluster, with a single read replica along with the master node. Periodically, there is very heavy write load which causes high replication lag. This can cause the read replica to restart which is undesirable for us in a high availability environment. When this happens, clients that are connected to the cluster via the read-only endpoint get this JDBC error: `org.postgresql.util.PSQLException: FATAL: the database system is starting up`. Additionally, the AWS console shows these peppered throughout the logs: Read replica has fallen behind the master too much. Restarting postgres. followed by DB instance restarted We can tolerate the read replica being behind by several minutes, but we can't tolerate the read replica restarting to catch up. Is there a way to prevent the read replica from restarting during these periods? Alternatively, are there any recommended tweaks for reducing replication lag during periods of heavy write load?
- pattern minor 112d agoAurora PostgreSQL database using a slower query plan than a normal PostgreSQL for an identical query?Following the migration of an application and its database from a classical PostgreSQL database to an Amazon Aurora RDS PostgreSQL database (both using 9.6 version), we have found that a specific query is running much slower -- around 10 times slower -- on Aurora than on PostgreSQL. Both databases have the same configuration, be it for the hardware or the pg_conf. The query itself is fairly simple. It is generated from our backend written in Java and using jOOQ for writing the queries: ``` with "all_acp_ids"("acp_id") as ( select acp_id from temp_table_de3398bacb6c4e8ca8b37be227eac089 ) select distinct "public"."f1_folio_milestones"."acp_id", coalesce("public"."sa_milestone_overrides"."team", "public"."f1_folio_milestones"."team_responsible") from "public"."f1_folio_milestones" left outer join "public"."sa_milestone_overrides" on ( "public"."f1_folio_milestones"."milestone" = "public"."sa_milestone_overrides"."milestone" and "public"."f1_folio_milestones"."view" = "public"."sa_milestone_overrides"."view" and "public"."f1_folio_milestones"."acp_id" = "public"."sa_milestone_overrides"."acp_id" ) where "public"."f1_folio_milestones"."acp_id" in ( select "all_acp_ids"."acp_id" from "all_acp_ids" ) ``` With `temp_table_de3398bacb6c4e8ca8b37be227eac089` being a single-column table, `f1_folio_milestones` (17 million entries) and `sa_milestone_overrides` (Around 1 million entries) being similarly designed tables having indexes on all the columns used for the `LEFT OUTER JOIN`. `temp_table_de3398bacb6c4e8ca8b37be227eac089` can contain up to 5000 entries, all of them being distinct. When we run it on the normal PostgreSQL database, it generates the following query plan: ``` Unique (cost=4802622.20..4868822.51 rows=8826708 width=43) (actual time=483.928..483.930 rows=1 loops=1) CTE all_acp_ids -> Seq Scan on temp_table_de3398bacb6c4e8ca8b37be227eac089 (cost=0.00..23.60 rows=1360 width=32) (actual time=0.004..
- gotcha minor 112d agoDifference between Aurora Global database and Aurora regional database with multiple read replicas?I am planning to migrate my 300GB RDS PostgreSql database to Aurora MySql using SCT and DMS. RDS Postgresql is in seven regions in current setup. Datapipelines are used to ingest data in these instances and keep them in sync. I was thinking that once I create global database in one region, I will be able to add secondary instances in six other regions. But, I read that global instances support just one additional secondary region. The only relevant benefits for global database from the docs are: - Having an additional secondary instance will have faster replication compared to having a read replica. - Faster disaster recovery as the secondary instance can be promoted to primary under a minute. Now I am wondering what is the difference between: - Having a Aurora global database in one region with writer and reader, adding a secondary region, adding five read replicas(from primary or secondary) - Having a Aurora regional database (with a writer and reader) and adding six read replicas.
- pattern minor 112d agoDump roles on AWS without pg_dumpall -g access (postgresql 9.6.8)?We have a system on AWS RDS Aurora, PostgreSQL v 9.6.8 option. We are trying to move its roles to a new encrypted 10.5 instance, but we don't have superuser access that's required to do pg_dumpall -g on the old instance. (I am aware of the update to Postgres 10.0 that lets you do this without superuser access, but this is 9.6.8). Any ideas for a workaround? Thanks in advance for any suggestions whatsoever!
- debug minor 112d agoERROR: improper qualified name (too many dotted names) when trying ALTER FUNCTIONWorking with Amazon Aurora Postgres Version 2.0, which is basically PostgreSQL 10.4 with a modified storage layer. Trying to restore a dump that contains multiple lines like: ``` ALTER FUNCTION myschema.f_myfunc(anyarray, anyelement) OWNER TO myrole; ``` I get an error message for every `ALTER FUNCTION` like that: ``` ERROR: improper qualified name (too many dotted names) ``` Trying the same in psql produces the same error. Even with the simplest form: ``` ALTER FUNCTION foo() OWNER TO myrole; ``` There are no dotted names. I found similar complaints here: https://forums.aws.amazon.com/thread.jspa?messageID=872096&tstart=0 Must be a bug in Aurora - or am I missing something?