Recent Entries 10
- pattern minor 112d agoMySQL replication setup based on an LVM snapshot – am I forgetting anything?I'm tasked with starting a brand new replica of a client's MySQL 5.7.35 database server. While I've observed others doing this in the past in various circumstances, I haven't been in charge of this operation before, so I'd just like to confirm that my plan is correct and that I'm not missing anything crucial. The data lives in an XFS-formatted LVM logical volume, inside a volume group that has enough free space to enable taking an LVM snapshot. The database is a legacy mixture of MyISAM and InnoDB tables, over 200 GB in size at the moment. The replica server is set up with an identical MySQL version, and all of the prerequisites for row-based binlog replication (replica user, server ID, `log_bin` etc.) are in order. My process is this: - On the master, run `FLUSH TABLES WITH READ LOCK`. Leave the client session open. - According to Percona, it's important to try to prevent long-running `SELECT` queries from being in the middle of execution when this is run, so we'll make an effort to ensure this. - In another client session (don't know why, but the docs say so), run `SHOW MASTER STATUS` to get the binlog file name and position. - Create the LVM snapshot from the MySQL data volume. (Does not contain binlogs or relay logs.) - In the original MySQL client session, run `UNLOCK TABLES` to restore normal operation. - Mount the snapshot (`-t xfs -o nouuid`) and `rsync` the data directory from it onto the (shut down) replica server – excluding `auto.cnf` and the autogenerated `.pem` files (we're not using SSL to connect to MySQL, so they shouldn't particularly matter). - Unmount and `lvremove` the snapshot. - Start up MySQL on the replica and do the normal binlog replication initialization steps as outlined in the MySQL documentation, providing the `MASTER_LOG_FILE` and `MASTER_LOG_POS` options to `CHANGE MASTER` with the values from step 2, and then just wait for the replica to catch up. I've tested this process to work on a trafficless staging server, but my m
- snippet minor 112d agoHow to avoid replication lag in case all writes on master and reads on replica?I have stuck with the replication lag problem. I am trying to refactor my DB infrastructure. The first step was read-write split. I used ProxySQL то implement it. Now I have masterDb where all INSERT and UPDATES executing and 2 replicas where I route all SELECT. But the main problem I faced with, replication lag. Because usually when you change something in DB you immediately read data and expect to have new data in the query result, but with replication lag, I receive outdated data. After googling I found this article https://www.percona.com/blog/2018/11/29/mysql-high-availability-stale-reads-and-how-to-fix-them/ and according to this info the best option in 2018 was "ProxySQL 2.0 GTID consistent reads" Any updates from 2018? Maybe you guys know a better solution? Please share!
- pattern minor 112d agoAdd an arbiter to mongodb ReplicaSet failedI am new to mongodb and this is my second post in order to build a test cluster and gain knowledge at the same time. I have a 3 nodes cluster holding server MONGO2, MONGO3 and MONGOARB1. MONGO2 and MONGO3 are in replicaset "rs0" and works great : ``` rs0:PRIMARY> rs.conf() { "_id" : "rs0", "version" : 3, "term" : 9, "members" : [ { "_id" : 0, "host" : "MONGO2:27017", "arbiterOnly" : false, "buildIndexes" : true, "hidden" : false, "priority" : 1, "tags" : { }, "secondaryDelaySecs" : NumberLong(0), "votes" : 1 }, { "_id" : 1, "host" : "MONGO3:27017", "arbiterOnly" : false, "buildIndexes" : true, "hidden" : false, "priority" : 1, "tags" : { }, "secondaryDelaySecs" : NumberLong(0), "votes" : 1 } ], ``` I'd like to add an arbiter to my PSA cluster as described here : https://docs.mongodb.com/manual/tutorial/add-replica-set-arbiter/ So following the documentation: - Create a specific directory => OK (C:\MONGO\arb) - Start the arbiter, specifying the data directory and the name of the replica set to join => OK : `./mongod.exe --port 27017 --dbpath C:\MONGO\arb --replSet rs0 --bind_ip_all` "msg":"Failed to refresh key cache","attr":{"error":"NotYetInitialized: Cannot use non-local read concern until replica set is finished initializing.","nextWakeupMillis":24200}} - Connect to the primary and add the arbiter to the replica set. => KO `rs.addArb("MONGOARB1:27017");` After this last command n
- pattern minor 112d agoPostgres major version upgrade using replication serverI am planning to upgrade my database from Postgres 10 to Posgres 14. To reduce the downtime, I plan to use the replication for the same. Plan (in short) is to: - Dump the data using `pg_dumpall` - Kill the Postgres service running on replication - Setup new Postgres 14 service - Restore the data from the dump created in Step 1 - Setup logical replication with the primary to get the data up to date - Make replication the primary, and applications start connecting to the replication. - Kill the Postgres service running on the old primary, and upgrade it too to 14. - Setup streaming replication between the new primary (old replication) and new replication. I have a few questions regarding this: - Is it advisable to take a dump and restore, or let logical replication handle complete replication (apart from the DDL's) from scratch? In terms of time that it could take, which would be the better option? We have close to 650GB of data. I would think the dump is compressed, but restoring them (on the same machine) could still take time I believe. - Is there something I need to setup to make sure that the replication starts from after the dump point or will it be taken care? - One limitation I see with the logical replication is that it does not replicate sequences. For this I believe, before the switch over - I could increment the sequence by a very large number to make sure that there are no conflicts? - I have also seen an extension `pglogical` that I could use for the setup. Although, I am yet to dig deeper. From your experience, does it give any added advantages \ would it make the setup easier? - Although I would think it depends on different parameters, how long it could take typically to complete the restore\replication for a database of this much size, until the primary and master are in sync?
- pattern minor 112d agoRDS Postgres oldest replication lag increases during inactivity even though Fivetran connector keeps synchronizingI'm in the process of setting-up a datawarehouse using Fivetran as the ETL layer. One of the source databases is a AWS RDS Postgres instance. I've configured the Postgres instance to perform Logical Replication of the WAL using the test_decoding plugin. All seems to work fine during office hours, however outside office hours when no queries are preformed, the oldest replication slot lag size is increasing despite the fact the Fivetran connector performs a sync. You can see this in the picture below. In the red box you the replication slot lag size is increasing (top graph) while the synchronization moments keeps happing every hour (bottom graph). I would expect a graph like shown in the green box, were the replication slot lag size is decreasing around the synchronization moments. I contacted Fivetran with this issue, but they were yet unable to find out the problem, therefore I'm asking the community. I'm using Postgres version 13.3 with the following custom configuration: - `max_slot_wal_keep_size`: `20000` - `rds.logical_replication`: `1` - `wal_sender_timeout`: `0` (required by Fivetran) The rest of the configuration is default. I also checked other questions, there's only one that might come close https://dba.stackexchange.com/a/103806/235086, but I'm unsure if it applies here since it about the lag in seconds instead of size.
- debug minor 112d agoAWS DMS Task getting failed after completion with error as AlwaysOn BACKUP-ed data is not availableI have created DMS task for migrating data from one RDS MS SQL Server instance to another RDS MS SQL Server instance with "Full load + ongoing replication", and data copied to target instance but status showing as failed with below error - Last Error AlwaysOn BACKUP-ed data is not available Task error notification received from subtask 0, thread 0 [reptask/replicationtask.c:2822] [1020465] Error executing source loop; Stream component failed at subtask 0, component st_0_NJZW5VSVPASASA4E4N2SJGTVEZ3UIDLJHX3NDY; Stream component 'st_0_NJZW5VSVASASAZ4E4N2SJGTVEZ3UIDLJHX3NDY' terminated [reptask/replicationtask.c:2829] [1020465] Stop Reason FATAL_ERROR Error Level FATAL Cloud watch logs: 2021-01-04T15:24:02 [SOURCE_CAPTURE ]E: Failed to access LSN '0000009a:00000087:0010' in the backup log sets since BACKUP/LOG-s are not available. [1020465] (sqlserver_endpoint_capture.c:717) 2021-01-04T15:24:02 [TASK_MANAGER ]I: Task - PHTLF2WXLFVWESX4GLZ2CKAW3MY76HLRXHIYYYY is in ERROR state, updating starting status to AR_NOT_APPLICABLE (repository.c:5103) 2021-01-04T15:24:02 [SOURCE_CAPTURE ]E: Error executing source loop [1020465] (streamcomponent.c:1867) 2021-01-04T15:24:02 [TASK_MANAGER ]E: Stream component failed at subtask 0, component st_0_NJZW5VSVPJVW47Z4E4N2SJGTVEZ3UIDLJHX3YYY [1020465] (subtask.c:1409) 2021-01-04T15:24:02 [SOURCE_CAPTURE ]E: Stream component 'st_0_NJZW5VSVPJVW47Z4E4N2SJGTVEZ3UIDLJHX3YYY' terminated [1020465] (subtask.c:1578) 2021-01-04T15:24:02 [TASK_MANAGER ]E: Task error notification received from subtask 0, thread 0 [1020465] (replicationtask.c:2822) 2021-01-04T15:24:02 [TASK_MANAGER ]E: Error executing source loop; Stream component failed at subtask 0, component st_0_NJZW5VSVPJVW47Z4E4N2SJGTVEZ3UIDLJHX3YYY; Stream component 'st_0_NJZW5VSVPJVW47Z4E4N2SJGTVEZ3UIDLJHX3YYY' terminated [1020465] (replicationtask.c:2829) 2021-01-04T15:24:02 [TASK_MANAGER ]E: Task 'PHTLF2WXLFVWESX4GLZ2CKAW3MY76HLRXHIYYYY' encountered a fatal
- pattern minor 112d agoChange host IP of master in mysql replicationI have one master and one slave. Suppose IP of master is changed but the physical server is same. Now I want to update the new IP (of the master) at the slave server. At the slave server `master_info_repository` is set to `TABLE`. I want to resume the replication process from exactly the position where it has paused due to IP change. How to do this? mysql version of slave server is : `14.14 Distrib 5.7.30` and O/S is `ubuntu 18.04`.
- debug minor 112d agoERROR: canceling statement due to conflict with recoveryI got the error below: `ERROR: canceling statement due to conflict with recovery. Detail: User query might have needed to see row versions that must be removed.; nested exception in org.postgresql.util.PSQLException:ERROR:canceling statement due to conflict with recovery. Detail: User query might have needed to see row versions that must be removed. ` This error occurs on the standby, when we use long query. We use PostgreSQL version 12.0. One master and two standby servers. Could you please give us the solution?
- pattern minor 112d agoPre- and Post- Snapshot Replication Scripts Not Executing and No ErrorsI have a Snapshot Replication Publication setup on Server 1. Server 2 subscribes to the publication on Server 1. Data is replicating successfully. The Distributor Agent and Agent job for the Subscriber of Server 2 runs on Server 1 (where the Distributor database lives). I created a stored procedure on Server 2 to drop (pre replication) and create (post replication) schemabound entities on Server 2's subscriber database. (I've done this successfully in the past because Snapshot Replication can stall if there's entities in the subscriber database that are schema bound to the replicated entities.) I also created a script that executes this stored procedure from Server 1 via a linked server to Server 2. (I've verified the linked server is setup correctly and I can query data from and manually execute the stored procedure via the linked server from Server 1 to Server 2.) I've added this script to the Snapshot Publication's "Snapshot" Properties: Initially I received an error in the Snapshot Agent job because my Snapshot Agent didn't have file share access to the path in the scripts, but I added that account and then the error went away and the Snapshot job finishes successfully now. But the entities in the SQL script don't get dropped or created, and I don't see an errors anywhere. I've also tried moving the before and after script file to a folder share local to Server 1 and referencing it with a UNC path (as opposed to the network path in the screenshot) and still no change in behavior. I've updated the TEST.sql script with a really simple test with the following code: ``` INSERT INTO DatabaseA.dbo.Table1 (Field1) -- DatabaseA is on Server1, so it should insert to the local table SELECT 'Test' AS Field1 ``` When I re-run the Snapshot Agent, it still runs successfully, but the table doesn't get inserted into (on Server 1). Why could my pre- and post- scripts not be running but I'm not receiving any errors either? (I also have another Publication for Transactio
- snippet minor 112d agoHow to prevent read replica restarts during high replication lagWe're running an Aurora PostgreSQL cluster, with a single read replica along with the master node. Periodically, there is very heavy write load which causes high replication lag. This can cause the read replica to restart which is undesirable for us in a high availability environment. When this happens, clients that are connected to the cluster via the read-only endpoint get this JDBC error: `org.postgresql.util.PSQLException: FATAL: the database system is starting up`. Additionally, the AWS console shows these peppered throughout the logs: Read replica has fallen behind the master too much. Restarting postgres. followed by DB instance restarted We can tolerate the read replica being behind by several minutes, but we can't tolerate the read replica restarting to catch up. Is there a way to prevent the read replica from restarting during these periods? Alternatively, are there any recommended tweaks for reducing replication lag during periods of heavy write load?