
pt-table-checksum help required

Submitted by: @import:stackexchange-dba

Problem

I am trying to figure out pt-table-checksum, as I am using it for the first time.
The documentation looks complex and is not easy to understand.

I have a complex replication topology: two geographically separated masters in an active master-master setup, each syncing with the other, plus a number of slaves under each master. Some of the slaves under both masters use table-level replication filters.

The database is around 200GB in size, and the slaves balance reads. I need to make sure pt-table-checksum will not break replication and will not put load on the servers that hurts app usage.

My requirements are:

- Connect to master1 and check master2 and all slaves under both masters, and report any data differences between masters and slaves.
- Is it possible to check only specific tables on specific slaves? E.g. check from master1 and all slaves under master1, but do not touch master2 and its slaves?
- How do I take care of replication filters? E.g. master1 has slaves slave1 and slave2, where slave2 replicates only some of the database tables while slave1 replicates the entire database. I need pt-table-checksum to check for differences on slave1 for the entire database and to check only the replicated tables on slave2.

I am running pt-table-checksum (2.0.1) on test machines against an employees database. I inserted some records into the employees table with sql_log_bin off to prevent them from replicating, so that I can test whether pt-table-checksum reports the differences and then sync the slave.

I created a user on the master and on the slave with the SUPER and REPLICATION SLAVE privileges. The user is testuser and the password is testpass.

master$ ./pt-table-checksum --replicate=test.checksum --create-replicate-table --nocheck-replication-filters --databases=employees localhost
Cannot connect to h=slave-ip

TS              ERRORS  DIFFS  ROWS  CHUNKS  SKIPPED  TIME   TABLE
07-09T13:48:38  0       0      9     1       0        0.082  employee

Solution

You say you are using pt-table-checksum 2.0.1. I would recommend updating to 2.1, as there are many improvements in the tool.

Next, let me address your test. You say the slave was not updated after the first or second command that you ran. The second command appears to connect directly to the slave. pt-table-checksum won't report any differences unless the server you're connecting to has slaves.

Also, the --replicate-check-only option will not do any checksumming. From the docs:


If specified, pt-table-checksum doesn’t checksum any tables. It checks replicas for differences found by previous checksumming, and then exits.

Your first command doesn't seem to be able to connect to the slave host, which is why it doesn't report any differences. Make sure the user/pass that is connecting to the master can also connect to the slave.
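A quick way to test this is to log in to the slave by hand with the same account. The sketch below is only an illustration: slave-ip and testuser are the placeholders from the question, and the check_slave_login helper is a hypothetical name. It prints the command by default and only runs it when DRY_RUN=0.

```shell
# Placeholder values taken from the question; replace with your own.
SLAVE_HOST="slave-ip"
DB_USER="testuser"

# Hypothetical helper: prints the login check, runs it only if DRY_RUN=0.
# Note: the printed form drops the quotes around the SQL statement.
check_slave_login() {
  set -- mysql -h "$SLAVE_HOST" -u "$DB_USER" -p -e "SELECT 1"
  echo "$@"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

check_slave_login
```

If this login fails from the machine where you run pt-table-checksum, the tool will most likely fail to connect to the slave in the same way.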

Now, as for your complex setup, you are right to worry about breaking replication. With some slaves replicating only certain tables, you should heed the warning here:


If the replicas are configured with any filtering options, you should be careful not to checksum any databases or tables that exist on the master and not the replicas.

You can specify which databases you want to checksum with the --databases option, and give a specific list of tables with the --tables option. Alternatively you can use the --ignore-databases and --ignore-tables options to provide a list of databases/tables to not checksum.
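As a rough sketch of what that could look like, the command below limits checksumming to two tables in the employees database. The host master1, the credentials, and the table names employees and salaries are assumptions for illustration, and checksum_selected_tables is a hypothetical helper that prints the command unless DRY_RUN=0.

```shell
# Placeholder DSN: host and credentials are assumptions.
MASTER_DSN="h=master1,u=testuser,p=testpass"

# Hypothetical helper: prints the command, runs it only if DRY_RUN=0.
checksum_selected_tables() {
  set -- pt-table-checksum \
    --replicate=test.checksum \
    --databases=employees \
    --tables=employees,salaries \
    "$MASTER_DSN"
  echo "$@"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

checksum_selected_tables
```

For a filtered slave, the --tables list should contain only tables that the slave actually replicates.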

This will probably mean you will want separate pt-table-checksum commands, one for each group of slaves you are trying to checksum. You will probably have to use the 'dsn' --recursion-method to accomplish this (I've never done it, personally).
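For what it's worth, here is an untested sketch of the dsn approach, assuming a toolkit version that supports --recursion-method dsn. The percona.dsns table name, the hosts master1 and slave2, the credentials, and the table list are all placeholders, and the two helper functions are hypothetical. The percona database is assumed to exist already. The script only prints the SQL and the command unless DRY_RUN=0.

```shell
# Placeholder DSNs: hosts, credentials, and table location are assumptions.
MASTER_DSN="h=master1,u=testuser,p=testpass"
DSN_TABLE="h=master1,D=percona,t=dsns"

# SQL for the DSN table on master1: one row per slave the tool should check.
# 'h=slave2' is a placeholder; add a port or credentials as your setup needs.
dsn_table_sql() {
  cat <<'SQL'
CREATE TABLE IF NOT EXISTS percona.dsns (
  id INT NOT NULL AUTO_INCREMENT,
  parent_id INT DEFAULT NULL,
  dsn VARCHAR(255) NOT NULL,
  PRIMARY KEY (id)
);
INSERT INTO percona.dsns (dsn) VALUES ('h=slave2');
SQL
}

# Hypothetical helper: checksum only the tables that slave2 replicates.
checksum_slave2_tables() {
  set -- pt-table-checksum \
    --replicate=test.checksum \
    --nocheck-replication-filters \
    --recursion-method "dsn=$DSN_TABLE" \
    --databases=employees \
    --tables=employees,salaries \
    "$MASTER_DSN"
  echo "$@"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

dsn_table_sql
checksum_slave2_tables
```

Keep in mind that the dsn list only controls which slaves the tool connects to. The checksum statements themselves still go through normal replication, so the warning about filtered tables applies either way.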

As for load, pt-table-checksum comes with options to throttle itself, namely --max-load and --max-lag. It also adapts its chunk size as it runs. From the docs:


The tool keeps track of how quickly the server is able to execute the queries, and adjusts the chunks as it learns more about the server’s performance. It uses an exponentially decaying weighted average to keep the chunk size stable, yet remain responsive if the server’s performance changes during checksumming for any reason. This means that the tool will quickly throttle itself if your server becomes heavily loaded during a traffic spike or a background task, for example.
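A hedged example of setting the two throttling options explicitly is shown below. The thresholds (20 running threads, 2 seconds of lag) are arbitrary illustrations rather than recommendations, the DSN is a placeholder, and checksum_throttled is a hypothetical helper that prints the command unless DRY_RUN=0.

```shell
# Placeholder DSN: host and credentials are assumptions.
MASTER_DSN="h=master1,u=testuser,p=testpass"

# Hypothetical helper: pause when the master has more than 20 running
# threads or when a checked slave lags by more than 2 seconds.
checksum_throttled() {
  set -- pt-table-checksum \
    --replicate=test.checksum \
    --databases=employees \
    --max-load Threads_running=20 \
    --max-lag 2 \
    "$MASTER_DSN"
  echo "$@"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

checksum_throttled
```

Tighter values make the tool pause more often, so the run takes longer but is gentler on the servers.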

Context

StackExchange Database Administrators Q#20564, answer score: 2
