patternphpMinor
Comparing data in 2 tables on different servers with CHECKSUM
Viewed 0 times
tablescomparingserverswithchecksumdifferentdata
Problem
So I've had a problem where I need to compare data in 2 different tables on two different servers. Now, I know MySQL supports
So I created this query:
So basically, it's concatenating each row together (all the columns of each row more specifically) and then MD5ing them. Then it walks 4 hexbits at a time through that MD5 and sums them across all rows (4 hexbits to allow me to do huge tables without needing to worry about overflowing). Then, I just compare the result of this query on both tables to see if everything is the same.
By using this binary search, I am able to rather quickly narrow down where the changes are so that I can port them.
It's actually reasonably efficient, so I'm not too concerned about that. What I am concerned about is if this is even necessary. It's screaming to me "You're doing it wrong", but I can't figure out any cleaner method around it...
What are your thoughts?
CHECKSUM TABLES, but from my testing and understanding, it's not reliable across server instances and versions. So I created this query:
$part = '@CRC := MD5(CONCAT_WS(\'#\', COALESCE(`'.
implode('`, "#NULL#"), COALESCE(`', $this->_columns).
'`, "#NULL#")))';
$sql1 = "SELECT COUNT(*) AS cnt,
SUM(CONV(SUBSTRING({$part}, 1, 4), 16, 10)) as a1,
SUM(CONV(SUBSTRING(@CRC, 5, 4), 16, 10)) as a2,
SUM(CONV(SUBSTRING(@CRC, 9, 4), 16, 10)) as a3,
SUM(CONV(SUBSTRING(@CRC, 13, 4), 16, 10)) as a4,
SUM(CONV(SUBSTRING(@CRC, 17, 4), 16, 10)) as a5,
SUM(CONV(SUBSTRING(@CRC, 21, 4), 16, 10)) as a6,
SUM(CONV(SUBSTRING(@CRC, 25, 4), 16, 10)) as a7,
SUM(CONV(SUBSTRING(@CRC, 29, 4), 16, 10)) as a8
FROM `dbname`.`tablename`
WHERE `id` >= $min AND `id` <= $max ";So basically, it's concatenating each row together (all the columns of each row more specifically) and then MD5ing them. Then it walks 4 hexbits at a time through that MD5 and sums them across all rows (4 hexbits to allow me to do huge tables without needing to worry about overflowing). Then, I just compare the result of this query on both tables to see if everything is the same.
By using this binary search, I am able to rather quickly narrow down where the changes are so that I can port them.
It's actually reasonably efficient, so I'm not too concerned about that. What I am concerned about is if this is even necessary. It's screaming to me "You're doing it wrong", but I can't figure out any cleaner method around it...
What are your thoughts?
Solution
We use
It works really great in Master-Slave context where it also allows to sync differences in both directions depending on your choice.
Saidly from what i've seen most people it for replication and i can't provide any copy/pasteable output but if you don't know it it's definitly worth looking into. If you know it i'd like to hear why it doesn't work for you.
To get an overview over many tables you can use something like
mk-table-checksum. It works really great in Master-Slave context where it also allows to sync differences in both directions depending on your choice.
Saidly from what i've seen most people it for replication and i can't provide any copy/pasteable output but if you don't know it it's definitly worth looking into. If you know it i'd like to hear why it doesn't work for you.
To get an overview over many tables you can use something like
mk-table-checksum host1 host2 | mk-checksum-filterContext
StackExchange Code Review Q#16, answer score: 5
Revisions (0)
No revisions yet.