patternsqlMinor
Locking in MySQL stored procedure for deduplication
Viewed 0 times
storedproceduremysqllockingfordeduplication
Problem
Multiple upstream servers are sending notifications to a load-balanced django app. If all the upstream servers are working correctly, the app will always receive duplicates of these notifications (since all the upstream notifiers should send the same notifications; it's just for redundancy). I want to be able to filter out these duplicates. However, since the python app is load-balanced, the only place we can check for these duplicates is in the database (unless we did distributed locking, etc, which sounds like more of a pain).
To do this, I'm hashing the messages, then invoking a stored procedure in the database that checks if a message received in the last 10 seconds had the same hash (I know about hash collisions; it's a risk). I want to be 99% sure the store procedure is safe against race conditions.
Here's the SQL code that seems to work:
```
DROP TABLE IF EXISTS openduty_dedup;
CREATE TABLE openduty_dedup (
request_hash CHAR(40) NOT NULL UNIQUE PRIMARY KEY,
triggered TIMESTAMP NOT NULL);
-- main procedure for checking if a notification is a duplicate
-- given the SHA-1 hash of a notification, it will return 1 if the
-- same notification was received in the last 10 seconds (and was not
-- treated as a duplicate then), otherwise it will return 0
DROP PROCEDURE IF EXISTS openduty_check_duplicate;
CREATE PROCEDURE openduty_check_duplicate(IN new_hash CHAR(40))
BEGIN
DECLARE last_trigger TIMESTAMP;
DECLARE now TIMESTAMP;
DECLARE is_duplicate INT;
SELECT CURRENT_TIMESTAMP INTO now;
DO GET_LOCK('openduty_check_duplicate', 10);
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
SELECT triggered FROM openduty_dedup WHERE request_hash = new_hash INTO last_trigger;
IF last_trigger IS NULL THEN
INSERT INTO openduty_dedup (request_hash, triggered) VALUES (new_hash, now);
SELECT 0 INTO is_duplicate;
ELSEIF DATE_SUB(now, INTERVAL 10 SECOND) > last_trigger THEN
UPDAT
To do this, I'm hashing the messages, then invoking a stored procedure in the database that checks if a message received in the last 10 seconds had the same hash (I know about hash collisions; it's a risk). I want to be 99% sure the store procedure is safe against race conditions.
Here's the SQL code that seems to work:
```
DROP TABLE IF EXISTS openduty_dedup;
CREATE TABLE openduty_dedup (
request_hash CHAR(40) NOT NULL UNIQUE PRIMARY KEY,
triggered TIMESTAMP NOT NULL);
-- main procedure for checking if a notification is a duplicate
-- given the SHA-1 hash of a notification, it will return 1 if the
-- same notification was received in the last 10 seconds (and was not
-- treated as a duplicate then), otherwise it will return 0
DROP PROCEDURE IF EXISTS openduty_check_duplicate;
CREATE PROCEDURE openduty_check_duplicate(IN new_hash CHAR(40))
BEGIN
DECLARE last_trigger TIMESTAMP;
DECLARE now TIMESTAMP;
DECLARE is_duplicate INT;
SELECT CURRENT_TIMESTAMP INTO now;
DO GET_LOCK('openduty_check_duplicate', 10);
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
SELECT triggered FROM openduty_dedup WHERE request_hash = new_hash INTO last_trigger;
IF last_trigger IS NULL THEN
INSERT INTO openduty_dedup (request_hash, triggered) VALUES (new_hash, now);
SELECT 0 INTO is_duplicate;
ELSEIF DATE_SUB(now, INTERVAL 10 SECOND) > last_trigger THEN
UPDAT
Solution
Will this work as intended?
Not really.
You've indented the code as if you expect that the statements betweeb
That is not the case.
You shouldn't use
A better solution for MySQL 5.5 and later looks like this:
That will throw an exception and terminate the procedure after the timeout, if the lock has not been obtained. (This is a useful tactic with the event scheduler, when you need to ensure that concurrent invocations of events don't pile up.)
Note that the logical operator
The same is true of
In particular, I'm worried that the commit won't happen until after the lock is released.
No, the lock is released after the commit, here. The get and release operations are only functions. They execute in sequence like everything else.
Also, it's usually best to leave the caller in charge of the transaction, because if this procedure is called while already in a transaction,
Not really.
You've indented the code as if you expect that the statements betweeb
DO GET_LOCK... and DO RELEASE_LOCK... would only execute if the lock is obtained. That is not the case.
You shouldn't use
GET_LOCK() in a void context (with DO) unless you don't actually care whether you actually got the lock. What you have written will wait up to 10 seconds for the lock, and if it isn't available, continues anyway with the next statement. That does not sound like what you want.A better solution for MySQL 5.5 and later looks like this:
IF GET_LOCK('my_lock',10) IS NOT TRUE THEN
SIGNAL SQLSTATE '45000'
SET MESSAGE_TEXT = 'timed out waiting for named semaphore lock';
END IF;That will throw an exception and terminate the procedure after the timeout, if the lock has not been obtained. (This is a useful tactic with the event scheduler, when you need to ensure that concurrent invocations of events don't pile up.)
Note that the logical operator
IS NOT TRUE, used above, matches both 0 (FALSE) and NULL, either of which can indicate that you didn't get the lock.The same is true of
RELEASE_LOCK(), which returns TRUE only if you actually held the lock you tried to release. If you try to release a lock you don't own, that does not throw an error. You have to check return values.In particular, I'm worried that the commit won't happen until after the lock is released.
No, the lock is released after the commit, here. The get and release operations are only functions. They execute in sequence like everything else.
Also, it's usually best to leave the caller in charge of the transaction, because if this procedure is called while already in a transaction,
START TRANSACTION silently commits the previous transaction and starts a new one, leaving the caller unable to roll back any work done before the procedure was called. That may not be applicable, here, but it's good practice to handle it consistently across your code base.Code Snippets
IF GET_LOCK('my_lock',10) IS NOT TRUE THEN
SIGNAL SQLSTATE '45000'
SET MESSAGE_TEXT = 'timed out waiting for named semaphore lock';
END IF;Context
StackExchange Code Review Q#156126, answer score: 2
Revisions (0)
No revisions yet.