patternpythonMinor
Thread Safe File Operation
Viewed 0 times
filethreadsafeoperation
Problem
I am trying to make file read and update operations thread safe and prevent race conditions in Python. Am I missing out something, or would this work in production?
import os
import time
from contextlib import contextmanager
import errno
current_milli_time = lambda: int(round(time.time() * 1000))
lock_retry_wait = 0.05 # in seconds
@contextmanager
def exclusive_file(filename, mod, lock_wait_timeout=3000):
lockfile = filename + ".lock"
wait_start = current_milli_time()
while True:
try:
# acquire lock
fd = os.open(lockfile, os.O_CREAT|os.O_EXCL)
break
except OSError as exception:
if exception.errno != errno.EEXIST:
raise
else:
t = current_milli_time()
if (t - wait_start) > lock_wait_timeout:
raise
time.sleep(lock_retry_wait)
with open(filename, mod) as f:
yield f
os.close(fd)
os.unlink(lockfile)
with exclusive_file("test.txt","a+") as f:
previous_value = f.read()
f.truncate()
f.write(previous_value + str(time.time()))
time.sleep(20)Solution
- Introduction
If you really mean what you say in the post about making access "thread safe", then creating lockfiles is overkill: it would be better to use
threading.Lock objects. I am going to assume, in the rest of this review, that you mean "multiprocess safe" rather than "thread safe"- Review
-
There's no documentation. What does
exclusive_file do? What guarantees does it provide? What operating systems does it run on?-
If code raises an exception while the lock is held, then the lock is not released.
-
The choice of
lock_retry_wait = 0.05 could end up rate-limiting operations involving a file so that they can happen at most 20 times a second. It would be better to make this into a keyword argument to the exclusive_file function, so that a user with different requirements can specify a different value.-
It seems perverse that
lock_retry_wait is in seconds but lock_wait_timeout is in milliseconds. This seems likely to lead to confusion. It is better for all time values to have the same units.-
The
current_milli_time function is written with lambda rather than def. There does not seem to be a good reason for this. It's better to use def because you can provide a docstring.-
Using Python's built-in
datetime.datetime objects, you could avoid the need for the current_milli_time function.-
When programming a timeout, it is more efficient to compute a deadline once, in advance, than to repeatedly compute an elapsed time.
-
When you have a timeout argument to a function, it's often convenient to have a special value (for example
None) meaning "keep retrying indefinitely".-
Instead of catching all
OSError and then re-raising the ones that aren't errno.EEXIST, catch FileExistsError instead.-
The built-in function
open takes more arguments than mode — there's also buffering, encoding, errors, newline, closefd, opener, and maybe others in forthcoming releases. It would be better for exclusive_file to take *args and **kwargs and forward these to open.-
Because
exclusive_file is a wrapper around the built-in open, it would be better for it to be named exclusive_open.-
The code in the post only works if the process has write access to the directory containing the file. But sometimes you want to exclusively open a file in a directory to which you do not have write access.
- Revised code
from contextlib import contextmanager
from datetime import datetime, timedelta
import os
from time import sleep
@contextmanager
def exclusive_open(filename, *args, timeout=3, retry_time=0.05, **kwargs):
"""Open a file with exclusive access across multiple processes.
Requires write access to the directory containing the file.
Arguments are the same as the built-in open, except for two
additional keyword arguments:
timeout -- Seconds to wait before giving up (or None to retry indefinitely).
retry_time -- Seconds to wait before retrying the lock.
Returns a context manager that closes the file and releases the lock.
"""
lockfile = filename + ".lock"
if timeout is not None:
deadline = datetime.now() + timedelta(seconds=timeout)
while True:
try:
fd = os.open(lockfile, os.O_CREAT|os.O_EXCL)
break
except FileExistsError:
if timeout is not None and datetime.now() >= deadline:
raise
sleep(retry_time)
try:
with open(filename, *args, **kwargs) as f:
yield f
finally:
try:
os.close(fd)
finally:
os.unlink(lockfile)Notes:
-
I chose to document the issue in 2.12 rather than fixing it.
-
Possibly the second
try: finally: is overkill (close rarely fails), but I think it's wise to make every effort to release a lock. Deadlock is rarely fun to debug.Code Snippets
from contextlib import contextmanager
from datetime import datetime, timedelta
import os
from time import sleep
@contextmanager
def exclusive_open(filename, *args, timeout=3, retry_time=0.05, **kwargs):
"""Open a file with exclusive access across multiple processes.
Requires write access to the directory containing the file.
Arguments are the same as the built-in open, except for two
additional keyword arguments:
timeout -- Seconds to wait before giving up (or None to retry indefinitely).
retry_time -- Seconds to wait before retrying the lock.
Returns a context manager that closes the file and releases the lock.
"""
lockfile = filename + ".lock"
if timeout is not None:
deadline = datetime.now() + timedelta(seconds=timeout)
while True:
try:
fd = os.open(lockfile, os.O_CREAT|os.O_EXCL)
break
except FileExistsError:
if timeout is not None and datetime.now() >= deadline:
raise
sleep(retry_time)
try:
with open(filename, *args, **kwargs) as f:
yield f
finally:
try:
os.close(fd)
finally:
os.unlink(lockfile)Context
StackExchange Code Review Q#150139, answer score: 3
Revisions (0)
No revisions yet.