patternpythonMinor
Implementing a thread-safe LRUCache
Viewed 0 times
lrucacheimplementingthreadsafe
Problem
Here is the problem I've been trying to tackle:
Design a thread-safe image caching server that can keep in memory only the ten most recently used images.
I chose to implement an LRU cache to solve this as follows:
```
'''
This module defines an LRUCache.
Constraints:
'''
import random
import threading
class LRUCache(object):
def __init__(self, cacheSize=10):
# this variables maps image names to file locations
self.Cache = {}
# this variables maps image names to timestamp of last request
# filed for them. Is volatile - requires local clock to never change,
# or that there is a separate clock service that guarantees global
# consistency.
self.RequestTimestamps = {}
self.CacheSize = cacheSize
def insert(self, imageName, imageLocation):
'''
Insert a new image into our cache. If inserting would exceed our cache size,
then we shall make room for it by removing the least recently used.
'''
with threading.Lock():
if len(self.Cache) == self.CacheSize:
self.removeLeastRecentlyUsed()
self.Cache[imageName] = imageLocation
self.RequestTimestamps[imageName] = time.time()
def get(self, imageName):
'''
Retrieve an image from our cache, keeping note of the time we last
retrieved it.
'''
with threading.Lock():
if imageName in self.Cache:
self.RequestTimestamps[imageName] = time.time()
return self.Cache[imageName]
else:
raise KeyError("Not found!")
def removeLeastRecentlyUsed(self):
'''
Should only be called in a threadsafe context.
'''
if self.RequestTimestamps:
# scan through and find the least recently use
Design a thread-safe image caching server that can keep in memory only the ten most recently used images.
I chose to implement an LRU cache to solve this as follows:
```
'''
This module defines an LRUCache.
Constraints:
- May only hold upto ten items at a time.
- Must be able to synchronise multiple requests.
- Must be able to update its cache.
'''
import random
import threading
class LRUCache(object):
def __init__(self, cacheSize=10):
# this variables maps image names to file locations
self.Cache = {}
# this variables maps image names to timestamp of last request
# filed for them. Is volatile - requires local clock to never change,
# or that there is a separate clock service that guarantees global
# consistency.
self.RequestTimestamps = {}
self.CacheSize = cacheSize
def insert(self, imageName, imageLocation):
'''
Insert a new image into our cache. If inserting would exceed our cache size,
then we shall make room for it by removing the least recently used.
'''
with threading.Lock():
if len(self.Cache) == self.CacheSize:
self.removeLeastRecentlyUsed()
self.Cache[imageName] = imageLocation
self.RequestTimestamps[imageName] = time.time()
def get(self, imageName):
'''
Retrieve an image from our cache, keeping note of the time we last
retrieved it.
'''
with threading.Lock():
if imageName in self.Cache:
self.RequestTimestamps[imageName] = time.time()
return self.Cache[imageName]
else:
raise KeyError("Not found!")
def removeLeastRecentlyUsed(self):
'''
Should only be called in a threadsafe context.
'''
if self.RequestTimestamps:
# scan through and find the least recently use
Solution
Thread-safeness
As others have pointed out in the comments, your implementation is not thread-safe.
Then, you use it like so:
Additionally, using
PEP8 compliance
Your variables and methods are written with a mixture of
And while we're looking at this method...
Single exit only
While there are many arguments against the single-exit-only style, none of them apply here. There's no good reason to have the
As others have pointed out in the comments, your implementation is not thread-safe.
threading.Lock() returns a new lock each time it is called, so each thread will be locking a different lock. Instead, you should have a single lock as an instance member object:def __init__(self, cacheSize=10):
# this variables maps image names to file locations
self.Cache = {}
# this variables maps image names to timestamp of last request
# filed for them. Is volatile - requires local clock to never change,
# or that there is a separate clock service that guarantees global
# consistency.
self.RequestTimestamps = {}
self.CacheSize = cacheSize
self.lock = threading.Lock()Then, you use it like so:
def insert(self, imageName, imageLocation):
'''
Insert a new image into our cache. If inserting would exceed our cache size,
then we shall make room for it by removing the least recently used.
'''
with self.lock:
if len(self.Cache) == self.CacheSize:
self.removeLeastRecentlyUsed()
self.Cache[imageName] = imageLocation
self.RequestTimestamps[imageName] = time.time()Additionally, using
time.time() for access orders can cause inconsistent results: it's not guaranteed to have good precision, and is dependent on the system clock steadily increasing. If the system clock is manually set back, you lose your consistent ordering. Instead, a safer way would be to use an OrderedDict, where you remove and re-insert items as they are accessed, and use OrderedDict.popitem(False) to remove the least-recently inserted item.PEP8 compliance
Your variables and methods are written with a mixture of
PascalCase (Cache.RequestTimestamps), which is typically only used for class names, and camelCase (Cache.removeLeastRecentlyUsed, leastRecentlyUsedKey), which is typically not used in Python. You should use snake_case instead for variables and methods, and use it consistently:def __init__(self, cache_size=10):
# this variables maps image names to file locations
self.cache = {}
# this variables maps image names to timestamp of last request
# filed for them. Is volatile - requires local clock to never change,
# or that there is a separate clock service that guarantees global
# consistency.
self.request_timestamps = {}
self.cache_size = cache_sizedef remove_least_recently_used(self):
'''
Should only be called in a threadsafe context.
'''
if self.request_timestamps:
# scan through and find the least recently used key by finding
# the lowest timestamp among all the keys.
least_recently_used_key = min(self.request_timestamps, key=lambda image_name: self.request_timestamps[image_name])
# now that the least recently used key has been found,
# remove it from the cache and from our list of request timestamps.
self.cache.pop(least_recently_used_key, None)
self.request_timestamps.pop(least_recently_used_key, None)
return
# only called in the event requestTimestamps does not exist - randomly
# pick a key to erase.
random_choice = random.choice(self.cache.keys())
self.cache.pop(random_choice, None)And while we're looking at this method...
Single exit only
While there are many arguments against the single-exit-only style, none of them apply here. There's no good reason to have the
return inside of the if in Cache.removeLastRecentlyUsed. Instead, wrap the rest in an else:if self.request_timestamps:
# scan through and find the least recently used key by finding
# the lowest timestamp among all the keys.
least_recently_used_key = min(self.request_timestamps, key=lambda image_name: self.request_timestamps[image_name])
# now that the least recently used key has been found,
# remove it from the cache and from our list of request timestamps.
self.cache.pop(least_recently_used_key, None)
self.request_timestamps.pop(least_recently_used_key, None)
else:
# only called in the event requestTimestamps does not exist - randomly
# pick a key to erase.
random_choice = random.choice(self.cache.keys())
self.cache.pop(random_choice, None)Code Snippets
def __init__(self, cacheSize=10):
# this variables maps image names to file locations
self.Cache = {}
# this variables maps image names to timestamp of last request
# filed for them. Is volatile - requires local clock to never change,
# or that there is a separate clock service that guarantees global
# consistency.
self.RequestTimestamps = {}
self.CacheSize = cacheSize
self.lock = threading.Lock()def insert(self, imageName, imageLocation):
'''
Insert a new image into our cache. If inserting would exceed our cache size,
then we shall make room for it by removing the least recently used.
'''
with self.lock:
if len(self.Cache) == self.CacheSize:
self.removeLeastRecentlyUsed()
self.Cache[imageName] = imageLocation
self.RequestTimestamps[imageName] = time.time()def __init__(self, cache_size=10):
# this variables maps image names to file locations
self.cache = {}
# this variables maps image names to timestamp of last request
# filed for them. Is volatile - requires local clock to never change,
# or that there is a separate clock service that guarantees global
# consistency.
self.request_timestamps = {}
self.cache_size = cache_sizedef remove_least_recently_used(self):
'''
Should only be called in a threadsafe context.
'''
if self.request_timestamps:
# scan through and find the least recently used key by finding
# the lowest timestamp among all the keys.
least_recently_used_key = min(self.request_timestamps, key=lambda image_name: self.request_timestamps[image_name])
# now that the least recently used key has been found,
# remove it from the cache and from our list of request timestamps.
self.cache.pop(least_recently_used_key, None)
self.request_timestamps.pop(least_recently_used_key, None)
return
# only called in the event requestTimestamps does not exist - randomly
# pick a key to erase.
random_choice = random.choice(self.cache.keys())
self.cache.pop(random_choice, None)if self.request_timestamps:
# scan through and find the least recently used key by finding
# the lowest timestamp among all the keys.
least_recently_used_key = min(self.request_timestamps, key=lambda image_name: self.request_timestamps[image_name])
# now that the least recently used key has been found,
# remove it from the cache and from our list of request timestamps.
self.cache.pop(least_recently_used_key, None)
self.request_timestamps.pop(least_recently_used_key, None)
else:
# only called in the event requestTimestamps does not exist - randomly
# pick a key to erase.
random_choice = random.choice(self.cache.keys())
self.cache.pop(random_choice, None)Context
StackExchange Code Review Q#160277, answer score: 5
Revisions (0)
No revisions yet.