HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Implementing a thread-safe LRUCache

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
lrucacheimplementingthreadsafe

Problem

Here is the problem I've been trying to tackle:


Design a thread-safe image caching server that can keep in memory only the ten most recently used images.

I chose to implement an LRU cache to solve this as follows:

```
'''
This module defines an LRUCache.

Constraints:

  1. May only hold upto ten items at a time.
  2. Must be able to synchronise multiple requests.
  3. Must be able to update its cache.


'''

import random
import threading

class LRUCache(object):

def __init__(self, cacheSize=10):

# this variables maps image names to file locations
self.Cache = {}

# this variables maps image names to timestamp of last request
# filed for them. Is volatile - requires local clock to never change,
# or that there is a separate clock service that guarantees global
# consistency.
self.RequestTimestamps = {}

self.CacheSize = cacheSize

def insert(self, imageName, imageLocation):

'''
Insert a new image into our cache. If inserting would exceed our cache size,
then we shall make room for it by removing the least recently used.
'''

with threading.Lock():

if len(self.Cache) == self.CacheSize:
self.removeLeastRecentlyUsed()

self.Cache[imageName] = imageLocation
self.RequestTimestamps[imageName] = time.time()

def get(self, imageName):

'''
Retrieve an image from our cache, keeping note of the time we last
retrieved it.
'''

with threading.Lock():

if imageName in self.Cache:

self.RequestTimestamps[imageName] = time.time()
return self.Cache[imageName]

else:

raise KeyError("Not found!")

def removeLeastRecentlyUsed(self):

'''
Should only be called in a threadsafe context.
'''

if self.RequestTimestamps:

# scan through and find the least recently use

Solution

Thread-safeness

As others have pointed out in the comments, your implementation is not thread-safe. threading.Lock() returns a new lock each time it is called, so each thread will be locking a different lock. Instead, you should have a single lock as an instance member object:

def __init__(self, cacheSize=10):

    # this variables maps image names to file locations
    self.Cache = {}

    # this variables maps image names to timestamp of last request
    # filed for them. Is volatile - requires local clock to never change,
    # or that there is a separate clock service that guarantees global
    # consistency. 
    self.RequestTimestamps = {}

    self.CacheSize = cacheSize

    self.lock = threading.Lock()


Then, you use it like so:

def insert(self, imageName, imageLocation):

    '''
    Insert a new image into our cache. If inserting would exceed our cache size,
    then we shall make room for it by removing the least recently used. 
    '''

    with self.lock:

        if len(self.Cache) == self.CacheSize:
            self.removeLeastRecentlyUsed()

    self.Cache[imageName] = imageLocation
    self.RequestTimestamps[imageName] = time.time()


Additionally, using time.time() for access orders can cause inconsistent results: it's not guaranteed to have good precision, and is dependent on the system clock steadily increasing. If the system clock is manually set back, you lose your consistent ordering. Instead, a safer way would be to use an OrderedDict, where you remove and re-insert items as they are accessed, and use OrderedDict.popitem(False) to remove the least-recently inserted item.

PEP8 compliance

Your variables and methods are written with a mixture of PascalCase (Cache.RequestTimestamps), which is typically only used for class names, and camelCase (Cache.removeLeastRecentlyUsed, leastRecentlyUsedKey), which is typically not used in Python. You should use snake_case instead for variables and methods, and use it consistently:

def __init__(self, cache_size=10):

    # this variables maps image names to file locations
    self.cache = {}

    # this variables maps image names to timestamp of last request
    # filed for them. Is volatile - requires local clock to never change,
    # or that there is a separate clock service that guarantees global
    # consistency. 
    self.request_timestamps = {}

    self.cache_size = cache_size


def remove_least_recently_used(self):

    '''
    Should only be called in a threadsafe context.
    '''

    if self.request_timestamps:

        # scan through and find the least recently used key by finding
        # the lowest timestamp among all the keys.

        least_recently_used_key = min(self.request_timestamps, key=lambda image_name: self.request_timestamps[image_name])

        # now that the least recently used key has been found, 
        # remove it from the cache and from our list of request timestamps.

        self.cache.pop(least_recently_used_key, None)
        self.request_timestamps.pop(least_recently_used_key, None)
        return

    # only called in the event requestTimestamps does not exist - randomly 
    # pick a key to erase.

    random_choice = random.choice(self.cache.keys())
    self.cache.pop(random_choice, None)


And while we're looking at this method...

Single exit only

While there are many arguments against the single-exit-only style, none of them apply here. There's no good reason to have the return inside of the if in Cache.removeLastRecentlyUsed. Instead, wrap the rest in an else:

if self.request_timestamps:

        # scan through and find the least recently used key by finding
        # the lowest timestamp among all the keys.

        least_recently_used_key = min(self.request_timestamps, key=lambda image_name: self.request_timestamps[image_name])

        # now that the least recently used key has been found, 
        # remove it from the cache and from our list of request timestamps.

        self.cache.pop(least_recently_used_key, None)
        self.request_timestamps.pop(least_recently_used_key, None)
else:

    # only called in the event requestTimestamps does not exist - randomly 
    # pick a key to erase.

    random_choice = random.choice(self.cache.keys())
    self.cache.pop(random_choice, None)

Code Snippets

def __init__(self, cacheSize=10):

    # this variables maps image names to file locations
    self.Cache = {}

    # this variables maps image names to timestamp of last request
    # filed for them. Is volatile - requires local clock to never change,
    # or that there is a separate clock service that guarantees global
    # consistency. 
    self.RequestTimestamps = {}

    self.CacheSize = cacheSize

    self.lock = threading.Lock()
def insert(self, imageName, imageLocation):

    '''
    Insert a new image into our cache. If inserting would exceed our cache size,
    then we shall make room for it by removing the least recently used. 
    '''

    with self.lock:

        if len(self.Cache) == self.CacheSize:
            self.removeLeastRecentlyUsed()

    self.Cache[imageName] = imageLocation
    self.RequestTimestamps[imageName] = time.time()
def __init__(self, cache_size=10):

    # this variables maps image names to file locations
    self.cache = {}

    # this variables maps image names to timestamp of last request
    # filed for them. Is volatile - requires local clock to never change,
    # or that there is a separate clock service that guarantees global
    # consistency. 
    self.request_timestamps = {}

    self.cache_size = cache_size
def remove_least_recently_used(self):

    '''
    Should only be called in a threadsafe context.
    '''

    if self.request_timestamps:

        # scan through and find the least recently used key by finding
        # the lowest timestamp among all the keys.

        least_recently_used_key = min(self.request_timestamps, key=lambda image_name: self.request_timestamps[image_name])

        # now that the least recently used key has been found, 
        # remove it from the cache and from our list of request timestamps.

        self.cache.pop(least_recently_used_key, None)
        self.request_timestamps.pop(least_recently_used_key, None)
        return

    # only called in the event requestTimestamps does not exist - randomly 
    # pick a key to erase.

    random_choice = random.choice(self.cache.keys())
    self.cache.pop(random_choice, None)
if self.request_timestamps:

        # scan through and find the least recently used key by finding
        # the lowest timestamp among all the keys.

        least_recently_used_key = min(self.request_timestamps, key=lambda image_name: self.request_timestamps[image_name])

        # now that the least recently used key has been found, 
        # remove it from the cache and from our list of request timestamps.

        self.cache.pop(least_recently_used_key, None)
        self.request_timestamps.pop(least_recently_used_key, None)
else:

    # only called in the event requestTimestamps does not exist - randomly 
    # pick a key to erase.

    random_choice = random.choice(self.cache.keys())
    self.cache.pop(random_choice, None)

Context

StackExchange Code Review Q#160277, answer score: 5

Revisions (0)

No revisions yet.