Recent Entries 10
- pattern minor 112d agoImplementing a thread-safe LRUCacheHere is the problem I've been trying to tackle: Design a thread-safe image caching server that can keep in memory only the ten most recently used images. I chose to implement an LRU cache to solve this as follows: ``` ''' This module defines an LRUCache. Constraints: 1. May only hold upto ten items at a time. 2. Must be able to synchronise multiple requests. 3. Must be able to update its cache. ''' import random import threading class LRUCache(object): def __init__(self, cacheSize=10): # this variables maps image names to file locations self.Cache = {} # this variables maps image names to timestamp of last request # filed for them. Is volatile - requires local clock to never change, # or that there is a separate clock service that guarantees global # consistency. self.RequestTimestamps = {} self.CacheSize = cacheSize def insert(self, imageName, imageLocation): ''' Insert a new image into our cache. If inserting would exceed our cache size, then we shall make room for it by removing the least recently used. ''' with threading.Lock(): if len(self.Cache) == self.CacheSize: self.removeLeastRecentlyUsed() self.Cache[imageName] = imageLocation self.RequestTimestamps[imageName] = time.time() def get(self, imageName): ''' Retrieve an image from our cache, keeping note of the time we last retrieved it. ''' with threading.Lock(): if imageName in self.Cache: self.RequestTimestamps[imageName] = time.time() return self.Cache[imageName] else: raise KeyError("Not found!") def removeLeastRecentlyUsed(self): ''' Should only be called in a threadsafe context. ''' if self.RequestTimestamps: # scan through and find the least recently use
- pattern minor 112d agoImplementation of lru_cache for daily "top categories" queryIn my website, I want to show top categories by products' view / sell counts. Since this is not something changes every hour, I want to fetch real results only once a day from database (may be once a week). Here is what I have done: ``` def global_context(request): ## ... some other queries today = datetime.datetime.now().day top_cats = top_categories(today) # hit once a day return {'top_categories': top_cats} @lru_cache(maxsize=2) def top_categories(day): # fetch first 50 top-selling products top_products = Product.objects.all().order_by('-sell_count')[:50] # get first 10 unique categories categories = list(set([product.categori for product in top_products]))[:10] return categories ``` This code is working. (I have tested with `per minute` version). But I wonder, - Is this correct way to achieve my goal? - When `currsize` reached to `maxsize` nothing happened. Code kept working. Why? - Should I manually reset / clear cache with `cache_clear`?
- pattern minor 112d agocache with least accessed items evictionI wrote this code, in order to implement a cache decorator which handles least accessed eviction. The role of this decorator is to memoize decorated functions calls with args and kwargs and return the previously computed value if still in cache. Warning, this handles only serializable args and kwargs, for the sake of simplicity. Please review it and tell me if this makes sense. The code: ``` import json import hashlib from collections import OrderedDict class EvictionCache(OrderedDict): def __init__(self, *args, **kwargs): max_size = kwargs.pop('max_size', None) super(EvictionCache, self).__init__(*args, **kwargs) self._max_size = max_size @property def full(self): return len(self) >= self._max_size def set(self, hash, value): if self.full: self.evict() super(EvictionCache, self).__setitem__(hash, value) def get(self, hash): value = super(EvictionCache, self).pop(hash, None) if value: super(EvictionCache, self).__setitem__(hash, value) return value def evict(self): print 'evicting oldest cached item' super(EvictionCache, self).popitem(last=False) def cache(max_size=None): def make_md5(args, kwargs): data = dict(args=args, kwargs=kwargs) md5 = hashlib.md5(json.dumps(data, sort_keys=True)).hexdigest() return md5 def wrapper(f): # make this cache belong to the decorated function to prevent the cache # to be shared between different decorated functions f._cache = EvictionCache(max_size=max_size) def inner(*args, **kwargs): md5 = make_md5(args, kwargs) res = f._cache.get(md5) if res: print 'from cache' return res res = f(*args, **kwargs) print 'to cache' f._cache.set(md5, res) return res return inner return wrapper @cache(max_size=
- pattern minor 112d agoApostle Galaxies: dict subclass with disk cachingI am an astrophysicist working on large simulations as part of the APOSTLE project. The output of the simulations I use are large (TBs) and are stored in tables spread across multiple hdf5 files. Often, I'm interested in studying a particular simulated galaxy, which is represented by a collection of particles of different types and which have different properties. Prior to writing this code, I would typically need to read tables corresponding to various properties for all particles in the simulation, then construct a selection to extract only those particles belonging to the galaxy of interest, copy those, and discard the large tables. Given that I often study one or a few galaxies for a while, I thought it would be a good idea to build a little interface which computes selection "masks" for a given galaxy once, the first time it is run, and caches those to disk. In addition, it caches any additional properties of the galaxy particles that get pulled out of the big tables, and some associated metadata. My solution is a subclass of `dict`, so once an `ApostleObject` (i.e. an abstraction of a galaxy) is initialized, call it `AO`, I can extract e.g. particle properties from the object by `AO['T_g']`. (In this example 'T_g' refers to the gas temperature. Another class, `ApostleFileset`, is aware of all available keys and how to get the raw tables for all particles from the disk.) If `'T_g'` has been loaded before it will already be in memory and simply be returned, otherwise it will be loaded from the master tables if the key is valid, otherwise a `KeyError` is raised. There are two other features of note: - I just implemented `__getattr__` as an alternative to `__getitem__` since writing `AO.T_g` is a bit more succinct and readable than `AO['T_g']` (though `__getitem__` is still useful in many contexts). - This one has to do with how the caching is implemented - the dict subclass itself is actually buried inside a thin wrapper that is just a context manager. Th
- pattern minor 112d agoUsing Concurrent Dictionary and Lazy<T> to cache expensive query results and only run query once in a threadsafe mannerOk, so I'm querying a webservice. This webservice is slow with multiple o's. I want to cache the results of the query, because I only want to query on a given set of parameters once during my transaction (In addition to the query being slow, it's also rate limited, so I don't want to perform unnecessary queries). Finally, to speed things up, I want to be able to run queries in parallel, as latency is my biggest problem here. With all of that, I think I need a thread safe way to cache data and only run the queries I actually need to run. I think I've managed to cobble that together with Concurrent Dictionary and Lazy ``` private static ConcurrentDictionary>> DataCache = new ConcurrentDictionary>>(); public static IEnumerable GetData(KeyObject key, QueryObject query) { var value = new Lazy>(() => { return query.Run(); }); DataCache.TryAdd(key, value); return DataCache[key].Value; } } ``` If I'm correct in understanding how all the moving parts work, it should go something like this. - Generate dictionary key and Lazy intializer for value - Try and add lazy initializer to dictionary with key. If another thread has already added key, fail and continue. - Try and get value of lazy intializer on record in the dictionary. If another thread is already getting value, block until other thread has retrieved value. Does this sound correct?
- pattern minor 112d agoC# cache controllerProblem Currently my project utilises local caching wherever possible. There is a lot of similar looking code throughout that takes the form: ``` private static readonly ConcurrentDictionary GetPMLocks = new ConcurrentDictionary(); /// /// Retrieve a PM by it's unique ID /// public static PrivateMessage GetPrivateMessage(int messageID) { var cacheIndex = GetPMCacheIndex(messageID); var cache = GetCache(); if (cache[cacheIndex == null]) { try { GetPMLocks.TryAdd(messageID, new object()); lock (CacheCommentLocks[messageID]) { if (cache[cacheIndex] == null) { using (var db = new DBContext()) { var q = CompiledQueries.GetPMByID(db, messageID); var pm = new PrivateMessage(q); cache.Add(cacheIndex, pm, null, Cache.NoAbsoluteExpiration, Cache.NoSlidingExpiration, CacheItemPriority.Normal, null); } } } } finally { object tR; GetPMLocks.TryRemove(messageID, out tR); } } return (PrivateMessage) cache[cacheIndex]; } ``` Solution I wish to refactor this to be more concise, and to route every cache request through a central controller so it's easy to fix bugs/modify & measure the caching process and performance. So I've created a new `LocalCacheController` class: ``` public class LocalCacheController { private Cache CacheObj_ { get; set; } private Cache CacheObj { get { if (CacheObj_ == null) CacheObj_ = GetCache(); return CacheObj_; } } /// /// Cache index this cacher is for /// private string ForCacheIndex { get; } public LocalCacheController(string cacheIndex, Cache cache
- pattern minor 112d agoCached repository implementation for small lists of data modelsI have been using Repositories in my ASP.NET MVC projects and I felt the need to fully cache small tables data (dictionaries, cities, countries etc.). This kind of information is changed (very) infrequently, so caching brings a great benefit. Also, I thought it would be nice to have cached repositories setup as simple as possible (even invisible to the services using them). My inspiration was this nice article, but I needed to cache entire table content, rather than query results. Here it is what I have managed to do so far: 1) Repository interface ``` public interface IRepository : IRepository where T : class { IQueryable AllNoTracking { get; } IQueryable All { get; } DbSet GetSet { get; } T Get(int id); void Insert(T entity); void BulkInsert(IEnumerable entities); void Delete(T entity); void RemoveRange(IEnumerable range); void ClearAll(); void Update(T entity); IList ExecProcedure(string procedureName, IList> parameters); void Truncate(); IList Select(string queryText); } /// /// provides methods specific to a cached repository (as opposed to those from normal repositories) /// public interface ICachedRepository where T : class, new() { string CacheKey { get; } void InvalidateCache(); void InsertIntoCache(T item); } ``` 2) Cached repository should work above a normal repository, that's why a cached repository requires a reference to a normal one. It is not fully implemented, but the core functionality is there: ``` public class CachedRepository : ICachedRepository, IRepository where T : class, new() { #region Properties private int AbsoluteExpiration { get; } private int SlidingExpiration { get; } #endregion #region Variables private readonly IRepository _modelRepository; private static readonly object CacheLockObject = new object(); #endregion #region Properties public string CacheKey => $"CachedRepository-{typeof(T).Name}"; #endreg
- pattern minor 112d agoCalculation caching experimentThere are a lot of numerical properties to be invoked many times with an expensive calculation. Let’s say for the sake of example, they expose a Factorial: ``` public class Analysis { public double n10 => F(10); public double n100 => F(100); double F(int n) => n <= 1 ? 1 : n * F(n-1); } ``` I was looking for a way to cache them while keeping syntax clean, so property calculation logic will not intermix with caching. A proposed solution is: ``` public class Analysis { public double n10 => this.Cached() || F(10); public double n100 => this.Cached() || F(100); double F(int n) => n <= 1 ? 1 : n * F(n-1); } ``` Where library code is: ``` public static class Caching { static ConditionalWeakTable> Values { get; } = new ConditionalWeakTable>(); public static Value Cached(this object target, [CallerMemberName] string name = null) => Values.GetOrCreateValue(target).ContainsKey(name) ? new Value(() => Values.GetOrCreateValue(target)[name], null) : new Value(null, v => Values.GetOrCreateValue(target)[name] = v); } ``` And: ``` public class Value { public static implicit operator Value(double value) => new Value(() => value, null); public static implicit operator double(Value value) => value.Getter(); public static bool operator true(Value value) => value.Getter != null; public static bool operator false(Value value) => value.Getter == null; public static Value operator |(Value cached, Value computed) { cached.Setter(computed); return computed; } public Value(Func getter, Action setter) { Getter = getter; Setter = setter; } Func Getter { get; } Action Setter { get; } } ```
- pattern minor 112d agoLRU cache in PythonHere is my simple code for LRU cache in Python 2.7. Appreciate if anyone could review for logic correctness and also potential performance improvements. A confusion want to ask for advice is, I am using a `list` to track access time, the first element of the list the is least time accessed, and the last element is the most recent accessed element. And I am using `remove` method of `list`, for example, statement `accessQueue.remove(key)`, wondering what is the time complexity of `list.remove`, is it linear or log(N)? ``` from collections import defaultdict kvStore = defaultdict(str) accessQueue = [] maxSize = 10 def cacheSet(key, value): kvStore[key] = value if key in accessQueue: accessQueue.remove(key) accessQueue.append(key) else: accessQueue.append(key) if len(accessQueue) > maxSize: accessQueue.pop(0) def cacheGet(key): if key in kvStore: accessQueue.remove(key) accessQueue.append(key) return kvStore[key] else: return None def cachePrint(): for k,v in kvStore.items(): print k,v if __name__ == "__main__": for i in range(10): cacheSet(i, "value " + str(i)) for i in range(10): print cacheGet(i) cacheSet(11, "value " + str(11)) cachePrint() ```
- pattern minor 112d agoProperty cachingTrying to figure out how to efficiently cache property calculations with dependency tracking to invalidate the cache. Here is the syntax I have at the moment (one `Cache` instance supports multiple object properties): ``` class Multiplication { public int Arg1 { get; set; } public int Arg2 { get; set; } public int Result => Cache .Lazy(() => Arg1 * Arg2) .From(() => Arg1, () => Arg2); Cache Cache { get; } = new Cache(); } ``` Here `Result` is recalculated on `Arg*` change. Implementation classes: ``` public class Cache { public LazyProperty Lazy( Func factory, [CallerMemberName] string memberName = "") => new LazyProperty( value => (Cached)Store.GetOrAdd(memberName, value), new Cached(factory)); ConcurrentDictionary Store { get; } = new ConcurrentDictionary(); } ``` This one helps with flow API: ``` public class LazyProperty { public static implicit operator T(LazyProperty property) => property.Resolve(); internal LazyProperty(Func, Cached> getOrAdd, Cached value) { GetOrAdd = getOrAdd; Value = value; } public LazyProperty From(params Func[] dependencies) => new LazyProperty(GetOrAdd, Value.Depending(dependencies)); Func, Cached> GetOrAdd { get; } Cached Value { get; } Cached Resolve() => GetOrAdd(Value); } ``` Dependency tracking done by: ``` public class Cached { public static implicit operator T(Cached cache) => cache.Value; public Cached(Func factory, params Func[] dependencies) { if (factory == null) throw new ArgumentNullException("factory"); if(dependencies == null || dependencies.Any(d => d == null)) throw new ArgumentNullException("dependancies"); _factory = factory; _dependencies = dependencies; } public Cached Depending(params Func[] dependencies) => new Cached(_factory, _dependencies.Concat(dependenci