HiveBrain v1.2.0
pattern · python · Minor

Faster tests with dependency analysis

Submitted by: @import:stackexchange-codereview

Problem

I have a performance issue: my test → code → test cycle is really slow. The slowness is hard to avoid, since I am doing some heavy image processing, so I am trying to speed things up by not re-running functions when it is not needed. For example: compute some numbers from big images → serialize the results to text files → resume computations from the text files.

Currently, I commit the text files containing the results to share them with the team and let other people run the tests quickly. But I have to make sure that whenever a dependency is updated (source images, preprocessing functions, etc.), those values are recomputed. I looked for a tool that would let me describe dependencies and production rules to automate the process and avoid committing data to the codebase. I thought about a makefile, but I read a lot of negative advice against it. I found tools like SCons and other build automation tools, but those are for building software and do not seem adapted to my task.

I decided to write a small lib to tackle this issue (my environment is mainly Python with some C). The goal was to have objects that know the production rule for their own output and are aware of the other objects they depend on. An object knows whether its output is up to date by comparing the current MD5 checksum of the file against the last one (stored somewhere in a small temp file).

Am I reinventing a wheel here? Is there a tool out there I should use instead of a custom lib? If not, is this a good pattern for taking care of this problem?

```
import os
import path  # in-house module; path.root is the root of our repo
from md5 import md5 as stdMd5  # Python 2 md5 module

hashDir = os.path.join(path.root, '_hashes')


def md5(toHash):
    return stdMd5(toHash).hexdigest()


class BaseRule(object):
    '''base object, managing work like checking if results are up to date, and
    calling production rules if they are not'''

    outPath = None

    def __init__(self, func):
        self.func = func

    @property
    def hashPath(self):
        # (the snippet was cut off here; a plausible completion is a hash
        # file under hashDir, named after the output path)
        return os.path.join(hashDir, md5(self.outPath))
```

Solution

```
# path.root is the root of our repo
hashDir = os.path.join(path.root, '_hashes')
```


The Python style guide (PEP 8) recommends that global constants be ALL_CAPS.
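Applied to the snippet above, that would look like the following (a sketch; `os.getcwd()` stands in for `path.root`, since the author's in-house `path` module isn't shown):

```python
import os

REPO_ROOT = os.getcwd()  # stand-in for path.root (the author's in-house module)
HASH_DIR = os.path.join(REPO_ROOT, '_hashes')  # ALL_CAPS for module-level constants
```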

```
class BaseRule(object):
    '''base object, managing work like checking if results are up to date, and
    calling production rules if they are not'''

    outPath = None
```


The Python style guide also recommends that class-level constants be ALL_CAPS.

```
    def isUpToDate(self):
        if not os.path.exists(self.outPath):
            return False
        if not os.path.exists(self.hashPath):
            return False
        with open(self.outPath) as f:
            fileHash = self.getOutHash()
```


You don't use the file you open here; instead, you open the file inside `getOutHash`.

```
        with open(self.hashPath) as f:
            storedHash = f.read().strip()
        return storedHash == fileHash
```
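A corrected version simply drops the unused `with open(self.outPath)` context. A standalone sketch (my own names; `file_md5` stands in for the author's `getOutHash`, which is assumed to open and hash the output file itself):

```python
import hashlib
import os


def file_md5(filepath):
    # hash the file's contents (stand-in for the author's getOutHash)
    with open(filepath, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()


def is_up_to_date(out_path, hash_path):
    # up to date iff both files exist and the stored hash matches the current one
    if not os.path.exists(out_path) or not os.path.exists(hash_path):
        return False
    with open(hash_path) as f:
        stored_hash = f.read().strip()
    return stored_hash == file_md5(out_path)
```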

```
    def get(self):
        inputPathes = dict([(key, inp.get()) for key, inp in self.inputs.items()])
```


No need for the square brackets. They make it so that it produces a whole list first; since you are creating a dictionary, there is no need to build that intermediate list.

```
        if not self.isUpToDate():
            self.func(outPath=self.outPath, **inputPathes)
            self.storeHash()
        return self.outPath
```
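For the `dict([...])` call above, the three variants below are equivalent; the last two skip the throwaway list (a sketch with dummy inputs, not the author's classes):

```python
# assume each input is an object with a .get() returning a path
class Inp(object):
    def __init__(self, p):
        self.p = p

    def get(self):
        return self.p


inputs = {'a': Inp('a.txt'), 'b': Inp('b.txt')}

as_list = dict([(key, inp.get()) for key, inp in inputs.items()])  # builds a throwaway list
as_gen = dict((key, inp.get()) for key, inp in inputs.items())     # generator, no list
as_comp = {key: inp.get() for key, inp in inputs.items()}          # dict comprehension

print(as_list == as_gen == as_comp)  # → True
```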

```
class StableSrc(BaseRule):
    'source file that never change'

    inputs = {}
```


I know you'll never modify this, but `inputs` is an object-level attribute above, and it's clearest if you keep it that way.

```
    def __init__(self, path):
        self.outPath = path

    def isUpToDate(self):
        return True
```
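Keeping `inputs` at the object level would look something like this (a sketch mirroring the original names, not the author's code):

```python
class BaseRule(object):
    def __init__(self, inputs=None):
        # per-instance attribute: each rule gets its own dict
        self.inputs = inputs if inputs is not None else {}


class StableSrc(BaseRule):
    'source file that never changes'

    def __init__(self, path):
        BaseRule.__init__(self)  # inputs is set to {} per instance
        self.outPath = path

    def isUpToDate(self):
        return True
```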

```
def copyTest():
    'test function'

    import shutil

    @rule({'inp' : StableSrc('test.txt')}, 'test2.txt')
    def newTest(inp, outPath):
        print 'copy'
        shutil.copy(inp, outPath)
```


Usually you've got a small number of possible actions with different inputs/outputs for each use. Your decorator method assumes that each action will be different.

  • You aren't tracking dependencies amongst the rules, so you can't have a chain like .swig produces .cpp, which produces .o, which is linked to produce .exe.

  • Creating a class for every rule is odd. It would make more sense to create an object for each rule.

  • The only thing you've got over makefiles is your use of a hash rather than a timestamp.
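The transitive tracking from the first point can be sketched as a small rule graph in which `get()` recursively refreshes every input before deciding whether to rebuild; content hashes of the inputs stand in for timestamps (my own names and a simplified rebuild check, not the author's code):

```python
import hashlib
import os


def content_hash(path):
    # staleness is judged by content hash, not timestamp
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()


class Rule(object):
    def __init__(self, out_path, func=None, inputs=None):
        self.out_path = out_path
        self.func = func          # production rule; None for plain source files
        self.inputs = inputs or {}
        self._last_hashes = {}    # input path -> hash seen at last build

    def get(self):
        # recursively bring every input up to date first (transitive chains)
        input_paths = {key: rule.get() for key, rule in self.inputs.items()}
        current = {p: content_hash(p) for p in input_paths.values()}
        if self.func is not None and (
                not os.path.exists(self.out_path) or current != self._last_hashes):
            self.func(out_path=self.out_path, **input_paths)
            self._last_hashes = current
        return self.out_path
```

Chaining `Rule` objects (source → intermediate → final) then gives the .swig → .cpp → .o behaviour: asking for the last output rebuilds only the stale links.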


Context

StackExchange Code Review Q#4920, answer score: 2
