Dynamically importing Python source modules from a given directory

Submitted by: @import:stackexchange-codereview

Problem

I'm working on an open-source test framework which needs to dynamically import Python source modules from a given directory.

I can't just use importlib.import_module() or __import__(), because they are black boxes and I need to do some source-code rewriting. On the other hand, the PEP 302 import hooks feel like overkill for what I'm trying to do (they seem to be designed with importing totally foreign file formats in mind), so I rolled my own:

import os
import sys
import types

def import_module(dir_path, module_name):
    if module_name in sys.modules:
        return sys.modules[module_name]

    filename = resolve_filename(dir_path, module_name)

    with open(filename, 'r', encoding='utf-8') as f:
        source = f.read()
    # Here I do some AST munging    
    code = compile(source, filename, 'exec')

    module = create_module_object(module_name, filename)
    exec(code, module.__dict__)
    sys.modules[module_name] = module
    return module

def resolve_filename(dir_path, module_name):
    filename = os.path.join(dir_path, *module_name.split('.'))

    # I happen to know that the calling code will already have
    # determined whether it's a package or not
    if os.path.isdir(filename):
        filename = os.path.join(filename, '__init__.py')
    else:
        filename += '.py'
    return filename    

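def create_module_object(module_name, filename):
    module = types.ModuleType(module_name)
    module.__file__ = filename

    # Compare the basename exactly: a substring test would also match
    # paths such as 'x__init__.py' or a directory named '__init__.py'.
    if os.path.basename(filename) == '__init__.py':
        # A package is its own __package__ and needs a __path__ so that
        # its submodules can be located.
        module.__package__ = module_name
        module.__path__ = [os.path.dirname(filename)]
    else:
        # For a plain module, __package__ is the parent package name,
        # or '' for a top-level module.
        module.__package__ = module_name.rpartition('.')[0]

    return module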


To paraphrase Greenspun's Tenth Rule:


Any sufficiently complicated Python program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of the import system.

So, does my code do everything it's

Solution

I took @GarethRees's advice and subclassed importlib.abc.SourceLoader. After reading importlib's source code, I arrived at the following implementation:

import ast
import importlib.abc
import os
import sys

def import_module(dir_path, module_name):
    filename = resolve_filename(dir_path, module_name)

    # I got test failures when I left this out, even though I thought it
    # was a responsibility of the loader.
    # If you know why this is, please enlighten me!
    if module_name in sys.modules:
        return sys.modules[module_name]

    return ASTFrobnicatingLoader(module_name, filename).load_module(module_name)

# in importlib, this function would be the job of the 'finder'
def resolve_filename(dir_path, module_name):
    filename = os.path.join(dir_path, *module_name.split('.'))
    if os.path.isdir(filename):
        filename = os.path.join(filename, '__init__.py')
    else:
        filename += '.py'
    return filename    

class ASTFrobnicatingLoader(importlib.abc.FileLoader, importlib.abc.SourceLoader):
    def get_code(self, fullname):
        source = self.get_source(fullname)
        path = self.get_filename(fullname)

        parsed = ast.parse(source)
        self.frobnicate(parsed)

        return compile(parsed, path, 'exec', dont_inherit=True, optimize=0)

    def module_repr(self, module):
        return '<module {!r} from {!r}>'.format(module.__name__, module.__file__)


import_module now creates an instance of my custom loader and calls its load_module template method, which is provided by the abstract base class. By inheriting from both FileLoader and SourceLoader, I get a 'free' implementation of the whole protocol and I only need to override get_code.
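The loader calls self.frobnicate, whose body is not shown in the post. As a rough sketch of what such an AST pass could look like, the following uses ast.NodeTransformer to rewrite assert statements into explicit raises. The names AssertRewriter and frobnicate, and the rewrite itself, are assumptions made for illustration; the real transformation is whatever the test framework needs.

```python
import ast


class AssertRewriter(ast.NodeTransformer):
    """Hypothetical pass: turn `assert cond` into `if not cond: raise ...`."""

    def visit_Assert(self, node):
        replacement = ast.If(
            test=ast.UnaryOp(op=ast.Not(), operand=node.test),
            body=[ast.Raise(
                exc=ast.Call(
                    func=ast.Name(id='AssertionError', ctx=ast.Load()),
                    args=[ast.Constant(value='rewritten assert failed')],
                    keywords=[]),
                cause=None)],
            orelse=[])
        # Keep the original line number so tracebacks still point at the assert.
        return ast.copy_location(replacement, node)


def frobnicate(tree):
    """Apply the rewrite and fill in locations for the newly created nodes."""
    tree = AssertRewriter().visit(tree)
    ast.fix_missing_locations(tree)
    return tree


# Compile and run a tiny module through the pass.
tree = frobnicate(ast.parse("assert 1 + 1 == 3\n"))
code = compile(tree, '<example>', 'exec')
try:
    exec(code, {})
    message = None
except AssertionError as exc:
    message = str(exc)
```

Calling ast.fix_missing_locations before compile matters here: nodes built by hand have no line numbers, and compile rejects a tree with missing locations.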

In Python 3.4, SourceLoader provides a source_to_code method which you can override. It would have been ideal for my purposes (because the only thing I'm customising is the generation of the code object), but sadly I'm stuck with overriding the whole of get_code and calling get_source and get_filename manually.
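For readers on Python 3.4 or later, here is a minimal sketch of the source_to_code route. It keeps the same two base classes but overrides only source_to_code, and it drives the loader through importlib.util and exec_module, because load_module has been deprecated since 3.4. The names RewritingLoader, DoubleInts and import_rewritten are hypothetical, and the integer-doubling transform is only a stand-in for the real AST rewriting.

```python
import ast
import importlib.abc
import importlib.util
import os
import sys


class DoubleInts(ast.NodeTransformer):
    """Stand-in transform (an assumption): double every integer literal."""

    def visit_Constant(self, node):
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            return ast.copy_location(ast.Constant(value=node.value * 2), node)
        return node


class RewritingLoader(importlib.abc.FileLoader, importlib.abc.SourceLoader):
    def source_to_code(self, data, path, *, _optimize=-1):
        # `data` is the raw source (bytes) read by the inherited get_data.
        tree = DoubleInts().visit(ast.parse(data, filename=path))
        ast.fix_missing_locations(tree)
        return compile(tree, path, 'exec', dont_inherit=True,
                       optimize=_optimize)


def import_rewritten(dir_path, module_name):
    if module_name in sys.modules:
        return sys.modules[module_name]

    # Same filename resolution as in the post.
    filename = os.path.join(dir_path, *module_name.split('.'))
    if os.path.isdir(filename):
        filename = os.path.join(filename, '__init__.py')
    else:
        filename += '.py'

    loader = RewritingLoader(module_name, filename)
    spec = importlib.util.spec_from_file_location(
        module_name, filename, loader=loader)
    module = importlib.util.module_from_spec(spec)

    # Register before executing so circular imports find the module,
    # and unregister again if execution fails.
    sys.modules[module_name] = module
    try:
        loader.exec_module(module)
    except BaseException:
        del sys.modules[module_name]
        raise
    return module
```

Given a file mod.py containing ANSWER = 21 in some directory d, import_rewritten(d, 'mod').ANSWER would be 42 under this stand-in transform. As far as I can tell, combining these two abstract base classes (rather than subclassing importlib.machinery.SourceFileLoader) also means the rewritten code is not written to __pycache__, which seems desirable when the bytecode no longer matches the source.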

Context

StackExchange Code Review Q#37083, answer score: 4
