Recursively Save Python Dictionaries to HDF5 Files Using h5py


Problem

I have a bunch of custom classes for which I've implemented a method of saving files in HDF5 format using the h5py module.

A bit of background: I first implemented a serialization interface that represents the data in each class as a dictionary containing only specific types (at the moment, numpy.ndarray, numpy.int64, numpy.float64, str, and other dictionaries). The advantage of this restriction is that every value is a type h5py can store by default. I was surprised to find a dearth of code tutorials on recursively saving dictionaries to HDF5 files, so I would really appreciate feedback on my implementation.
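To make the restriction above concrete, here is a minimal sketch of a check that walks a nested dictionary and reports any value outside the allowed set. The names `ALLOWED_LEAF_TYPES` and `find_unsupported` are hypothetical and not part of my actual classes; this only illustrates the constraint.

```python
import numpy as np

# Leaf types the question allows in a serialized dictionary.
ALLOWED_LEAF_TYPES = (np.ndarray, np.int64, np.float64, str)


def find_unsupported(dic, path="/"):
    """Return (path, type name) pairs for keys or values that are not allowed.

    Hypothetical helper for illustration only.
    """
    problems = []
    for key, item in dic.items():
        if not isinstance(key, str):
            # HDF5 member names are strings, so non-string keys are rejected.
            problems.append((path + repr(key), type(key).__name__))
        elif isinstance(item, dict):
            # Nested dictionaries are checked recursively, one path level down.
            problems.extend(find_unsupported(item, path + key + "/"))
        elif not isinstance(item, ALLOWED_LEAF_TYPES):
            problems.append((path + key, type(item).__name__))
    return problems


good = {"name": "run1", "n": np.int64(3), "sub": {"w": np.zeros(2)}}
bad = {"name": "run1", "sub": {"items": [1, 2, 3]}}

print(find_unsupported(good))  # []
print(find_unsupported(bad))   # [('/sub/items', 'list')]
```

Collecting all problems with their paths, rather than failing on the first one, makes it easier to see which class produced a bad representation.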

Imports:

import numpy as np
import h5py
import os
import types


Saving the data:

```
@classmethod
def __save_dict_to_hdf5__(cls, dic, filename):
    """
    Save a dictionary whose contents are only strings, np.float64, np.int64,
    np.ndarray, and other dictionaries following this structure
    to an HDF5 file. These are the sorts of dictionaries that are meant
    to be produced by the ReportInterface__to_dict__() method.
    """
    assert not os.path.exists(filename), 'this is a noclobber operation bud'
    with h5py.File(filename, 'w') as h5file:
        cls.__recursively_save_dict_contents_to_group__(h5file, '/', dic)

@classmethod
def __recursively_save_dict_contents_to_group__(cls, h5file, path, dic):
    """
    Take an already open HDF5 file and insert the contents of a dictionary
    at the current path location. Can call itself recursively to fill
    out HDF5 files with the contents of a dictionary.
    """
    assert type(dic) is types.DictionaryType, "must provide a dictionary"
    assert type(path) is types.StringType, "path must be a string"
    assert type(h5file) is h5py._hl.files.File, "must be an open h5py file"
    for key in dic:
        assert type(key) == types.StringType, 'dict keys must be strings to save to hdf5'
        if type(dic[key]) in (np.int64, np.fl
```

Solution

Here's what I tested.

I took out the classmethod stuff to make it easier to read, and simplified names a bit. I'll defer judgement on whether that stuff is needed as part of a larger package or not.

My h5py is installed for Python 3, so I had to change the handling of types. Using isinstance is, I think, the preferred way to test types, but it's not something I've focused on. Most of my code changes are in the recursive write function.
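As a small illustration of the difference between an exact type comparison and isinstance (a sketch only, using a NumPy scalar as the example value):

```python
import numpy as np

x = np.float64(1.5)

# An exact type comparison rejects subclasses.
print(type(x) is float)                        # False

# isinstance accepts subclasses; np.float64 subclasses Python's float.
print(isinstance(x, float))                    # True

# A tuple of types checks several alternatives in one call.
print(isinstance(x, (np.int64, np.float64)))   # True
```

The tuple form is what lets the write function test for all the storable leaf types in a single condition.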

I'll let others focus on preferred naming conventions and error checking.

import numpy as np
import h5py
import os
def save_dict_to_hdf5(dic, filename):
    """
    ....
    """
    with h5py.File(filename, 'w') as h5file:
        recursively_save_dict_contents_to_group(h5file, '/', dic)

def recursively_save_dict_contents_to_group(h5file, path, dic):
    """
    ....
    """
    for key, item in dic.items():
        if isinstance(item, (np.ndarray, np.int64, np.float64, str, bytes)):
            h5file[path + key] = item
        elif isinstance(item, dict):
            recursively_save_dict_contents_to_group(h5file, path + key + '/', item)
        else:
            raise ValueError('Cannot save %s type'%type(item))

def load_dict_from_hdf5(filename):
    """
    ....
    """
    with h5py.File(filename, 'r') as h5file:
        return recursively_load_dict_contents_from_group(h5file, '/')

def recursively_load_dict_contents_from_group(h5file, path):
    """
    ....
    """
    ans = {}
    for key, item in h5file[path].items():
        if isinstance(item, h5py.Dataset):
            # read the whole dataset; .value was removed in h5py 3.0
            ans[key] = item[()]
        elif isinstance(item, h5py.Group):
            ans[key] = recursively_load_dict_contents_from_group(h5file, path + key + '/')
    return ans

if __name__ == '__main__':

    data = {'x': 'astring',
            'y': np.arange(10),
            'd': {'z': np.ones((2,3)),
                  'b': b'bytestring'}}
    print(data)
    filename = 'test.h5'
    save_dict_to_hdf5(data, filename)
    dd = load_dict_from_hdf5(filename)
    print(dd)
    # should test for bad type


with results:

0858:~/mypy$ python3.4 cr120802.py 
{'x': 'astring', 'd': {'b': b'bytestring', 'z': array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])}, 'y': array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])}
{'x': 'astring', 'd': {'b': b'bytestring', 'z': array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])}, 'y': array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])}
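The "should test for bad type" comment could be covered along these lines. This is a sketch, not a full test suite: `save_group` and `load_group` are shortened, hypothetical stand-ins for the two recursive functions, and it writes into a temporary directory so no existing file gets clobbered. Strings are left out of the comparison on purpose because, as far as I know, h5py 3.x hands stored str values back as bytes, so a plain equality check on them depends on the installed version.

```python
import os
import tempfile

import h5py
import numpy as np


def save_group(h5file, path, dic):
    """Write the dictionary's contents under `path`, recursing into sub-dicts."""
    for key, item in dic.items():
        if isinstance(item, (np.ndarray, np.int64, np.float64, str, bytes)):
            h5file[path + key] = item
        elif isinstance(item, dict):
            save_group(h5file, path + key + "/", item)
        else:
            raise ValueError("Cannot save %s type" % type(item))


def load_group(h5file, path):
    """Read the group at `path` back into a nested dictionary."""
    ans = {}
    for key, item in h5file[path].items():
        if isinstance(item, h5py.Dataset):
            ans[key] = item[()]  # read the full dataset into memory
        elif isinstance(item, h5py.Group):
            ans[key] = load_group(h5file, path + key + "/")
    return ans


with tempfile.TemporaryDirectory() as tmp:
    data = {"y": np.arange(10), "d": {"z": np.ones((2, 3))}}

    # Round trip: write the nested dictionary, then read it back.
    filename = os.path.join(tmp, "test.h5")
    with h5py.File(filename, "w") as f:
        save_group(f, "/", data)
    with h5py.File(filename, "r") as f:
        loaded = load_group(f, "/")

    arrays_match = (np.array_equal(loaded["y"], data["y"])
                    and np.array_equal(loaded["d"]["z"], data["d"]["z"]))

    # Bad type: a plain list is not in the accepted set and should be rejected.
    bad_type_rejected = False
    with h5py.File(os.path.join(tmp, "bad.h5"), "w") as f:
        try:
            save_group(f, "/", {"items": [1, 2, 3]})
        except ValueError:
            bad_type_rejected = True

print(arrays_match, bad_type_rejected)
```

Comparing arrays with np.array_equal avoids the ambiguous truth value you get from `==` on whole dictionaries that contain arrays.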


Context

StackExchange Code Review Q#120802, answer score: 12
