Reading and processing histograms stored in ROOT files
Problem
The Context
My daily workflow largely consists of producing, styling, and circulating plots from a dataset to my advisor and collaborators. We use the C++ framework ROOT to generate and store histograms and I am writing my code in Python to take advantage of its Python bindings (PyROOT).
Since a ROOT file is the fundamental unit of our datasets, I wrote a simple context manager to facilitate the common task of opening a ROOT file, retrieving some histograms, and then closing the file.
```python
import ROOT


class HistogramFile(object):

    def __init__(self, filename):
        self.filename = filename

    def __enter__(self):
        self.file = ROOT.TFile.Open(self.filename, 'read')
        return self

    def __exit__(self, exception_type, exception_value, traceback):
        self.file.Close()

    def get_histogram(self, name):
        """Return the histogram identified by name from the file.
        """
        # The TFile::Get() method returns a pointer to an object stored in a ROOT file.
        hist = self.file.Get(name)
        if hist:
            return hist
        else:
            raise RuntimeError('Unable to retrieve histogram named {0} from {1}'.format(name, self.filename))
```

This allows me to write the following snippet (imports of necessary modules implied)
```python
f = ROOT.TFile.Open('dataset.root', 'read')
# Set up a canvas for plotting. The arguments are a name, an optional title, and the width and height in pixels.
canvas = ROOT.TCanvas('canvas', '', 500, 500)
hist = f.Get('electron_momentum')
hist.Draw()
canvas.SaveAs('plot.pdf')
f.Close()
```

in a more idiomatic fashion:
```python
with HistogramFile('dataset.root') as f:
    canvas = ROOT.TCanvas('canvas', '', 500, 500)
    hist = f.get_histogram('electron_momentum')
    hist.Draw()
    canvas.SaveAs('plot.pdf')
```

A dataset is often a collection of multiple ROOT files, so to make a plot I need to sum the histograms with the same name from each of its files together. The following snippet does n…
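The question's own summing code is cut off at this point. As a hypothetical sketch only (the helper name `sum_histograms` is illustrative, not the author's), summing one histogram across several open files can rely on the ROOT `TH1` methods `Clone`, `SetDirectory`, and `Add`. This leaves the histograms stored in the files untouched. The sketch assumes that every file holds a histogram with that name and identical binning.

```python
def sum_histograms(hist_files, name):
    """Return a new histogram holding the sum of histogram `name` over all files.

    Hypothetical helper: `hist_files` is a non-empty sequence of open objects
    with a get_histogram(name) method, e.g. HistogramFile instances used inside
    their `with` blocks. The histograms are assumed to be ROOT TH1 objects
    with identical binning.
    """
    if not hist_files:
        raise ValueError('at least one file is needed to sum histograms')
    histograms = [f.get_histogram(name) for f in hist_files]
    # Clone the first histogram under a new name, so the stored histogram
    # is not modified and the copy does not clash with it by name.
    total = histograms[0].Clone(name + '_sum')
    # Detach the clone from the current ROOT directory (typically the last
    # file opened; 0 means "no directory"), so closing that file does not
    # delete the sum.
    total.SetDirectory(0)
    # TH1::Add adds the contents of another histogram bin by bin.
    for hist in histograms[1:]:
        total.Add(hist)
    return total
```

Assuming two hypothetical files opened with `with HistogramFile('part1.root') as a, HistogramFile('part2.root') as b:`, you would keep a reference, `total = sum_histograms([a, b], 'electron_momentum')`, and then call `total.Draw()` on it.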
Solution
Given the documentation of both Python 3's `contextlib` and `contextlib2`, I'd say your usage is pretty standard for the tools at play.
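Among those tools, `contextlib.ExitStack` (which `contextlib2` backports to Python 2) is the usual way to keep a variable number of files open at once, which is what a multi-file dataset needs. Here is a minimal sketch, where `open_all` is a hypothetical helper name and `open_file` would be something like the `HistogramFile` class:

```python
from contextlib import contextmanager

try:
    from contextlib import ExitStack  # Python 3.3+
except ImportError:
    from contextlib2 import ExitStack  # backport for Python 2


@contextmanager
def open_all(filenames, open_file):
    """Enter open_file(name) for every name and yield the entered objects.

    Hypothetical helper: `open_file` is any callable returning a context
    manager, such as HistogramFile. If opening one file fails, the files
    already opened are closed before the error propagates. On normal exit
    all files are closed in reverse order of opening.
    """
    with ExitStack() as stack:
        yield [stack.enter_context(open_file(name)) for name in filenames]
```

For example, `with open_all(dataset_files, HistogramFile) as files:` would give a list of open `HistogramFile` objects for the duration of the block.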
However, there is something bothering me a bit in your code:
```python
def get_histogram(self, name):
    """Return the histogram identified by name from the file.
    """
    # The TFile::Get() method returns a pointer to an object stored in a ROOT file.
    hist = self.file.Get(name)
    if hist:
        return hist
    else:
        raise RuntimeError('Unable to retrieve histogram named {0} from {1}'.format(name, self.filename))
```

Why raise a general-purpose `RuntimeError`? If anyone wants to use your code and handle failures, they may catch more than they should.

As PEP 8 says:
> Derive exceptions from `Exception` rather than `BaseException`. Direct inheritance from `BaseException` is reserved for exceptions where catching them is almost always the wrong thing to do.
>
> Design exception hierarchies based on the distinctions that code catching the exceptions is likely to need, rather than the locations where the exceptions are raised. Aim to answer the question "What went wrong?" programmatically, rather than only stating that "A problem occurred" (see PEP 3151 for an example of this lesson being learned for the builtin exception hierarchy)
>
> Class naming conventions apply here, although you should add the suffix "Error" to your exception classes if the exception is an error. Non-error exceptions that are used for non-local flow control or other forms of signaling need no special suffix.
So I’d rather write:
```python
class HistogramNotFoundError(KeyError):
    pass


def get_histogram(self, name):
    hist = self.file.Get(name)
    # A null pointer returned by TFile::Get() is falsy in PyROOT.
    if not hist:
        raise HistogramNotFoundError(name)
    return hist
```

The choice of `KeyError` as a base is a bit arbitrary, but I feel it fits nicely.

One last thing: if you intend to create a lot of canvases to draw on, you may also be interested in wrapping that in a context manager. You can do this either by writing a class like you did (but checking for an exception in the `__exit__` method before saving) or by using the `@contextlib.contextmanager` decorator:

```python
from contextlib import contextmanager

import ROOT


@contextmanager
def canvas(name, filename, title, width, height):
    canvas = ROOT.TCanvas(name, title, width, height)
    yield canvas
    # Only reached if the body of the with block did not raise.
    canvas.SaveAs(filename)
```

I'm not using a `try: ... finally:` here so that no file is generated if the canvas was not properly drawn. If the body of the `with` block raises, the exception is re-raised at the `yield`, so `SaveAs` is never reached.

Usage would then be:

```python
dataset_files = ['dataset_part1.root', 'dataset_part2.root', 'dataset_part3.root']
with Dataset(*dataset_files) as dataset, canvas('canvas', 'plot.pdf', '', 500, 500):
    hist = dataset.get_histogram('electron_momentum')
    hist.Draw()
```
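The class-based alternative mentioned above could look like the following sketch. `CanvasFile` and its `canvas_factory` parameter are hypothetical names. In real code you would pass `ROOT.TCanvas` as the factory. Here a `FakeCanvas` stand-in lets the example run without ROOT. The example also shows the payoff of deriving `HistogramNotFoundError` from `KeyError`: the caller can catch the failure with a plain `except KeyError`.

```python
class HistogramNotFoundError(KeyError):
    pass


class CanvasFile(object):
    """Class-based counterpart of the canvas() context manager (a sketch)."""

    def __init__(self, name, filename, title, width, height, canvas_factory):
        self.args = (name, title, width, height)
        self.filename = filename
        self.canvas_factory = canvas_factory  # e.g. ROOT.TCanvas

    def __enter__(self):
        self.canvas = self.canvas_factory(*self.args)
        return self.canvas

    def __exit__(self, exc_type, exc_value, traceback):
        # Save only when the with-block finished cleanly; returning False
        # lets any exception propagate to the caller.
        if exc_type is None:
            self.canvas.SaveAs(self.filename)
        return False


class FakeCanvas(object):
    """Hypothetical stand-in for ROOT.TCanvas that records saved files."""
    saved = []

    def __init__(self, name, title, width, height):
        self.name = name

    def SaveAs(self, filename):
        FakeCanvas.saved.append(filename)


# A clean block writes the file.
with CanvasFile('canvas', 'plot.pdf', '', 500, 500, FakeCanvas):
    pass

# A failing block writes nothing, and the error is caught as a KeyError.
try:
    with CanvasFile('canvas', 'broken.pdf', '', 500, 500, FakeCanvas):
        raise HistogramNotFoundError('electron_momentum')
except KeyError as err:
    print('skipped plot, missing histogram: {0}'.format(err))

print(FakeCanvas.saved)  # ['plot.pdf']
```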
StackExchange Code Review Q#158004, answer score: 5