snippetpythonMinor
Parse OpenFoam files in Python
Viewed 0 times
openfoamfilespythonparse
Problem
I'm writing a Python parser for OpenFoam mesh files. When the mesh is big, I face performance issues.
Here is the format of the file describing the points:
The file describing the tetrahedron points:
`/--------------------------------- C++ -----------------------------------\
| ========= | |
| \\ / F ield | OpenFOAM: The Open Source CFD Toolbox |
| \\ / O peration | Version: 2.2.0 |
| \\ / A nd | Web: www.OpenFOAM.org |
| \\/ M anipulation | |
\---------------------------------------------------------------------------/
FoamFile
{
version 2.0;
format ascii;
class faceList;
location "const
Here is the format of the file describing the points:
/--------------------------------- C++ -----------------------------------\
| ========= | |
| \\ / F ield | OpenFOAM: The Open Source CFD Toolbox |
| \\ / O peration | Version: 2.2.0 |
| \\ / A nd | Web: www.OpenFOAM.org |
| \\/ M anipulation | |
\---------------------------------------------------------------------------/
FoamFile
{
version 2.0;
format ascii;
class vectorField;
location "constant/polyMesh";
object points;
}
// * //
10
(
(2.14633 0.955 -0.627026)
(2.14633 1.005 -0.627026)
(4.0935 0.955 -0.389604)
(4.0935 1.005 -0.389604)
(0.199157 0.955 -0.864447)
(0.199157 1.005 -0.864447)
(3.075 1.005 0.562347)
(3.11114 1.005 0.558563)
(3.075 0.955 0.562347)
(3.11114 0.955 0.558563)
)
// *** //
The file describing the tetrahedron points:
`/--------------------------------- C++ -----------------------------------\
| ========= | |
| \\ / F ield | OpenFOAM: The Open Source CFD Toolbox |
| \\ / O peration | Version: 2.2.0 |
| \\ / A nd | Web: www.OpenFOAM.org |
| \\/ M anipulation | |
\---------------------------------------------------------------------------/
FoamFile
{
version 2.0;
format ascii;
class faceList;
location "const
Solution
A few suggestions:
So overall, here is how I would structure the code:
```
def clean_lines(lines):
for line in lines:
line = line.strip().strip(';')
if not line:
continue
yield line
def consume_lines(lines, targ):
for line in lines:
if line == targ:
return
def header_parser(lines):
consume_lines(lines, '{')
res = {}
for line in lines:
if line == '}':
break
key, value = line.split(maxsplit=1)
res[key] = value.strip('"')
return res
def boundary_parser(lines):
consume_lines(lines, '(')
sections = {}
for line in lines:
if line == ')':
break
if line != '{':
name = line
sections[name] = {}
continue
for subline in lines:
if subline == '}':
break
key, value = subline.split(maxsplit=1)
sections[name][key] = value
return sections
def points_parser(lines):
consume_lines(lines, '(')
points = []
for line in lines:
if line == ')':
break
points.append(np.fromstring(line[1:-1], sep=' '))
return points
def faces_parser(lines):
consume_lines(lines, '(')
faces = []
for line in lines:
if line == ')':
break
faces.append(np.fromstring(line[2:-1], dtype=np.int32, sep=' '))
return faces
class ParsedMesh(object):
def __init__(self, dirname):
self.dirname = dirname
self.parse_boundary_file()
self.parse_points_file()
self.parse_faces_file()
def _parser(self, parser, fname, dirname):
if dirname is None:
dirname = self.dirname
if dirname:
fname = os.path.join(dirname, fname)
with open(fname) as fobj:
lines = clean_lines(fobj)
header = header_parser(lines)
parsed = parser(lines)
return parsed, header
def parse_boundary_file(self, fname='boundary', dirname=None):
self.boundary
- Follow pep8. Your code is pretty good already, but your naming in particular is not correct.
- Although I can see the advantage of grouping everything into a class, the individual file parsers are not directly dependent on the class. I would split those out into their own functions. This will make testing in particular easier. You can then have wrapper methods in the class that call the parser functions.
- You are loading every file completely into a list. This takes a lot of memory for large files. Worse, it requires parsing the entire list twice, once for the header and once for the body. This is likely a big source of your performance issues. It would be much more memory-efficient to iterate over the lines in the file. I would recommend turning the reader into a generate that takes an iterator (which will usually be the file, but could be arbitrary lists of strings for testing), does the stripping, yields the stripped line, and skips empty lines. This has the added advantage that it will keep track of your progress, so you don't need to go back and read through the header again when you want to parse the body.
- If you use a generator, you can create a for loop that runs until it reaches the part you want, then breaks, then you can have a second for loop that picks up where the first left off. This will greatly reduce the number of
iftests you have to do.
- You are parsing the list of numbers yourself. Don't, numpy has a
fromstringfunction that can parse a string to a numpy array for you. It is much faster than your approach. This is also likely a major source of performance issues.
- You should always use
withfor opening files. This safely closes them even when there is an error. In the default Python version, files are closed automatically when the function exits, but that doesn't necessarily happen in other Python implementations like Pypy. It is much safer to usewithto close the files exactly when you intend to.
- You can use tuple unpacking to split the lines for your dicts. So
key, value = el.split().
- You create a class, but then parse everything into a single
dictthat holds everything. That defeats the purpose of having a class. You should either parse the components into class attributes, or you should just use functions and return a singledict.
- You hard-code the file names. I would make the file names arguments, with the default argument being the default name.
repis a directory, not a repetition. The repetition may be in the directory name, but there is no reason it has to be. So this is stylistic, but I would call itdirnameor something. There is no reason to mentally limit how you organize your files like that.
- You make all the parsers subfunctions of
__init__. This again defeats the purpose of having a class. They should be methods.
- Your classes should derive from
object.
So overall, here is how I would structure the code:
```
def clean_lines(lines):
for line in lines:
line = line.strip().strip(';')
if not line:
continue
yield line
def consume_lines(lines, targ):
for line in lines:
if line == targ:
return
def header_parser(lines):
consume_lines(lines, '{')
res = {}
for line in lines:
if line == '}':
break
key, value = line.split(maxsplit=1)
res[key] = value.strip('"')
return res
def boundary_parser(lines):
consume_lines(lines, '(')
sections = {}
for line in lines:
if line == ')':
break
if line != '{':
name = line
sections[name] = {}
continue
for subline in lines:
if subline == '}':
break
key, value = subline.split(maxsplit=1)
sections[name][key] = value
return sections
def points_parser(lines):
consume_lines(lines, '(')
points = []
for line in lines:
if line == ')':
break
points.append(np.fromstring(line[1:-1], sep=' '))
return points
def faces_parser(lines):
consume_lines(lines, '(')
faces = []
for line in lines:
if line == ')':
break
faces.append(np.fromstring(line[2:-1], dtype=np.int32, sep=' '))
return faces
class ParsedMesh(object):
def __init__(self, dirname):
self.dirname = dirname
self.parse_boundary_file()
self.parse_points_file()
self.parse_faces_file()
def _parser(self, parser, fname, dirname):
if dirname is None:
dirname = self.dirname
if dirname:
fname = os.path.join(dirname, fname)
with open(fname) as fobj:
lines = clean_lines(fobj)
header = header_parser(lines)
parsed = parser(lines)
return parsed, header
def parse_boundary_file(self, fname='boundary', dirname=None):
self.boundary
Code Snippets
def clean_lines(lines):
for line in lines:
line = line.strip().strip(';')
if not line:
continue
yield line
def consume_lines(lines, targ):
for line in lines:
if line == targ:
return
def header_parser(lines):
consume_lines(lines, '{')
res = {}
for line in lines:
if line == '}':
break
key, value = line.split(maxsplit=1)
res[key] = value.strip('"')
return res
def boundary_parser(lines):
consume_lines(lines, '(')
sections = {}
for line in lines:
if line == ')':
break
if line != '{':
name = line
sections[name] = {}
continue
for subline in lines:
if subline == '}':
break
key, value = subline.split(maxsplit=1)
sections[name][key] = value
return sections
def points_parser(lines):
consume_lines(lines, '(')
points = []
for line in lines:
if line == ')':
break
points.append(np.fromstring(line[1:-1], sep=' '))
return points
def faces_parser(lines):
consume_lines(lines, '(')
faces = []
for line in lines:
if line == ')':
break
faces.append(np.fromstring(line[2:-1], dtype=np.int32, sep=' '))
return faces
class ParsedMesh(object):
def __init__(self, dirname):
self.dirname = dirname
self.parse_boundary_file()
self.parse_points_file()
self.parse_faces_file()
def _parser(self, parser, fname, dirname):
if dirname is None:
dirname = self.dirname
if dirname:
fname = os.path.join(dirname, fname)
with open(fname) as fobj:
lines = clean_lines(fobj)
header = header_parser(lines)
parsed = parser(lines)
return parsed, header
def parse_boundary_file(self, fname='boundary', dirname=None):
self.boundary, self.boundary_hdr = self._parser(boundary_parser,
fname=fname,
dirname=dirname)
def parse_points_file(self, fname='points', dirname=None):
self.points, self.points_hdr = self._parser(points_parser,
fname=fname,
dirname=dirname)
def parse_faces_file(self, fname='faces', dirname=None):
self.faces, self.faces_hdr = self._parser(faces_parser,
fname=fname,
dirname=dirname)Context
StackExchange Code Review Q#101191, answer score: 4
Revisions (0)
No revisions yet.