Parsing Gaussian 09 output for the energy statement in one or more files and reformatting it as a table
Problem
I am a computational chemist working with the program Gaussian 09. After I manually check the output(s), I want to create a summary for easier processing of the obtained values, which also saves me from opening all the files again and again. The script searches for the last occurrence of the energy statement. A portion of the output will be at the end of the post.
The outputs can become quite long, and I am not completely satisfied with the performance of the script. It does its job, but if there are many big files, I can get a coffee in between. I know that it is still faster than doing it by hand, but if it could be improved I would be quite happy.

I am pretty certain the problem comes from actually finding the string, i.e. the `grep` command. I am using `tac` here, since there could be multiple occurrences, but I am only interested in the last one. I have tried some of these solutions, too, but `tac | grep` was the fastest. Depending on the number of steps in the optimisation, it is therefore easier to read from the back. Since I have already checked the file, I also know the last value is the one I want. Unfortunately, this is slow if the job also computes properties, because then there is a huge block of properties at the end of the file. I don't know how I could skip it.

```
#!/bin/bash
# Find energy statement from a Gaussian 09 calculation
# Find energy statement from all G09 log files in working directory
findEnergy ()
{
    # Initiate variables necessary for parsing output
    local readWholeLine pattern functional energy cycles
    # Find match from the end of the file
    # Ref: https://unix.stackexchange.com/q/112159/160000
    readWholeLine=$(tac $1 | grep -m1 'SCF Done')
    # Gaussian output has following format, trap important information
    pattern="(E\(.+\)) = (.+) A\.U\.[^0-9]+([0-9]+) cycles"
    if [[ $readWholeLine =~ $pattern ]]
    then
        functional="${BASH_REMATCH[1]}"
        energy="${BASH_REMATCH[2]}"
        cycles="${BASH_REMATCH[3]}"
    fi
    # Print the line, format
```
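A minimal sketch of the parsing step, assuming an `SCF Done` line spaced like the sample below (the sample line, its values, and the function name `parseScfLine` are hypothetical, not taken from the post). It swaps the question's `(.+)` groups for narrower character classes, since with padding like this a greedy `(.+)` can keep the spaces around the energy, and it prints one row with the column widths of the header used elsewhere in the script (the row layout itself is an assumption).

```shell
#!/bin/bash
# Hypothetical helper: extract functional, energy and cycle count from one
# "SCF Done" line. The exact spacing of real Gaussian output is assumed here.
parseScfLine() {
    local line=$1 pattern
    # Groups: 1 = E(...) label, 2 = energy without padding, 3 = cycle count
    pattern='(E\([^)]+\))[[:space:]]*=[[:space:]]*(-?[0-9]+\.[0-9]+)[[:space:]]+A\.U\.[^0-9]+([0-9]+)[[:space:]]+cycles'
    # The pattern variable must stay unquoted so [[ =~ ]] treats it as a regex
    if [[ $line =~ $pattern ]]; then
        # Assumed row layout: same widths as the summary header
        printf "%-15s %20s ( %6s )\n" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    else
        # No match: signal failure to the caller
        return 1
    fi
}

# Hypothetical sample line
sample=' SCF Done:  E(RB3LYP) =  -76.4089533418     A.U. after    9 cycles'
parseScfLine "$sample"
```

Returning a non-zero status on a non-matching line lets a caller skip files that never reached an `SCF Done` statement instead of printing an empty row.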
Solution
I don't see any obvious reason why this script should be slow.
There are no unnecessary sub-processes, no compute-heavy operations or nested loops, and so I don't know how to help you speed this up. The `tac | grep -m1` combination is well suited to its intended purpose, and I don't think a specialized custom implementation would make a significant difference.
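To check this on your own data, the sketch below (hypothetical function names and a generated test log) confirms that `tac | grep -m1` and a forward `grep | tail -n1` return the same last `SCF Done` line. The backward version can stop at the first hit from the end, whereas the forward one always reads the whole file. Neither skips a long properties block that comes after the last match, because `tac` has to read through that block before it reaches the match.

```shell
#!/bin/bash
# Last "SCF Done" line, reading the file backwards.
# grep -m1 exits after the first hit, which then stops tac as well.
lastFromEnd() { tac "$1" | grep -m1 'SCF Done'; }

# Same result, but grep scans front to back and tail keeps the last hit.
lastFromStart() { grep 'SCF Done' "$1" | tail -n1; }

# Hypothetical test log: three SCF lines followed by a long trailing block
logfile=$(mktemp)
trap 'rm -f "$logfile"' EXIT
for i in 1 2 3; do
    printf ' SCF Done:  E(RB3LYP) =  -76.40%s     A.U. after   %s cycles\n' "$i" "$i"
done > "$logfile"
seq 1 100000 | sed 's/^/ property line /' >> "$logfile"

# Compare the two on this file (time reports on stderr)
time lastFromEnd "$logfile"
time lastFromStart "$logfile"
```

Pointing both functions at a few of your real log files, still wrapped in `time`, shows which one is faster for your job sizes.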
I only have a few tips in terms of technique.
In `getAll`, part of the code is identical to what you have in `getOnly`. As such, you can reuse that and call `getOnly` from `getAll`:

```
getAll() {
    # run over all commandfiles
    # ToDo: specify file suffixes
    local commandfile logfile
    printf "%-25s %s\n" "Summary for " "${PWD#\/*\/*\/}"
    printf "%-25s %s\n\n" "Created " "$(date +"%Y/%m/%d %k:%M:%S")"
    # Print a header
    printf "%-25s %-15s %20s ( %6s )\n" "Command file" "Functional" "Energy / Hartree" "cycles"
    for commandfile in *com; do
        getOnly "$commandfile"
    done
}
```

This code is a bit sloppy, because if `$1` is empty, then the `while` loop doesn't need to be executed:

```
if [[ -z $1 ]]; then getAll; fi
while [[ ! -z $1 ]]; do
    getOnly $1
    shift
done
```

It would be more appropriate to move the loop inside the `else` branch:

```
if [[ -z $1 ]]; then
    getAll
else
    while [[ ! -z $1 ]]; do
        getOnly $1
        shift
    done
fi
```

I'm guessing that the intention here is to call `getAll` if there are no arguments. But strictly speaking, `[[ -z $1 ]]` doesn't mean there are no arguments; it just means that the first argument is empty. The correct way to check that there are no arguments:

```
if [[ $# == 0 ]]; then
```

And instead of a `while` loop, it would be more natural to use a `for` loop here:

```
for commandfile in "$@"; do
    getOnly "$commandfile"
done
```

Lastly, in many places you did not quote variables that are paths. I'm guessing you did that because you are certain they will never contain spaces. Even so, it's a good habit to double-quote such variables.
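Putting these suggestions together, a sketch of the top-level dispatch might look like this. The `getOnly` and `getAll` bodies here are placeholder stand-ins (the real ones live in the reviewed script), and `main` is a hypothetical wrapper name.

```shell
#!/bin/bash
# Placeholder stand-ins for the script's real functions (illustration only)
getOnly() { printf 'getOnly: %s\n' "$1"; }
getAll()  { printf 'getAll\n'; }

# Hypothetical entry point combining the suggestions above
main() {
    if [[ $# == 0 ]]; then
        # No arguments at all: summarise every command file
        getAll
    else
        local commandfile
        # Quoted "$@" keeps file names containing spaces intact
        for commandfile in "$@"; do
            getOnly "$commandfile"
        done
    fi
}

main "$@"
```

With this version, `main ""` calls `getOnly` with an empty name rather than `getAll`, because one (empty) argument was passed; that is exactly the difference between testing `$#` and testing `-z $1`.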
Context
StackExchange Code Review Q#129854, answer score: 5