Replace one-liner sed/awk with python
Problem
I have some files that I want to process, and I know how to do it in sed/awk (for each one):

    awk '{if (index($0,"#")!=1) {line++; if (line%3==1) {print $2,$3}}}' q.post > qor

or

    grep -v "#" q.post | awk '{if (NR%3==1) {print $2,$3}}'

It's one line, and rather beautiful and clear.
Now, my main program is in Python (2.7). Calling sed/awk from Python is a bit tedious (I get some error), and I'd rather use a nice Pythonic way to do it.
So far I have:

    pp_files = glob.glob("*gauss.post")
    for pp in pp_files:
        ppf = open(pp)
        with open(pp[:pp.rfind(".post")] + "_clean.post", "w") as outfile:
            counter = 0
            temp = []
            for line in ppf.readlines():
                if not line.startswith("#"):
                    temp.append(line)
            for line in temp:
                if counter % 3 == 0:
                    outfile.write(" ".join(line.split()[1:3]) + '\n')
                counter += 1
        ppf.close()

Meh.
It works, but it's not beautiful. Is there a Pythonic way, preferably a clear one-liner (not 10 nested list comprehensions), to replace awk and sed?
Thanks
Solution
First, you should add `open(pp)` to your `with` statement.
Always use `with` with `open`, because it always closes the file, even if there is an error.
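To illustrate the point about errors, here is a minimal sketch. The scratch file and the simulated exception are made up for the demonstration and are not part of the original code; the only thing it is meant to show is that the file is closed once the `with` block exits, even when an exception is raised inside it.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
fd, path = tempfile.mkstemp(suffix=".post")
os.close(fd)

try:
    with open(path, "w") as handle:
        handle.write("# header\n")
        # Simulate something going wrong in the middle of processing.
        raise ValueError("simulated failure while processing")
except ValueError:
    pass

# The with block closed the file on the way out, despite the exception.
closed_after_error = handle.closed
os.remove(path)
print(closed_after_error)
```

With a bare `open(pp)` and a later `ppf.close()`, as in the question, the `close()` call would be skipped if an exception occurred before it was reached.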
But onto your code. You seem to dislike comprehensions. I don't really get why.
Take your code:

    for line in ppf.readlines():
        if not line.startswith("#"):
            temp.append(line)

This can instead be:

    [line for line in ppf if not line.startswith("#")]

I know which I find easier to read. But if you don't like it, fair enough.
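As a quick check that the two forms do the same thing, here is a small sketch run on a made-up list of lines instead of a real file (the `sample` data is an assumption for illustration only):

```python
# Made-up sample lines standing in for the contents of a .post file.
sample = ["# comment\n", "1 a b\n", "2 c d\n", "# another\n", "3 e f\n"]

# Explicit loop, as in the question.
temp = []
for line in sample:
    if not line.startswith("#"):
        temp.append(line)

# Equivalent list comprehension.
kept = [line for line in sample if not line.startswith("#")]

print(kept == temp)
```

Iterating over the file object directly, as the comprehension does with `ppf`, yields the same lines as `ppf.readlines()` without building the intermediate list first.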
After this I'd slice the list, since you want every third line.
To do this we can use the slice operator. Say you have the string `abcdefghijk`, but you only want every third character.
You'd do `'abcdefghijk'[::3]`, which gives `adgj`.
This removes the need for `counter`, and so can simplify your code to:

    for pp in pp_files:
        with open(pp) as ppf, open(pp[:pp.rfind(".post")] + "_clean.post", "w") as outfile:
            for line in [line for line in ppf if not line.startswith("#")][::3]:
                outfile.write(" ".join(line.split()[1:3]) + '\n')

But if your file is large, this reads all of it into a list, then takes a third of that and puts it in another list.
That's bad. If you instead use a generator expression and `itertools.islice`, you can achieve the same as above, but the program will use less memory.

    from itertools import islice

    for pp in pp_files:
        with open(pp) as ppf, open(pp[:pp.rfind(".post")] + "_clean.post", "w") as outfile:
            for line in islice((line for line in ppf if not line.startswith("#")), 0, None, 3):
                outfile.write(" ".join(line.split()[1:3]) + '\n')
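If you want to convince yourself that the lazy version gives the same result as slicing a list, something like the following sketch may help. The helper name `every_third_fields` and the sample lines are made up for illustration; the helper takes any iterable of strings, so an open file would work in place of the list.

```python
from itertools import islice


def every_third_fields(lines):
    """Yield fields 2 and 3 of every third non-comment line.

    Hypothetical helper; `lines` is any iterable of strings, e.g. an open file.
    """
    # Generator expression: comment lines are skipped lazily, one line at a time.
    data = (line for line in lines if not line.startswith("#"))
    # islice(data, 0, None, 3) takes items 0, 3, 6, ... without building a list.
    for line in islice(data, 0, None, 3):
        yield " ".join(line.split()[1:3])


# Made-up sample input.
sample = [
    "# header\n",
    "0 x0 y0 z0\n",
    "1 x1 y1 z1\n",
    "# mid comment\n",
    "2 x2 y2 z2\n",
    "3 x3 y3 z3\n",
    "4 x4 y4 z4\n",
]

lazy = list(every_third_fields(sample))

# Same result via the list-and-slice approach, for comparison.
eager = [" ".join(line.split()[1:3])
         for line in [line for line in sample if not line.startswith("#")][::3]]

print(lazy)
```

Both approaches select the same lines. The difference is only that the `islice` version never holds more than one line in memory at a time.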
Context
StackExchange Code Review Q#148547, answer score: 10