patternpythonMinor
Justify text from a file using a LaTeX method
Viewed 0 times
filemethodtextlatexusingjustifyfrom
Problem
I wrote a bit of code that takes a file, justifies the text and writes to another file.
It uses a DP approach to minimise a badness metric, which is the amount of undershoot of the line length, cubed (see my answer below, the badness function isn't quite right if there is an overshoot, the badness should be
I encapsulated it in a
Is this a standard way to implement
It uses a DP approach to minimise a badness metric, which is the amount of undershoot of the line length, cubed (see my answer below, the badness function isn't quite right if there is an overshoot, the badness should be
inf - I believe this is how LaTeX justifies its text).I encapsulated it in a
class (which is the bit I'm least confident about). Be careful running it on big files, because I think it's got a pretty nasty complexity (\$O(n^2)\$ maybe).class justifyText:
def __init__(self, inFileName, outFileName, pageWidth):
with open(inFileName, "r") as fp:
self.words = [word for line in fp for word in line.split()]
self.outFileName = outFileName
self.pageWidth = pageWidth
self.memo = {}
self.breakPointTrack = {}
self.n = len(self.words)
def badness(self, i, j):
totalWidth = 0;
for word in self.words[i:j]:
totalWidth += len(word)
return abs((self.pageWidth - totalWidth)**3)
def minBadness(self, i):
if i in self.memo:
return self.memo[i]
if i == self.n:
f = 0
j_min = self.n
else:
f = None
for j in range(i + 1, self.n + 1):
temp = self.minBadness(j) + self.badness(i, j)
if (f is None) or (temp < f):
f = temp
j_min = j
self.memo[i] = f
self.breakPointTrack[i] = j_min
return f
def justify(self):
self.minBadness(0)
fOut = open(self.outFileName, "w")
brk = 0
while brk < self.n:
start = brk
brk = self.breakPointTrack[brk]
line = " ".join(self.words[start:brk]) + "\n"
fOut.write(line)
fOut.close()
test = justifyText("test.txt", "out.txt", 80)
test.justify()Is this a standard way to implement
Solution
You should have a look at Python's official style-guide, PEP8. One of its recommendations is to use
Since you read all words into memory anyway, you can just write:
This is because
Caching (or memoization) of a functions return value is best done with a decorator. This way you avoid interleaving the actual implementation of the function with the caching. One example could be:
Your
While it might be your most used use case, writing the output to a file prevents anyone from using this module in any other way. It would be better if
I'm not exactly sure why you have a lone
PascalCase for class names and lower_case for function and variable names.Since you read all words into memory anyway, you can just write:
with open(in_file_name) as fp:
self.words = fp.read().split()This is because
split splits by default on all whitespace (so both space and newlines). Also note that the default behaviour of open is to open a file in read mode, so "r" is also not needed here.Caching (or memoization) of a functions return value is best done with a decorator. This way you avoid interleaving the actual implementation of the function with the caching. One example could be:
import functools
def memoize_method(func):
cache = func.cache = {}
@functools.wraps(func)
def wrapper(self, n):
if n not in cache:
cache[n] = func(self, n)
return cache[n]
return wrapper
class JustifyText:
...
@memoize_method
def min_badness(self, i):
if i == self.n:
f, j_min = 0, self.n
else:
f, j_min = min((self.min_badness(j) + self.badness(i, j), j)
for j in range(i + 1, self.n + 1))
self.break_point_track[i] = j_min
return fYour
badness function can be simplified by using sum, similar to how I used min above:def badness(self, start, end):
total_width = sum(len(word) for word in self.words[start:end])
return abs((self.page_width - total_width)**3)While it might be your most used use case, writing the output to a file prevents anyone from using this module in any other way. It would be better if
justify just returned (even better and simpler: yielded) the justified text and writing to a file is left to the user or a dedicated method using justify internally.def justify(self):
self.min_badness(0) # no-op or to populate the cache?
brk = 0
while brk < self.n:
start, brk = brk, self.break_point_track[brk]
yield " ".join(self.words[start:brk]) + "\n"
def write(self):
with open(self.out_file_name, "w") as f_out:
f_out.writelines(self.justify())I'm not exactly sure why you have a lone
self.min_badness(0) in justify. is it just to populate the cache with the value for zero? If so, this is worth to put a comment here, because otherwise you might accidentally delete it.Code Snippets
with open(in_file_name) as fp:
self.words = fp.read().split()import functools
def memoize_method(func):
cache = func.cache = {}
@functools.wraps(func)
def wrapper(self, n):
if n not in cache:
cache[n] = func(self, n)
return cache[n]
return wrapper
class JustifyText:
...
@memoize_method
def min_badness(self, i):
if i == self.n:
f, j_min = 0, self.n
else:
f, j_min = min((self.min_badness(j) + self.badness(i, j), j)
for j in range(i + 1, self.n + 1))
self.break_point_track[i] = j_min
return fdef badness(self, start, end):
total_width = sum(len(word) for word in self.words[start:end])
return abs((self.page_width - total_width)**3)def justify(self):
self.min_badness(0) # no-op or to populate the cache?
brk = 0
while brk < self.n:
start, brk = brk, self.break_point_track[brk]
yield " ".join(self.words[start:brk]) + "\n"
def write(self):
with open(self.out_file_name, "w") as f_out:
f_out.writelines(self.justify())Context
StackExchange Code Review Q#147823, answer score: 9
Revisions (0)
No revisions yet.