Reading the bytes of a PDF
Problem
I'm quite new to Python and I want to speed up this function, since it takes a very long time, especially when the input file is several megabytes in size. Also, I couldn't figure out how to use Cython in the for loop. I'm using this function together with other functions to compare files byte by byte. Any recommendations?
# this function returns a file bytes in a list
filename1 = 'doc1.pdf'
def byte_target(filename1):
    f = open(filename1, "rb")
    try:
        b = f.read(1)
        tlist = []
        while True:
            # get file bytes
            t = ' '.join(format(ord(x), 'b') for x in b)
            b = f.read(1)
            if not b:
                break
            #add this byte to the list
            tlist.append(t)
            #print b
    finally:
        f.close()
    return tlist
Solution
It's not surprising that this is too slow:
you're reading data byte-by-byte.
For faster performance you would need to read larger buffers at a time.
If you want to compare files by content, use the
filecmp module from the standard library.

There are also some glaring problems with this code.
For example, instead of opening a file, doing something in a
try block and closing the file handle manually, you should use the recommended with statement:

with open(filename1, "rb") as f:
    b = f.read(1)
    # ...

Finally, the function name and all variable names are very poor,
and don't help the readers understand their purpose and what you're trying to do.
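To make the suggestions above concrete, here is a minimal sketch, assuming Python 3 (the question's code appears to be Python 2, where ord() is needed on each character). The names read_bits and same_content and the 64 KiB chunk size are illustrative choices, not something from the original code. Note that the original loop breaks before appending the last byte it read, so it silently drops the final byte of the file; this sketch keeps every byte.

```python
import filecmp


def read_bits(path, chunk_size=64 * 1024):
    """Return one binary string (e.g. '1000001') per byte of the file."""
    bits = []
    with open(path, "rb") as f:
        while True:
            # Read a large buffer at a time instead of a single byte.
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # In Python 3, iterating over bytes yields ints, so ord() is not needed.
            bits.extend(format(byte, "b") for byte in chunk)
    return bits


def same_content(path_a, path_b):
    """Compare two files by content; shallow=False forces a byte comparison."""
    return filecmp.cmp(path_a, path_b, shallow=False)
```

If the goal is only to know whether two files are identical, same_content alone is probably enough, and building the list of binary strings can be skipped entirely.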
Code Snippets
with open(filename1, "rb") as f:
    b = f.read(1)
    # ...
Context
StackExchange Code Review Q#92676, answer score: 5