patternpythonMinor
Simple directory and file serialization protocol for Python 2 and 3
Viewed 0 times
directoryfilesimpleprotocolserializationforpythonand
Problem
The first edition of this code was written back when Python 2 had not yet been deprecated in favor of Python 3. The difference between strings and bytes were not very clear back then. Also, PEP8 violations abound in the original version.
```
'''Module for Directory and File Serialization.
This module provides two classes that implement the
DFS (Directory and File Serialization) file format.'''
__version__ = '1.0'
import os as _os
import sys as _sys
################################################################################
FORMAT_DOCUMENTATION = '''\
Directory
Header
0,aaa,b,c,dd
0 = Directory
a = Pointer Size (bytes)
b = Name Size (bytes)
c = Content Flag
d = Type Code
00 = End
01 = Error
10 = Error
11 = Real
Pointer
Name Size
Name
File
Header
1,aaa,b,ccc
1 = File
a = Pointer Size (bytes)
b = Name Size (bytes)
c = Data Size (bytes)
Pointer
Name Size
Name
Data Size
Data'''
################################################################################
class Acquire:
'Acquire(destination) -> Acquire'
BUFF_SIZE = 2 ** 20
def __init__(self, destination):
'Initialize the Acquire object.'
self.__destination = destination
self.__destination_path = _os.path.abspath(destination.name) if hasattr(destination, 'name') else None
self.__archive = False
def acquire(self, source):
'Save source to destination.'
source = _os.path.abspath(source)
self.__pointer = 0
if self.__archive:
self.__destination.write('\0')
else:
self.__archive = True
if _os.path.isdir(source):
self.__dir(source, '\0')
elif _os.path.isfile(source):
if source == self.__destination_path:
raise ValueError,
```
'''Module for Directory and File Serialization.
This module provides two classes that implement the
DFS (Directory and File Serialization) file format.'''
__version__ = '1.0'
import os as _os
import sys as _sys
################################################################################
FORMAT_DOCUMENTATION = '''\
Directory
Header
0,aaa,b,c,dd
0 = Directory
a = Pointer Size (bytes)
b = Name Size (bytes)
c = Content Flag
d = Type Code
00 = End
01 = Error
10 = Error
11 = Real
Pointer
Name Size
Name
File
Header
1,aaa,b,ccc
1 = File
a = Pointer Size (bytes)
b = Name Size (bytes)
c = Data Size (bytes)
Pointer
Name Size
Name
Data Size
Data'''
################################################################################
class Acquire:
'Acquire(destination) -> Acquire'
BUFF_SIZE = 2 ** 20
def __init__(self, destination):
'Initialize the Acquire object.'
self.__destination = destination
self.__destination_path = _os.path.abspath(destination.name) if hasattr(destination, 'name') else None
self.__archive = False
def acquire(self, source):
'Save source to destination.'
source = _os.path.abspath(source)
self.__pointer = 0
if self.__archive:
self.__destination.write('\0')
else:
self.__archive = True
if _os.path.isdir(source):
self.__dir(source, '\0')
elif _os.path.isfile(source):
if source == self.__destination_path:
raise ValueError,
Solution
DFS (Directory and File Serialization) file format
Not sure what that is. Nor is google.
Recommend you offer a more specific citation, hopefully one that includes an URL.
Hmmm, that's odd. Do you really need to rename them?
The backwhack is very odd -- don't do it.
That's what triple quotes are for.
You apparently meant "0 = type is Directory". Similarly for "1 = File".
What?!?
You're apparently indicating that
and it denotes this many vs that many bytes?!?
How many bytes?
I understand that you think that's a docstring.
I don't understand what it's trying to tell me.
Same remark for
Putting such a docstring on a ctor is redundant;
consider deleting it.
Same remark for
Are you sure you want name mangling?
Consider using just a single leading
Same remark for
You should probably init this to zero in the ctor, as well.
It gives the reader a hint about which balls will be up in the air
during the lifetime of this object.
The meaning of
Oh, goodness!
Stop with the name mangling already!
Is there some inheritance use case coded up where mangling helps?
Just use a single
That's not a docstring, it's a
It doesn't help the caller figure out how to correctly call this private method.
It is clearly not a
Similarly, delete the 'Private module function.' docstrings; they're not helpful.
There is a pre-condition here:
That is not at all obvious.
Make it so, by initializing pointer in the ctor.
A single digit,
Rather than pasting the line in twice, parameterize it, highlighting the 3 / 7 contrast.
I was rather hoping to see a call to
but perhaps the bit packing is not a good fit for that.
That source line is Just Wrong.
Use the constant
Similarly for
Sure, I'll grant you there's a race here, the
But I think you wanted to assign
(Or maybe you wanted to re-open
That's the 2nd time you've defined that.
DRY.
Declare it in a common place that both classes can pull the value from.
This is obscure.
Yes, I understand the high bit distinguishes between file / dir,
and the two low order bits have been defined elsewhere.
That doesn't make
You probably want to define a helper function that knows such encoding details.
Ok, if you're going to hard code Magic Numbers,
at least pick one and stick with it.
Consider parsing out the high bit into its own variable, then use that.
I imagine the same helper would also return
PEP-8 asks that you name this
Similarly for
I imagine this is some very nice C code,
but in python land we encourage breaking out the occasional helper function.
Again, this is obscure, despite the very nice
you helpfully offered.
Break out a helper function with a descriptive name.
You apparently wrote some additional code in a separate code block,
but this review has gone on overlong already.
Summary: code is obscure, and needs unit tests added so we
can confidently re-factor.
Not sure what that is. Nor is google.
Recommend you offer a more specific citation, hopefully one that includes an URL.
import os as _os
import sys as _sysHmmm, that's odd. Do you really need to rename them?
FORMAT_DOCUMENTATION = '''\The backwhack is very odd -- don't do it.
That's what triple quotes are for.
0 = DirectoryYou apparently meant "0 = type is Directory". Similarly for "1 = File".
b = Name Size (bytes)What?!?
You're apparently indicating that
b is a single bit,and it denotes this many vs that many bytes?!?
How many bytes?
'Acquire(destination) -> Acquire'I understand that you think that's a docstring.
I don't understand what it's trying to tell me.
Same remark for
Release.'Initialize the Acquire object.'Putting such a docstring on a ctor is redundant;
consider deleting it.
Same remark for
Release.self.__destination = destinationAre you sure you want name mangling?
Consider using just a single leading
_ underscore for these three.Same remark for
Release.self.__pointer = 0You should probably init this to zero in the ctor, as well.
It gives the reader a hint about which balls will be up in the air
during the lifetime of this object.
The meaning of
self.__archive is not transparently obvious to me.def __dir(self, source, pointer):Oh, goodness!
Stop with the name mangling already!
Is there some inheritance use case coded up where mangling helps?
Just use a single
_ underscore.'Private class method.'That's not a docstring, it's a
# comment.It doesn't help the caller figure out how to correctly call this private method.
It is clearly not a
@classmethod, rather, each instantiated object owns it.Similarly, delete the 'Private module function.' docstrings; they're not helpful.
self.__pointer += 1There is a pre-condition here:
acquire() must execute before __dir() does.That is not at all obvious.
Make it so, by initializing pointer in the ctor.
self.__destination.write(chr((len(pointer) - 1 << 4) + (len(name_size) - 1 << 3) + 3) + pointer + name_size + name)A single digit,
3, is different from the preceding line, where it was 7.Rather than pasting the line in twice, parameterize it, highlighting the 3 / 7 contrast.
I was rather hoping to see a call to
pack(),but perhaps the bit packing is not a good fit for that.
source.seek(0, 2)That source line is Just Wrong.
Use the constant
os.SEEK_END rather than an obscure 2.Similarly for
0 Magic Number.position = source.tell()
source.seek(0, 2)
if position != source.tell():Sure, I'll grant you there's a race here, the
if might possibly trigger.But I think you wanted to assign
position before the while loop.(Or maybe you wanted to re-open
source?)BUFF_SIZE = 2 ** 20That's the 2nd time you've defined that.
DRY.
Declare it in a common place that both classes can pull the value from.
while header != -1 and (header > 127 or header & 3):This is obscure.
Yes, I understand the high bit distinguishes between file / dir,
and the two low order bits have been defined elsewhere.
That doesn't make
127 any less of a Magic Number, nor 3.You probably want to define a helper function that knows such encoding details.
if header < 128:Ok, if you're going to hard code Magic Numbers,
at least pick one and stick with it.
Consider parsing out the high bit into its own variable, then use that.
I imagine the same helper would also return
type_code.def EOF(self):PEP-8 asks that you name this
eof(), as it is not a class.Similarly for
self._eof.path = _os.path.join(self.__parents[_int(self.__read((header >> 4 & 7) + 1))], self.__read(_int(self.__read((header >> 3 & 1) + 1))))I imagine this is some very nice C code,
but in python land we encourage breaking out the occasional helper function.
if header >> 2 & 1:Again, this is obscure, despite the very nice
FORMAT_DOCUMENTATIONyou helpfully offered.
Break out a helper function with a descriptive name.
You apparently wrote some additional code in a separate code block,
but this review has gone on overlong already.
Summary: code is obscure, and needs unit tests added so we
can confidently re-factor.
Code Snippets
import os as _os
import sys as _sysFORMAT_DOCUMENTATION = '''\0 = Directoryb = Name Size (bytes)'Acquire(destination) -> Acquire'Context
StackExchange Code Review Q#155615, answer score: 3
Revisions (0)
No revisions yet.