HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Simple directory and file serialization protocol for Python 2 and 3

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
directoryfilesimpleprotocolserializationforpythonand

Problem

The first edition of this code was written back when Python 2 had not yet been deprecated in favor of Python 3. The difference between strings and bytes were not very clear back then. Also, PEP8 violations abound in the original version.

```
'''Module for Directory and File Serialization.

This module provides two classes that implement the
DFS (Directory and File Serialization) file format.'''

__version__ = '1.0'

import os as _os
import sys as _sys

################################################################################

FORMAT_DOCUMENTATION = '''\
Directory
Header
0,aaa,b,c,dd
0 = Directory
a = Pointer Size (bytes)
b = Name Size (bytes)
c = Content Flag
d = Type Code
00 = End
01 = Error
10 = Error
11 = Real
Pointer
Name Size
Name

File
Header
1,aaa,b,ccc
1 = File
a = Pointer Size (bytes)
b = Name Size (bytes)
c = Data Size (bytes)
Pointer
Name Size
Name
Data Size
Data'''

################################################################################

class Acquire:

'Acquire(destination) -> Acquire'

BUFF_SIZE = 2 ** 20

def __init__(self, destination):
'Initialize the Acquire object.'
self.__destination = destination
self.__destination_path = _os.path.abspath(destination.name) if hasattr(destination, 'name') else None
self.__archive = False

def acquire(self, source):
'Save source to destination.'
source = _os.path.abspath(source)
self.__pointer = 0
if self.__archive:
self.__destination.write('\0')
else:
self.__archive = True
if _os.path.isdir(source):
self.__dir(source, '\0')
elif _os.path.isfile(source):
if source == self.__destination_path:
raise ValueError,

Solution

DFS (Directory and File Serialization) file format

Not sure what that is. Nor is google.
Recommend you offer a more specific citation, hopefully one that includes an URL.

import os as _os
import sys as _sys


Hmmm, that's odd. Do you really need to rename them?

FORMAT_DOCUMENTATION = '''\


The backwhack is very odd -- don't do it.
That's what triple quotes are for.

0 = Directory


You apparently meant "0 = type is Directory". Similarly for "1 = File".

b = Name Size (bytes)


What?!?
You're apparently indicating that b is a single bit,
and it denotes this many vs that many bytes?!?
How many bytes?

'Acquire(destination) -> Acquire'


I understand that you think that's a docstring.
I don't understand what it's trying to tell me.
Same remark for Release.

'Initialize the Acquire object.'


Putting such a docstring on a ctor is redundant;
consider deleting it.
Same remark for Release.

self.__destination = destination


Are you sure you want name mangling?
Consider using just a single leading _ underscore for these three.
Same remark for Release.

self.__pointer = 0


You should probably init this to zero in the ctor, as well.
It gives the reader a hint about which balls will be up in the air
during the lifetime of this object.

The meaning of self.__archive is not transparently obvious to me.

def __dir(self, source, pointer):


Oh, goodness!
Stop with the name mangling already!
Is there some inheritance use case coded up where mangling helps?
Just use a single _ underscore.

'Private class method.'


That's not a docstring, it's a # comment.
It doesn't help the caller figure out how to correctly call this private method.
It is clearly not a @classmethod, rather, each instantiated object owns it.
Similarly, delete the 'Private module function.' docstrings; they're not helpful.

self.__pointer += 1


There is a pre-condition here: acquire() must execute before __dir() does.
That is not at all obvious.
Make it so, by initializing pointer in the ctor.

self.__destination.write(chr((len(pointer) - 1 << 4) + (len(name_size) - 1 << 3) + 3) + pointer + name_size + name)


A single digit, 3, is different from the preceding line, where it was 7.
Rather than pasting the line in twice, parameterize it, highlighting the 3 / 7 contrast.

I was rather hoping to see a call to pack(),
but perhaps the bit packing is not a good fit for that.

source.seek(0, 2)


That source line is Just Wrong.
Use the constant os.SEEK_END rather than an obscure 2.
Similarly for 0 Magic Number.

position = source.tell()
            source.seek(0, 2)
            if position != source.tell():


Sure, I'll grant you there's a race here, the if might possibly trigger.
But I think you wanted to assign position before the while loop.
(Or maybe you wanted to re-open source?)

BUFF_SIZE = 2 ** 20


That's the 2nd time you've defined that.
DRY.
Declare it in a common place that both classes can pull the value from.

while header != -1 and (header > 127 or header & 3):


This is obscure.
Yes, I understand the high bit distinguishes between file / dir,
and the two low order bits have been defined elsewhere.
That doesn't make 127 any less of a Magic Number, nor 3.
You probably want to define a helper function that knows such encoding details.

if header < 128:


Ok, if you're going to hard code Magic Numbers,
at least pick one and stick with it.
Consider parsing out the high bit into its own variable, then use that.
I imagine the same helper would also return type_code.

def EOF(self):


PEP-8 asks that you name this eof(), as it is not a class.
Similarly for self._eof.

path = _os.path.join(self.__parents[_int(self.__read((header >> 4 & 7) + 1))], self.__read(_int(self.__read((header >> 3 & 1) + 1))))


I imagine this is some very nice C code,
but in python land we encourage breaking out the occasional helper function.

if header >> 2 & 1:


Again, this is obscure, despite the very nice FORMAT_DOCUMENTATION
you helpfully offered.
Break out a helper function with a descriptive name.

You apparently wrote some additional code in a separate code block,
but this review has gone on overlong already.

Summary: code is obscure, and needs unit tests added so we
can confidently re-factor.

Code Snippets

import os as _os
import sys as _sys
FORMAT_DOCUMENTATION = '''\
0 = Directory
b = Name Size (bytes)
'Acquire(destination) -> Acquire'

Context

StackExchange Code Review Q#155615, answer score: 3

Revisions (0)

No revisions yet.