patternpythonMinor
Serialize and deserialize a tree
Viewed 0 times
serializeanddeserializetree
Problem
Suppose each node of a tree has maximum
I'm wondering if there are any better implementation method. My current concern for my code is if the tree has too many
I'm looking forward to any smart ideas on existing code and smart ideas to represent serialized tree in a better (more compact and more efficient -- for both serialize and de-serialize).
My print tree method is just for the purpose of verification.
```
number_of_children = 3
index = 0
class Node:
def __init__(self, value):
self.value = value
self.children = [] # a list of Node
def serialize(self, result): # serialzie a tree into a string
result.append(str(self.value))
for child in self.children:
if child:
child.serialize(result)
else:
result.append('#')
def printTree(self): # print tree layer by layer
current_level_buffer=[]
current_level_buffer.append(self)
current_level_count = 1
next_level_count=0
next_level_buffer=[]
print_buffer=[]
next_level_valid = 1
while True:
node = current_level_buffer.pop(0)
if not node:
print_buffer.append('#')
current_level_count -= 1
else:
print_buffer.append(str(node.value))
current_level_count -= 1
for child in node.children:
next_level_buffer.append(child)
n children. I want to serialize and de-serialize for a tree. My current solution is doing pre-order traverse, and using # to represent None child.I'm wondering if there are any better implementation method. My current concern for my code is if the tree has too many
None node (kinds of sparse tree), there will be a lot # in the serialized string output. But I have not figured out a way to represent serialized tree better, since I think for None node, we still need some kinds of information to represent there is an empty node there.I'm looking forward to any smart ideas on existing code and smart ideas to represent serialized tree in a better (more compact and more efficient -- for both serialize and de-serialize).
My print tree method is just for the purpose of verification.
deserialize is correctly re-constructing the tree, so we can skip code review for this part.```
number_of_children = 3
index = 0
class Node:
def __init__(self, value):
self.value = value
self.children = [] # a list of Node
def serialize(self, result): # serialzie a tree into a string
result.append(str(self.value))
for child in self.children:
if child:
child.serialize(result)
else:
result.append('#')
def printTree(self): # print tree layer by layer
current_level_buffer=[]
current_level_buffer.append(self)
current_level_count = 1
next_level_count=0
next_level_buffer=[]
print_buffer=[]
next_level_valid = 1
while True:
node = current_level_buffer.pop(0)
if not node:
print_buffer.append('#')
current_level_count -= 1
else:
print_buffer.append(str(node.value))
current_level_count -= 1
for child in node.children:
next_level_buffer.append(child)
Solution
Your
Also, I really don't like the output of your
Therefore I rewrote your code in the following way:
This class has the nice feature, that
The output of
Therefore I defined a
Note that I used a different tree here than you. Here would be your structure:
Or, even better to visualize:
Example:
This could be cut down even further, but at this point it becomes a balance between the processing time needed to make this string smaller and the gain in memory from that.
With this function it could become:
But this becomes hard to parse back into the tree and would break the setup of using
printTree function is unnecessarily long.Also, I really don't like the output of your
serialize. I was not able to parse (as a human) this back into the structure of the tree. For me it would be much better if the structure was apparent from the serialization.Therefore I rewrote your code in the following way:
class Node:
def __init__(self, value='#', children=None):
self.value = value
self.children = children if children else []
def __nonzero__(self):
return self.value != '#'
def __eq__(self, other):
return self.value == other.value and self.children == other.children
def __str__(self):
buf, out = [self], []
while buf:
# current level
out.append(", ".join(str(node.value) for node in buf))
if any(node for node in buf):
# add children
buf = [subnode if subnode else Node()
for node in buf
for subnode in node.children]
else:
break
return "\n".join(out)
def __repr__(self):
if not self:
return "Node()"
children_repr = ",".join(repr(child) for child in self.children)
return "Node({.value},[{}])".format(self, children_repr)
def serialize(self):
return repr(self).replace("Node", "").replace("()", "#")
@staticmethod
def deserialize(source):
return eval(source.replace("#", "()").replace("(", "Node("))
if __name__ == "__main__":
root = Node(1, [Node(2, [Node(4)]), Node(3, [Node(5), Node(6)])])
print repr(root)
print root
source = root.serialize()
root2 = Node.deserialize(source)
print repr(root2)
print root2
assert root == root2This class has the nice feature, that
eval(repr(root)) is a copy of root, so repr(root) is a serialization of root.The output of
repr(root) is a bit verbose, though:Node(1,[Node(2,[Node(4,[])]),Node(3,[Node(5,[]),Node(6,[])])])Therefore I defined a
serialize function, which just gets rid of all the Nodes. The deserialize function just puts them back in and does an eval.Note that I used a different tree here than you. Here would be your structure:
root = Node(0, [Node(1, [Node(3, [Node(), Node(), Node()]),
Node(4, [Node(), Node(), Node()]),
Node(5, [Node(), Node(), Node()])]),
Node(2, [Node(6, [Node(), Node(), Node()]),
Node(7, [Node(), Node(), Node()]),
Node(8, [Node(), Node(), Node()])]),
Node()])Or, even better to visualize:
root = Node(0, [Node(1, [Node(3, [Node(),
Node(),
Node()]),
Node(4, [Node(),
Node(),
Node()]),
Node(5, [Node(),
Node(),
Node()])]),
Node(2, [Node(6, [Node(),
Node(),
Node()]),
Node(7, [Node(),
Node(),
Node()]),
Node(8, [Node(),
Node(),
Node()])]),
Node()])Example:
>>> root.serialize()
'(0,[(1,[(3,[#,#,#]),(4,[#,#,#]),(5,[#,#,#])]),(2,[(6,[#,#,#]),(7,[#,#,#]),(8,[#,#,#])]),#])'This could be cut down even further, but at this point it becomes a balance between the processing time needed to make this string smaller and the gain in memory from that.
With this function it could become:
def serialize2(self):
replacements = [("Node", ""), ("()", "#"),
("(", ""), (")", ""), (",[", "["), (",", "")]
out = repr(self)
for replacement in replacements:
out = out.replace(*replacement)
return out
>>> root.serialize2()
'0[1[3[###]4[###]5[###]]2[6[###]7[###]8[###]]#]'But this becomes hard to parse back into the tree and would break the setup of using
__repr__ and eval. It would be possible using some fancy regular expressions, though.Code Snippets
class Node:
def __init__(self, value='#', children=None):
self.value = value
self.children = children if children else []
def __nonzero__(self):
return self.value != '#'
def __eq__(self, other):
return self.value == other.value and self.children == other.children
def __str__(self):
buf, out = [self], []
while buf:
# current level
out.append(", ".join(str(node.value) for node in buf))
if any(node for node in buf):
# add children
buf = [subnode if subnode else Node()
for node in buf
for subnode in node.children]
else:
break
return "\n".join(out)
def __repr__(self):
if not self:
return "Node()"
children_repr = ",".join(repr(child) for child in self.children)
return "Node({.value},[{}])".format(self, children_repr)
def serialize(self):
return repr(self).replace("Node", "").replace("()", "#")
@staticmethod
def deserialize(source):
return eval(source.replace("#", "()").replace("(", "Node("))
if __name__ == "__main__":
root = Node(1, [Node(2, [Node(4)]), Node(3, [Node(5), Node(6)])])
print repr(root)
print root
source = root.serialize()
root2 = Node.deserialize(source)
print repr(root2)
print root2
assert root == root2Node(1,[Node(2,[Node(4,[])]),Node(3,[Node(5,[]),Node(6,[])])])root = Node(0, [Node(1, [Node(3, [Node(), Node(), Node()]),
Node(4, [Node(), Node(), Node()]),
Node(5, [Node(), Node(), Node()])]),
Node(2, [Node(6, [Node(), Node(), Node()]),
Node(7, [Node(), Node(), Node()]),
Node(8, [Node(), Node(), Node()])]),
Node()])root = Node(0, [Node(1, [Node(3, [Node(),
Node(),
Node()]),
Node(4, [Node(),
Node(),
Node()]),
Node(5, [Node(),
Node(),
Node()])]),
Node(2, [Node(6, [Node(),
Node(),
Node()]),
Node(7, [Node(),
Node(),
Node()]),
Node(8, [Node(),
Node(),
Node()])]),
Node()])>>> root.serialize()
'(0,[(1,[(3,[#,#,#]),(4,[#,#,#]),(5,[#,#,#])]),(2,[(6,[#,#,#]),(7,[#,#,#]),(8,[#,#,#])]),#])'Context
StackExchange Code Review Q#146368, answer score: 5
Revisions (0)
No revisions yet.