HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Serialize and deserialize a tree

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
serializeanddeserializetree

Problem

Suppose each node of a tree has maximum n children. I want to serialize and de-serialize for a tree. My current solution is doing pre-order traverse, and using # to represent None child.

I'm wondering if there are any better implementation method. My current concern for my code is if the tree has too many None node (kinds of sparse tree), there will be a lot # in the serialized string output. But I have not figured out a way to represent serialized tree better, since I think for None node, we still need some kinds of information to represent there is an empty node there.

I'm looking forward to any smart ideas on existing code and smart ideas to represent serialized tree in a better (more compact and more efficient -- for both serialize and de-serialize).

My print tree method is just for the purpose of verification. deserialize is correctly re-constructing the tree, so we can skip code review for this part.

```
number_of_children = 3
index = 0
class Node:
def __init__(self, value):
self.value = value
self.children = [] # a list of Node

def serialize(self, result): # serialzie a tree into a string
result.append(str(self.value))
for child in self.children:
if child:
child.serialize(result)
else:
result.append('#')

def printTree(self): # print tree layer by layer
current_level_buffer=[]
current_level_buffer.append(self)
current_level_count = 1
next_level_count=0
next_level_buffer=[]
print_buffer=[]
next_level_valid = 1
while True:
node = current_level_buffer.pop(0)
if not node:
print_buffer.append('#')
current_level_count -= 1
else:
print_buffer.append(str(node.value))
current_level_count -= 1
for child in node.children:
next_level_buffer.append(child)

Solution

Your printTree function is unnecessarily long.

Also, I really don't like the output of your serialize. I was not able to parse (as a human) this back into the structure of the tree. For me it would be much better if the structure was apparent from the serialization.

Therefore I rewrote your code in the following way:

class Node:

    def __init__(self, value='#', children=None):
        self.value = value
        self.children = children if children else []

    def __nonzero__(self):
        return self.value != '#'

    def __eq__(self, other):
        return self.value == other.value and self.children == other.children

    def __str__(self):
        buf, out = [self], []
        while buf:
            # current level
            out.append(", ".join(str(node.value) for node in buf))
            if any(node for node in buf):
                # add children
                buf = [subnode if subnode else Node()
                       for node in buf
                       for subnode in node.children]
            else:
                break
        return "\n".join(out)

    def __repr__(self):
        if not self:
            return "Node()"
        children_repr = ",".join(repr(child) for child in self.children)
        return "Node({.value},[{}])".format(self, children_repr)

    def serialize(self):
        return repr(self).replace("Node", "").replace("()", "#")

    @staticmethod
    def deserialize(source):
        return eval(source.replace("#", "()").replace("(", "Node("))

if __name__ == "__main__":
    root = Node(1, [Node(2, [Node(4)]), Node(3, [Node(5), Node(6)])])
    print repr(root)
    print root
    source = root.serialize()
    root2 = Node.deserialize(source)
    print repr(root2)
    print root2
    assert root == root2


This class has the nice feature, that eval(repr(root)) is a copy of root, so repr(root) is a serialization of root.

The output of repr(root) is a bit verbose, though:

Node(1,[Node(2,[Node(4,[])]),Node(3,[Node(5,[]),Node(6,[])])])


Therefore I defined a serialize function, which just gets rid of all the Nodes. The deserialize function just puts them back in and does an eval.

Note that I used a different tree here than you. Here would be your structure:

root = Node(0, [Node(1, [Node(3, [Node(), Node(), Node()]),
                         Node(4, [Node(), Node(), Node()]),
                         Node(5, [Node(), Node(), Node()])]),
                Node(2, [Node(6, [Node(), Node(), Node()]),
                         Node(7, [Node(), Node(), Node()]),
                         Node(8, [Node(), Node(), Node()])]),
                Node()])


Or, even better to visualize:

root = Node(0, [Node(1, [Node(3, [Node(),
                                  Node(),
                                  Node()]),
                         Node(4, [Node(),
                                  Node(),
                                  Node()]),
                         Node(5, [Node(),
                                  Node(),
                                  Node()])]),
                Node(2, [Node(6, [Node(),
                                  Node(),
                                  Node()]),
                         Node(7, [Node(),
                                  Node(),
                                  Node()]),
                         Node(8, [Node(),
                                  Node(),
                                  Node()])]),
                Node()])


Example:

>>> root.serialize()
'(0,[(1,[(3,[#,#,#]),(4,[#,#,#]),(5,[#,#,#])]),(2,[(6,[#,#,#]),(7,[#,#,#]),(8,[#,#,#])]),#])'


This could be cut down even further, but at this point it becomes a balance between the processing time needed to make this string smaller and the gain in memory from that.

With this function it could become:

def serialize2(self):
    replacements = [("Node", ""), ("()", "#"),
                    ("(", ""), (")", ""), (",[", "["), (",", "")]
    out = repr(self)
    for replacement in replacements:
        out = out.replace(*replacement)
    return out

>>> root.serialize2()
'0[1[3[###]4[###]5[###]]2[6[###]7[###]8[###]]#]'


But this becomes hard to parse back into the tree and would break the setup of using __repr__ and eval. It would be possible using some fancy regular expressions, though.

Code Snippets

class Node:

    def __init__(self, value='#', children=None):
        self.value = value
        self.children = children if children else []

    def __nonzero__(self):
        return self.value != '#'

    def __eq__(self, other):
        return self.value == other.value and self.children == other.children

    def __str__(self):
        buf, out = [self], []
        while buf:
            # current level
            out.append(", ".join(str(node.value) for node in buf))
            if any(node for node in buf):
                # add children
                buf = [subnode if subnode else Node()
                       for node in buf
                       for subnode in node.children]
            else:
                break
        return "\n".join(out)

    def __repr__(self):
        if not self:
            return "Node()"
        children_repr = ",".join(repr(child) for child in self.children)
        return "Node({.value},[{}])".format(self, children_repr)

    def serialize(self):
        return repr(self).replace("Node", "").replace("()", "#")

    @staticmethod
    def deserialize(source):
        return eval(source.replace("#", "()").replace("(", "Node("))


if __name__ == "__main__":
    root = Node(1, [Node(2, [Node(4)]), Node(3, [Node(5), Node(6)])])
    print repr(root)
    print root
    source = root.serialize()
    root2 = Node.deserialize(source)
    print repr(root2)
    print root2
    assert root == root2
Node(1,[Node(2,[Node(4,[])]),Node(3,[Node(5,[]),Node(6,[])])])
root = Node(0, [Node(1, [Node(3, [Node(), Node(), Node()]),
                         Node(4, [Node(), Node(), Node()]),
                         Node(5, [Node(), Node(), Node()])]),
                Node(2, [Node(6, [Node(), Node(), Node()]),
                         Node(7, [Node(), Node(), Node()]),
                         Node(8, [Node(), Node(), Node()])]),
                Node()])
root = Node(0, [Node(1, [Node(3, [Node(),
                                  Node(),
                                  Node()]),
                         Node(4, [Node(),
                                  Node(),
                                  Node()]),
                         Node(5, [Node(),
                                  Node(),
                                  Node()])]),
                Node(2, [Node(6, [Node(),
                                  Node(),
                                  Node()]),
                         Node(7, [Node(),
                                  Node(),
                                  Node()]),
                         Node(8, [Node(),
                                  Node(),
                                  Node()])]),
                Node()])
>>> root.serialize()
'(0,[(1,[(3,[#,#,#]),(4,[#,#,#]),(5,[#,#,#])]),(2,[(6,[#,#,#]),(7,[#,#,#]),(8,[#,#,#])]),#])'

Context

StackExchange Code Review Q#146368, answer score: 5

Revisions (0)

No revisions yet.