patternpythonMinor
Python algorithm mimicking mark/sweep garbage collection
Viewed 0 times
sweepcollectionalgorithmmimickingpythongarbagemark
Problem
I don't have much experience with Python and would love some feedback on my implementation of this program.
I am given a file with an integer representing a number of memory-cells, followed by a list of ordered pairs representing links between either a pointer and a memory-cell, or memory-cell and a memory-cell. You could use the list to draw diagram akin to those used in text-book "which cells are marked and which are reclaimed" problems regarding mark/sweep GC.
I was tasked with creating a Python script that takes in such a list, and runs a simulated mark/sweep on it and my solution (below) is fully functional.
Here's an example input file:
And here is the script:
`import sys
import re
def RepresentsInt(s):
try:
int(s)
return True
except ValueError:
return False
#recursive function that traces through a mem-node structure
#setting each node's check_flag to 1 if traversed
def node_trace(pointer, node_pointer_list, nodes_dict):
if RepresentsInt(pointer[0]):
if nodes_dict[pointer[0]][1] == 1:
#returns if the current node has been traversed already
return
else:
#checks the current node as marked and traversed
nodes_dict[pointer[0]][0] = 1
nodes_dict[pointer[0]][1] = 1
for x in node_pointer_list:
#recurs for each node the current node points to
if x[0] == pointer[1] and nodes_dict[x[0]][1] == 0:
node_trace(x, node_pointer_list, nodes_dict)
#catches the node at the end of a chain
if RepresentsInt(x[0]):
if nodes_dict[x[0]][1] == 1:
nodes_dict[x[1]][0] = 1
nodes_dict[x[1]][1] = 1
else:
#catches a single-link chain from a variable to a mem-block
nodes_dict[x[1]][0] = 1
nodes_dict[x[1]][1] = 1
sysArgs = list(sys.argv)
inputFile = sysArgs[1]
inFile = open(inputFile)
node_pointer_list = []
nod
I am given a file with an integer representing a number of memory-cells, followed by a list of ordered pairs representing links between either a pointer and a memory-cell, or memory-cell and a memory-cell. You could use the list to draw diagram akin to those used in text-book "which cells are marked and which are reclaimed" problems regarding mark/sweep GC.
I was tasked with creating a Python script that takes in such a list, and runs a simulated mark/sweep on it and my solution (below) is fully functional.
Here's an example input file:
6
ptr1,0
0,1
1,2
ptr2,3
3,2
4,5
And here is the script:
`import sys
import re
def RepresentsInt(s):
try:
int(s)
return True
except ValueError:
return False
#recursive function that traces through a mem-node structure
#setting each node's check_flag to 1 if traversed
def node_trace(pointer, node_pointer_list, nodes_dict):
if RepresentsInt(pointer[0]):
if nodes_dict[pointer[0]][1] == 1:
#returns if the current node has been traversed already
return
else:
#checks the current node as marked and traversed
nodes_dict[pointer[0]][0] = 1
nodes_dict[pointer[0]][1] = 1
for x in node_pointer_list:
#recurs for each node the current node points to
if x[0] == pointer[1] and nodes_dict[x[0]][1] == 0:
node_trace(x, node_pointer_list, nodes_dict)
#catches the node at the end of a chain
if RepresentsInt(x[0]):
if nodes_dict[x[0]][1] == 1:
nodes_dict[x[1]][0] = 1
nodes_dict[x[1]][1] = 1
else:
#catches a single-link chain from a variable to a mem-block
nodes_dict[x[1]][0] = 1
nodes_dict[x[1]][1] = 1
sysArgs = list(sys.argv)
inputFile = sysArgs[1]
inFile = open(inputFile)
node_pointer_list = []
nod
Solution
There are several suboptimal areas of this solution that are interrelated. Please feel free to ask questions if anything I've written here is unclear.
The solution is not "modeling" memory and then performing mark-and-sweep as it probably should be. Instead it is preserving the data-input file in
I'll try to break down those issues one at a time, though keep in mind they are all interrelated. I'm intentionally explaining rather than providing code, so you can do the learning work to make a better solution. I'll start by talking about code readability.
The way you use
Why do you have both "marked" and "traversed" flags? They contain the same information.
A code line like this below, is very hard for anyone to read the meaning of. Someone reading the code is forced to dig around to find out the meaning of the zeroes and ones to understand it.
To make it more readable, you need to get the word "TRAVERSED" in there somewhere, since what you are doing is setting a traversed flag. If you stick with your two element arrays, you can make a
Better style would be to define a class, and give the class a .traversed field. If you also make some of the other changes I highlight below, this access becomes:
I don't know if you are comfortable with classes or not, so I'll provide one explanation based on classes, and one based on arrays...
With Classes
You can put the
First you would instantiate an array of these for the number of memory-spaces according to your data-file, with an additional one as the
The first step of mark-and-sweep is to walk all memory locations and set
Without Classes
Instead of node_pointer_list, you want an array of lists of references. As you walk the datafile, you would insert each reference into the appropriate list for that pointer number. Further, you want an additional "root_references" list to hold references which appear in "ptr" entries.
To start the recursion, you would do
Other Factors
The running time you are shooting for is O(N), where N is the number of nodes in your memory. You are currently O(N^2). After you remove the extra loop in
Using recursion is good. An alternative is to use a work-list. To use a work list, you put the root(s) in a list, and then while the work-list is non-empty, you remove an item from the work-list and call
Both will have the same O(N) running time, but can potentially have different memory consumption. The recursive version's maximum memory consumption is proportional to the depth of object references. The work-list version's maximum memory consumption is hard to calculate, as it can be
The solution is not "modeling" memory and then performing mark-and-sweep as it probably should be. Instead it is preserving the data-input file in
node_pointer_list and repeatedly walking it to find what is needed, because of your for x in node_pointer_list: in node_trace(). This is causing the solution to unnecessarily iterate over the node_pointer_list inside each call to node_trace(), which is making its running time O(N^2) when it should be O(N). It is also making the code 2-3x longer than it should be. Further, the solution is quite hard to read.I'll try to break down those issues one at a time, though keep in mind they are all interrelated. I'm intentionally explaining rather than providing code, so you can do the learning work to make a better solution. I'll start by talking about code readability.
The way you use
if not RepresentsInt(item[0]): to determine if a line is a "pointer" line is very non-intuitive. It would be easier to read if the literal "ptr" appeared somewhere near where you are testing if it is a line containing "ptr", such as:if item[0].find("ptr") == 0:Why do you have both "marked" and "traversed" flags? They contain the same information.
A code line like this below, is very hard for anyone to read the meaning of. Someone reading the code is forced to dig around to find out the meaning of the zeroes and ones to understand it.
nodes_dict[x[1]][0] = 1To make it more readable, you need to get the word "TRAVERSED" in there somewhere, since what you are doing is setting a traversed flag. If you stick with your two element arrays, you can make a
TRAVERSED_FLAG symbol to make it more readable, as in:nodes_dict[x[1]][TRAVERSED_FLAG] = 1Better style would be to define a class, and give the class a .traversed field. If you also make some of the other changes I highlight below, this access becomes:
current_object.traversed = 1I don't know if you are comfortable with classes or not, so I'll provide one explanation based on classes, and one based on arrays...
With Classes
You can put the
marked field, and the list of an object's references in a single class that represents a memory position in your heap. The class definition would look something like:class MemoryObject:
def __init__(self):
self.marked = 0
self.references = [] # list of either or First you would instantiate an array of these for the number of memory-spaces according to your data-file, with an additional one as the
RootMemoryObject. Then you would walk the file's references, adding to each MemoryObject's references list. You can either store an int reference in the list, and look it up later, or you can look up the MemoryObject now and store that directly in references. When you see a "ptr" entry, you put that in the RootMemoryObject.The first step of mark-and-sweep is to walk all memory locations and set
.marked=0. Then you would call node_trace(RootMemoryObject), which would mark the object visited, and recurse through the objects referenced by that object's .references. Because you have the list of that object's references, you don't need the inefficient loop for x in node_pointer_list:.Without Classes
Instead of node_pointer_list, you want an array of lists of references. As you walk the datafile, you would insert each reference into the appropriate list for that pointer number. Further, you want an additional "root_references" list to hold references which appear in "ptr" entries.
To start the recursion, you would do
for reference in root_references: node_trace(reference), which would mark the reference visited (returning if it was already visited), then look up each child reference and recursively call node_trace(child_reference) on it. Because you have the list of that object's references, you don't need the inefficient loop for x in node_pointer_list:.Other Factors
The running time you are shooting for is O(N), where N is the number of nodes in your memory. You are currently O(N^2). After you remove the extra loop in
node_trace(), you will be properly O(N) because you already have a check to be sure you're not traversing a set of children more than one.Using recursion is good. An alternative is to use a work-list. To use a work list, you put the root(s) in a list, and then while the work-list is non-empty, you remove an item from the work-list and call
node_trace() on it. node_trace puts any child nodes that need to be visited into the work list, and exits. Both will have the same O(N) running time, but can potentially have different memory consumption. The recursive version's maximum memory consumption is proportional to the depth of object references. The work-list version's maximum memory consumption is hard to calculate, as it can be
Code Snippets
if item[0].find("ptr") == 0:nodes_dict[x[1]][0] = 1nodes_dict[x[1]][TRAVERSED_FLAG] = 1current_object.traversed = 1class MemoryObject:
def __init__(self):
self.marked = 0
self.references = [] # list of either <int> or <MemoryObject>Context
StackExchange Code Review Q#161908, answer score: 4
Revisions (0)
No revisions yet.