Assembler for CPU
Problem
I recently put together an assembler for a CPU I designed. I'm looking for feedback on my program structure, formatting, or anything else. I'm self-taught on all of this, so I don't have opportunities for it to be reviewed. I'm also new to Python programming, so if anything else doesn't look right, please set me on the right track.
Assembler.py
```
import sys
from tables import *

symbols = {}
# Memory map
symbols["IN0"] = 0xfff8
symbols["IN1"] = 0xfff9
symbols["OUT0"] = 0xfffa
symbols["OUT1"] = 0xfffb

def to_bin(n, bits):
    n = bin(n & 2**bits-1)[2:]
    return "{:0>{}}".format(n, bits)

def reg(s):
    n = int(s[1:])
    return to_bin(n, 3)

def value(s):
    n = 0
    if s[0].isdigit():
        if s[:2] == "0b":
            n = int(s[2:], 2)
        elif s[:2] == "0x":
            n = int(s[2:], 16)
        else:
            n = int(s)
    else:
        if s in symbols:
            n = symbols[s] # get address
        else:
            print("Error: undefined symbol \"{}\"".format(s))
            quit()
    return n

with open("Programs/"+sys.argv[1], "r") as fileIn, \
     open("Programs/"+sys.argv[1]+".asm", "w+") as fileOut:
    print("### First Pass - Mapping Symbols to Addresses ###")
    address = 0
    for lineNum, line in enumerate(fileIn, start = 1):
        tokens = line.split("#")[0].split()
        if not tokens:
            continue # skip empty lines
        if tokens[0][-1] == ":": # found symbol
            if tokens[0][:-1] in symbols:
                print("Error: duplicate symbol \"{}\" on line {}".format(tokens[0], lineNum))
                quit()
            else:
                symbols[tokens[0][:-1]] = address # add symbol to dictionary
            del tokens[0]
        if tokens:
            address += 2 if tokens[0] == "movi" else 1
    fileIn.seek(0)
    print("...Done\n")
    print("### Second Pass - Translating into machine code ###")
    address = 0
    for lineNum, line in enumerate(fileIn, start = 1):
        tokens = li
```
Solution
Some simplifications
- You can initialize the `symbols` dictionary in one instruction:

      # Memory map
      symbols = {
          "IN0": 0xfff8,
          "IN1": 0xfff9,
          "OUT0": 0xfffa,
          "OUT1": 0xfffb,
      }

- The format template string, when applied to a number, can take a base specifier. So `'{:b}'.format(x)` returns pretty much the same thing as `bin(x)`, except without the `'0b'` prefix. You can thus turn `to_bin` into:

      def to_bin(n, bits):
          return "{:0>{}b}".format(n & 2**bits-1, bits)

  As for applying the bitmask to limit the length of the output, you can also cut the string afterwards:

      def to_bin(n, bits):
          return "{:0>{}b}".format(n, bits)[-bits:]

  I find it somewhat clearer about what is going on, but it might be slower. You'll need to time it if it ever turns out to be an issue. Note that the two versions only agree for non-negative `n`: a negative number is formatted with a minus sign, so slicing does not produce the two's-complement bits that the mask does.
- When formatting with a template that consists of a single placeholder and nothing else, such as `"{:04x}"` below, it might be clearer to use the `format` function directly. Combine that with the fact that the `print` function can write to files, and you can turn:

      fileOut.write("{:04x}".format(int(asm, 2)) + "\n")

  into

      print(format(int(asm, 2), '04x'), file=fileOut)
- You can use the "magic" base 0 of the `int` function to let Python automatically "guess" the base of your number:

      >>> int('0b101', 0)
      5
      >>> int('0x1f', 0)
      31
      >>> int('42', 0)
      42

  Note, however, that Python rejects a string that contains only digits but starts with a `'0'`, because it can't tell whether octal or decimal was meant:

      >>> int('0644', 0)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      ValueError: invalid literal for int() with base 0: '0644'
      >>> int('0o644', 0)
      420
      >>> int('644', 0)
      644

  If that limitation doesn't matter to you, you could simplify `value` to:

      def value(s):
          try:
              return int(s, 0)
          except ValueError:
              try:
                  return symbols[s]  # get address
              except KeyError:
                  sys.exit("Error: undefined symbol \"{}\"".format(s))

  A few other improvements here: use of EAFP to make the intent more direct (let's convert this value into an integer; it doesn't work? Let's look up its address; still doesn't work? Then give up). And use of `sys.exit` instead of `quit`, which should only be used within an interactive interpreter. `sys.exit` has the advantage, when passed a string as parameter, of printing it to stderr and exiting with a non-zero status code. The same improvement can be made to the "invalid instruction" error near the end.
- You appear to have duplicated code to strip comments and empty lines from your input file. Why not extract this behaviour into a function instead? This also lets you avoid the call to `seek`. And to avoid filling up the memory with the whole file at once, let's write a generator:

      def filter_out_comments(filename):
          with open(filename) as f:
              for line_num, line in enumerate(f, start=1):
                  tokens = line.split("#")[0].split()
                  if tokens:
                      yield tokens, line_num

  And use it like:

      with open("Programs/"+sys.argv[1]+".asm", "w+") as fileOut:
          print("### First Pass - Mapping Symbols to Addresses ###")
          address = 0
          for tokens, line_num in filter_out_comments("Programs/" + sys.argv[1]):
              if tokens[0][-1] == ":":  # found symbol
                  ...
          print("### Second Pass - Translating into machine code ###")
          address = 0
          for tokens, line_num in filter_out_comments("Programs/" + sys.argv[1]):
              asm = ""
              if tokens[0] in RRR and len(tokens) == 4:
                  ...
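The helper simplifications above can be tried out in isolation. The following is only a sketch: the memory map comes from the question, while the `loop` label and the name `to_bin_sliced` are made up here so that both `to_bin` variants can sit side by side. It also illustrates that the masking and slicing variants stop agreeing once a negative number is passed in.

```python
import sys

# Memory map taken from the question; "loop" is a hypothetical label
# added only so that the symbol lookup has something to resolve.
symbols = {
    "IN0": 0xfff8,
    "IN1": 0xfff9,
    "OUT0": 0xfffa,
    "OUT1": 0xfffb,
    "loop": 0x0004,
}


def to_bin(n, bits):
    """Return n as a zero-padded binary string, masked to `bits` bits."""
    return "{:0>{}b}".format(n & 2**bits - 1, bits)


def to_bin_sliced(n, bits):
    """Variant that truncates the formatted string instead of masking n."""
    return "{:0>{}b}".format(n, bits)[-bits:]


def value(s):
    """Convert a numeric literal or a known symbol into an integer."""
    try:
        return int(s, 0)  # base 0: the prefix (0b, 0o, 0x or none) picks the base
    except ValueError:
        try:
            return symbols[s]  # not a number, so treat it as a symbol
        except KeyError:
            sys.exit('Error: undefined symbol "{}"'.format(s))


if __name__ == "__main__":
    # Both variants agree for non-negative input...
    print(to_bin(9, 3), to_bin_sliced(9, 3))
    # ...but only the masked one yields two's-complement bits for negatives.
    print(to_bin(-1, 4), to_bin_sliced(-1, 4))
    print(value("0x1f"), value("0b101"), value("42"), value("OUT0"))
```

If the assembler never hands negative values to `to_bin`, either variant works; otherwise the masked version is the safer choice.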
Some improvements
Instead of leaving some code at the top level of the file, you should wrap it into a function. It lets you test and reuse it more easily. You should also make use of the `if __name__ == '__main__':` idiom:

    def compile_asm(filename):
        with open(filename + ".asm", "w+") as fileOut:
            print("### First Pass - Mapping Symbols to Addresses ###")
            address = 0
            for tokens, line_num in filter_out_comments(filename):
                if tokens[0][-1] == ":":  # found symbol
                    ...
            print("### Second Pass - Translating into machine code ###")
            address = 0
            for tokens, line_num in filter_out_comments(filename):
                asm = ""
                if tokens[0] in RRR and len(tokens) == 4:
                    ...

    if __name__ == '__main__':
        compile_asm("Programs/" + sys.argv[1])

Second, you should document your code a bit more, especially when sharing it like this, as it is sometimes unclear why you do things the way you do. It makes sense eventually, but it would be easier to understand with a few comments and some docstrings.

And, lastly, follow PEP 8, the official coding style, if you want your code to look like Python code.
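To make these structural points concrete, here is a minimal sketch of the first pass only, written with docstrings, PEP 8 names, and the main guard. The function name `map_symbols` is an assumption made for this example; the two-words-for-`movi` rule and the `Programs/` directory are taken from the question's code.

```python
import sys


def filter_out_comments(filename):
    """Yield (tokens, line_num) for every line that holds more than a comment."""
    with open(filename) as f:
        for line_num, line in enumerate(f, start=1):
            tokens = line.split("#")[0].split()
            if tokens:
                yield tokens, line_num


def map_symbols(filename, symbols):
    """First pass: record the address of every `label:` found in the file.

    `map_symbols` is a hypothetical name; it fills and returns `symbols`.
    """
    address = 0
    for tokens, line_num in filter_out_comments(filename):
        if tokens[0].endswith(":"):  # found a label
            name = tokens[0][:-1]
            if name in symbols:
                sys.exit('Error: duplicate symbol "{}" on line {}'
                         .format(name, line_num))
            symbols[name] = address
            del tokens[0]
        if tokens:
            # As in the question: `movi` occupies two words, the rest one.
            address += 2 if tokens[0] == "movi" else 1
    return symbols


if __name__ == "__main__" and len(sys.argv) > 1:
    print(map_symbols("Programs/" + sys.argv[1], {}))
```

Because nothing runs at import time, the functions can be imported and exercised from a test file or an interactive session without touching `sys.argv`.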
One pass algorithm
There might not be a real need to perform two passes over the input file. Whenever a symbol cannot be resolved, store it in a dictionary as a key and
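One common way to get by with a single pass is backpatching: emit a placeholder for each unresolved symbol, remember where it was emitted, and fill it in once the label is defined. The sketch below is an assumption about how that could look, not necessarily the scheme intended here. It uses a toy instruction set (`jmp <label>` and `nop`, one word each) and hypothetical names (`assemble_one_pass`, `pending`) rather than the question's real encoding.

```python
def assemble_one_pass(lines):
    """Toy one-pass assembler using backpatching.

    Each instruction emits one word: the target address for `jmp <label>`,
    or 0 for anything else. This is an illustrative instruction set only.
    """
    symbols = {}   # label -> address
    pending = {}   # unresolved label -> positions in `output` to patch
    output = []

    for line in lines:
        tokens = line.split("#")[0].split()
        if tokens and tokens[0].endswith(":"):
            name = tokens.pop(0)[:-1]
            symbols[name] = len(output)
            # Fill in every earlier reference that was waiting for this label.
            for position in pending.pop(name, []):
                output[position] = symbols[name]
        if not tokens:
            continue
        if tokens[0] == "jmp":
            target = tokens[1]
            if target in symbols:
                output.append(symbols[target])
            else:
                # Forward reference: remember where to patch, emit a placeholder.
                pending.setdefault(target, []).append(len(output))
                output.append(None)
        else:
            output.append(0)

    if pending:
        raise ValueError("undefined symbols: {}".format(sorted(pending)))
    return output


if __name__ == "__main__":
    program = ["start: jmp end", "nop", "jmp start", "end: nop"]
    print(assemble_one_pass(program))
```

The trade-off is that the output has to be buffered in memory until all labels are known, instead of being written to the file line by line as in the two-pass version.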
Context
StackExchange Code Review Q#146688, answer score: 7