Assembler for CPU

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Problem

I recently put together an assembler for a CPU I designed. I'm looking for feedback on my program structure, formatting, or anything else. I'm self-taught on all of this, so I don't get opportunities to have it reviewed. I'm also new to Python programming, so if anything else doesn't look right, please set me on the right track.

Assembler.py

```
import sys
from tables import *

symbols = {}
# Memory map
symbols["IN0"] = 0xfff8
symbols["IN1"] = 0xfff9
symbols["OUT0"] = 0xfffa
symbols["OUT1"] = 0xfffb

def to_bin(n, bits):
    n = bin(n & 2**bits-1)[2:]
    return "{:0>{}}".format(n, bits)

def reg(s):
    n = int(s[1:])
    return to_bin(n, 3)

def value(s):
    n = 0
    if s[0].isdigit():
        if s[:2] == "0b":
            n = int(s[2:], 2)
        elif s[:2] == "0x":
            n = int(s[2:], 16)
        else:
            n = int(s)
    else:
        if s in symbols:
            n = symbols[s] # get address
        else:
            print("Error: undefined symbol \"{}\"".format(s))
            quit()
    return n

with open("Programs/"+sys.argv[1], "r") as fileIn, \
     open("Programs/"+sys.argv[1]+".asm", "w+") as fileOut:

    print("### First Pass - Mapping Symbols to Addresses ###")
    address = 0
    for lineNum, line in enumerate(fileIn, start = 1):
        tokens = line.split("#")[0].split()
        if not tokens:
            continue # skip empty lines

        if tokens[0][-1] == ":": # found symbol
            if tokens[0][:-1] in symbols:
                print("Error: duplicate symbol \"{}\" on line {}".format(tokens[0], lineNum))
                quit()
            else:
                symbols[tokens[0][:-1]] = address # add symbol to dictionary
            del tokens[0]

        if tokens:
            address += 2 if tokens[0] == "movi" else 1

    fileIn.seek(0)
    print("...Done\n")

    print("### Second Pass - Translating into machine code ###")
    address = 0
    for lineNum, line in enumerate(fileIn, start = 1):
        tokens = li
```

Solution

Some simplifications

-
You can initialize the symbols dictionary in a single statement:

# Memory map
symbols = {
    "IN0": 0xfff8,
    "IN1": 0xfff9,
    "OUT0": 0xfffa,
    "OUT1": 0xfffb,
}

-
The format template string, when applied to a number, accepts a base specifier. So '{:b}'.format(x) returns essentially the same thing as bin(x), just without the '0b' prefix. You can thus turn to_bin into:

def to_bin(n, bits):
    return "{:0>{}b}".format(n & 2**bits-1, bits)

As for applying the bitmask to limit the length of the output, you can also cut the string afterwards:

def to_bin(n, bits):
    return "{:0>{}b}".format(n, bits)[-bits:]

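I find that somewhat clearer about what is going on, but it might be slower; time it if it ever turns out to be an issue. Note that the two versions agree only for non-negative n: the mask yields a two's-complement bit pattern for negative values, whereas slicing leaves the minus sign in the string.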
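
A small sketch comparing the two variants side by side. The names to_bin_mask and to_bin_slice are made up here only so that both definitions can coexist; they are not part of the original program. It suggests the variants are interchangeable for non-negative inputs only.

```python
def to_bin_mask(n, bits):
    # Mask first: negative values wrap to their two's-complement pattern.
    return "{:0>{}b}".format(n & 2**bits - 1, bits)

def to_bin_slice(n, bits):
    # Format first, then keep only the rightmost `bits` characters.
    return "{:0>{}b}".format(n, bits)[-bits:]

# 5 fits, 300 overflows 8 bits, -5 is negative.
for n in (5, 300, -5):
    print(n, to_bin_mask(n, 8), to_bin_slice(n, 8))
```

If the assembler ever needs negative immediates, the masking version is probably the one to keep.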

-
When formatting a single value with a template like '{:04x}', where the template contains nothing besides the format specification, it can be clearer to call the format function directly. Combined with the fact that the print function can write to files, you can turn:

fileOut.write("{:04x}".format(int(asm, 2)) + "\n")


into

print(format(int(asm, 2), '04x'), file=fileOut)
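
To check that the two forms write the same text, here is a minimal sketch that uses io.StringIO as a stand-in for fileOut. The 16-bit value of asm is made up for illustration.

```python
import io

asm = "0000000000101010"  # made-up 16-bit instruction word

# Original form: build the string, append the newline by hand.
out_write = io.StringIO()
out_write.write("{:04x}".format(int(asm, 2)) + "\n")

# Suggested form: format() for the value, print() adds the newline.
out_print = io.StringIO()
print(format(int(asm, 2), "04x"), file=out_print)

print(out_write.getvalue() == out_print.getvalue())
```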


- 
You can use the "magic" base 0 of the int function to let Python automatically "guess" the base of your number:

>>> int('0b101', 0)
5
>>> int('0x1f', 0)
31
>>> int('42', 0)
42


Note, however, that Python can't disambiguate between octal and decimal if the string contains only digits but starts with a '0':

>>> int('0644', 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 0: '0644'
>>> int('0o644', 0)
420
>>> int('644', 0)
644


That may not matter to you, so you could simplify value to:

def value(s):
    try:
        return int(s, 0)
    except ValueError:
        try:
            return symbols[s] # get address
        except KeyError:
            sys.exit("Error: undefined symbol \"{}\"".format(s))


A few other improvements here: EAFP makes the intent more direct (convert the value to an integer; if that fails, look up its address; if that fails too, give up). sys.exit replaces quit, which should only be used within an interactive interpreter. When passed a string, sys.exit also prints it to stderr and exits with a non-zero status code. The same improvement can be made to the "invalid instruction" error near the end.
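
A minimal, self-contained sketch of how the simplified value behaves. The one-entry symbols table and the undefined name "nowhere" are placeholders for illustration. SystemExit is caught here only to inspect the message; in the real script it would propagate and end the program.

```python
import sys

symbols = {"OUT0": 0xfffa}  # trimmed-down table, for illustration only

def value(s):
    try:
        return int(s, 0)  # base 0: honour 0b / 0o / 0x prefixes
    except ValueError:
        try:
            return symbols[s]  # not a number, so look up its address
        except KeyError:
            sys.exit('Error: undefined symbol "{}"'.format(s))

for text in ("42", "0x1f", "0b101", "OUT0"):
    print(text, value(text))

try:
    value("nowhere")
except SystemExit as exc:
    # sys.exit raises SystemExit; the message is stored in exc.code.
    print("would exit with:", exc.code)
```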

- 
You appear to have duplicated the code that strips comments and empty lines from your input file. Why not extract this behaviour into a function instead? This will also let you avoid the call to seek. And to avoid loading the whole file into memory at once, let's write a generator instead:

def filter_out_comments(filename):
    with open(filename) as f:
        # Iterate over the file object opened just above.
        for line_num, line in enumerate(f, start=1):
            tokens = line.split("#")[0].split()
            if tokens:
                yield tokens, line_num


And use it like:

with open("Programs/"+sys.argv[1]+".asm", "w+") as fileOut:
    print("### First Pass - Mapping Symbols to Addresses ###")
    address = 0
    for tokens, line_num in filter_out_comments("Programs/" + sys.argv[1]):
        if tokens[0][-1] == ":": # found symbol
            ...
    print("### Second Pass - Translating into machine code ###")
    address = 0
    for tokens, line_num in filter_out_comments("Programs/" + sys.argv[1]):
        asm = ""
        if tokens[0] in RRR and len(tokens) == 4:
            ...
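
The generator can be exercised on its own. Below is a rough sketch that writes a made-up program to a temporary file and runs the first pass over it. The first_pass helper, the sample source, and the "add" mnemonic are assumptions for illustration; only the rule that "movi" occupies two words comes from the question's code.

```python
import os
import tempfile

def filter_out_comments(filename):
    """Yield (tokens, line_num) for each line that has content left
    after stripping the '#' comment and surrounding whitespace."""
    with open(filename) as f:
        for line_num, line in enumerate(f, start=1):
            tokens = line.split("#")[0].split()
            if tokens:
                yield tokens, line_num

def first_pass(filename):
    """Hypothetical helper: map each label to its address."""
    symbols = {}
    address = 0
    for tokens, line_num in filter_out_comments(filename):
        if tokens[0].endswith(":"):        # found a label
            symbols[tokens[0][:-1]] = address
            tokens = tokens[1:]
        if tokens:                         # an instruction follows
            address += 2 if tokens[0] == "movi" else 1
    return symbols

source = """\
# made-up sample program
start: movi r1 0x10   # movi takes two words
       add r1 r1 r2
loop:
       add r2 r2 r1
"""

with tempfile.NamedTemporaryFile("w", suffix=".s", delete=False) as tmp:
    tmp.write(source)

try:
    lines = list(filter_out_comments(tmp.name))
    labels = first_pass(tmp.name)
finally:
    os.remove(tmp.name)

print(lines)
print(labels)
```

Each call to filter_out_comments reopens the file, so the second pass can simply call it again instead of rewinding with seek.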


Some improvements

Instead of leaving code at the top level of the file, you should wrap it in a function. That lets you test and reuse it more easily. You should also make use of the if __name__ == '__main__': idiom:

def compile_asm(filename):
    with open(filename + ".asm", "w+") as fileOut:
        print("### First Pass - Mapping Symbols to Addresses ###")
        address = 0
        for tokens, line_num in filter_out_comments(filename):
            if tokens[0][-1] == ":": # found symbol
                ...
        print("### Second Pass - Translating into machine code ###")
        address = 0
        for tokens, line_num in filter_out_comments(filename):
            asm = ""
            if tokens[0] in RRR and len(tokens) == 4:
                ...

if __name__ == '__main__':
    compile_asm("Programs/" + sys.argv[1])

Second, you should document your code a bit more, especially when sharing it like this, as it is sometimes unclear why you do things the way you do. It makes sense eventually, but it would be easier to understand with a few comments and some docstrings.
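
As an illustration of the kind of docstrings meant here, a sketch of the two small helpers with their intent spelled out. The wording of the docstrings is a guess at the intent, and it assumes register operands are written like "r5", as the s[1:] slice in the question suggests.

```python
def to_bin(n, bits):
    """Return n as a zero-padded bit string exactly `bits` wide.

    Values that do not fit are truncated to their low `bits` bits.
    """
    return "{:0>{}b}".format(n & 2**bits - 1, bits)

def reg(s):
    """Encode a register operand such as 'r5' as its 3-bit field."""
    return to_bin(int(s[1:]), 3)

print(reg("r5"))
```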

And, lastly, follow PEP 8, the official coding style, if you want your code to look like Python code.

One pass algorithm

There might not be a real need to perform 2 passes over the input file. Whenever a symbol cannot be resolved, store it in a dictionary as a key and

Context

StackExchange Code Review Q#146688, answer score: 7