HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

Brainfix to NASM converter written in C (revision 2)

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
writtenrevisionbrainfixconverternasm

Problem

See this post for the previous question. I have made several enhancements to this compiler and language that make it a lot easier to work with, as well as slightly decreasing executable size (if new features are used properly). I've decided to call the new language brainfix, as I don't think anyone has used this name before.

Here's the compiler:

```
#include
#include
#include

void usage(const char *);
void compile_and_write(int);
void print_header(void);
void print_footer(void);

static FILE *in = NULL;
static FILE *out = NULL;

int main(int argc, char *argv[])
{
int ch;
int opt;

while ((opt = getopt(argc, argv, "i:o:h")) != -1) {
switch (opt) {
case 'i':
in = fopen(optarg, "r");
break;
case 'o':
out = fopen(optarg, "w");
break;
case 'h':
usage(argv[0]);
return EXIT_SUCCESS;
default:
usage(argv[0]);
return EXIT_FAILURE;
}
}

in = in ? in : stdin;
out = out ? out : stdout;

print_header();

while ((ch = fgetc(in)) != EOF) {
compile_and_write(ch);
}

compile_and_write('\0');
print_footer();

fclose(in);
fclose(out);
return EXIT_SUCCESS;
}

void print_header(void)
{
fputs("[bits 64]\n"
"[section .bss]\n"
"mem: resb 32768\n"
"hld: resb 1\n"
"[section .text]\n"
"[global _start]\n"
"putc:\n"
"xor rax, rax\n"
"xor rdi, rdi\n"
"xor rdx, rdx\n"
"inc rax\n"
"inc rdi\n"
"inc rdx\n"
"syscall\n"
"ret\n"

"getc:\n"
"xor rax, rax\n"
"xor rdi, rdi\n"
"xor rdx, rdx\n"
"inc rdx\n"
"syscall\n"
"ret\n"

"exit:\n"
"mov rax, 60\n"
"movzx rdi, byte [rsi]\n"
"syscall\n"
"ret\n"

"_start:\n"
"mov rsi,

Solution

Assembly

I don't have much to say about your assembly code; it looks very nice. I just have a few possible optimizations.

You are accessing the memory address of rsi a lot in your assembly code:

[rsi]


As you may know, memory access is inefficient and it would be best to access registers instead.

A possible optimization would be to store the value of [rsi] in a different register and to use that register instead of accessing the memory address. Then, when you need to change the memory address, simply reset the value of the register.

Here is what that would be in psuedo-code:

mov reg, [rsi]
...
... using reg instead of [rsi] ...
...
... the > instruction was entered:
inc rsi
mov reg, [rsi]


This will reduce your interaction with memory and will speed up compiled assembly code by a lot.

This recommendation would be a little difficult, and can be easily dismissed.

Right now you are compiling to assembly. If the user wishes to run this code, they must then use an external assembler to assemble the code. This can be tedious for the user (or not, if they are using a bash file).

However, you can reduce this middle step. Instead, why not compile straight to binary?

You can read up on the op-codes for each instruction that you are using and write methods for encoding them with the correct value (this can be merged in with a below tip about structs). Then, when the user compiles their code, they will have an executable all ready in the end.

Of course, this may not be the best choice. You will have to deal with platform if you are compiling to an executable and not just a binary, and the compiled binary may not be compatible with the different architectures that run your code.
C

if (ch == '#') {
    comment = comment ? 0 : 1;
}

if (comment) {
    return;
}


Not exactly sure what this is supposed to do, but it doesn't seem right to me. In Brainfuck, any non-valid Brainfuck character is a comment; you don't need to worry about what is an what isn't a comment based on a character.

Most Brainfuck code has a lot of comments (duh - it's Brainfuck). And, for every character in the code, the compile_and_write function is being called. Those characters include the comment characters.

This can slow your code down a lot. Every time this function is called, it has to do some memory prep, go through some conditionals, and go through the entire switch statement only to return.

To speed this up, the very first thing you should do in this function is see if the character is a valid Brainfuck character. If it's not, return immediately; there is no point in going on.

Right now, your "commands" are simple characters that are checked for in a big switch statement along with some code in the case statement. This is fine, but it could be more structured (get it?).

A better way to organize this would be to store the possible instructions in structs. As a note, I recommend moving this struct declaration to a different file (header file?) to keep things organized.

A sample struct might look like this:

struct Instruction {
    char symbol;
    void (*func)(void);
}


(you may need to modify this)

After you have this, create an array of the different instructions (you can just stick right on to the end of the struct definition). Then, in place of the switch statement, you can do something like this:

for(int i = 0; i symbol == last) {
        instructions[i]->func();
    }
}


note: I didn't try compiling this

Now your code is much more structured. And, you might be able to use the instruction data in other functions, too.

goto cleanup_exit;


Scream

Goto is not good practice in C. Avoid it at all costs. Here's why.

Why does that have to be a goto? Why do you have to jump to the end of the function? All you are technically doing is returning.

To fix this, move the goto label into the conditional itself and just return at the end:

if(...) {
    last = ch;
    cnt = 1;
    return;
}


if ((last == '>' && ch == '') ||
    (last == '+' && ch == '-') ||
    (last == '-' && ch == '+') ||
    (last == '(' && ch == ')') ||
    (last == ')' && ch == '(') ||
    (last == '^' && ch == '^')) {


This logic can be extracted into a separate function. It's not specifically about compiling the code, and is rather awkward there in the code.

You did a good job merging your fputs and fprintf calls in the beginning of your code, but you didn't do it for the rest of your code. Ex:

fprintf(out, "mov rcx, %u\n", cnt);
fprintf(out, "C%u:\n", call);
fputs("call putc\n", out);
fprintf(out, "loop C%u\n", call++);

Code Snippets

mov reg, [rsi]
...
... using reg instead of [rsi] ...
...
... the > instruction was entered:
inc rsi
mov reg, [rsi]
if (ch == '#') {
    comment = comment ? 0 : 1;
}

if (comment) {
    return;
}
struct Instruction {
    char symbol;
    void (*func)(void);
}
for(int i = 0; i < length_of_instruction_array; i++) {
    if(instructions[i]->symbol == last) {
        instructions[i]->func();
    }
}
goto cleanup_exit;

Context

StackExchange Code Review Q#113753, answer score: 3

Revisions (0)

No revisions yet.