HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

Brainf*ck to NASM converter written in C

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
converterwrittenbrainfnasm

Problem

I have made a very simple Brainfuck to NASM converter, that is usable for practically all programs. It has one trivial optimisation (to subsitute ADD for INC with large numbers). How can I make the generated code smaller and/or faster? I would prefer smaller over faster, if I had to choose.

Here is the code:

```
#include
#include
#include
#include
#include
#include

static void usage(const char *);
static void compile_and_write(int);
static void print_header(void);
static void print_footer(void);

static FILE *in = NULL;
static FILE *out = NULL;

int
main(
int argc,
char *argv[])
{
int ch;
int opt;

while ((opt = getopt(argc, argv, "i:o:h")) != -1) {
switch (opt) {
case 'i':
in = fopen(optarg, "r");
break;
case 'o':
out = fopen(optarg, "w");
break;
case 'h':
usage(argv[0]);
return EXIT_SUCCESS;
default:
usage(argv[0]);
return EXIT_FAILURE;
}
}

if (in == NULL) {
in = stdin;
}

if (out == NULL) {
out = stdout;
}

print_header();

while ((ch = fgetc(in)) != EOF) {
compile_and_write(ch);
}

print_footer();

fclose(in);
fclose(out);
return EXIT_SUCCESS;
}

static void
print_header(
void)
{
fputs("[bits 64]\n", out);
fputs("[section .bss]\n", out);
fputs("mem:resb 32768\n", out);

fputs("[section .text]\n", out);
fputs("[global _start]\n", out);

fputs("putc:\n", out);
fputs("xor rax, rax\n", out);
fputs("inc rax\n", out);
fputs("xor rdi, rdi\n", out);
fputs("inc rdi\n", out);
fputs("xor rdx, rdx\n", out);
fputs("inc rdx\n", out);
fputs("syscall\n", out);
fputs("ret\n", out);

fputs("getc:\n", out);
fputs("xor rax, rax\n", out);
fputs("xor rdi, rdi\n", out);
fputs("xor rdx, rdx\n", out);
fputs("inc rdx\n", out);
fputs("syscall\n", out);
fput

Solution

I'll skip any consideration about C code and comment only generated assembly code (as per your question).

1) In your generated code:

XOR AX, AX
INC RAX
XOR RDI, RDI
INC RDI
XOR RDX, RDX
INC RDX


You're stalling your pipeline, shuffle your instructions and you may also gain because of using available ALUs. More about this on instruction scheduling.

XOR RAX, RAX
XOR RDI, RDI
XOR RDX, RDX
INC RAX
INC RDX
INC RDI


2) On same code I see you have often this pattern:

XOR reg, reg
INC reg


If you replace it with:

MOV reg, 1


You will have same size but it may result on a small speed improvement (but it varies according to CPU).

3) You're extensively using INC and DEC, you may have a small performance gain using ADD and SUB (at least if you do not target an ancient CPU):

INC rsi


Also simplifying (and slightly speed-up) your C code it should be rewritten as:

ADD rsi, 1


4) Now you have:

CMP BYTE [rsi], 0
JZ ...


You're testing for zero using CMP. With a small change you will not gain in size but (again) in speed:

TEST BYTE [rsi], 0
JZ ...


TEST is faster than CMP but unfortunately they both can't be macro-fused with JZ if second operand is an immediate value. Best would be to increase size in favor of speed (if you want so):

MOVZB AL, BYTE [rsi]
TEST AL, AL
JZ ...


4) If you do not really care about speed you can gain some extra bytes using LOOP instruction. For . and , commands you generate multiple CALL instructions. When cnt >= 3 you may replace this:

CALL getc
CALL getc
CALL getc


With:

C0: MOV CX, 3
    CALL getc
    LOOPNZ C0


Note that LOOP instruction is (usually) pretty bad from performance point of view then you should do this only if you really want to optimize for size. Same optimization also may apply for [ and ] refactoring code little bit more.

5) In generated code you often have patterns, if you're optimizing for size then you may detect them to compress code replacing blocks with jumps (see this somehow as the opposite of inlining). For example

  • Optimize this code using high-level knowledge (locally, across functions and whole program). You may add loops, pre-calculate invariants and so on. See also Is inline assembly language slower than native C++ code?.



  • Generate assembly code from intermediate code.



  • Optimize generated assembly code (if assembler you're using isn't doing this for you). Sometimes this kind of optimizations are called peephole optimizations and they're highly local (for example one jump followed by a MOV may simply perform a conditional MOV and you may use faster and shorter CMOV` instead).

Code Snippets

XOR AX, AX
INC RAX
XOR RDI, RDI
INC RDI
XOR RDX, RDX
INC RDX
XOR RAX, RAX
XOR RDI, RDI
XOR RDX, RDX
INC RAX
INC RDX
INC RDI
XOR reg, reg
INC reg
CMP BYTE [rsi], 0
JZ ...
TEST BYTE [rsi], 0
JZ ...

Context

StackExchange Code Review Q#113366, answer score: 4

Revisions (0)

No revisions yet.