patterncMinor
Brainf*ck to NASM converter written in C
Viewed 0 times
converterwrittenbrainfnasm
Problem
I have made a very simple Brainfuck to NASM converter, that is usable for practically all programs. It has one trivial optimisation (to subsitute
Here is the code:
```
#include
#include
#include
#include
#include
#include
static void usage(const char *);
static void compile_and_write(int);
static void print_header(void);
static void print_footer(void);
static FILE *in = NULL;
static FILE *out = NULL;
int
main(
int argc,
char *argv[])
{
int ch;
int opt;
while ((opt = getopt(argc, argv, "i:o:h")) != -1) {
switch (opt) {
case 'i':
in = fopen(optarg, "r");
break;
case 'o':
out = fopen(optarg, "w");
break;
case 'h':
usage(argv[0]);
return EXIT_SUCCESS;
default:
usage(argv[0]);
return EXIT_FAILURE;
}
}
if (in == NULL) {
in = stdin;
}
if (out == NULL) {
out = stdout;
}
print_header();
while ((ch = fgetc(in)) != EOF) {
compile_and_write(ch);
}
print_footer();
fclose(in);
fclose(out);
return EXIT_SUCCESS;
}
static void
print_header(
void)
{
fputs("[bits 64]\n", out);
fputs("[section .bss]\n", out);
fputs("mem:resb 32768\n", out);
fputs("[section .text]\n", out);
fputs("[global _start]\n", out);
fputs("putc:\n", out);
fputs("xor rax, rax\n", out);
fputs("inc rax\n", out);
fputs("xor rdi, rdi\n", out);
fputs("inc rdi\n", out);
fputs("xor rdx, rdx\n", out);
fputs("inc rdx\n", out);
fputs("syscall\n", out);
fputs("ret\n", out);
fputs("getc:\n", out);
fputs("xor rax, rax\n", out);
fputs("xor rdi, rdi\n", out);
fputs("xor rdx, rdx\n", out);
fputs("inc rdx\n", out);
fputs("syscall\n", out);
fput
ADD for INC with large numbers). How can I make the generated code smaller and/or faster? I would prefer smaller over faster, if I had to choose.Here is the code:
```
#include
#include
#include
#include
#include
#include
static void usage(const char *);
static void compile_and_write(int);
static void print_header(void);
static void print_footer(void);
static FILE *in = NULL;
static FILE *out = NULL;
int
main(
int argc,
char *argv[])
{
int ch;
int opt;
while ((opt = getopt(argc, argv, "i:o:h")) != -1) {
switch (opt) {
case 'i':
in = fopen(optarg, "r");
break;
case 'o':
out = fopen(optarg, "w");
break;
case 'h':
usage(argv[0]);
return EXIT_SUCCESS;
default:
usage(argv[0]);
return EXIT_FAILURE;
}
}
if (in == NULL) {
in = stdin;
}
if (out == NULL) {
out = stdout;
}
print_header();
while ((ch = fgetc(in)) != EOF) {
compile_and_write(ch);
}
print_footer();
fclose(in);
fclose(out);
return EXIT_SUCCESS;
}
static void
print_header(
void)
{
fputs("[bits 64]\n", out);
fputs("[section .bss]\n", out);
fputs("mem:resb 32768\n", out);
fputs("[section .text]\n", out);
fputs("[global _start]\n", out);
fputs("putc:\n", out);
fputs("xor rax, rax\n", out);
fputs("inc rax\n", out);
fputs("xor rdi, rdi\n", out);
fputs("inc rdi\n", out);
fputs("xor rdx, rdx\n", out);
fputs("inc rdx\n", out);
fputs("syscall\n", out);
fputs("ret\n", out);
fputs("getc:\n", out);
fputs("xor rax, rax\n", out);
fputs("xor rdi, rdi\n", out);
fputs("xor rdx, rdx\n", out);
fputs("inc rdx\n", out);
fputs("syscall\n", out);
fput
Solution
I'll skip any consideration about C code and comment only generated assembly code (as per your question).
1) In your generated code:
You're stalling your pipeline, shuffle your instructions and you may also gain because of using available ALUs. More about this on instruction scheduling.
2) On same code I see you have often this pattern:
If you replace it with:
You will have same size but it may result on a small speed improvement (but it varies according to CPU).
3) You're extensively using
Also simplifying (and slightly speed-up) your C code it should be rewritten as:
4) Now you have:
You're testing for zero using
4) If you do not really care about speed you can gain some extra bytes using
With:
Note that
5) In generated code you often have patterns, if you're optimizing for size then you may detect them to compress code replacing blocks with jumps (see this somehow as the opposite of inlining). For example
1) In your generated code:
XOR AX, AX
INC RAX
XOR RDI, RDI
INC RDI
XOR RDX, RDX
INC RDXYou're stalling your pipeline, shuffle your instructions and you may also gain because of using available ALUs. More about this on instruction scheduling.
XOR RAX, RAX
XOR RDI, RDI
XOR RDX, RDX
INC RAX
INC RDX
INC RDI2) On same code I see you have often this pattern:
XOR reg, reg
INC regIf you replace it with:
MOV reg, 1You will have same size but it may result on a small speed improvement (but it varies according to CPU).
3) You're extensively using
INC and DEC, you may have a small performance gain using ADD and SUB (at least if you do not target an ancient CPU):INC rsiAlso simplifying (and slightly speed-up) your C code it should be rewritten as:
ADD rsi, 14) Now you have:
CMP BYTE [rsi], 0
JZ ...You're testing for zero using
CMP. With a small change you will not gain in size but (again) in speed:TEST BYTE [rsi], 0
JZ ...TEST is faster than CMP but unfortunately they both can't be macro-fused with JZ if second operand is an immediate value. Best would be to increase size in favor of speed (if you want so):MOVZB AL, BYTE [rsi]
TEST AL, AL
JZ ...4) If you do not really care about speed you can gain some extra bytes using
LOOP instruction. For . and , commands you generate multiple CALL instructions. When cnt >= 3 you may replace this:CALL getc
CALL getc
CALL getcWith:
C0: MOV CX, 3
CALL getc
LOOPNZ C0Note that
LOOP instruction is (usually) pretty bad from performance point of view then you should do this only if you really want to optimize for size. Same optimization also may apply for [ and ] refactoring code little bit more.5) In generated code you often have patterns, if you're optimizing for size then you may detect them to compress code replacing blocks with jumps (see this somehow as the opposite of inlining). For example
- Optimize this code using high-level knowledge (locally, across functions and whole program). You may add loops, pre-calculate invariants and so on. See also Is inline assembly language slower than native C++ code?.
- Generate assembly code from intermediate code.
- Optimize generated assembly code (if assembler you're using isn't doing this for you). Sometimes this kind of optimizations are called peephole optimizations and they're highly local (for example one jump followed by a
MOV may simply perform a conditional MOV and you may use faster and shorter CMOV` instead).Code Snippets
XOR AX, AX
INC RAX
XOR RDI, RDI
INC RDI
XOR RDX, RDX
INC RDXXOR RAX, RAX
XOR RDI, RDI
XOR RDX, RDX
INC RAX
INC RDX
INC RDIXOR reg, reg
INC regCMP BYTE [rsi], 0
JZ ...TEST BYTE [rsi], 0
JZ ...Context
StackExchange Code Review Q#113366, answer score: 4
Revisions (0)
No revisions yet.