patternjavaMinor
Brainfuck to x86 Assembly Compiler
Viewed 0 times
brainfuckassemblyx86compiler
Problem
After my Brainfuck Interpreter written in x86 Assembly I decided that it was time to get to the next step, writing a Brainfuck Compiler in Java that generates x86 Assembly and compiles that to an executable.
Currently it only supports Windows and it still uses NASM and GCC as dependencies to transform the x86 Assembly code to actual executables. This is not ideal, but this is also the easiest way.
The compiler is currently still completely unoptimized and will be very happy to repeatedly increment/decrement pointers and/or memory locations.
I don't have a video this time, but my interpreter executed a Mandelbrot BF file in 60 seconds, and the executable obtained from this compiler executes it in 4 seconds.
My code intends to follow a generic compiler design, consisting of the following phases:
There are a few notable things to mention:
Now onto the code, let's start with the AST and the Expression interface:
Every code fragment in my program is a subclass of Expression`, m
Currently it only supports Windows and it still uses NASM and GCC as dependencies to transform the x86 Assembly code to actual executables. This is not ideal, but this is also the easiest way.
The compiler is currently still completely unoptimized and will be very happy to repeatedly increment/decrement pointers and/or memory locations.
I don't have a video this time, but my interpreter executed a Mandelbrot BF file in 60 seconds, and the executable obtained from this compiler executes it in 4 seconds.
My code intends to follow a generic compiler design, consisting of the following phases:
- Reading the source code file
- The Lexical Analyzer which translates the source file to a stream of tokens
- The Syntax Analyzer which builds an Abstract Syntax Tree (AST) from the stream of tokens and checks simple things like if loops match
- The Intermediate Code Generator which takes the Source AST and transforms it into an Intermediate AST
- The Target Code Generator which takes the Intermediate AST and transforms it into a Target AST
- The Target Code Writer which takes the Target AST and writes actual x86 Assembly code as output.
There are a few notable things to mention:
- There is no Semantic Analyzer. In a more complete language such phase would check whether assignment happens between compatible types, etc. Since Brainfuck is not advanced enough there is no need for this phase.
- The Intermediate AST is not directly used right now, in a later phase all optimizations will be done on this Intermediate AST. Think about replacing
+++by the pseudocodeaddmemorycell(3),
Now onto the code, let's start with the AST and the Expression interface:
public interface Expression {
}Every code fragment in my program is a subclass of Expression`, m
Solution
Your TargetCodeGenerator's generateAST has multiple abstraction levels embedded into one method.
And it's a long method.
It has all of these details that are pretty complex - when I see functionality like
go by, I start to skim instead of reading. What's up with all of these super detailed things?
What helps, of course, is this commment a bit up:
It's a header.
Before I go back and talk about the complexity of generateAST, let's take a quick look at that comment - it says "do not modify".
Who's the target audience for that comment? Probably just you. But you're also the person who (sort of) knows why that warning is there.
Have you considered turning your comments into functions?
I turned the bodies of the add child bits into ellipses.
I think the
The rest probably goes into a
What you don't want to see (which is what you have right now) is 70 lines (if not more, I just estimated) of code which look like the initialization code generated by those GUI tools that IDE's have. The sort of code you skim over without a second thought because it's auto generated.
And the main reason this is so important is because the compilation is hidden between all these steps.
Between all these fixed values/node creations is this line
And that's not a line that should be hidden because it's the actual code.
...
And maybe, when you split the code up into many small functions, you'll find a way to get rid of the duplication between adding Add and Mov operations. They both seem to take two arguments.
Something to convert that into
Because then you get rid of the cruft and show the relevant code.
In short,
Write something that will create Expressions from Expressions, without you having to create ASTNodes by yourself. If you need the more complex structure you can create the ASTNodes but the ASTNodes seem like noise when defining a program.
Using the simplified manner of creatin
And it's a long method.
It has all of these details that are pretty complex - when I see functionality like
dataSectionNode.addChild(ASTNode.newWithChildren(new DefineByteExpression(), Arrays.asList(
ASTNode.newWithChild(new IdentifierExpression(), new ASTNode(new StringExpression("write_mode"))),
ASTNode.newWithChild(new ValueExpression(), ASTNode.newWithChildren(new ValueListExpression(), Arrays.asList(
new ASTNode(new StringExpression("w")),
new ASTNode(new IntegerExpression(0))
)))
)));go by, I start to skim instead of reading. What's up with all of these super detailed things?
What helps, of course, is this commment a bit up:
// Header of the BF.exe program -do not modify-It's a header.
Before I go back and talk about the complexity of generateAST, let's take a quick look at that comment - it says "do not modify".
Who's the target audience for that comment? Probably just you. But you're also the person who (sort of) knows why that warning is there.
Have you considered turning your comments into functions?
ASTNode externNode = new ASTNode(new ExternExpression());
...
rootNode.addChild(externNode);
ASTNode dataSectionNode = new ASTNode(new DataSectionExpression());
...
rootNode.addChild(dataSectionNode);
ASTNode bssSectionNode = new ASTNode(new BssSectionExpression());
...
rootNode.addChild(bssSectionNode);
ASTNode textSectionNode = new ASTNode(new TextSectionExpression());
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...;
textSectionNode.addChild(...);
// Code Generated from the Intermediate AST
convertIntermediateToTargetNodes(intermediateAST.getRoot(), "").forEach(textSectionNode::addChild);
// Bottom of the BF.exe program -do not modify-
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(new ASTNode(new RetExpression()));
rootNode.addChild(textSectionNode);I turned the bodies of the add child bits into ellipses.
I think the
externNode and the dataSectionNode and the bssSectionNode could all be part of a addHeader method. Perhaps there being separate functions for creating such an externNode and dataSectionNode etc.The rest probably goes into a
createBody method - there is no footer, to be honest, because everything after the header goes into the textSectionNode. It seems to be something like defining calloc and a main method or something. I'd expect createBody to consider of lines like textSectionNode.addChild(createNodeForCalloc()). Of course, there has to be a body inside that too so maybe the createBody needs the intermediate AST.What you don't want to see (which is what you have right now) is 70 lines (if not more, I just estimated) of code which look like the initialization code generated by those GUI tools that IDE's have. The sort of code you skim over without a second thought because it's auto generated.
And the main reason this is so important is because the compilation is hidden between all these steps.
Between all these fixed values/node creations is this line
// Code Generated from the Intermediate AST
convertIntermediateToTargetNodes(intermediateAST.getRoot(), "").forEach(textSectionNode::addChild);And that's not a line that should be hidden because it's the actual code.
...
And maybe, when you split the code up into many small functions, you'll find a way to get rid of the duplication between adding Add and Mov operations. They both seem to take two arguments.
textSectionNode.addChild(ASTNode.newWithChildren(new AddExpression(), Arrays.asList(
ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.ESP))),
ASTNode.newWithChild(new OperandExpression(), new ASTNode(new IntegerExpression(8)))
)));Something to convert that into
textSectionNode.addChild(createNewAddNode(new RegisterExpression(Register.ESP), new IntegerExpression(8)));Because then you get rid of the cruft and show the relevant code.
In short,
Write something that will create Expressions from Expressions, without you having to create ASTNodes by yourself. If you need the more complex structure you can create the ASTNodes but the ASTNodes seem like noise when defining a program.
Using the simplified manner of creatin
Code Snippets
dataSectionNode.addChild(ASTNode.newWithChildren(new DefineByteExpression(), Arrays.asList(
ASTNode.newWithChild(new IdentifierExpression(), new ASTNode(new StringExpression("write_mode"))),
ASTNode.newWithChild(new ValueExpression(), ASTNode.newWithChildren(new ValueListExpression(), Arrays.asList(
new ASTNode(new StringExpression("w")),
new ASTNode(new IntegerExpression(0))
)))
)));// Header of the BF.exe program -do not modify-ASTNode externNode = new ASTNode(new ExternExpression());
...
rootNode.addChild(externNode);
ASTNode dataSectionNode = new ASTNode(new DataSectionExpression());
...
rootNode.addChild(dataSectionNode);
ASTNode bssSectionNode = new ASTNode(new BssSectionExpression());
...
rootNode.addChild(bssSectionNode);
ASTNode textSectionNode = new ASTNode(new TextSectionExpression());
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...;
textSectionNode.addChild(...);
// Code Generated from the Intermediate AST
convertIntermediateToTargetNodes(intermediateAST.getRoot(), "").forEach(textSectionNode::addChild);
// Bottom of the BF.exe program -do not modify-
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(...);
textSectionNode.addChild(new ASTNode(new RetExpression()));
rootNode.addChild(textSectionNode);// Code Generated from the Intermediate AST
convertIntermediateToTargetNodes(intermediateAST.getRoot(), "").forEach(textSectionNode::addChild);textSectionNode.addChild(ASTNode.newWithChildren(new AddExpression(), Arrays.asList(
ASTNode.newWithChild(new OperandExpression(), new ASTNode(new RegisterExpression(Register.ESP))),
ASTNode.newWithChild(new OperandExpression(), new ASTNode(new IntegerExpression(8)))
)));Context
StackExchange Code Review Q#147806, answer score: 7
Revisions (0)
No revisions yet.