patternphpMinor
A simple compiler for a language called Jack
Viewed 0 times
simplecalledlanguageforcompilerjack
Problem
Below is the code for a compiler I created for a language called Jack. This compiler is one of the projects for the book "The Elements of Computing Systems" (http://www1.idc.ac.il/tecs/plan.html) where you build an entire computing system from the ground up. Anyway, what this compiler does is translate code from a high level language (called Jack, it looks similar to Java) and then translates it to intermediate code while also creating a parse tree from it. It takes in a list of tokens (the tokenizing is handled by another component whose code I haven't included) as input.
Can someone take a look at this code and help me improve it? I'm done with the project but I want to make a similar compiler for another language I'm gonna make for this Hack platform.
(Note: I took out some code because it went over the character limit)
```
class CompilationEngine
{
private $tokens; // Token list to operate on
private $symbol_table; // Symbol table that stores all identifiers
private $registry; // Hash table to store any additional information
private $xml; // Contains the XML code in a SimpleXMLElement object
private $vm; // Contains the VM code in a VMWriter object
private $write_xml; // Boolean flag that determines whether to output XML code
private $write_vm; // Boolean flag that determines whether to output VM code
private $current_xml; // Points to the current XML child object
private $parent_xml; // Array of parent XML objects
// Initializes the CompilationEngine object with a list of tokens
public function __construct($tokens)
{
$this->tokens = $tokens;
$this->symbol_table = new SymbolTable();
$this->registry = array();
$this->xml = new SimpleXMLElement('');
$this->vm = new VMWriter();
$this->current_xml = $this->xml[0];
$this->parent_xml = array();
$this->write_xml = false;
$this->write_vm = false;
// This is gonna be really hackish, but it's the only way I
Can someone take a look at this code and help me improve it? I'm done with the project but I want to make a similar compiler for another language I'm gonna make for this Hack platform.
(Note: I took out some code because it went over the character limit)
```
class CompilationEngine
{
private $tokens; // Token list to operate on
private $symbol_table; // Symbol table that stores all identifiers
private $registry; // Hash table to store any additional information
private $xml; // Contains the XML code in a SimpleXMLElement object
private $vm; // Contains the VM code in a VMWriter object
private $write_xml; // Boolean flag that determines whether to output XML code
private $write_vm; // Boolean flag that determines whether to output VM code
private $current_xml; // Points to the current XML child object
private $parent_xml; // Array of parent XML objects
// Initializes the CompilationEngine object with a list of tokens
public function __construct($tokens)
{
$this->tokens = $tokens;
$this->symbol_table = new SymbolTable();
$this->registry = array();
$this->xml = new SimpleXMLElement('');
$this->vm = new VMWriter();
$this->current_xml = $this->xml[0];
$this->parent_xml = array();
$this->write_xml = false;
$this->write_vm = false;
// This is gonna be really hackish, but it's the only way I
Solution
Alright, I'm not going to pretend I understand what's going on here. First, there's just too much. Second, I don't know Jack about Jack, all puns intended. I'm only going to do a once over on this. Usually I'm a bit more thorough, but you have entirely too much here. And by that I mean you really need to break this up into tinier, more efficient, classes.
First thing, since this is something that is going to be shared (you mentioned a book), then you should use proper PHPDoc as if this were a traditional API. Will make importing this into an IDE much easier and make documentation easier too. I'll demonstrate with your properties and constructor.
You can assign default property values when initially declaring the property, no need to clutter the constructor. For instance,
This for loop looks like the
If
I would try to avoid braceless statements. They can cause confusion and can lead to mistakes. This isn't Python so the syntax is inherently necessary. Maybe, as people are starting to point out to me, this is a stylistic choice, but I believe its one PHP should not have introduced, at least not halfheartedly as they have.
You should remain consistent in your style. CamelCase or Under_Score, at least for the same data type. You can switch between them for methods and variables if you want, but methods should be identical in style to other methods, as should variables. For instance, you have combined camelcase and underscore with your XML prefixed methods, all other methods are camelcase only. Also, because of the wide usage of this prefix, and other prefixes, I think this is pretty indicative of a need for a new class. You are essentially namespacing your methods.
Why are you using
I can't decide if these all caps words are constants, or just WANT to be constants. They aren't defined here, so I can't be sure. If you have them defined as constants somewhere you can skip over this section. If these aren't constants, PHP is throwing silent warnings every time this comes up saying that these are "Undefined, assuming 'TOKENTYPE_KEYWORD'". Meaning its using these as strings. If you want to use them as strings, make them strings, if you want to use them as constants, make them constants. Unless these are PHP constants, then continue.
"Don't Repeat Yourself" (DRY) Principle. Your
Another DRY violation. You have at least two methods that have
And another. You use a similar routine each time you iterate over the tokens. Make this a method.
I stopped about here, I'm starting to get lost in your code. I would suggest, if you take nothing else from this review, that you split this class up into more logical subclasses. Your "namespacing" should help you get started. Another suggestion, put this on github or something. You are liable to get a lot of help there. And let me know if you do, I might just stop by, been trying to convince myself to get into that side of things anyways.
First thing, since this is something that is going to be shared (you mentioned a book), then you should use proper PHPDoc as if this were a traditional API. Will make importing this into an IDE much easier and make documentation easier too. I'll demonstrate with your properties and constructor.
private
/** Token list to operate on */
$tokens,
/** Symbol table that stores all identifiers */
$symbol_table,
//etc...
/** Array of parent XML objects */
$parent_xml
;
/** Initializes the CompilationEngine object with a list of tokens
*
* @param array Token list to operate on
*/
public function __construct($tokens)You can assign default property values when initially declaring the property, no need to clutter the constructor. For instance,
$write_xml and $write_vm are both statically set to FALSE, so they could easily be moved into the initial declaration. There are more, these are just two.This for loop looks like the
$token array should have been a multidimensional associative array. Some things I'm seeing later on seem to confirm this suspicion. This would make needing to check for every two elements unnecessary, I'm assuming these are parameters or something? Ideally, you would want to look at changing this, if possible. I'm not sure of the source, so maybe not.If
setWriteXML() is just a toggle, why accept a flag? Same for vm. Just toggle it.public function setWriteXML() {
$this->write_xml = ! $this->write_xml;
}I would try to avoid braceless statements. They can cause confusion and can lead to mistakes. This isn't Python so the syntax is inherently necessary. Maybe, as people are starting to point out to me, this is a stylistic choice, but I believe its one PHP should not have introduced, at least not halfheartedly as they have.
You should remain consistent in your style. CamelCase or Under_Score, at least for the same data type. You can switch between them for methods and variables if you want, but methods should be identical in style to other methods, as should variables. For instance, you have combined camelcase and underscore with your XML prefixed methods, all other methods are camelcase only. Also, because of the wide usage of this prefix, and other prefixes, I think this is pretty indicative of a need for a new class. You are essentially namespacing your methods.
Why are you using
sprintf? Yes, its fine and dandy for templating larger strings, but for simpler strings you can just concatenate. Much quicker, and cleaner, and less overhead.trigger_error( "CompilationEngine: $message", E_USER_WARNING );I can't decide if these all caps words are constants, or just WANT to be constants. They aren't defined here, so I can't be sure. If you have them defined as constants somewhere you can skip over this section. If these aren't constants, PHP is throwing silent warnings every time this comes up saying that these are "Undefined, assuming 'TOKENTYPE_KEYWORD'". Meaning its using these as strings. If you want to use them as strings, make them strings, if you want to use them as constants, make them constants. Unless these are PHP constants, then continue.
"Don't Repeat Yourself" (DRY) Principle. Your
compile() method is repeating itself, moving the xml and vm out of this class and into their own will help, but you will still need to avoid calling the functions more than once.$xml = $this->write_xml ? $this->xml->asXML() : '';
$vm = $this->write_vm ? implode( "\n", $this->vm->getCode() ) : '';
if( $xml && $vm ) {
return compact( 'xml', 'vm' );
} else if( $xml || $vm ) {
return $xml . $vm;//one's empty, so concatenating won't hurt
}Another DRY violation. You have at least two methods that have
$isCommaNeeded routines that appear to be almost identical. Try combining these into a new method where you can.And another. You use a similar routine each time you iterate over the tokens. Make this a method.
$this->check(JackTokenizer::isSymbol($tokens[$position], '}'), 'Missing } symbol for end of statements in while statement');
$this->writeTokenData(array($tokens[$position]));
$position += 1;I stopped about here, I'm starting to get lost in your code. I would suggest, if you take nothing else from this review, that you split this class up into more logical subclasses. Your "namespacing" should help you get started. Another suggestion, put this on github or something. You are liable to get a lot of help there. And let me know if you do, I might just stop by, been trying to convince myself to get into that side of things anyways.
Code Snippets
private
/** Token list to operate on */
$tokens,
/** Symbol table that stores all identifiers */
$symbol_table,
//etc...
/** Array of parent XML objects */
$parent_xml
;
/** Initializes the CompilationEngine object with a list of tokens
*
* @param array Token list to operate on
*/
public function __construct($tokens)public function setWriteXML() {
$this->write_xml = ! $this->write_xml;
}trigger_error( "CompilationEngine: $message", E_USER_WARNING );$xml = $this->write_xml ? $this->xml->asXML() : '';
$vm = $this->write_vm ? implode( "\n", $this->vm->getCode() ) : '';
if( $xml && $vm ) {
return compact( 'xml', 'vm' );
} else if( $xml || $vm ) {
return $xml . $vm;//one's empty, so concatenating won't hurt
}$this->check(JackTokenizer::isSymbol($tokens[$position], '}'), 'Missing } symbol for end of statements in while statement');
$this->writeTokenData(array($tokens[$position]));
$position += 1;Context
StackExchange Code Review Q#14957, answer score: 3
Revisions (0)
No revisions yet.