Reverse Polish notation based compiler

Submitted by: @import:stackexchange-codereview

Problem

Description

  • Very small subset of Forth
  • This is a proof of concept level compiler, no optimizations or over/underflow checking
  • See the embedded POD for more information
  • NASM is used as assembler
  • gcc is used to link with glibc
  • 32bit ELF Binary is generated



bhathiforth.pl

`#!/usr/bin/perl

use strict;
use warnings;
use feature qw(say);

sub tokenize {
my $fullcode = shift;
if ( not defined $fullcode ) {
die "Invalid Arguments";
}
my @tokens;
while ( $fullcode =~ /([0-9]+|\+|\-|\*|\/|\.)/g ) {
push @tokens, $1;
}
return @tokens;
}

sub generate_assembly {

my @tokens = @{ $_[0] };
if ( not @tokens ) {
die "Invalid Arguments";
}

my $assembly = "section .text\nglobal main\nextern printf\nmain:\n";
say "Tokens";
say "==================";
foreach (@tokens) {

say "";

if ( $_ =~ /[0-9]+/ ) {
$assembly .= "push $_\n";
}
elsif ( $_ eq "+" ) {
$assembly .= "pop ebx\npop eax\nadd eax,ebx\npush eax\n";
}
elsif ( $_ eq "-" ) {
$assembly .= "pop ebx\npop eax\nsub eax,ebx\npush eax\n";
}
elsif ( $_ eq "/" ) {
$assembly .= "mov edx,0\npop ecx\npop eax\ndiv ecx\npush eax\n";
}
elsif ( $_ eq "*" ) {
$assembly .= "mov edx,0\npop ecx\npop eax\nmul ecx\npush eax\n";
}
elsif ( $_ eq "." ) {
$assembly .= "push message\ncall printf\nadd esp, 8\n";
}
}
$assembly .= "ret\nmessage db \"%d\", 10, 0;";
say "==================";
return $assembly;
}

my $version = "0.1";

say "Welcome to BhathiFoth compiler v$version";
say "========================================";

my $source = shift @ARGV;
my $output = shift @ARGV;

if ( not defined $source or not defined $output ) {
say
"Invalid Commandline arguments.\n\nUSAGE:\n% ./bhathiforth.pl ";
exit;
}

open my $CODE, " ) {
$fullcode .= $line;
}

close $

Solution

tokenize

The tokenize subroutine could be simplified:

sub tokenize {
    my ($code) = @_;
    die "Invalid Arguments" unless defined $code;
    return $code =~ m!\d+|[-+*/.]!g;
}


Changes include:

  • Shorter parameter name
  • One-line validation
  • Use global match in list context to produce a list of all matches
  • Simpler regex that avoids leaning toothpick syndrome



Note that any unrecognized token is treated as a comment, which is quite lenient.
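If that leniency is not wanted, a stricter tokenizer is one option. The following is only a sketch: the name tokenize_strict is hypothetical, and it assumes tokens are separated by whitespace (as Forth words normally are), which is a behavior change from the regex scan above.

```perl
use strict;
use warnings;

# Hypothetical stricter variant (not from the original code):
# split on whitespace and reject any word that is not a number or operator.
sub tokenize_strict {
    my ($code) = @_;
    die "Invalid Arguments" unless defined $code;
    my @tokens;
    for my $word ( split ' ', $code ) {
        die "Unrecognized token '$word'\n"
            unless $word =~ m!\A(?:\d+|[-+*/.])\z!;
        push @tokens, $word;
    }
    return @tokens;
}
```

With this variant, input such as "2 3+" is rejected because "3+" is a single whitespace-delimited word, whereas the regex-based tokenize would accept it.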

generate_assembly

For readability, I would just pass the tokens as a list rather than as a reference to a list.

I don't recommend printing output as a side effect: it hinders code reuse.

The assembly code for the operators could be produced by a hash lookup.
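The three suggestions above could be combined roughly as follows. This is a sketch rather than a definitive rewrite: the hash name %ASM_FOR is hypothetical, and the assembly fragments are copied unchanged from the if/elsif chain in the question.

```perl
use strict;
use warnings;

# Operator => assembly fragment, copied from the question's if/elsif chain.
# %ASM_FOR is a hypothetical name chosen for this sketch.
my %ASM_FOR = (
    '+' => "pop ebx\npop eax\nadd eax,ebx\npush eax\n",
    '-' => "pop ebx\npop eax\nsub eax,ebx\npush eax\n",
    '/' => "mov edx,0\npop ecx\npop eax\ndiv ecx\npush eax\n",
    '*' => "mov edx,0\npop ecx\npop eax\nmul ecx\npush eax\n",
    '.' => "push message\ncall printf\nadd esp, 8\n",
);

# Takes the tokens as a plain list and prints nothing.
sub generate_assembly {
    my @tokens = @_;
    die "Invalid Arguments" unless @tokens;

    my $assembly = "section .text\nglobal main\nextern printf\nmain:\n";
    for my $token (@tokens) {
        if ( $token =~ /\A\d+\z/ ) {
            $assembly .= "push $token\n";
        }
        elsif ( exists $ASM_FOR{$token} ) {
            $assembly .= $ASM_FOR{$token};
        }
        # Any other token is skipped, as in the original chain.
    }
    return $assembly . "ret\nmessage db \"%d\", 10, 0;";
}
```

A caller would then write generate_assembly(tokenize($code)) and decide for itself whether to print any progress output.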

main

A convention for declaring version numbers is

our $VERSION = 0.1;


A double_underline() subroutine could be useful.

sub double_underline {
    my ($text) = @_;
    return $text . "\n" . ('=' x length($text));
}

say double_underline("Welcome to BhathiForth compiler v$VERSION");  # Fixed typo "Foth"


To read a file fully, you don't need a loop. Use "slurp mode":

local $/ = undef;
my $code = <$CODE>;
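
Because local $/ stays in effect until the enclosing scope ends, it can help to confine slurp mode to a small subroutine. A minimal sketch, assuming a hypothetical helper named slurp_file that is not part of the original program:

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one string.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open '$path': $!";
    local $/ = undef;    # slurp mode; the old value is restored on return
    my $content = <$fh>;
    close $fh;
    return $content;
}
```

Keeping the local inside the subroutine means any later line-by-line reads elsewhere in the program still see the default record separator.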

Code Snippets

sub tokenize {
    my ($code) = @_;
    die "Invalid Arguments" unless defined $code;
    return $code =~ m!\d+|[-+*/.]!g;
}
our $VERSION = 0.1;
sub double_underline {
    my ($text) = @_;
    return $text . "\n" . ('=' x length($text));
}

say double_underline("Welcome to BhathiForth compiler v$VERSION");  # Fixed typo "Foth"
local $/ = undef;
my $code = <$CODE>;

Context

StackExchange Code Review Q#67480, answer score: 11
