HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavaMinor

Compiler for a minimal LISP dialect to run on the Java Virtual Machine

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
thelispjavadialectforcompilermachineminimalvirtualrun

Problem

As the title states, this is a compiler written in C with a ruby build script that translates a minimal LISP dialect and spits out an executable jar file. I designed this LISP dialect and named it mjLisp. It has some missing features (that requires some extra parsing effort, thus more code) as you can see in the list below.

  • lambda (anonymous functions)



  • let (scoped variables)



  • begin (imperative sequencing; currently there is an extra main function to do this)



  • macros



  • error handling/exceptions



  • variadic functions



One obvious problem that I can pick out is that the parsing and translating code is somewhat boilerplate, but there may happily be a lot more problems. If you can help me improving the code please don't hesitate.

test.mjl

(def nil (list))

(def (~= a b)
    (~ (= a b))
)

(def (>= a b)
    (~ ( a b))
)

(def (reverse* a b)
    (if (= a nil)
        b
        (reverse* (cdr a) (cons (car a) b))
    )
)

(def (reverse l)
    (reverse* l nil)
)

(def (fac n)
    (if (< n 2)
        1
        (* n (fac (- n 1)))
    )
)

(def (P a b)
    (/ (fac a) (fac (- a b)))
)

(def (C a b)
    (/ (P a b) (P b b))
)

(def (pascal's-triangle-row* n n')
    (if (< n' 0)
        nil
        (cons (C n n') (pascal's-triangle-row* n (- n' 1)))
    )
)

(def (pascal's-triangle-row n)
    (pascal's-triangle-row* n n)
)

(def (pascal's-triangle* n)
    (if (< n 0)
        nil
        (cons (pascal's-triangle-row n) (pascal's-triangle* (- n 1)))
    )
)

(def (pascal's-triangle n)
    (reverse (pascal's-triangle* n))
)

(main
    (print-line (pascal's-triangle 10))
)


Output

((1) (1 1) (1 2 1) (1 3 3 1) (1 4 6 4 1) (1 5 10 10 5 1) (1 6 15 20 15 6 1) (1 7 21 35 35 21 7 1) (1 8 28 56 70 56 28 8 1) (1 9 36 84
126 126 84 36 9 1) (1 10 45 120 210 252 210 120 45 10 1))


mjlc.c

```
#include
#include
#include
#include
#include
#include
#include

#define STRLEN 1000
#define E(WHERE, FUNC) perror("error: "WHERE"."FUNC)

bool is_start_of_identi

Solution

return isalpha(c) || c == '\'' || c == '?' || c == '+' || c == '-' || c == '*' || c == '/' || c == '^' || c == '=' || c == '>' || c == '<' || c == '~';


This could be rewritten much more readably as

return isalpha(c) || (c != '\0' && strchr("'?+-*/^=><~", c) != NULL);


A good compiler will produce the same code for both. (Neither GCC nor Clang count as good compilers. ;) However, they both produce a call to __ctype_b_loc for isalpha, which is pretty bad in itself.)

When possible, avoid non-local control flow such as continue and goto. For example, I'd rewrite your skip_whitespace as simply:

bool skip_whitespace(FILE *src)
{
    int c;
    while ((c = getc(src)) != EOF) {
        if (!isspace(c)) {
            ungetc(c, src);
            return true;
        }
    }
    return false;
}


The next_identifier function also suffers from convoluted control flow. Particularly, that strcmp(dst, "if") is a code smell. Why are you checking only for the name if? What happens if the MJLisp programmer makes a function named while — does your code still work, or does it try to create a Java variable named while and blow up?

I would strongly recommend using a properly round-trippable name-mangling/encoding scheme. For example, you could use a slightly modified form of URL-encoding: each alphabetic character encodes to itself, and each punctuation character (including _) encodes to sprintf(dst, "_%02X", c). To make sure we don't collide with any Java keywords, I'd prefix each identifier with mjl. So for example the MJLisp name if would encode to mjlif, and i++ would encode to mjli_2B_2B.

Your current code also has the problem (bug?) that names containing _ are not encodable at all.

Function next_integer_literal has the problem that it treats c == ')' as a special case. I'm pretty sure you should be using that codepath for any non-digit character. And then rewrite to avoid non-local control flow:

bool next_integer_literal(char *dst, FILE *src)
{
    int i = sprintf(dst, "new BigInteger(\"");
    int c;
    while ((c = getc(src)) != EOF) {
        if (isdigit(c)) {
            dst[i++] = c;
        } else {
            ungetc(c, src);
            sprintf(dst + i, "\")");
            return true;
        }
    }
    return false;
}


Incidentally, notice how I replaced your strcpy with sprintf, there? That's so that it'll be really easy to go back later and replace all your sprintfs with snprintf. Right now, this entire program is a buffer overflow waiting to happen. It is absolutely imperative that you rewrite this whole thing with buffer lengths in mind. For example:

char *next_integer_literal(char *dst, const char *end, FILE *src)
{
    dst += snprintf(dst, end - dst, "new BigInteger(\"");
    int c;
    while ((c = getc(src)) != EOF) {
        if (isdigit(c)) {
            dst += snprintf(dst, end - dst, "%c", c);
        } else {
            ungetc(c, src);
            dst += snprintf(dst, end - dst, "\")");
            return dst;
        }
    }
    return NULL;
}


Notice that if we get all the way back up to the top level and find that dst == end, we know that we've hit the end of our buffer and therefore we can't trust the result.

char *s_ = malloc(strlen(s) + 1);
    strcpy(s_, s);
    g_ptr_array_add(dst, s_);


You fail to check for NULL returned from malloc, which is par for the course in C, so I wouldn't worry about it unless you're planning to use this code in mission-critical settings. However, you could rewrite this as a one-liner by either using a glibc extension or implementing it yourself if you're on a non-glibc platform:

g_ptr_array_add(dst, strdup(s));


You use the name dst for a char buffer, a FILE , and a GPtrArray at different points in the program. This is a bit jarring when skimming the code in a small browser window — "Wait, you can't fputs into a char pointer! ...oh, I see, it's a FILE here" —; it might be totally fine in your editor of choice.

Your main is a huge mess. First of all, you seem to have forgotten about the existence of sprintf; secondly, you should encapsulate each stage of the "compilation pipeline" into its own named function. For example,

```
int main(int argc, char **argv)
{
if (argc == 1) {
fprintf(stderr, "Usage: ...\n");
exit(EXIT_FAILURE);
}
char mjl[1000], mjl_copy[1000], java[1000], jar[1000], mtxt[1000], jar_copy[1000];
snprintf(mjl, sizeof mjl, "%s.mjl", argv[1]);
snprintf(mjl_copy, sizeof mjl_copy, "/tmp/%s.mjl", argv[1]);
snprintf(java, sizeof java, "/tmp/_%s.java", argv[1]);
snprintf(jar, sizeof jar, "/tmp/%s.jar", argv[1]);
snprintf(mtxt, sizeof mtxt, "/tmp/m.txt");
snprintf(jar_copy, sizeof jar_copy, "../%s.jar", argv[1]);
copy_file(mjl, mjl_copy);
compile_mjl_to_java(mjl_copy, java);
compile_java_to_jar(java, mtxt, jar);
copy_file(jar, jar_copy); // why?
remo

Code Snippets

return isalpha(c) || c == '\'' || c == '?' || c == '+' || c == '-' || c == '*' || c == '/' || c == '^' || c == '=' || c == '>' || c == '<' || c == '~';
return isalpha(c) || (c != '\0' && strchr("'?+-*/^=><~", c) != NULL);
bool skip_whitespace(FILE *src)
{
    int c;
    while ((c = getc(src)) != EOF) {
        if (!isspace(c)) {
            ungetc(c, src);
            return true;
        }
    }
    return false;
}
bool next_integer_literal(char *dst, FILE *src)
{
    int i = sprintf(dst, "new BigInteger(\"");
    int c;
    while ((c = getc(src)) != EOF) {
        if (isdigit(c)) {
            dst[i++] = c;
        } else {
            ungetc(c, src);
            sprintf(dst + i, "\")");
            return true;
        }
    }
    return false;
}
char *next_integer_literal(char *dst, const char *end, FILE *src)
{
    dst += snprintf(dst, end - dst, "new BigInteger(\"");
    int c;
    while ((c = getc(src)) != EOF) {
        if (isdigit(c)) {
            dst += snprintf(dst, end - dst, "%c", c);
        } else {
            ungetc(c, src);
            dst += snprintf(dst, end - dst, "\")");
            return dst;
        }
    }
    return NULL;
}

Context

StackExchange Code Review Q#119350, answer score: 4

Revisions (0)

No revisions yet.