patternjavaMinor
Compiler for a minimal LISP dialect to run on the Java Virtual Machine
Viewed 0 times
thelispjavadialectforcompilermachineminimalvirtualrun
Problem
As the title states, this is a compiler written in C with a ruby build script that translates a minimal LISP dialect and spits out an executable jar file. I designed this LISP dialect and named it mjLisp. It has some missing features (that requires some extra parsing effort, thus more code) as you can see in the list below.
One obvious problem that I can pick out is that the parsing and translating code is somewhat boilerplate, but there may happily be a lot more problems. If you can help me improving the code please don't hesitate.
test.mjl
Output
mjlc.c
```
#include
#include
#include
#include
#include
#include
#include
#define STRLEN 1000
#define E(WHERE, FUNC) perror("error: "WHERE"."FUNC)
bool is_start_of_identi
lambda(anonymous functions)
let(scoped variables)
begin(imperative sequencing; currently there is an extramainfunction to do this)
- macros
- error handling/exceptions
- variadic functions
One obvious problem that I can pick out is that the parsing and translating code is somewhat boilerplate, but there may happily be a lot more problems. If you can help me improving the code please don't hesitate.
test.mjl
(def nil (list))
(def (~= a b)
(~ (= a b))
)
(def (>= a b)
(~ ( a b))
)
(def (reverse* a b)
(if (= a nil)
b
(reverse* (cdr a) (cons (car a) b))
)
)
(def (reverse l)
(reverse* l nil)
)
(def (fac n)
(if (< n 2)
1
(* n (fac (- n 1)))
)
)
(def (P a b)
(/ (fac a) (fac (- a b)))
)
(def (C a b)
(/ (P a b) (P b b))
)
(def (pascal's-triangle-row* n n')
(if (< n' 0)
nil
(cons (C n n') (pascal's-triangle-row* n (- n' 1)))
)
)
(def (pascal's-triangle-row n)
(pascal's-triangle-row* n n)
)
(def (pascal's-triangle* n)
(if (< n 0)
nil
(cons (pascal's-triangle-row n) (pascal's-triangle* (- n 1)))
)
)
(def (pascal's-triangle n)
(reverse (pascal's-triangle* n))
)
(main
(print-line (pascal's-triangle 10))
)Output
((1) (1 1) (1 2 1) (1 3 3 1) (1 4 6 4 1) (1 5 10 10 5 1) (1 6 15 20 15 6 1) (1 7 21 35 35 21 7 1) (1 8 28 56 70 56 28 8 1) (1 9 36 84
126 126 84 36 9 1) (1 10 45 120 210 252 210 120 45 10 1))mjlc.c
```
#include
#include
#include
#include
#include
#include
#include
#define STRLEN 1000
#define E(WHERE, FUNC) perror("error: "WHERE"."FUNC)
bool is_start_of_identi
Solution
return isalpha(c) || c == '\'' || c == '?' || c == '+' || c == '-' || c == '*' || c == '/' || c == '^' || c == '=' || c == '>' || c == '<' || c == '~';This could be rewritten much more readably as
return isalpha(c) || (c != '\0' && strchr("'?+-*/^=><~", c) != NULL);A good compiler will produce the same code for both. (Neither GCC nor Clang count as good compilers. ;) However, they both produce a call to
__ctype_b_loc for isalpha, which is pretty bad in itself.)When possible, avoid non-local control flow such as
continue and goto. For example, I'd rewrite your skip_whitespace as simply:bool skip_whitespace(FILE *src)
{
int c;
while ((c = getc(src)) != EOF) {
if (!isspace(c)) {
ungetc(c, src);
return true;
}
}
return false;
}The
next_identifier function also suffers from convoluted control flow. Particularly, that strcmp(dst, "if") is a code smell. Why are you checking only for the name if? What happens if the MJLisp programmer makes a function named while — does your code still work, or does it try to create a Java variable named while and blow up?I would strongly recommend using a properly round-trippable name-mangling/encoding scheme. For example, you could use a slightly modified form of URL-encoding: each alphabetic character encodes to itself, and each punctuation character (including
_) encodes to sprintf(dst, "_%02X", c). To make sure we don't collide with any Java keywords, I'd prefix each identifier with mjl. So for example the MJLisp name if would encode to mjlif, and i++ would encode to mjli_2B_2B.Your current code also has the problem (bug?) that names containing
_ are not encodable at all.Function
next_integer_literal has the problem that it treats c == ')' as a special case. I'm pretty sure you should be using that codepath for any non-digit character. And then rewrite to avoid non-local control flow:bool next_integer_literal(char *dst, FILE *src)
{
int i = sprintf(dst, "new BigInteger(\"");
int c;
while ((c = getc(src)) != EOF) {
if (isdigit(c)) {
dst[i++] = c;
} else {
ungetc(c, src);
sprintf(dst + i, "\")");
return true;
}
}
return false;
}Incidentally, notice how I replaced your
strcpy with sprintf, there? That's so that it'll be really easy to go back later and replace all your sprintfs with snprintf. Right now, this entire program is a buffer overflow waiting to happen. It is absolutely imperative that you rewrite this whole thing with buffer lengths in mind. For example:char *next_integer_literal(char *dst, const char *end, FILE *src)
{
dst += snprintf(dst, end - dst, "new BigInteger(\"");
int c;
while ((c = getc(src)) != EOF) {
if (isdigit(c)) {
dst += snprintf(dst, end - dst, "%c", c);
} else {
ungetc(c, src);
dst += snprintf(dst, end - dst, "\")");
return dst;
}
}
return NULL;
}Notice that if we get all the way back up to the top level and find that
dst == end, we know that we've hit the end of our buffer and therefore we can't trust the result.char *s_ = malloc(strlen(s) + 1);
strcpy(s_, s);
g_ptr_array_add(dst, s_);You fail to check for
NULL returned from malloc, which is par for the course in C, so I wouldn't worry about it unless you're planning to use this code in mission-critical settings. However, you could rewrite this as a one-liner by either using a glibc extension or implementing it yourself if you're on a non-glibc platform:g_ptr_array_add(dst, strdup(s));You use the name
dst for a char buffer, a FILE , and a GPtrArray at different points in the program. This is a bit jarring when skimming the code in a small browser window — "Wait, you can't fputs into a char pointer! ...oh, I see, it's a FILE here" —; it might be totally fine in your editor of choice.Your
main is a huge mess. First of all, you seem to have forgotten about the existence of sprintf; secondly, you should encapsulate each stage of the "compilation pipeline" into its own named function. For example,```
int main(int argc, char **argv)
{
if (argc == 1) {
fprintf(stderr, "Usage: ...\n");
exit(EXIT_FAILURE);
}
char mjl[1000], mjl_copy[1000], java[1000], jar[1000], mtxt[1000], jar_copy[1000];
snprintf(mjl, sizeof mjl, "%s.mjl", argv[1]);
snprintf(mjl_copy, sizeof mjl_copy, "/tmp/%s.mjl", argv[1]);
snprintf(java, sizeof java, "/tmp/_%s.java", argv[1]);
snprintf(jar, sizeof jar, "/tmp/%s.jar", argv[1]);
snprintf(mtxt, sizeof mtxt, "/tmp/m.txt");
snprintf(jar_copy, sizeof jar_copy, "../%s.jar", argv[1]);
copy_file(mjl, mjl_copy);
compile_mjl_to_java(mjl_copy, java);
compile_java_to_jar(java, mtxt, jar);
copy_file(jar, jar_copy); // why?
remo
Code Snippets
return isalpha(c) || c == '\'' || c == '?' || c == '+' || c == '-' || c == '*' || c == '/' || c == '^' || c == '=' || c == '>' || c == '<' || c == '~';return isalpha(c) || (c != '\0' && strchr("'?+-*/^=><~", c) != NULL);bool skip_whitespace(FILE *src)
{
int c;
while ((c = getc(src)) != EOF) {
if (!isspace(c)) {
ungetc(c, src);
return true;
}
}
return false;
}bool next_integer_literal(char *dst, FILE *src)
{
int i = sprintf(dst, "new BigInteger(\"");
int c;
while ((c = getc(src)) != EOF) {
if (isdigit(c)) {
dst[i++] = c;
} else {
ungetc(c, src);
sprintf(dst + i, "\")");
return true;
}
}
return false;
}char *next_integer_literal(char *dst, const char *end, FILE *src)
{
dst += snprintf(dst, end - dst, "new BigInteger(\"");
int c;
while ((c = getc(src)) != EOF) {
if (isdigit(c)) {
dst += snprintf(dst, end - dst, "%c", c);
} else {
ungetc(c, src);
dst += snprintf(dst, end - dst, "\")");
return dst;
}
}
return NULL;
}Context
StackExchange Code Review Q#119350, answer score: 4
Revisions (0)
No revisions yet.