Implementing a Compiler with Macros
Problem
I've been thinking about embedded languages (domain-specific or otherwise), and in particular the approach where we define a datatype to represent terms of the embedded language, e.g. (in Haskell):

data Expr = Numeral Int
          | Apply Expr Expr
          | Lambda String Expr
          | Var String
          | ...

We then provide a library of functions which allow us to write programs, like plus 10 5, but when "run" they don't actually produce the 'answer' (15); instead they produce a representation of that program in the embedded language, like Apply (Apply (Var "plus") (Numeral 10)) (Numeral 5).

My question is: can we implement a general-purpose, high-level programming language, like Haskell, such that the "compiler" is actually a set of library functions, "compiling" is just interpreting our program, and the resulting value is machine code (or Javascript, or whatever our compilation target is)?
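To make the idea concrete, here is a minimal sketch of such a "compiler as library": the Expr datatype from above, plus a hypothetical compileJS function (the name and JavaScript target are my illustration, not part of the question) that "compiles" a term simply by interpreting it into target-language source text.

```haskell
-- The embedded-language terms from the question (Eq/Show added for testing).
data Expr = Numeral Int
          | Apply Expr Expr
          | Lambda String Expr
          | Var String
  deriving (Eq, Show)

-- "Compiling" is just an ordinary function: interpret the term,
-- producing JavaScript source text as the result value.
compileJS :: Expr -> String
compileJS (Numeral n)  = show n
compileJS (Var x)      = x
compileJS (Lambda x b) = "(" ++ x ++ " => " ++ compileJS b ++ ")"
compileJS (Apply f a)  = compileJS f ++ "(" ++ compileJS a ++ ")"

-- A library function for writing programs: plus 10 5 builds a term
-- rather than computing 15.
plus :: Expr -> Expr -> Expr
plus a b = Apply (Apply (Var "plus") a) b
```

Here compileJS (plus (Numeral 10) (Numeral 5)) yields the string "plus(10)(5)": running the "program" produced compiled output instead of an answer.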
Another way to think about this: in Lisp-like languages, we can create new language constructs using macros, which get "expanded" into other macro calls and eventually down to "special forms" implemented with an interpreter/compiler. From this perspective, my question would be: is it possible to make a Lisp where the only special forms are (representations of) machine code, and the whole "user-facing" language (function definition, calling, branching, etc.) is just a set of macros? In such a system, macro-expanding a program would result in a compiled binary. I'm willing to ignore the "dynamic"/reflection/etc. features found in "real" Lisps.
Yet another way to think about this: assembly languages get macro-expanded into machine code; are there any assemblers which do/could provide "high-level" features, like higher-order functions, pattern-matching, garbage collection, linear types, etc.?
Is there some insurmountable difficulty with this (e.g. linking)? Is it something that's been done? Things which seem related, but slightly different:
- Forth
Solution
Yes and no.
Yes, you could structure a compiler this way, but most of the benefits you are hoping for would not materialize. There may be some benefits, such as a powerful compile-time meta-language. There are also costs like a complete loss of the ability to make meaningful guarantees about code that isn't completely self-contained (e.g. takes in objects or higher-order functions as parameters).
Felleisen defines a notion of "macro expressiveness" in his 1991 paper On the expressive power of programming languages. Basically, a programming language feature that is macro-expressible is one that can be expressed by a localized and delimited syntactic transformation. With this notion you can partially compare the "expressive power" of different language features: a language feature is at least as expressive as another feature if the second feature is macro-expressible in a language with the first feature (relative to a common base language).
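As a hedged illustration of macro expressibility (my own example, not one from Felleisen's paper): a non-recursive let is macro-expressible in a language with lambdas, because the rewrite let x = e in b ~> (\x -> b) e is local and delimited, touching only the let node itself. The sketch below shows this as a desugaring over a hypothetical sugared variant of the question's Expr type.

```haskell
-- Core language, as in the question (Eq/Show added for testing).
data Expr = Numeral Int
          | Apply Expr Expr
          | Lambda String Expr
          | Var String
  deriving (Eq, Show)

-- A hypothetical surface language that adds a Let form.
data SugarExpr = SNumeral Int
               | SApply SugarExpr SugarExpr
               | SLambda String SugarExpr
               | SVar String
               | SLet String SugarExpr SugarExpr  -- let x = e in b
  deriving (Eq, Show)

-- The "macro expansion": every case is a local rewrite, and Let in
-- particular becomes lambda application without inspecting any
-- surrounding context. That locality is what makes it macro-expressible.
desugar :: SugarExpr -> Expr
desugar (SNumeral n)  = Numeral n
desugar (SApply f a)  = Apply (desugar f) (desugar a)
desugar (SLambda x b) = Lambda x (desugar b)
desugar (SVar x)      = Var x
desugar (SLet x e b)  = Apply (Lambda x (desugar b)) (desugar e)
```

A feature like call/cc or garbage collection admits no such node-local rewrite; that is the kind of feature the next paragraph is about.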
If a feature is not macro-expressible relative to some language, then implementing it requires non-local (often global) transformations. This may be done in a brute-force way as a transformation of the whole program, but it could also be done by having a bunch of smaller macros that coordinate to maintain invariants. Either way, the point is that if I want to extend your language with a macro of my own creation that operates at the "base" level (machine code in your idea), then my macros need to maintain all the invariants your macros rely upon, or else I'll break the "high-level language". It's possible that these invariants are too constraining for what I want to do. If I proceed anyway, rewriting parts of the language so it's compatible with my new macros, the end result is that I haven't extended your language at all: I've made a new, incompatible language. It's quite likely that I will either not be able to use your language's existing standard library, or that I will need to reimplement it so that it takes advantage of the new features. Even if I can make something that's compatible with the old language, if what I'm doing is not macro-expressible in terms of the "high-level language" (which presumably it isn't, or else I wouldn't be accessing the base language), then my macros are going to have their own invariants that need to be maintained across usages and/or will be making assumptions about the low-level details. Now imagine I'm not making an application but a library. Now imagine you want to use my library and another library, each with its own set of macros and invariants, which don't know about each other.
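A toy sketch of such an invariant clash (all names and the register machine are hypothetical): the "high-level language" macros below compile arithmetic assuming the invariant that register r1 holds the current intermediate result between fragments. A user-written base-level macro that doesn't know this invariant can silently break the computation, even though every fragment is correct in isolation.

```haskell
-- A toy base language: a few machine-code-like instructions.
data Instr = Load String Int   -- load a constant into a register
           | Add String String -- add the second register into the first
           | Call String       -- call a runtime routine
  deriving (Eq, Show)

-- "Macros" of the high-level language: functions producing instruction
-- lists, relying on the invariant that "r1" carries the running result.
compileConst :: Int -> [Instr]
compileConst n = [Load "r1" n]

compileAddConst :: Int -> [Instr]
compileAddConst n = [Load "r2" n, Add "r1" "r2"]

-- A user macro written directly against the base language, unaware of
-- the invariant: it uses "r1" as scratch space before calling a tracer.
userTrace :: [Instr]
userTrace = [Load "r1" 0, Call "trace"]

-- Splicing userTrace between fragments clobbers the intermediate result:
-- the Add now sees 0 rather than 10, though each piece is "correct" alone.
broken :: [Instr]
broken = compileConst 10 ++ userTrace ++ compileAddConst 5
```

Nothing in the base language marks r1 as reserved, so the breakage is invisible to local inspection; scaling this up to two independently written macro libraries gives the incompatibility described above.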
I've discussed this quite extensively at LtU. See this comment thread and the comment threads it references. I even briefly touch on the thought experiment of assembly + macros in this comment (from 2006). This comment (a bit into one of the threads referenced by the first comment) may be a relatively coherent explanation with examples.
In a nutshell, separately adding two features which are not macro-expressible to a common language will typically produce two incompatible languages. The less expressive the common language is, the more often this will occur.
Context
StackExchange Computer Science Q#82416, answer score: 4