HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Quality LISP/Scheme compilers to compete with C/C++

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
qualityschemecompetecompilerslispwith

Problem

Theoretically speaking, is it possible to have a Lisp/Scheme compiler that can produce code that can compete with compiled C, let's say within 15-25% margin?

In my testing, I've found that the current crop of compilers (Bigloo, SBCL, Gambit, Chicken, etc) are 20-50 times slower than equivalent C code.

The only outlier is the Stalin compiler. For simple programs, it produces binaries that are equivalent to C. However, what I find suspicious is that none of the other projects (Bigloo, Chicken, Clozure, etc) have attempted to implement whatever tricks Stalin uses ("whole program optimization", etc).

I'm a huge fan of LISP since the mid 90s and would love to bring it on board so my team can crank out projects in half the time in normally takes using C/C++/.NET/etc, but...the performance issues are a huge roadblock.

I wonder if the lack of quality LISP compilers are due to the fact that no serious time and money has been invested into the subject OR if this simply isn't a feasible task given the current state of compiler technology??

Solution

As noted in the comments, it isn't very exact to state "the current crop of compilers (Bigloo, SBCL, Gambit, Chicken, etc) are 20-50 times slower than equivalent C code", without qualifying how you tested and what you tested.

For my use, I find that for many things Gambit and Chicken Scheme are quite close to the equivalent 'C' code in speed, with the current version of Gambit (4.7.3) generally faster than Chicken (4.9.0.1) but over pre-optimizing the output 'C' code (making assumptions about the number of available registers - assumes x686 - and forcing the use of stack memory for any extra memory requirements, which decisions should be left to the 'C' compiler as Chicken does, which often eliminates the requirement for extra registers and combines processing steps) so as to prevent the 'C' compiler from doing its own optimizations resulting in very tight small loops being up to about twice as slow as those same tight small loops in 'C' (or Chicken); Chicken just defines as many variables local to a given function as it sees fit (mostly used immutably) and then depends on the compiler to optimize most of those away. Chicken doesn't seem to optimize Continuation Passing Style (CPS) algorithms with multiple forked levels of continuations as well as Gambit and can be up to about eight times slower than Gambit for these, but then one thinks that CPS isn't really formalized for 'C' code at all so...

EDIT_ADD: I have done some more research into the Chicken and Gambit-C Scheme versions and have found the following:

-
Chicken is faster than Gambit for small tight loops for the above reason (depends more on the 'C' compiler for optimizations without doing as much itself and thus takes better advantage of x86-64 extra registers), but also because it doesn't include a "POLL" stack maintenance check inside the loops, whereas Gambit includes the "POLL" check inside the loop. Even when this isn't triggered (the usual case), it will take several CPU clock cycles to determine that nothing is required (about 6 cycles). A future smarter compiler might see that there is no need to do a stack check when inside a tight loop and not doing stack-building operations, doing it just before or after the loop and save this time.

-
Gambit's 'C' macro's do too much work, as said, in precisely defining how the operations should be carried out, especially including fixed stack size operations, and these are likely harder for the 'C' compiler to optimize; using registers more effectively could reduce tight loop time by perhaps 4 cycles, which in combination with the above would put it into the Chicken ballpark.

-
Neither output "read/modify/write" optimizations for say vector operations that are modify in place and don't output code so that the compiler does it. A smarter backend such as LLVM when used with Haskell does this kind of thing. This would reduce register use by one and execution time in using only a single instruction rather than a separate read, computation of the modification, and a write to the same location; this would become just a calculation of the modification (say a bit or), and then a read modify (|=) write single instruction. This might make the speed faster by a cycle or so

-
Both are dynamically typed and process data "tag" bits as part of their data. Neither are smart enough for tight loops to reduce the tags, perform the loop "tag-less", then add the tags back for any results from the loop, nor do they produce code where the 'C' compiler can see to do this although it does combine these operations for some cases. Optimization here could reduce loop times by a couple of CPU cycles or so, depending on the loop.

-
Very tight 'C' loops can take about 3.5 CPU clock cycles per loop on a fast CPU that isn't memory cache access speed throttled (such as AMD Bulldozer, which is about twice as slow); the same loop in Chicken currently takes about 6 cycles and Gambit takes about 16.9 cycles. With all of the optimizations as per the above, there is no reason that these implementations of Scheme couldn't do that, however some work is required:

-
In the case of Gambit, the harder work might be improving the flow analysis to recognize when no "POLL" tests need to be inserted (ie. could this be interrupt driven, although the compiler does allow interrupts to be turned off for some uses?); the easier improvement would be to just implement better register usage (ie. recognize x86-64 registers better rather than the default x686 architecture). For both, better flow analysis to recognize that they can "untag" the data, especially "fixnum", "flonum", and vector, data so these operations aren't necessary inside tight loops and combining read/modify/write instructions. Both of these ends could be accomplished by using a better backend such as LLVM (not a trivial amount of work, but both are already partly there).

Conclusion: Chicken is currently about 50% slower than 'C' on the fastest CPU's (not my Bulldozer

Context

StackExchange Computer Science Q#9483, answer score: 7

Revisions (0)

No revisions yet.