Recent Entries 10
- pattern moderate 112d agoWhy is backpatching needed during intermediate code generation? For what purposes?I understand how backpatching works during intermediate code generation, but I do not understand why it is needed. A common argument for using backpatching during intermediate code generation is as follows: Suppose the intermediate code generated is: ``` goto L1 ... L1: ``` When `goto L1` is generated, the actual address of the target of `goto` is unknown. So we use a label `L1` as a placeholder. When the code `L1:` is generated, we can backpatch the address where `L1:` is to `goto L1`. However, why should we use the actual address for `L1` during intermediate code generation? For example, LLVM IR uses symbolic labels in `branch` instructions instead of their actual addresses. Even the RISC-V assembly language, which is often used as a targe language during targe code generation, uses symbolic labels in `jump` instructions. Isn't it the job of an assembler to replace labels with actual addresses? The related post does not address my concerns above.
- pattern moderate 112d agoHow does the linker know where to look for a function implementation in C++?I am currently learning about how the compilation and linking works in C++. I think I kinda get how the compiler works, and that for a file to fully compile you don't need to have function implementations, but only declarations. It is the linker's job to link the function declaration to its implementation. But now I have this weird question, for example if I have a `.cpp` file in which I am using 1000 different functions, and each of those functions has its own separate `.cpp` and `.h` file, how does the linker know which of the cpp files to scan to find that specific function? I mean does the linker know where the function is located or does the linker every time for every function scan the whole project to find that specific function?
- pattern minor 112d agoTail call optimization via translating to CPSI am struggling to wrap my head around this compiler technique, so let's say here's my `factorial` function `def factorial(value: int) -> int: if value == 0: return 1 else: return factorial(value-1) * value ` It is recursive, but not TCO friendly yet, so, as the theory goes, the first thing to try here is translate it to CPS: `def factorial_cont(value: int, cont: typing.Callable[[int], T]) -> T: if value == 0: return cont(1) else: return factorial_cont(value-1, lambda result: cont(value * result)) ` Now, as the function is tail call recursive, I can do the usual trick with the while loop: `def factorial_while(value: int, cont: typing.Callable[[int], T]) -> T: current_cont = cont current_value = value while True: if current_value == 0: return current_cont(1) else: current_cont = lambda result: current_cont(current_value * result) # note: in actual python that would look like # current_cont = lambda result, c=current_cont, v=current_value: c(v * result) current_value = current_value - 1 ` This `current_cont` thing effectively becomes a huge composition chain, in `haskell` terms for the `value == 3` that would be `let resulting_cont = ((initial_cont . (3*)) . (2*)) . (1*)`, where `initial_cont` is safe to default to `id`, and surely enough `resulting_cont value == value!`. But I also know the trick with "`accumulator`" value: `def factorial_acc(value: int, acc: int = 1) -> int: current_acc = acc current_value = value while True: if current_value == 1: return current_acc else: current_acc = current_acc * current_value current_value = current_value - 1 ` which looks pretty much identical to the CPS version after the introduction of while loop. The question is, how exactly do I massage the continuation `let resulting_cont = ((initial_cont . (3*)) . (2*)) . (1*)` into t
- pattern major 112d agoWould it be wrong to say that the processor (and hardware) is the implementation of an interpreter for machine language?The question is basically in the title. I know that a computers hardware is of course some physical object, where as the interpreter is some abstract thing that does something abstract with the code it is given. Yet, can we say that the way the processor is build is the implementation of an interpreter? Machine language is a sequence of physical states (0s and 1s ) on some physical memory, and if this physical states order follows certain rules (the syntax of the machine language) then the processor is build in a way that naturally leads to performing calculation steps (and changing some of the memory), e.g. "run" the programm. As this question's answer points out, compilers translate from one language to another, and interpreters for each language "run" the programm. It would be just consistent if this stretches down to machine language as well, and that's why I'm asking. If one can make the analogy, when would it break down? What features of language semantics (given by the interpreter), for example expressions and values, are there that we can't find at the physical level anymore? In what way does the processor behave differently from what is expected of an interpreter?
- snippet minor 112d agoHow compiler optimizations create irreducible control flow graph?I've been looking through research papers and the internet and found many claims that "compiler optimizations can cause irreducible control flow". However, I was not able to find a single example of how that can happen. In particular, in [1], there is written that tail recursion elimination in combination with inlining can yield an irreducible control flow graph. I can imagine some transformations that could create irreducible control flow, but I cannot come up with an example of how tail recursion elimination with inlining can do that? Does anybody have a pointer here? Thanks [1] J. Stainer, D. Watson. A study of irreducibility in C programs.
- pattern minor 112d agoHow is CLR(1) grammar more powerful than LALR(1) grammarI am unable to understand how Canonical LR(1) grammar is more powerful than LookAhead LR(1). Both have lookahead symbols in their items and works almost similarly, so how can CLR(1) derive a larger set of languages than LALR(1).
- pattern minor 112d agoRepresenting abstract syntax tree as a graphDoes it make sense to represent an AST as a graph? How can one achieve a mapping between ASTs and graphs that preserves both semantic and syntactic properties of source code? The goal and application of such a transformation would be to use graph neural networks and other deep "graph" learning techniques to extracts features, clustering source code, find code similarities and suggest code completion tasks. Any suggestion of current algorithms and research in this area?
- debug minor 112d agoHow does a compiler give an immediate compilation error?I am confused. I know that compilers translate your code one time as a whole into intermediary code, but how come when I type in something improper, the IDE shoots out a compilation error immediately? What am I not understanding? I thought the whole difference between an interpreter and a compiler is that the former translates code line by line and the latter translates your source program as a whole in one go.
- gotcha minor 112d agoWhy does the Java grammar have a StatementExpression that resolves to just Expression? Why have this and other redundant rules in the grammar?I'm looking at the following grammar rules for the Java language described on the Oracle docs: ``` Statement: ... if ParExpression Statement [else Statement] StatementExpression ; ... StatementExpression: Expression ParExpression: ( Expression ) ``` I don't understand why the StatementExpression rule is present at all. I also don't understand why ParExpression had to be written as a separate rule. Why couldn't the grammar just look like the following? ``` Statement: ... if ( Expression ) Statement [else Statement] Expression ; ... ```
- debug moderate 112d agoBreak keyword outside a loop is syntax error or semantic error?I am designing a simple compiler for my university project. In my programming language, the break keyword is allowed. I want to know whether break keyword occurs outside a loop should be a syntax error or semantic error. I want to know what is the best approach. If it is a syntax error I can do it in the grammar file. But if it is a semantic one then I can do it in a pass. But I don't know which one is the better approach.