What formal representation is commonly used to describe compiler optimizations?
Problem
I have devised a compiler optimization that works on any structured language that has array assignments array[index] = value and counted loops for (i = n; i < N; i++) { doTHIS } (*). Now I want to represent this optimization using some formal semantics. Since I want to publish my results in a Programming Languages venue, I would like to know:
What formal representation(s) is (are) most commonly used to describe compiler optimizations?
I have already described my optimization using operational semantics, but I'm currently reviewing this choice. Hence, I came to the community for advice.
(*) Please forgive the C-like notation in a question asking for formal semantics.
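For concreteness, here is one way the counted loop could be given a small-step operational semantics. This is a sketch only; the desugaring into a while-loop and the notation are choices made for illustration, not something the question prescribes:

```latex
% Desugar the counted loop into initialization plus a while-loop:
%
%   for (i = n; i < N; i++) { s }   ==   i := n ; while (i < N) { s ; i := i + 1 }
%
% Then the standard small-step rules for while apply:
\frac{\sigma(i) < \sigma(N)}
     {\langle \mathtt{while}\ (i < N)\ \{\, s \,\},\ \sigma \rangle
        \longrightarrow
      \langle s\,;\ i := i + 1\,;\ \mathtt{while}\ (i < N)\ \{\, s \,\},\ \sigma \rangle}
\qquad
\frac{\sigma(i) \ge \sigma(N)}
     {\langle \mathtt{while}\ (i < N)\ \{\, s \,\},\ \sigma \rangle
        \longrightarrow
      \langle \mathtt{skip},\ \sigma \rangle}
```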
Solution
There are many compilers, for widely different kinds of languages serving widely different purposes. For example, a database language will have very different optimizations than an array-based language like APL.
Compilers themselves use several intermediate languages, ranging from the input language, through a desugared version of it, all the way down to "fancy" assembly-like representations. Looking at the pipeline of a real compiler gives a good idea of the intermediate languages involved.
Optimizations happen at each of these stages. In general, it only makes sense to state an optimization in terms of a formal execution semantics of the language in question: an optimization must preserve the observational semantics, i.e., the observable result of the computation. Ideally you would use the semantics to say something about execution time as well: typically, the number of abstract reduction steps decreases (in certain cases) after the optimization is applied.
Pragmatically, many optimizations of imperative-style languages happen on an intermediate language that looks a lot like LLVM IR. Its characteristics include SSA form (each variable appears on the left-hand side of at most one assignment) and three-address code (at most one assignment, and one operation, per statement).
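For instance (an illustrative lowering, not the output of any particular compiler), the statements x = a + b * c; x = x + 1; might become the following three-address code in SSA form:

```
t1 = b * c        ; three-address: one operation per statement
x1 = a + t1       ; SSA: the source variable x is split into x1, x2
x2 = x1 + 1       ; each name is assigned exactly once
```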
Function calls, thunks, memory layout, and the semantics of things like pointers will tend to be specific to your language.
Context
StackExchange Computer Science Q#76834, answer score: 9