April 2021
The goal of the optimizer is to improve the quality of the code generated by the backend. In optimizing a program, the compiler must ensure that the meaning of the program is preserved and also that the transformed program is an improvement over the original program.
The compiler optimizes a program mainly by reducing the overhead of various source-level abstractions and by taking advantage of special cases. The optimizer also uses information about the target machine, such as its ISA and number of functional units, to improve the runtime performance of the program on that machine.
Optimizations operate at different granularities, or scopes:
Local value numbering finds redundant expressions in a basic block and replaces the redundant evaluations with reuse of a previously computed value. In general, local value numbering is profitable. However, replacing code can extend or shorten the lifetime of variables. This changes register demands, which may or may not cause register spillage.
Tree-height balancing reorganizes heavily left- or right-leaned expression trees into one that is more balanced. This exposes additional parallelism in expression evaluation --- the nodes of the resulting tree has less dependencies between them.
Superlocal value numbering is an extension of local value numbering to larger regions called extended basic blocks.
Loop unrolling reduces the amount of loop overheads by increasing the size of the loop body to reduce the total number of loop iterations required to compute the same result. For instance, a loop computation that does 1 thing n times can be unrolled into one that does 2 things n/2 times.
Global code placement exploits asymmetric branch costs to reduce branching overheads. Consider a piece of code with a branching point that splits into two branches. The optimizer can perform a profile run to determine which one of the two branches executes more frequently. Then, it can reorganize the linear sequence of code such that the branch that is executed more frequently becomes the fall-through branch at the branch point.
Inline substitution eliminates the overheads of invoking a procedure by replacing its call site with a copy of the procedure's body. Inlining can improve runtime performance by reducing procedure call overheads, but it can also degrade performance. Replacing a call site with the body of the called procedure can increase code size and the name space size, which may increase register demands.
Procedures within a single executable image can also be rearranged to increase the performance of the program. Using a call graph, the optimizer can rearrange the procedures to reduce virtual-memory working-set sizes and to limit the potential for call-induced conflicts in the instruction cache. If a procedure calls another procedure, the ideal case would be to have both procedures occupy adjacent locations in memory.