April 2021
The execution time of a sequence of operations depends heavily on the order of the set of operations. Instruction scheduling is the process of reodering the operations to improve running time. Instruction scheduling is hard because:
The instruction scheduling problem is defined over the dependence graph D of a basic block. The edges in D represent flow of values. The nodes in D, which correspond to operations, have two attributes: optype, which specifies the functional unit, and delay, which specifies the number of cycles required for completion.
Given a dependence graph, a schedule S maps each node in D to a nonnegative integer that denotes the cycle in which it should be issued, assuming that the first operation issues in cycle 1. The i-th instruction is just the set of operations that issue in cycle i.
A schedule must meet three constraints:
An operation x is antidependent on y if x precedes y and y defines a value used in x. Antidependences are resolved either by respecting them in the final schedule (through further reordering) or renaming the values (which may increase register demands and cause spillage).
Instruction scheduling is NP-complete. In practice, approximate solutions to scheduling problems are generated using greedy heuristics. Most scheduling algorithms are based on one family of heuristic techniques called list scheduling.
List scheduling is a greedy, heuristic approach to scheduling the operations in a basic block (can be extended into multi-block sequences). The basic outline of the algorithm is:
The algorithm uses a cycle counter and two lists to track operations.
The Ready
list holds all the operations that can execute in the current cycle. The
Active list contains operations that were issued earlier but
have not yet finished.
At each step, the algorithm accounts for any
operations completed in the previous cycle by examining Active
and adds some of their parents into Ready,
then it schedules one operation in Ready for the current cycle
and increments the cycle counter (assuming a processor with a single functional unit).
The quality of the schedule produced depends primarily on the mechanism
used to pick an operation from Ready. Various priority ranks may
be used. None dominates the others in terms of the overall schedule quality.