Processor Architecture

February 2021

A processor does work simply in the form of accessing and modifying the program registers, the condition codes, the program counter, and memory. When we write a program, we are essentially writing a sequence of these activities that the processor has to follow.

Instruction Set Architecture

The set of instructions of a processor constitutes its instruction set architecture (ISA). In addition to specfiying what each instruction does, the ISA also specifies how the instructions are encoded in terms bits.

Each instruction has an opcode that indicates the instruction class. Some instructions have a function code that specifies a particular instruction type within a class. Most ISAs support the following classes of instructions: data moves, ALU operations (eg. add, multiply, bitwise AND etc), jumps, and stack operations (push and pop).

Some instructions have bits that specify which registers to use as the operands. Each register in the processor has a unique identifier. The CPU maintains a mapping of the registers to their identifiers in a register file.

Hardware

Logic gates are the basic buiding blocks of a computer system. They implement Boolean functions such as AND and OR. Through combinations of these logic gates, we can construct highly complex digital circuits that generate nontrivial behaviors, such as adding two numbers together.

Combinational circuits that are built solely out of logic gates cannot store information. To build a computer, we need stateful hardware capable of storing state information. Two components that fall under this category are clocked registers and memory devices.

A clocked register works as follows: the output of the register always remains held at the current register state, regardless of what the input is. The register state changes to the input only when the clock input rises, ie. goes from low to high. Memory devices operate in a similar fashion.

Instruction Processing

Before diving into pipelined processors, it helps to look at processors that are implemented sequentially. An instruction is processed by the processor in several distinct stages. In a sequential processors, all these different stages are performed within one clock cycle, resulting in a long cycle time and low clock rate.

The first stage is the fetch stage. In this stage the instruction specified by the program counter is fetched from memory. Additionally, the opcode of the instruction is extracted, and depending on what the opcode is, the processor may extract additional bytes such as the function code, the register specifier, or an immediate constant word. The processor also computes the address of the next instruction.

Next is the decode stage, where the processor reads up to two operands from the register file. These registers may be specified by the operands themselves, or, in some cases, they may extract the value referred to by the stack pointer. Then comes the execute stage. The ALU plays a key role in this stage. The branch that is executed in a conditional move or jump is also determined in this stage.

In the fourth stage --- the memory stage, the processor may read data from memory or write data into memory, depending on what the opcode is. Then comes the write-back stage --- the processor updates the contents of up to two registers in the register file. Finally, the PC update stage sets the program counter register to the address of the next instruction.

Pipelining

The biggest difference between a pipelined processor and a non-pipelined processor is that, in a pipelined processor, multiple instructions are processed at the same time within the same clock cycle.

Pipelining increases the throughput of the processor by decreasing the clock rate. This comes at the expense of more hardware (need to add pipeline registers) and a slight increase in latency (total time required to perform one instruction).

Pipelining also introduces another critical issue --- pipeline hazards. A pipeline hazard occurs when there is a data dependency or control dependency between successive instructions. Pipeline hazards should be avoided because they change the behavior of a program.

One technique for addressing data hazards is to forward computation results from a stage further down the pipeline directly back to an earlier stage of the pipeline. Load/use hazards is a type of data hazards that cannot be handled purely by forwarding alone. This type of data hazards is avoided using a combination of forwarding and stalling, which decreases the throughput slightly.

In addition to data hazards, another class of pipeline hazards, known as control hazards, occur when there is a control dependency between successive instructions. Control hazards can occur when there is a return instruction or a condition instruction in the pipeline.

In a return instruction, the return address is not known until the memory stage, which occurs late in the pipeline. Whenever a return instruction enters the pipeline, the immediately following instructions are stalled until the return instruction reaches the write-back stage (after the memory stage). This results in a slighlty decreased throughput.

The instructions immediately following a conditional jump instruction are predicted when the jump instruction enters the fetch stage. The result of the conditional is not known until the execute stage. When this occurs, either the branch was predicted correctly or it was predicted incorrectly. In the case of a misprediction, the instructions immediately preceding the jump instruction that are now in the pipeline are flushed, and the correct instructions are fetched from memory.