Virtual Address Space

February 2021

Virtualizing memory is one of the key jobs of an operating system. Memory virtualization is motivated by the increase in popularity of time sharing in early computer systems.

One important concept in memory virtualization is the address space. Each running program has its own private space in memory that contains information relevant to that program (code, stack, heap). The OS and hardware enforces protection and isolation between the processes by making sure that each process has access only to its own address space.

Another consequence of the address space abstraction is that there are now two kinds of memory addresses: virtual addresses and physical addresses. Virtual addresses exist only within the context of a program's address space. When a program makes a reference to memory, it does so via a virtual address. It is up to the OS and the hardware to translate this into an actual physical address in memory.

Base and Bounds

The most basic approach to implementing address translation is to use a pair of registers called the base register and the bounds register --- both reside in the memory management unit of the processor.

The base register is used to compute the physical address of a given virtual address --- by adding the base to the virtual address.

When a process is created, the OS decides where in memory it should store the address space of this process. The OS then proceeds to store this information in the base register, which becomes a part of the register context of the program.

Note that this scheme allows the OS to change the location of the address space of a running program (which has to be descheduled first) by changing the value stored in the base register. Writing to the base (and the bounds) registers are privileged operations --- they can only be performed by the OS.

The bounds register is there to prevent out-of-bounds references. An out-of-bounds reference is when a program tries to access memory outside its address space. When this happens, an exception is raised, which passes control back to the OS.

Base and bounds is simple, but it is non-optimal. In most cases, the stack and heap of a program do not grow too large. This results in a lot of wasted space between the two. We call this type of waste an internal fragmentation.

Segmentation

The solution to internal fragmentation is segmentation. Segmentation is a generalization of base and bounds. Multiple pairs of base and bounds registers are used, one for each logical segment (code, stack and heap). Each segment can now be put independently in memory.

Now that there are multiple base and bounds register pairs, the hardware needs some scheme of determining which type of segment an address is referring to. The conventional approach is partition the bits of the address into two groups, one for specifying the segment, the other for computing the physical address space. Additionally, one bit is reserved for specifying which way the segment grows (stack and heap grows in opposite direction). These conventions ensure that the hardware is capable of computing the physical addresses correctly.

We can increase efficiency by allowing memory segments to be shared between different address spaces (code sharing is common today). This is implemented by incorporating a set of protection bits (read, write, execute) into the hardware system.

Segmentation solves the problem of internal fragmentation, but it introduces another problem --- external fragmentation. An external fragmentation occurs when you have a large number of small pockets of free space between segments, making it difficult to allocate memory for new segments.

One solution is to compact memory periodically by rearranging the existing segments into one contiguous chunk. This method is less than ideal because copying segments is a very expensive operation. An alternative solution is to use space management algorithms to allocate and free memory in a manner that minimizes external fragmentation. See Paging.