Back

Paging

February 2021

The core idea of paging is to divide physical memory into equally-sized chunks called page frames. Similarly, the virtual address space of a process is divided into equally-sized virtual pages. Each virtual page maps to a physical page frame.

Page Tables

This mapping information is stored in a per-process data structure called a page table. The simplest page table is simply an array (also a linear page table). This array is indexed by the virtual page number of a process. The entries of this array are called the page table entries (PTEs).

Each PTE contains a number of useful information. Most importantly, it contains the physical frame number for this virtual page. Additionally, the PTE also contains a valid bit that indicates whether a translation is valid, protection bits for specifying read, write and execute permissions, a present bit for page swapping, a reference bit for caching, a dirty bit, and so on.

Linear paging on its own is very slow. For every memory reference, linear paging requires us to perform one extra memory reference into the linear page table. To speed things up, a hardware cache called a translation-lookaside buffer (TLB) is installed in the memory management unit of the CPU. This TLB operates like most typical caches, it exploits temporal locality and spatial locality by storing the most recently performed address translations.

Another issue with linear paging is that the page tables themselves can use up significant amounts of memory. A very straightforward solution is to use larger page sizes (less entries per table of the same size). This solution is less than ideal because large page sizes tend to be wasteful due to internal fragmentation.

An alternative solution is to use a hybrid of paging and segmentations. Instead of using one page table per process, we use multiple page tables, one per logical segment of the process. This solution is also not ideal because segmentation leads to external fragmentation, which makes it hard for the OS to allocate memory for additional segments.

Multi-Level Page Tables

A alternative to linear paging is to use multi-level page tables. In multi-level paging, each page table is itself divided into equally-sized chunks. Each page table chunk becomes an entry in a new data structure called a page directory. If all the PTEs in a page table chunk are invalid, then we can mark the corresponding page directory entry as invalid and not allocate any physical memory for that chunk of the page table.

The advantages of multi-level paging are straightforward. The size of each page table is now proportional to the amount of page table entries that are valid in that table. The resulting page tables are more efficient and less wasteful. Also, if the page table chunks themselves are of the same size as the physical page frames, then it becomes very easy for the OS to allocate memory for additional page table chunks.

However, one downside of multi-level paging is that TLB misses are now more expensive. Assuming a 2-level page table, each TLB misses now requires two memory references, one to index the page directory, another to index the page table chunk pointed to by the page directory.

Inverted Page Tables

Instead of maintaining multiple page tables, one for each running process in the system, the OS can use one single table that has an entry for each physical page frame of the system. This table is called an inverted page table.

Each entry in an inverted page table indicates the process that is using this physical frame, along with the virtual page number of that process that maps to this particular frame. In this approach, address translation is performed simply by searching through this table.

Page Swapping

It is difficult and inefficient to fit the address space of all running processes in physical memory at once. To support a large virtual address space, the OS will have to reserve some additional space lower down in the memory hierarchy. This space is called the swap space, because pages are swapped in to and out of this space. A present bit in the PTE of a page indicates whether the page is currently residing in physical memory or in the swap space.

A page fault occurs when a page not currently residing in physical memory is accessed. Page faults are addressed by the OS via page fault handlers --- the OS moves the requested page(s) into physical memory. The location of a page in the swap space is stored in the PTE of that page.

Whenever the amount of free space in physical memory is lower than some threshold (called the lower watermark), the OS runs a background thread (called the swap daemon or page daemon) that evicts pages continually until the amount of free space becomes at least some value (called the high watermark). Typically, pages are clustered and wrote out at one go to improve efficiency (eg. reduce seek and rotational overheads of disks).

Some common page replacement policies are FIFO, the random policy, and LRU. In the general case, LRU is the best approach (random policy is better than LRU in certain edge cases, eg. in a looping-sequential workload). LRU is inefficient because of high overheads.

Instead, policies that approximate LRU are used. One approach is to use the linear time clock algorithm --- arrange pages in a circular list, and, using a persistent pointer, search linearly down the list for a page with its reference bit set to 0 (a reference bit of 1 indicates a page that was recently used). This page is evicted out of physical memory. Clean pages are preferred over dirty pages because a clean page can be evicted freely, whereas a dirty page, when evicted, has to be rewritten back into disk.

It is still possible for the memory demand of the running processes to exceed those provided by physical memory. This results in a system that is constantly paging, a condition known as thrashing. The OS needs to be able to detect when thrashing occurs, and be able to cope with them. One method is called admission control, allowing only a subset of all the processes to run at a given time. Another method is to run an out-of-memory killer daemon that finds and terminates memory intensive processes.