xv6-riscv-ch2
- This chapter explains how the OS is structured internally to manage hardware resources, run processes, and enforce protection.
ch2: Operating system organization
- A key requirement for an operating system is to support several activities at once.
- An operating system must fulfill three requirements: multiplexing, isolation, and interaction.
- Xv6 runs on a multi-core RISC-V microprocessor, and much of its low-level functionality (for example, its process implementation) is specific to RISC-V. RISC-V is a 64-bit CPU, and xv6 is written in “LP64” C, which means long (L) and pointers (P) in the C programming language are 64 bits, but an int is 32 bits.
- Reference: RISC-V Technical Specifications.
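A quick way to see the LP64 data model is to ask the compiler for type sizes. The helpers below are hypothetical (not part of xv6) and assume a C11 compiler targeting an LP64 platform such as 64-bit RISC-V or x86-64 Linux. The `uint64` typedef mirrors the one xv6 declares in kernel/types.h.

```c
#include <assert.h>

/* Same typedef xv6 uses in kernel/types.h; 64 bits wide under LP64. */
typedef unsigned long uint64;

/* Hypothetical helper: returns 1 if long and pointers are 64 bits
   while int stays 32 bits, i.e. the target follows LP64. */
int is_lp64(void)
{
    return sizeof(long) == 8 && sizeof(void *) == 8 && sizeof(int) == 4;
}

/* Hypothetical helper: aborts (via assert) on a non-LP64 build. */
void require_lp64(void)
{
    assert(is_lp64());
}

/* Under LP64 a pointer survives a round trip through uint64 unchanged,
   which is what lets code keep addresses in uint64 variables. */
int pointer_fits_uint64(void *p)
{
    uint64 bits = (uint64)p;
    return (void *)bits == p;
}
```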
2.1 Abstracting physical resources
- The Unix interface is not the only way to abstract
resources, but it has proved to be a good one.
2.2 User mode, supervisor mode, and system calls
- CPUs provide hardware support for strong isolation. For example, RISC-V has three modes in which the CPU can execute instructions: machine mode, supervisor mode, and user mode. Instructions executing in machine mode have full privilege; a CPU starts in machine mode. Machine mode is mostly intended for setting up the computer during boot. Xv6 executes a few lines in machine mode and then changes to supervisor mode.
- In supervisor mode the CPU is allowed to execute privileged instructions: for example, enabling and disabling interrupts, reading and writing the register that holds the address of a page table, etc.
- An application can execute only user-mode instructions (e.g., adding numbers, etc.) and is said to be running in user space, while the software in supervisor mode can also execute privileged instructions and is said to be running in kernel space. The software running in kernel space (or in supervisor mode) is called the kernel.
- CPUs provide a special instruction that switches the CPU from user mode to supervisor mode and enters the kernel at an entry point specified by the kernel. (RISC-V provides the ecall instruction for this purpose.) Once the CPU has switched to supervisor mode, the kernel can then validate the arguments of the system call (e.g., check if the address passed to the system call is part of the application’s memory), decide whether the application is allowed to perform the requested operation (e.g., check if the application is allowed to write the specified file), and then deny it or execute it. It is important that the kernel control the entry point for transitions to supervisor mode; if the application could decide the kernel entry point, a malicious application could, for example, enter the kernel at a point where the validation of arguments is skipped.
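The first kind of check mentioned above, that an address passed to a system call lies inside the application’s memory, can be sketched as a range test. This is a hypothetical helper, not xv6’s code. It assumes user memory occupies virtual addresses [0, sz), as in xv6 where a process’s size is kept in p->sz, and it is written so that a range whose end would wrap around 2^64 is rejected rather than overflowing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical validation helper: returns 1 if the user buffer
   [addr, addr + len) lies entirely inside user memory [0, sz). */
int user_range_ok(uint64_t addr, uint64_t len, uint64_t sz)
{
    if (addr >= sz)
        return 0;              /* starts outside user memory */
    if (len > sz - addr)
        return 0;              /* runs past the end; also catches wraparound */
    return 1;
}
```

A check like this only protects the kernel if every path into the kernel runs it, which is why the kernel, not the application, chooses the entry point.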
2.3 Kernel organization
- A key design question is what part of the operating system should run in supervisor mode. One possibility is that the entire operating system resides in the kernel, so that the implementations of all system calls run in supervisor mode. This organization is called a monolithic kernel.
- A downside of the monolithic organization is that the interactions among different parts of the operating system are often complex (as we will see in the rest of this text), and therefore it is easy for an operating system developer to make a mistake. In a monolithic kernel, a mistake is fatal, because an error in supervisor mode will often cause the kernel to fail. If the kernel fails, the computer stops working, and thus all applications fail too. The computer must reboot to start again.
- To reduce the risk of mistakes in the kernel, OS designers can minimize the amount of operating system code that runs in supervisor mode, and execute the bulk of the operating system in user mode. This kernel organization is called a microkernel.
- Figure 2.1 illustrates this microkernel design. In the figure, the file system runs as a user-level process. OS services running as processes are called servers. To allow applications to interact with the file server, the kernel provides an inter-process communication mechanism to send messages from one user-mode process to another.
- Xv6 is implemented as a monolithic kernel, like most Unix operating systems. Thus, the xv6 kernel interface corresponds to the operating system interface, and the kernel implements the complete operating system. Since xv6 doesn’t provide many services, its kernel is smaller than some microkernels, but conceptually xv6 is monolithic.
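To make the contrast concrete, here is a toy, single-threaded simulation; all names are hypothetical and nothing here is xv6 code. In the monolithic style the system call runs the file-system code directly; in the microkernel style the kernel only delivers a request message to a file-server “process” and carries the reply back. Real IPC involves separate address spaces, context switches, and blocking, all of which this sketch leaves out.

```c
#include <assert.h>
#include <string.h>

/* A fake "file" served by the file system. */
static const char file_data[] = "hello";

/* File-system logic shared by both designs: copy up to n bytes from offset off. */
static int fs_read(int off, char *buf, int n)
{
    int size = (int)sizeof(file_data) - 1;   /* exclude the trailing '\0' */
    if (off >= size)
        return 0;
    if (n > size - off)
        n = size - off;
    memcpy(buf, file_data + off, (size_t)n);
    return n;
}

/* Monolithic: the system call runs the file-system code inside the kernel. */
int mono_read(int off, char *buf, int n)
{
    return fs_read(off, buf, n);
}

/* Microkernel: requests and replies travel as messages. */
enum { OP_READ = 1 };
struct msg {
    int op;            /* requested operation */
    int off, n;        /* arguments */
    int result;        /* filled in by the server */
    char data[16];     /* reply payload */
};

/* The user-level file server handles one request message. */
static void file_server(struct msg *m)
{
    if (m->op == OP_READ) {
        int n = m->n < (int)sizeof(m->data) ? m->n : (int)sizeof(m->data);
        m->result = fs_read(m->off, m->data, n);
    } else {
        m->result = -1;   /* unknown request */
    }
}

/* The kernel's only job here: deliver the message and the reply.
   A direct call stands in for send, block, and receive. */
static void ipc_call(struct msg *m)
{
    file_server(m);
}

int micro_read(int off, char *buf, int n)
{
    struct msg m = { .op = OP_READ, .off = off, .n = n };
    ipc_call(&m);
    if (m.result > 0)
        memcpy(buf, m.data, (size_t)m.result);
    return m.result;
}
```

Both paths return the same bytes; the difference is where the file-system code runs and how many boundary crossings a real system would pay for the message round trip.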
2.4 Code: xv6 organization
- The xv6 kernel source is in the kernel/ sub-directory. The source is divided into files, following a rough notion of modularity; Figure 2.2 lists the files. The inter-module interfaces are defined in defs.h (kernel/defs.h).
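In xv6’s kernel/defs.h, prototypes are grouped under a comment naming the source file that defines them (for example, a `// kalloc.c` section declaring kalloc, kfree, and kinit). The single-file sketch below imitates that pattern with a hypothetical `counter.c` module: only the prototypes in the “header” part form the interface, while the module’s state is `static` and so cannot be named from other files.

```c
#include <assert.h>

/* ---- defs.h-style section: the module's public interface ---- */
// counter.c
void counter_reset(void);
int  counter_next(void);

/* ---- counter.c: the implementation ---- */
static int counter;   /* file-private state: not declared in the header */

void counter_reset(void)
{
    counter = 0;
}

int counter_next(void)
{
    return ++counter;
}
```

Most xv6 kernel .c files include defs.h, which is how one module calls another module’s functions.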
2.5 Process overview
- The unit of isolation in xv6 (as in other Unix operating systems) is a process. The process abstraction prevents one process from wrecking or spying on another process’s memory, CPU, file descriptors, etc. It also prevents a process from wrecking the kernel itself, so that a process can’t subvert the kernel’s isolation mechanisms.
- To help enforce isolation, the process abstraction provides the illusion to a program that it has its own private machine. A process provides a program with what appears to be a private memory system, or address space, which other processes cannot read or write. A process also provides the program with what appears to be its own CPU to execute the program’s instructions.
- Xv6 uses page tables (which are implemented by hardware) to give each process its own address space. The RISC-V page table translates (or “maps”) a virtual address (the address that a RISC-V instruction manipulates) to a physical address (an address that the CPU sends to main memory).
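As a toy model of what “maps” means, the sketch below translates a virtual address using one flat array indexed by virtual page number. It is hypothetical and deliberately simplified: real RISC-V (Sv39) page tables are three-level trees with permission bits. The basic split is the same, though: with 4096-byte pages, the page number is translated and the low 12 bits (the offset within the page) are copied unchanged.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE  4096            /* bytes per page, as in xv6 */
#define PGSHIFT 12              /* log2(PGSIZE) */
#define NPAGES  8               /* toy address space: 8 virtual pages */

/* Toy page table: entry i holds the physical page number for
   virtual page i, or -1 if that page is unmapped. */
typedef struct {
    int64_t ppn[NPAGES];
} toy_pagetable;

/* Translate va to a physical address; returns -1 if unmapped. */
int64_t translate(const toy_pagetable *pt, uint64_t va)
{
    uint64_t vpn = va >> PGSHIFT;          /* which virtual page */
    uint64_t offset = va & (PGSIZE - 1);   /* byte within the page */
    if (vpn >= NPAGES || pt->ppn[vpn] < 0)
        return -1;                         /* real hardware would raise a page fault */
    return (pt->ppn[vpn] << PGSHIFT) | (int64_t)offset;
}
```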
- Xv6 maintains a separate page table for each process that defines that process’s address space. As illustrated in Figure 2.3, an address space includes the process’s user memory starting at virtual address zero. Instructions come first, followed by global variables, then the stack, and finally a “heap” area (for malloc) that the process can expand as needed.
- A number of factors limit the maximum size of a process’s address space: pointers on the RISC-V are 64 bits wide; the hardware uses only the low 39 bits when looking up virtual addresses in page tables; and xv6 uses only 38 of those 39 bits. Thus, the maximum address is 2^38 − 1 = 0x3fffffffff. MAXVA (kernel/riscv.h:378) is defined as 2^38, one beyond that highest address.
- At the top of the address space xv6 places a trampoline page (4096 bytes) and a trapframe page. Xv6 uses these two pages to transition into the kernel and back; the trampoline page contains the code to transition in and out of the kernel, and the trapframe is where the kernel saves the process’s user registers, as Chapter 4 explains.
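The arithmetic at the top of the address space can be checked directly. The macros below follow xv6’s definitions of MAXVA (kernel/riscv.h) and TRAMPOLINE/TRAPFRAME (kernel/memlayout.h), except that MAXVA is written with an unsigned `1ULL` here; check the sources of your xv6 version for the exact spelling. `highest_va` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE 4096

/* One beyond the highest usable virtual address: 2^38.
   Sv39 offers 9 + 9 + 9 + 12 = 39 bits; xv6 leaves the top one unused. */
#define MAXVA (1ULL << (9 + 9 + 9 + 12 - 1))

/* The trampoline occupies the highest page; the trapframe sits just below it. */
#define TRAMPOLINE (MAXVA - PGSIZE)
#define TRAPFRAME  (TRAMPOLINE - PGSIZE)

/* Hypothetical helper: the highest virtual address a process can have. */
static inline uint64_t highest_va(void)
{
    return MAXVA - 1;
}
```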
- The xv6 kernel maintains many pieces of state for each process, which it gathers into a struct proc (kernel/proc.h:85). A process’s most important pieces of kernel state are its page table, its kernel stack, and its run state. We’ll use the notation p->xxx to refer to elements of the proc structure; for example, p->pagetable is a pointer to the process’s page table.
- Each process has a thread of control (or thread for short) that holds the state needed to execute the process. At any moment, a thread might be executing on a CPU, or suspended (not executing, but capable of resuming executing in the future).
- Each process has two stacks:
- user stack: When the process is executing user instructions, only its user stack is in use, and its kernel stack is empty.
- kernel stack: When the process enters the kernel (for a system call or interrupt), the kernel code executes on the process’s kernel stack; while a process is in the kernel, its user stack still contains saved data, but isn’t actively used.
- A process’s thread alternates between actively using its user stack and its kernel stack. The kernel stack is separate (and protected from user code) so that the kernel can execute even if a process has wrecked its user stack.
- A process can make a system call by executing the RISC-V ecall instruction. This instruction raises the hardware privilege level and changes the program counter to a kernel-defined entry point. The code at the entry point switches to the process’s kernel stack and executes the kernel instructions that implement the system call. When the system call completes, the kernel switches back to the user stack and returns to user space by executing the sret instruction, which lowers the hardware privilege level and resumes executing user instructions just after the system call instruction.
- A process’s thread can “block” in the kernel to wait for I/O, and resume where it left off when the I/O has finished.
- p->state indicates whether the process is allocated, ready to run, currently running on a CPU, waiting for I/O, or exiting.
- p->pagetable holds the process’s page table, in the format that the RISC-V hardware expects. Xv6 causes the paging hardware to use a process’s p->pagetable when executing that process in user space. A process’s page table also serves as the record of the addresses of the physical pages allocated to store the process’s memory.
- In summary, a process bundles two design ideas: an address space to give a process the illusion of its own memory, and a thread to give the process the illusion of its own CPU. In xv6, a process consists of one address space and one thread. In real operating systems a process may have more than one thread to take advantage of multiple CPUs.
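A heavily trimmed sketch of the per-process state discussed in this section. The enumerators follow the order of xv6’s enum procstate in kernel/proc.h, and pagetable_t is a pointer to uint64 as in xv6’s riscv.h. The struct keeps only a few fields (the real struct proc also has a lock, a trapframe pointer, saved context, open files, and more); the helpers `proc_clear` and `describe_state` are hypothetical.

```c
#include <assert.h>
#include <string.h>

typedef unsigned long uint64;   /* as in xv6's kernel/types.h */
typedef uint64 *pagetable_t;    /* as in xv6: points to a page of PTEs */

/* Run states, in the same order as xv6's enum procstate. */
enum procstate { UNUSED, USED, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

/* Trimmed-down process structure: only the fields discussed above. */
struct proc {
    enum procstate state;   /* p->state: run state */
    int pid;                /* process ID */
    uint64 kstack;          /* virtual address of the kernel stack */
    uint64 sz;              /* size of user memory in bytes */
    pagetable_t pagetable;  /* p->pagetable: user page table */
    char name[16];          /* process name, for debugging */
};

/* Hypothetical helper: zero a slot. Because UNUSED is the first
   enumerator (value 0), a zeroed slot reads as UNUSED. */
void proc_clear(struct proc *p)
{
    memset(p, 0, sizeof *p);
}

/* Hypothetical helper: the text's description of each state. */
const char *describe_state(const struct proc *p)
{
    switch (p->state) {
    case UNUSED:   return "free slot";
    case USED:     return "allocated";
    case RUNNABLE: return "ready to run";
    case RUNNING:  return "running on a CPU";
    case SLEEPING: return "waiting (e.g. for I/O)";
    case ZOMBIE:   return "exiting";
    }
    return "unknown";
}
```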
2.6 Code: starting xv6, the first process and system call
To make xv6 more concrete, we’ll outline how the kernel starts and runs the first process. The subsequent chapters will describe the mechanisms that show up in this overview in more detail.
When the RISC-V computer powers on, it initializes itself and runs a boot loader which is stored in read-only memory. The boot loader loads the xv6 kernel into memory at physical address 0x80000000. Then, in machine mode, the CPU executes xv6 starting at _entry (kernel/entry.S:7). The RISC-V starts with paging hardware disabled: virtual addresses map directly to physical addresses.
The loader places the kernel at 0x80000000 rather than 0x0 because the address range 0x0:0x80000000 contains I/O devices.
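A small sketch of the physical layout this implies. KERNBASE and UART0 mirror constants in xv6’s kernel/memlayout.h (on QEMU’s virt machine the UART registers sit at 0x10000000). The classifier is hypothetical and simply treats the whole range below KERNBASE as the I/O region, as the text describes, even though not every address there belongs to a device.

```c
#include <assert.h>
#include <stdint.h>

#define KERNBASE 0x80000000UL   /* where the loader places the kernel */
#define UART0    0x10000000UL   /* example device: the UART on QEMU's virt board */

/* Hypothetical classifier: 1 if pa falls in the region below KERNBASE
   that holds memory-mapped I/O devices, 0 if it is in RAM. */
int is_device_address(uint64_t pa)
{
    return pa < KERNBASE;
}
```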
The instructions at _entry set up a stack so that xv6 can run C code. Xv6 declares space for an initial stack, stack0, in the file start.c (kernel/start.c:11). The code at _entry loads the stack pointer register sp with the address stack0 + 4096, the top of the stack, because the stack on RISC-V grows down. Now that the kernel has a stack, _entry calls into C code at start (kernel/start.c:15).
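The “top of the stack” arithmetic can be sketched in C. Because the RISC-V stack grows down, sp must start at the high end of the stack area. In the xv6 sources, stack0 actually holds one 4096-byte stack per CPU, and _entry computes sp = stack0 + (hartid + 1) * 4096 so that each hart starts at the top of its own slice; for hart 0 that is the stack0 + 4096 described above. The sketch models this with an ordinary array; NCPU = 8 is assumed (xv6 sets it in kernel/param.h), and KSTACKSZ and boot_stack_top are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU 8            /* assumed; xv6 sets this in kernel/param.h */
#define KSTACKSZ 4096     /* hypothetical name for the per-CPU boot stack size */

/* Stand-in for start.c's stack0: one boot stack per CPU,
   16-byte aligned as the RISC-V calling convention expects for sp. */
static _Alignas(16) char stack0[KSTACKSZ * NCPU];

/* The address _entry would load into sp for a given hart:
   the end (highest address) of that hart's slice, since the stack grows down. */
char *boot_stack_top(int hartid)
{
    assert(hartid >= 0 && hartid < NCPU);
    return stack0 + (hartid + 1) * KSTACKSZ;
}
```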
The function start performs some configuration that is only allowed in machine mode, and then switches to supervisor mode. To enter supervisor mode, RISC-V provides the instruction mret. This instruction is most often used to return from a previous call from supervisor mode to machine mode. start isn’t returning from such a call, but sets things up as if it were:
- it sets the previous privilege mode to supervisor in the register mstatus,
- it sets the return address to main by writing main’s address into the register mepc,
- it disables virtual address translation in supervisor mode by writing 0 into the page-table register satp,
- and it delegates all interrupts and exceptions to supervisor mode.
Before jumping into supervisor mode, start performs one more task: it programs the clock chip to generate timer interrupts. With this housekeeping out of the way, start “returns” to supervisor mode by executing mret. This causes the program counter to change to main (kernel/main.c:11), the address previously stored in mepc.
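The trick start plays with mret can be modeled in ordinary C. This is a simulation, not kernel code: it represents the relevant control registers as struct fields, sets them the way start does, and then applies the part of mret’s effect that matters here (jump to mepc, switch to the privilege level recorded in mstatus.MPP, then reset MPP to user mode). The MPP position (bits 11 and 12, with 1 meaning supervisor) follows the RISC-V privileged specification and xv6’s MSTATUS_MPP_* constants; the 0xffff delegation masks follow xv6’s start.c. The struct and function names are hypothetical, and the timer setup is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Privilege levels as encoded in mstatus.MPP. */
enum { MODE_U = 0, MODE_S = 1, MODE_M = 3 };

#define MSTATUS_MPP_SHIFT 11
#define MSTATUS_MPP_MASK  (3ULL << MSTATUS_MPP_SHIFT)
#define MSTATUS_MPP_S     (1ULL << MSTATUS_MPP_SHIFT)

/* Simulated CPU state: only the registers start touches. */
struct cpu_sim {
    int mode;             /* current privilege level */
    uint64_t pc;
    uint64_t mstatus;
    uint64_t mepc;
    uint64_t satp;
    uint64_t medeleg, mideleg;
};

/* What start arranges before executing mret. */
void sim_start(struct cpu_sim *c, uint64_t main_addr)
{
    /* "previous privilege mode" = supervisor; other mstatus bits kept */
    c->mstatus = (c->mstatus & ~MSTATUS_MPP_MASK) | MSTATUS_MPP_S;
    c->mepc = main_addr;   /* "return address" = main */
    c->satp = 0;           /* no address translation in supervisor mode */
    c->medeleg = 0xffff;   /* delegate exceptions ... */
    c->mideleg = 0xffff;   /* ... and interrupts to supervisor mode */
}

/* The part of mret's effect that matters here. */
void sim_mret(struct cpu_sim *c)
{
    c->mode = (int)((c->mstatus & MSTATUS_MPP_MASK) >> MSTATUS_MPP_SHIFT);
    c->pc = c->mepc;
    c->mstatus &= ~MSTATUS_MPP_MASK;   /* MPP reset to user mode (0) */
}
```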
After main (kernel/main.c:11) initializes several devices and subsystems, it creates the first process by calling userinit (kernel/proc.c:233). The first process executes a small program written in RISC-V assembly, which makes the first system call in xv6. initcode.S (user/initcode.S:3) loads the number for the exec system call, SYS_exec (kernel/syscall.h:8), into register a7, and then executes ecall to re-enter the kernel.
The kernel uses the number in register a7 in syscall (kernel/syscall.c:132) to call the desired system call. The system call table (kernel/syscall.c:107) maps SYS_exec to the function sys_exec, which the kernel invokes. As we saw in Chapter 1, exec replaces the memory and registers of the current process with a new program (in this case, /init).
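A compact model of that dispatch. Like xv6’s syscall.c, it indexes an array of function pointers (built with designated initializers) by the number found in a7 and stores the result in a0. Unlike xv6, the trapframe is a two-field stand-in and the handlers are hypothetical stubs. The values SYS_fork = 1 and SYS_exec = 7 match kernel/syscall.h.

```c
#include <assert.h>
#include <stdint.h>

#define SYS_fork 1
#define SYS_exec 7
#define NELEM(x) (sizeof(x) / sizeof((x)[0]))

/* Stand-in for the saved user registers the kernel reads and writes. */
struct trapframe_sim {
    uint64_t a7;   /* system call number, set by user code before ecall */
    uint64_t a0;   /* return value handed back to user space */
};

/* Hypothetical stub handlers; the real ones do actual work. */
static uint64_t sys_fork_stub(void) { return 2; }   /* pretend child pid */
static uint64_t sys_exec_stub(void) { return 0; }

/* Table indexed by system call number; unlisted slots are null pointers. */
static uint64_t (*syscalls[])(void) = {
    [SYS_fork] = sys_fork_stub,
    [SYS_exec] = sys_exec_stub,
};

/* Dispatch on a7 and put the result in a0; unknown numbers yield -1. */
void syscall_sim(struct trapframe_sim *tf)
{
    uint64_t num = tf->a7;
    if (num > 0 && num < NELEM(syscalls) && syscalls[num])
        tf->a0 = syscalls[num]();
    else
        tf->a0 = (uint64_t)-1;
}
```

Because a7 is chosen by user code, the bounds check and the null check are what keep a bad number from sending the kernel through an arbitrary function pointer.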
Once the kernel has completed exec, it returns to user space in the /init process. init (user/init.c:15) creates a new console device file if needed and then opens it as file descriptors 0, 1, and 2. Then it starts a shell on the console. The system is up.
2.7 Security Model
The operating system must assume that a process’s user-level code will do its best to wreck the kernel or other processes. User code may try to dereference pointers outside its allowed address space; it may attempt to execute any RISC-V instructions, even those not intended for user code; it may try to read and write any RISC-V control register; it may try to directly access device hardware; and it may pass clever values to system calls in an attempt to trick the kernel into crashing or doing something stupid. The kernel’s goal is to restrict each user process so that all it can do is:
- read/write/execute its own user memory,
- use the 32 general-purpose RISC-V registers,
- and affect the kernel and other processes only in the ways that system calls are intended to allow.
The expectations for the kernel’s own code are quite different. Kernel code is assumed to be written by well-meaning and careful programmers. Kernel code is expected to be bug-free, and certainly to contain nothing malicious. This assumption affects how we analyze kernel code. For example, there are many internal kernel functions (e.g., the spin locks) that would cause serious problems if kernel code used them incorrectly. When examining any specific piece of kernel code, we’ll want to convince ourselves that it behaves correctly. We assume, however, that kernel code in general is correctly written, and follows all the rules about use of the kernel’s own functions and data structures. At the hardware level, the RISC-V CPU, RAM, disk, etc. are assumed to operate as advertised in the documentation, with no hardware bugs.
2.8 Real world
Most operating systems have adopted the process concept, and most processes look similar to xv6’s. Modern operating systems, however, support several threads within a process, to allow a single process to exploit multiple CPUs. Supporting multiple threads in a process involves quite a bit of machinery that xv6 doesn’t have, often including interface changes (e.g., Linux’s clone, a variant of fork), to control which aspects of a process threads share.