xv6-riscv-ch1
- This chapter-1 introduces the basic Unix process, file, and I/O abstractions that applications use to interact with the OS.
ch1: Operating system interfaces
As Figure 1.1 shows, xv6 takes the traditional form of a kernel, a special program that provides
services to running programs. Each running program, called a process, has memory containing
instructions, data, and a stack. The instructions implement the program’s computation. The data
are the variables on which the computation acts. The stack organizes the program’s procedure calls.
A given computer typically has many processes but only a single kernel.When a user program invokes a sys-
tem call, the hardware raises the privilege level and starts executing a pre-arranged function in the
kernel.
The collection of system calls that a kernel provides is the interface that user programs see. The
xv6 kernel provides a subset of the services and system calls that Unix kernels traditionally offer.
Figure 1.2 lists all of xv6’s system calls.The shell is an ordinary program that reads commands from the user and executes them. The
fact that the shell is a user program, and not part of the kernel, illustrates the power of the system
call interface: there is nothing special about the shell. It also means that the shell is easy to replace;
as a result, modern Unix systems have a variety of shells to choose from, each with its own user
interface and scripting features. The xv6 shell is a simple implementation of the essence of the
Unix Bourne shell. Its implementation can be found at (user/sh.c:1).
The xv6 shell uses the exec calls of blew to run programs on behalf of users. The main structure of
the shell is simple; see main (user/sh.c:146). The main loop reads a line of input from the user with
getcmd. Then it calls fork, which creates a copy of the shell process. The parent calls wait,
while the child runs the command. For example, if the user had typed “echo hello” to the shell,
runcmd would have been called with “echo hello” as the argument. runcmd (user/sh.c:55) runs
the actual command. For “echo hello”, it would call exec (user/sh.c:79). If exec succeeds then
the child will execute instructions from echo instead of runcmd. At some point echo will call
exit, which will cause the parent to return from wait in main (user/sh.c:146).
1.1 Processes and memory
- An xv6 process consists of user-space memory (instructions, data, and stack) and per-process state
private to the kernel. Xv6 time-shares processes: it transparently switches the available CPUs
among the set of processes waiting to execute. When a process is not executing, xv6 saves the
process’s CPU registers, restoring them when it next runs the process. The kernel associates a
process identifier, or PID, with each process. - the following program fragment written in the C programming lan-
guage
1 |
|
In the example, the output lines
parent: child=1234
child: exiting
might come out in either order (or even intermixed), depending on whether the parent or child gets
to its printf call first. After the child exits, the parent’s wait returns, causing the parent to print
parent: child 1234 is done
Although the child has the same memory contents as the parent initially, the parent and child are
executing with separate memory and separate registers: changing a variable in one does not affect
the other. For example, when the return value of wait is stored into pid in the parent process, it
doesn’t change the variable pid in the child. The value of pid in the child will still be zero.
- The exec system call replaces the calling process’s memory with a new memory image loaded
from a file stored in the file system. The file must have a particular format, which specifies which
part of the file holds instructions, which part is data, at which instruction to start, etc. Xv6 uses the
ELF format, which Chapter 3 discusses in more detail. Usually the file is the result of compiling
a program’s source code. When exec succeeds, it does not return to the calling program; instead,
the instructions loaded from the file start executing at the entry point declared in the ELF header.
exec takes two arguments: the name of the file containing the executable and an array of string
arguments. For example
1 |
|
This fragment replaces the calling program with an instance of the program /bin/echo running
with the argument list echo hello. Most programs ignore the first element of the argument array,
which is conventionally the name of the program.
- why fork and exec are not combined in a single call
we will see later that
the shell exploits the separation in its implementation of I/O redirection.
Xv6 allocates most user-space memory implicitly: fork allocates the memory required for the
child’s copy of the parent’s memory, and exec allocates enough memory to hold the executable
file. A process that needs more memory at run-time (perhaps for malloc) can call sbrk(n) to
grow its data memory by n zero bytes; sbrk returns the location of the new memory.
1.2 I/O and File descriptors
- A file descriptor is a small integer representing a kernel-managed object that a process may read
from or write to. A process may obtain a file descriptor by opening a file, directory, or device,
or by creating a pipe, or by duplicating an existing descriptor. For simplicity we’ll often refer
to the object a file descriptor refers to as a “file”; the file descriptor interface abstracts away the
differences between files, pipes, and devices, making them all look like streams of bytes. We’ll
refer to input and output as I/O. - Internally, the xv6 kernel uses the file descriptor as an index into a per-process table, so that
every process has a private space of file descriptors starting at zero. By convention, a process reads
from file descriptor 0 (standard input), writes output to file descriptor 1 (standard output), and
writes error messages to file descriptor 2 (standard error). As we will see, the shell exploits the
convention to implement I/O redirection and pipelines. The shell ensures that it always has three
file descriptors open (user/sh.c:152), which are by default file descriptors for the console. - The following program fragment (which forms the essence of the program cat) copies data
from its standard input to its standard output. If an error occurs, it writes a message to the standard
error.
1 |
|
The important thing to note in the code fragment is that cat doesn’t know whether it is reading
from a file, console, or a pipe. Similarly cat doesn’t know whether it is printing to a console, a
file, or whatever. The use of file descriptors and the convention that file descriptor 0 is input and
file descriptor 1 is output allows a simple implementation of cat.
- The close system call releases a file descriptor, making it free for reuse by a future open,
pipe, or dup system call (see below). A newly allocated file descriptor is always the lowest-
numbered unused descriptor of the current process. - File descriptors and fork interact to make I/O redirection easy to implement.
1 |
|
After the child closes file descriptor 0, open is guaranteed to use that file descriptor for the newly
opened input.txt: 0 will be the smallest available file descriptor. cat then executes with file
descriptor 0 (standard input) referring to input.txt. The parent process’s file descriptors are not
changed by this sequence
- Two file descriptors share an offset if they were derived from the same original file descriptor
by a sequence of fork and dup calls. Otherwise file descriptors do not share offsets, even if they
resulted from open calls for the same file. - dup allows shells to implement commands like this:
ls existing-file non-existing-file > tmp1 2>&1
. The 2>&1 tells the shell to give the command a file descriptor 2 that is a duplicate of descriptor 1. Both the name of the existing file and the error message for the non-existing file will show up in the file tmp1. The xv6 shell doesn’t support I/O redirection for the error file descriptor, but now you know how to implement it.
1.3 Pipes
- A pipe is a small kernel buffer exposed to processes as a pair of file descriptors, one for reading
and one for writing. Writing data to one end of the pipe makes that data available for reading from
the other end of the pipe. Pipes provide a way for processes to communicate. - The following example code runs the program wc with standard input connected to the read
end of a pipe.
1 |
|
1 |
|
The fact that read blocks until it is impossible for new data to arrive
is one reason that it’s important for the child to close the write end of the pipe before executing
wc above: if one of wc ’s file descriptors referred to the write end of the pipe and not close, wc would never see
end-of-file.
- The xv6 shell implements pipelines such as grep fork sh.c | wc -l in a manner similar
to the above code (user/sh.c:101). The child process creates a pipe to connect the left end of the
pipeline with the right end. Then it calls fork and runcmd for the left end of the pipeline and
fork and runcmd for the right end, and waits for both to finish. The right end of the pipeline
may be a command that itself includes a pipe (e.g., a | b | c), which itself forks two new child
processes (one for b and one for c). Thus, the shell may create a tree of processes. The leaves
16of this tree are commands and the interior nodes are processes that wait until the left and right
children complete.
1 |
|
echo hello world >/tmp/xyz; wc </tmp/xyz
Pipes have at least three advantages over temporary files in this situation.- First, pipes automatically clean themselves up; with the file redirection, a shell would have to be careful to remove /tmp/xyz when done.
- Second, pipes can pass arbitrarily long streams of data, while file redirection requires enough free space on disk to store all the data.
- Third, pipes allow for parallel execution of pipeline stages, while the file approach requires the first program to finish before the second starts.
1.4 File system
- The xv6 file system provides data files, which contain uninterpreted byte arrays, and directories,
which contain named references to data files and other directories. - There are system calls to create new files and directories: mkdir creates a new directory, open
with the O_CREATE flag creates a new data file, and mknod creates a new device file. This example
illustrates all three:
1 |
|
mknod creates a special file that refers to a device. Associated with a device file are the major and
minor device numbers (the two arguments to mknod), which uniquely identify a kernel device.
When a process later opens a device file, the kernel diverts read and write system calls to the
kernel device implementation instead of passing them to the file system.
- A file’s name is distinct from the file itself; the same underlying file, called an inode, can have
multiple names, called links. Each link consists of an entry in a directory; the entry contains a file
name and a reference to an inode. An inode holds metadata about a file, including its type (file or
directory or device), its length, the location of the file’s content on disk, and the number of links to
a file. - The fstat system call retrieves information from the inode that a file descriptor refers to. It
fills in a struct stat, defined in stat.h (kernel/stat.h) as:
1 |
|
- The link system call creates another file system name referring to the same inode as an exist-
ing file. This fragment creates a new file named both a and b.
1 |
|
Reading from or writing to a is the same as reading from or writing to b. Each inode is identified
by a unique inode number. After the code sequence above, it is possible to determine that a and b
refer to the same underlying contents by inspecting the result of fstat: both will return the same
inode number (ino), and the nlink count will be set to 2.
The unlink system call removes a name from the file
- The unlink system call removes a name from the file system. The file’s inode and the disk
space holding its content are only freed when the file’s link count is zero and no file descriptors
refer to it. Thus adding
1 |
|
to the last code sequence leaves the inode and file content accessible as b. Furthermore,
1 |
|
is an idiomatic way to create a temporary inode with no name that will be cleaned up when the
process closes fd or exits.
- Unix provides file utilities callable from the shell as user-level programs, for example mkdir,
ln, and rm. This design allows anyone to extend the command-line interface by adding new user-
level programs. In hindsight this plan seems obvious, but other systems designed at the time of
Unix often built such commands into the shell (and built the shell into the kernel).
One exception is cd, which is built into the shell (user/sh.c:161). cd must change the current
working directory of the shell itself. If cd were run as a regular command, then the shell would
18fork a child process, the child process would run cd, and cd would change the child ’s working
directory. The parent’s (i.e., the shell’s) working directory would not change.
1.5 Real world
- the shell was the first so-called “scripting language.” The Unix system call interface persists today in
systems like BSD, Linux, and macOS. - The Unix system call interface has been standardized through the Portable Operating System
Interface (POSIX) standard. Xv6 is not POSIX compliant: it is missing many system calls (in-
cluding basic ones such as lseek), and many of the system calls it does provide differ from the
standard. Our main goals for xv6 are simplicity and clarity while providing a simple UNIX-like
system-call interface. Several people have extended xv6 with a few more system calls and a sim-
ple C library in order to run basic Unix programs. Modern kernels, however, provide many more
system calls, and many more kinds of kernel services, than xv6. For example, they support net-
working, windowing systems, user-level threads, drivers for many devices, and so on. Modern
kernels evolve continuously and rapidly, and offer many features beyond POSIX. - Xv6 does not provide a notion of users or of protecting one user from another; in Unix terms,
all xv6 processes run as root.
comments:
- Linux tries to adhere to POSIX (glibc provides most of the POSIX interfaces), but has its own extensions (e.g., epoll).
- Programmers who write POSIX interfaces can compile and run them on macOS, BSD, and Linux (as long as they don’t use platform-specific extensions).
- Think of the xv6 system call interface as a “subset implementation of POSIX.”