The Linux Kernel/System
System means the general functions used to support and manage other kernel functionality. Synonym: infrastructure.
Booting and Halting
Booting a Linux installation is a complex process involving multiple stages and software components. This book focuses on the kernel's part of the booting process, leaving the rest to other books and documentation. Still, we outline the essential stages of the startup process to show where the kernel fits:
- Hardware (including firmware) initialization
- Execution of a boot loader (if any)
- Loading the kernel image and startup of the kernel
- User space initialization
The early stages of the Linux startup process depend heavily on the computer architecture. IBM PC compatible hardware is one architecture Linux is commonly used on; on these systems, the BIOS plays an important role, which might not have exact analogs on other systems. In the following example, IBM PC compatible hardware is assumed:
- The BIOS performs startup tasks specific to the actual hardware platform.
- Once the hardware is enumerated and the hardware which is necessary for boot is initialized correctly, the BIOS loads and executes the boot code from the configured boot device. This boot code contains phase 1 of a Linux boot loader; phase 1 loads phase 2, which is the bulk of the boot loader code. Some loaders may use an intermediate phase (known as phase 1.5) to achieve this, since they are designed to be able to interpret filesystem layouts to locate the phase 2 loader.
- The boot loader loads the kernel image, prepares the boot parameters to be read later by the kernel, and transfers control to the operating system by jumping to a position in memory.
When the kernel finishes its startup process, it must be able to provide a pre-emptive multi-tasking environment to user space. The kernel must also initialize the device needed to mount the root filesystem (read-only at first), which the boot loader specifies with the root= parameter. The root file system contains the image of the first process the kernel launches, called the init process, whose PID is 1. The kernel looks for it in /sbin/init by default, or in the path given by the init= boot parameter.
The init process manages user space initialization. It usually involves the execution of various startup scripts and daemons that set up all non-operating system services and structures in order to allow a user environment to be created.
Kernel booting
The kernel is loaded in two stages. In the first stage, the kernel (as a compressed image file) is loaded into memory and decompressed, and a few fundamental facilities, such as essential hardware support and basic memory management (memory paging), are set up. Control then passes one final time to the main kernel startup routine, start_kernel()
, which performs the majority of system setup (interrupts, the rest of memory management, device and driver initialization, etc.) before separately spawning the idle process, the scheduler, and the init process (which executes in user space).
- Kernel loading stage
- The kernel as loaded is typically an image file, compressed into either the zImage or bzImage format with zlib. A routine at the head of it does a minimal amount of hardware setup, decompresses the image fully into high memory, and takes note of any RAM disk if configured. It then begins kernel startup via ./arch/i386/boot/head.S and the startup_32() function (for x86-based processors).
- Kernel startup stage
- The startup function for the kernel (also called the swapper or process 0) establishes memory management (paging tables and memory paging), detects the type of CPU and any additional functionality such as floating-point capabilities, and then switches to non-architecture-specific Linux kernel functionality via a call to start_kernel().
- start_kernel() executes a wide range of initialization functions. It sets up interrupt handling (IRQs), further configures memory, starts the init process (the first user-space process), and then starts the idle task via cpu_idle(). Notably, the kernel startup process also mounts the initial RAM disk ("initrd") that was loaded previously as the temporary root file system during the boot phase. The initrd allows driver modules to be loaded directly from memory, without reliance upon other devices (e.g. a hard disk) and the drivers needed to access them (e.g. a SATA driver). This split, with some drivers statically compiled into the kernel and others loaded from the initrd, allows for a smaller kernel. The root file system is later switched via a call to pivot_root(), which unmounts the temporary root file system and replaces it with the real one once the latter is accessible. The memory used by the temporary root file system is then reclaimed.
References
Booting
- Article about booting of the kernel
- wikipedia:Booting
- http://tldp.org/HOWTO/Linux-i386-Boot-Code-HOWTO/
- http://www.tldp.org/LDP/lki/lki-1.html
- http://www.tldp.org/HOWTO/KernelAnalysis-HOWTO-4.html
- IBM description of Linux BIOS boot process
- Linux (U)EFI boot process
- Linux kernel boot parameters
- Kernel booting process (for Linux 3.18) Part 1, Part 2, Part 3, Part 4
- cat /proc/cmdline
- arch/i386/boot/bootsect.S
- arch/i386/kernel/head.S: calls start_kernel
- init/main.c: start_kernel, rest_init, init, run_init_process
Halting and rebooting
- Softdog Driver
- wikipedia:Shutdown
- "Linux for PowerPC Embedded Systems HOWTO: Boot Sequence" by Boas Betzler
- "Embedded Linux Howto" by Sebastien Huet
- "Migrating from x86 to PowerPC, Part 2: Anatomy of the Linux boot process" by Lewin Edwards
- "Inside the Linux boot process" by M. Tim Jones
- "Reducing OS Boot Times for In-Car Computer Applications" by Damien Stolarz
sys_reboot calls machine_restart, machine_halt, or machine_power_off.
Userspace communication
syscall, /proc, /dev, /sys
linux/proc_fs.h: create_proc_entry, etc.
- Anatomy of a system call, part 1 and part 2
- ULK3 Chapter 10. System Calls
- ULK3 Chapter 11. Signals
procfs
The proc filesystem (procfs) is a special filesystem that presents information about processes and other system information in a hierarchical file-like structure, providing a more convenient and standardized method for dynamically accessing process data held in the kernel than traditional tracing methods or direct access to kernel memory. Typically, it is mapped to a mount point named /proc at boot time. The proc file system acts as an interface to internal data structures in the kernel. It can be used to obtain information about the system and to change certain kernel parameters at runtime.
/proc includes a directory for each running process (including kernel threads) named /proc/PID, where PID is the process number. Each directory contains information about one process, including:
- the command line that originally started the process (/proc/PID/cmdline)
- the names and values of its environment variables (/proc/PID/environ)
- a symlink to its working directory (/proc/PID/cwd)
- a symlink to the original executable file, if it still exists (/proc/PID/exe)
- a directory with symlinks to each open file descriptor (/proc/PID/fd) and one with the status (position, flags, ...) of each of them (/proc/PID/fdinfo)
- information about mapped files and blocks like the heap and stack (/proc/PID/maps)
- a binary image representing the process's virtual memory (/proc/PID/mem)
- a symlink to the root path as seen by the process (/proc/PID/root)
- a directory containing hard links to any child process or thread (/proc/PID/task)
- basic information about the process, including its run state and memory usage (/proc/PID/status)
and much more.
sysfs
sysfs is a pseudo-file system that exports information about kernel subsystems, hardware devices, and associated device drivers from the kernel's device model to user space through virtual files. Besides providing information, many of the exported virtual files can also be written to in order to configure the corresponding devices and subsystems. Sysfs was designed to export the information present in the device tree, so that it would no longer clutter up procfs.
Sysfs is mounted under the /sys mount point.
devfs
devfs is a specific implementation of a device file system used for presenting device files. Maintaining the /dev special files on a physically implemented file system (i.e. a hard drive) is inconvenient, and since it needs kernel assistance anyway, the idea arose of a special-purpose logical file system that is not physically stored. Determining when devices are ready to appear is also not entirely trivial. The devfs approach is for the device driver to request creation and deletion of devfs entries related to the devices it enables and disables.
The current implementation —called devtmpfs— is a hybrid kernel/userspace approach of a device filesystem to provide nodes before udev runs for the first time.
Devices
ls /dev
cat /proc/devices
Char devices
Chapter 13. I/O Architecture and Device Drivers
DMA
- Documentation/DMA-mapping.txt
- DMA-able memory: pci_alloc_consistent, __get_free_page, kmalloc, kmem_cache_alloc
- pci_pool
- LDD3:Memory Mapping and DMA
- http://www.xml.com/ldd/chapter/book/ch13.html mmap and DMA
SAC: Single Address Cycle
Modules
lsmod
cat /proc/modules
- kernel/kmod.c
- LDD3: Building and Running Modules
- http://www.xml.com/ldd/chapter/book/ch02.html
- http://www.tldp.org/LDP/tlk/modules/modules.html
- http://www.tldp.org/LDP/lkmpg/2.6/html/ The Linux Kernel Module Programming Guide
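The canonical starting point in the guides above is a hello-world module. The following is a sketch only: it compiles against kernel headers with a Kbuild makefile rather than as an ordinary program, and is loaded with insmod and removed with rmmod.

```c
/* hello.c - minimal loadable kernel module sketch. */
#include <linux/init.h>
#include <linux/module.h>

MODULE_LICENSE("GPL");

static int __init hello_init(void)
{
    printk(KERN_INFO "hello: module loaded\n");
    return 0;               /* nonzero would abort the load */
}

static void __exit hello_exit(void)
{
    printk(KERN_INFO "hello: module unloaded\n");
}

module_init(hello_init);
module_exit(hello_exit);
```

After `insmod hello.ko`, the module appears in lsmod and its printk output in dmesg.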
I/O ports and registers
Modern functions for port I/O:
#include <linux/ioport.h>
Functions for memory mapped registers:
The {in,out}[bwl] macros are for emulating x86-style PCI/ISA I/O space.
Hardware Device Drivers
Hardware device drivers (or just device drivers) are different from the char and block device interfaces.
Keywords: kobjects, sysfs, buses, devices, drivers, classes, firmware, hotplug
- http://www.tldp.org/LDP/tlk/dd/drivers.html
- http://www.xml.com/ldd/chapter/book/
- http://examples.oreilly.com/linuxdrive2/
Busses: input, PCI, USB
Input bus: keyboard and mouse
cat /proc/bus/input/devices
PCI bus
pci_register_driver
lspci
cat /proc/pci
cat /proc/bus/pci/devices
- #include <linux/pci.h>
- Documentation/pci.txt
USB bus
lsusb
cat /proc/bus/usb/devices
Building and Updating
Debugging
- printk
- dump_stack
- show_registers
- dmesg --console-level <level>
- gdb /usr/src/linux/vmlinux /proc/kcore
- Magic SysRq key
- git bisect ...
oops
- http://lxr.linux.no/source/Documentation/oops-tracing.txt
- http://www.urbanmyth.org/linux/oops/ Good presentation
- http://www.mulix.org/lectures/kernel_oopsing/kernel_oopsing.pdf
printk
linux/arch/i386/kernel/traps.c
KDB (Built-in Kernel Debugger): local debugging.
- http://www-106.ibm.com/developerworks/linux/library/l-kdbug/?ca=dgr-wikiaKDB
- http://oss.sgi.com/projects/kdb/
- ftp://oss.sgi.com/www/projects/kdb/download/
Other
- http://user-mode-linux.sourceforge.net/
- CONFIG_MAGIC_SYSRQ, handle_sysrq
- http://lkcd.sourceforge.net/ Linux Kernel Crash Dump
KGDB: remote GDB debugging.
# gdb -q vmlinux
(gdb) target remote /dev/ttyS0