
Procfs Analysis

The Linux procfs file system provides an interface to kernel data structures as an alternative to the traditional /dev/kmem interface. This file system is typically mounted at /proc. The /proc file system hierarchy is described in the proc(5) manual page and in the Documentation/proc.txt file.

The Linux sysctl system call provides an interface for reading and writing system parameters. This system call is described in the sysctl(2) manual page. The system parameters are arranged in a tree structure, and they are typically also accessible through a parallel directory tree under the /proc/sys subdirectory. In addition to the previously mentioned documents, the /proc/sys hierarchy is described in the files in the Documentation/sysctl directory. Based on the documentation for sysctl, it appears that applications should always use the /proc/sys interface instead of the system call interface for portability across kernel versions.
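As a rough illustration of the /proc/sys interface recommended above, the following user-space sketch reads one parameter by opening the corresponding file. The helper name read_sysctl_via_proc is hypothetical, and the code assumes that procfs is mounted at /proc and that the parameter is exported as a single line of text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read the first line of a file under /proc/sys into
 * buf (NUL-terminated, trailing newline stripped).
 * "name" is relative to /proc/sys, for example "kernel/ostype".
 * Returns 0 on success and -1 on any failure.
 */
int read_sysctl_via_proc(const char *name, char *buf, size_t buflen)
{
    char path[256];
    FILE *fp;
    int n;

    assert(name != NULL && buf != NULL && buflen > 0);

    /* Assumption: procfs is mounted at /proc, as is typical. */
    n = snprintf(path, sizeof(path), "/proc/sys/%s", name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;          /* no such parameter, or no read access */

    if (fgets(buf, (int)buflen, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline */
    return 0;
}
```

A caller might pass "kernel/ostype" and a small buffer; whether a given parameter exists depends on the kernel version and configuration, so the return value should always be checked.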

There is a subdirectory for each running process under /proc, named by its process identifier. A process may always use the /proc/self symbolic link to refer to its own subdirectory. The effective uid and effective gid of the process are used for the user and group ownership of the files and subdirectories within each process-specific subdirectory. Several files in these subdirectories may be read by any user: the cmdline, maps, stat, statm, and status files. The remaining files may be read only by the owner. The mem file, which provides access to the memory of the process, may be read and written only by the owner. In addition, the procfs implementation only permits a process to access its own mem file or the mem file of a child process that is stopped and being traced (fs/proc/mem.c:get_task). The Linux 2.2.12 kernel implementation does not support write access to the mem files due to a risk of overwriting kernel memory if a process dies in the middle of a write, but future versions of the kernel are likely to support such access.
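The /proc/self link and the ownership of the process-specific subdirectory can be inspected from user space. The sketch below is illustrative only: the helper names proc_self_pid and proc_self_owner are hypothetical, and the code assumes procfs is mounted at /proc. It reports the owner it finds rather than assuming what that owner must be.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: resolve the /proc/self symbolic link and return the
 * process identifier it names, or -1 if the link cannot be read or parsed.
 */
long proc_self_pid(void)
{
    char target[32];
    char *end;
    long pid;
    ssize_t n = readlink("/proc/self", target, sizeof(target) - 1);

    if (n <= 0)
        return -1;
    target[n] = '\0';               /* readlink does not NUL-terminate */

    pid = strtol(target, &end, 10);
    if (*end != '\0' || pid <= 0)
        return -1;
    return pid;
}

/*
 * Hypothetical helper: fetch the user and group ownership recorded for the
 * calling process's own subdirectory. Returns 0 on success, -1 on failure.
 */
int proc_self_owner(uid_t *uid, gid_t *gid)
{
    struct stat st;

    assert(uid != NULL && gid != NULL);

    if (stat("/proc/self", &st) != 0)
        return -1;
    *uid = st.st_uid;
    *gid = st.st_gid;
    return 0;
}
```

Comparing the result of proc_self_owner with geteuid() and getegid() is one way to observe the ownership rule described above on a running system.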

Most of the files in /proc outside of the process-specific subdirectories are readable by all users. The most notable exceptions are /proc/kmsg and /proc/kcore, which are only readable by the superuser. The /proc/kmsg file is used by klogd as a source of kernel log information as an alternative to the syslog system call interface. The /proc/kcore file provides access to the physical memory of the system in core file format, and can be used by gdb to examine the current state of any kernel data structures. The kernel implementation also requires that a process possess the CAP_SYS_RAWIO capability to open the /proc/kcore file (fs/proc/array.c:open_kcore).

Only the superuser may write to files in /proc outside of the process-specific subdirectories. Most files that can be written correspond to system parameters and are located in /proc/sys. A few files outside of /proc/sys also permit writing for configuration. For example, /proc/mtrr may be written to manipulate the memory type range register, as described in Documentation/mtrr.txt. Some of the files under /proc/ide, /proc/scsi, /proc/bus, and /proc/parport may be written for device configuration.

The types and functions provided by the procfs file system to the kernel are defined in include/linux/proc_fs.h. Entries in the /proc file system are defined by struct proc_dir_entry objects. The proc_register function may be used to add an entry under a given parent entry, and the create_proc_entry function may be used to create and register a dynamically allocated entry given a name, mode, and parent entry. Functions are also provided for registering entries under certain well-defined subdirectories, such as the net or scsi subdirectories.

When an entry in /proc is looked up, an inode is obtained for the entry. The fs/proc/inode.c:proc_get_inode function copies the owner and mode attributes from the entry into the inode when it is requested by a lookup. This function also calls the entry's fill_inode operation. For process-specific files, this operation is implemented by the base.c:proc_pid_fill_inode function, which copies the effective uid and effective gid of the associated process into the inode. The inode.c:proc_read_inode function also copies the effective identity attributes into the inode when the inode for a process-specific file is initialized.

The types and functions provided by the sysctl call to the kernel are defined in include/linux/sysctl.h. A sysctl table is defined by an array of struct ctl_table objects. Each object may contain a pointer to an array of child objects. The statically declared kernel/sysctl.c:root_table contains the base set of sysctl entries. The sysctl_init function calls the register_proc_table function to create the corresponding entries under /proc/sys.

Additional sysctl tables may be added dynamically by using the register_sysctl_table function. Unlike proc_register, this function does not link the new table into an existing table in the hierarchy. Instead, the new table is added to a linked list of top-level tables. Consequently, dynamically registered tables must contain dummy entries to provide the path from the root of the hierarchy to the newly registered parameters. The register_sysctl_table function also calls the register_proc_table function on the newly registered table.
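The need for dummy path entries can be seen in a small user-space model. This is a simplified sketch, not the kernel's code: the model_ctl_table and model_table_header types, their fields, and the model_register_table and model_lookup functions are all hypothetical stand-ins for the structures described above. Each registered table is a separate root, so a dynamically registered table must repeat the directory entries that lead to its parameters.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a sysctl table entry (hypothetical fields). */
struct model_ctl_table {
    int ctl_name;                    /* numeric name; 0 terminates an array */
    int *data;                       /* parameter storage; NULL for directories */
    struct model_ctl_table *child;   /* child array; NULL for leaf parameters */
};

/* One node in the linked list of top-level tables. */
struct model_table_header {
    struct model_ctl_table *table;
    struct model_table_header *next;
};

static struct model_table_header *model_table_list;

/* Add a table to the list of top-level tables; nothing is merged. */
void model_register_table(struct model_table_header *hdr,
                          struct model_ctl_table *table)
{
    assert(hdr != NULL && table != NULL);
    hdr->table = table;
    hdr->next = model_table_list;
    model_table_list = hdr;
}

/* Walk one table, following child pointers along the numeric name. */
static int *model_lookup_in(struct model_ctl_table *table,
                            const int *name, size_t nlen)
{
    for (; table->ctl_name != 0; table++) {
        if (table->ctl_name != name[0])
            continue;
        if (nlen == 1) {
            if (table->child == NULL)
                return table->data;      /* matching leaf parameter */
        } else if (table->child != NULL) {
            int *found = model_lookup_in(table->child, name + 1, nlen - 1);
            if (found != NULL)
                return found;
        }
    }
    return NULL;
}

/* Try each top-level table in turn until one resolves the full name. */
int *model_lookup(const int *name, size_t nlen)
{
    struct model_table_header *hdr;

    assert(name != NULL && nlen > 0);

    for (hdr = model_table_list; hdr != NULL; hdr = hdr->next) {
        int *found = model_lookup_in(hdr->table, name, nlen);
        if (found != NULL)
            return found;
    }
    return NULL;
}
```

In this model, a dynamically registered table that adds parameter 20 under directory 1 must itself contain an entry for directory 1 whose child array holds parameter 20, even if another registered table already has a directory 1.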

When the sysctl system call is invoked, the kernel/sysctl.c:parse_table function looks up the appropriate struct ctl_table object, calling the ctl_perm function to check that the process has search access to each table in the prefix. When a matching entry is found, the do_sysctl_strategy function calls ctl_perm to check that the process has the appropriate read and/or write access to the table. The ctl_perm function is also called by the do_rw_proc function when a sysctl parameter is accessed through /proc/sys.
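The kind of mode-based check that ctl_perm performs can be sketched in user space. The rule used here is an assumption made for illustration and is not stated in the text above: the owner bits of the table mode are applied to the superuser, the group bits to members of group 0, and the other bits to everyone else. The function name model_ctl_perm, its parameters, and the MODEL_OP_* constants are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Requested operations, using the usual rwx bit values (hypothetical names). */
#define MODEL_OP_SEARCH 01
#define MODEL_OP_WRITE  02
#define MODEL_OP_READ   04

/*
 * Check a requested operation against a table's mode bits.
 * Assumed rule (not confirmed by the text): superuser gets the owner bits,
 * members of group 0 get the group bits, and all others get the other bits.
 * Returns 0 if every requested bit is granted, -EACCES otherwise.
 */
int model_ctl_perm(int mode, int op, int is_superuser, int in_group_zero)
{
    assert((op & ~07) == 0);

    if (is_superuser)
        mode >>= 6;          /* select the owner bits */
    else if (in_group_zero)
        mode >>= 3;          /* select the group bits */

    if ((mode & op & 07) == op)
        return 0;
    return -EACCES;
}
```

Under this model, a parameter with mode 0644 may be read by any process but written only by the superuser, and a directory entry with mode 0555 grants search access to all processes, which corresponds to the per-prefix search check made during the table walk.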
