Files and stuff

Unix has one kind of abstract object that is used in many places: the file. All files support a common set of operations, even though the actual implementation of these operations differs from file to file.

VFS overview

#include <linux/fs.h>

The virtual file system layer (VFS) is responsible for offering a unified view of files to user space programs. A file can be a storage object of a file system, a representation of a device driver, or a network socket. The actual file system, device driver or network protocol is responsible for implementing all operations that can be called from user space.

struct inode

represents a file

struct file

represents an open file, similar to a FILE* in userspace. The current position in the file is stored here.

struct file_operations

table of function pointers that implement the file's behaviour

Such operations structures are a very common way to describe the abstract interface of an object. They are very much like the virtual method tables of C++ objects.
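The idea can be tried out in plain userspace C. The following is only a sketch: struct toy_file, struct toy_file_ops and the two toy "devices" are hypothetical names made up for this illustration and are not the real structures from <linux/fs.h>.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model of an operations table; not the real
 * struct file_operations from <linux/fs.h>. */
struct toy_file;

struct toy_file_ops {
	/* Each implementation fills in the operations it supports. */
	size_t (*read)(struct toy_file *f, char *buf, size_t count);
};

struct toy_file {
	const struct toy_file_ops *ops; /* plays the role of the vtable pointer */
};

/* "zero" device: fills the buffer with zero bytes. */
static size_t zero_read(struct toy_file *f, char *buf, size_t count)
{
	(void)f;
	memset(buf, 0, count);
	return count;
}

/* "null" device: always reports end of file. */
static size_t null_read(struct toy_file *f, char *buf, size_t count)
{
	(void)f;
	(void)buf;
	(void)count;
	return 0;
}

static const struct toy_file_ops zero_ops = { .read = zero_read };
static const struct toy_file_ops null_ops = { .read = null_read };

/* Generic code only knows the interface, never the implementation. */
static size_t toy_read(struct toy_file *f, char *buf, size_t count)
{
	return f->ops->read(f, buf, count);
}
```

A caller that holds a struct toy_file pointing at zero_ops or null_ops uses the same toy_read call for both; which function actually runs is decided by the table, just as with a C++ virtual method.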

Common operations on files are:

open(inode, file)

Called when a file is opened.

release(inode, file)

Called when the last reference to an open file is closed.

read(file, buf, count, pos)

Called when data is to be read. count bytes should be transferred (via put_user or copy_to_user) to the userspace memory at address buf. pos holds a pointer to the current position inside the file. It should be updated to reflect the new position.

The function has to return the number of bytes actually copied (it is OK to return fewer than count bytes). If no data is available, the function should wait for new data to arrive and then return that data. A return value of 0 indicates end of file.
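The return value and position rules can be modelled in userspace. This is a sketch under stated assumptions, not kernel code: model_read and example_data are hypothetical, memcpy() stands in for copy_to_user(), off_t stands in for the kernel's position type, and the file argument is left out because the toy has only one fixed buffer.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t, off_t */

/* Hypothetical device content; a real driver would read hardware or a
 * kernel buffer instead. */
static const char example_data[] = "hello\n";

/* Userspace model of a read operation. memcpy() stands in for
 * copy_to_user(), which only exists inside the kernel. */
static ssize_t model_read(char *buf, size_t count, off_t *pos)
{
	size_t size = sizeof(example_data) - 1;  /* without the trailing NUL */
	size_t left;

	if (*pos < 0 || (size_t)*pos >= size)
		return 0;                /* end of file */

	left = size - (size_t)*pos;
	if (count > left)
		count = left;            /* short read: fewer bytes than requested */

	memcpy(buf, example_data + *pos, count);
	*pos += (off_t)count;            /* advance the file position */
	return (ssize_t)count;           /* bytes actually copied */
}
```

Starting at position 0, a request for 4 bytes returns 4, a following request for 16 bytes returns only the remaining 2, and any further call returns 0. The model never blocks because its data is always available; a real driver would have to wait at the point where this sketch would otherwise have nothing to return.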

write(file, buf, count, pos)

Called when data is to be written. count bytes should be transferred (via get_user or copy_from_user) from the userspace memory at address buf. Same as the read function otherwise.

poll(file, polltable)

Called when a userspace process wants to wait for available data. Will be described later.

ioctl(inode, file, cmd, arg)

Called when ioctl(2) is called on this file. ioctl provides a multiplexer for device-specific commands that do not map well onto the standard read/write API. Example uses are setting the transfer speed or retrieving status information about the device. Note that this signature belongs to older kernels: since 2.6.36 drivers implement unlocked_ioctl(file, cmd, arg) instead.
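The multiplexer character of ioctl is easiest to see as a switch statement. The sketch below is a userspace model only: the command numbers, struct example_dev and model_ioctl are hypothetical, and a real handler would receive a struct file and transfer results with put_user or copy_to_user instead of a plain pointer.

```c
#include <errno.h>

/* Hypothetical command numbers; real drivers usually build them with
 * the _IO family of macros. */
#define EXAMPLE_SET_SPEED  1
#define EXAMPLE_GET_STATUS 2

/* Hypothetical per-device state. */
struct example_dev {
	unsigned long speed;
	unsigned long status;
};

/* Userspace model of an ioctl handler: one entry point, many commands. */
static long model_ioctl(struct example_dev *dev, unsigned int cmd,
			unsigned long arg, unsigned long *result)
{
	switch (cmd) {
	case EXAMPLE_SET_SPEED:
		dev->speed = arg;
		return 0;
	case EXAMPLE_GET_STATUS:
		*result = dev->status;
		return 0;
	default:
		return -ENOTTY;  /* conventional error for an unknown command */
	}
}
```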

file->private_data

Most operations only have the file pointer to identify the file they should work on. Device drivers usually maintain their own structure to describe each device. If a device driver supports more than one device, it has to map from the file argument to its own device structure. For that reason struct file contains the field private_data. This is an opaque void* pointer that the device driver can use freely. It is usually initialized by the open operation and then used by all other file operations.
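The pattern can be sketched in userspace as follows. Everything here is an assumption made for illustration: struct model_file is a reduced stand-in for struct file, and struct example_dev, example_devs and the index argument are hypothetical (a real open operation would derive the device from its inode argument).

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-device structure maintained by the driver. */
struct example_dev {
	int id;
	unsigned long opens;
};

#define EXAMPLE_NDEVS 2
static struct example_dev example_devs[EXAMPLE_NDEVS] = {
	{ .id = 0 }, { .id = 1 },
};

/* Reduced stand-in for struct file: only the field discussed here. */
struct model_file {
	void *private_data;
};

/* open: pick the device for this index and remember it in the file. */
static int model_open(unsigned int index, struct model_file *file)
{
	if (index >= EXAMPLE_NDEVS)
		return -ENODEV;
	file->private_data = &example_devs[index];
	example_devs[index].opens++;
	return 0;
}

/* Any later operation gets its device back from the file alone. */
static int model_device_id(struct model_file *file)
{
	struct example_dev *dev = file->private_data;

	return dev->id;
}
```

Because the lookup happens once in open, the remaining operations need no table search and no knowledge of how devices are numbered.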

Character devices

Device drivers can offer a file interface to talk to a device. Those files are usually located in /dev. Each file that is implemented by a device driver is identified by a device number (named dev_t inside the kernel), which is split into a major and a minor number for historical reasons. Usually the major number identifies the driver and the minor number identifies a device managed by this driver. /dev contains special inodes that include a major and minor number. The driver-specific file operations are used whenever such an inode is opened.
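The split of a device number can be inspected from userspace with the makedev(), major() and minor() macros that glibc provides in <sys/sysmacros.h>; inside the kernel the corresponding macros are MKDEV, MAJOR and MINOR. The helper same_driver below is hypothetical and only encodes the usual convention described above.

```c
#include <sys/sysmacros.h>  /* makedev(), major(), minor() (glibc) */
#include <sys/types.h>      /* dev_t */

/* Returns 1 if two device numbers belong to the same driver, assuming
 * the usual convention that the major number identifies the driver. */
static int same_driver(dev_t a, dev_t b)
{
	return major(a) == major(b);
}
```

For example, makedev(4, 64) and makedev(4, 65) share the major number 4 and would be treated as two devices of one driver, while makedev(5, 64) would not.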

There are two different types of device drivers from the VFS point of view: character devices and block devices. Character devices operate on streams of bytes and usually do not support seeking, while block devices only transfer fixed-size blocks. We only look at character devices here.

To create a new device node, see mknod(1).

The kernel maintains a list of currently available device drivers. A device driver can add itself to the list by calling register_chrdev. When doing so, it must provide the major number that should be associated with the new driver and the file operations that should be called when a user space process accesses this device.

 static struct file_operations example_fops = {
 	.read = example_read, /* ... */
 };
 register_chrdev(example_major, "example", &example_fops);

After the above call, example_read will be called whenever some process wants to read from a device node with the major number example_major.

The major number is assigned dynamically when register_chrdev is called with a major number of 0. In this case register_chrdev returns the major number that was actually assigned. Otherwise it returns 0 to indicate success. As usual, negative values indicate failure.
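Since the meaning of the return value depends on what was requested, it can help to spell the three cases out once. The helper below is a hypothetical sketch, not a kernel function; it only encodes the rules just described and can be compiled in userspace.

```c
#include <errno.h>

/* Hypothetical helper: turn the value returned by register_chrdev()
 * into the major number the driver ended up with, or a negative
 * error code. `requested` is the major number that was passed in. */
static int resolve_major(int requested, int ret)
{
	if (ret < 0)
		return ret;        /* registration failed */
	if (requested == 0)
		return ret;        /* dynamic: return value is the new major */
	return requested;          /* fixed: return value was 0 */
}
```

A driver that asks for a dynamic major number would store the result of such a check, because it needs the same number again when it unregisters.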

To remove the driver from the system:

 unregister_chrdev(example_major, "example");

You can get a list of currently available drivers in /proc/devices.
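A user space program can use this file to find the major number of a driver, which is handy when the number was assigned dynamically. The following sketch assumes the usual layout of /proc/devices (a "Character devices:" section followed by a "Block devices:" section, one "major name" pair per line); find_char_major is a hypothetical helper name.

```c
#include <stdio.h>
#include <string.h>

/* Look up `name` in the "Character devices:" section of a stream that
 * has the layout of /proc/devices. Returns the major number, or -1 if
 * the driver is not listed there. */
static int find_char_major(FILE *f, const char *name)
{
	char line[256];
	char drv[128];
	int major;
	int in_char_section = 0;

	while (fgets(line, sizeof line, f)) {
		if (strncmp(line, "Character devices:", 18) == 0) {
			in_char_section = 1;
			continue;
		}
		if (strncmp(line, "Block devices:", 14) == 0) {
			in_char_section = 0;
			continue;
		}
		if (in_char_section &&
		    sscanf(line, "%d %127s", &major, drv) == 2 &&
		    strcmp(drv, name) == 0)
			return major;
	}
	return -1;
}
```

Typical use would be to open /proc/devices with fopen(), pass the stream to find_char_major together with the name given to register_chrdev, and use the result to create the matching device node.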