3.3. Some Important Data Structures
As you can imagine, device number registration
is just the first of many tasks
that driver code must carry out. We will soon look at other important
driver components, but one other digression is needed first. Most of
the fundamental driver operations involve three important kernel data
structures, called file_operations,
file, and inode. A basic
familiarity with these structures is required to be able to do much
of anything interesting, so we will now take a quick look at each of
them before getting into the details of how to implement the
fundamental driver operations.
3.3.1. File Operations
So far, we have reserved som
e
device numbers for our use, but we have not yet connected any of our
driver's operations to those numbers. The
file_operations structure is how a char driver
sets up this connection. The structure, defined in
<linux/fs.h>, is a collection of function
pointers. Each open file (represented internally by a
file structure, which we will examine shortly) is
associated with its own set of functions (by including a field called
f_op that points to a
file_operations structure). The operations are
mostly in charge of implementing the system calls and are therefore,
named open, read, and so
on. We can consider the file to be an
"object" and the functions
operating on it to be its
"methods," using object-oriented
programming terminology to denote actions declared by an object to
act on itself. This is the first sign of object-oriented programming
we see in the Linux kernel, and we'll see more in
later chapters.
Conventionally, a
file_operations structure or a pointer to one is
called fops (or some variation thereof ). Each
field in the structure must point to the function in the driver that
implements a specific operation, or be left NULL
for unsupported operations. The exact behavior of the kernel when a
NULL pointer is specified is different for each
function, as the list later in this section shows.
The following list introduces all the operations that an application
can invoke on a device. We've tried to keep the list
brief so it can be used as a reference, merely summarizing each
operation and the default kernel behavior when a
NULL pointer is used.
As you read through the list of file_operations
methods, you will note that a number of parameters include the string
_ _user. This annotation is a form of
documentation, noting that a pointer is a user-space address that
cannot be directly dereferenced. For normal compilation, _
_user has no effect, but it can be used by external
checking software to find misuse of user-space addresses.
The rest of the chapter, after describing some other important data
structures, explains the role of the most important operations and
offers hints, caveats, and real code examples. We defer discussion of
the more complex operations to later chapters, because we
aren't ready to dig into topics such as memory
management, blocking operations, and asynchronous notification quite
yet.
- struct module *owner
-
The first file_operations field is not an
operation at all; it is a pointer to the module that
"owns" the structure. This field is
used to prevent the module from being unloaded while its operations
are in use. Almost all the time, it is simply initialized to
THIS_MODULE, a macro defined in
<linux/module.h>.
- loff_t (*llseek) (struct file *, loff_t, int);
-
The
llseek method is used to change the current
read/write position in a file, and the new position is returned as a
(positive) return value. The loff_t parameter is a
"long offset" and is at least 64
bits wide even on 32-bit platforms. Errors are signaled by a negative
return value. If this function pointer is NULL,
seek calls will modify the position counter in the
file structure (described in Section 3.3.2)
in potentially unpredictable ways.
- ssize_t (*read) (struct file *, char _ _user *, size_t, loff_t *);
-
Used
to retrieve data from the device. A null pointer in this position
causes the read system call to fail with
-EINVAL ("Invalid
argument"). A nonnegative return value represents
the number of bytes successfully read (the return value is a
"signed size" type, usually the
native integer type for the target platform).
- ssize_t (*aio_read)(struct kiocb *, char _ _user *, size_t, loff_t);
-
Initiates an asynchronous read—a read operation that might not
complete before the function returns. If this method is
NULL, all operations will be processed
(synchronously) by read instead.
- ssize_t (*write) (struct file *, const char _ _user *, size_t, loff_t *);
-
Sends
data to the device. If NULL,
-EINVAL is returned to the program calling the
write system call. The return value, if
nonnegative, represents the number of bytes successfully written.
- ssize_t (*aio_write)(struct kiocb *, const char _ _user *, size_t, loff_t *);
-
Initiates an asynchronous write operation on the device.
- int (*readdir) (struct file *, void *, filldir_t);
-
This
field should be NULL for device files; it is used
for reading directories and is useful only for filesystems.
- unsigned int (*poll) (struct file *, struct poll_table_struct *);
-
The
poll method is the back end of three system
calls: poll, epoll, and
select, all of which are used to query whether a
read or write to one or more file descriptors would block. The
poll method should return a bit mask indicating
whether non-blocking reads or writes are possible, and, possibly,
provide the kernel with information that can be used to put the
calling process to sleep until I/O becomes possible. If a driver
leaves its poll method NULL,
the device is assumed to be both readable and writable without
blocking.
- int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
-
The
ioctl system call offers a way to issue
device-specific commands (such as formatting a track of a floppy
disk, which is neither reading nor writing). Additionally, a few
ioctl commands are recognized by the kernel
without referring to the fops table. If the device
doesn't provide an ioctl
method, the system call returns an error for any request that
isn't predefined (-ENOTTY,
"No such ioctl for device").
- int (*mmap) (struct file *, struct vm_area_struct *);
-
mmap
is used to request a mapping of device memory to a
process's address space. If this method is
NULL, the mmap system call
returns -ENODEV.
- int (*open) (struct inode *, struct file *);
-
Though
this is always the first operation performed on the device file, the
driver is not required to declare a corresponding method. If this
entry is NULL, opening the device always succeeds,
but your driver isn't notified.
- int (*flush) (struct file *);
-
The
flush
operation is invoked when a process closes its copy of a file
descriptor for a device; it should execute (and wait for) any
outstanding operations on the device. This must not be confused with
the fsync operation requested by user programs.
Currently, flush is used in very few drivers;
the SCSI tape driver uses it, for example, to ensure that all data
written makes it to the tape before the device is closed. If
flush is NULL, the kernel
simply ignores the user application request.
- int (*release) (struct inode *, struct file *);
-
This
operation is invoked when the file structure is
being released. Like open,
release can be
NULL.
- int (*fsync) (struct file *, struct dentry *, int);
-
This
method is the back end of the fsync system call,
which a user calls to flush any pending data. If this pointer is
NULL, the system call returns
-EINVAL.
- int (*aio_fsync)(struct kiocb *, int);
-
This is the asynchronous version of the fsync
method.
- int (*fasync) (int, struct file *, int);
-
This
operation is used to notify the device of a change in its
FASYNC flag. Asynchronous notification is an
advanced topic and is described in Chapter 6. The field can be
NULL if the driver doesn't
support asynchronous notification.
- int (*lock) (struct file *, int, struct file_lock *);
-
The lock method is used
to implement file locking; locking is an indispensable feature for
regular files but is almost never implemented by device drivers.
- ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *);
- ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *);
-
These
methods implement scatter/gather read and write operations.
Applications occasionally need to do a single read or write operation
involving multiple memory areas; these system calls allow them to do
so without forcing extra copy operations on the data. If these
function pointers are left NULL, the
read and write methods are
called (perhaps more than once) instead.
- ssize_t (*sendfile)(struct file *, loff_t *, size_t, read_actor_t, void *);
-
This method implements the read side of the
sendfile system call, which moves the data from
one file descriptor to another with a minimum of copying. It is used,
for example, by a web server that needs to send the contents of a
file out a network connection. Device drivers usually leave
sendfile NULL.
- ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *,
- int);
-
sendpage is the other half of
sendfile; it is called by the kernel to send
data, one page at a time, to the corresponding file. Device drivers
do not usually implement sendpage.
- unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned
- long, unsigned long, unsigned long);
-
The purpose of this
method is to find a suitable location in
the process's address space to map in a memory
segment on the underlying device. This task is normally performed by
the memory management code; this method exists to allow drivers to
enforce any alignment requirements a particular device may have. Most
drivers can leave this method NULL.
- int (*check_flags)(int)
-
This method allows a module to
check
the flags passed to an fcntl(F_SETFL...) call.
- int (*dir_notify)(struct file *, unsigned long);
-
This method is invoked when an
application
uses fcntl to request directory change
notifications. It is useful only to filesystems; drivers need not
implement dir_notify.
The scull
device driver implements only the most important device methods. Its
file_operations structure is initialized as
follows:
struct file_operations scull_fops = {
.owner = THIS_MODULE,
.llseek = scull_llseek,
.read = scull_read,
.write = scull_write,
.ioctl = scull_ioctl,
.open = scull_open,
.release = scull_release,
};
This declaration uses the standard C tagged structure initialization
syntax. This syntax is preferred because it makes drivers more
portable across changes in the definitions of the structures and,
arguably, makes the code more compact and readable. Tagged
initialization allows the reordering of structure members; in some
cases, substantial performance improvements have been realized by
placing pointers to frequently accessed members in the same
hardware
cache line.
3.3.2. The file Structure
struct file, defined in
<linux/fs.h>, is
the
second most
important data structure used in device drivers. Note that a
file has nothing to do with the
FILE pointers of user-space programs. A
FILE is defined in the C library and never appears
in kernel code. A struct file,
on the other hand, is a kernel structure that never appears in user
programs.
The file structure
represents an open file
.
(It is not specific to device drivers; every open file in the system
has an associated struct file
in kernel space.) It is created by the kernel on
open and is passed to any function that operates
on the file, until the last close. After all
instances of the file are closed, the kernel releases the data
structure.
In the kernel
sources, a pointer to struct
file is usually called either
file or filp
("file pointer").
We'll consistently call the pointer
filp to prevent ambiguities with the structure
itself. Thus, file refers to the structure and
filp to a pointer to the structure.
The most important fields of struct
file are shown here. As in the previous section,
the list can be skipped on a first reading. However, later in this
chapter, when we face some real C code, we'll
discuss the fields in more detail.
- mode_t f_mode;
-
The
file mode identifies the file as either readable or writable (or
both), by means of the bits FMODE_READ and
FMODE_WRITE. You might want to check this field
for read/write permission in your open or
ioctl function, but you don't
need to check permissions for read and
write, because the kernel checks before invoking
your method. An attempt to read or write when the file has not been
opened for that type of access is rejected without the driver even
knowing about it.
- loff_t f_pos;
-
The current reading or writing position.
loff_t is a 64-bit value on all platforms
(long long in gcc
terminology). The driver can read this value if it needs to know the
current position in the file but should not normally change it;
read and write should
update a position using the pointer they receive as the last argument
instead of acting on filp->f_pos directly. The
one exception to this rule is in the llseek
method, the purpose of which is to change the file position.
- unsigned int f_flags;
-
These are the file flags, such
as O_RDONLY, O_NONBLOCK, and
O_SYNC. A driver should check the
O_NONBLOCK flag to see if nonblocking operation
has been requested (we discuss nonblocking I/O in Section 6.2.3); the other
flags are seldom used. In particular, read/write permission should be
checked using f_mode rather than
f_flags. All the flags are defined in the header
<linux/fcntl.h>.
- struct file_operations *f_op;
-
The
operations associated with the file. The kernel assigns the pointer
as part of its implementation of open and then
reads it when it needs to dispatch any operations. The value in
filp->f_op is never saved by the kernel for
later reference; this means that you can change the file operations
associated with your file, and the new methods will be effective
after you return to the caller. For example, the code for
open associated with major number 1
(/dev/null, /dev/zero, and
so on) substitutes the operations in filp->f_op
depending on the minor number being opened. This practice allows the
implementation of several behaviors under the same major number
without introducing overhead at each system call. The ability to
replace the file operations is the kernel equivalent of
"method overriding" in
object-oriented programming.
- void *private_data;
-
The open
system call sets this pointer to NULL before
calling the open method for the driver. You are
free to make its own use of the field or to ignore it; you can use
the field to point to allocated data, but then you must remember to
free that memory in the release method before
the file structure is destroyed by the kernel.
private_data is a useful resource for preserving
state information across system calls and is used by most of our
sample modules.
- struct dentry *f_dentry;
-
The directory entry
(dentry) structure associated with the file.
Device driver writers normally need not concern themselves with
dentry structures, other than to access the inode
structure as filp->f_dentry->d_inode.
The real structure has a few more fields, but they
aren't useful to device drivers. We can safely
ignore those fields, because drivers never create
file structures; they only access structures
created elsewhere.
3.3.3. The inode Structure
The inode structure is
used by the
kernel internally to represent
files. Therefore, it is different from the file
structure that represents an open file descriptor. There can be
numerous file structures representing multiple
open descriptors on a single file, but they all point to a single
inode structure.
The inode structure contains a great deal of
information about the file. As a general rule, only two fields of
this structure are of interest for writing driver code:
- dev_t i_rdev;
-
For inodes that represent device files, this field contains the
actual device number.
- struct cdev *i_cdev;
-
struct cdev is the kernel's
internal structure that represents char devices; this field contains
a pointer to that structure when the inode refers to a char device
file.
The type of i_rdev changed over the course of the
2.5 development series, breaking a lot of drivers. As a way of
encouraging more portable programming, the kernel developers have
added two macros that can be used to obtain the major and minor
number from an inode:
unsigned int iminor(struct inode *inode);
unsigned int imajor(struct inode *inode);
In the interest of not being caught by the next change, these macros
should be used instead of manipulating i_rdev
directly.
|