12.1. The PCI Interface

Although many computer users think of PCI as a way of laying out electrical wires, it is actually a complete set of specifications defining how different parts of a computer should interact.

The PCI specification covers most issues related to computer interfaces. We are not going to cover it all here; in this section, we are mainly concerned with how a PCI driver can find its hardware and gain access to it. The probing techniques discussed in Chapter 12 and Chapter 10 can be used with PCI devices, but the specification offers an alternative that is preferable to probing.

The PCI architecture was designed as a replacement for the ISA standard, with three main goals: to get better performance when transferring data between the computer and its peripherals, to be as platform independent as possible, and to simplify adding and removing peripherals to the system.

The PCI bus achieves better performance by using a higher clock rate than ISA; its clock runs at 25 or 33 MHz (its actual rate being a factor of the system clock), and 66-MHz and even 133-MHz implementations have recently been deployed as well. Moreover, it is equipped with a 32-bit data bus, and a 64-bit extension has been included in the specification. Platform independence is often a goal in the design of a computer bus, and it's an especially important feature of PCI, because the PC world has always been dominated by processor-specific interface standards. PCI is currently used extensively on IA-32, Alpha, PowerPC, SPARC64, and IA-64 systems, and some other platforms as well.

What is most relevant to the driver writer, however, is PCI's support for autodetection of interface boards. PCI devices are jumperless (unlike most older peripherals) and are automatically configured at boot time. Then, the device driver must be able to access configuration information in the device in order to complete initialization. This happens without the need to perform any probing.

12.1.1. PCI Addressing

Each PCI peripheral is identified by a bus number, a device number, and a function number. The PCI specification permits a single system to host up to 256 buses, but because 256 buses are not sufficient for many large systems, Linux now supports PCI domains. Each PCI domain can host up to 256 buses. Each bus hosts up to 32 devices, and each device can be a multifunction board (such as an audio device with an accompanying CD-ROM drive) with a maximum of eight functions. Therefore, each function can be identified at hardware level by a 16-bit address, or key. Device drivers written for Linux, though, don't need to deal with those binary addresses, because they use a specific data structure, called pci_dev, to act on the devices.

Most recent workstations feature at least two PCI buses. Plugging more than one bus in a single system is accomplished by means of bridges, special-purpose PCI peripherals whose task is joining two buses. The overall layout of a PCI system is a tree where each bus is connected to an upper-layer bus, up to bus 0 at the root of the tree. The CardBus PC-card system is also connected to the PCI system via bridges. A typical PCI system is represented in Figure 12-1, where the various bridges are highlighted.

Figure 12-1. Layout of a typical PCI system

The 16-bit hardware addresses associated with PCI peripherals, although mostly hidden in the struct pci_dev object, are still visible occasionally, especially when lists of devices are being used. One such situation is the output of lspci (part of the pciutils package, available with most distributions) and the layout of information in /proc/pci and /proc/bus/pci.
The sysfs representation of PCI devices also shows this addressing scheme, with the addition of the PCI domain information.[1] When the hardware address is displayed, it can be shown as two values (an 8-bit bus number and an 8-bit device and function number), as three values (bus, device, and function), or as four values (domain, bus, device, and function); all the values are usually displayed in hexadecimal.
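The packing of a bus number (8 bits), device number (5 bits), and function number (3 bits) into one 16-bit key can be sketched in a few lines of user-space C. This is only an illustration of the arithmetic: the names struct pci_addr, pci_addr_split, pci_addr_join, and pci_addr_format are hypothetical helpers invented for this sketch and are not part of any kernel or pciutils interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper type: the three parts of a PCI hardware address. */
struct pci_addr {
    unsigned int bus;   /* 8 bits */
    unsigned int dev;   /* 5 bits */
    unsigned int fn;    /* 3 bits */
};

/* Split a 16-bit key: bus in the high byte, device and function in the low byte. */
struct pci_addr pci_addr_split(uint16_t key)
{
    struct pci_addr a;

    a.bus = (key >> 8) & 0xff;
    a.dev = (key >> 3) & 0x1f;
    a.fn  = key & 0x07;
    return a;
}

/* Rebuild the 16-bit key from its parts; each part must fit its bit field. */
uint16_t pci_addr_join(struct pci_addr a)
{
    assert(a.bus <= 0xff && a.dev <= 0x1f && a.fn <= 0x07);
    return (uint16_t)((a.bus << 8) | (a.dev << 3) | a.fn);
}

/* Format as domain:bus:device.function, all values in hexadecimal. */
int pci_addr_format(char *buf, size_t len, unsigned int domain, struct pci_addr a)
{
    return snprintf(buf, len, "%04x:%02x:%02x.%x", domain, a.bus, a.dev, a.fn);
}
```

Under these assumptions, splitting the key 0x00a0 yields bus 0, device 0x14, and function 0, which formats as 0000:00:14.0 when the domain is 0.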
For example, /proc/bus/pci/devices uses a single 16-bit field (to ease parsing and sorting), while /proc/bus/busnumber splits the address into three fields. The following shows how those addresses appear, showing only the beginning of the output lines:

    $ lspci | cut -d: -f1-3
    0000:00:00.0 Host bridge
    0000:00:00.1 RAM memory
    0000:00:00.2 RAM memory
    0000:00:02.0 USB Controller
    0000:00:04.0 Multimedia audio controller
    0000:00:06.0 Bridge
    0000:00:07.0 ISA bridge
    0000:00:09.0 USB Controller
    0000:00:09.1 USB Controller
    0000:00:09.2 USB Controller
    0000:00:0c.0 CardBus bridge
    0000:00:0f.0 IDE interface
    0000:00:10.0 Ethernet controller
    0000:00:12.0 Network controller
    0000:00:13.0 FireWire (IEEE 1394)
    0000:00:14.0 VGA compatible controller
    $ cat /proc/bus/pci/devices | cut -f1
    0000
    0001
    0002
    0010
    0020
    0030
    0038
    0048
    0049
    004a
    0060
    0078
    0080
    0090
    0098
    00a0
    $ tree /sys/bus/pci/devices/
    /sys/bus/pci/devices/
    |-- 0000:00:00.0 -> ../../../devices/pci0000:00/0000:00:00.0
    |-- 0000:00:00.1 -> ../../../devices/pci0000:00/0000:00:00.1
    |-- 0000:00:00.2 -> ../../../devices/pci0000:00/0000:00:00.2
    |-- 0000:00:02.0 -> ../../../devices/pci0000:00/0000:00:02.0
    |-- 0000:00:04.0 -> ../../../devices/pci0000:00/0000:00:04.0
    |-- 0000:00:06.0 -> ../../../devices/pci0000:00/0000:00:06.0
    |-- 0000:00:07.0 -> ../../../devices/pci0000:00/0000:00:07.0
    |-- 0000:00:09.0 -> ../../../devices/pci0000:00/0000:00:09.0
    |-- 0000:00:09.1 -> ../../../devices/pci0000:00/0000:00:09.1
    |-- 0000:00:09.2 -> ../../../devices/pci0000:00/0000:00:09.2
    |-- 0000:00:0c.0 -> ../../../devices/pci0000:00/0000:00:0c.0
    |-- 0000:00:0f.0 -> ../../../devices/pci0000:00/0000:00:0f.0
    |-- 0000:00:10.0 -> ../../../devices/pci0000:00/0000:00:10.0
    |-- 0000:00:12.0 -> ../../../devices/pci0000:00/0000:00:12.0
    |-- 0000:00:13.0 -> ../../../devices/pci0000:00/0000:00:13.0
    `-- 0000:00:14.0 -> ../../../devices/pci0000:00/0000:00:14.0

All three lists of devices are sorted in the same order, since lspci uses the /proc files as its source of
information. Taking the VGA video controller as an example, 0x00a0 means 0000:00:14.0 when split into domain (16 bits), bus (8 bits), device (5 bits) and function (3 bits).

The hardware circuitry of each peripheral board answers queries pertaining to three address spaces: memory locations, I/O ports, and configuration registers. The first two address spaces are shared by all the devices on the same PCI bus (i.e., when you access a memory location, all the devices on that PCI bus see the bus cycle at the same time). The configuration space, on the other hand, exploits geographical addressing. Configuration queries address only one slot at a time, so they never collide.

As far as the driver is concerned, memory and I/O regions are accessed in the usual ways via inb, readb, and so forth. Configuration transactions, on the other hand, are performed by calling specific kernel functions to access configuration registers.

With regard to interrupts, every PCI slot has four interrupt pins, and each device function can use one of them without being concerned about how those pins are routed to the CPU. Such routing is the responsibility of the computer platform and is implemented outside of the PCI bus. Since the PCI specification requires interrupt lines to be shareable, even a processor with a limited number of IRQ lines, such as the x86, can host many PCI interface boards (each with four interrupt pins).

The I/O space in a PCI bus uses a 32-bit address bus (leading to 4 GB of I/O ports), while the memory space can be accessed with either 32-bit or 64-bit addresses. 64-bit addresses are available on more recent platforms. Addresses are supposed to be unique to one device, but software may erroneously configure two devices to the same address, making it impossible to access either one. But this problem never occurs unless a driver is willingly playing with registers it shouldn't touch.
The good news is that every memory and I/O address region offered by the interface board can be remapped by means of configuration transactions. That is, the firmware initializes PCI hardware at system boot, mapping each region to a different address to avoid collisions.[2] The addresses to which these regions are currently mapped can be read from the configuration space, so the Linux driver can access its devices without probing. After reading the configuration registers, the driver can safely access its hardware.
The PCI configuration space consists of 256 bytes for each device function (except for PCI Express devices, which have 4 KB of configuration space for each function), and the layout of the configuration registers is standardized. Four bytes of the configuration space hold a unique function ID, so the driver can identify its device by looking for the specific ID for that peripheral.[3] In summary, each device board is geographically addressed to retrieve its configuration registers; the information in those registers can then be used to perform normal I/O access, without the need for further geographic addressing.
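As an illustration of how the four ID bytes sit at the start of the configuration space, here is a minimal user-space sketch that decodes them from a raw snapshot of the registers held in a byte buffer. It assumes the standardized header layout (16-bit vendor ID at offset 0, 16-bit device ID at offset 2) and little-endian storage. The names cfg_read16, cfg_read32, struct function_id, and cfg_function_id are hypothetical and exist only in this sketch; a kernel driver would use the kernel's own configuration access functions instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets of the two ID registers in the standardized header. */
#define CFG_VENDOR_ID 0x00  /* 16 bits */
#define CFG_DEVICE_ID 0x02  /* 16 bits */

/* Decode a little-endian 16-bit register from a raw configuration snapshot. */
uint16_t cfg_read16(const uint8_t *cfg, size_t len, size_t off)
{
    assert(off + 2 <= len);
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Decode a little-endian 32-bit register from a raw configuration snapshot. */
uint32_t cfg_read32(const uint8_t *cfg, size_t len, size_t off)
{
    assert(off + 4 <= len);
    return (uint32_t)cfg[off] |
           ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) |
           ((uint32_t)cfg[off + 3] << 24);
}

/* Hypothetical container for the four ID bytes of a function. */
struct function_id {
    uint16_t vendor;
    uint16_t device;
};

/* Extract the vendor/device pair that a driver would compare against. */
struct function_id cfg_function_id(const uint8_t *cfg, size_t len)
{
    struct function_id id;

    id.vendor = cfg_read16(cfg, len, CFG_VENDOR_ID);
    id.device = cfg_read16(cfg, len, CFG_DEVICE_ID);
    return id;
}
```

Because the bytes are combined explicitly, the decoding does not depend on the byte order of the machine running the code.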
It should be clear from this description that the main innovation of the PCI interface standard over ISA is the configuration address space. Therefore, in addition to the usual driver code, a PCI driver needs the ability to access the configuration space, in order to save itself from risky probing tasks.

For the remainder of this chapter, we use the word device to refer to a device function, because each function in a multifunction board acts as an independent entity. When we refer to a device, we mean the tuple "domain number, bus number, device number, and function number."

12.1.2. Boot Time

To see how PCI works, we start from system boot, since that's when the devices are configured.

When power is applied to a PCI device, the hardware remains inactive. In other words, the device responds only to configuration transactions. At power on, the device has no memory and no I/O ports mapped in the computer's address space; every other device-specific feature, such as interrupt reporting, is disabled as well.

Fortunately, every PCI motherboard is equipped with PCI-aware firmware, called the BIOS, NVRAM, or PROM, depending on the platform. The firmware offers access to the device configuration address space by reading and writing registers in the PCI controller.

At system boot, the firmware (or the Linux kernel, if so configured) performs configuration transactions with every PCI peripheral in order to allocate a safe place for each address region it offers. By the time a device driver accesses the device, its memory and I/O regions have already been mapped into the processor's address space. The driver can change this default assignment, but it never needs to do that.

As suggested, the user can look at the PCI device list and the devices' configuration registers by reading /proc/bus/pci/devices and /proc/bus/pci/*/*.
The former is a text file with (hexadecimal) device information, and the latter are binary files that report a snapshot of the configuration registers of each device, one file per device.

The individual PCI device directories in the sysfs tree can be found in /sys/bus/pci/devices. A PCI device directory contains a number of different files:

    $ tree /sys/bus/pci/devices/0000:00:10.0
    /sys/bus/pci/devices/0000:00:10.0
    |-- class
    |-- config
    |-- detach_state
    |-- device
    |-- irq
    |-- power
    |   `-- state
    |-- resource
    |-- subsystem_device
    |-- subsystem_vendor
    `-- vendor

The file config is a binary file that allows the raw PCI config information to be read from the device (just as the /proc/bus/pci/*/* files do). The files vendor, device, subsystem_device, subsystem_vendor, and class all refer to the specific values of this PCI device (all PCI devices provide this information). The file irq shows the current IRQ assigned to this PCI device, and the file resource shows the current memory resources allocated by this device.

12.1.3. Configuration Registers and Initialization

In this section, we look at the configuration registers that PCI devices contain. All PCI devices feature at least a 256-byte address space. The first 64 bytes are standardized, while the rest are device dependent. Figure 12-2 shows the layout of the device-independent configuration space.

Figure 12-2. The standardized PCI configuration registers

As the figure shows, some of the PCI configuration registers are required and some are optional. Every PCI device must contain meaningful values in the required registers, whereas the contents of the optional registers depend on the actual capabilities of the peripheral. The optional fields are not used unless the contents of the required fields indicate that they are valid. Thus, the required fields assert the board's capabilities, including whether the other fields are usable.

It's interesting to note that the PCI registers are always little-endian.
Although the standard is designed to be architecture independent, the PCI designers sometimes show a slight bias toward the PC environment. The driver writer should be careful about byte ordering when accessing multibyte configuration registers; code that works on the PC might not work on other platforms. The Linux developers have taken care of the byte-ordering problem (see Section 12.1.8), but the issue must be kept in mind. If you ever need to convert data from host order to PCI order or vice versa, you can resort to the functions defined in <asm/byteorder.h>, introduced in Chapter 11, knowing that PCI byte order is little-endian.

Describing all the configuration items is beyond the scope of this book. Usually, the technical documentation released with each device describes the supported registers. What we're interested in is how a driver can look for its device and how it can access the device's configuration space.

Three or five PCI registers identify a device: vendorID, deviceID, and class are the three that are always used. Every PCI manufacturer assigns proper values to these read-only registers, and the driver can use them to look for the device. Additionally, the fields subsystem vendorID and subsystem deviceID are sometimes set by the vendor to further differentiate similar devices.

Let's look at these registers in more detail:
Using these different identifiers, a PCI driver can tell the kernel what kind of devices it supports. The struct pci_device_id structure is used to define a list of the different types of PCI devices that a driver supports. This structure contains the following fields:
There are two helper macros that should be used to initialize a struct pci_device_id structure:
An example of using these macros to define the type of devices a driver supports can be found in the following kernel files:

drivers/usb/host/ehci-hcd.c:

    static const struct pci_device_id pci_ids[] = {
        {
            /* handle any USB 2.0 EHCI controller */
            PCI_DEVICE_CLASS(((PCI_CLASS_SERIAL_USB << 8) | 0x20), ~0),
            .driver_data = (unsigned long) &ehci_driver,
        },
        { /* end: all zeroes */ }
    };

drivers/i2c/busses/i2c-i810.c:

    static struct pci_device_id i810_ids[] = {
        { PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82810_IG1) },
        { PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82810_IG3) },
        { PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82810E_IG) },
        { PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82815_CGC) },
        { PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82845G_IG) },
        { 0, },
    };

These examples create a list of struct pci_device_id structures, with an empty structure set to all zeros as the last value in the list. This array of IDs is used in the struct pci_driver (described below), and it is also used to tell user space which devices this specific driver supports.

12.1.4. MODULE_DEVICE_TABLE

This pci_device_id structure needs to be exported to user space to allow the hotplug and module loading systems to know what module works with what hardware devices. The macro MODULE_DEVICE_TABLE accomplishes this. An example is:

    MODULE_DEVICE_TABLE(pci, i810_ids);

This statement creates a local variable called __mod_pci_device_table that points to the list of struct pci_device_id. Later in the kernel build process, the depmod program searches all modules for the symbol __mod_pci_device_table. If that symbol is found, it pulls the data out of the module and adds it to the file /lib/modules/KERNEL_VERSION/modules.pcimap. After depmod completes, all PCI devices that are supported by modules in the kernel are listed, along with their module names, in that file.
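To make the role of the ID tables more concrete, the following user-space sketch models how a zero-terminated table might be searched for an entry matching a given function. It is a simplified model, not the kernel's code: struct id_entry, struct function_info, ANY_ID, id_entry_matches, and id_table_lookup are all hypothetical names, the subsystem IDs are left out, and ANY_ID merely stands in for the kernel's PCI_ANY_ID wildcard. The class comparison is masked, so an entry with a zero mask accepts any class.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_ID 0xffffffffu  /* stand-in for the kernel's PCI_ANY_ID wildcard */

/* Hypothetical, reduced version of an ID table entry (no subsystem IDs). */
struct id_entry {
    uint32_t vendor, device;
    uint32_t class_code, class_mask;
};

/* Hypothetical summary of the ID registers read from one function. */
struct function_info {
    uint32_t vendor, device, class_code;
};

/* An entry matches when every non-wildcard field agrees with the function. */
int id_entry_matches(const struct id_entry *e, const struct function_info *f)
{
    return (e->vendor == ANY_ID || e->vendor == f->vendor) &&
           (e->device == ANY_ID || e->device == f->device) &&
           ((e->class_code ^ f->class_code) & e->class_mask) == 0;
}

/* Walk the table until the all-zero terminator; return the first match. */
const struct id_entry *id_table_lookup(const struct id_entry *table,
                                       const struct function_info *f)
{
    assert(table != NULL && f != NULL);
    for (; table->vendor || table->device ||
           table->class_code || table->class_mask; table++) {
        if (id_entry_matches(table, f))
            return table;
    }
    return NULL;
}
```

In this model, an entry that names one vendor/device pair leaves class_mask at 0, while a class-based entry uses ANY_ID for vendor and device together with a full mask, mirroring the two styles of table shown above.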
When the kernel tells the hotplug system that a new PCI device has been found, the hotplug system uses the modules.pcimap file to find the proper driver to load.

12.1.5. Registering a PCI Driver

The main structure that all PCI drivers must create in order to be registered with the kernel properly is the struct pci_driver structure. This structure consists of a number of function callbacks and variables that describe the PCI driver to the PCI core. Here are the fields in this structure that a PCI driver needs to be aware of:
In summary, to create a proper struct pci_driver structure, only four fields need to be initialized:

    static struct pci_driver pci_driver = {
        .name = "pci_skel",
        .id_table = ids,
        .probe = probe,
        .remove = remove,
    };

To register the struct pci_driver with the PCI core, a call to pci_register_driver is made with a pointer to the struct pci_driver. This is traditionally done in the module initialization code for the PCI driver:

    static int __init pci_skel_init(void)
    {
        return pci_register_driver(&pci_driver);
    }

Note that the pci_register_driver function either returns a negative error number or 0 if everything was registered successfully. It does not return the number of devices that were bound to the driver or an error number if no devices were bound to the driver. This is a change from kernels prior to the 2.6 release and was done because of the following situations:
When the PCI driver is to be unloaded, the struct pci_driver needs to be unregistered from the kernel. This is done with a call to pci_unregister_driver. When this call happens, any PCI devices that were currently bound to this driver are removed, and the remove function for this PCI driver is called before the pci_unregister_driver function returns.

    static void __exit pci_skel_exit(void)
    {
        pci_unregister_driver(&pci_driver);
    }

12.1.6. Old-Style PCI Probing

In older kernel versions, the pci_register_driver function was not always used by PCI drivers. Instead, they would either walk the list of PCI devices in the system by hand, or they would call a function that could search for a specific PCI device. The ability to walk the list of PCI devices in the system within a driver has been removed from the 2.6 kernel in order to prevent drivers from crashing the kernel if they happened to modify the PCI device lists while a device was being removed at the same time.

If the ability to find a specific PCI device is really needed, the following functions are available:
The from argument is used to get hold of multiple devices with the same signature; the argument should point to the last device that has been found, so that the search can continue instead of restarting from the head of the list. To find the first device, from is specified as NULL. If no (further) device is found, NULL is returned.

An example of how to use this function properly is:

    struct pci_dev *dev;

    dev = pci_get_device(PCI_VENDOR_FOO, PCI_DEVICE_FOO, NULL);
    if (dev) {
        /* Use the PCI device */
        ...
        pci_dev_put(dev);
    }

This function cannot be called from interrupt context. If it is, a warning is printed out to the system log.
None of these functions can be called from interrupt context. If they are, a warning is printed out to the system log.

12.1.7. Enabling the PCI Device

In the probe function for the PCI driver, before the driver can access any device resource (I/O region or interrupt) of the PCI device, the driver must call the pci_enable_device function:
12.1.8. Accessing the Configuration Space

After the driver has detected the device, it usually needs to read from or write to the three address spaces: memory, port, and configuration. In particular, accessing the configuration space is vital to the driver, because it is the only way it can find out where the device is mapped in memory and in the I/O space.

Because the microprocessor has no way to access the configuration space directly, the computer vendor has to provide a way to do it. To access configuration space, the CPU must write and read registers in the PCI controller, but the exact implementation is vendor dependent and not relevant to this discussion, because Linux offers a standard interface to access the configuration space.

As far as the driver is concerned, the configuration space can be accessed through 8-bit, 16-bit, or 32-bit data transfers. The relevant functions are prototyped in <linux/pci.h>:
All of the previous functions are implemented as inline functions that really call the following functions. Feel free to use these functions instead of the above in case the driver does not have access to a struct pci_dev at any particular moment in time:
The best way to address the configuration variables using the pci_read_ functions is by means of the symbolic names defined in <linux/pci.h>. For example, the following small function retrieves the revision ID of a device by passing the symbolic name for where to pci_read_config_byte:

    static unsigned char skel_get_revision(struct pci_dev *dev)
    {
        u8 revision;

        pci_read_config_byte(dev, PCI_REVISION_ID, &revision);
        return revision;
    }

12.1.9. Accessing the I/O and Memory Spaces

A PCI device implements up to six I/O address regions. Each region consists of either memory or I/O locations. Most devices implement their I/O registers in memory regions, because it's generally a saner approach. However, unlike normal memory, I/O registers should not be cached by the CPU because each access can have side effects. The PCI device that implements I/O registers as a memory region marks the difference by setting a "memory-is-prefetchable" bit in its configuration register.[4] If the memory region is marked as prefetchable, the CPU can cache its contents and do all sorts of optimization with it; nonprefetchable memory access, on the other hand, can't be optimized because each access can have side effects, just as with I/O ports. Peripherals that map their control registers to a memory address range declare that range as nonprefetchable, whereas something like video memory on PCI boards is prefetchable. In this section, we use the word region to refer to a generic I/O address space that is memory-mapped or port-mapped.
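As a rough illustration of how the region type and the prefetchable bit are encoded, the following user-space sketch decodes the low bits of a raw base address register value. It assumes the standard encoding (bit 0 selects I/O space, bits 2:1 give the memory type, bit 3 is the prefetchable flag) and that a 64-bit region spans two consecutive registers, low half first. The names enum bar_kind, struct bar_info, bar_decode, and bar_base64 are hypothetical and exist only in this sketch; a driver would normally rely on the kernel's resource information rather than decode these registers itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of one base address register. */
enum bar_kind { BAR_IO, BAR_MEM32, BAR_MEM64, BAR_MEM_OTHER };

struct bar_info {
    enum bar_kind kind;
    int prefetchable;   /* meaningful for memory regions only */
    uint32_t base;      /* address bits held in this register */
};

/* Decode the flag bits stored in the low end of a base address register. */
struct bar_info bar_decode(uint32_t bar)
{
    struct bar_info info = { BAR_MEM_OTHER, 0, 0 };

    if (bar & 0x1) {                    /* bit 0 set: I/O port region */
        info.kind = BAR_IO;
        info.base = bar & ~0x3u;
        return info;
    }
    switch ((bar >> 1) & 0x3) {         /* bits 2:1: memory type */
    case 0: info.kind = BAR_MEM32; break;
    case 2: info.kind = BAR_MEM64; break;
    default: info.kind = BAR_MEM_OTHER; break;
    }
    info.prefetchable = (bar >> 3) & 0x1;   /* bit 3: prefetchable */
    info.base = bar & ~0xfu;
    return info;
}

/* Combine two consecutive registers of a 64-bit memory region. */
uint64_t bar_base64(uint32_t lo, uint32_t hi)
{
    assert((lo & 0x7) == 0x4);          /* lo must describe a 64-bit memory region */
    return ((uint64_t)hi << 32) | (lo & ~0xfu);
}
```

A register value such as 0xd000000c would decode here as a prefetchable 64-bit memory region, which is the kind of flag one might expect on video memory, while control registers would show the prefetchable bit clear.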
An interface board reports the size and current location of its regions using configuration registers: the six 32-bit registers shown in Figure 12-2, whose symbolic names are PCI_BASE_ADDRESS_0 through PCI_BASE_ADDRESS_5. Since the I/O space defined by PCI is a 32-bit address space, it makes sense to use the same configuration interface for memory and I/O. If the device uses a 64-bit address bus, it can declare regions in the 64-bit memory space by using two consecutive PCI_BASE_ADDRESS registers for each region, low bits first. It is possible for one device to offer both 32-bit regions and 64-bit regions.

In the kernel, the I/O regions of PCI devices have been integrated into the generic resource management. For this reason, you don't need to access the configuration variables in order to know where your device is mapped in memory or I/O space. The preferred interface for getting region information consists of the following functions:
Resource flags are used to define some features of the individual resource. For PCI resources associated with PCI I/O regions, the information is extracted from the base address registers, but can come from elsewhere for resources not associated with PCI devices. All resource flags are defined in <linux/ioport.h>; the most important are:
By making use of the pci_resource_ functions, a device driver can completely ignore the underlying PCI registers, since the system already used them to structure resource information.

12.1.10. PCI Interrupts

As far as interrupts are concerned, PCI is easy to handle. By the time Linux boots, the computer's firmware has already assigned a unique interrupt number to the device, and the driver just needs to use it. The interrupt number is stored in configuration register 60 (PCI_INTERRUPT_LINE), which is one byte wide. This allows for as many as 256 interrupt lines, but the actual limit depends on the CPU being used. The driver doesn't need to bother checking the interrupt number, because the value found in PCI_INTERRUPT_LINE is guaranteed to be the right one.

If the device doesn't support interrupts, register 61 (PCI_INTERRUPT_PIN) is 0; otherwise, it's nonzero. However, since the driver knows if its device is interrupt driven or not, it doesn't usually need to read PCI_INTERRUPT_PIN.

Thus, PCI-specific code for dealing with interrupts just needs to read the configuration byte to obtain the interrupt number that is saved in a local variable, as shown in the following code. Beyond that, the information in Chapter 10 applies.

    result = pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &myirq);
    if (result) {
        /* deal with error */
    }

The rest of this section provides additional information for the curious reader but isn't needed for writing drivers.

A PCI connector has four interrupt pins, and peripheral boards can use any or all of them. Each pin is individually routed to the motherboard's interrupt controller, so interrupts can be shared without any electrical problems. The interrupt controller is then responsible for mapping the interrupt wires (pins) to the processor's hardware; this platform-dependent operation is left to the controller in order to achieve platform independence in the bus itself.
The read-only configuration register located at PCI_INTERRUPT_PIN is used to tell the computer which single pin is actually used. It's worth remembering that each device board can host up to eight devices; each device uses a single interrupt pin and reports it in its own configuration register. Different devices on the same device board can use different interrupt pins or share the same one.

The PCI_INTERRUPT_LINE register, on the other hand, is read/write. When the computer is booted, the firmware scans its PCI devices and sets the register for each device according to how the interrupt pin is routed for its PCI slot. The value is assigned by the firmware, because only the firmware knows how the motherboard routes the different interrupt pins to the processor. For the device driver, however, the PCI_INTERRUPT_LINE register is read-only. Interestingly, recent versions of the Linux kernel under some circumstances can assign interrupt lines without resorting to the BIOS.

12.1.11. Hardware Abstractions

We complete the discussion of PCI by taking a quick look at how the system handles the plethora of PCI controllers available on the marketplace. This is just an informational section, meant to show the curious reader how the object-oriented layout of the kernel extends down to the lowest levels.

The mechanism used to implement hardware abstraction is the usual structure containing methods. It's a powerful technique that adds just the minimal overhead of dereferencing a pointer to the normal overhead of a function call. In the case of PCI management, the only hardware-dependent operations are the ones that read and write configuration registers, because everything else in the PCI world is accomplished by directly reading and writing the I/O and memory address spaces, and those are under direct control of the CPU.
Thus, the relevant structure for configuration register access includes only two fields:

    struct pci_ops {
        int (*read)(struct pci_bus *bus, unsigned int devfn, int where, int size,
                    u32 *val);
        int (*write)(struct pci_bus *bus, unsigned int devfn, int where, int size,
                     u32 val);
    };

The structure is defined in <linux/pci.h> and used by drivers/pci/pci.c, where the actual public functions are defined.

The two functions that act on the PCI configuration space have more overhead than dereferencing a pointer; they use cascading pointers due to the high object-orientedness of the code, but the overhead is not an issue in operations that are performed quite rarely and never in speed-critical paths. The actual implementation of pci_read_config_byte(dev, where, val), for instance, expands to:

    dev->bus->ops->read(bus, devfn, where, 1, val);

Here the size argument is the width of the access in bytes, so a byte read passes 1.

The various PCI buses in the system are detected at system boot, and that's when the struct pci_bus items are created and associated with their features, including the ops field.

Implementing hardware abstraction via "hardware operations" data structures is typical in the Linux kernel. One important example is the struct alpha_machine_vector data structure. It is defined in <asm-alpha/machvec.h> and takes care of everything that may change across different Alpha-based computers.
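The "structure containing methods" technique can be tried out in user space with a small model. The sketch below is not kernel code: struct cfg_ops, struct fake_bus, mem_read, mem_write, mem_ops, and bus_read_config_byte are hypothetical names, and the "hardware" is just an array in memory standing in for a platform-specific controller. It assumes that the size argument counts bytes and that registers are stored little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_SIZE   256  /* configuration bytes per function */
#define FAKE_SLOTS 256  /* one slot per possible devfn value */

struct fake_bus;

/* Hypothetical operations table, modeled on the two-method idea above. */
struct cfg_ops {
    int (*read)(struct fake_bus *bus, unsigned int devfn, int where,
                int size, uint32_t *val);
    int (*write)(struct fake_bus *bus, unsigned int devfn, int where,
                 int size, uint32_t val);
};

/* A bus carries a pointer to its methods plus the backend's private state. */
struct fake_bus {
    const struct cfg_ops *ops;
    uint8_t cfg[FAKE_SLOTS][CFG_SIZE];
};

static int args_ok(unsigned int devfn, int where, int size)
{
    return devfn < FAKE_SLOTS && where >= 0 && size >= 1 && size <= 4 &&
           where + size <= CFG_SIZE;
}

/* Memory-backed read: assemble size bytes, least significant byte first. */
int mem_read(struct fake_bus *bus, unsigned int devfn, int where,
             int size, uint32_t *val)
{
    uint32_t v = 0;
    int i;

    if (!args_ok(devfn, where, size))
        return -1;
    for (i = size - 1; i >= 0; i--)
        v = (v << 8) | bus->cfg[devfn][where + i];
    *val = v;
    return 0;
}

/* Memory-backed write: store size bytes, least significant byte first. */
int mem_write(struct fake_bus *bus, unsigned int devfn, int where,
              int size, uint32_t val)
{
    int i;

    if (!args_ok(devfn, where, size))
        return -1;
    for (i = 0; i < size; i++)
        bus->cfg[devfn][where + i] = (uint8_t)(val >> (8 * i));
    return 0;
}

const struct cfg_ops mem_ops = { .read = mem_read, .write = mem_write };

/* Public wrapper: callers never know which backend sits behind ops. */
int bus_read_config_byte(struct fake_bus *bus, unsigned int devfn,
                         int where, uint8_t *val)
{
    uint32_t data = 0;
    int ret;

    assert(bus != NULL && bus->ops != NULL && val != NULL);
    ret = bus->ops->read(bus, devfn, where, 1, &data);
    *val = (uint8_t)data;
    return ret;
}
```

Swapping in a different backend only requires another struct cfg_ops instance; the wrapper and its callers stay unchanged, which is the point of the design.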