17.5. Packet Transmission

The most important tasks performed by network interfaces are data transmission and reception. We start with transmission because it is slightly easier to understand.

Transmission refers to the act of sending a packet over a network link. Whenever the kernel needs to transmit a data packet, it calls the driver's hard_start_xmit method to put the data on an outgoing queue. Each packet handled by the kernel is contained in a socket buffer structure (struct sk_buff), whose definition is found in <linux/skbuff.h>. The structure gets its name from the Unix abstraction used to represent a network connection, the socket. Even if the interface has nothing to do with sockets, each network packet belongs to a socket in the higher network layers, and the input/output buffers of any socket are lists of struct sk_buff structures. The same sk_buff structure is used to host network data throughout all the Linux network subsystems, but a socket buffer is just a packet as far as the interface is concerned.

A pointer to sk_buff is usually called skb, and we follow this practice both in the sample code and in the text.

The socket buffer is a complex structure, and the kernel offers a number of functions to act on it. The functions are described later in Section 17.10; for now, a few basic facts about sk_buff are enough for us to write a working driver.

The socket buffer passed to hard_start_xmit contains the physical packet as it should appear on the media, complete with the transmission-level headers. The interface doesn't need to modify the data being transmitted. skb->data points to the packet being transmitted, and skb->len is its length in octets. This situation gets a little more complicated if your driver can handle scatter/gather I/O; we get to that in Section 17.5.3.

The snull packet transmission code follows; the physical transmission machinery has been isolated in another function, because every interface driver must implement it according to the specific hardware being driven:

int snull_tx(struct sk_buff *skb, struct net_device *dev)
{
    int len;
    char *data, shortpkt[ETH_ZLEN];
    struct snull_priv *priv = netdev_priv(dev);
    
    data = skb->data;
    len = skb->len;
    if (len < ETH_ZLEN) {
        memset(shortpkt, 0, ETH_ZLEN);
        memcpy(shortpkt, skb->data, skb->len);
        len = ETH_ZLEN;
        data = shortpkt;
    }
    dev->trans_start = jiffies; /* save the timestamp */

    /* Remember the skb, so we can free it at interrupt time */
    priv->skb = skb;

    /* actual delivery of data is device-specific, and not shown here */
    snull_hw_tx(data, len, dev);

    return 0; /* Our simple device can not fail */
}

The transmission function, thus, just performs some sanity checks on the packet and transmits the data through the hardware-related function. Do note, however, the care that is taken when the packet to be transmitted is shorter than the minimum length supported by the underlying media (which, for snull, is our virtual "Ethernet"). Many Linux network drivers (and those for other operating systems as well) have been found to leak data in such situations. Rather than create that sort of security vulnerability, we copy short packets into a separate array that we can explicitly zero-pad out to the full length required by the media. (We can safely put that data on the stack, since the minimum length, 60 bytes, is quite small.)
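The padding step can be tried outside the kernel. The following is a minimal userspace sketch, not the snull code itself: pad_short_frame is a hypothetical helper name, and ETH_ZLEN is defined locally with the 60-byte value mentioned above rather than taken from a kernel header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_ZLEN 60  /* minimum frame length, defined locally for this sketch */

/*
 * Hypothetical helper: if the frame is shorter than ETH_ZLEN, copy it into
 * the caller-supplied pad buffer and zero-fill the tail, so that no stale
 * memory is sent on the wire. Sets *out to the buffer to transmit and
 * returns the length to hand to the hardware.
 */
static size_t pad_short_frame(const unsigned char *data, size_t len,
                              unsigned char pad[ETH_ZLEN],
                              const unsigned char **out)
{
    assert(data != NULL && pad != NULL && out != NULL);

    if (len >= ETH_ZLEN) {
        *out = data;            /* long enough: transmit in place */
        return len;
    }
    memset(pad, 0, ETH_ZLEN);   /* zero first, so the tail is all zeros */
    memcpy(pad, data, len);     /* then copy the real payload */
    *out = pad;
    return ETH_ZLEN;
}
```

Zeroing the whole buffer before the copy keeps the logic simple; only the bytes past len actually need to be cleared.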

The return value from hard_start_xmit should be 0 on success; at that point, your driver has taken responsibility for the packet, should make its best effort to ensure that transmission succeeds, and must free the skb at the end. A nonzero return value indicates that the packet could not be transmitted at this time; the kernel will retry later. In this situation, your driver should stop the queue until whatever situation caused the failure has been resolved.

The "hardware-related" transmission function (snull_hw_tx) is omitted here since it is entirely occupied with implementing the trickery of the snull device, including manipulating the source and destination addresses, and has little of interest to authors of real network drivers. It is present, of course, in the sample source for those who want to go in and see how it works.

17.5.1. Controlling Transmission Concurrency

The hard_start_xmit function is protected from concurrent calls by a spinlock (xmit_lock) in the net_device structure. As soon as the function returns, however, it may be called again. The function returns when the software is done instructing the hardware about packet transmission, but hardware transmission will likely not have been completed. This is not an issue with snull, which does all of its work using the CPU, so packet transmission is complete before the transmission function returns.

Real hardware interfaces, on the other hand, transmit packets asynchronously and have a limited amount of memory available to store outgoing packets. When that memory is exhausted (which, for some hardware, happens with a single outstanding packet to transmit), the driver needs to tell the networking system not to start any more transmissions until the hardware is ready to accept new data.

This notification is accomplished by calling netif_stop_queue, the function introduced earlier to stop the queue. Once your driver has stopped its queue, it must arrange to restart the queue at some point in the future, when it is again able to accept packets for transmission. To do so, it should call:

void netif_wake_queue(struct net_device *dev);

This function is just like netif_start_queue, except that it also pokes the networking system to make it start transmitting packets again.

Most modern network hardware maintains an internal queue with multiple packets to transmit; in this way it can get the best performance from the network. Network drivers for these devices must support having multiple transmissions outstanding at any given time, but device memory can fill up whether or not the hardware supports multiple outstanding transmissions. Whenever device memory fills to the point that there is no room for the largest possible packet, the driver should stop the queue until space becomes available again.

If you must disable packet transmission from anywhere other than your hard_start_xmit function (in response to a reconfiguration request, perhaps), the function you want to use is:

void netif_tx_disable(struct net_device *dev);

This function behaves much like netif_stop_queue, but it also ensures that, when it returns, your hard_start_xmit method is not running on another CPU. The queue can be restarted with netif_wake_queue, as usual.

17.5.2. Transmission Timeouts

Most drivers that deal with real hardware have to be prepared for that hardware to fail to respond occasionally. Interfaces can forget what they are doing, or the system can lose an interrupt. This sort of problem is common with some devices designed to run on personal computers.

Many drivers handle this problem by setting timers; if the operation has not completed by the time the timer expires, something is wrong. The network system, as it happens, is essentially a complicated assembly of state machines controlled by a mass of timers. As such, the networking code is in a good position to detect transmission timeouts as part of its regular operation.

Thus, network drivers need not worry about detecting such problems themselves. Instead, they need only set a timeout period, which goes in the watchdog_timeo field of the net_device structure. This period, which is in jiffies, should be long enough to account for normal transmission delays (such as collisions caused by congestion on the network media).

If the current system time exceeds the device's trans_start time by at least the timeout period, the networking layer eventually calls the driver's tx_timeout method. That method's job is to do whatever is needed to clear up the problem and to ensure the proper completion of any transmissions that were already in progress. It is important, in particular, that the driver not lose track of any socket buffers that have been entrusted to it by the networking code.
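The comparison described above is easy to get wrong when the tick counter wraps around. The following userspace sketch shows one wraparound-safe way to express it; tx_timed_out is a hypothetical name, and this is an illustration of the arithmetic, not the networking layer's actual watchdog code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Returns true when "now" exceeds trans_start by at least watchdog_timeo
 * ticks. Unsigned subtraction yields the elapsed tick count even if the
 * counter wrapped between trans_start and now.
 */
static bool tx_timed_out(unsigned long now, unsigned long trans_start,
                         unsigned long watchdog_timeo)
{
    unsigned long elapsed = now - trans_start;

    return elapsed >= watchdog_timeo;
}

/* A few sanity checks, including one where the counter wraps past zero. */
static void tx_timed_out_selfcheck(void)
{
    assert(!tx_timed_out(100, 98, 5));           /* 2 ticks elapsed */
    assert(tx_timed_out(105, 100, 5));           /* exactly 5 ticks */
    assert(tx_timed_out(3, ULONG_MAX - 1, 5));   /* 5 ticks across the wrap */
}
```

In kernel code, comparisons against jiffies are normally written with the time_after family of macros, which handle wraparound in a similar spirit.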

snull has the ability to simulate transmitter lockups, which is controlled by two load-time parameters:

static int lockup = 0;
module_param(lockup, int, 0);

static int timeout = SNULL_TIMEOUT;
module_param(timeout, int, 0);

If the driver is loaded with the parameter lockup=n, a lockup is simulated once every n packets transmitted, and the watchdog_timeo field is set to the given timeout value. When simulating lockups, snull also calls netif_stop_queue to prevent other transmission attempts from occurring.
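The "once every n packets" rule can be sketched as a simple modulus test. This is a userspace illustration under stated assumptions: model_should_lockup is a hypothetical name, and counting from the number of packets already transmitted is one plausible way to do it, not necessarily how the sample source decides.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether the packet about to be sent should
 * trigger a simulated lockup. tx_packets is the number of packets already
 * transmitted; lockup is the load-time parameter (0 disables lockups).
 */
static bool model_should_lockup(unsigned long tx_packets, int lockup)
{
    if (lockup <= 0)
        return false;               /* feature disabled */
    return (tx_packets + 1) % (unsigned long)lockup == 0;
}
```

With lockup=3, for example, this model locks up on the third, sixth, and ninth packets.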

The snull transmission timeout handler looks like this:

void snull_tx_timeout (struct net_device *dev)
{
    struct snull_priv *priv = netdev_priv(dev);

    PDEBUG("Transmit timeout at %ld, latency %ld\n", jiffies,
            jiffies - dev->trans_start);
    /* Simulate a transmission interrupt to get things moving */
    priv->status = SNULL_TX_INTR;
    snull_interrupt(0, dev, NULL);
    priv->stats.tx_errors++;
    netif_wake_queue(dev);
    return;
}

When a transmission timeout happens, the driver must mark the error in the interface statistics and arrange for the device to be reset to a sane state so that new packets can be transmitted. When a timeout happens in snull, the driver calls snull_interrupt to fill in the "missing" interrupt and restarts the transmit queue with netif_wake_queue.

17.5.3. Scatter/Gather I/O

The process of creating a packet for transmission on the network involves assembling multiple pieces. Packet data must often be copied in from user space, and the headers used by various levels of the network stack must be added as well. This assembly can require a fair amount of data copying. If, however, the network interface that is destined to transmit the packet can perform scatter/gather I/O, the packet need not be assembled into a single chunk, and much of that copying can be avoided. Scatter/gather I/O also enables "zero-copy" transmission of network data directly from user-space buffers.

The kernel does not pass scattered packets to your hard_start_xmit method unless the NETIF_F_SG bit has been set in the features field of your device structure. If you have set that flag, you need to look at a special "shared info" field within the skb to see whether the packet is made up of a single fragment or many and to find the scattered fragments if need be. A special macro exists to access this information; it is called skb_shinfo. The first step when transmitting potentially fragmented packets usually looks something like this:

if (skb_shinfo(skb)->nr_frags == 0) {
    /* Just use skb->data and skb->len as usual */
}

The nr_frags field tells how many fragments have been used to build the packet. If it is 0, the packet exists in a single piece and can be accessed via the data field as usual. If, however, it is nonzero, your driver must pass through and arrange to transfer each individual fragment. The data field of the skb structure points conveniently to the first fragment (as compared to the full packet, as in the unfragmented case). The length of the fragment must be calculated by subtracting skb->data_len from skb->len (which still contains the length of the full packet). The remaining fragments are to be found in an array called frags in the shared information structure; each entry in frags is an skb_frag_struct structure:

struct skb_frag_struct {
    struct page *page;
    __u16 page_offset;
    __u16 size;
};

As you can see, we are once again dealing with page structures, rather than kernel virtual addresses. Your driver should loop through the fragments, mapping each for a DMA transfer and not forgetting the first fragment, which is pointed to by the skb directly. Your hardware, of course, must assemble the fragments and transmit them as a single packet. Note that, if you have set the NETIF_F_HIGHDMA feature flag, some or all of the fragments may be located in high memory.
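The length bookkeeping for a fragmented packet can be checked with a small userspace model. The types below (struct model_skb, struct model_frag) and the MODEL_MAX_FRAGS limit are hypothetical stand-ins for the kernel structures; a real driver would map each piece for DMA where this sketch merely adds up sizes.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_FRAGS 4   /* hypothetical limit for this sketch */

struct model_frag {
    unsigned int page_offset;
    unsigned int size;
};

struct model_skb {
    unsigned int len;        /* length of the full packet */
    unsigned int data_len;   /* bytes held in the fragment array */
    unsigned int nr_frags;   /* number of entries used in frags[] */
    struct model_frag frags[MODEL_MAX_FRAGS];
};

/* Length of the first piece, the one the data pointer refers to. */
static unsigned int model_head_len(const struct model_skb *skb)
{
    return skb->len - skb->data_len;
}

/*
 * Walk the head and every fragment, as a driver would when setting up
 * DMA mappings. Returns the total bytes seen and stores the number of
 * separate pieces in *segs.
 */
static unsigned int model_total_mapped(const struct model_skb *skb,
                                       unsigned int *segs)
{
    unsigned int total = model_head_len(skb);
    unsigned int i;

    assert(segs != NULL && skb->nr_frags <= MODEL_MAX_FRAGS);

    *segs = 1;                          /* the head counts as one piece */
    for (i = 0; i < skb->nr_frags; i++) {
        total += skb->frags[i].size;
        (*segs)++;
    }
    return total;
}
```

If the bookkeeping is consistent, the total returned equals the packet's len field, which makes a convenient sanity check in a driver under development.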
