Poster of Linux kernelThe best gift for a Linux geek
 Linux kernel map 
⇦ prev ⇱ home next ⇨

7.6. Workqueues

Workqueues are, superficially, similar to tasklets; they allow kernel code to request that a function be called at some future time. There are, however, some significant differences between the two, including:

  • Tasklets run in software interrupt context with the result that all tasklet code must be atomic. Instead, workqueue functions run in the context of a special kernel process; as a result, they have more flexibility. In particular, workqueue functions can sleep.

  • Tasklets always run on the processor from which they were originally submitted. Workqueues work in the same way, by default.

  • Kernel code can request that the execution of workqueue functions be delayed for an explicit interval.

The key difference between the two is that tasklets execute quickly, for a short period of time, and in atomic mode, while workqueue functions may have higher latency but need not be atomic. Each mechanism has situations where it is appropriate.

Workqueues have a type of struct workqueue_struct, which is defined in <linux/workqueue.h>. A workqueue must be explicitly created before use, using one of the following two functions:

struct workqueue_struct *create_workqueue(const char *name);
struct workqueue_struct *create_singlethread_workqueue(const char *name);

Each workqueue has one or more dedicated processes ("kernel threads"), which run functions submitted to the queue. If you use create_workqueue, you get a workqueue that has a dedicated thread for each processor on the system. In many cases, all those threads are simply overkill; if a single worker thread will suffice, create the workqueue with create_singlethread_workqueue instead.

To submit a task to a workqueue, you need to fill in a work_struct structure. This can be done at compile time as follows:

DECLARE_WORK(name, void (*function)(void *), void *data);

Where name is the name of the structure to be declared, function is the function that is to be called from the workqueue, and data is a value to pass to that function. If you need to set up the work_struct structure at runtime, use the following two macros:

INIT_WORK(struct work_struct *work, void (*function)(void *), void *data);
PREPARE_WORK(struct work_struct *work, void (*function)(void *), void *data);

INIT_WORK does a more thorough job of initializing the structure; you should use it the first time that structure is set up. PREPARE_WORK does almost the same job, but it does not initialize the pointers used to link the work_struct structure into the workqueue. If there is any possibility that the structure may currently be submitted to a workqueue, and you need to change that structure, use PREPARE_WORK rather than INIT_WORK.

There are two functions for submitting work to a workqueue:

int queue_work(struct workqueue_struct *queue, struct work_struct *work);
int queue_delayed_work(struct workqueue_struct *queue, 
                       struct work_struct *work, unsigned long delay);

Either one adds work to the given queue. If queue_delayed_work is used, however, the actual work is not performed until at least delay jiffies have passed. The return value from these functions is 0 if the work was successfully added to the queue; a nonzero result means that this work_struct structure was already waiting in the queue, and was not added a second time.

At some time in the future, the work function will be called with the given data value. The function will be running in the context of the worker thread, so it can sleep if need be—although you should be aware of how that sleep might affect any other tasks submitted to the same workqueue. What the function cannot do, however, is access user space. Since it is running inside a kernel thread, there simply is no user space to access.

Should you need to cancel a pending workqueue entry, you may call:

int cancel_delayed_work(struct work_struct *work);

The return value is nonzero if the entry was canceled before it began execution. The kernel guarantees that execution of the given entry will not be initiated after a call to cancel_delayed_work. If cancel_delayed_work returns 0, however, the entry may have already been running on a different processor, and might still be running after a call to cancel_delayed_work. To be absolutely sure that the work function is not running anywhere in the system after cancel_delayed_work returns 0, you must follow that call with a call to:

void flush_workqueue(struct workqueue_struct *queue);

After flush_workqueue returns, no work function submitted prior to the call is running anywhere in the system.

When you are done with a workqueue, you can get rid of it with:

void destroy_workqueue(struct workqueue_struct *queue);

7.6.1. The Shared Queue

A device driver, in many cases, does not need its own workqueue. If you only submit tasks to the queue occasionally, it may be more efficient to simply use the shared, default workqueue that is provided by the kernel. If you use this queue, however, you must be aware that you will be sharing it with others. Among other things, that means that you should not monopolize the queue for long periods of time (no long sleeps), and it may take longer for your tasks to get their turn in the processor.

The jiq ("just in queue") module exports two files that demonstrate the use of the shared workqueue. They use a single work_struct structure, which is set up this way:

static struct work_struct jiq_work;

    /* this line is in jiq_init(  ) */
    INIT_WORK(&jiq_work, jiq_print_wq, &jiq_data);

When a process reads /proc/jiqwq, the module initiates a series of trips through the shared workqueue with no delay. The function it uses is:

int schedule_work(struct work_struct *work);

Note that a different function is used when working with the shared queue; it requires only the work_struct structure for an argument. The actual code in jiq looks like this:

prepare_to_wait(&jiq_wait, &wait, TASK_INTERRUPTIBLE);
schedule_work(&jiq_work);
schedule(  );
finish_wait(&jiq_wait, &wait);

The actual work function prints out a line just like the jit module does, then, if need be, resubmits the work_struct structure into the workqueue. Here is jiq_print_wq in its entirety:

static void jiq_print_wq(void *ptr)
{
    struct clientdata *data = (struct clientdata *) ptr;
    
    if (! jiq_print (ptr))
        return;
    
    if (data->delay)
        schedule_delayed_work(&jiq_work, data->delay);
    else
        schedule_work(&jiq_work);
}

If the user is reading the delayed device (/proc/jiqwqdelay), the work function resubmits itself in the delayed mode with schedule_delayed_work:

int schedule_delayed_work(struct work_struct *work, unsigned long delay);

If you look at the output from these two devices, it looks something like:

% cat /proc/jiqwq
    time  delta preempt   pid cpu command
  1113043     0       0     7   1 events/1
  1113043     0       0     7   1 events/1
  1113043     0       0     7   1 events/1
  1113043     0       0     7   1 events/1
  1113043     0       0     7   1 events/1
% cat /proc/jiqwqdelay
    time  delta preempt   pid cpu command
  1122066     1       0     6   0 events/0
  1122067     1       0     6   0 events/0
  1122068     1       0     6   0 events/0
  1122069     1       0     6   0 events/0
  1122070     1       0     6   0 events/0

When /proc/jiqwq is read, there is no obvious delay between the printing of each line. When, instead, /proc/jiqwqdelay is read, there is a delay of exactly one jiffy between each line. In either case, we see the same process name printed; it is the name of the kernel thread that implements the shared workqueue. The CPU number is printed after the slash; we never know which CPU will be running when the /proc file is read, but the work function will always run on the same processor thereafter.

If you need to cancel a work entry submitted to the shared queue, you may use cancel_delayed_work, as described above. Flushing the shared workqueue requires a separate function, however:

void flush_scheduled_work(void);

Since you do not know who else might be using this queue, you never really know how long it might take for flush_scheduled_work to return.

    ⇦ prev ⇱ home next ⇨
    Poster of Linux kernelThe best gift for a Linux geek