Poster of Linux kernelThe best gift for a Linux geek
 Linux kernel map 
⇦ prev ⇱ home next ⇨

8.2. Lookaside Caches

A device driver often ends up allocating many objects of the same size, over and over. Given that the kernel already maintains a set of memory pools of objects that are all the same size, why not add some special pools for these high-volume objects? In fact, the kernel does implement a facility to create this sort of pool, which is often called a lookaside cache. Device drivers normally do not exhibit the sort of memory behavior that justifies using a lookaside cache, but there can be exceptions; the USB and SCSI drivers in Linux 2.6 use caches.

The cache manager in the Linux kernel is sometimes called the "slab allocator." For that reason, its functions and types are declared in <linux/slab.h>. The slab allocator implements caches that have a type of kmem_cache_t; they are created with a call to kmem_cache_create:

kmem_cache_t *kmem_cache_create(const char *name, size_t size,
                                size_t offset, 
                                unsigned long flags,
                                void (*constructor)(void *, kmem_cache_t *,
                                                    unsigned long flags),
                                void (*destructor)(void *, kmem_cache_t *,
                                                   unsigned long flags));

The function creates a new cache object that can host any number of memory areas all of the same size, specified by the size argument. The name argument is associated with this cache and functions as housekeeping information usable in tracking problems; usually, it is set to the name of the type of structure that is cached. The cache keeps a pointer to the name, rather than copying it, so the driver should pass in a pointer to a name in static storage (usually the name is just a literal string). The name cannot contain blanks.

The offset is the offset of the first object in the page; it can be used to ensure a particular alignment for the allocated objects, but you most likely will use 0 to request the default value. flags controls how allocation is done and is a bit mask of the following flags:

SLAB_NO_REAP

Setting this flag protects the cache from being reduced when the system is looking for memory. Setting this flag is normally a bad idea; it is important to avoid restricting the memory allocator's freedom of action unnecessarily.

SLAB_HWCACHE_ALIGN

This flag requires each data object to be aligned to a cache line; actual alignment depends on the cache layout of the host platform. This option can be a good choice if your cache contains items that are frequently accessed on SMP machines. The padding required to achieve cache line alignment can end up wasting significant amounts of memory, however.

SLAB_CACHE_DMA

This flag requires each data object to be allocated in the DMA memory zone.

There is also a set of flags that can be used during the debugging of cache allocations; see mm/slab.c for the details. Usually, however, these flags are set globally via a kernel configuration option on systems used for development.

The constructor and destructor arguments to the function are optional functions (but there can be no destructor without a constructor); the former can be used to initialize newly allocated objects, and the latter can be used to "clean up" objects prior to their memory being released back to the system as a whole.

Constructors and destructors can be useful, but there are a few constraints that you should keep in mind. A constructor is called when the memory for a set of objects is allocated; because that memory may hold several objects, the constructor may be called multiple times. You cannot assume that the constructor will be called as an immediate effect of allocating an object. Similarly, destructors can be called at some unknown future time, not immediately after an object has been freed. Constructors and destructors may or may not be allowed to sleep, according to whether they are passed the SLAB_CTOR_ATOMIC flag (where CTOR is short for constructor).

For convenience, a programmer can use the same function for both the constructor and destructor; the slab allocator always passes the SLAB_CTOR_CONSTRUCTOR flag when the callee is a constructor.

Once a cache of objects is created, you can allocate objects from it by calling kmem_cache_alloc:

void *kmem_cache_alloc(kmem_cache_t *cache, int flags);

Here, the cache argument is the cache you have created previously; the flags are the same as you would pass to kmalloc and are consulted if kmem_cache_alloc needs to go out and allocate more memory itself.

To free an object, use kmem_cache_free:

 void kmem_cache_free(kmem_cache_t *cache, const void *obj);

When driver code is finished with the cache, typically when the module is unloaded, it should free its cache as follows:

 int kmem_cache_destroy(kmem_cache_t *cache);

The destroy operation succeeds only if all objects allocated from the cache have been returned to it. Therefore, a module should check the return status from kmem_cache_destroy; a failure indicates some sort of memory leak within the module (since some of the objects have been dropped).

One side benefit to using lookaside caches is that the kernel maintains statistics on cache usage. These statistics may be obtained from /proc/slabinfo.

8.2.1. A scull Based on the Slab Caches: scullc

Time for an example. scullc is a cut-down version of the scull module that implements only the bare device—the persistent memory region. Unlike scull, which uses kmalloc, scullc uses memory caches. The size of the quantum can be modified at compile time and at load time, but not at runtime—that would require creating a new memory cache, and we didn't want to deal with these unneeded details.

scullc is a complete example that can be used to try out the slab allocator. It differs from scull only in a few lines of code. First, we must declare our own slab cache:

/* declare one cache pointer: use it for all devices */
kmem_cache_t *scullc_cache;

The creation of the slab cache is handled (at module load time) in this way:

/* scullc_init: create a cache for our quanta */
scullc_cache = kmem_cache_create("scullc", scullc_quantum,
        0, SLAB_HWCACHE_ALIGN, NULL, NULL); /* no ctor/dtor */
if (!scullc_cache) {
    scullc_cleanup(  );
    return -ENOMEM;
}

This is how it allocates memory quanta:

/* Allocate a quantum using the memory cache */
if (!dptr->data[s_pos]) {
    dptr->data[s_pos] = kmem_cache_alloc(scullc_cache, GFP_KERNEL);
    if (!dptr->data[s_pos])
        goto nomem;
    memset(dptr->data[s_pos], 0, scullc_quantum);
}

And these lines release memory:

for (i = 0; i < qset; i++)
if (dptr->data[i])
        kmem_cache_free(scullc_cache, dptr->data[i]);

Finally, at module unload time, we have to return the cache to the system:

/* scullc_cleanup: release the cache of our quanta */
if (scullc_cache)
    kmem_cache_destroy(scullc_cache);

The main differences in passing from scull to scullc are a slight speed improvement and better memory use. Since quanta are allocated from a pool of memory fragments of exactly the right size, their placement in memory is as dense as possible, as opposed to scull quanta, which bring in an unpredictable memory fragmentation.

8.2.2. Memory Pools

There are places in the kernel where memory allocations cannot be allowed to fail. As a way of guaranteeing allocations in those situations, the kernel developers created an abstraction known as a memory pool (or "mempool"). A memory pool is really just a form of a lookaside cache that tries to always keep a list of free memory around for use in emergencies.

A memory pool has a type of mempool_t (defined in <linux/mempool.h>); you can create one with mempool_create:

mempool_t *mempool_create(int min_nr, 
                          mempool_alloc_t *alloc_fn,
                          mempool_free_t *free_fn, 
                          void *pool_data);

The min_nr argument is the minimum number of allocated objects that the pool should always keep around. The actual allocation and freeing of objects is handled by alloc_fn and free_fn, which have these prototypes:

typedef void *(mempool_alloc_t)(int gfp_mask, void *pool_data);
typedef void (mempool_free_t)(void *element, void *pool_data);

The final parameter to mempool_create (pool_data) is passed to alloc_fn and free_fn.

If need be, you can write special-purpose functions to handle memory allocations for mempools. Usually, however, you just want to let the kernel slab allocator handle that task for you. There are two functions (mempool_alloc_slab and mempool_free_slab) that perform the impedance matching between the memory pool allocation prototypes and kmem_cache_alloc and kmem_cache_free. Thus, code that sets up memory pools often looks like the following:

cache = kmem_cache_create(. . .);
pool = mempool_create(MY_POOL_MINIMUM,
                      mempool_alloc_slab, mempool_free_slab,
                      cache);

Once the pool has been created, objects can be allocated and freed with:

void *mempool_alloc(mempool_t *pool, int gfp_mask);
void mempool_free(void *element, mempool_t *pool);

When the mempool is created, the allocation function will be called enough times to create a pool of preallocated objects. Thereafter, calls to mempool_alloc attempt to acquire additional objects from the allocation function; should that allocation fail, one of the preallocated objects (if any remain) is returned. When an object is freed with mempool_free, it is kept in the pool if the number of preallocated objects is currently below the minimum; otherwise, it is to be returned to the system.

A mempool can be resized with:

int mempool_resize(mempool_t *pool, int new_min_nr, int gfp_mask);

This call, if successful, resizes the pool to have at least new_min_nr objects.

If you no longer need a memory pool, return it to the system with:

void mempool_destroy(mempool_t *pool);

You must return all allocated objects before destroying the mempool, or a kernel oops results.

If you are considering using a mempool in your driver, please keep one thing in mind: mempools allocate a chunk of memory that sits in a list, idle and unavailable for any real use. It is easy to consume a great deal of memory with mempools. In almost every case, the preferred alternative is to do without the mempool and simply deal with the possibility of allocation failures instead. If there is any way for your driver to respond to an allocation failure in a way that does not endanger the integrity of the system, do things that way. Use of mempools in driver code should be rare.

    ⇦ prev ⇱ home next ⇨
    Poster of Linux kernelThe best gift for a Linux geek