8.2. Lookaside Caches
A device driver often ends up allocating
many objects of the same size, over and over. Given that the kernel
already maintains a set of memory pools of objects that are all the
same size, why not add some special pools for these high-volume
objects? In fact, the kernel does implement a facility to create this
sort of pool, which is often called a lookaside
cache. Device drivers normally do not exhibit the sort of
memory behavior that justifies using a lookaside cache, but there can
be exceptions; the USB and SCSI drivers in Linux 2.6 use caches.
The
cache manager in the Linux kernel is sometimes called the
"slab allocator." For that reason,
its functions and types are declared in
<linux/slab.h>. The slab allocator
implements caches that have a type of
kmem_cache_t; they are created with a call to
kmem_cache_create:
kmem_cache_t *kmem_cache_create(const char *name, size_t size,
                                size_t offset,
                                unsigned long flags,
                                void (*constructor)(void *, kmem_cache_t *,
                                                    unsigned long flags),
                                void (*destructor)(void *, kmem_cache_t *,
                                                   unsigned long flags));
The function creates a new cache object that can host any number of
memory areas all of the same size, specified by the
size argument. The name
argument is associated with this cache and functions as housekeeping
information usable in tracking problems; usually, it is set to the
name of the type of structure that is cached. The cache keeps a
pointer to the name, rather than copying it, so the driver should
pass in a pointer to a name in static storage (usually the name is
just a literal string). The name cannot contain blanks.
The offset is the offset of the first object in
the page; it can be used to ensure a particular alignment for the
allocated objects, but you most likely will use 0
to request the default value. flags controls how
allocation is done and is a bit mask of the following flags:
- SLAB_NO_REAP
  Setting this flag protects the cache from being reduced when the system is
  looking for memory. Setting this flag is normally a bad idea; it is
  important to avoid restricting the memory allocator's freedom of action
  unnecessarily.
- SLAB_HWCACHE_ALIGN
  This flag requires each data object to be aligned to a cache line; the
  actual alignment depends on the cache layout of the host platform. This
  option can be a good choice if your cache contains items that are
  frequently accessed on SMP machines. The padding required to achieve
  cache-line alignment can end up wasting significant amounts of memory,
  however.
- SLAB_CACHE_DMA
  This flag requires each data object to be allocated in the DMA memory
  zone.
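As an illustration of how these flags combine, a driver that keeps a pool of small DMA-capable buffers might create its cache as in this sketch; mydrv_init, the cache name, and the buffer size are all invented for illustration:

```c
#define MYDRV_BUF_SIZE 512  /* hypothetical object size */

static kmem_cache_t *mydrv_dma_cache;

static int mydrv_init(void)
{
    /* Cache-line aligned objects, allocated from the DMA zone. */
    mydrv_dma_cache = kmem_cache_create("mydrv_dma", MYDRV_BUF_SIZE,
            0, SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA, NULL, NULL);
    if (!mydrv_dma_cache)
        return -ENOMEM;
    return 0;
}
```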
There is also a set of flags that can be used during the debugging of
cache allocations; see mm/slab.c for the
details. Usually, however, these flags are set globally via a kernel
configuration option on systems used for development.
The
constructor and destructor
arguments to the function are optional functions (but there can be no
destructor without a constructor); the former can be used to
initialize newly allocated objects, and the latter can be used to
"clean up" objects prior to their
memory being released back to the system as a whole.
Constructors and destructors can be useful,
but there are a few constraints that you should keep in mind. A
constructor is called when the memory for a set of objects is
allocated; because that memory may hold several objects, the
constructor may be called multiple times. You cannot assume that the
constructor will be called as an immediate effect of allocating an
object. Similarly, destructors can be called at some unknown future
time, not immediately after an object has been freed. Constructors
and destructors may or may not be allowed to sleep, according to
whether they are passed the SLAB_CTOR_ATOMIC flag
(where CTOR is short for
constructor).
For
convenience, a programmer can use the same function for both the
constructor and destructor; the slab allocator always passes the
SLAB_CTOR_CONSTRUCTOR flag when the callee is a
constructor.
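A constructor observing these constraints might look like the following sketch; struct mydrv_item and the function name are invented for illustration:

```c
/* Hypothetical object type cached by a driver. */
struct mydrv_item {
    spinlock_t lock;
    int in_use;
};

/* The slab allocator passes SLAB_CTOR_CONSTRUCTOR when calling this
 * function as a constructor; SLAB_CTOR_ATOMIC means sleeping is not
 * allowed during this call. */
static void mydrv_ctor(void *object, kmem_cache_t *cache,
                       unsigned long flags)
{
    struct mydrv_item *item = object;

    if (!(flags & SLAB_CTOR_CONSTRUCTOR))
        return; /* called as a destructor; nothing to clean up here */

    memset(item, 0, sizeof(*item));
    spin_lock_init(&item->lock); /* does not sleep, so it is safe even
                                    under SLAB_CTOR_ATOMIC */
}
```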
Once
a cache of objects is created, you can allocate objects from it by
calling kmem_cache_alloc:
void *kmem_cache_alloc(kmem_cache_t *cache, int flags);
Here, the
cache
argument is the cache you have created previously; the flags are the
same as you would pass to kmalloc and are
consulted if kmem_cache_alloc needs to go out
and allocate more memory itself.
To free an object, use kmem_cache_free:
void kmem_cache_free(kmem_cache_t *cache, const void *obj);
When driver code is finished with the cache, typically when the
module is unloaded, it should free its cache as follows:
int kmem_cache_destroy(kmem_cache_t *cache);
The destroy operation succeeds only if all objects allocated from the
cache have been returned to it. Therefore, a module should check the
return status from kmem_cache_destroy; a failure
indicates some sort of memory leak within the module (since some of
the objects have been dropped).
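Accordingly, cleanup code can check the return value and complain when objects have leaked; this fragment is a sketch, with mydrv_cache standing in for your driver's cache pointer:

```c
/* Sketch: destroy the cache at unload time, warning about leaks. */
if (mydrv_cache && kmem_cache_destroy(mydrv_cache))
    printk(KERN_WARNING
           "mydrv: kmem_cache_destroy failed; objects were leaked\n");
```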
One
side benefit to using lookaside caches is that the kernel maintains
statistics on cache usage. These statistics may be obtained from
/proc/slabinfo.
8.2.1. A scull Based on the Slab Caches: scullc
Time for an example.
scullc is a cut-down version of the
scull module that implements only the bare
device—the persistent memory region. Unlike
scull, which uses kmalloc,
scullc uses memory caches. The size of the
quantum can be modified at compile time and at load time, but not at
runtime—that would require creating a new memory cache, and we
didn't want to deal with these unneeded details.
scullc is a complete example that can be used to
try out the slab allocator. It differs from
scull only in a few lines of code. First, we
must declare our own slab cache:
/* declare one cache pointer: use it for all devices */
kmem_cache_t *scullc_cache;
The creation of the slab cache is handled (at module load time) in
this way:
/* scullc_init: create a cache for our quanta */
scullc_cache = kmem_cache_create("scullc", scullc_quantum,
        0, SLAB_HWCACHE_ALIGN, NULL, NULL); /* no ctor/dtor */
if (!scullc_cache) {
    scullc_cleanup();
    return -ENOMEM;
}
This is how it allocates memory quanta:
/* Allocate a quantum using the memory cache */
if (!dptr->data[s_pos]) {
    dptr->data[s_pos] = kmem_cache_alloc(scullc_cache, GFP_KERNEL);
    if (!dptr->data[s_pos])
        goto nomem;
    memset(dptr->data[s_pos], 0, scullc_quantum);
}
And these lines release memory:
for (i = 0; i < qset; i++)
    if (dptr->data[i])
        kmem_cache_free(scullc_cache, dptr->data[i]);
Finally, at module unload time, we have to return the cache to the
system:
/* scullc_cleanup: release the cache of our quanta */
if (scullc_cache)
    kmem_cache_destroy(scullc_cache);
The main differences in passing from scull to
scullc are a slight speed improvement and better
memory use. Since quanta are allocated from a pool of memory
fragments of exactly the right size, their placement in memory is as
dense as possible, as opposed to scull quanta,
which can cause unpredictable memory fragmentation.
8.2.2. Memory Pools
There are places in
the
kernel where memory allocations cannot be allowed to fail. As a way
of guaranteeing allocations in those situations, the kernel
developers created an abstraction known as a memory
pool (or "mempool"). A
memory pool is really just a form of a lookaside cache that tries to
always keep a list of free memory around for use in emergencies.
A memory pool has a type of mempool_t (defined in
<linux/mempool.h>); you can create one
with mempool_create:
mempool_t *mempool_create(int min_nr,
                          mempool_alloc_t *alloc_fn,
                          mempool_free_t *free_fn,
                          void *pool_data);
The min_nr argument is the minimum number of
allocated objects that the pool should always keep around. The actual
allocation and freeing of objects is handled by
alloc_fn and free_fn, which
have these prototypes:
typedef void *(mempool_alloc_t)(int gfp_mask, void *pool_data);
typedef void (mempool_free_t)(void *element, void *pool_data);
The final parameter to mempool_create
(pool_data) is passed to
alloc_fn and free_fn.
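For instance, a pool that draws its objects straight from kmalloc could use a pair of trivial functions like these; the names are invented for illustration, and pool_data is pressed into service to carry the object size:

```c
/* Hypothetical mempool allocation functions built on kmalloc/kfree. */
static void *mydrv_mempool_alloc(int gfp_mask, void *pool_data)
{
    return kmalloc((size_t) pool_data, gfp_mask);
}

static void mydrv_mempool_free(void *element, void *pool_data)
{
    kfree(element);
}

/* Keep at least four 256-byte objects in reserve. */
pool = mempool_create(4, mydrv_mempool_alloc, mydrv_mempool_free,
                      (void *) 256);
```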
If need be, you can write special-purpose functions to handle memory
allocations for mempools. Usually, however, you just want to let the
kernel slab allocator handle that task for you. There are two
functions (mempool_alloc_slab and
mempool_free_slab) that perform the impedance
matching between the memory pool allocation prototypes and
kmem_cache_alloc and
kmem_cache_free. Thus, code that sets up memory
pools often looks like the following:
cache = kmem_cache_create(. . .);
pool = mempool_create(MY_POOL_MINIMUM,
                      mempool_alloc_slab, mempool_free_slab,
                      cache);
Once the pool has been created, objects can be allocated and freed
with:
void *mempool_alloc(mempool_t *pool, int gfp_mask);
void mempool_free(void *element, mempool_t *pool);
When the mempool is created, the allocation function will be called
enough times to create a pool of preallocated objects. Thereafter,
calls to mempool_alloc attempt to acquire
additional objects from the allocation function; should that
allocation fail, one of the preallocated objects (if any remain) is
returned. When an object is freed with
mempool_free, it is kept in the pool if the
number of preallocated objects is currently below the minimum;
otherwise, it is returned to the system.
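This reserve is what makes mempools attractive in code that cannot tolerate allocation failure, such as I/O paths running in atomic context; the usual pattern looks like this sketch, where mydrv_pool and struct mydrv_request are invented names:

```c
struct mydrv_request *req;

/* The preallocated reserve makes failure far less likely than a
 * plain GFP_ATOMIC kmalloc would be. */
req = mempool_alloc(mydrv_pool, GFP_ATOMIC);
if (!req)
    return -ENOMEM; /* even a mempool can run dry */

/* ... fill in and submit the request ... */

mempool_free(req, mydrv_pool);
```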
A mempool can be resized with:
int mempool_resize(mempool_t *pool, int new_min_nr, int gfp_mask);
This call, if successful, resizes the pool to have at least
new_min_nr objects.
If you no longer need a memory pool, return it to the system with:
void mempool_destroy(mempool_t *pool);
You must return all allocated objects before destroying the mempool,
or a kernel oops results.
If you are considering using a mempool in your driver, please keep
one thing in mind: mempools allocate a chunk of memory that sits in a
list, idle and unavailable for any real use. It is easy to consume a
great deal of memory with mempools. In almost every case, the
preferred alternative is to do without the mempool and simply deal
with the possibility of allocation failures instead. If there is any
way for your driver to respond to an allocation failure in a way that
does not endanger the integrity of the system, do things that way.
Use of mempools in driver code should be rare.