Skip to content

Commit

Permalink
mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
Browse files Browse the repository at this point in the history
In most configurations, kmalloc() happens to return naturally aligned
(i.e.  aligned to the block size itself) blocks for power of two sizes.

That means some kmalloc() users might unknowingly rely on that
alignment, until stuff breaks when the kernel is built with e.g.
CONFIG_SLUB_DEBUG or CONFIG_SLOB, and blocks stop being aligned.  Then
developers have to devise workaround such as own kmem caches with
specified alignment [1], which is not always practical, as recently
evidenced in [2].

The topic has been discussed at LSF/MM 2019 [3].  Adding a
'kmalloc_aligned()' variant would not help with code unknowingly relying
on the implicit alignment.  For slab implementations it would either
require creating more kmalloc caches, or allocate a larger size and only
give back part of it.  That would be wasteful, especially with a generic
alignment parameter (in contrast with a fixed alignment to size).

Ideally we should provide to mm users what they need without difficult
workarounds or own reimplementations, so let's make the kmalloc()
alignment to size explicitly guaranteed for power-of-two sizes under all
configurations.  What this means for the three available allocators?

* SLAB object layout happens to be mostly unchanged by the patch.  The
  implicitly provided alignment could be compromised with
  CONFIG_DEBUG_SLAB due to redzoning, however SLAB disables redzoning for
  caches with alignment larger than unsigned long long.  Practically on at
  least x86 this includes kmalloc caches as they use cache line alignment,
  which is larger than that.  Still, this patch ensures alignment on all
  arches and cache sizes.

* SLUB layout is also unchanged unless redzoning is enabled through
  CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache.
  With this patch, explicit alignment is guaranteed with redzoning as
  well.  This will result in more memory being wasted, but that should be
  acceptable in a debugging scenario.

* SLOB has no implicit alignment so this patch adds it explicitly for
  kmalloc().  The potential downside is increased fragmentation.  While
  pathological allocation scenarios are certainly possible, in my testing,
  after booting a x86_64 kernel+userspace with virtme, around 16MB memory
  was consumed by slab pages both before and after the patch, with
  difference in the noise.

[1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@c-s.fr/
[2] https://lore.kernel.org/linux-fsdevel/[email protected]/
[3] https://lwn.net/Articles/787740/

[[email protected]: documentation fixlet, per Matthew]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: "Darrick J . Wong" <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
  • Loading branch information
tehcaster authored and torvalds committed Oct 7, 2019
1 parent 6a486c0 commit 59bb479
Show file tree
Hide file tree
Showing 4 changed files with 49 additions and 12 deletions.
4 changes: 4 additions & 0 deletions Documentation/core-api/memory-allocation.rst
Original file line number Diff line number Diff line change
Expand Up @@ -98,6 +98,10 @@ limited. The actual limit depends on the hardware and the kernel
configuration, but it is a good practice to use `kmalloc` for objects
smaller than page size.

The address of a chunk allocated with `kmalloc` is aligned to at least
ARCH_KMALLOC_MINALIGN bytes. For sizes which are a power of two, the
alignment is also guaranteed to be at least the respective size.

For large allocations you can use :c:func:`vmalloc` and
:c:func:`vzalloc`, or directly request pages from the page
allocator. The memory allocated by `vmalloc` and related functions is
Expand Down
4 changes: 4 additions & 0 deletions include/linux/slab.h
Original file line number Diff line number Diff line change
Expand Up @@ -493,6 +493,10 @@ static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
* kmalloc is the normal method of allocating memory
* for objects smaller than page size in the kernel.
*
* The allocated object address is aligned to at least ARCH_KMALLOC_MINALIGN
* bytes. For @size of power of two bytes, the alignment is also guaranteed
* to be at least to the size.
*
* The @flags argument may be one of the GFP flags defined at
* include/linux/gfp.h and described at
* :ref:`Documentation/core-api/mm-api.rst <mm-api-gfp-flags>`
Expand Down
11 changes: 10 additions & 1 deletion mm/slab_common.c
Original file line number Diff line number Diff line change
Expand Up @@ -1030,10 +1030,19 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
unsigned int useroffset, unsigned int usersize)
{
int err;
unsigned int align = ARCH_KMALLOC_MINALIGN;

s->name = name;
s->size = s->object_size = size;
s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);

/*
* For power of two sizes, guarantee natural alignment for kmalloc
* caches, regardless of SL*B debugging options.
*/
if (is_power_of_2(size))
align = max(align, size);
s->align = calculate_alignment(flags, align, size);

s->useroffset = useroffset;
s->usersize = usersize;

Expand Down
42 changes: 31 additions & 11 deletions mm/slob.c
Original file line number Diff line number Diff line change
Expand Up @@ -224,6 +224,7 @@ static void slob_free_pages(void *b, int order)
* @sp: Page to look in.
* @size: Size of the allocation.
* @align: Allocation alignment.
* @align_offset: Offset in the allocated block that will be aligned.
* @page_removed_from_list: Return parameter.
*
* Tries to find a chunk of memory at least @size bytes big within @page.
Expand All @@ -234,7 +235,7 @@ static void slob_free_pages(void *b, int order)
* true (set to false otherwise).
*/
static void *slob_page_alloc(struct page *sp, size_t size, int align,
bool *page_removed_from_list)
int align_offset, bool *page_removed_from_list)
{
slob_t *prev, *cur, *aligned = NULL;
int delta = 0, units = SLOB_UNITS(size);
Expand All @@ -243,8 +244,17 @@ static void *slob_page_alloc(struct page *sp, size_t size, int align,
for (prev = NULL, cur = sp->freelist; ; prev = cur, cur = slob_next(cur)) {
slobidx_t avail = slob_units(cur);

/*
* 'aligned' will hold the address of the slob block so that the
* address 'aligned'+'align_offset' is aligned according to the
* 'align' parameter. This is for kmalloc() which prepends the
* allocated block with its size, so that the block itself is
* aligned when needed.
*/
if (align) {
aligned = (slob_t *)ALIGN((unsigned long)cur, align);
aligned = (slob_t *)
(ALIGN((unsigned long)cur + align_offset, align)
- align_offset);
delta = aligned - cur;
}
if (avail >= units + delta) { /* room enough? */
Expand Down Expand Up @@ -288,7 +298,8 @@ static void *slob_page_alloc(struct page *sp, size_t size, int align,
/*
* slob_alloc: entry point into the slob allocator.
*/
static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
static void *slob_alloc(size_t size, gfp_t gfp, int align, int node,
int align_offset)
{
struct page *sp;
struct list_head *slob_list;
Expand Down Expand Up @@ -319,7 +330,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
if (sp->units < SLOB_UNITS(size))
continue;

b = slob_page_alloc(sp, size, align, &page_removed_from_list);
b = slob_page_alloc(sp, size, align, align_offset, &page_removed_from_list);
if (!b)
continue;

Expand Down Expand Up @@ -356,7 +367,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
INIT_LIST_HEAD(&sp->slab_list);
set_slob(b, SLOB_UNITS(PAGE_SIZE), b + SLOB_UNITS(PAGE_SIZE));
set_slob_page_free(sp, slob_list);
b = slob_page_alloc(sp, size, align, &_unused);
b = slob_page_alloc(sp, size, align, align_offset, &_unused);
BUG_ON(!b);
spin_unlock_irqrestore(&slob_lock, flags);
}
Expand Down Expand Up @@ -458,27 +469,36 @@ static __always_inline void *
__do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
{
unsigned int *m;
int align = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
int minalign = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
void *ret;

gfp &= gfp_allowed_mask;

fs_reclaim_acquire(gfp);
fs_reclaim_release(gfp);

if (size < PAGE_SIZE - align) {
if (size < PAGE_SIZE - minalign) {
int align = minalign;

/*
* For power of two sizes, guarantee natural alignment for
* kmalloc()'d objects.
*/
if (is_power_of_2(size))
align = max(minalign, (int) size);

if (!size)
return ZERO_SIZE_PTR;

m = slob_alloc(size + align, gfp, align, node);
m = slob_alloc(size + minalign, gfp, align, node, minalign);

if (!m)
return NULL;
*m = size;
ret = (void *)m + align;
ret = (void *)m + minalign;

trace_kmalloc_node(caller, ret,
size, size + align, gfp, node);
size, size + minalign, gfp, node);
} else {
unsigned int order = get_order(size);

Expand Down Expand Up @@ -579,7 +599,7 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
fs_reclaim_release(flags);

if (c->size < PAGE_SIZE) {
b = slob_alloc(c->size, flags, c->align, node);
b = slob_alloc(c->size, flags, c->align, node, 0);
trace_kmem_cache_alloc_node(_RET_IP_, b, c->object_size,
SLOB_UNITS(c->size) * SLOB_UNIT,
flags, node);
Expand Down

0 comments on commit 59bb479

Please sign in to comment.