Skip to content

Commit

Permalink
kasan, mm: change hooks signatures
Browse files Browse the repository at this point in the history
Patch series "kasan: add software tag-based mode for arm64", v13.

This patchset adds a new software tag-based mode to KASAN [1].  (Initially
this mode was called KHWASAN, but it got renamed, see the naming rationale
at the end of this section).

The plan is to implement HWASan [2] for the kernel with the incentive,
that it's going to have comparable to KASAN performance, but in the same
time consume much less memory, trading that off for somewhat imprecise bug
detection and being supported only for arm64.

The underlying ideas of the approach used by software tag-based KASAN are:

1. By using the Top Byte Ignore (TBI) arm64 CPU feature, we can store
   pointer tags in the top byte of each kernel pointer.

2. Using shadow memory, we can store memory tags for each chunk of kernel
   memory.

3. On each memory allocation, we can generate a random tag, embed it into
   the returned pointer and set the memory tags that correspond to this
   chunk of memory to the same value.

4. By using compiler instrumentation, before each memory access we can add
   a check that the pointer tag matches the tag of the memory that is being
   accessed.

5. On a tag mismatch we report an error.

With this patchset the existing KASAN mode gets renamed to generic KASAN,
with the word "generic" meaning that the implementation can be supported
by any architecture as it is purely software.

The new mode this patchset adds is called software tag-based KASAN.  The
word "tag-based" refers to the fact that this mode uses tags embedded into
the top byte of kernel pointers and the TBI arm64 CPU feature that allows
to dereference such pointers.  The word "software" here means that shadow
memory manipulation and tag checking on pointer dereference is done in
software.  As it is the only tag-based implementation right now, "software
tag-based" KASAN is sometimes referred to as simply "tag-based" in this
patchset.

A potential expansion of this mode is a hardware tag-based mode, which
would use hardware memory tagging support (announced by Arm [3]) instead
of compiler instrumentation and manual shadow memory manipulation.

Same as generic KASAN, software tag-based KASAN is strictly a debugging
feature.

[1] https://www.kernel.org/doc/html/latest/dev-tools/kasan.html

[2] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html

[3] https://community.arm.com/processors/b/blog/posts/arm-a-profile-architecture-2018-developments-armv85a

====== Rationale

On mobile devices generic KASAN's memory usage is significant problem.
One of the main reasons to have tag-based KASAN is to be able to perform a
similar set of checks as the generic one does, but with lower memory
requirements.

Comment from Vishwath Mohan <[email protected]>:

I don't have data on-hand, but anecdotally both ASAN and KASAN have proven
problematic to enable for environments that don't tolerate the increased
memory pressure well.  This includes

(a) Low-memory form factors - Wear, TV, Things, lower-tier phones like Go,
(c) Connected components like Pixel's visual core [1].

These are both places I'd love to have a low(er) memory footprint option at
my disposal.

Comment from Evgenii Stepanov <[email protected]>:

Looking at a live Android device under load, slab (according to
/proc/meminfo) + kernel stack take 8-10% available RAM (~350MB).  KASAN's
overhead of 2x - 3x on top of it is not insignificant.

Not having this overhead enables near-production use - ex.  running
KASAN/KHWASAN kernel on a personal, daily-use device to catch bugs that do
not reproduce in test configuration.  These are the ones that often cost
the most engineering time to track down.

CPU overhead is bad, but generally tolerable.  RAM is critical, in our
experience.  Once it gets low enough, OOM-killer makes your life
miserable.

[1] https://www.blog.google/products/pixel/pixel-visual-core-image-processing-and-machine-learning-pixel-2/

====== Technical details

Software tag-based KASAN mode is implemented in a very similar way to the
generic one. This patchset essentially does the following:

1. TCR_TBI1 is set to enable Top Byte Ignore.

2. Shadow memory is used (with a different scale, 1:16, so each shadow
   byte corresponds to 16 bytes of kernel memory) to store memory tags.

3. All slab objects are aligned to shadow scale, which is 16 bytes.

4. All pointers returned from the slab allocator are tagged with a random
   tag and the corresponding shadow memory is poisoned with the same value.

5. Compiler instrumentation is used to insert tag checks. Either by
   calling callbacks or by inlining them (CONFIG_KASAN_OUTLINE and
   CONFIG_KASAN_INLINE flags are reused).

6. When a tag mismatch is detected in callback instrumentation mode
   KASAN simply prints a bug report. In case of inline instrumentation,
   clang inserts a brk instruction, and KASAN has it's own brk handler,
   which reports the bug.

7. The memory in between slab objects is marked with a reserved tag, and
   acts as a redzone.

8. When a slab object is freed it's marked with a reserved tag.

Bug detection is imprecise for two reasons:

1. We won't catch some small out-of-bounds accesses, that fall into the
   same shadow cell, as the last byte of a slab object.

2. We only have 1 byte to store tags, which means we have a 1/256
   probability of a tag match for an incorrect access (actually even
   slightly less due to reserved tag values).

Despite that there's a particular type of bugs that tag-based KASAN can
detect compared to generic KASAN: use-after-free after the object has been
allocated by someone else.

====== Testing

Some kernel developers voiced a concern that changing the top byte of
kernel pointers may lead to subtle bugs that are difficult to discover.
To address this concern deliberate testing has been performed.

It doesn't seem feasible to do some kind of static checking to find
potential issues with pointer tagging, so a dynamic approach was taken.
All pointer comparisons/subtractions have been instrumented in an LLVM
compiler pass and a kernel module that would print a bug report whenever
two pointers with different tags are being compared/subtracted (ignoring
comparisons with NULL pointers and with pointers obtained by casting an
error code to a pointer type) has been used.  Then the kernel has been
booted in QEMU and on an Odroid C2 board and syzkaller has been run.

This yielded the following results.

The two places that look interesting are:

is_vmalloc_addr in include/linux/mm.h
is_kernel_rodata in mm/util.c

Here we compare a pointer with some fixed untagged values to make sure
that the pointer lies in a particular part of the kernel address space.
Since tag-based KASAN doesn't add tags to pointers that belong to rodata
or vmalloc regions, this should work as is.  To make sure debug checks to
those two functions that check that the result doesn't change whether we
operate on pointers with or without untagging has been added.

A few other cases that don't look that interesting:

Comparing pointers to achieve unique sorting order of pointee objects
(e.g. sorting locks addresses before performing a double lock):

tty_ldisc_lock_pair_timeout in drivers/tty/tty_ldisc.c
pipe_double_lock in fs/pipe.c
unix_state_double_lock in net/unix/af_unix.c
lock_two_nondirectories in fs/inode.c
mutex_lock_double in kernel/events/core.c

ep_cmp_ffd in fs/eventpoll.c
fsnotify_compare_groups fs/notify/mark.c

Nothing needs to be done here, since the tags embedded into pointers
don't change, so the sorting order would still be unique.

Checks that a pointer belongs to some particular allocation:

is_sibling_entry in lib/radix-tree.c
object_is_on_stack in include/linux/sched/task_stack.h

Nothing needs to be done here either, since two pointers can only belong
to the same allocation if they have the same tag.

Overall, since the kernel boots and works, there are no critical bugs.
As for the rest, the traditional kernel testing way (use until fails) is
the only one that looks feasible.

Another point here is that tag-based KASAN is available under a separate
config option that needs to be deliberately enabled. Even though it might
be used in a "near-production" environment to find bugs that are not found
during fuzzing or running tests, it is still a debug tool.

====== Benchmarks

The following numbers were collected on Odroid C2 board. Both generic and
tag-based KASAN were used in inline instrumentation mode.

Boot time [1]:
* ~1.7 sec for clean kernel
* ~5.0 sec for generic KASAN
* ~5.0 sec for tag-based KASAN

Network performance [2]:
* 8.33 Gbits/sec for clean kernel
* 3.17 Gbits/sec for generic KASAN
* 2.85 Gbits/sec for tag-based KASAN

Slab memory usage after boot [3]:
* ~40 kb for clean kernel
* ~105 kb (~260% overhead) for generic KASAN
* ~47 kb (~20% overhead) for tag-based KASAN

KASAN memory overhead consists of three main parts:
1. Increased slab memory usage due to redzones.
2. Shadow memory (the whole reserved once during boot).
3. Quaratine (grows gradually until some preset limit; the more the limit,
   the more the chance to detect a use-after-free).

Comparing tag-based vs generic KASAN for each of these points:
1. 20% vs 260% overhead.
2. 1/16th vs 1/8th of physical memory.
3. Tag-based KASAN doesn't require quarantine.

[1] Time before the ext4 driver is initialized.
[2] Measured as `iperf -s & iperf -c 127.0.0.1 -t 30`.
[3] Measured as `cat /proc/meminfo | grep Slab`.

====== Some notes

A few notes:

1. The patchset can be found here:
   https://github.com/xairy/kasan-prototype/tree/khwasan

2. Building requires a recent Clang version (7.0.0 or later).

3. Stack instrumentation is not supported yet and will be added later.

This patch (of 25):

Tag-based KASAN changes the value of the top byte of pointers returned
from the kernel allocation functions (such as kmalloc).  This patch
updates KASAN hooks signatures and their usage in SLAB and SLUB code to
reflect that.

Link: http://lkml.kernel.org/r/aec2b5e3973781ff8a6bb6760f8543643202c451.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Andrey Ryabinin <[email protected]>
Reviewed-by: Dmitry Vyukov <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
  • Loading branch information
xairy authored and torvalds committed Dec 28, 2018
1 parent 00c569b commit 0116523
Show file tree
Hide file tree
Showing 7 changed files with 65 additions and 45 deletions.
43 changes: 29 additions & 14 deletions include/linux/kasan.h
Original file line number Diff line number Diff line change
Expand Up @@ -51,16 +51,16 @@ void kasan_cache_shutdown(struct kmem_cache *cache);
void kasan_poison_slab(struct page *page);
void kasan_unpoison_object_data(struct kmem_cache *cache, void *object);
void kasan_poison_object_data(struct kmem_cache *cache, void *object);
void kasan_init_slab_obj(struct kmem_cache *cache, const void *object);
void *kasan_init_slab_obj(struct kmem_cache *cache, const void *object);

void kasan_kmalloc_large(const void *ptr, size_t size, gfp_t flags);
void *kasan_kmalloc_large(const void *ptr, size_t size, gfp_t flags);
void kasan_kfree_large(void *ptr, unsigned long ip);
void kasan_poison_kfree(void *ptr, unsigned long ip);
void kasan_kmalloc(struct kmem_cache *s, const void *object, size_t size,
void *kasan_kmalloc(struct kmem_cache *s, const void *object, size_t size,
gfp_t flags);
void kasan_krealloc(const void *object, size_t new_size, gfp_t flags);
void *kasan_krealloc(const void *object, size_t new_size, gfp_t flags);

void kasan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags);
void *kasan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags);
bool kasan_slab_free(struct kmem_cache *s, void *object, unsigned long ip);

struct kasan_cache {
Expand Down Expand Up @@ -105,19 +105,34 @@ static inline void kasan_unpoison_object_data(struct kmem_cache *cache,
void *object) {}
static inline void kasan_poison_object_data(struct kmem_cache *cache,
void *object) {}
static inline void kasan_init_slab_obj(struct kmem_cache *cache,
const void *object) {}
static inline void *kasan_init_slab_obj(struct kmem_cache *cache,
const void *object)
{
return (void *)object;
}

static inline void kasan_kmalloc_large(void *ptr, size_t size, gfp_t flags) {}
static inline void *kasan_kmalloc_large(void *ptr, size_t size, gfp_t flags)
{
return ptr;
}
static inline void kasan_kfree_large(void *ptr, unsigned long ip) {}
static inline void kasan_poison_kfree(void *ptr, unsigned long ip) {}
static inline void kasan_kmalloc(struct kmem_cache *s, const void *object,
size_t size, gfp_t flags) {}
static inline void kasan_krealloc(const void *object, size_t new_size,
gfp_t flags) {}
static inline void *kasan_kmalloc(struct kmem_cache *s, const void *object,
size_t size, gfp_t flags)
{
return (void *)object;
}
static inline void *kasan_krealloc(const void *object, size_t new_size,
gfp_t flags)
{
return (void *)object;
}

static inline void kasan_slab_alloc(struct kmem_cache *s, void *object,
gfp_t flags) {}
static inline void *kasan_slab_alloc(struct kmem_cache *s, void *object,
gfp_t flags)
{
return object;
}
static inline bool kasan_slab_free(struct kmem_cache *s, void *object,
unsigned long ip)
{
Expand Down
4 changes: 2 additions & 2 deletions include/linux/slab.h
Original file line number Diff line number Diff line change
Expand Up @@ -444,7 +444,7 @@ static __always_inline void *kmem_cache_alloc_trace(struct kmem_cache *s,
{
void *ret = kmem_cache_alloc(s, flags);

kasan_kmalloc(s, ret, size, flags);
ret = kasan_kmalloc(s, ret, size, flags);
return ret;
}

Expand All @@ -455,7 +455,7 @@ kmem_cache_alloc_node_trace(struct kmem_cache *s,
{
void *ret = kmem_cache_alloc_node(s, gfpflags, node);

kasan_kmalloc(s, ret, size, gfpflags);
ret = kasan_kmalloc(s, ret, size, gfpflags);
return ret;
}
#endif /* CONFIG_TRACING */
Expand Down
30 changes: 18 additions & 12 deletions mm/kasan/kasan.c
Original file line number Diff line number Diff line change
Expand Up @@ -474,20 +474,22 @@ struct kasan_free_meta *get_free_info(struct kmem_cache *cache,
return (void *)object + cache->kasan_info.free_meta_offset;
}

void kasan_init_slab_obj(struct kmem_cache *cache, const void *object)
void *kasan_init_slab_obj(struct kmem_cache *cache, const void *object)
{
struct kasan_alloc_meta *alloc_info;

if (!(cache->flags & SLAB_KASAN))
return;
return (void *)object;

alloc_info = get_alloc_info(cache, object);
__memset(alloc_info, 0, sizeof(*alloc_info));

return (void *)object;
}

void kasan_slab_alloc(struct kmem_cache *cache, void *object, gfp_t flags)
void *kasan_slab_alloc(struct kmem_cache *cache, void *object, gfp_t flags)
{
kasan_kmalloc(cache, object, cache->object_size, flags);
return kasan_kmalloc(cache, object, cache->object_size, flags);
}

static bool __kasan_slab_free(struct kmem_cache *cache, void *object,
Expand Down Expand Up @@ -528,7 +530,7 @@ bool kasan_slab_free(struct kmem_cache *cache, void *object, unsigned long ip)
return __kasan_slab_free(cache, object, ip, true);
}

void kasan_kmalloc(struct kmem_cache *cache, const void *object, size_t size,
void *kasan_kmalloc(struct kmem_cache *cache, const void *object, size_t size,
gfp_t flags)
{
unsigned long redzone_start;
Expand All @@ -538,7 +540,7 @@ void kasan_kmalloc(struct kmem_cache *cache, const void *object, size_t size,
quarantine_reduce();

if (unlikely(object == NULL))
return;
return NULL;

redzone_start = round_up((unsigned long)(object + size),
KASAN_SHADOW_SCALE_SIZE);
Expand All @@ -551,10 +553,12 @@ void kasan_kmalloc(struct kmem_cache *cache, const void *object, size_t size,

if (cache->flags & SLAB_KASAN)
set_track(&get_alloc_info(cache, object)->alloc_track, flags);

return (void *)object;
}
EXPORT_SYMBOL(kasan_kmalloc);

void kasan_kmalloc_large(const void *ptr, size_t size, gfp_t flags)
void *kasan_kmalloc_large(const void *ptr, size_t size, gfp_t flags)
{
struct page *page;
unsigned long redzone_start;
Expand All @@ -564,7 +568,7 @@ void kasan_kmalloc_large(const void *ptr, size_t size, gfp_t flags)
quarantine_reduce();

if (unlikely(ptr == NULL))
return;
return NULL;

page = virt_to_page(ptr);
redzone_start = round_up((unsigned long)(ptr + size),
Expand All @@ -574,21 +578,23 @@ void kasan_kmalloc_large(const void *ptr, size_t size, gfp_t flags)
kasan_unpoison_shadow(ptr, size);
kasan_poison_shadow((void *)redzone_start, redzone_end - redzone_start,
KASAN_PAGE_REDZONE);

return (void *)ptr;
}

void kasan_krealloc(const void *object, size_t size, gfp_t flags)
void *kasan_krealloc(const void *object, size_t size, gfp_t flags)
{
struct page *page;

if (unlikely(object == ZERO_SIZE_PTR))
return;
return ZERO_SIZE_PTR;

page = virt_to_head_page(object);

if (unlikely(!PageSlab(page)))
kasan_kmalloc_large(object, size, flags);
return kasan_kmalloc_large(object, size, flags);
else
kasan_kmalloc(page->slab_cache, object, size, flags);
return kasan_kmalloc(page->slab_cache, object, size, flags);
}

void kasan_poison_kfree(void *ptr, unsigned long ip)
Expand Down
12 changes: 6 additions & 6 deletions mm/slab.c
Original file line number Diff line number Diff line change
Expand Up @@ -3551,7 +3551,7 @@ void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
void *ret = slab_alloc(cachep, flags, _RET_IP_);

kasan_slab_alloc(cachep, ret, flags);
ret = kasan_slab_alloc(cachep, ret, flags);
trace_kmem_cache_alloc(_RET_IP_, ret,
cachep->object_size, cachep->size, flags);

Expand Down Expand Up @@ -3617,7 +3617,7 @@ kmem_cache_alloc_trace(struct kmem_cache *cachep, gfp_t flags, size_t size)

ret = slab_alloc(cachep, flags, _RET_IP_);

kasan_kmalloc(cachep, ret, size, flags);
ret = kasan_kmalloc(cachep, ret, size, flags);
trace_kmalloc(_RET_IP_, ret,
size, cachep->size, flags);
return ret;
Expand All @@ -3641,7 +3641,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
{
void *ret = slab_alloc_node(cachep, flags, nodeid, _RET_IP_);

kasan_slab_alloc(cachep, ret, flags);
ret = kasan_slab_alloc(cachep, ret, flags);
trace_kmem_cache_alloc_node(_RET_IP_, ret,
cachep->object_size, cachep->size,
flags, nodeid);
Expand All @@ -3660,7 +3660,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,

ret = slab_alloc_node(cachep, flags, nodeid, _RET_IP_);

kasan_kmalloc(cachep, ret, size, flags);
ret = kasan_kmalloc(cachep, ret, size, flags);
trace_kmalloc_node(_RET_IP_, ret,
size, cachep->size,
flags, nodeid);
Expand All @@ -3681,7 +3681,7 @@ __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
if (unlikely(ZERO_OR_NULL_PTR(cachep)))
return cachep;
ret = kmem_cache_alloc_node_trace(cachep, flags, node, size);
kasan_kmalloc(cachep, ret, size, flags);
ret = kasan_kmalloc(cachep, ret, size, flags);

return ret;
}
Expand Down Expand Up @@ -3719,7 +3719,7 @@ static __always_inline void *__do_kmalloc(size_t size, gfp_t flags,
return cachep;
ret = slab_alloc(cachep, flags, caller);

kasan_kmalloc(cachep, ret, size, flags);
ret = kasan_kmalloc(cachep, ret, size, flags);
trace_kmalloc(caller, ret,
size, cachep->size, flags);

Expand Down
2 changes: 1 addition & 1 deletion mm/slab.h
Original file line number Diff line number Diff line change
Expand Up @@ -441,7 +441,7 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s, gfp_t flags,

kmemleak_alloc_recursive(object, s->object_size, 1,
s->flags, flags);
kasan_slab_alloc(s, object, flags);
p[i] = kasan_slab_alloc(s, object, flags);
}

if (memcg_kmem_enabled())
Expand Down
4 changes: 2 additions & 2 deletions mm/slab_common.c
Original file line number Diff line number Diff line change
Expand Up @@ -1204,7 +1204,7 @@ void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
page = alloc_pages(flags, order);
ret = page ? page_address(page) : NULL;
kmemleak_alloc(ret, size, 1, flags);
kasan_kmalloc_large(ret, size, flags);
ret = kasan_kmalloc_large(ret, size, flags);
return ret;
}
EXPORT_SYMBOL(kmalloc_order);
Expand Down Expand Up @@ -1482,7 +1482,7 @@ static __always_inline void *__do_krealloc(const void *p, size_t new_size,
ks = ksize(p);

if (ks >= new_size) {
kasan_krealloc((void *)p, new_size, flags);
p = kasan_krealloc((void *)p, new_size, flags);
return (void *)p;
}

Expand Down
15 changes: 7 additions & 8 deletions mm/slub.c
Original file line number Diff line number Diff line change
Expand Up @@ -1372,10 +1372,10 @@ static inline void dec_slabs_node(struct kmem_cache *s, int node,
* Hooks for other subsystems that check memory allocations. In a typical
* production configuration these hooks all should produce no code at all.
*/
static inline void kmalloc_large_node_hook(void *ptr, size_t size, gfp_t flags)
static inline void *kmalloc_large_node_hook(void *ptr, size_t size, gfp_t flags)
{
kmemleak_alloc(ptr, size, 1, flags);
kasan_kmalloc_large(ptr, size, flags);
return kasan_kmalloc_large(ptr, size, flags);
}

static __always_inline void kfree_hook(void *x)
Expand Down Expand Up @@ -2768,7 +2768,7 @@ void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
{
void *ret = slab_alloc(s, gfpflags, _RET_IP_);
trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags);
kasan_kmalloc(s, ret, size, gfpflags);
ret = kasan_kmalloc(s, ret, size, gfpflags);
return ret;
}
EXPORT_SYMBOL(kmem_cache_alloc_trace);
Expand Down Expand Up @@ -2796,7 +2796,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *s,
trace_kmalloc_node(_RET_IP_, ret,
size, s->size, gfpflags, node);

kasan_kmalloc(s, ret, size, gfpflags);
ret = kasan_kmalloc(s, ret, size, gfpflags);
return ret;
}
EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
Expand Down Expand Up @@ -3784,7 +3784,7 @@ void *__kmalloc(size_t size, gfp_t flags)

trace_kmalloc(_RET_IP_, ret, size, s->size, flags);

kasan_kmalloc(s, ret, size, flags);
ret = kasan_kmalloc(s, ret, size, flags);

return ret;
}
Expand All @@ -3801,8 +3801,7 @@ static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
if (page)
ptr = page_address(page);

kmalloc_large_node_hook(ptr, size, flags);
return ptr;
return kmalloc_large_node_hook(ptr, size, flags);
}

void *__kmalloc_node(size_t size, gfp_t flags, int node)
Expand All @@ -3829,7 +3828,7 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)

trace_kmalloc_node(_RET_IP_, ret, size, s->size, flags, node);

kasan_kmalloc(s, ret, size, flags);
ret = kasan_kmalloc(s, ret, size, flags);

return ret;
}
Expand Down

0 comments on commit 0116523

Please sign in to comment.