linux/kernel/bpf
Daniel Borkmann cfa1a2329a bpf: Fix overrunning reservations in ringbuf
The BPF ring buffer internally is implemented as a power-of-2 sized circular
buffer, with two logical and ever-increasing counters: consumer_pos is the
consumer counter to show which logical position the consumer consumed the
data, and producer_pos which is the producer counter denoting the amount of
data reserved by all producers.

Each time a record is reserved, the producer that "owns" the record will
successfully advance producer counter. In user space each time a record is
read, the consumer of the data advanced the consumer counter once it finished
processing. Both counters are stored in separate pages so that from user
space, the producer counter is read-only and the consumer counter is read-write.

One aspect that simplifies and thus speeds up the implementation of both
producers and consumers is how the data area is mapped twice contiguously
back-to-back in the virtual memory, allowing to not take any special measures
for samples that have to wrap around at the end of the circular buffer data
area, because the next page after the last data page would be first data page
again, and thus the sample will still appear completely contiguous in virtual
memory.

Each record has a struct bpf_ringbuf_hdr { u32 len; u32 pg_off; } header for
book-keeping the length and offset, and is inaccessible to the BPF program.
Helpers like bpf_ringbuf_reserve() return `(void *)hdr + BPF_RINGBUF_HDR_SZ`
for the BPF program to use. Bing-Jhong and Muhammad reported that it is however
possible to make a second allocated memory chunk overlapping with the first
chunk and as a result, the BPF program is now able to edit first chunk's
header.

For example, consider the creation of a BPF_MAP_TYPE_RINGBUF map with size
of 0x4000. Next, the consumer_pos is modified to 0x3000 /before/ a call to
bpf_ringbuf_reserve() is made. This will allocate a chunk A, which is in
[0x0,0x3008], and the BPF program is able to edit [0x8,0x3008]. Now, lets
allocate a chunk B with size 0x3000. This will succeed because consumer_pos
was edited ahead of time to pass the `new_prod_pos - cons_pos > rb->mask`
check. Chunk B will be in range [0x3008,0x6010], and the BPF program is able
to edit [0x3010,0x6010]. Due to the ring buffer memory layout mentioned
earlier, the ranges [0x0,0x4000] and [0x4000,0x8000] point to the same data
pages. This means that chunk B at [0x4000,0x4008] is chunk A's header.
bpf_ringbuf_submit() / bpf_ringbuf_discard() use the header's pg_off to then
locate the bpf_ringbuf itself via bpf_ringbuf_restore_from_rec(). Once chunk
B modified chunk A's header, then bpf_ringbuf_commit() refers to the wrong
page and could cause a crash.

Fix it by calculating the oldest pending_pos and check whether the range
from the oldest outstanding record to the newest would span beyond the ring
buffer size. If that is the case, then reject the request. We've tested with
the ring buffer benchmark in BPF selftests (./benchs/run_bench_ringbufs.sh)
before/after the fix and while it seems a bit slower on some benchmarks, it
is still not significantly enough to matter.

Fixes: 457f44363a ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Reported-by: Muhammad Ramdhan <ramdhan@starlabs.sg>
Co-developed-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Co-developed-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240621140828.18238-1-daniel@iogearbox.net
2024-06-21 13:04:21 -07:00
..
preload bpf: make preloaded map iterators to display map elements count 2023-07-06 12:42:25 -07:00
arena.c bpf: Fix remap of arena. 2024-06-18 17:19:46 +02:00
arraymap.c bpf: Do not walk twice the map on free 2024-04-30 16:28:33 +02:00
bloom_filter.c bpf: Check bloom filter map value size 2024-03-27 09:56:17 -07:00
bpf_cgrp_storage.c bpf: Enable bpf_cgrp_storage for cgroup1 non-attach case 2023-12-08 17:08:18 -08:00
bpf_inode_storage.c
bpf_iter.c bpf: move sleepable flag from bpf_prog_aux to bpf_prog 2024-03-11 16:41:25 -07:00
bpf_local_storage.c bpf: Fix typos in comments 2024-04-22 17:48:08 +02:00
bpf_lru_list.c bpf: Address KCSAN report on bpf_lru_list 2023-05-12 12:01:03 -07:00
bpf_lru_list.h bpf: lru: Remove unused declaration bpf_lru_promote() 2023-08-08 17:21:42 -07:00
bpf_lsm.c bpf: Minor clean-up to sleepable_lsm_hooks BTF set 2024-02-01 18:37:45 +01:00
bpf_struct_ops.c bpf: Check return from set_memory_rox() 2024-03-18 14:18:47 -07:00
bpf_task_storage.c
btf.c bpf: Add support for kprobe session context 2024-04-30 09:45:53 -07:00
cgroup.c bpf: Allow helper bpf_get_[ns_]current_pid_tgid() for all prog types 2024-03-19 14:24:07 -07:00
cgroup_iter.c bpf: Let verifier consider {task,cgroup} is trusted in bpf_iter_reg 2023-11-07 15:24:25 -08:00
core.c Modules changes for v6.10-rc1 2024-05-15 14:05:08 -07:00
cpumap.c bpf: report RCU QS in cpumap kthread 2024-03-20 21:05:43 -07:00
cpumask.c bpf: Allow invoking kfuncs from BPF_PROG_TYPE_SYSCALL progs 2024-04-05 10:56:09 -07:00
crypto.c bpf: make common crypto API for TC/XDP programs 2024-04-24 16:01:10 -07:00
devmap.c bpf, devmap: Remove unnecessary if check in for loop 2024-06-03 17:09:23 +02:00
disasm.c bpf: add special internal-only MOV instruction to resolve per-CPU addrs 2024-04-03 10:29:55 -07:00
disasm.h
dispatcher.c bpf: Use arch_bpf_trampoline_size 2023-12-06 17:17:20 -08:00
hashtab.c bpf: Do not walk twice the hash map on free 2024-04-30 16:28:46 +02:00
helpers.c bpf: make common crypto API for TC/XDP programs 2024-04-24 16:01:10 -07:00
inode.c bpf: Support symbolic BPF FS delegation mount options 2024-01-24 16:21:02 -08:00
Kconfig bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of 2024-05-14 00:36:29 -07:00
link_iter.c
local_storage.c
log.c bpf: Replace deprecated strncpy with strscpy 2024-04-03 16:57:41 +02:00
lpm_trie.c bpf: Avoid kfree_rcu() under lock in bpf_lpm_trie. 2024-03-29 11:10:41 -07:00
Makefile bpf: crypto: fix build when CONFIG_CRYPTO=m 2024-05-01 13:32:26 -07:00
map_in_map.c bpf: save extended inner map info for percpu array maps as well 2024-05-15 09:34:54 -07:00
map_in_map.h bpf: Add map and need_defer parameters to .map_fd_put_ptr() 2023-12-04 17:50:26 -08:00
map_iter.c bpf: treewide: Annotate BPF kfuncs in BTF 2024-01-31 20:40:56 -08:00
memalloc.c mm: memcg: add NULL check to obj_cgroup_put() 2024-04-25 20:55:43 -07:00
mmap_unlock_work.h
mprog.c bpf: Handle bpf_mprog_query with NULL entry 2023-10-06 17:11:20 -07:00
net_namespace.c
offload.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-09-21 21:49:45 +02:00
percpu_freelist.c
percpu_freelist.h
prog_iter.c
queue_stack_maps.c bpf: Avoid deadlock when using queue and stack maps from NMI 2023-09-11 19:04:49 -07:00
reuseport_array.c bpf: Centralize permissions checks for all BPF map types 2023-06-19 14:04:04 +02:00
ringbuf.c bpf: Fix overrunning reservations in ringbuf 2024-06-21 13:04:21 -07:00
stackmap.c bpf: Fix stackmap overflow check on 32-bit arches 2024-03-07 20:06:25 -08:00
syscall.c bpf: Fix a potential use-after-free in bpf_link_free() 2024-06-03 18:16:19 +02:00
sysfs_btf.c btf: Avoid weak external references 2024-04-16 16:35:13 +02:00
task_iter.c bpf: Fix an issue due to uninitialized bpf_iter_task 2024-02-19 12:28:15 +01:00
tcx.c bpf, tcx: Get rid of tcx_link_const 2023-10-23 15:01:53 -07:00
tnum.c bpf: simplify tnum output if a fully known constant 2023-12-02 11:36:51 -08:00
token.c bpf,token: Use BIT_ULL() to convert the bit mask 2024-01-29 20:04:55 -08:00
trampoline.c Networking changes for 6.10. 2024-05-14 19:42:24 -07:00
verifier.c bpf: Fix the corner case with may_goto and jump to the 1st insn. 2024-06-21 20:18:40 +02:00