linux/kernel/bpf
Jesper Dangaard Brouer 1a4a0cb798 bpf/lpm_trie: Inline longest_prefix_match for fastpath
The BPF map type LPM (Longest Prefix Match) is used heavily
in production by multiple products that have BPF components.
Perf data shows trie_lookup_elem() and longest_prefix_match()
being part of kernels perf top.

For every level in the LPM tree trie_lookup_elem() calls out
to longest_prefix_match().  The compiler is free to inline this
call, but chooses not to inline, because other slowpath callers
(that can be invoked via syscall) exists like trie_update_elem(),
trie_delete_elem() or trie_get_next_key().

 bcc/tools/funccount -Ti 1 'trie_lookup_elem|longest_prefix_match.isra.0'
 FUNC                                    COUNT
 trie_lookup_elem                       664945
 longest_prefix_match.isra.0           8101507

Observation on a single random machine shows a factor 12 between
the two functions. Given an average of 12 levels in the trie being
searched.

This patch force inlining longest_prefix_match(), but only for
the lookup fastpath to balance object instruction size.

In production with AMD CPUs, measuring the function latency of
'trie_lookup_elem' (bcc/tools/funclatency) we are seeing an improvement
function latency reduction 7-8% with this patch applied (to production
kernels 6.6 and 6.1). Analyzing perf data, we can explain this rather
large improvement due to reducing the overhead for AMD side-channel
mitigation SRSO (Speculative Return Stack Overflow).

Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/171076828575.2141737.18370644069389889027.stgit@firesoul
2024-03-19 13:46:03 +01:00
..
preload bpf: make preloaded map iterators to display map elements count 2023-07-06 12:42:25 -07:00
arena.c bpf: Introduce bpf_arena. 2024-03-11 15:37:23 -07:00
arraymap.c bpf: Consistently use BPF token throughout BPF verifier logic 2024-01-24 16:21:01 -08:00
bloom_filter.c bpf: Centralize permissions checks for all BPF map types 2023-06-19 14:04:04 +02:00
bpf_cgrp_storage.c bpf: Enable bpf_cgrp_storage for cgroup1 non-attach case 2023-12-08 17:08:18 -08:00
bpf_inode_storage.c Networking changes for 6.4. 2023-04-26 16:07:23 -07:00
bpf_iter.c bpf: move sleepable flag from bpf_prog_aux to bpf_prog 2024-03-11 16:41:25 -07:00
bpf_local_storage.c bpf: Allow compiler to inline most of bpf_local_storage_lookup() 2024-02-11 14:06:24 -08:00
bpf_lru_list.c bpf: Address KCSAN report on bpf_lru_list 2023-05-12 12:01:03 -07:00
bpf_lru_list.h bpf: lru: Remove unused declaration bpf_lru_promote() 2023-08-08 17:21:42 -07:00
bpf_lsm.c bpf: Minor clean-up to sleepable_lsm_hooks BTF set 2024-02-01 18:37:45 +01:00
bpf_struct_ops.c bpf: Check return from set_memory_rox() 2024-03-18 14:18:47 -07:00
bpf_task_storage.c bpf: Teach verifier that certain helpers accept NULL pointer. 2023-04-04 16:57:16 -07:00
btf.c bpf: Recognize btf_decl_tag("arg: Arena") as PTR_TO_ARENA. 2024-03-11 15:37:24 -07:00
cgroup.c net: adopt skb_network_offset() and similar helpers 2024-03-04 08:47:06 +00:00
cgroup_iter.c bpf: Let verifier consider {task,cgroup} is trusted in bpf_iter_reg 2023-11-07 15:24:25 -08:00
core.c bpf: Check return from set_memory_rox() 2024-03-18 14:18:47 -07:00
cpumap.c net: move skbuff_cache(s) to net_hotdata 2024-03-07 21:12:42 -08:00
cpumask.c bpf: treewide: Annotate BPF kfuncs in BTF 2024-01-31 20:40:56 -08:00
devmap.c bpf: Fix DEVMAP_HASH overflow check on 32-bit arches 2024-03-07 20:02:38 -08:00
disasm.c bpf: Disasm support for addr_space_cast instruction. 2024-03-11 15:37:24 -07:00
disasm.h
dispatcher.c bpf: Use arch_bpf_trampoline_size 2023-12-06 17:17:20 -08:00
hashtab.c bpf: Fix hashtab overflow check on 32-bit arches 2024-03-07 20:05:56 -08:00
helpers.c bpf-next-for-netdev 2024-03-02 20:50:59 -08:00
inode.c bpf: Support symbolic BPF FS delegation mount options 2024-01-24 16:21:02 -08:00
Kconfig bpf: Merge two CONFIG_BPF entries 2024-02-07 16:38:20 -08:00
link_iter.c bpf: Add bpf_link iterator 2022-05-10 11:20:45 -07:00
local_storage.c cgroup changes for v6.4-rc1 2023-04-29 10:05:22 -07:00
log.c bpf: Recognize addr_space_cast instruction in the verifier. 2024-03-11 15:37:24 -07:00
lpm_trie.c bpf/lpm_trie: Inline longest_prefix_match for fastpath 2024-03-19 13:46:03 +01:00
Makefile bpf: Introduce bpf_arena. 2024-03-11 15:37:23 -07:00
map_in_map.c bpf: Optimize the free of inner map 2023-12-04 17:50:26 -08:00
map_in_map.h bpf: Add map and need_defer parameters to .map_fd_put_ptr() 2023-12-04 17:50:26 -08:00
map_iter.c bpf: treewide: Annotate BPF kfuncs in BTF 2024-01-31 20:40:56 -08:00
memalloc.c bpf: Remove unnecessary cpu == 0 check in memalloc 2024-01-04 10:18:14 -08:00
mmap_unlock_work.h
mprog.c bpf: Handle bpf_mprog_query with NULL entry 2023-10-06 17:11:20 -07:00
net_namespace.c net: Add includes masked by netdevice.h including uapi/bpf.h 2021-12-29 20:03:05 -08:00
offload.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-09-21 21:49:45 +02:00
percpu_freelist.c bpf: Initialize same number of free nodes for each pcpu_freelist 2022-11-11 12:05:14 -08:00
percpu_freelist.h
prog_iter.c
queue_stack_maps.c bpf: Avoid deadlock when using queue and stack maps from NMI 2023-09-11 19:04:49 -07:00
reuseport_array.c bpf: Centralize permissions checks for all BPF map types 2023-06-19 14:04:04 +02:00
ringbuf.c bpf: Fold smp_mb__before_atomic() into atomic_set_release() 2023-10-24 14:26:07 +02:00
stackmap.c bpf: Fix stackmap overflow check on 32-bit arches 2024-03-07 20:06:25 -08:00
syscall.c bpf: move sleepable flag from bpf_prog_aux to bpf_prog 2024-03-11 16:41:25 -07:00
sysfs_btf.c
task_iter.c bpf: Fix an issue due to uninitialized bpf_iter_task 2024-02-19 12:28:15 +01:00
tcx.c bpf, tcx: Get rid of tcx_link_const 2023-10-23 15:01:53 -07:00
tnum.c bpf: simplify tnum output if a fully known constant 2023-12-02 11:36:51 -08:00
token.c bpf,token: Use BIT_ULL() to convert the bit mask 2024-01-29 20:04:55 -08:00
trampoline.c bpf: Check return from set_memory_rox() 2024-03-18 14:18:47 -07:00
verifier.c bpf: Take return from set_memory_ro() into account with bpf_prog_lock_ro() 2024-03-14 19:28:52 -07:00