linux/tools/perf
Ian Rogers 659ad3492b perf maps: Switch from rbtree to lazily sorted array for addresses
Maps is a collection of maps primarily sorted by the starting address
of the map. Prior to this change the maps were held in an rbtree
requiring 4 pointers per node. Prior to reference count checking, the
rbnode was embedded in the map so 3 pointers per node were
necessary. This change switches the rbtree to an array lazily sorted
by address, much as the array sorting nodes by name. 1 pointer is
needed per node, but to avoid excessive resizing the backing array may
be twice the number of used elements. Meaning the memory overhead is
roughly half that of the rbtree. For a perf record with
"--no-bpf-event -g -a" of true, the memory overhead of perf inject is
reduce fom 3.3MB to 3MB, so 10% or 300KB is saved.

Map inserts always happen at the end of the array. The code tracks
whether the insertion violates the sorting property. O(log n) rb-tree
complexity is switched to O(1).

Remove slides the array, so O(log n) rb-tree complexity is degraded to
O(n).

A find may need to sort the array using qsort which is O(n*log n), but
in general the maps should be sorted and so average performance should
be O(log n) as with the rbtree.

An rbtree node consumes a cache line, but with the array 4 nodes fit
on a cache line. Iteration is simplified to scanning an array rather
than pointer chasing.

Overall it is expected the performance after the change should be
comparable to before, but with half of the memory consumed.

To avoid a list and repeated logic around splitting maps,
maps__merge_in is rewritten in terms of
maps__fixup_overlap_and_insert. maps_merge_in splits the given mapping
inserting remaining gaps. maps__fixup_overlap_and_insert splits the
existing mappings, then adds the incoming mapping. By adding the new
mapping first, then re-inserting the existing mappings the splitting
behavior matches.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Artem Savkov <asavkov@redhat.com>
Cc: bpf@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240210031746.4057262-2-irogers@google.com
2024-02-12 12:35:14 -08:00
..
arch perf kvm powerpc: Fix build 2024-02-07 08:55:11 -08:00
bench
dlfilters
Documentation Merge branch 'perf-tools' into perf-tools-next 2024-02-12 12:19:21 -08:00
include/perf
jvmti
pmu-events Merge branch 'perf-tools' into perf-tools-next 2024-02-12 12:19:21 -08:00
python
scripts
tests perf maps: Switch from rbtree to lazily sorted array for addresses 2024-02-12 12:35:14 -08:00
trace tools headers uapi: Sync linux/stat.h with the kernel sources to pick STATX_MNT_ID_UNIQUE 2024-01-26 10:51:48 -03:00
ui
util perf maps: Switch from rbtree to lazily sorted array for addresses 2024-02-12 12:35:14 -08:00
.gitignore
Build
builtin-annotate.c
builtin-bench.c
builtin-buildid-cache.c
builtin-buildid-list.c
builtin-c2c.c perf mem: Clean up perf_pmus__num_mem_pmus() 2024-01-24 14:05:22 -08:00
builtin-config.c
builtin-daemon.c
builtin-data.c
builtin-diff.c
builtin-evlist.c
builtin-ftrace.c
builtin-help.c
builtin-inject.c
builtin-kallsyms.c
builtin-kmem.c
builtin-kvm.c
builtin-kwork.c
builtin-list.c perf list: Add output file option 2024-01-26 10:51:49 -03:00
builtin-lock.c
builtin-mem.c perf mem: Clean up perf_pmus__num_mem_pmus() 2024-01-24 14:05:22 -08:00
builtin-probe.c
builtin-record.c Merge branch 'perf-tools' into perf-tools-next 2024-02-12 12:19:21 -08:00
builtin-report.c perf tools: Make it possible to see perf's kernel and module memory mappings 2024-02-08 15:49:39 -08:00
builtin-sched.c perf sched: Move curr_pid and cpu_last_switched initialization to perf_sched__{lat|map|replay}() 2024-02-09 14:08:41 -08:00
builtin-script.c perf tools: Make it possible to see perf's kernel and module memory mappings 2024-02-08 15:49:39 -08:00
builtin-stat.c perf stat: Support per-cluster aggregation 2024-02-09 14:59:53 -08:00
builtin-timechart.c
builtin-top.c Merge branch 'perf-tools' into perf-tools-next 2024-02-12 12:19:21 -08:00
builtin-trace.c
builtin-version.c perf version: Display availability of HAVE_DWARF_UNWIND_SUPPORT 2024-01-24 14:13:48 -08:00
builtin.h
check-headers.sh
command-list.txt
CREDITS
design.txt
Makefile
Makefile.config
Makefile.perf Merge branch 'perf-tools' into perf-tools-next 2024-02-12 12:19:21 -08:00
MANIFEST
perf-archive.sh
perf-completion.sh
perf-iostat.sh
perf-read-vdso.c
perf-sys.h
perf.c
perf.h