linux

mirror of https://github.com/torvalds/linux synced 2024-09-30 08:20:40 +00:00

Author	SHA1	Message	Date
James Clark	df12e21d4e	perf map: Remove kernel map before updating start and end addresses In a debug build there is validation that mmap lists are sorted when taking a lock. In machine__update_kernel_mmap() the start and end addresses are updated resulting in an unsorted list before the map is removed from the list. When the map is removed, the lock is taken which triggers the validation and the failure: $ perf test "object code reading" --- start --- perf: util/maps.c:88: check_invariants: Assertion `map__start(prev) <= map__start(map)' failed. Aborted Fix it by updating the addresses after removal, but before insertion. The bug depends on the ordering and type of debug info on the system and doesn't reproduce everywhere. Fixes: `659ad3492b` ("perf maps: Switch from rbtree to lazily sorted array for addresses") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Spoorthy S <spoorts2@in.ibm.com> Link: https://lore.kernel.org/r/20240410103458.813656-4-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-12 12:02:06 -03:00
James Clark	2dade41a53	perf tests: Apply attributes to all events in object code reading test PERF_PMU_CAP_EXTENDED_HW_TYPE results in multiple events being opened on heterogeneous systems. Currently this test only sets its required attributes on the first event. Not disabling enable_on_exec on the other events causes the test to fail because the forked objdump processes are sampled. No tracking event is opened so Perf only knows about its own mappings causing the objdump samples to give the following error: $ perf test -vvv "object code reading" Reading object code for memory address: 0xffff9aaa55ec thread__find_map failed ---- end(-1) ---- 24: Object code reading : FAILED! Fixes: `251aa04024` ("perf parse-events: Wildcard most "numeric" events") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Spoorthy S <spoorts2@in.ibm.com> Link: https://lore.kernel.org/r/20240410103458.813656-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-12 12:02:05 -03:00
James Clark	256ef072b3	perf tests: Make "test data symbol" more robust on Neoverse N1 To prevent anyone from seeing a test failure appear as a regression and thinking that it was caused by their code change, insert some noise into the loop which makes it immune to sampling bias issues (errata 1694299). The "test data symbol" test can fail with any unrelated change that shifts the loop into an unfortunate position in the Perf binary which is almost impossible to debug as the root cause of the test failure. Ultimately it's caused by the referenced errata. Fixes: `60abedb8aa` ("perf test: Introduce script for data symbol testing") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Spoorthy S <spoorts2@in.ibm.com> Link: https://lore.kernel.org/r/20240410103458.813656-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-12 12:02:05 -03:00
Ian Rogers	4b5ee6db2d	perf metrics: Remove the "No_group" metric group Rather than place metrics without a metric group in "No_group" place them in a a metric group that is their name. Still allow such metrics to be selected if "No_group" is passed, this change just impacts perf list. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240403164636.3429091-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-12 12:02:05 -03:00
Namhyung Kim	0235abd89f	perf annotate: Get rid of symbol__ensure_annotate() Now symbol__annotate() is reentrant and it doesn't need to remove non-instruction lines. Let's get rid of symbol__ensure_annotate() and call symbol__annotate() directly. Also we can use it to get the arch pointer instead of calling evsel__get_arch() directly. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240405211800.1412920-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-12 12:02:05 -03:00
Namhyung Kim	879ebf3c83	perf annotate-data: Do not delete non-asm lines For data type profiling, it removed non-instruction lines from the list of annotation lines. It was to simplify the implementation dealing with instructions like to calculate the PC-relative address and to search the shortest path to the target instruction or basic block. But it means that it removes all the comments and debug information in the annotate output like source file name and line numbers. To support both code annotation and data type annotation, it'd be better to keep the non-instruction lines as well. So this change is to skip those lines during the data type profiling and to display them in the normal perf annotate output. No function changes intended (other than having more lines). Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240405211800.1412920-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-12 12:02:05 -03:00
Namhyung Kim	657852135d	perf annotate-data: Fix global variable lookup The recent change in the global variable handling added a bug to miss setting the return value even if it found a data type. Also add the type name in the debug message. Fixes: `1ebb5e17ef` ("perf annotate-data: Add get_global_var_type()") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240405211800.1412920-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-12 12:02:05 -03:00
Namhyung Kim	f3408580ba	perf lock contention: Add a missing NULL check I got a report for a failure in BPF verifier on a recent kernel with perf lock contention command. It checks task->sighand->siglock without checking if sighand is NULL or not. Let's add one. ; if (&curr->sighand->siglock == (void )lock) 265: (79) r1 = (u64 *)(r0 +2624) ; frame1: R0_w=trusted_ptr_task_struct(off=0,imm=0) ; R1_w=rcu_ptr_or_null_sighand_struct(off=0,imm=0) 266: (b7) r2 = 0 ; frame1: R2_w=0 267: (0f) r1 += r2 R1 pointer arithmetic on rcu_ptr_or_null_ prohibited, null-check it first processed 164 insns (limit 1000000) max_states_per_insn 1 total_states 15 peak_states 15 mark_read 5 -- END PROG LOAD LOG -- libbpf: prog 'contention_end': failed to load: -13 libbpf: failed to load object 'lock_contention_bpf' libbpf: failed to load BPF skeleton 'lock_contention_bpf': -13 Failed to load lock-contention BPF skeleton lock contention BPF setup failed lock contention did not detect any lock contention Fixes: `1811e82767` ("perf lock contention: Track and show siglock with address") Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240409225542.1870999-1-namhyung@kernel.org	2024-04-11 10:30:06 -07:00
Namhyung Kim	2b8dbf69ec	perf annotate: Make sure to call symbol__annotate2() in TUI The symbol__annotate2() initializes some data structures needed by TUI. It has a logic to prevent calling it multiple times by checking if it has the annotated source. But data type profiling uses a different code (symbol__annotate) to allocate the annotated lines in advance. So TUI missed to call symbol__annotate2() when it shows the annotation browser. Make symbol__annotate() reentrant and handle that situation properly. This fixes a crash in the annotation browser started by perf report in TUI like below. $ perf report -s type,sym --tui # and press 'a' key and then move down Fixes: `81e57deec3` ("perf report: Support data type profiling") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240405211800.1412920-2-namhyung@kernel.org	2024-04-11 10:14:58 -07:00
Namhyung Kim	8c004c7a60	perf annotate: Move 'start' field struct to 'struct annotated_source' It's only used in 'perf annotate' output which means functions with actual samples. No need to consume memory for every symbol ('struct annotation'). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240404175716.1225482-10-namhyung@kernel.org Cc: Ian Rogers <irogers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: LKML <linux-kernel@vger.kernel.org> Cc: <linux-perf-users@vger.kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-08 17:43:20 -03:00
Namhyung Kim	6f94a72d45	perf annotate: Move nr_events struct to 'struct annotated_source' It's only used in 'perf annotate' output which means functions with actual samples. No need to consume memory for every symbol ('struct annotation'). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240404175716.1225482-9-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-08 17:43:20 -03:00
Namhyung Kim	f6b18ababa	perf annotate: Move 'max_jump_sources' struct to 'struct annotated_source' It's only used in 'perf annotate' output which means functions with actual samples. No need to consume memory for every symbol ('struct annotation'). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240404175716.1225482-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-08 17:43:20 -03:00
Namhyung Kim	a46acc4567	perf annotate: Move 'widths' struct to 'struct annotated_source' It's only used in 'perf annotate' output which means functions with actual samples. No need to consume memory for every symbol ('struct annotation'). Also move the 'max_line_len' field into it as it's related. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240404175716.1225482-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-08 17:43:20 -03:00
Namhyung Kim	cee9b86043	perf annotate: Get rid of offsets array The struct annotated_source.offsets[] is to save pointers to annotation_line at each offset. We can use annotated_source__get_line() helper instead so let's get rid of the array. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240404175716.1225482-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-08 17:43:20 -03:00
Namhyung Kim	0c053ed273	perf annotate: Check annotation lines more efficiently In some places, it checks annotated (disasm) lines for each byte. But as it already has a list of disasm lines, it'd be better to traverse the list entries instead of checking every offset with linear search (by annotated_source__get_line() helper). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240404175716.1225482-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-08 17:43:20 -03:00
Namhyung Kim	6f157d9af1	perf annotate: Introduce annotated_source__get_line() It's a helper function to get annotation_line at the given offset without using the offsets array. The goal is to get rid of the offsets array altogether. It just does the linear search but I think it's better to save memory as it won't be called in a hot path. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240404175716.1225482-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-08 17:43:20 -03:00
Namhyung Kim	bfd98ceb62	perf annotate: Staticize some local functions I found annotation__mark_jump_targets(), annotation__set_offsets() and annotation__init_column_widths() are only used in the same file. Let's make them static. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240404175716.1225482-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-08 17:43:20 -03:00
Namhyung Kim	aaf494cf48	perf annotate: Fix annotation_calc_lines() to pass correct address to get_srcline() It should pass a proper address (i.e. suitable for objdump or addr2line) to get_srcline() in order to work correctly. It used to pass an address with map__rip_2objdump() as the second argument but later it's changed to use notes->start. It's ok in normal cases but it can be changed when annotate_opts.full_addr is set. So let's convert the address directly instead of using the notes->start. Also the last argument is an IP to print symbol offset if requested. So it should pass symbol-relative address. Fixes: `7d18a824b5` ("perf annotate: Toggle full address <-> offset display") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240404175716.1225482-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-08 17:43:20 -03:00
Adrian Hunter	218c200f67	perf script: Consolidate capstone print functions Consolidate capstone print functions, to reduce duplication. Amend call sites to use a file pointer for output, which is consistent with most perf tools print functions. Add print_opts with an option to print also the hex value of a resolved symbol+offset. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20240401210925.209671-4-ak@linux.intel.com Signed-off-by: Andi Kleen <ak@linux.intel.com> [ Added missing inttypes.h include to use PRIx64 in util/print_insn.c ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-08 17:42:27 -03:00
Andi Kleen	d812044688	perf script: Add capstone support for '-F +brstackdisasm' Support capstone output for the '-F +brstackinsn' branch dump. The new output is enabled with the new field 'brstackdisasm'. This was possible before with --xed, but now also allow it for users that don't have xed using the builtin capstone support. Before: perf record -b emacs -Q --batch '()' perf script -F +brstackinsn ... emacs 55778 1814366.755945: 151564 cycles:P: 7f0ab2d17192 intel_check_word.constprop.0+0x162 (/usr/lib64/ld-linux-x86-64.s> intel_check_word.constprop.0+237: 00007f0ab2d1711d insn: 75 e6 # PRED 3 cycles [3] 00007f0ab2d17105 insn: 73 51 00007f0ab2d17107 insn: 48 89 c1 00007f0ab2d1710a insn: 48 39 ca 00007f0ab2d1710d insn: 73 96 00007f0ab2d1710f insn: 48 8d 04 11 00007f0ab2d17113 insn: 48 d1 e8 00007f0ab2d17116 insn: 49 8d 34 c1 00007f0ab2d1711a insn: 44 3a 06 00007f0ab2d1711d insn: 75 e6 # PRED 3 cycles [6] 3.00 IPC 00007f0ab2d17105 insn: 73 51 # PRED 1 cycles [7] 1.00 IPC 00007f0ab2d17158 insn: 48 8d 50 01 00007f0ab2d1715c insn: eb 92 # PRED 1 cycles [8] 2.00 IPC 00007f0ab2d170f0 insn: 48 39 ca 00007f0ab2d170f3 insn: 73 b0 # PRED 1 cycles [9] 2.00 IPC After (perf must be compiled with capstone): perf script -F +brstackdisasm ... emacs 55778 1814366.755945: 151564 cycles:P: 7f0ab2d17192 intel_check_word.constprop.0+0x162 (/usr/lib64/ld-linux-x86-64.s> intel_check_word.constprop.0+237: 00007f0ab2d1711d jne intel_check_word.constprop.0+0xd5 # PRED 3 cycles [3] 00007f0ab2d17105 jae intel_check_word.constprop.0+0x128 00007f0ab2d17107 movq %rax, %rcx 00007f0ab2d1710a cmpq %rcx, %rdx 00007f0ab2d1710d jae intel_check_word.constprop.0+0x75 00007f0ab2d1710f leaq (%rcx, %rdx), %rax 00007f0ab2d17113 shrq $1, %rax 00007f0ab2d17116 leaq (%r9, %rax, 8), %rsi 00007f0ab2d1711a cmpb (%rsi), %r8b 00007f0ab2d1711d jne intel_check_word.constprop.0+0xd5 # PRED 3 cycles [6] 3.00 IPC 00007f0ab2d17105 jae intel_check_word.constprop.0+0x128 # PRED 1 cycles [7] 1.00 IPC 00007f0ab2d17158 leaq 1(%rax), %rdx 00007f0ab2d1715c jmp intel_check_word.constprop.0+0xc0 # PRED 1 cycles [8] 2.00 IPC 00007f0ab2d170f0 cmpq %rcx, %rdx 00007f0ab2d170f3 jae intel_check_word.constprop.0+0x75 # PRED 1 cycles [9] 2.00 IPC Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/20240401210925.209671-3-ak@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-05 10:43:07 -03:00
Andi Kleen	38ab60132b	perf script: Support 32bit code under 64bit OS with capstone Use the DSO to resolve whether an IP is 32bit or 64bit and use that to configure capstone to the correct mode. This allows to correctly disassemble 32bit code under a 64bit OS. % cat > loop.c volatile int var; int main(void) { int i; for (i = 0; i < 100000; i++) var++; } % gcc -m32 -o loop loop.c % perf record -e cycles:u ./loop % perf script -F +disasm loop 82665 1833176.618023: 1 cycles:u: f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2) movl %esp, %eax loop 82665 1833176.618029: 1 cycles:u: f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2) movl %esp, %eax loop 82665 1833176.618031: 7 cycles:u: f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2) movl %esp, %eax loop 82665 1833176.618034: 91 cycles:u: f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2) movl %esp, %eax loop 82665 1833176.618036: 1242 cycles:u: f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2) movl %esp, %eax Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/20240401210925.209671-2-ak@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-05 09:42:36 -03:00
Thomas Richter	c2f3d7dfc7	perf stat: Do not fail on metrics on s390 z/VM systems On s390 z/VM virtual machines command 'perf list' also displays metrics: # perf list \| grep -A 20 'Metric Groups:' Metric Groups: No_group: cpi [Cycles per Instruction] est_cpi [Estimated Instruction Complexity CPI infinite Level 1] finite_cpi [Cycles per Instructions from Finite cache/memory] l1mp [Level One Miss per 100 Instructions] l2p [Percentage sourced from Level 2 cache] l3p [Percentage sourced from Level 3 on same chip cache] l4lp [Percentage sourced from Level 4 Local cache on same book] l4rp [Percentage sourced from Level 4 Remote cache on different book] memp [Percentage sourced from memory] .... # The command # perf stat -M cpi -- true event syntax error: '{CPU_CYCLES/metric-id=CPU_CYCLES/.....' \___ Bad event or PMU Unable to find PMU or event on a PMU of 'CPU_CYCLES' event syntax error: '{CPU_CYCLES/metric-id=CPU_CYCLES/...' \___ Cannot find PMU `CPU_CYCLES'. Missing kernel support? # fails. 'perf stat' should not fail on metrics when the referenced CPU Counter Measurement PMU is not available. Output after: # perf stat -M est_cpi -- sleep 1 Performance counter stats for 'sleep 1': 1,000,887,494 ns duration_time # 0.00 est_cpi 1.000887494 seconds time elapsed 0.000143000 seconds user 0.000662000 seconds sys # Fixes: `7f76b31130` ("perf list: Add IBM z16 event description for s390") Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20240404064806.1362876-2-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-04 18:10:11 -03:00
Thomas Richter	b74bc5a633	perf report: Fix PAI counter names for s390 virtual machines s390 introduced the Processor Activity Instrumentation (PAI) counter facility on LPAR and virtual machines z/VM for models 3931 and 3932. These counters are stored as raw data in the perf.data file and are displayed with: # perf report -i /tmp//perfout-635468 -D \| grep Counter Counter:007 <unknown> Value:0x00000000000186a0 Counter:032 <unknown> Value:0x0000000000000001 Counter:032 <unknown> Value:0x0000000000000001 Counter:032 <unknown> Value:0x0000000000000001 # However on z/VM virtual machines, the counter names are not retrieved from the PMU and are shown as '<unknown>'. This is caused by the CPU string saved in the mapfile.csv for this machine: ^IBM.393[12].*3\.7.[[:xdigit:]]+$,3,cf_z16,core This string contains the CPU Measurement facility first and second version number and authorization level (3\.7.[[:xdigit:]]+). These numbers do not apply to the PAI counter facility. In fact they can be omitted. Shorten the CPU identification string for this machine to manufacturer and model. This is sufficient for all PMU devices. Output after: # perf report -i /tmp//perfout-635468 -D \| grep Counter Counter:007 km_aes_128 Value:0x00000000000186a0 Counter:032 kma_gcm_aes_256 Value:0x0000000000000001 Counter:032 kma_gcm_aes_256 Value:0x0000000000000001 Counter:032 kma_gcm_aes_256 Value:0x0000000000000001 # Fixes: `b539deafba` ("perf report: Add s390 raw data interpretation for PAI counters") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20240404064806.1362876-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-04 18:08:21 -03:00
Arnaldo Carvalho de Melo	b6347cb5e0	perf annotate: Initialize 'arch' variable not to trip some -Werror=maybe-uninitialized In some older distros the build is failing due to -Werror=maybe-uninitialized, in this case we know that this isn't the case because 'arch' gets initialized by evsel__get_arch(), so make sure it is initialized to NULL before returning from evsel__get_arch(), as suggested by Ian Rogers. E.g.: 32 17.12 opensuse:15.5 : FAIL gcc version 7.5.0 (SUSE Linux) util/annotate.c: In function 'hist_entry__get_data_type': util/annotate.c:2269:15: error: 'arch' may be used uninitialized in this function [-Werror=maybe-uninitialized] struct arch *arch; ^~~~ cc1: all warnings being treated as errors 43 7.30 ubuntu:18.04-x-powerpc64el : FAIL gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) util/annotate.c: In function 'hist_entry__get_data_type': util/annotate.c:2351:36: error: 'arch' may be used uninitialized in this function [-Werror=maybe-uninitialized] if (map__dso(ms->map)->kernel && arch__is(arch, "x86") && ^~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/CAP-5=fUqtjxAsmdGrnkjhUTLHs-JvV10TtxyocpYDJK_+LYTiQ@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-03 18:05:19 -03:00
Yang Jihong	baa2ca59ec	perf build: Add LIBTRACEEVENT_DIR build option Currently, when libtraceevent is not linked, perf does not support tracepoint: # ./perf record -e sched:sched_switch -a sleep 10 event syntax error: 'sched:sched_switch' \___ unsupported tracepoint libtraceevent is necessary for tracepoint support Run 'perf list' for a list of valid events Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -e, --event <event> event selector. use 'perf list' to list available events For cross-compilation scenario, library may not be installed in the default system path. Based on the above requirements, add LIBTRACEEVENT_DIR build option to support specifying path of libtraceevent. Example: 1. Cross compile libtraceevent # cd /opt/libtraceevent # CROSS_COMPILE=aarch64-linux-gnu- make 2. Cross compile perf # cd tool/perf # make VF=1 ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- NO_LIBELF=1 LDFLAGS=--static LIBTRACEEVENT_DIR=/opt/libtraceevent <SNIP> Auto-detecting system features: <SNIP> ... LIBTRACEEVENT_DIR: /opt/libtraceevent Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240314063000.2139877-1-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-03 18:05:19 -03:00
Yang Jihong	089ef2f4c8	perf beauty: Fix AT_EACCESS undeclared build error for system with kernel versions lower than v5.8 In the environment of ubuntu 20.04 (the version of kernel headers is 5.4), there is an error in building perf: CC trace/beauty/fs_at_flags.o trace/beauty/fs_at_flags.c: In function ‘faccessat2__scnprintf_flags’: trace/beauty/fs_at_flags.c:35:14: error: ‘AT_EACCESS’ undeclared (first use in this function); did you mean ‘DN_ACCESS’? 35 \| if (flags & AT_EACCESS) { \| ^~~~~~~~~~ \| DN_ACCESS trace/beauty/fs_at_flags.c:35:14: note: each undeclared identifier is reported only once for each function it appears in commit `8a1ad44135` ("tools headers: Remove now unused copies of uapi/{fcntl,openat2}.h and asm/fcntl.h") removes fcntl.h from tools headers directory, and fs_at_flags.c uses the 'AT_EACCESS' macro. This macro was introduced in the kernel version v5.8. For system with a kernel version older than this version, it will cause compilation to fail. Fixes: `8a1ad44135` ("tools headers: Remove now unused copies of uapi/{fcntl,openat2}.h and asm/fcntl.h") Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240403122558.1438841-1-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-03 11:48:57 -03:00
Namhyung Kim	92dfc59463	perf annotate: Add symbol name when using capstone This is to keep the existing behavior with objdump. It needs to show symbol information of global variables like below: Percent \| Source code & Disassembly of elf for cycles:P (1 samples, percent: local period) ------------------------------------------------------------------------------------------------ : 0 0xffffffff81338f70 <vm_normal_page>: 0.00 : ffffffff81338f70: endbr64 0.00 : ffffffff81338f74: callq 0xffffffff81083a40 0.00 : ffffffff81338f79: movq %rdi, %r8 0.00 : ffffffff81338f7c: movq %rdx, %rdi 0.00 : ffffffff81338f7f: callq *0x17021c3(%rip) # ffffffff82a3b148 <pv_ops+0x1e8> 0.00 : ffffffff81338f85: movq 0xffbf3c(%rip), %rdx # ffffffff82334ec8 <physical_mask> 0.00 : ffffffff81338f8c: testq %rax, %rax ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 0.00 : ffffffff81338f8f: je 0xffffffff81338fd0 here 0.00 : ffffffff81338f91: movq %rax, %rcx 0.00 : ffffffff81338f94: andl $1, %ecx Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240329215812.537846-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-03 11:48:57 -03:00
Namhyung Kim	6d17edc113	perf annotate: Use libcapstone to disassemble Now it can use the capstone library to disassemble the instructions. Let's use that (if available) for perf annotate to speed up. Currently it only supports x86 architecture. With this change I can see ~3x speed up in data type profiling. But note that capstone cannot give the source file and line number info. For now, users should use the external objdump for that by specifying the --objdump option explicitly. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240329215812.537846-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-03 11:48:57 -03:00
Namhyung Kim	98f69a573c	perf annotate: Split out util/disasm.c The util/annotate.c code has both disassembly and sample annotation related codes. Factor out the disasm part so that it can be handled more easily. No functional changes intended. Committer notes: Add missing include env.h, util.h, bpf-event.h and bpf-util.h to disasm.c, to fix things like: util/disasm.c: In function ‘symbol__disassemble_bpf’: util/disasm.c:1203:9: error: implicit declaration of function ‘perf_exe’ [-Werror=implicit-function-declaration] 1203 \| perf_exe(tpath, sizeof(tpath)); \| ^~~~~~~~ Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240329215812.537846-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-03 11:48:57 -03:00
Namhyung Kim	10adbf7776	perf annotate: Add and use ins__is_nop() Likewise, add ins__is_nop() to check if the current instruction is NOP. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240329215812.537846-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-03 11:48:57 -03:00
Namhyung Kim	ad399baa06	perf annotate: Use ins__is_xxx() if possible This is to prepare separation of disasm related code. Use the public ins API instead of checking the internal data structure. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240329215812.537846-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-03 11:48:56 -03:00
Yang Jihong	09d2056efe	perf evsel: Use evsel__name_is() helper Code cleanup, replace strcmp(evsel__name(evsel, {NAME})) with evsel__name_is() helper. No functional change. Committer notes: Fix this build error: trace.syscalls.events.bpf_output = evlist__last(trace.evlist); - assert(evsel__name_is(trace.syscalls.events.bpf_output), "__augmented_syscalls__"); + assert(evsel__name_is(trace.syscalls.events.bpf_output, "__augmented_syscalls__")); Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240401062724.1006010-3-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-03 11:48:56 -03:00
Yang Jihong	6e4b398770	perf sched timehist: Fix -g/--call-graph option failure When 'perf sched' enables the call-graph recording, sample_type of dummy event does not have PERF_SAMPLE_CALLCHAIN, timehist_check_attr() checks that the evsel does not have a callchain, and set show_callchain to 0. Currently 'perf sched timehist' only saves callchain when processing the 'sched:sched_switch event', timehist_check_attr() only needs to determine whether the event has PERF_SAMPLE_CALLCHAIN. Before: # perf sched record -g true [ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 4.153 MB perf.data (7536 samples) ] # perf sched timehist Samples do not have callchains. time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- 147851.826019 [0000] perf[285035] 0.000 0.000 0.000 147851.826029 [0000] migration/0[15] 0.000 0.003 0.009 147851.826063 [0001] perf[285035] 0.000 0.000 0.000 147851.826069 [0001] migration/1[21] 0.000 0.003 0.006 <SNIP> After: # perf sched record -g true [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.572 MB perf.data (822 samples) ] # perf sched timehist time cpu task name waittime sch delay runtime [tid/pid] (msec) (msec) (msec) ----------- --- --------------- -------- -------- ----- 4193.035164 [0] perf[277062] 0.000 0.000 0.000 __traceiter_sched_switch <- __traceiter_sched_switch <- __sched_text_start <- preempt_schedule_common <- __cond_resched <- __wait_for_common <- wait_for_completion 4193.035174 [0] migration/0[15] 0.000 0.003 0.009 __traceiter_sched_switch <- __traceiter_sched_switch <- __sched_text_start <- smpboot_thread_fn <- kthread <- ret_from_fork 4193.035207 [1] perf[277062] 0.000 0.000 0.000 __traceiter_sched_switch <- __traceiter_sched_switch <- __sched_text_start <- preempt_schedule_common <- __cond_resched <- __wait_for_common <- wait_for_completion 4193.035214 [1] migration/1[21] 0.000 0.003 0.007 __traceiter_sched_switch <- __traceiter_sched_switch <- __sched_text_start <- smpboot_thread_fn <- kthread <- ret_from_fork <SNIP> Fixes: `9c95e4ef06` ("perf evlist: Add evlist__findnew_tracking_event() helper") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20240401062724.1006010-2-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-03 11:48:56 -03:00
Namhyung Kim	bdeaf6ffec	perf annotate: Honor output options with --data-type For data type profiling output, it should be in sync with normal output so make it display percentage for each field. Also use coloring scheme for users to identify fields with big overhead easily. Users can use --show-total-period or --show-nr-samples to change the output style like in the normal perf annotate output. Before: $ perf annotate --data-type Annotate type: 'struct task_struct' in [kernel.kallsyms] (34 samples): ============================================================================ samples offset size field 34 0 9792 struct task_struct { 2 0 24 struct thread_info thread_info { 0 0 8 long unsigned int flags; 1 8 8 long unsigned int syscall_work; 0 16 4 u32 status; 1 20 4 u32 cpu; }; After: $ perf annotate --data-type Annotate type: 'struct task_struct' in [kernel.kallsyms] (34 samples): ============================================================================ Percent offset size field 100.00 0 9792 struct task_struct { 3.55 0 24 struct thread_info thread_info { 0.00 0 8 long unsigned int flags; 1.63 8 8 long unsigned int syscall_work; 0.00 16 4 u32 status; 1.91 20 4 u32 cpu; }; Committer testing: First collect a suitable perf.data file for use with 'perf annotate --data-type': root@number:~# perf mem record -a sleep 1s [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 11.047 MB perf.data (3466 samples) ] root@number:~# Then, before: root@number:~# perf annotate --data-type Annotate type: 'union ' in /usr/lib64/libc.so.6 (6 samples): ============================================================================ samples offset size field 6 0 40 union { 6 0 40 struct __pthread_mutex_s __data { 2 0 4 int __lock; 0 4 4 unsigned int __count; 0 8 4 int __owner; 1 12 4 unsigned int __nusers; 2 16 4 int __kind; 1 20 2 short int __spins; 0 22 2 short int __elision; 0 24 16 __pthread_list_t __list { 0 24 8 struct __pthread_internal_list* __prev; 0 32 8 struct __pthread_internal_list* __next; }; }; 0 0 0 char* __size; 2 0 8 long int __align; }; <SNIP> And after: Annotate type: 'union ' in /usr/lib64/libc.so.6 (6 samples): ============================================================================ Percent offset size field 100.00 0 40 union { 100.00 0 40 struct __pthread_mutex_s __data { 31.27 0 4 int __lock; 0.00 4 4 unsigned int __count; 0.00 8 4 int __owner; 7.67 12 4 unsigned int __nusers; 53.10 16 4 int __kind; 7.96 20 2 short int __spins; 0.00 22 2 short int __elision; 0.00 24 16 __pthread_list_t __list { 0.00 24 8 struct __pthread_internal_list* __prev; 0.00 32 8 struct __pthread_internal_list* __next; }; }; 0.00 0 0 char* __size; 31.27 0 8 long int __align; }; <SNIP> The lines with percentages >= 7.67 have its percentages red colored. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240322224313.423181-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-03 11:48:56 -03:00
Namhyung Kim	374af9f1f0	perf annotate: Get rid of duplicate --group option item The options array in cmd_annotate() has duplicate --group options. It only needs one and let's get rid of the other. $ perf annotate -h 2>&1 \| grep group --group Show event group information together --group Show event group information together Fixes: `7ebaf4890f` ("perf annotate: Support '--group' option") Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240322224313.423181-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-04-03 11:48:56 -03:00
Alexander Lobakin	7d8296b250	bitops: make BYTES_TO_BITS() treewide-available Avoid open-coding that simple expression each time by moving BYTES_TO_BITS() from the probes code to <linux/bitops.h> to export it to the rest of the kernel. Simplify the macro while at it. `BITS_PER_LONG / sizeof(long)` always equals to %BITS_PER_BYTE, regardless of the target architecture. Do the same for the tools ecosystem as well (incl. its version of bitops.h). The previous implementation had its implicit type of long, while the new one is int, so adjust the format literal accordingly in the perf code. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-01 10:49:27 +01:00
Linus Torvalds	c150b809f7	RISC-V Patches for the 6.9 Merge Window * Support for various vector-accelerated crypto routines. * Hibernation is now enabled for portable kernel builds. * mmap_rnd_bits_max is larger on systems with larger VAs. * Support for fast GUP. * Support for membarrier-based instruction cache synchronization. * Support for the Andes hart-level interrupt controller and PMU. * Some cleanups around unaligned access speed probing and Kconfig settings. * Support for ACPI LPI and CPPC. * Various cleanus related to barriers. * A handful of fixes. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmX9icgTHHBhbG1lckBk YWJiZWx0LmNvbQAKCRAuExnzX7sYib+UD/4xyL6UMixx6A06BVBL9UT4vOrxRvNr JIihG5y5QNMjes9DHWL35mZTMqFtQ0tq94ViWFLmJWloV/8KRVM2C9R9KX7vplf3 M/OwvP106spxgvNHoeQbycgs42RU1t2mpqT7N1iK2hCjqieP3vLn6hsSLXWTAG0L 3gQbQw6XCLC3hPyLq+nbFY2i4faeCmpXWmixoy/IvQ5calZQrRU0LNlP6lcMBhVo uocjG0uGAhrahw2s81jxcMZcxa3AvUCiplapdD5H5v9rBM85SkYJj2Q9SqdSorkb xzuimRnKPI5s47yM3pTfZY0qnQUYHV7PXXuw4WujpCQVQdhaG+Ggq63UUZA61J9t IzZK2zdcfHqICrGTtXImUzRT3dcc3oq+IFq4tTY+rEJm29hrXkAtx+qBm5xtMvax fJz5feJ/iT0u7MDj4Oq24n+Kpl+Olm+MJaZX3m5Ovi/9V6a9iK9HXqxg9/Fs0fMO +J/0kTgd8Vu9CYH7KNWz3uztcO9eMAH3VyzuXuab4BGj1i1Y/9EjpALQi7rDN73S OsYQX6NnzMkBV4dvElJVLXiPlvNlMHZZwdak5CqPb48jaJu6iiIZAuvOrG6/naGP wnQSLVA2WWWoOkl3AJhxfpa11CLhbMl9E2gYm1VtNvASXoSFIxlAq1Yv3sG8yjty 4ZT0rYFJOstYiQ== =3dL5 -----END PGP SIGNATURE----- Merge tag 'riscv-for-linus-6.9-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for various vector-accelerated crypto routines - Hibernation is now enabled for portable kernel builds - mmap_rnd_bits_max is larger on systems with larger VAs - Support for fast GUP - Support for membarrier-based instruction cache synchronization - Support for the Andes hart-level interrupt controller and PMU - Some cleanups around unaligned access speed probing and Kconfig settings - Support for ACPI LPI and CPPC - Various cleanus related to barriers - A handful of fixes * tag 'riscv-for-linus-6.9-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (66 commits) riscv: Fix syscall wrapper for >word-size arguments crypto: riscv - add vector crypto accelerated AES-CBC-CTS crypto: riscv - parallelize AES-CBC decryption riscv: Only flush the mm icache when setting an exec pte riscv: Use kcalloc() instead of kzalloc() riscv/barrier: Add missing space after ',' riscv/barrier: Consolidate fence definitions riscv/barrier: Define RISCV_FULL_BARRIER riscv/barrier: Define __{mb,rmb,wmb} RISC-V: defconfig: Enable CONFIG_ACPI_CPPC_CPUFREQ cpufreq: Move CPPC configs to common Kconfig and add RISC-V ACPI: RISC-V: Add CPPC driver ACPI: Enable ACPI_PROCESSOR for RISC-V ACPI: RISC-V: Add LPI driver cpuidle: RISC-V: Move few functions to arch/riscv riscv: Introduce set_compat_task() in asm/compat.h riscv: Introduce is_compat_thread() into compat.h riscv: add compile-time test into is_compat_task() riscv: Replace direct thread flag check with is_compat_task() riscv: Improve arch_get_mmap_end() macro ...	2024-03-22 10:41:13 -07:00
Arnaldo Carvalho de Melo	4962e19496	perf beauty: Move uapi/linux/vhost.h copy out of the directory used to build perf It is only used to generate string tables, not to build perf, so move it to the tools/perf/trace/beauty/include/ hierarchy, that is used just for scraping. This is a something that should've have happened, as happened with the linux/socket.h scrapper, do it now as Ian suggested while doing an audit/refactor session in the headers used by perf. No other tools/ living code uses it, just <linux/vhost.h> coming from either 'make install_headers' or from the system /usr/include/ directory. Suggested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 20:44:35 -03:00
Ian Rogers	b3ad832d8d	perf dso: Reorder members to save space in 'struct dso' Save 40 bytes and move from 8 to 7 cache lines. Make member dwfl dependent on being a powerpc build. Squeeze bits of int/enum types when appropriate. Remove holes/padding by reordering variables. Before: struct dso { struct mutex lock; /* 0 40 / struct list_head node; / 40 16 / struct rb_node rb_node __attribute__((__aligned__(8))); / 56 24 / / --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- / struct rb_root root; /* 80 8 / struct rb_root_cached symbols; / 88 16 / struct symbol * symbol_names; /* 104 8 / size_t symbol_names_len; / 112 8 / struct rb_root_cached inlined_nodes; / 120 16 / / --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- / struct rb_root_cached srclines; / 136 16 / struct { u64 addr; / 152 8 / struct symbol symbol; /* 160 8 / } last_find_result; / 152 16 / void a2l; /* 168 8 / char symsrc_filename; /* 176 8 / unsigned int a2l_fails; / 184 4 / enum dso_space_type kernel; / 188 4 / / --- cacheline 3 boundary (192 bytes) --- / _Bool is_kmod; / 192 1 / / XXX 3 bytes hole, try to pack / enum dso_swap_type needs_swap; / 196 4 / enum dso_binary_type symtab_type; / 200 4 / enum dso_binary_type binary_type; / 204 4 / enum dso_load_errno load_errno; / 208 4 / u8 adjust_symbols:1; / 212: 0 1 / u8 has_build_id:1; / 212: 1 1 / u8 header_build_id:1; / 212: 2 1 / u8 has_srcline:1; / 212: 3 1 / u8 hit:1; / 212: 4 1 / u8 annotate_warned:1; / 212: 5 1 / u8 auxtrace_warned:1; / 212: 6 1 / u8 short_name_allocated:1; / 212: 7 1 / u8 long_name_allocated:1; / 213: 0 1 / u8 is_64_bit:1; / 213: 1 1 / / XXX 6 bits hole, try to pack / _Bool sorted_by_name; / 214 1 / _Bool loaded; / 215 1 / u8 rel; / 216 1 / / XXX 7 bytes hole, try to pack / struct build_id bid; / 224 32 / / --- cacheline 4 boundary (256 bytes) --- / u64 text_offset; / 256 8 / u64 text_end; / 264 8 / const char short_name; /* 272 8 / const char long_name; /* 280 8 / u16 long_name_len; / 288 2 / u16 short_name_len; / 290 2 / / XXX 4 bytes hole, try to pack / void dwfl; /* 296 8 / struct auxtrace_cache auxtrace_cache; /* 304 8 / int comp; / 312 4 / / XXX 4 bytes hole, try to pack / / --- cacheline 5 boundary (320 bytes) --- / struct { struct rb_root cache; / 320 8 / int fd; / 328 4 / int status; / 332 4 / u32 status_seen; / 336 4 / / XXX 4 bytes hole, try to pack / u64 file_size; / 344 8 / struct list_head open_entry; / 352 16 / u64 elf_base_addr; / 368 8 / u64 debug_frame_offset; / 376 8 / / --- cacheline 6 boundary (384 bytes) --- / u64 eh_frame_hdr_addr; / 384 8 / u64 eh_frame_hdr_offset; / 392 8 / } data; / 320 80 / struct { u32 id; / 400 4 / u32 sub_id; / 404 4 / struct perf_env env; /* 408 8 / } bpf_prog; / 400 16 / union { void priv; /* 416 8 / u64 db_id; / 416 8 / }; / 416 8 / struct nsinfo nsinfo; /* 424 8 / struct dso_id id; / 432 24 / / --- cacheline 7 boundary (448 bytes) was 8 bytes ago --- / refcount_t refcnt; / 456 4 / char name[]; / 460 0 / / size: 464, cachelines: 8, members: 49 / / sum members: 440, holes: 4, sum holes: 18 / / sum bitfield members: 10 bits, bit holes: 1, sum bit holes: 6 bits / / padding: 4 / / forced alignments: 1 / / last cacheline: 16 bytes / } __attribute__((__aligned__(8))); After: struct dso { struct mutex lock; / 0 40 / struct list_head node; / 40 16 / struct rb_node rb_node __attribute__((__aligned__(8))); / 56 24 / / --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- / struct rb_root root; /* 80 8 / struct rb_root_cached symbols; / 88 16 / struct symbol * symbol_names; /* 104 8 / size_t symbol_names_len; / 112 8 / struct rb_root_cached inlined_nodes; / 120 16 / / --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- / struct rb_root_cached srclines; / 136 16 / struct { u64 addr; / 152 8 / struct symbol symbol; /* 160 8 / } last_find_result; / 152 16 / struct build_id bid; / 168 32 / / --- cacheline 3 boundary (192 bytes) was 8 bytes ago --- / u64 text_offset; / 200 8 / u64 text_end; / 208 8 / const char short_name; /* 216 8 / const char long_name; /* 224 8 / void a2l; /* 232 8 / char symsrc_filename; /* 240 8 / struct nsinfo nsinfo; /* 248 8 / / --- cacheline 4 boundary (256 bytes) --- / struct auxtrace_cache auxtrace_cache; /* 256 8 / union { void priv; /* 264 8 / u64 db_id; / 264 8 / }; / 264 8 / struct { struct perf_env env; /* 272 8 / u32 id; / 280 4 / u32 sub_id; / 284 4 / } bpf_prog; / 272 16 / struct { struct rb_root cache; / 288 8 / struct list_head open_entry; / 296 16 / u64 file_size; / 312 8 / / --- cacheline 5 boundary (320 bytes) --- / u64 elf_base_addr; / 320 8 / u64 debug_frame_offset; / 328 8 / u64 eh_frame_hdr_addr; / 336 8 / u64 eh_frame_hdr_offset; / 344 8 / int fd; / 352 4 / int status; / 356 4 / u32 status_seen; / 360 4 / } data; / 288 80 / / XXX last struct has 4 bytes of padding / struct dso_id id; / 368 24 / / --- cacheline 6 boundary (384 bytes) was 8 bytes ago --- / unsigned int a2l_fails; / 392 4 / int comp; / 396 4 / refcount_t refcnt; / 400 4 / enum dso_load_errno load_errno; / 404 4 / u16 long_name_len; / 408 2 / u16 short_name_len; / 410 2 / enum dso_binary_type symtab_type:8; / 412: 0 4 / enum dso_binary_type binary_type:8; / 412: 8 4 / enum dso_space_type kernel:2; / 412:16 4 / enum dso_swap_type needs_swap:2; / 412:18 4 / / Bitfield combined with next fields / _Bool is_kmod:1; / 414: 4 1 / u8 adjust_symbols:1; / 414: 5 1 / u8 has_build_id:1; / 414: 6 1 / u8 header_build_id:1; / 414: 7 1 / u8 has_srcline:1; / 415: 0 1 / u8 hit:1; / 415: 1 1 / u8 annotate_warned:1; / 415: 2 1 / u8 auxtrace_warned:1; / 415: 3 1 / u8 short_name_allocated:1; / 415: 4 1 / u8 long_name_allocated:1; / 415: 5 1 / u8 is_64_bit:1; / 415: 6 1 / / XXX 1 bit hole, try to pack / _Bool sorted_by_name; / 416 1 / _Bool loaded; / 417 1 / u8 rel; / 418 1 / char name[]; / 419 0 / / size: 424, cachelines: 7, members: 48 / / sum members: 415 / / sum bitfield members: 31 bits, bit holes: 1, sum bit holes: 1 bits / / padding: 5 / / paddings: 1, sum paddings: 4 / / forced alignments: 1 / / last cacheline: 40 bytes */ } __attribute__((__aligned__(8))); Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Chengen Du <chengen.du@canonical.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Markus Elfring <Markus.Elfring@web.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20240321160300.1635121-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 20:44:35 -03:00
Anne Macedo	2a5049b75d	perf lock contention: Trim backtrace by skipping traceiter functions The 'perf lock contention' program currently shows the caller of the locks as __traceiter_contention_begin+0x??. This caller can be ignored, as it is from the traceiter itself. Instead, it should show the real callers for the locks. When fiddling with the --stack-skip parameter, the actual callers for the locks start to show up. However, just ignore the __traceiter_contention_begin and the __traceiter_contention_end symbols so the actual callers will show up. Before this patch is applied: sudo perf lock con -a -b -- sleep 3 contended total wait max wait avg wait type caller 8 2.33 s 2.28 s 291.18 ms rwlock:W __traceiter_contention_begin+0x44 4 2.33 s 2.28 s 582.35 ms rwlock:W __traceiter_contention_begin+0x44 7 140.30 ms 46.77 ms 20.04 ms rwlock:W __traceiter_contention_begin+0x44 2 63.35 ms 33.76 ms 31.68 ms mutex trace_contention_begin+0x84 2 46.74 ms 46.73 ms 23.37 ms rwlock:W __traceiter_contention_begin+0x44 1 13.54 us 13.54 us 13.54 us mutex trace_contention_begin+0x84 1 3.67 us 3.67 us 3.67 us rwsem:R __traceiter_contention_begin+0x44 Before this patch is applied - using --stack-skip 5 sudo perf lock con --stack-skip 5 -a -b -- sleep 3 contended total wait max wait avg wait type caller 2 2.24 s 2.24 s 1.12 s rwlock:W do_epoll_wait+0x5a0 4 1.65 s 824.21 ms 412.08 ms rwlock:W do_exit+0x338 2 824.35 ms 824.29 ms 412.17 ms spinlock get_signal+0x108 2 824.14 ms 824.14 ms 412.07 ms rwlock:W release_task+0x68 1 25.22 ms 25.22 ms 25.22 ms mutex cgroup_kn_lock_live+0x58 1 24.71 us 24.71 us 24.71 us spinlock do_exit+0x44 1 22.04 us 22.04 us 22.04 us rwsem:R lock_mm_and_find_vma+0xb0 After this patch is applied: sudo ./perf lock con -a -b -- sleep 3 contended total wait max wait avg wait type caller 4 4.13 s 2.07 s 1.03 s rwlock:W release_task+0x68 2 2.07 s 2.07 s 1.03 s rwlock:R mm_update_next_owner+0x50 2 2.07 s 2.07 s 1.03 s rwlock:W do_exit+0x338 1 41.56 ms 41.56 ms 41.56 ms mutex cgroup_kn_lock_live+0x58 2 36.12 us 18.83 us 18.06 us rwlock:W do_exit+0x338 Signed-off-by: Anne Macedo <retpolanne@posteo.net> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240319143629.3422590-1-retpolanne@posteo.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 20:44:35 -03:00
Linus Torvalds	cba9ffdb99	Including fixes from CAN, netfilter, wireguard and IPsec. Current release - regressions: - rxrpc: fix use of page_frag_alloc_align(), it changed semantics and we added a new caller in a different subtree - xfrm: allow UDP encapsulation only in offload modes Current release - new code bugs: - tcp: fix refcnt handling in __inet_hash_connect() - Revert "net: Re-use and set mono_delivery_time bit for userspace tstamp packets", conflicted with some expectations in BPF uAPI Previous releases - regressions: - ipv4: raw: fix sending packets from raw sockets via IPsec tunnels - devlink: fix devlink's parallel command processing - veth: do not manipulate GRO when using XDP - esp: fix bad handling of pages from page_pool Previous releases - always broken: - report RCU QS for busy network kthreads (with Paul McK's blessing) - tcp/rds: fix use-after-free on netns with kernel TCP reqsk - virt: vmxnet3: fix missing reserved tailroom with XDP Misc: - couple of build fixes for Documentation Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmX8bXsACgkQMUZtbf5S IrsfBg/+KzrEx0tB/Af57ZZGZ5PMjPy+XFDox4iFfHm338UFuGXVvZrXd7G+6YkH ZwWeF5YDPKzwIEiZ5D3hewZPlkLH0Eg88q74chlE0gUv7t1jhuQHUdIVeFnPcLbN t/8AcCZCJ2fENbr1iNnzZON1RW0fVOl+SDxhiSYFeFqii6FywDfqWL/h0u86H/AF KRktgb0LzH0waH6IiefVV1NZyjnZwmQ6+UVQerTzUnQmWhV1xQKoO3MQpZuFRvr6 O+kPZMkrqnTCCy7RO1BexS5cefqc80i5Z25FLGcaHgpnYd2pDNDMMxqrhqO9Y0Pv 6u/tLgRxzVUDXWouzREIRe50Z9GJswkg78zilAhpqYiHRjd8jaBH6y+9mhGFc7F8 iVAx02WfJhlk0aynFf2qZmR7PQIb9XjtFJ7OAeJrno9UD7zAubtikGM/6m6IZfRV TD1mze95RVnNjbHZMeg6oNLFUMJXVTobtvtqk5pTQvsNsmSYGFvkvWC5/P6ycyYt pMx6E0PA/ZCnQAlThCOCzFa5BO+It3RJHcQJhgbOzHrlWKwmrjBKcKJcLLcxFSUt 4wwjdEcG1Bo2wdnsjwsQwJDHQW+M9TSLdLM3YVptM9jbqOMizoqr6/xSykg3H4wZ t/dSiYSsEr06z7lvwbAjUXJ/mfszZ+JsVAFXAN7ahcM4OZb5WTQ= =gpLl -----END PGP SIGNATURE----- Merge tag 'net-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from CAN, netfilter, wireguard and IPsec. I'd like to highlight [ lowlight? - Linus ] Florian W stepping down as a netfilter maintainer due to constant stream of bug reports. Not sure what we can do but IIUC this is not the first such case. Current release - regressions: - rxrpc: fix use of page_frag_alloc_align(), it changed semantics and we added a new caller in a different subtree - xfrm: allow UDP encapsulation only in offload modes Current release - new code bugs: - tcp: fix refcnt handling in __inet_hash_connect() - Revert "net: Re-use and set mono_delivery_time bit for userspace tstamp packets", conflicted with some expectations in BPF uAPI Previous releases - regressions: - ipv4: raw: fix sending packets from raw sockets via IPsec tunnels - devlink: fix devlink's parallel command processing - veth: do not manipulate GRO when using XDP - esp: fix bad handling of pages from page_pool Previous releases - always broken: - report RCU QS for busy network kthreads (with Paul McK's blessing) - tcp/rds: fix use-after-free on netns with kernel TCP reqsk - virt: vmxnet3: fix missing reserved tailroom with XDP Misc: - couple of build fixes for Documentation" * tag 'net-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (59 commits) selftests: forwarding: Fix ping failure due to short timeout MAINTAINERS: step down as netfilter maintainer netfilter: nf_tables: Fix a memory leak in nf_tables_updchain net: dsa: mt7530: fix handling of all link-local frames net: dsa: mt7530: fix link-local frames that ingress vlan filtering ports bpf: report RCU QS in cpumap kthread net: report RCU QS on threaded NAPI repolling rcu: add a helper to report consolidated flavor QS ionic: update documentation for XDP support lib/bitmap: Fix bitmap_scatter() and bitmap_gather() kernel doc netfilter: nf_tables: do not compare internal table flags on updates netfilter: nft_set_pipapo: release elements in clone only from destroy path octeontx2-af: Use separate handlers for interrupts octeontx2-pf: Send UP messages to VF only when VF is up. octeontx2-pf: Use default max_active works instead of one octeontx2-pf: Wait till detach_resources msg is complete octeontx2: Detect the mbox up or down message via register devlink: fix port new reply cmd type tcp: Clear req->syncookie in reqsk_alloc(). net/bnx2x: Prevent access to a freed page in page_pool ...	2024-03-21 14:50:39 -07:00
Ian Rogers	af34a16d30	perf vendor events intel: Remove info metrics erroneously in TopdownL1 Bug affected server metrics only. This doesn't impact default metrics but if the TopdownL1 metric group is specified. Passes on the fix in: `b09f0a3953` Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-13-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:40 -03:00
Ian Rogers	7bce27f8d3	perf vendor events intel: Update snowridgex to 1.22 Update events from 1.21 to 1.22 as released in: `ba4f96039f` Updates various descriptions and removes the event UNC_IIO_NUM_REQ_FROM_CPU.IRP. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-12-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:40 -03:00
Ian Rogers	70e7028c5b	perf vendor events intel: Update skylake to v58 Update events from: `f2e5136e06` This change didn't increase the version number from v58. Updates various descriptions. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-11-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:40 -03:00
Ian Rogers	d70cc755ca	perf vendor events intel: Update skylakex to 1.33 Update events from 1.32 to 1.33 as released in: `3fe7390dd1` Various description updates. Adds the event OFFCORE_RESPONSE.ALL_READS.L3_HIT.HIT_OTHER_CORE_FWD. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-10-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:40 -03:00
Ian Rogers	bf270b15c0	perf vendor events intel: Update sierraforest to 1.02 Update events from 1.01 to 1.02 as released in: `451dd41ae6` Various description updates. Adds topdown events TOPDOWN_BAD_SPECULATION.ALL_P, TOPDOWN_BE_BOUND.ALL_P, TOPDOWN_FE_BOUND.ALL_P and TOPDOWN_RETIRING.ALL_P. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-9-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:40 -03:00
Ian Rogers	2edee9e666	perf vendor events intel: Update sapphirerapids to 1.20 Update events from 1.17 to 1.20 as released in: `6f67405774` Various description updates. Adds uncore events UNC_CHA_TOR_INSERTS.IO_ITOMCACHENEAR_LOCAL, UNC_CHA_TOR_INSERTS.IO_ITOMCACHENEAR_REMOTE, UNC_CHA_TOR_INSERTS.IO_ITOM_LOCAL, UNC_CHA_TOR_INSERTS.IO_ITOM_REMOTE, UNC_CHA_TOR_INSERTS.IO_PCIRDCUR_LOCAL, UNC_CHA_TOR_INSERTS.IO_PCIRDCUR_REMOTE, UNC_CHA_TOR_OCCUPANCY.IO_MISS_ITOMCACHENEAR_LOCAL, UNC_CHA_TOR_OCCUPANCY.IO_MISS_ITOMCACHENEAR_REMOTE, UNC_CHA_TOR_OCCUPANCY.IO_MISS_ITOM_LOCAL, UNC_CHA_TOR_OCCUPANCY.IO_MISS_ITOM_REMOTE, UNC_CHA_TOR_OCCUPANCY.IO_MISS_PCIRDCUR_LOCAL, UNC_CHA_TOR_OCCUPANCY.IO_MISS_PCIRDCUR_REMOTE and removes core events AMX_OPS_RETIRED.BF16 and AMX_OPS_RETIRED.INT8. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:40 -03:00
Ian Rogers	84d0e8c6db	perf vendor events intel: Update meteorlake to 1.08 Update events from 1.07 to 1.08 as released in: `f0f8f3e163` Various description updates. Adds topdown, offcore and uncore events OCR.DEMAND_DATA_RD.L3_HIT, OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_NO_FWD, OCR.DEMAND_RFO.L3_HIT, OCR.DEMAND_DATA_RD.L3_MISS, OCR.DEMAND_RFO.L3_MISS, OCR.DEMAND_DATA_RD.ANY_RESPONSE, OCR.DEMAND_DATA_RD.DRAM, OCR.DEMAND_RFO.ANY_RESPONSE, OCR.DEMAND_RFO.DRAM, TOPDOWN_BAD_SPECULATION.ALL_P, TOPDOWN_BE_BOUND.ALL_P, TOPDOWN_FE_BOUND.ALL_P, TOPDOWN_RETIRING.ALL_P, UNC_ARB_DAT_OCCUPANCY.RD and UNC_HAC_ARB_COH_TRK_REQUESTS.ALL. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:40 -03:00
Ian Rogers	3670ffbda1	perf vendor events intel: Update lunarlake to 1.01 Update events from 1.00 to 1.01 as released in: `56ab8d837a` Various encoding and description updates. Adds the events CPU_CLK_UNHALTED.CORE, CPU_CLK_UNHALTED.CORE_P, CPU_CLK_UNHALTED.REF_TSC_P, CPU_CLK_UNHALTED.THREAD, MISC_RETIRED.LBR_INSERTS, TOPDOWN_BAD_SPECULATION.ALL_P, TOPDOWN_BE_BOUND.ALL_P, TOPDOWN_FE_BOUND.ALL_P, TOPDOWN_RETIRING.ALL_P. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:40 -03:00
Ian Rogers	5157c2042e	perf vendor events intel: Update icelakex to 1.24 Update events from 1.23 to 1.24 as released in: `d883888ae6` Fixes spelling and descriptions. Adds the uncore events UNC_CHA_TOR_INSERTS.IO_PCIRDCUR_LOCAL and UNC_CHA_TOR_INSERTS.IO_PCIRDCUR_REMOTE, while removing UNC_IIO_NUM_REQ_FROM_CPU.IRP. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:40 -03:00
Ian Rogers	a02dc01cef	perf vendor events intel: Update grandridge to 1.02 Update events from 1.01 to 1.02 as released in: `b2a81e803a` Fixes spelling and descriptions. Adds topdown events and uncore cache UNC_CHA_TOR_OCCUPANCY.IA_HIT_DRD_OPT, UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_OPT, UNC_CHA_TOR_OCCUPANCY.IA_DRD_OPT. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:40 -03:00
Ian Rogers	36f353a1eb	perf vendor events intel: Update emeraldrapids to 1.06 Update events from 1.03 to 1.96 as released in: `21a8be3ea7` Fixes spelling and descriptions. Adds cache miss latency events UNC_CHA_TOR_(INSERTS\|OCCUPANCY).IO_(PCIRDCUR\|ITOM\|ITOMCACHENEAR)_(LOCAL\|REMOTE). Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:40 -03:00
Ian Rogers	4376424acd	perf vendor events intel: Update cascadelakex to 1.21 Update events from 1.20 to 1.21 as released in: `fcfdba3be8` Largely fixes spelling and descriptions. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:40 -03:00
Arnaldo Carvalho de Melo	5810371519	perf probe: Add missing libgen.h header needed for using basename() This prototype is obtained indirectly, by luck, from some other header in probe-event.c in most systems, but recently exploded on alpine:edge: 8 13.39 alpine:edge : FAIL gcc version 13.2.1 20240309 (Alpine 13.2.1_git20240309) util/probe-event.c: In function 'convert_exec_to_group': util/probe-event.c:225:16: error: implicit declaration of function 'basename' [-Werror=implicit-function-declaration] 225 \| ptr1 = basename(exec_copy); \| ^~~~~~~~ util/probe-event.c:225:14: error: assignment to 'char ' from 'int' makes pointer from integer without a cast [-Werror=int-conversion] 225 \| ptr1 = basename(exec_copy); \| ^ cc1: all warnings being treated as errors make[3]: ** [/git/perf-6.8.0/tools/build/Makefile.build:158: util] Error 2 Fix it by adding the libgen.h header where basename() is prototyped. Fixes: `fb7345bbf7` ("perf probe: Support basic dwarf-based operations on uprobe events") Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:40 -03:00
Arnaldo Carvalho de Melo	0831638e8c	perf trace: Fix 'newfstatat'/'fstatat' argument pretty printing There were needless two entries, one for 'newfstatat' and another for 'fstatat', keep just one and pretty print its 'flags' argument using the fs_at_flags scnprintf that is also used by other FS syscalls such as 'stat', now: root@number:~# perf trace -e newfstatat --max-events=5 0.000 ( 0.010 ms): abrt-dump-jour/1400 newfstatat(dfd: 7, filename: "", statbuf: 0x7fff0d127000, flag: EMPTY_PATH) = 0 0.020 ( 0.003 ms): abrt-dump-jour/1400 newfstatat(dfd: 9, filename: "", statbuf: 0x55752507b0e8, flag: EMPTY_PATH) = 0 0.039 ( 0.004 ms): abrt-dump-jour/1400 newfstatat(dfd: 19, filename: "", statbuf: 0x557525061378, flag: EMPTY_PATH) = 0 0.047 ( 0.003 ms): abrt-dump-jour/1400 newfstatat(dfd: 20, filename: "", statbuf: 0x5575250b8cc8, flag: EMPTY_PATH) = 0 0.053 ( 0.003 ms): abrt-dump-jour/1400 newfstatat(dfd: 22, filename: "", statbuf: 0x5575250535d8, flag: EMPTY_PATH) = 0 root@number:~# Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240320193115.811899-6-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:40 -03:00
Arnaldo Carvalho de Melo	4d92328290	perf trace: Beautify the 'flags' arg of unlinkat Reusing the fs_at_flags array done for the 'stat' syscall. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240320193115.811899-5-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:40 -03:00
Arnaldo Carvalho de Melo	b8171a8406	perf beauty: Introduce faccessat2 flags scnprintf routine The fsaccessat and fsaccessat2 now have beautifiers for its arguments. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240320193115.811899-4-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:40 -03:00
Arnaldo Carvalho de Melo	f122b3d6d1	perf beauty: Introduce scrape script for the 'statx' syscall 'mask' argument It was using the first variation on producing a string representation for a binary flag, one that used the system's stat.h and preprocessor tricks that had to be updated everytime a new flag was introduced. Use the more recent scrape script + strarray + strarray__scnprintf_flags() combo. $ tools/perf/trace/beauty/statx_mask.sh static const char *statx_mask[] = { [ilog2(0x00000001) + 1] = "TYPE", [ilog2(0x00000002) + 1] = "MODE", [ilog2(0x00000004) + 1] = "NLINK", [ilog2(0x00000008) + 1] = "UID", [ilog2(0x00000010) + 1] = "GID", [ilog2(0x00000020) + 1] = "ATIME", [ilog2(0x00000040) + 1] = "MTIME", [ilog2(0x00000080) + 1] = "CTIME", [ilog2(0x00000100) + 1] = "INO", [ilog2(0x00000200) + 1] = "SIZE", [ilog2(0x00000400) + 1] = "BLOCKS", [ilog2(0x00000800) + 1] = "BTIME", [ilog2(0x00001000) + 1] = "MNT_ID", [ilog2(0x00002000) + 1] = "DIOALIGN", [ilog2(0x00004000) + 1] = "MNT_ID_UNIQUE", }; $ Now we need a copy of uapi/linux/stat.h from tools/include/ in the scrape only directory tools/perf/trace/beauty/include. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240320193115.811899-3-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:40 -03:00
Arnaldo Carvalho de Melo	3d6cfbaf27	perf beauty: Introduce scrape script for various fs syscalls 'flags' arguments It was using the first variation on producing a string representation for a binary flag, one that used the system's fcntl.h and preprocessor tricks that had to be updated everytime a new flag was introduced. Use the more recent scrape script + strarray + strarray__scnprintf_flags() combo. $ tools/perf/trace/beauty/fs_at_flags.sh static const char *fs_at_flags[] = { [ilog2(0x100) + 1] = "SYMLINK_NOFOLLOW", [ilog2(0x200) + 1] = "REMOVEDIR", [ilog2(0x400) + 1] = "SYMLINK_FOLLOW", [ilog2(0x800) + 1] = "NO_AUTOMOUNT", [ilog2(0x1000) + 1] = "EMPTY_PATH", [ilog2(0x0000) + 1] = "STATX_SYNC_AS_STAT", [ilog2(0x2000) + 1] = "STATX_FORCE_SYNC", [ilog2(0x4000) + 1] = "STATX_DONT_SYNC", [ilog2(0x8000) + 1] = "RECURSIVE", [ilog2(0x80000000) + 1] = "GETATTR_NOSEC", }; $ Now we need a copy of uapi/linux/fcntl.h from tools/include/ in the scrape only directory tools/perf/trace/beauty/include and will use that fs_at_flags array for other fs syscalls. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240320193115.811899-2-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:40 -03:00
Ian Rogers	4cef0e7ae7	perf tests: Run tests in parallel by default Switch from running tests sequentially to running in parallel by default. Change the opt-in '-p' or '--parallel' flag to '-S' or '--sequential'. On an 8 core tigerlake an address sanitizer run time changes from: 326.54user 622.73system 6:59.91elapsed 226%CPU to: 973.02user 583.98system 3:01.17elapsed 859%CPU So over twice as fast, saving 4 minutes. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240301174711.2646944-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:40 -03:00
Ian Rogers	7aea01eaf4	perf help: Lower levenshtein penality for deleting character The levenshtein penalty for deleting a character was far higher than subsituting or inserting a character. Lower the penalty to match that of inserting a character. Before: $ perf recccord perf: 'recccord' is not a perf-command. See 'perf --help'. $ After: $ perf recccord perf: 'recccord' is not a perf-command. See 'perf --help'. Did you mean this? record $ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240301201306.2680986-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:40 -03:00
Ian Rogers	f664d5159d	perf tools: Suggest inbuilt commands for unknown command The existing unknown command code looks for perf scripts like perf-archive.sh and perf-iostat.sh, however, inbuilt commands aren't suggested. Add the inbuilt commands so they may be suggested too. Before: $ perf reccord perf: 'reccord' is not a perf-command. See 'perf --help'. $ After: $ perf reccord perf: 'reccord' is not a perf-command. See 'perf --help'. Did you mean this? record $ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240301201306.2680986-1-irogers@google.com [ Added some fixes from Ian to problems I noticed while testing ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:40 -03:00
Ian Rogers	5f2f051a93	perf test: Read child test 10 times a second rather than 1 Make the perf test output smoother by timing out the poll of the child process after 100ms rather than 1s. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Disha Goel <disgoel@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240301074639.2260708-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:39 -03:00
Ian Rogers	e120f7091a	perf test: Use a single fd for the child process out/err Switch from dumping err then out, to a single file descriptor for both of them. This allows the err and output to be correctly interleaved in verbose output. Fixes: `b482f5f8e0` ("perf tests: Add option to run tests in parallel") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Disha Goel <disgoel@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240301074639.2260708-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:39 -03:00
Ian Rogers	f68c981be0	perf test: Stat output per thread of just the parent process Per-thread mode requires either system-wide (-a), a pid (-p) or a tid (-t). The stat output tests were using system-wide mode but this is racy when threads are starting and exiting - something that happens a lot when running the tests in parallel (perf test -p). Avoid the race conditions by using pid mode with the pid of the parent process. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Disha Goel <disgoel@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240301074639.2260708-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:39 -03:00
Ian Rogers	88ce0106a1	perf record: Delete session after stopping sideband thread The session has a header in it which contains a perf env with bpf_progs. The bpf_progs are accessed by the sideband thread and so the sideband thread must be stopped before the session is deleted, to avoid a use after free. This error was detected by AddressSanitizer in the following: ==2054673==ERROR: AddressSanitizer: heap-use-after-free on address 0x61d000161e00 at pc 0x55769289de54 bp 0x7f9df36d4ab0 sp 0x7f9df36d4aa8 READ of size 8 at 0x61d000161e00 thread T1 #0 0x55769289de53 in __perf_env__insert_bpf_prog_info util/env.c:42 #1 0x55769289dbb1 in perf_env__insert_bpf_prog_info util/env.c:29 #2 0x557692bbae29 in perf_env__add_bpf_info util/bpf-event.c:483 #3 0x557692bbb01a in bpf_event__sb_cb util/bpf-event.c:512 #4 0x5576928b75f4 in perf_evlist__poll_thread util/sideband_evlist.c:68 #5 0x7f9df96a63eb in start_thread nptl/pthread_create.c:444 #6 0x7f9df9726a4b in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 0x61d000161e00 is located 384 bytes inside of 2136-byte region [0x61d000161c80,0x61d0001624d8) freed by thread T0 here: #0 0x7f9dfa6d7288 in __interceptor_free libsanitizer/asan/asan_malloc_linux.cpp:52 #1 0x557692978d50 in perf_session__delete util/session.c:319 #2 0x557692673959 in __cmd_record tools/perf/builtin-record.c:2884 #3 0x55769267a9f0 in cmd_record tools/perf/builtin-record.c:4259 #4 0x55769286710c in run_builtin tools/perf/perf.c:349 #5 0x557692867678 in handle_internal_command tools/perf/perf.c:402 #6 0x557692867a40 in run_argv tools/perf/perf.c:446 #7 0x557692867fae in main tools/perf/perf.c:562 #8 0x7f9df96456c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 Fixes: `657ee55319` ("perf evlist: Introduce side band thread") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Disha Goel <disgoel@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240301074639.2260708-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:54:39 -03:00
Ian Rogers	67ee8e71da	perf tools: Add/use PMU reverse lookup from config to name Add perf_pmu__name_from_config that does a reverse lookup from a config number to an alias name. The lookup is expensive as the config is computed for every alias by filling in a perf_event_attr, but this is only done when verbose output is enabled. The lookup also only considers config, and not config1, config2 or config3. An example of the output: $ perf stat -vv -e data_read true ... perf_event_attr: type 24 (uncore_imc_free_running_0) size 136 config 0x20ff (data_read) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ... Committer notes: Fix the python binding build by adding dummies for not strictly needed perf_pmu__name_from_config() and perf_pmus__find_by_type(). Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20240308001915.4060155-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 13:53:45 -03:00
Ian Rogers	7093882067	perf tools: Use pmus to describe type from attribute When dumping a perf_event_attr, use pmus to find the PMU and its name by the type number. This allows dynamically added PMUs to be described. Before: $ perf stat -vv -e data_read true ... perf_event_attr: type 24 size 136 config 0x20ff sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ... After: $ perf stat -vv -e data_read true ... perf_event_attr: type 24 (uncore_imc_free_running_0) size 136 config 0x20ff sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ... However, it also means that when we have a PMU name we prefer it to a hard coded name: Before: $ perf stat -vv -e faults true ... perf_event_attr: type 1 (PERF_TYPE_SOFTWARE) size 136 config 0x2 (PERF_COUNT_SW_PAGE_FAULTS) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest 1 ... After: $ perf stat -vv -e faults true ... perf_event_attr: type 1 (software) size 136 config 0x2 (PERF_COUNT_SW_PAGE_FAULTS) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest 1 ... It feels more consistent to do this, rather than only prefer a PMU name when a hard coded name isn't available. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20240308001915.4060155-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:29 -03:00
Ian Rogers	4ccf3bb703	perf list: Give more details about raw event encodings List all the PMUs, not just the first core one, and list real format specifiers with value ranges. Before: $ perf list ... rNNN [Raw hardware event descriptor] cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor] [(see 'man perf-list' on how to encode it)] mem:<addr>[/len][:access] [Hardware breakpoint] ... After: $ perf list ... rNNN [Raw event descriptor] cpu/event=0..255,pc,edge,.../modifier [Raw event descriptor] [(see 'man perf-list' or 'man perf-record' on how to encode it)] breakpoint//modifier [Raw event descriptor] cstate_core/event=0..0xffffffffffffffff/modifier [Raw event descriptor] cstate_pkg/event=0..0xffffffffffffffff/modifier [Raw event descriptor] i915/i915_eventid=0..0x1fffff/modifier [Raw event descriptor] intel_bts//modifier [Raw event descriptor] intel_pt/ptw,event,cyc_thresh=0..15,.../modifier [Raw event descriptor] kprobe/retprobe/modifier [Raw event descriptor] msr/event=0..0xffffffffffffffff/modifier [Raw event descriptor] power/event=0..255/modifier [Raw event descriptor] software//modifier [Raw event descriptor] tracepoint//modifier [Raw event descriptor] uncore_arb/event=0..255,edge,inv,.../modifier [Raw event descriptor] uncore_cbox/event=0..255,edge,inv,.../modifier [Raw event descriptor] uncore_clock/event=0..255/modifier [Raw event descriptor] uncore_imc_free_running/event=0..255,umask=0..255/modifier[Raw event descriptor] uprobe/ref_ctr_offset=0..0xffffffff,retprobe/modifier[Raw event descriptor] mem:<addr>[/len][:access] [Hardware breakpoint] ... With '--details' provide more details on the formats encoding: cpu/event=0..255,pc,edge,.../modifier [Raw event descriptor] [(see 'man perf-list' or 'man perf-record' on how to encode it)] cpu/event=0..255,pc,edge,offcore_rsp=0..0xffffffffffffffff,ldlat=0..0xffff,inv, umask=0..255,frontend=0..0xffffff,cmask=0..255,config=0..0xffffffffffffffff, config1=0..0xffffffffffffffff,config2=0..0xffffffffffffffff,config3=0..0xffffffffffffffff, name=string,period=number,freq=number,branch_type=(u\|k\|hv\|any\|...),time, call-graph=(fp\|dwarf\|lbr),stack-size=number,max-stack=number,nr=number,inherit,no-inherit, overwrite,no-overwrite,percore,aux-output,aux-sample-size=number/modifier breakpoint//modifier [Raw event descriptor] breakpoint//modifier cstate_core/event=0..0xffffffffffffffff/modifier [Raw event descriptor] cstate_core/event=0..0xffffffffffffffff/modifier cstate_pkg/event=0..0xffffffffffffffff/modifier [Raw event descriptor] cstate_pkg/event=0..0xffffffffffffffff/modifier i915/i915_eventid=0..0x1fffff/modifier [Raw event descriptor] i915/i915_eventid=0..0x1fffff/modifier intel_bts//modifier [Raw event descriptor] intel_bts//modifier intel_pt/ptw,event,cyc_thresh=0..15,.../modifier [Raw event descriptor] intel_pt/ptw,event,cyc_thresh=0..15,pt,notnt,branch,tsc,pwr_evt,fup_on_ptw,cyc,noretcomp, mtc,psb_period=0..15,mtc_period=0..15/modifier kprobe/retprobe/modifier [Raw event descriptor] kprobe/retprobe/modifier msr/event=0..0xffffffffffffffff/modifier [Raw event descriptor] msr/event=0..0xffffffffffffffff/modifier power/event=0..255/modifier [Raw event descriptor] power/event=0..255/modifier software//modifier [Raw event descriptor] software//modifier tracepoint//modifier [Raw event descriptor] tracepoint//modifier uncore_arb/event=0..255,edge,inv,.../modifier [Raw event descriptor] uncore_arb/event=0..255,edge,inv,umask=0..255,cmask=0..31/modifier uncore_cbox/event=0..255,edge,inv,.../modifier [Raw event descriptor] uncore_cbox/event=0..255,edge,inv,umask=0..255,cmask=0..31/modifier uncore_clock/event=0..255/modifier [Raw event descriptor] uncore_clock/event=0..255/modifier uncore_imc_free_running/event=0..255,umask=0..255/modifier[Raw event descriptor] uncore_imc_free_running/event=0..255,umask=0..255/modifier uprobe/ref_ctr_offset=0..0xffffffff,retprobe/modifier[Raw event descriptor] uprobe/ref_ctr_offset=0..0xffffffff,retprobe/modifier Committer notes: Address this build error in various distros: 55 58.44 ubuntu:24.04 : FAIL gcc version 13.2.0 (Ubuntu 13.2.0-17ubuntu2) util/pmu.c:1638:70: error: '_Static_assert' with no message is a C2x extension [-Werror,-Wc2x-extensions] 1638 \| _Static_assert(ARRAY_SIZE(terms) == __PARSE_EVENTS__TERM_TYPE_NR - 6); \| ^ \| , "" 1 error generated. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20240308001915.4060155-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:29 -03:00
Ian Rogers	aa1f4ad287	perf list: Allow wordwrap to wrap on commas A raw event encoding may be a block with terms separated by commas. If wrapping such a string it would be useful to break at the commas, so add this ability to wordwrap. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20240308001915.4060155-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:29 -03:00
Ian Rogers	39aa4ff61f	perf pmu: Drop "default_core" from alias names "default_core" is used by jevents.py for json events' PMU name when none is specified. On x86 the "default_core" is typically the PMU "cpu". When creating an alias see if the event's PMU name is "default_core" in which case don't record it. This means in places like "perf list" the PMU's name will be used in its place. Before: $ perf list --details ... cache: l1d.replacement [Counts the number of cache lines replaced in L1 data cache] default_core/event=0x51,period=0x186a3,umask=0x1/ ... After: $ perf list --details ... cache: l1d.replacement [Counts the number of cache lines replaced in L1 data cache. Unit: cpu] cpu/event=0x51,period=0x186a3,umask=0x1/ ... Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20240308001915.4060155-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:29 -03:00
Ian Rogers	525615ef6d	perf list: Add tracepoint encoding to detailed output The tracepoint id holds the config value and is probed in determining what an event is. Add reading of the id so that we can display the event encoding as: $ perf list --details ... alarmtimer:alarmtimer_cancel [Tracepoint event] tracepoint/config=0x18c/ ... Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20240308001915.4060155-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:29 -03:00
Arnaldo Carvalho de Melo	2316ef5891	perf beauty: Introduce scrape script for 'clone' syscall 'flags' argument It was using the first variation on producing a string representation for a binary flag, one that used the copy of uapi/linux/sched.h with preprocessor tricks that had to be updated everytime a new flag was introduced. Use the more recent scrape script + strarray + strarray__scnprintf_flags() combo. $ tools/perf/trace/beauty/clone.sh \| head -5 static const char *clone_flags[] = { [ilog2(0x00000100) + 1] = "VM", [ilog2(0x00000200) + 1] = "FS", [ilog2(0x00000400) + 1] = "FILES", [ilog2(0x00000800) + 1] = "SIGHAND", $ Now we can move uapi/linux/sched.h from tools/include/, that is used for building perf to the scrape only directory tools/perf/trace/beauty/include. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZfnULIn3XKDq0bpc@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:29 -03:00
Namhyung Kim	bd62de0808	perf annotate-data: Do not retry for invalid types In some cases, it was able to find a type or location info (for per-cpu variable) but cannot match because of invalid offset or missing global information. In those cases, it's meaningless to go to the outer scope and retry because there will be no additional information. Let's change the return type of find_matching_type() and bail out if it returns -1 for the cases. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-24-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:29 -03:00
Namhyung Kim	55ee3d005d	perf annotate-data: Add a cache for global variable types They are often searched by many different places. Let's add a cache for them to reduce the duplicate DWARF access. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-23-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:29 -03:00
Namhyung Kim	b3c95109c1	perf annotate-data: Add stack canary type When the stack protector is enabled, compiler would generate code to check stack overflow with a special value called 'stack carary' at runtime. On x86_64, GCC hard-codes the stack canary as %gs:40. While there's a definition of fixed_percpu_data in asm/processor.h, it seems that the header is not included everywhere and many places it cannot find the type info. As it's in the well-known location (at %gs:40), let's add a pseudo stack canary type to handle it specially. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-22-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:29 -03:00
Namhyung Kim	eb9190afae	perf annotate-data: Handle ADD instructions There are different patterns for percpu variable access using a constant value added to the base. 2aeb: mov -0x7da0f7e0(,%rax,8),%r14 # r14 = __per_cpu_offset[cpu] 2af3: mov $0x34740,%rax # rax = address of runqueues * 2afa: add %rax,%r14 # r14 = &per_cpu(runqueues, cpu) 2bfd: cmpl $0x0,0x10(%r14) # cpu_rq(cpu)->has_blocked_load 2b03: je 0x2b36 At the first instruction, r14 has the __per_cpu_offset. And then rax has an immediate value and then added to r14 to calculate the address of a per-cpu variable. So it needs to track the immediate values and ADD instructions. Similar but a little different case is to use "this_cpu_off" instead of "__per_cpu_offset" for the current CPU. This time the variable address comes with PC-rel addressing. 89: mov $0x34740,%rax # rax = address of runqueues * 90: add %gs:0x7f015f60(%rip),%rax # 19a78 <this_cpu_off> 98: incl 0xd8c(%rax) # cpu_rq(cpu)->sched_count Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-21-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:29 -03:00
Namhyung Kim	f5b095924d	perf annotate-data: Support general per-cpu access This is to support per-cpu variable access often without a matching DWARF entry. For some reason, I cannot find debug info of per-cpu variables sometimes. They have more complex pattern to calculate the address of per-cpu variables like below. 2b7d: mov -0x1e0(%rbp),%rax ; rax = cpu 2b84: mov -0x7da0f7e0(,%rax,8),%rcx ; rcx = __per_cpu_offset[cpu] * 2b8c: mov 0x34870(%rcx),%rax ; *(__per_cpu_offset[cpu] + 0x34870) Let's assume the rax register has a number for a CPU at 2b7d. The next instruction is to get the per-cpu offset' for that cpu. The offset -0x7da0f7e0 is 0xffffffff825f0820 in u64 which is the address of the '__per_cpu_offset' array in my system. So it'd get the actual offset of that CPU's per-cpu region and save it to the rcx register. Then, at 2b8c, accesses using rcx can be handled same as the global variable access. To handle this case, it should check if the offset of the instruction matches to the address of '__per_cpu_offset'. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-20-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:29 -03:00
Namhyung Kim	ad62edbfc5	perf annotate-data: Track instructions with a this-cpu variable Like global variables, this per-cpu variables should be tracked correctly. Factor our get_global_var_type() to handle both global and per-cpu (for this cpu) variables in the same manner. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-19-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:29 -03:00
Namhyung Kim	02e17ca917	perf annotate-data: Handle this-cpu variables in kernel On x86, the kernel gets the current task using the current macro like below: #define current get_current() static __always_inline struct task_struct *get_current(void) { return this_cpu_read_stable(pcpu_hot.current_task); } So it returns the current_task field of struct pcpu_hot which is the first member. On my build, it's located at 0x32940. $ nm vmlinux \| grep pcpu_hot 0000000000032940 D pcpu_hot And the current macro generates the instructions like below: mov %gs:0x32940, %rcx So the %gs segment register points to the beginning of the per-cpu region of this cpu and it points the variable with a constant. Let's update the instruction location info to have a segment register and handle %gs in kernel to look up a global variable. Pretend it as a global variable by changing the register number to DWARF_REG_PC. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-18-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:29 -03:00
Namhyung Kim	cbaf89a8c5	perf annotate: Parse x86 segment register location Add a segment field in the struct annotated_insn_loc and save it for the segment based addressing like %gs:0x28. For simplicity it now handles %gs register only. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-17-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:29 -03:00
Namhyung Kim	bdc80ace07	perf annotate-data: Check register state for type As instruction tracking updates the type state for each register, check the final type info for the target register at the given instruction. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-16-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:29 -03:00
Namhyung Kim	eb8a55e01d	perf annotate-data: Implement instruction tracking If it failed to find a variable for the location directly, it might be due to a missing variable in the source code. For example, accessing pointer variables in a chain can result in the case like below: struct foo foo = ...; int i = foo->bar->baz; The DWARF debug information is created for each variable so it'd have one for 'foo'. But there's no variable for 'foo->bar' and then it cannot know the type of 'bar' and 'baz'. The above source code can be compiled to the follow x86 instructions: mov 0x8(%rax), %rcx mov 0x4(%rcx), %rdx <=== PMU sample mov %rdx, -4(%rbp) Let's say 'foo' is located in the %rax and it has a pointer to struct foo. But perf sample is captured in the second instruction and there is no variable or type info for the %rcx. It'd be great if compiler could generate debug info for %rcx, but we should handle it on our side. So this patch implements the logic to iterate instructions and update the type table for each location. As it already collected a list of scopes including the target instruction, we can use it to construct the type table smartly. +---------------- scope[0] subprogram \| \| +-------------- scope[1] lexical_block \| \| \| \| +------------ scope[2] inlined_subroutine \| \| \| \| \| \| +---------- scope[3] inlined_subroutine \| \| \| \| \| \| \| \| +-------- scope[4] lexical_block \| \| \| \| \| \| \| \| \| \| ** target instruction ... Image the target instruction has 5 scopes, each scope will have its own variables and parameters. Then it can start with the innermost scope (4). So it'd search the shortest path from the start of scope[4] to the target address and build a list of basic blocks. Then it iterates the basic blocks with the variables in the scope and update the table. If it finds a type at the target instruction, then returns it. Otherwise, it moves to the upper scope[3]. Now it'd search the shortest path from the start of scope[3] to the start of scope[4]. Then connect it to the existing basic block list. Then it'd iterate the blocks with variables for both scopes. It can repeat this until it finds a type at the target instruction or reaches to the top scope[0]. As the basic blocks contain the shortest path, it won't worry about branches and can update the table simply. The final check will be done by find_matching_type() in the next patch. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-15-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:29 -03:00
Namhyung Kim	cffb7910af	perf annotate-data: Handle call instructions When updating instruction states, the call instruction should play a role since it changes the register states. For simplicity, mark some registers as caller-saved registers (should be arch-dependent), and invalidate them all after a function call. If the function returns something, the designated register (ret_reg) will have the type info. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-14-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:29 -03:00
Namhyung Kim	0a41e5d684	perf annotate-data: Handle global variable access When updating the instruction states, it also needs to handle global variable accesses. Same as it does for PC-relative addressing, it can look up the type by address (if it's defined in the same file), or by name after finding the symbol by address (for declarations). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-13-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:28 -03:00
Namhyung Kim	1ebb5e17ef	perf annotate-data: Add get_global_var_type() Accessing global variable is common when it tracks execution later. Factor out the common code into a function for later use. It adds thread and cpumode to struct data_loc_info to find (global) symbols if needed. Also remove var_name as it's retrieved in the helper function. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-12-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:28 -03:00
Namhyung Kim	4f903455be	perf annotate-data: Add update_insn_state() The update_insn_state() function is to update the type state table after processing each instruction. For now, it handles MOV (on x86) insn to transfer type info from the source location to the target. The location can be a register or a stack slot. Check carefully when memory reference happens and fetch the type correctly. It basically ignores write to a memory since it doesn't change the type info. One exception is writes to (new) stack slots for register spilling. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-11-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:28 -03:00
Namhyung Kim	06b2ce7538	perf annotate-data: Maintain variable type info As it collected basic block and variable information in each scope, it now can build a state table to find matching variable at the location. The struct type_state is to keep the type info saved in each register and stack slot. The update_var_state() updates the table when it finds variables in the current address. It expects die_collect_vars() filled a list of variables with type info and starting address. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-10-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:28 -03:00
Namhyung Kim	90429524f3	perf annotate-data: Add debug messages Add a new debug option "type-profile" to enable the detailed info during the type analysis especially for instruction tracking. You can use this before the command name like 'report' or 'annotate'. $ perf --debug type-profile annotate --data-type Committer testing: First get some memory events: $ perf mem record ls Then, without data-type profiling debug: $ perf annotate --data-type \| head Annotate type: 'struct rtld_global' in /usr/lib64/ld-linux-x86-64.so.2 (1 samples): ============================================================================ samples offset size field 1 0 4336 struct rtld_global { 0 0 0 struct link_namespaces* _dl_ns; 0 2560 8 size_t _dl_nns; 0 2568 40 __rtld_lock_recursive_t _dl_load_lock { 0 2568 40 pthread_mutex_t mutex { 0 2568 40 struct __pthread_mutex_s __data { 0 2568 4 int __lock; $ And with only data-type profiling: $ perf --debug type-profile annotate --data-type \| head ----------------------------------------------------------- find_data_type_die [1e67] for reg13873052 (PC) offset=0x150e2 in dl_main CU die offset: 0x29cd3 found PC-rel by addr=0x34020 offset=0x20 ----------------------------------------------------------- find_data_type_die [2e] for reg12 offset=0 in __GI___readdir64 CU die offset: 0x137a45 frame base: cfa=1 fbreg=-1 found "__futex" in scope=2/2 (die: 0x137ad5) 0(reg12) type=int (die:2a) ----------------------------------------------------------- find_data_type_die [52] for reg5 offset=0 in __memmove_avx_unaligned_erms CU die offset: 0x1124ed no variable found Annotate type: 'struct rtld_global' in /usr/lib64/ld-linux-x86-64.so.2 (1 samples): ============================================================================ samples offset size field 1 0 4336 struct rtld_global { 0 0 0 struct link_namespaces* _dl_ns; 0 2560 8 size_t _dl_nns; 0 2568 40 __rtld_lock_recursive_t _dl_load_lock { 0 2568 40 pthread_mutex_t mutex { 0 2568 40 struct __pthread_mutex_s __data { 0 2568 4 int __lock; $ Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-9-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:28 -03:00
Namhyung Kim	5cdd3fd799	perf annotate: Add annotate_get_basic_blocks() The annotate_get_basic_blocks() is to find a list of basic blocks from the source instruction to the destination instruction in a function. It'll be used to find variables in a scope. Use BFS (Breadth First Search) to find a shortest path to carry the variable/register state minimally. Also change find_disasm_line() to be used in annotate_get_basic_blocks() and add 'allow_update' argument to control if it can update the IP. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:28 -03:00
Namhyung Kim	a3f4d5b57d	perf annotate-data: Introduce 'struct data_loc_info' The find_data_type() needs many information to describe the location of the data. Add the new 'struct data_loc_info' to pass those information at once. No functional changes intended. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:28 -03:00
Namhyung Kim	52a09bc24c	perf map: Add map__objdump_2rip() Sometimes we want to convert an address in objdump output to map-relative address to match with a sample data. Let's add map__objdump_2rip() for that. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:28 -03:00
Namhyung Kim	7a838c2fd2	perf dwarf-aux: Add die_find_func_rettype() The die_find_func_rettype() is to find a debug entry for the given function name and sets the type information of the return value. By convention, it'd return the pointer to the type die (should be the same as the given mem_die argument) if found, or NULL otherwise. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:28 -03:00
Namhyung Kim	437683a994	perf dwarf-aux: Handle type transfer for memory access We want to track type states as instructions are executed. Each instruction can access compound types like struct or union and load/ store its members to a different location. The die_deref_ptr_type() is to find a type of memory access with a pointer variable. If it points to a compound type like struct, the target memory is a member in the struct. The access will happen with an offset indicating which member it refers. Let's follow the DWARF info to figure out the type of the pointer target. For example, say we have the following code. struct foo { int a; int b; }; struct foo p = malloc(sizeof(p)); p->b = 0; The last pointer access should produce x86 asm like below: mov 0x0, 4(%rbx) And we know %rbx register has a pointer to struct foo. Then offset 4 should return the debug info of member 'b'. Also variables of compound types can be accessed directly without a pointer. The die_get_member_type() is to handle a such case. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-4-namhyung@kernel.org [ Check if die_get_real_type() returned NULL ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:28 -03:00
Namhyung Kim	932dcc2c39	perf dwarf-aux: Add die_collect_vars() The die_collect_vars() is to find all variable information in the scope including function parameters. The struct die_var_type is to save the type of the variable with the location (reg and offset) as well as where it's defined in the code (addr). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:28 -03:00
Namhyung Kim	b508965d35	perf dwarf-aux: Remove unused pc argument It's not used, let's get rid of it. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:28 -03:00
Ian Rogers	71bc3ac8e8	perf cpumap: Use perf_cpu_map__for_each_cpu when possible Rather than manually iterating the CPU map, use perf_cpu_map__for_each_cpu(). When possible tidy local variables. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andrew Jones <ajones@ventanamicro.com> Cc: André Almeida <andrealmeid@igalia.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Yanteng Si <siyanteng@loongson.cn> Link: https://lore.kernel.org/r/20240202234057.2085863-9-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:28 -03:00
Ian Rogers	954ac1b4a7	perf stat: Remove duplicate cpus_map_matched function Use libperf's perf_cpu_map__equal() that performs the same function. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andrew Jones <ajones@ventanamicro.com> Cc: André Almeida <andrealmeid@igalia.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Yanteng Si <siyanteng@loongson.cn> Link: https://lore.kernel.org/r/20240202234057.2085863-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:28 -03:00
Ian Rogers	4ddccd0048	perf arm64 header: Remove unnecessary CPU map get and put In both cases the CPU map is known owned by either the caller or a PMU. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andrew Jones <ajones@ventanamicro.com> Cc: André Almeida <andrealmeid@igalia.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Yanteng Si <siyanteng@loongson.cn> Link: https://lore.kernel.org/r/20240202234057.2085863-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:28 -03:00
Ian Rogers	3e5deb708c	perf cpumap: Clean up use of perf_cpu_map__has_any_cpu_or_is_empty Most uses of what was perf_cpu_map__empty but is now perf_cpu_map__has_any_cpu_or_is_empty want to do something with the CPU map if it contains CPUs. Replace uses of perf_cpu_map__has_any_cpu_or_is_empty with other helpers so that CPUs within the map can be handled. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andrew Jones <ajones@ventanamicro.com> Cc: André Almeida <andrealmeid@igalia.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Yanteng Si <siyanteng@loongson.cn> Link: https://lore.kernel.org/r/20240202234057.2085863-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:28 -03:00
Ian Rogers	291dcd774b	perf intel-pt/intel-bts: Switch perf_cpu_map__has_any_cpu_or_is_empty use Switch perf_cpu_map__has_any_cpu_or_is_empty() to perf_cpu_map__is_any_cpu_or_is_empty() as a CPU map may contain CPUs as well as the dummy event and perf_cpu_map__is_any_cpu_or_is_empty() is a more correct alternative. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andrew Jones <ajones@ventanamicro.com> Cc: André Almeida <andrealmeid@igalia.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Yanteng Si <siyanteng@loongson.cn> Link: https://lore.kernel.org/r/20240202234057.2085863-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:28 -03:00
Ian Rogers	e28ee1239d	perf arm-spe/cs-etm: Directly iterate CPU maps Rather than iterate all CPUs and see if they are in CPU maps, directly iterate the CPU map. Similarly make use of the intersect function taking care for when "any" CPU is specified. Switch perf_cpu_map__has_any_cpu_or_is_empty() to more appropriate alternatives. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andrew Jones <ajones@ventanamicro.com> Cc: André Almeida <andrealmeid@igalia.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Yanteng Si <siyanteng@loongson.cn> Link: https://lore.kernel.org/r/20240202234057.2085863-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:28 -03:00
Ethan Adams	efae55bb78	perf build: Fix out of tree build related to installation of sysreg-defs It seems that a previous modification to sysreg-defs, which corrected emitting the header to the specified output directory, exposed missing subdir, prefix variables. This breaks out of tree builds of perf as the file is now built into the output directory, but still tries to descend into output directory as a subdir. Fixes: `a29ee6aea7` ("perf build: Ensure sysreg-defs Makefile respects output dir") Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Tycho Andersen <tycho@tycho.pizza> Signed-off-by: Ethan Adams <j.ethan.adams@gmail.com> Tested-by: Tycho Andersen <tycho@tycho.pizza> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240314222012.47193-1-j.ethan.adams@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:27 -03:00
Adrian Hunter	bb69c912c4	perf auxtrace: Fix multiple use of --itrace option If the --itrace option is used more than once, the options are combined, but "i" and "y" (sub-)options can be corrupted because itrace_do_parse_synth_opts() incorrectly overwrites the period type and period with default values. For example, with: --itrace=i0ns --itrace=e The processing of "--itrace=e", resets the "i" period from 0 nanoseconds to the default 100 microseconds. Fix by performing the default setting of period type and period only if "i" or "y" are present in the currently processed --itrace value. Fixes: `f6986c95af` ("perf session: Add instruction tracing options") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240315071334.3478-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:27 -03:00
Adrian Hunter	d4a98b45fb	perf script: Show also errors for --insn-trace option The trace could be misleading if trace errors are not taken into account, so display them also by adding the itrace "e" option. Note --call-trace and --call-ret-trace already add the itrace "e" option. Fixes: `b585ebdb59` ("perf script: Add --insn-trace for instruction decoding") Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240315071334.3478-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:27 -03:00
James Clark	36f65f9b7a	perf docs arm_spe: Clarify more SPE requirements related to KPTI The question of exactly when KPTI needs to be disabled comes up a lot because it doesn't always need to be done. Add the relevant kernel function and some examples that describe the behavior. Also describe the interrupt requirement and that no error message will be printed if this isn't met. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240312132508.423320-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:27 -03:00
Arnaldo Carvalho de Melo	a672af9139	tools headers: Remove almost unused copy of uapi/stat.h, add few conditional defines These were used to build perf to provide defines not available in older distros, but this was back in 2017, nowadays most the distros that are supported and I have build containers for work using just the system headers, so ditch them. For the few that don't have STATX_MNT_ID{_UNIQUE}, or STATX_MNT_DIOALIGN add them conditionally. Some of these older distros may not have things that are used in 'perf trace', but then they also don't have libtraceevent packages, so don't build 'perf trace'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240315204835.748716-6-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:27 -03:00
Arnaldo Carvalho de Melo	8a1ad44135	tools headers: Remove now unused copies of uapi/{fcntl,openat2}.h and asm/fcntl.h These were used to build perf to provide defines not available in older distros, but this was back in 2017, nowadays all the distros that are supported and I have build containers for work using just the system headers, so ditch them. Some of these older distros may not have things that are used in 'perf trace', but then they also don't have libtraceevent packages, so don't build 'perf trace'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240315204835.748716-5-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:27 -03:00
Arnaldo Carvalho de Melo	6652830c87	perf beauty: Use the system linux/fcntl.h instead of a copy from the kernel Builds ok all the way back to these older distros: 1 almalinux:8 : Ok gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-20) , clang version 16.0.6 (Red Hat 16.0.6-2.module_el8.9.0+3621+df7f7146) flex 2.6.1 3 alpine:3.15 : Ok gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027 , Alpine clang version 12.0.1 flex 2.6.4 15 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0 , Debian clang version 11.0.1-2~deb10u1 flex 2.6.4 32 opensuse:15.4 : Ok gcc (SUSE Linux) 7.5.0 , clang version 15.0.7 flex 2.6.4 23 fedora:35 : Ok gcc (GCC) 11.3.1 20220421 (Red Hat 11.3.1-3) , clang version 13.0.1 (Fedora 13.0.1-1.fc35) flex 2.6.4 38 ubuntu:18.04 : Ok gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 flex 2.6.4 Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240315204835.748716-4-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:27 -03:00
Arnaldo Carvalho de Melo	eb01fe7abb	perf beauty: Move prctl.h files (uapi/linux and x86's) copy out of the directory used to build perf It is used only to generate string tables, not to build perf, so move it to the tools/perf/trace/beauty/{include,arch}/ hierarchies, that is used just for scraping. This is a something that should've have happened, as happened with the linux/socket.h scrapper, do it now as Ian suggested while doing an audit/refactor session in the headers used by perf. No other tools/ living code uses it, just <linux/usbdevice_fs.h> coming from either 'make install_headers' or from the system /usr/include/ directory. Suggested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240315204835.748716-3-acme@kernel.org Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:27 -03:00
Arnaldo Carvalho de Melo	f324b73c2c	perf beauty: Stop using the copy of uapi/linux/prctl.h Use the system one, nothing used in that file isn't available in the supported, active distros. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> To: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/lkml/20240315204835.748716-3-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:27 -03:00
Arnaldo Carvalho de Melo	c8bfe3fad4	perf beauty: Move arch/x86/include/asm/irq_vectors.h copy out of the directory used to build perf It is used only to generate string tables, not to build perf, so move it to the tools/perf/trace/beauty/include/ hierarchy, that is used just for scraping. This is a something that should've have happened, as happened with the linux/socket.h scrapper, do it now as Ian suggested while doing an audit/refactor session in the headers used by perf. No other tools/ living code uses it. Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:27 -03:00
Arnaldo Carvalho de Melo	7050e33e86	perf beauty: Move uapi/sound/asound.h copy out of the directory used to build perf It is used only to generate string tables, not to build perf, so move it to the tools/perf/trace/beauty/include/ hierarchy, that is used just for scraping. This is a something that should've have happened, as happened with the linux/socket.h scrapper, do it now as Ian suggested while doing an audit/refactor session in the headers used by perf. Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:27 -03:00
Arnaldo Carvalho de Melo	44512bd613	perf beauty: Move uapi/linux/usbdevice_fs.h copy out of the directory used to build perf It is mostly used only to generate string tables, not to build perf, so move it to the tools/perf/trace/beauty/include/ hierarchy, that is used just for scraping. This is a something that should've have happened, as happened with the linux/socket.h scrapper, do it now as Ian suggested while doing an audit/refactor session in the headers used by perf. No other tools/ living code uses it, just <linux/usbdevice_fs.h> coming from either 'make install_headers' or from the system /usr/include/ directory. Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:27 -03:00
Arnaldo Carvalho de Melo	ab3316119f	perf beauty: Move uapi/linux/mount.h copy out of the directory used to build perf It is mostly used only to generate string tables, not to build perf, so move it to the tools/perf/trace/beauty/include/ hierarchy, that is used just for scraping. This is a something that should've have happened, as happened with the linux/socket.h scrapper, do it now as Ian suggested while doing an audit/refactor session in the headers used by perf. No other tools/ living code uses it, just <linux/mount.h> coming from either 'make install_headers' or from the system /usr/include/ directory. Suggested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:27 -03:00
Arnaldo Carvalho de Melo	22916d2cba	perf beauty: Don't include uapi/linux/mount.h, use sys/mount.h instead The tools/include/uapi/linux/mount.h file is mostly used for scrapping defines into id->string tables, this is the only place were it is being directly used, stop doing so. Define MOUNT_ATTR_RELATIME and MOUNT_ATTR__ATIME if not available in the system's headers. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:27 -03:00
Arnaldo Carvalho de Melo	faf7217a39	perf beauty: Move uapi/linux/fs.h copy out of the directory used to build perf It is mostly used only to generate string tables, not to build perf, so move it to the tools/perf/trace/beauty/include/ hierarchy, that is used just for scraping. The only case where it was being used to build was in tools/perf/trace/beauty/sync_file_range.c, because some older systems doesn't have the SYNC_FILE_RANGE_WRITE_AND_WAIT define, just use the system's linux/fs.h header instead, defining it if not available. This is a something that should've have happened, as happened with the linux/socket.h scrapper, do it now as Ian suggested while doing an audit/refactor session in the headers used by perf. No other tools/ living code uses it, just <linux/fs.h> coming from either 'make install_headers' or from the system /usr/include/ directory. Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:27 -03:00
Arnaldo Carvalho de Melo	5d8c646038	perf beauty: Fix dependency of tables using uapi/linux/mount.h Several such tables were depending on uapi/linux/fs.h, cut and paste error when they were introduced, fix it. Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/Ze9vjxv42PN_QGZv@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:27 -03:00
Bhaskar Chowdhury	4b3761eebb	perf c2c: Fix a punctuation s/dont/don\'t/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210319232824.742-1-unixbhaskar@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:27 -03:00
Arnaldo Carvalho de Melo	a9f4c6c999	perf trace: Collect sys_nanosleep first argument That is a 'struct timespec' passed from userspace to the kernel as we can see with a system wide syscall tracing: root@number:~# perf trace -e nanosleep 0.000 (10.102 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 38.924 (10.077 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 100.177 (10.107 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 139.171 (10.063 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 200.603 (10.105 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 239.399 (10.064 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 300.994 (10.096 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 339.584 (10.067 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 401.335 (10.057 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 439.758 (10.166 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 501.814 (10.110 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 539.983 (10.227 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 602.284 (10.199 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 640.208 (10.105 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 702.662 (10.163 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 740.440 (10.107 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 802.993 (10.159 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0 ^Croot@number:~# strace -p 9150 -e nanosleep If we then use the ptrace method to look at that podman process: root@number:~# strace -p 9150 -e nanosleep strace: Process 9150 attached nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0 ^Cstrace: Process 9150 detached root@number:~# With some changes we can get something closer to the strace output, still in system wide mode: root@number:~# perf config trace.show_arg_names=false root@number:~# perf config trace.show_duration=false root@number:~# perf config trace.show_timestamp=false root@number:~# perf config trace.show_zeros=true root@number:~# perf config trace.args_alignment=0 root@number:~# perf trace -e nanosleep --max-events=10 podman/2195174 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0 podman/9150 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0 podman/2195174 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0 podman/9150 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0 podman/2195174 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0 podman/9150 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0 podman/2195174 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0 podman/9150 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0 podman/2195174 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0 podman/9150 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0 root@number:~# root@number:~# perf config trace.show_arg_names=false trace.show_duration=false trace.show_timestamp=false trace.show_zeros=true trace.args_alignment=0 root@number:~# cat ~/.perfconfig # this file is auto-generated. [trace] show_arg_names = false show_duration = false show_timestamp = false show_zeros = true args_alignment = 0 root@number:~# This will not get reused by any other syscall as nanosleep is the only one to have as its first argument a 'struct timespec" pointer argument passed from userspace to the kernel: root@number:~# grep timespec /sys/kernel/tracing/events/syscalls/sys_enter_/format \| grep offset:16 /sys/kernel/tracing/events/syscalls/sys_enter_nanosleep/format: field:struct __kernel_timespec rqtp; offset:16; size:8; signed:0; root@number:~# BTF based pretty printing will simplify all this, but then lets just get the low hanging fruits first. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/Zbq72dJRpOlfRWnf@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2024-03-21 10:41:26 -03:00
Jens Axboe	e54e09c05c	net: remove {revc,send}msg_copy_msghdr() from exports The only user of these was io_uring, and it's not using them anymore. Make them static and remove them from the socket header file. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/1b6089d3-c1cf-464a-abd3-b0f0b6bb2523@kernel.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-14 16:48:53 -07:00
Locus Wei-Han Chen	f5102e31c2	riscv: andes: Support specifying symbolic firmware and hardware raw events Add the Andes AX45 JSON files that allows specifying symbolic event names for the raw PMU events. Signed-off-by: Locus Wei-Han Chen <locus84@andestech.com> Reviewed-by: Yu Chien Peter Lin <peterlin@andestech.com> Reviewed-by: Charles Ci-Jyun Wu <dminus@andestech.com> Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Acked-by: Atish Patra <atishp@rivosinc.com> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20240222083946.3977135-11-peterlin@andestech.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2024-03-12 07:13:19 -07:00
Namhyung Kim	0f66dfe7b9	perf annotate: Add comments in the data structures Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240304230815.1440583-5-namhyung@kernel.org	2024-03-06 20:25:48 -08:00
Namhyung Kim	f59e3660cd	perf annotate: Remove sym_hist.addr[] array It's not used anymore and the code is coverted to use a hash map. Now sym_hist has a static size, so no need to have sizeof_sym_hist in the struct annotated_source. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240304230815.1440583-4-namhyung@kernel.org	2024-03-06 20:25:36 -08:00
Namhyung Kim	8015457584	perf annotate: Calculate instruction overhead using hashmap Use annotated_source.samples hashmap instead of addr array in the struct sym_hist. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240304230815.1440583-3-namhyung@kernel.org	2024-03-06 20:25:20 -08:00
Namhyung Kim	d3e7cad6f3	perf annotate: Add a hashmap for symbol histogram Now symbol histogram uses an array to save per-offset sample counts. But it wastes a lot of memory if the symbol has a few samples only. Add a hashmap to save values only for actual samples. For now, it has duplicate histogram (one in the existing array and another in the new hash map). Once it can convert to use the hash in all places, we can get rid of the array later. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240304230815.1440583-2-namhyung@kernel.org	2024-03-06 20:24:55 -08:00
Ian Rogers	7bfc84b23e	perf threads: Reduce table size from 256 to 8 The threads data structure is an array of hashmaps, previously rbtrees. The two levels allows for a fixed outer array where access is guarded by rw_semaphores. Commit `91e467bc56` ("perf machine: Use hashtable for machine threads") sized the outer table at 256 entries to avoid future scalability problems, however, this means the threads struct is sized at 30,720 bytes. As the hashmaps allow O(1) access for the common find/insert/remove operations, lower the number of entries to 8. This reduces the size overhead to 960 bytes. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240301053646.1449657-8-irogers@google.com	2024-03-03 22:52:13 -08:00
Ian Rogers	412a2ff473	perf threads: Switch from rbtree to hashmap The rbtree provides a sorting on entries but this is unused. Switch to using hashmap for O(1) rather than O(log n) find/insert/remove complexity. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240301053646.1449657-7-irogers@google.com	2024-03-03 22:52:04 -08:00
Ian Rogers	93bb5b0d93	perf threads: Move threads to its own files Move threads out of machine and into its own file. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240301053646.1449657-6-irogers@google.com	2024-03-03 22:51:55 -08:00
Ian Rogers	d436f90a64	perf machine: Move machine's threads into its own abstraction Move thread_rb_node into the machine.c file. This hides the implementation of threads from the rest of the code allowing for it to be refactored. Locking discipline is tightened up in this change. As the lock is now encapsulated in threads, the findnew function requires holding it (as it already did in machine). Rather than do conditionals with locks based on whether the thread should be created (which could potentially be error prone with a read lock match with a write unlock), have a separate threads__find that won't create the thread and only holds the read lock. This effectively duplicates the findnew logic, with the existing findnew logic only operating under a write lock assuming creation is necessary as a previous find failed. The creation may still fail with the write lock due to another thread. The duplication is removed in a later next patch that delegates the implementation to hashtable. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240301053646.1449657-5-irogers@google.com	2024-03-03 22:51:44 -08:00
Ian Rogers	45ac4960d7	perf machine: Move fprintf to for_each loop and a callback Avoid exposing the threads data structure by switching to the callback machine__for_each_thread approach. machine__fprintf is only used in tests and verbose >3 output so don't turn to list and sort. Add machine__threads_nr to be refactored later. Note, all existing *_fprintf routines ignore fprintf errors. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240301053646.1449657-4-irogers@google.com	2024-03-03 22:51:31 -08:00
Ian Rogers	f178ffdf7e	perf trace: Ignore thread hashing in summary Commit `91e467bc56` ("perf machine: Use hashtable for machine threads") made the iteration of thread tids unordered. The perf trace --summary output sorts and prints each hash bucket, rather than all threads globally. Change this behavior by turn all threads into a list, sort the list by number of trace events then by tids, finally print the list. This also allows the rbtree in threads to be not accessed outside of machine. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240301053646.1449657-3-irogers@google.com	2024-03-03 22:51:18 -08:00
Ian Rogers	2f1e20feb9	perf report: Sort child tasks by tid Commit `91e467bc56` ("perf machine: Use hashtable for machine threads") made the iteration of thread tids unordered. The perf report --tasks output now shows child threads in an order determined by the hashing. For example, in this snippet tid 3 appears after tid 256 even though they have the same ppid 2: ``` $ perf report --tasks % pid tid ppid comm 0 0 -1 \|swapper 2 2 0 \| kthreadd 256 256 2 \| kworker/12:1H-k 693761 693761 2 \| kworker/10:1-mm `1301762` `1301762` 2 \| kworker/1:1-mm_ 1302530 1302530 2 \| kworker/u32:0-k 3 3 2 \| rcu_gp ... ``` The output is easier to read if threads appear numerically increasing. To allow for this, read all threads into a list then sort with a comparator that orders by the child task's of the first common parent. The list creation and deletion are created as utilities on machine. The indentation is possible by counting the number of parents a child has. With this change the output for the same data file is now like: ``` $ perf report --tasks % pid tid ppid comm 0 0 -1 \|swapper 1 1 0 \| systemd 823 823 1 \| systemd-journal 853 853 1 \| systemd-udevd 3230 3230 1 \| systemd-timesyn 3236 3236 1 \| auditd 3239 3239 3236 \| audisp-syslog 3321 3321 1 \| accounts-daemon ... ``` Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240301053646.1449657-2-irogers@google.com	2024-03-03 22:50:55 -08:00
Sandipan Das	498d348637	perf vendor events amd: Fix Zen 4 cache latency events L3PMCx0AC and L3PMCx0AD, used in l3_xi_sampled_latency* events, have a quirk that requires them to be programmed with SliceId set to 0x3. Without this, the events do not count at all and affects dependent metrics such as l3_read_miss_latency. If ThreadMask is not specified, the amd-uncore driver internally sets ThreadMask to 0x3, EnAllCores to 0x1 and EnAllSlices to 0x1 but does not set SliceId. Since SliceId must also be set to 0x3 in this case, specify all the other fields explicitly. E.g. $ sudo perf stat -e l3_xi_sampled_latency.all,l3_xi_sampled_latency_requests.all -a sleep 1 Before: Performance counter stats for 'system wide': 0 l3_xi_sampled_latency.all 0 l3_xi_sampled_latency_requests.all 1.005155399 seconds time elapsed After: Performance counter stats for 'system wide': 921,446 l3_xi_sampled_latency.all 54,210 l3_xi_sampled_latency_requests.all 1.005664472 seconds time elapsed Fixes: `5b2ca349c3` ("perf vendor events amd: Add Zen 4 uncore events") Signed-off-by: Sandipan Das <sandipan.das@amd.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: ananth.narayan@amd.com Cc: ravi.bangoria@amd.com Cc: eranian@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240301084431.646221-1-sandipan.das@amd.com	2024-03-03 22:49:37 -08:00
James Clark	507ad2bde3	perf version: Display availability of OpenCSD support This is useful for scripts that work with Perf and ETM trace. Rather than them trying to parse Perf's error output at runtime to see if it was linked or not. Signed-off-by: James Clark <james.clark@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: al.grant@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240301133829.346286-1-james.clark@arm.com	2024-03-03 22:48:40 -08:00
Ian Rogers	dd267d056f	perf vendor events intel: Add umasks/occ_sel to PCU events. UMasks were being dropped leading to all PCU UNC_P_POWER_STATE_OCCUPANCY events having the same encoding. Don't drop the umask trying to be consistent with other sources of events like libpfm4 [1]. Older models need to use occ_sel rather than umask, correct these values too. This applies the change from [2]. [1] https://sourceforge.net/p/perfmon2/libpfm4/ci/master/tree/lib/events/intel_skx_unc_pcu_events.h#l30 [2] `cbd4aee810` Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240228170529.4035675-1-irogers@google.com	2024-02-29 18:08:13 -08:00
Ian Rogers	ec42d3d568	perf map: Fix map reference count issues The find will get the map, ensure puts are done on all paths. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240229062048.558799-1-irogers@google.com	2024-02-29 18:06:00 -08:00
Namhyung Kim	b44d665368	perf lock contention: Account contending locks too Currently it accounts the contention using delta between timestamps in lock:contention_begin and lock:contention_end tracepoints. But it means the lock should see the both events during the monitoring period. Actually there are 4 cases that happen with the monitoring: monitoring period / \ \| \| 1: B------+-----------------------+--------E 2: B----+-------------E \| 3: \| B-----------+----E 4: \| B-------------E \| \| \| t0 t1 where B and E mean contention BEGIN and END, respectively. So it only accounts the case 4 for now. It seems there's no way to handle the case 1. The case 2 might be handled if it saved the timestamp (t0), but it lacks the information from the B notably the flags which shows the lock types. Also it could be a nested lock which it currently ignores. So I think we should ignore the case 2. However we can handle the case 3 if we save the timestamp (t1) at the end of the period. And then it can iterate the map entries in the userspace and update the lock stat accordinly. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Reviwed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20240228053335.312776-1-namhyung@kernel.org	2024-02-29 13:53:56 -08:00
Ian Rogers	97b6b4ac1c	perf metrics: Fix segv for metrics with no events A metric may have no events, for example, the transaction metrics on x86 are dependent on there being TSX events. Fix a segv where an evsel of NULL is dereferenced for a metric leader value. Fixes: `a59fb796a3` ("perf metrics: Compute unmerged uncore metrics individually") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240224011420.3066322-2-irogers@google.com	2024-02-29 13:40:13 -08:00
Ian Rogers	d4be39cade	perf metrics: Fix metric matching The metric match function fails for cases like looking for "metric" in the string "all;foo_metric;metric" as the "metric" in "foo_metric" matches but isn't preceeded by a ';'. Fix this by matching the first list item and recursively matching on failure the next item after a semicolon. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240224011420.3066322-1-irogers@google.com	2024-02-29 13:39:54 -08:00
Christophe JAILLET	ef5de1613d	perf pmu: Fix a potential memory leak in perf_pmu__lookup() The commit in Fixes has reordered some code, but missed an error handling path. 'goto err' now, in order to avoid a memory leak in case of error. Fixes: `f63a536f03` ("perf pmu: Merge JSON events with sysfs at load time") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Ian Rogers <irogers@google.com> Cc: kernel-janitors@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/9538b2b634894c33168dfe9d848d4df31fd4d801.1693085544.git.christophe.jaillet@wanadoo.fr	2024-02-26 21:43:00 -08:00
Colin Ian King	eb94225eb4	perf test: Fix spelling mistake "curent" -> "current" There is a spelling mistake in a pr_debug message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: kernel-janitors@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240226105326.3944887-1-colin.i.king@gmail.com	2024-02-26 21:41:27 -08:00
Arnaldo Carvalho de Melo	8680999dbe	perf test: Use TEST_FAIL in the TEST_ASSERT macros instead of -1 Just to make things clearer, return TEST_FAIL (-1) instead of an open coded -1. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/ZdepeMsjagbf1ufD@x1	2024-02-26 08:31:24 -08:00
Ilkka Koskinen	bae4d1f86e	perf data convert: Fix segfault when converting to json when cpu_desc isn't set Arm64 doesn't have Model in /proc/cpuinfo and, thus, cpu_desc doesn't get assigned. Running $ perf data convert --to-json perf.data.json ends up calling output_json_string() with NULL pointer, which causes a segmentation fault. Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Evgeny Pistun <kotborealis@awooo.ru> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240223220458.15282-1-ilkka@os.amperecomputing.com	2024-02-26 08:30:17 -08:00
Arnaldo Carvalho de Melo	529d5818a3	perf bpf: Check that the minimal vmlinux.h installed is the latest one When building BPF skels perf will, by default, install a minimalistic vmlinux.h file with the types needed by the BPF skels in tools/perf/util/bpf_skel/ in its build directory. When `29d16de26d` ("perf augmented_raw_syscalls.bpf: Move 'struct timespec64' to vmlinux.h") was added, a type used in the augmented_raw_syscalls BPF skel, 'struct timespec64' was not found when building from a pre-existing build directory, because the vmlinux.h there didn't contain that type, ending up with this error, spotted in linux-next: CLANG /tmp/build/perf-tools-next/util/bpf_skel/.tmp/augmented_raw_syscalls.bpf.o util/bpf_skel/augmented_raw_syscalls.bpf.c:329:15: error: invalid application of 'sizeof' to an incomplete type 'struct timespec64' 329 \| __u32 size = sizeof(struct timespec64); \| ^ ~~~~~~~~~~~~~~~~~~~ util/bpf_skel/augmented_raw_syscalls.bpf.c:329:29: note: forward declaration of 'struct timespec64' 329 \| __u32 size = sizeof(struct timespec64); \| ^ util/bpf_skel/augmented_raw_syscalls.bpf.c:350:15: error: invalid application of 'sizeof' to an incomplete type 'struct timespec64' 350 \| __u32 size = sizeof(struct timespec64); \| ^ ~~~~~~~~~~~~~~~~~~~ util/bpf_skel/augmented_raw_syscalls.bpf.c:350:29: note: forward declaration of 'struct timespec64' 350 \| __u32 size = sizeof(struct timespec64); \| ^ 2 errors generated. make[2]: * [Makefile.perf:1158: /tmp/build/perf-tools-next/util/bpf_skel/.tmp/augmented_raw_syscalls.bpf.o] Error 1 make[2]: * Waiting for unfinished jobs.... make[1]: * [Makefile.perf:261: sub-make] Error 2 make: * [Makefile:113: install-bin] Error 2 make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf' So add a Makefile dependency (Namhyung's suggestion) to make sure that the new tools/perf/util/bpf_skel/vmlinux/vmlinux.h minimal vmlinux is updated in the build directory, providing the moved 'struct timespec64' type. Fixes: `29d16de26d` ("perf augmented_raw_syscalls.bpf: Move 'struct timespec64' to vmlinux.h") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Ian Rogers <irogers@google.com> Suggested-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/ZdoPrWg-qYFpBJbz@x1	2024-02-26 08:16:08 -08:00
Masahiro Yamada	c2bd08ba20	treewide: remove meaningless assignments in Makefiles In Makefiles, $(error ), $(warning ), and $(info ) expand to the empty string, as explained in the GNU Make manual [1]: "The result of the expansion of this function is the empty string." Therefore, they are no-op except for logging purposes. $(shell ...) expands to the output of the command. It expands to the empty string when the command does not print anything to stdout. Hence, $(shell mkdir ...) is no-op except for creating the directory. Remove meaningless assignments. [1]: https://www.gnu.org/software/make/manual/make.html#Make-Control-Functions Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20240221134201.2656908-1-masahiroy@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-kbuild@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org	2024-02-23 14:19:07 -08:00
Mark Rutland	25412c0364	perf print-events: make is_event_supported() more robust Currently the perf tool doesn't detect support for extended event types on Apple M1/M2 systems, and will not auto-expand plain PERF_EVENT_TYPE hardware events into per-PMU events. This is due to the detection of extended event types not handling mandatory filters required by the M1/M2 PMU driver. PMU drivers and the core perf_events code can require that perf_event_attr::exclude_* filters are configured in a specific way and may reject certain configurations of filters, for example: (a) Many PMUs lack support for any event filtering, and require all perf_event_attr::exclude_* bits to be clear. This includes Alpha's CPU PMU, and ARM CPU PMUs prior to the introduction of PMUv2 in ARMv7, (b) When /proc/sys/kernel/perf_event_paranoid >= 2, the perf core requires that perf_event_attr::exclude_kernel is set. (c) The Apple M1/M2 PMU requires that perf_event_attr::exclude_guest is set as the hardware PMU does not count while a guest is running (but might be extended in future to do so). In is_event_supported(), we try to account for cases (a) and (b), first attempting to open an event without any filters, and if this fails, retrying with perf_event_attr::exclude_kernel set. We do not account for case (c), or any other filters that drivers could theoretically require to be set. Thus is_event_supported() will fail to detect support for any events targeting an Apple M1/M2 PMU, even where events would be supported with perf_event_attr:::exclude_guest set. Since commit: `82fe2e45cd` ("perf pmus: Check if we can encode the PMU number in perf_event_attr.type") ... we use is_event_supported() to detect support for extended types, with the PMU ID encoded into the perf_event_attr::type. As above, on an Apple M1/M2 system this will always fail to detect that the event is supported, and consequently we fail to detect support for extended types even when these are supported, as they have been since commit: `5c81672865` ("arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability") Due to this, the perf tool will not automatically expand plain PERF_TYPE_HARDWARE events into per-PMU events, even when all the necessary kernel support is present. This patch updates is_event_supported() to additionally try opening events with perf_event_attr::exclude_guest set, allowing support for events to be detected on Apple M1/M2 systems. I believe that this is sufficient for all contemporary CPU PMU drivers, though in future it may be necessary to check for other combinations of filter bits. I've deliberately changed the check to not expect a specific error code for missing filters, as today ;the kernel may return a number of different error codes for missing filters (e.g. -EACCESS, -EINVAL, or -EOPNOTSUPP) depending on why and where the filter configuration is rejected, and retrying for any error is more robust. Note that this does not remove the need for commit: `a24d9d9dc0` ("perf parse-events: Make legacy events lower priority than sysfs/JSON") ... which is still necessary so that named-pmu/event/ events work on kernels without extended type support, even if the event name happens to be the same as a PERF_EVENT_TYPE_HARDWARE event (e.g. as is the case for the M1/M2 PMU's 'cycles' and 'instructions' events). Fixes: `82fe2e45cd` ("perf pmus: Check if we can encode the PMU number in perf_event_attr.type") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@arm.com> Tested-by: Marc Zyngier <maz@kernel.org> Cc: Hector Martin <marcan@marcan.st> Cc: James Clark <james.clark@arm.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mike Leach <mike.leach@linaro.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240126145605.1005472-1-mark.rutland@arm.com	2024-02-23 14:16:33 -08:00
Ian Rogers	b482f5f8e0	perf tests: Add option to run tests in parallel By default tests are forked, add an option (-p or --parallel) so that the forked tests are all started in parallel and then their output gathered serially. This is opt-in as running in parallel can cause test flakes. Rather than fork within the code, the start_command/finish_command from libsubcmd are used. This changes how stderr and stdout are handled. The child stderr and stdout are always read to avoid the child blocking. If verbose is 1 (-v) then if the test fails the child stdout and stderr are displayed. If the verbose is >1 (e.g. -vv) then the stdout and stderr from the child are immediately displayed. An unscientific test on my laptop shows the wall clock time for perf test without parallel being 5 minutes 21 seconds and with parallel (-p) being 1 minute 50 seconds. Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: llvm@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221034155.1500118-9-irogers@google.com	2024-02-22 09:13:20 -08:00
Ian Rogers	964461ee37	perf tests: Run time generate shell test suites Rather than special shell test logic, do a single pass to create an array of test suites. Hold the shell test file name in the test suite priv field. This makes the special shell test logic in builtin-test.c redundant so remove it. Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: llvm@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221034155.1500118-8-irogers@google.com	2024-02-22 09:13:06 -08:00
Ian Rogers	f3295f5b06	perf tests: Use scandirat for shell script finding Avoid filename appending buffers by using openat, faccessat and scandirat more widely. Turn the script's path back to a file name using readlink from /proc/<pid>/fd/<fd>. Read the script's description using api/io.h to avoid fdopen conversions. Whilst reading perform additional sanity checks on the script's contents. Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: llvm@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221034155.1500118-7-irogers@google.com	2024-02-22 09:12:53 -08:00
Ian Rogers	d5bcade989	perf test: Rename builtin-test-list and add missed header guard builtin-test-list is primarily concerned with shell script tests. Rename the file to better reflect this and add a missed header guard. Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: llvm@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221034155.1500118-6-irogers@google.com	2024-02-22 09:12:40 -08:00
Ian Rogers	526f2ac9f6	perf tests: Avoid fork in perf_has_symbol test perf test -vv Symbols is used to indentify symbols within the perf binary. Add the -F flag so that the test command doesn't fork the test before running. This removes a little overhead. Acked-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: llvm@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221034155.1500118-4-irogers@google.com	2024-02-22 09:12:04 -08:00
Ian Rogers	8ece26ad5a	perf list: Add scandirat compatibility function scandirat is used during the printing of tracepoint events but may be missing from certain libcs. Add a compatibility implementation that uses the symlink of an fd in /proc as a path for the reliably present scandir. Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: llvm@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221034155.1500118-3-irogers@google.com	2024-02-22 09:11:41 -08:00
Ian Rogers	510e528786	perf thread_map: Skip exited threads when scanning /proc Scanning /proc is inherently racy. Scanning /proc/pid/task within that is also racy as the pid can terminate. Rather than failing in __thread_map__new_all_cpus, skip pids for such failures. Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: llvm@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221034155.1500118-2-irogers@google.com	2024-02-22 09:11:03 -08:00
Thomas Richter	b6968f9b50	perf list: fix short description for some cache events Correct the short description of the following events: DCW_REQ, DCW_REQ_CHIP_HIT, DCW_REQ_DRAWER_HIT, DCW_REQ_IV, DCW_ON_CHIP, DCW_ON_CHIP_IV, DCW_ON_CHIP_CHIP_HIT, DCW_ON_CHIP_DRAWER_HIT, CW_ON_MODULE, DCW_ON_DRAWER, DCW_OFF_DRAWER, IDCW_ON_MODULE_IV, IDCW_ON_MODULE_CHIP_HIT, IDCW_ON_MODULE_DRAWER_HIT, IDCW_ON_DRAWER_IV, IDCW_ON_DRAWER_CHIP_HIT, IDCW_ON_DRAWER_DRAWER_HIT, IDCW_OFF_DRAWER_IV, IDCW_OFF_DRAWER_CHIP_HIT, IDCW_OFF_DRAWER_DRAWER_HIT, ICW_REQ, ICW_REQ_IV, CW_REQ_CHIP_HIT, ICW_REQ_DRAWER_HIT, ICW_ON_CHIP, ICW_ON_CHIP_IV, ICW_ON_CHIP_CHIP_HIT, ICW_ON_CHIP_DRAWER_HIT, ICW_ON_MODULE and ICW_OFF_DRAWER. The second Cache should be L2-Cache. Output before (display diff of the first four events) # perf list -d DCW_REQ [Directory Write Level 1 Data Cache from Cache. Unit: cpum_cf] DCW_REQ_CHIP_HIT [Directory Write Level 1 Data Cache from Cache with Chip HP \ Hit. Unit: cpum_cf] DCW_REQ_DRAWER_HIT [Directory Write Level 1 Data Cache from Cache with Drawer \ HP Hit. Unit: cpum_cf] DCW_REQ_IV [Directory Write Level 1 Data Cache from Cache with Intervention. \ Unit: cpum_cf] Output after: # perf list -d DCW_REQ [Directory Write Level 1 Data Cache from L2-Cache. Unit: cpum_cf] DCW_REQ_CHIP_HIT [Directory Write Level 1 Data Cache from L2-Cache with Chip HP \ Hit. Unit: cpum_cf] DCW_REQ_DRAWER_HIT [Directory Write Level 1 Data Cache from L2-Cache with Drawer \ HP Hit. Unit: cpum_cf] DCW_REQ_IV [Directory Write Level 1 Data Cache from L2-Cache with \ Intervention. Unit: cpum_cf] Fixes: `7f76b31130` ("perf list: Add IBM z16 event description for s390") Reported-by: Andreas Krebbel <krebbel@linux.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Andreas Krebbel <krebbel@linux.ibm.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: gor@linux.ibm.com Cc: hca@linux.ibm.com Cc: sumanthk@linux.ibm.com Cc: svens@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221091908.1759083-1-tmricht@linux.ibm.com	2024-02-22 09:02:59 -08:00
Ian Rogers	bafd4e75c1	perf stat: Fix metric-only aggregation index Aggregation index was being computed using the evsel's cpumap which may have a different (typically the same or fewer) entries. Before: ``` $ perf stat --metric-only -A -M memory_bandwidth_total -a sleep 1 Performance counter stats for 'system wide': MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total CPU0 12.8 0.0 12.9 12.7 0.0 12.6 CPU1 1.007806367 seconds time elapsed ``` After: ``` $ perf stat --metric-only -A -M memory_bandwidth_total -a sleep 1 Performance counter stats for 'system wide': MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total CPU0 15.4 0.0 15.3 15.0 0.0 14.9 CPU18 0.0 0.0 13.5 5.2 0.0 11.9 1.007858736 seconds time elapsed ``` Signed-off-by: Ian Rogers <irogers@google.com> \| Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Kaige Ye <ye@kaige.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221070754.4163916-3-irogers@google.com	2024-02-22 08:57:29 -08:00
Ian Rogers	a59fb796a3	perf metrics: Compute unmerged uncore metrics individually When merging counts from multiple uncore PMUs the metric is only computed for the metric leader. When merging/aggregation is disabled, prior to this patch just the leader's metric would be computed. Fix this by computing the metric for each PMU. On a SkylakeX: Before: ``` $ perf stat -A -M memory_bandwidth_total -a sleep 1 Performance counter stats for 'system wide': CPU0 82,217 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 9.2 MB/s memory_bandwidth_total CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 0.0 MB/s memory_bandwidth_total CPU0 61,395 UNC_M_CAS_COUNT.WR [uncore_imc_0] CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_0] CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_1] CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_1] CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_1] CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_1] CPU0 81,570 UNC_M_CAS_COUNT.RD [uncore_imc_2] CPU18 113,886 UNC_M_CAS_COUNT.RD [uncore_imc_2] CPU0 62,330 UNC_M_CAS_COUNT.WR [uncore_imc_2] CPU18 66,942 UNC_M_CAS_COUNT.WR [uncore_imc_2] CPU0 75,489 UNC_M_CAS_COUNT.RD [uncore_imc_3] CPU18 27,958 UNC_M_CAS_COUNT.RD [uncore_imc_3] CPU0 55,864 UNC_M_CAS_COUNT.WR [uncore_imc_3] CPU18 38,727 UNC_M_CAS_COUNT.WR [uncore_imc_3] CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_4] CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_4] CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_4] CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_4] CPU0 75,423 UNC_M_CAS_COUNT.RD [uncore_imc_5] CPU18 104,527 UNC_M_CAS_COUNT.RD [uncore_imc_5] CPU0 57,596 UNC_M_CAS_COUNT.WR [uncore_imc_5] CPU18 56,777 UNC_M_CAS_COUNT.WR [uncore_imc_5] CPU0 1,003,440,851 ns duration_time 1.003440851 seconds time elapsed ``` After: ``` $ perf stat -A -M memory_bandwidth_total -a sleep 1 Performance counter stats for 'system wide': CPU0 88,968 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 9.5 MB/s memory_bandwidth_total CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 0.0 MB/s memory_bandwidth_total CPU0 59,498 UNC_M_CAS_COUNT.WR [uncore_imc_0] CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_0] CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_1] # 0.0 MB/s memory_bandwidth_total CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_1] # 0.0 MB/s memory_bandwidth_total CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_1] CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_1] CPU0 88,635 UNC_M_CAS_COUNT.RD [uncore_imc_2] # 9.5 MB/s memory_bandwidth_total CPU18 117,975 UNC_M_CAS_COUNT.RD [uncore_imc_2] # 11.5 MB/s memory_bandwidth_total CPU0 60,829 UNC_M_CAS_COUNT.WR [uncore_imc_2] CPU18 62,105 UNC_M_CAS_COUNT.WR [uncore_imc_2] CPU0 82,238 UNC_M_CAS_COUNT.RD [uncore_imc_3] # 8.7 MB/s memory_bandwidth_total CPU18 22,906 UNC_M_CAS_COUNT.RD [uncore_imc_3] # 3.6 MB/s memory_bandwidth_total CPU0 53,959 UNC_M_CAS_COUNT.WR [uncore_imc_3] CPU18 32,990 UNC_M_CAS_COUNT.WR [uncore_imc_3] CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_4] # 0.0 MB/s memory_bandwidth_total CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_4] # 0.0 MB/s memory_bandwidth_total CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_4] CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_4] CPU0 83,595 UNC_M_CAS_COUNT.RD [uncore_imc_5] # 8.9 MB/s memory_bandwidth_total CPU18 110,151 UNC_M_CAS_COUNT.RD [uncore_imc_5] # 10.5 MB/s memory_bandwidth_total CPU0 56,540 UNC_M_CAS_COUNT.WR [uncore_imc_5] CPU18 53,816 UNC_M_CAS_COUNT.WR [uncore_imc_5] CPU0 1,003,353,416 ns duration_time ``` Signed-off-by: Ian Rogers <irogers@google.com> \| Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Kaige Ye <ye@kaige.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221070754.4163916-2-irogers@google.com	2024-02-22 08:57:09 -08:00
Ian Rogers	eee41e6b28	perf stat: Pass fewer metric arguments Pass metric_expr and evsel rather than specific variables from the struct, thereby reducing the number of arguments. This will enable later fixes. To reduce the size of the diff, local variables are added to match the previous parameter names. This isn't done in the case of "name" as evsel->name is more intention revealing. A whitespace issue is also addressed. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Kaige Ye <ye@kaige.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221070754.4163916-1-irogers@google.com	2024-02-22 08:56:45 -08:00
Changbin Du	659663f0bc	perf: script: prefer capstone to XED Now perf can show assembly instructions with libcapstone for x86, and the capstone is better in general. Signed-off-by: Changbin Du <changbin.du@huawei.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: changbin.du@gmail.com Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240217074046.4100789-6-changbin.du@huawei.com	2024-02-20 18:07:34 -08:00
Changbin Du	6750ba4b64	perf: script: add raw\|disasm arguments to --insn-trace option Now '--insn-trace' accept a argument to specify the output format: - raw: display raw instructions. - disasm: display mnemonic instructions (if capstone is installed). $ sudo perf script --insn-trace=raw ls 1443864 [006] 2275506.209908875: 7f216b426100 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) insn: 48 89 e7 ls 1443864 [006] 2275506.209908875: 7f216b426103 _start+0x3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) insn: e8 e8 0c 00 00 ls 1443864 [006] 2275506.209908875: 7f216b426df0 _dl_start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) insn: f3 0f 1e fa $ sudo perf script --insn-trace=disasm ls 1443864 [006] 2275506.209908875: 7f216b426100 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) movq %rsp, %rdi ls 1443864 [006] 2275506.209908875: 7f216b426103 _start+0x3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) callq _dl_start+0x0 ls 1443864 [006] 2275506.209908875: 7f216b426df0 _dl_start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) illegal instruction ls 1443864 [006] 2275506.209908875: 7f216b426df4 _dl_start+0x4 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) pushq %rbp ls 1443864 [006] 2275506.209908875: 7f216b426df5 _dl_start+0x5 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) movq %rsp, %rbp ls 1443864 [006] 2275506.209908875: 7f216b426df8 _dl_start+0x8 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) pushq %r15 Signed-off-by: Changbin Du <changbin.du@huawei.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: changbin.du@gmail.com Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240217074046.4100789-5-changbin.du@huawei.com	2024-02-20 18:07:21 -08:00
Changbin Du	9941723438	perf: script: add field 'disasm' to display mnemonic instructions In addition to the 'insn' field, this adds a new field 'disasm' to display mnemonic instructions instead of the raw code. $ sudo perf script -F +disasm perf-exec 1443864 [006] 2275506.209848: psb: psb offs: 0 0 [unknown] ([unknown]) perf-exec 1443864 [006] 2275506.209848: cbr: cbr: 41 freq: 4100 MHz (114%) 0 [unknown] ([unknown]) ls 1443864 [006] 2275506.209905: 1 branches:uH: 7f216b426100 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) movq %rsp, %rdi ls 1443864 [006] 2275506.209908: 1 branches:uH: 7f216b426103 _start+0x3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) callq _dl_start+0x0 Signed-off-by: Changbin Du <changbin.du@huawei.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: changbin.du@gmail.com Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240217074046.4100789-4-changbin.du@huawei.com	2024-02-20 18:07:07 -08:00
Changbin Du	8f0ec15ff6	perf: util: use capstone disasm engine to show assembly instructions Currently, the instructions of samples are shown as raw hex strings which are hard to read. x86 has a special option '--xed' to disassemble the hex string via intel XED tool. Here we use capstone as our disassembler engine to give more friendly instructions. We select libcapstone because capstone can provide more insn details. Perf will fallback to raw instructions if libcapstone is not available. The advantages compared to XED tool: * Support arm, arm64, x86-32, x86_64 (more could be supported), xed only for x86_64. * Immediate address operands are shown as symbol+offs. Signed-off-by: Changbin Du <changbin.du@huawei.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: changbin.du@gmail.com Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240217074046.4100789-3-changbin.du@huawei.com	2024-02-20 18:06:48 -08:00
Changbin Du	8b767db330	perf: build: introduce the libcapstone Later we will use libcapstone to disassemble instructions of samples. Signed-off-by: Changbin Du <changbin.du@huawei.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: changbin.du@gmail.com Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240217074046.4100789-2-changbin.du@huawei.com	2024-02-20 18:06:25 -08:00
Ian Rogers	81377de00f	perf list: For metricgroup only list include description If perf list is invoked with 'metricgroups' include the description unless it is invoked with flags to exclude it. Make the description of metricgroup dumping dependent on the desc flag in print_state as with metrics. Before: ``` $ perf list metricgroups List of pre-defined events (to be used in -e or -M): Metric Groups: Backend Bad BadSpec ... ``` After: ``` $ perf list metricgroups List of pre-defined events (to be used in -e or -M): Metric Groups: Backend [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] Bad [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] BadSpec ... ``` Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240216192044.119897-1-irogers@google.com	2024-02-16 16:07:34 -08:00
Namhyung Kim	bacefe0c7b	perf tools: Fixup module symbol end address properly I got a strange error on ARM to fail on processing FINISHED_ROUND record. It turned out that it was failing in symbol__alloc_hist() because the symbol size is too big. When a sample is captured on a specific BPF program, it failed. I've added a debug code and found the end address of the symbol is from the next module which is placed far way. ffff800008795778-ffff80000879d6d8: bpf_prog_1bac53b8aac4bc58_netcg_sock [bpf] ffff80000879d6d8-ffff80000ad656b4: bpf_prog_76867454b5944e15_netcg_getsockopt [bpf] ffff80000ad656b4-ffffd69b7af74048: bpf_prog_1d50286d2eb1be85_hn_egress [bpf] <---------- here ffffd69b7af74048-ffffd69b7af74048: $x.5 [sha3_generic] ffffd69b7af74048-ffffd69b7af740b8: crypto_sha3_init [sha3_generic] ffffd69b7af740b8-ffffd69b7af741e0: crypto_sha3_update [sha3_generic] The logic in symbols__fixup_end() just uses curr->start to update the prev->end. But in this case, it won't work as it's too different. I think ARM has a different kernel memory layout for modules and BPF than on x86. Actually there's a logic to handle kernel and module boundary. Let's do the same for symbols between different modules. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Leo Yan <leo.yan@linux.dev> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240212233322.1855161-1-namhyung@kernel.org	2024-02-16 16:07:28 -08:00
Ian Rogers	6f146b249b	perf vendor events intel: Update tigerlake TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-31-irogers@google.com	2024-02-16 15:29:11 -08:00
Ian Rogers	e2c8b40e37	perf vendor events intel: Update skylakex TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-30-irogers@google.com	2024-02-16 15:28:59 -08:00
Ian Rogers	f15fa6ba76	perf vendor events intel: Update skylake TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-29-irogers@google.com	2024-02-16 15:28:47 -08:00
Ian Rogers	53c83c79aa	perf vendor events intel: Update sapphirerapids TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - tma_c01_wait and tma_c02_wait metrics measure power-performance states. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-28-irogers@google.com	2024-02-16 15:28:36 -08:00
Ian Rogers	176e66715d	perf vendor events intel: Update sandybridge TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Add metrics tma_fp_vector_128b, tma_fp_vector_256b and tma_info_system_cpus_utilized. - Remove metrics tma_info_system_mem_parallel_requests, tma_info_system_core_frequency and tma_info_system_mem_request_latency. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-27-irogers@google.com	2024-02-16 15:28:24 -08:00
Ian Rogers	74f76c3ba7	perf vendor events intel: Update rocketlake TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-26-irogers@google.com	2024-02-16 15:28:12 -08:00
Ian Rogers	5f9a13bee0	perf vendor events intel: Update jaketown TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-25-irogers@google.com	2024-02-16 15:27:59 -08:00
Ian Rogers	14bc1a59f2	perf vendor events intel: Update ivytown TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Reduced number of events when SMT is off. - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-24-irogers@google.com	2024-02-16 15:27:47 -08:00
Ian Rogers	8cf54fa844	perf vendor events intel: Update ivybridge TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Reduced number of events when SMT is off. - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-23-irogers@google.com	2024-02-16 15:27:34 -08:00
Ian Rogers	b15cae3f69	perf vendor events intel: Update icelakex TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-22-irogers@google.com	2024-02-16 15:27:22 -08:00
Ian Rogers	70bfdad63f	perf vendor events intel: Update icelake TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-21-irogers@google.com	2024-02-16 15:27:07 -08:00
Ian Rogers	2a264a1946	perf vendor events intel: Update haswellx TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-20-irogers@google.com	2024-02-16 15:26:54 -08:00
Ian Rogers	89b66259a7	perf vendor events intel: Update haswell TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-19-irogers@google.com	2024-02-16 15:26:42 -08:00
Ian Rogers	c72a20435a	perf vendor events intel: Update cascadelakex TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-18-irogers@google.com	2024-02-16 15:26:28 -08:00
Ian Rogers	8792e8f89d	perf vendor events intel: Update broadwellx TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc and tma_info_inst_mix_ipflop. - Removal of tma_info_bad_spec_branch_misprediction_cost. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-17-irogers@google.com	2024-02-16 15:26:15 -08:00
Ian Rogers	4018680df9	perf vendor events intel: Update broadwellde TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc and tma_info_inst_mix_ipflop. - Removal of tma_info_bad_spec_branch_misprediction_cost. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-16-irogers@google.com	2024-02-16 15:26:03 -08:00
Ian Rogers	eedd6d0a72	perf vendor events intel: Update broadwell TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc and tma_info_inst_mix_ipflop. - Removal of tma_info_bad_spec_branch_misprediction_cost. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-15-irogers@google.com	2024-02-16 15:25:51 -08:00
Ian Rogers	52530942ba	perf vendor events intel: Update alderlake TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - tma_c01_wait and tma_c02_wait metrics measure power-performance states. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-14-irogers@google.com	2024-02-16 15:25:40 -08:00
Ian Rogers	c4bb31c7b0	perf vendor events intel: Update tigerlake events to v1.15 Update alderlake events to v1.15 released in: `282a6951fd` Documentation fixes, removal of TOPDOWN.BR_MISPREDICT_SLOTS, deprecation of UNC_ARB_DAT_REQUESTS.RD, UNC_ARB_DAT_REQUESTS.RD and UNC_ARB_IFA_OCCUPANCY.ALL. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-13-irogers@google.com	2024-02-16 15:25:28 -08:00
Ian Rogers	c31d718ca2	perf vendor events intel: Update skylake events to v58 Update skylake events to v58 released in: `625fb75073` Improves documentation. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-12-irogers@google.com	2024-02-16 15:25:17 -08:00
Ian Rogers	9626368d42	perf vendor events intel: Update sierraforst events to v1.01 Update sierraforest events to v1.01 released in: `582bca24aa` Adds the majority of core and uncore events. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-11-irogers@google.com	2024-02-16 15:25:06 -08:00
Ian Rogers	8972c03353	perf vendor events intel: Update rocketlake events to v1.02 Update alderlake events to v1.02 released in: `4931178d1e` Improves documentation and removes TOPDOWN.BR_MISPREDICT_SLOTS. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-10-irogers@google.com	2024-02-16 15:24:54 -08:00
Ian Rogers	1d262a85e2	perf vendor events intel: Update meteorlake events to v1.07 Update meteorlake events to v1.07 released in: `6251722308` Umask changed on atom mem_bound events. Adds atom events ARITH.FPDIV_ACTIVE, FP_FLOPS_RETIRED.ALL, FP_FLOPS_RETIRED.DP, FP_FLOPS_RETIRED.FP32, ARITH.DIV_ACTIVE, BR_INST_RETIRED.COND, BR_INST_RETIRED.COND_TAKEN, BR_INST_RETIRED.INDIRECT, BR_INST_RETIRED.INDIRECT_CALL, BR_INST_RETIRED.IND_CALL, BR_INST_RETIRED.NEAR_RETURN, DTLB_LOAD_MISSES.WALK_COMPLETED_4K, DTLB_STORE_MISSES.WALK_COMPLETED_2M_4M, DTLB_STORE_MISSES.WALK_COMPLETED_4K, ITLB_MISSES.WALK_COMPLETED_4K, and alias events. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-9-irogers@google.com	2024-02-16 15:24:16 -08:00
Ian Rogers	e8866cdbe1	perf vendor events intel: Update icelake events to v1.21 Update icelake events to v1.21 released in: `54f1246b04` Improves descriptions, removes TOPDOWN.BR_MISPREDICT_SLOTS. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-8-irogers@google.com	2024-02-16 15:24:04 -08:00
Ian Rogers	f9044d46b7	perf vendor events intel: Update haswell events to v35 Update haswell events to v35 released in: `c0f9b34d42` Updates "must be precise" on RTM_RETIRED.ABORTED. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: linux-perf-users@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-7-irogers@google.com	2024-02-16 15:23:53 -08:00
Ian Rogers	24cda3081a	perf vendor events intel: Update grandridge events to v1.01 Update grandridge events to v1.01 released in: `211d607165` Adds the majority of core and uncore events. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-6-irogers@google.com	2024-02-16 15:23:40 -08:00
Ian Rogers	ea518afc99	perf vendor events intel: Update emeraldrapids events to v1.03 Update emeraldrapids events to v1.03 released in: `c7c6f72dae` Adds uncore CHA events. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-5-irogers@google.com	2024-02-16 15:23:24 -08:00
Ian Rogers	7163acea30	perf vendor events intel: Update broadwell events to v29 Update broadwell events to v29 released in: `47117146c6` Updates "must be precise" on RTM_RETIRED.ABORTED. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-4-irogers@google.com	2024-02-16 15:23:07 -08:00
Ian Rogers	5dcc2abaa5	perf vendor events intel: Update alderlaken events to v1.24 Update alderlaken events to v1.24 released in: `e627dd8d89` Adds LBR_INSERTS.ANY/MISC_RETIRED.LBR_INSERTS event. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-3-irogers@google.com	2024-02-16 15:22:48 -08:00
Ian Rogers	2252ddf434	perf vendor events intel: Update alderlake events to v1.24 Update alderlake events to v1.24 released in: `e627dd8d89` Adds aliased events, improves documentation and fix some event fields. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-2-irogers@google.com	2024-02-16 15:22:26 -08:00
Arnaldo Carvalho de Melo	29d16de26d	perf augmented_raw_syscalls.bpf: Move 'struct timespec64' to vmlinux.h If we instead decide to generate vmlinux.h from BTF info, it will be there: $ pahole timespec64 struct timespec64 { time64_t tv_sec; /* 0 8 / long int tv_nsec; / 8 8 / / size: 16, cachelines: 1, members: 2 / / last cacheline: 16 bytes */ }; $ pahole manages to find it from /sys/kernel/btf/vmlinux, that is generated from the kernel types. With this linux/bpf.h doesn't need to be included, as its already in the minimalistic tools/perf/util/bpf_skel/vmlinux/vmlinux.h file or what we need comes when generating a vmlinux.h file from BTF info, i.e. when using GEN_VMLINUX_H=1, as noticed by Namyung in a build break before removing linux/bpf.h. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/Zc_fp6CgDClPhS_O@x1	2024-02-16 15:19:57 -08:00
Michael Petlan	f512e08fd0	perf testsuite: Install kprobe tests and common files Signed-off-by: Michael Petlan <mpetlan@redhat.com> Cc: kjain@linux.ibm.com Cc: atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240215110231.15385-8-mpetlan@redhat.com	2024-02-16 11:50:02 -08:00
Veronika Molnarova	e7d759f31c	perf testsuite: Add test for kprobe handling Test perf interface to kprobes: listing, adding and removing probes. It is run as a part of perftool-testsuite_probe test case. Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Signed-off-by: Michael Petlan <mpetlan@redhat.com> Cc: kjain@linux.ibm.com Cc: atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240215110231.15385-7-mpetlan@redhat.com	2024-02-16 11:49:47 -08:00
Veronika Molnarova	61d348f1e9	perf testsuite: Add common output checking helpers As a form of validation, it is a common practice to check the outputs of commands whether they contain expected patterns or match a certain regex. Add helpers for verifying that all regexes are found in the output, that all lines match any pattern from a set and that a certain expression is not present in the output. In verbose mode these helpers log mismatches for easier failure investigation. Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Signed-off-by: Michael Petlan <mpetlan@redhat.com> Cc: kjain@linux.ibm.com Cc: atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240215110231.15385-6-mpetlan@redhat.com	2024-02-16 11:49:36 -08:00
Veronika Molnarova	c8eb2a9ff8	perf testsuite: Add test case for perf probe Add new perf probe test case that acts as an entry element in perf test list. Runs multiple subtests from directory "base_probe", which will be added in incomming patches and can be expanded without further editing. Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Signed-off-by: Michael Petlan <mpetlan@redhat.com> Cc: kjain@linux.ibm.com Cc: atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240215110231.15385-5-mpetlan@redhat.com	2024-02-16 11:49:22 -08:00

... 2 3 4 5 6 ...

16183 commits