linux/tools/perf
Namhyung Kim e369517ce5 perf callchain: Convert children list to rbtree
The current collapse stage has a scalability problem that is easily
reproduced with a parallel kernel build.

This is because it has to traverse all children of the callchains
linearly during the collapse/merge stage.

Converting the children list to an rbtree reduced the overhead significantly.
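
To illustrate why the data structure matters here, the following is a minimal sketch, not the code from this commit. It assumes a hypothetical, simplified `struct child` keyed by an instruction pointer, and it uses POSIX `tsearch()`/`tfind()` as a stand-in for the kernel-style rbtree helpers that perf uses (glibc implements `tsearch()` as a red-black tree, but POSIX does not require a balanced tree). The names `child_cmp`, `list_find_or_add` and `tree_find_or_add` are made up for this example.

```c
#include <search.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for one child of a callchain node. */
struct child {
	uint64_t ip;        /* key: address of the call site */
	uint64_t hits;      /* samples merged into this child */
	struct child *next; /* used only by the list variant */
};

/* Total order on children by ip, as needed by any sorted tree. */
static int child_cmp(const void *a, const void *b)
{
	const struct child *ca = a, *cb = b;

	return (ca->ip > cb->ip) - (ca->ip < cb->ip);
}

/* List variant: every lookup may walk all n siblings, O(n) per merge. */
static struct child *list_find_or_add(struct child **head, uint64_t ip)
{
	struct child *c;

	for (c = *head; c; c = c->next)
		if (c->ip == ip)
			return c;

	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	c->ip = ip;
	c->next = *head;
	*head = c;
	return c;
}

/* Tree variant: lookup and insert follow one root-to-leaf path. */
static struct child *tree_find_or_add(void **root, uint64_t ip)
{
	struct child key = { .ip = ip };
	struct child **slot = tfind(&key, root, child_cmp);
	struct child *c;

	if (slot)
		return *slot; /* already present: reuse the existing child */

	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	c->ip = ip;
	slot = tsearch(c, root, child_cmp);
	if (!slot) {
		free(c);
		return NULL;
	}
	return *slot;
}
```

A caller would keep a `struct child *head = NULL` or a `void *root = NULL` per parent node and call the matching helper once per merged entry. With many distinct children under one parent, the list variant does work proportional to the number of siblings on every call, while a balanced tree keeps each call logarithmic.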

On my 400MB perf.data file, recorded during a make -j32 kernel build:

  $ time perf --no-pager report --stdio > /dev/null

before:
  real	6m22.073s
  user	6m18.683s
  sys	0m0.706s

after:
  real	0m20.780s
  user	0m19.962s
  sys	0m0.689s

During perf report, the overhead of append_chain_children went down
from 96.69% to 18.16%:

  -  18.16%  perf  perf                [.] append_chain_children
     - append_chain_children
        - 77.48% append_chain_children
           + 69.79% merge_chain_branch
           - 22.96% append_chain_children
              + 67.44% merge_chain_branch
              + 30.15% append_chain_children
              + 2.41% callchain_append
           + 7.25% callchain_append
        + 12.26% callchain_append
        + 10.22% merge_chain_branch
  +  11.58%  perf  perf                [.] dso__find_symbol
  +   8.02%  perf  perf                [.] sort__comm_cmp
  +   5.48%  perf  libc-2.17.so        [.] malloc_consolidate

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1381468543-25334-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-21 17:33:23 -03:00
arch tools/perf: Standardize feature support define names to: HAVE_{FEATURE}_SUPPORT 2013-10-09 08:48:28 +02:00
bench perf tools: Fix bench/numa.c for 32-bit build 2013-10-21 11:19:42 -03:00
config perf tools: Fix test_on_exit for 32-bit build 2013-10-21 11:19:42 -03:00
Documentation perf trace: Use vfs_getname hook if available 2013-10-16 11:18:24 -03:00
python perf python: Remove duplicate TID bit from mask 2013-08-07 17:35:25 -03:00
scripts perf script: Fix broken include in Context.xs 2013-07-10 13:47:00 -03:00
tests perf tests: Fix memory leak in dso-data.c 2013-10-14 10:28:54 -03:00
ui perf annotate: Another fix for annotate_browser__callq() 2013-10-14 12:21:18 -03:00
util perf callchain: Convert children list to rbtree 2013-10-21 17:33:23 -03:00
.gitignore perf tools: Ignore 'perf timechart' output file 2013-10-11 12:17:37 -03:00
bash_completion perf completion: Use more comp words 2013-10-09 11:12:31 -03:00
builtin-annotate.c perf tools: Separate out GTK codes to libperf-gtk.so 2013-10-09 15:55:25 -03:00
builtin-bench.c tools/perf: Standardize feature support define names to: HAVE_{FEATURE}_SUPPORT 2013-10-09 08:48:28 +02:00
builtin-buildid-cache.c perf buildid-cache: Add ability to add kcore to the cache 2013-10-14 12:20:38 -03:00
builtin-buildid-list.c perf symbols: Generalize filter in __fprintf_buildid methods 2012-12-09 08:46:07 -03:00
builtin-diff.c tools/perf: Add support for record transaction flags 2013-10-04 10:06:12 +02:00
builtin-evlist.c perf evlist: Pass the event_group info via perf_attr_details 2013-02-06 18:09:28 -03:00
builtin-help.c perf help: Fix --help for builtins 2012-10-22 12:35:49 -02:00
builtin-inject.c tools/perf: Standardize feature support define names to: HAVE_{FEATURE}_SUPPORT 2013-10-09 08:48:28 +02:00
builtin-kmem.c perf kmem: Make it work again on non NUMA machines 2013-09-24 14:13:46 -03:00
builtin-kvm.c perf tools: Check mmap pages value early 2013-10-09 11:24:10 -03:00
builtin-list.c perf list: List kernel supplied event aliases 2013-07-12 13:53:53 -03:00
builtin-lock.c perf lock: Account for lock average wait time 2013-10-09 11:24:01 -03:00
builtin-mem.c perf tools: Add attr->mmap2 support 2013-09-11 10:09:32 -03:00
builtin-probe.c tools/perf: Standardize feature support define names to: HAVE_{FEATURE}_SUPPORT 2013-10-09 08:48:28 +02:00
builtin-record.c perf record: Improve write_output error message 2013-10-21 11:19:06 -03:00
builtin-report.c perf tools: Separate out GTK codes to libperf-gtk.so 2013-10-09 15:55:25 -03:00
builtin-sched.c perf tools: change machine__findnew_thread() to set thread pid 2013-08-29 11:51:31 -03:00
builtin-script.c perf script: Print addr by default for BTS 2013-10-21 17:33:22 -03:00
builtin-stat.c perf stat: Add units to nanosec-based counters 2013-10-11 12:17:46 -03:00
builtin-timechart.c perf timechart: Remove event types framework only user 2013-07-15 16:14:47 -03:00
builtin-top.c perf symbols: Add new option --ignore-vmlinux for perf top 2013-10-09 11:42:20 -03:00
builtin-trace.c perf trace: Improve messages related to /proc/sys/kernel/perf_event_paranoid 2013-10-17 17:38:29 -03:00
builtin.h perf tools: Add new mem command for memory access profiling 2013-04-01 12:21:44 -03:00
command-list.txt perf tools: Add new mem command for memory access profiling 2013-04-01 12:21:44 -03:00
CREDITS
design.txt perf tools: Update ioctl documentation for PERF_IOC_FLAG_GROUP 2012-05-31 11:38:42 -03:00
Makefile tools/perf/build: Pass through DEBUG parameter 2013-10-14 10:29:07 -03:00
Makefile.perf perf tools: Implement summary output for 'make install' 2013-10-11 12:18:11 -03:00
MANIFEST perf tools: Introduce tools/lib/lk library 2013-03-15 13:06:00 -03:00
perf-archive.sh perf archive: Make 'f' the last parameter for tar 2012-09-17 13:10:42 -03:00
perf.c perf trace: Add 'trace' alias to 'perf trace' 2013-10-11 12:17:10 -03:00
perf.h tools/perf: Add support for record transaction flags 2013-10-04 10:06:12 +02:00