linux/kernel/trace/trace_clock.c
Linus Torvalds 8f55cea410 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar:
 "There are lots of improvements, the biggest changes are:

  Main kernel side changes:

   - Improve uprobes performance by adding 'pre-filtering' support, by
     Oleg Nesterov.

   - Make some POWER7 events available in sysfs, equivalent to what was
     done on x86, from Sukadev Bhattiprolu.

   - tracing updates by Steve Rostedt - mostly misc fixes and smaller
     improvements.

   - Use perf/event tracing to report PCI Express advanced errors, by
     Tony Luck.

   - Enable northbridge performance counters on AMD family 15h, by Jacob
     Shin.

   - This tracing commit:

        tracing: Remove the extra 4 bytes of padding in events

     changes the ABI.  All involved parties (PowerTop in particular)
     seem to agree that it's safe to do now with the introduction of
     libtraceevent, but the devil is in the details ...

  Main tooling side changes:

   - Add 'event group view', from Namyung Kim:

     To use it, 'perf record' should group events when recording.  And
     then perf report parses the saved group relation from file header
     and prints them together if --group option is provided.  You can
     use the 'perf evlist' command to see event group information:

        $ perf record -e '{ref-cycles,cycles}' noploop 1
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.385 MB perf.data (~16807 samples) ]

        $ perf evlist --group
        {ref-cycles,cycles}

     With this example, default perf report will show you each event
     separately.

     You can use --group option to enable event group view:

        $ perf report --group
        ...
        # group: {ref-cycles,cycles}
        # ========
        # Samples: 7K of event 'anon group { ref-cycles, cycles }'
        # Event count (approx.): 6876107743
        #
        #         Overhead  Command      Shared Object                      Symbol
        # ................  .......  .................  ..........................
            99.84%  99.76%  noploop  noploop            [.] main
             0.07%   0.00%  noploop  ld-2.15.so         [.] strcmp
             0.03%   0.00%  noploop  [kernel.kallsyms]  [k] timerqueue_del
             0.03%   0.03%  noploop  [kernel.kallsyms]  [k] sched_clock_cpu
             0.02%   0.00%  noploop  [kernel.kallsyms]  [k] account_user_time
             0.01%   0.00%  noploop  [kernel.kallsyms]  [k] __alloc_pages_nodemask
             0.00%   0.00%  noploop  [kernel.kallsyms]  [k] native_write_msr_safe
             0.00%   0.11%  noploop  [kernel.kallsyms]  [k] _raw_spin_lock
             0.00%   0.06%  noploop  [kernel.kallsyms]  [k] find_get_page
             0.00%   0.02%  noploop  [kernel.kallsyms]  [k] rcu_check_callbacks
             0.00%   0.02%  noploop  [kernel.kallsyms]  [k] __current_kernel_time

     As you can see the Overhead column now contains both of ref-cycles
     and cycles and header line shows group information also - 'anon
     group { ref-cycles, cycles }'.  The output is sorted by period of
     group leader first.

   - Initial GTK+ annotate browser, from Namhyung Kim.

   - Add option for runtime switching perf data file in perf report,
     just press 's' and a menu with the valid files found in the current
     directory will be presented, from Feng Tang.

   - Add support to display whole group data for raw columns, from Jiri
     Olsa.

   - Add per processor socket count aggregation in perf stat, from
     Stephane Eranian.

   - Add interval printing in 'perf stat', from Stephane Eranian.

   - 'perf test' improvements

   - Add support for wildcards in tracepoint system name, from Jiri
     Olsa.

   - Add anonymous huge page recognition, from Joshua Zhu.

   - perf build-id cache now can show DSOs present in a perf.data file
     that are not in the cache, to integrate with build-id servers being
     put in place by organizations such as Fedora.

   - perf top now shares more of the evsel config/creation routines with
     'record', paving the way for further integration like 'top'
     snapshots, etc.

   - perf top now supports DWARF callchains.

   - Fix mmap limitations on 32-bit, fix from David Miller.

   - 'perf bench numa mem' NUMA performance measurement suite

   - ... and lots of fixes, performance improvements, cleanups and other
     improvements I failed to list - see the shortlog and git log for
     details."

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (270 commits)
  perf/x86/amd: Enable northbridge performance counters on AMD family 15h
  perf/hwbp: Fix cleanup in case of kzalloc failure
  perf tools: Fix build with bison 2.3 and older.
  perf tools: Limit unwind support to x86 archs
  perf annotate: Make it to be able to skip unannotatable symbols
  perf gtk/annotate: Fail early if it can't annotate
  perf gtk/annotate: Show source lines with gray color
  perf gtk/annotate: Support multiple event annotation
  perf ui/gtk: Implement basic GTK2 annotation browser
  perf annotate: Fix warning message on a missing vmlinux
  perf buildid-cache: Add --update option
  uprobes/perf: Avoid uprobe_apply() whenever possible
  uprobes/perf: Teach trace_uprobe/perf code to use UPROBE_HANDLER_REMOVE
  uprobes/perf: Teach trace_uprobe/perf code to pre-filter
  uprobes/perf: Teach trace_uprobe/perf code to track the active perf_event's
  uprobes: Introduce uprobe_apply()
  perf: Introduce hw_perf_event->tp_target and ->tp_list
  uprobes/perf: Always increment trace_uprobe->nhit
  uprobes/tracing: Kill uprobe_trace_consumer, embed uprobe_consumer into trace_uprobe
  uprobes/tracing: Introduce is_trace_uprobe_enabled()
  ...
2013-02-19 17:49:41 -08:00

127 lines
3 KiB
C

/*
* tracing clocks
*
* Copyright (C) 2009 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
*
* Implements 3 trace clock variants, with differing scalability/precision
* tradeoffs:
*
* - local: CPU-local trace clock
* - medium: scalable global clock with some jitter
* - global: globally monotonic, serialized clock
*
* Tracer plugins will chose a default from these clocks.
*/
#include <linux/spinlock.h>
#include <linux/irqflags.h>
#include <linux/hardirq.h>
#include <linux/module.h>
#include <linux/percpu.h>
#include <linux/sched.h>
#include <linux/ktime.h>
#include <linux/trace_clock.h>
/*
* trace_clock_local(): the simplest and least coherent tracing clock.
*
* Useful for tracing that does not cross to other CPUs nor
* does it go through idle events.
*/
u64 notrace trace_clock_local(void)
{
u64 clock;
/*
* sched_clock() is an architecture implemented, fast, scalable,
* lockless clock. It is not guaranteed to be coherent across
* CPUs, nor across CPU idle events.
*/
preempt_disable_notrace();
clock = sched_clock();
preempt_enable_notrace();
return clock;
}
EXPORT_SYMBOL_GPL(trace_clock_local);
/*
* trace_clock(): 'between' trace clock. Not completely serialized,
* but not completely incorrect when crossing CPUs either.
*
* This is based on cpu_clock(), which will allow at most ~1 jiffy of
* jitter between CPUs. So it's a pretty scalable clock, but there
* can be offsets in the trace data.
*/
u64 notrace trace_clock(void)
{
return local_clock();
}
/*
* trace_clock_global(): special globally coherent trace clock
*
* It has higher overhead than the other trace clocks but is still
* an order of magnitude faster than GTOD derived hardware clocks.
*
* Used by plugins that need globally coherent timestamps.
*/
/* keep prev_time and lock in the same cacheline. */
static struct {
u64 prev_time;
arch_spinlock_t lock;
} trace_clock_struct ____cacheline_aligned_in_smp =
{
.lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED,
};
u64 notrace trace_clock_global(void)
{
unsigned long flags;
int this_cpu;
u64 now;
local_irq_save(flags);
this_cpu = raw_smp_processor_id();
now = sched_clock_cpu(this_cpu);
/*
* If in an NMI context then dont risk lockups and return the
* cpu_clock() time:
*/
if (unlikely(in_nmi()))
goto out;
arch_spin_lock(&trace_clock_struct.lock);
/*
* TODO: if this happens often then maybe we should reset
* my_scd->clock to prev_time+1, to make sure
* we start ticking with the local clock from now on?
*/
if ((s64)(now - trace_clock_struct.prev_time) < 0)
now = trace_clock_struct.prev_time + 1;
trace_clock_struct.prev_time = now;
arch_spin_unlock(&trace_clock_struct.lock);
out:
local_irq_restore(flags);
return now;
}
static atomic64_t trace_counter;
/*
* trace_clock_counter(): simply an atomic counter.
* Use the trace_counter "counter" for cases where you do not care
* about timings, but are interested in strict ordering.
*/
u64 notrace trace_clock_counter(void)
{
return atomic64_add_return(1, &trace_counter);
}