Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (184 commits)
  perf probe: Clean up probe_point_lazy_walker() return value
  tracing: Fix irqoff selftest expanding max buffer
  tracing: Align 4 byte ints together in struct tracer
  tracing: Export trace_set_clr_event()
  tracing: Explain about unstable clock on resume with ring buffer warning
  ftrace/graph: Trace function entry before updating index
  ftrace: Add .ref.text as one of the safe areas to trace
  tracing: Adjust conditional expression latency formatting.
  tracing: Fix event alignment: skb:kfree_skb
  tracing: Fix event alignment: mce:mce_record
  tracing: Fix event alignment: kvm:kvm_hv_hypercall
  tracing: Fix event alignment: module:module_request
  tracing: Fix event alignment: ftrace:context_switch and ftrace:wakeup
  tracing: Remove lock_depth from event entry
  perf header: Stop using 'self'
  perf session: Use evlist/evsel for managing perf.data attributes
  perf top: Don't let events to eat up whole header line
  perf top: Fix events overflow in top command
  ring-buffer: Remove unused #include <linux/trace_irq.h>
  tracing: Add an 'overwrite' trace_option.
  ...
torvalds committed Mar 16, 2011
2 parents 0586bed + 5e814dd commit a926021
Showing 140 changed files with 9,295 additions and 4,713 deletions.
7 changes: 7 additions & 0 deletions Documentation/trace/ftrace-design.txt
@@ -247,6 +247,13 @@ You need very few things to get the syscalls tracing in an arch.
- Support the TIF_SYSCALL_TRACEPOINT thread flags.
- Put the trace_sys_enter() and trace_sys_exit() tracepoints calls from ptrace
in the ptrace syscalls tracing path.
- If the system call table on this arch is more complicated than a simple array
of addresses of the system calls, implement an arch_syscall_addr to return
the address of a given system call.
- If the symbol names of the system calls do not match the function names on
this arch, define ARCH_HAS_SYSCALL_MATCH_SYM_NAME in asm/ftrace.h and
implement arch_syscall_match_sym_name with the appropriate logic to return
true if the function name corresponds with the symbol name.
- Tag this arch as HAVE_SYSCALL_TRACEPOINTS.
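
As an illustration of the symbol-name matching item above, here is a minimal user-space sketch. It assumes a hypothetical architecture whose symbol table puts a leading "." in front of syscall symbols and may spell the prefix "SyS" instead of "sys". The function name example_syscall_match_sym_name and the prefix lengths are assumptions made for the example, not the kernel's implementation for any particular arch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical matcher (assumption: symbols look like ".sys_read" or
 * ".SyS_read", while syscall names look like "sys_read").
 * Skips the 4-character symbol prefix and the 3-character name prefix,
 * then compares the remainders.
 */
bool example_syscall_match_sym_name(const char *sym, const char *name)
{
	assert(sym != NULL && name != NULL);

	/* Both strings must be long enough to skip their prefixes. */
	if (strlen(sym) < 4 || strlen(name) < 3)
		return false;

	return strcmp(sym + 4, name + 3) == 0;
}
```

Comparing only the part after the prefix lets ".sys_read" and ".SyS_read" both match "sys_read" without listing every prefix variant.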


151 changes: 23 additions & 128 deletions Documentation/trace/ftrace.txt
@@ -80,11 +80,11 @@ of ftrace. Here is a list of some of the key files:
tracers listed here can be configured by
echoing their name into current_tracer.

tracing_enabled:
tracing_on:

This sets or displays whether the current_tracer
is activated and tracing or not. Echo 0 into this
file to disable the tracer or 1 to enable it.
This sets or displays whether writing to the trace
ring buffer is enabled. Echo 0 into this file to disable
the tracer or 1 to enable it.
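
A small sketch of toggling such a control file from a C program, in the spirit of echoing 0 or 1 into tracing_on. The helper name example_set_tracing_on is hypothetical, and the caller supplies the path, since where the tracing directory is mounted is an assumption that varies between systems.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write "1" (on != 0) or "0" (on == 0) to a
 * tracing_on-style control file. Returns 0 on success, -1 on failure.
 */
int example_set_tracing_on(const char *path, int on)
{
	assert(path != NULL);

	int fd = open(path, O_WRONLY);
	if (fd < 0) {
		perror("open");
		return -1;
	}

	/* A single character is enough, as with "echo 0 > tracing_on". */
	int rc = (write(fd, on ? "1" : "0", 1) == 1) ? 0 : -1;
	if (rc < 0)
		perror("write");

	close(fd);
	return rc;
}
```

On a system where debugfs is mounted at /sys/kernel/debug (an assumption), the path would typically be /sys/kernel/debug/tracing/tracing_on, and the call needs sufficient privileges.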

trace:

@@ -202,10 +202,6 @@ Here is the list of current tracers that may be configured.
to draw a graph of function calls similar to C code
source.

"sched_switch"

Traces the context switches and wakeups between tasks.

"irqsoff"

Traces the areas that disable interrupts and saves
@@ -273,39 +269,6 @@ format, the function name that was traced "path_put" and the
parent function that called this function "path_walk". The
timestamp is the time at which the function was entered.

The sched_switch tracer also includes tracing of task wakeups
and context switches.

ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 2916:115:S
ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 10:115:S
ksoftirqd/1-7 [01] 1453.070013: 7:115:R ==> 10:115:R
events/1-10 [01] 1453.070013: 10:115:S ==> 2916:115:R
kondemand/1-2916 [01] 1453.070013: 2916:115:S ==> 7:115:R
ksoftirqd/1-7 [01] 1453.070013: 7:115:S ==> 0:140:R

Wake ups are represented by a "+" and the context switches are
shown as "==>". The format is:

Context switches:

Previous task Next Task

<pid>:<prio>:<state> ==> <pid>:<prio>:<state>

Wake ups:

Current task Task waking up

<pid>:<prio>:<state> + <pid>:<prio>:<state>

The prio is the internal kernel priority, which is the inverse
of the priority that is usually displayed by user-space tools.
Zero represents the highest priority (99). Prio 100 starts the
"nice" priorities with 100 being equal to nice -20 and 139 being
nice 19. The prio "140" is reserved for the idle task which is
the lowest priority thread (pid 0).


Latency trace format
--------------------

@@ -491,78 +454,10 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
latencies, as described in "Latency
trace format".

sched_switch
------------

This tracer simply records schedule switches. Here is an example
of how to use it.

# echo sched_switch > current_tracer
# echo 1 > tracing_enabled
# sleep 1
# echo 0 > tracing_enabled
# cat trace

# tracer: sched_switch
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
bash-3997 [01] 240.132281: 3997:120:R + 4055:120:R
bash-3997 [01] 240.132284: 3997:120:R ==> 4055:120:R
sleep-4055 [01] 240.132371: 4055:120:S ==> 3997:120:R
bash-3997 [01] 240.132454: 3997:120:R + 4055:120:S
bash-3997 [01] 240.132457: 3997:120:R ==> 4055:120:R
sleep-4055 [01] 240.132460: 4055:120:D ==> 3997:120:R
bash-3997 [01] 240.132463: 3997:120:R + 4055:120:D
bash-3997 [01] 240.132465: 3997:120:R ==> 4055:120:R
<idle>-0 [00] 240.132589: 0:140:R + 4:115:S
<idle>-0 [00] 240.132591: 0:140:R ==> 4:115:R
ksoftirqd/0-4 [00] 240.132595: 4:115:S ==> 0:140:R
<idle>-0 [00] 240.132598: 0:140:R + 4:115:S
<idle>-0 [00] 240.132599: 0:140:R ==> 4:115:R
ksoftirqd/0-4 [00] 240.132603: 4:115:S ==> 0:140:R
sleep-4055 [01] 240.133058: 4055:120:S ==> 3997:120:R
[...]


As we have discussed previously about this format, the header
shows the name of the trace and points to the options. The
"FUNCTION" is a misnomer since here it represents the wake ups
and context switches.

The sched_switch file only lists the wake ups (represented with
'+') and context switches ('==>') with the previous task or
current task first followed by the next task or task waking up.
The format for both of these is PID:KERNEL-PRIO:TASK-STATE.
Remember that the KERNEL-PRIO is the inverse of the actual
priority with zero (0) being the highest priority and the nice
values starting at 100 (nice -20). Below is a quick chart to map
the kernel priority to user land priorities.

  Kernel Space                     User Space
===============================================================
  0(high) to 98(low)     user RT priority 99(high) to 1(low)
                         with SCHED_RR or SCHED_FIFO
---------------------------------------------------------------
  99                     sched_priority is not used in scheduling
                         decisions(it must be specified as 0)
---------------------------------------------------------------
  100(high) to 139(low)  user nice -20(high) to 19(low)
---------------------------------------------------------------
  140                    idle task priority
---------------------------------------------------------------
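
The chart above amounts to two linear mappings, which can be sketched as follows. The helper names are hypothetical and only cover the ranges the chart lists; kernel prio 99 and 140 are deliberately left out because the chart gives them no numeric user-space equivalent.

```c
#include <assert.h>

/*
 * Hypothetical helper: kernel prio 0..98 maps to user RT priority
 * 99..1 (SCHED_RR or SCHED_FIFO), per the chart.
 */
int example_prio_to_rt_priority(int prio)
{
	assert(prio >= 0 && prio <= 98);
	return 99 - prio;
}

/*
 * Hypothetical helper: kernel prio 100..139 maps to nice -20..19,
 * per the chart.
 */
int example_prio_to_nice(int prio)
{
	assert(prio >= 100 && prio <= 139);
	return prio - 120;
}
```

For instance, the prio 120 shown for bash and sleep in the trace corresponds to nice 0, and the prio 115 of ksoftirqd corresponds to nice -5.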

The task states are:

R - running : wants to run, may not actually be running
S - sleep : process is waiting to be woken up (handles signals)
D - disk sleep (uninterruptible sleep) : process must be woken up
(ignores signals)
T - stopped : process suspended
t - traced : process is being traced (with something like gdb)
Z - zombie : process waiting to be cleaned up
X - unknown

overwrite - This controls what happens when the trace buffer is
full. If "1" (default), the oldest events are
discarded and overwritten. If "0", then the newest
events are discarded.
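
The two behaviours can be illustrated with a toy ring buffer. This is only a sketch of the semantics described above, not the kernel's ring buffer; the capacity, struct, and function names are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EXAMPLE_CAPACITY 4

struct example_ring {
	int events[EXAMPLE_CAPACITY];
	size_t head;     /* index of the oldest stored event */
	size_t count;    /* number of stored events */
	bool overwrite;  /* true: drop oldest when full; false: drop newest */
};

void example_ring_init(struct example_ring *rb, bool overwrite)
{
	assert(rb != NULL);
	rb->head = 0;
	rb->count = 0;
	rb->overwrite = overwrite;
}

/* Returns true if the event was stored, false if it was discarded. */
bool example_ring_write(struct example_ring *rb, int event)
{
	assert(rb != NULL);

	if (rb->count == EXAMPLE_CAPACITY) {
		if (!rb->overwrite)
			return false;  /* buffer full: the newest event is discarded */
		/* buffer full: discard the oldest event to make room */
		rb->head = (rb->head + 1) % EXAMPLE_CAPACITY;
		rb->count--;
	}

	rb->events[(rb->head + rb->count) % EXAMPLE_CAPACITY] = event;
	rb->count++;
	return true;
}

/* Returns the i-th oldest stored event (i = 0 is the oldest). */
int example_ring_peek(const struct example_ring *rb, size_t i)
{
	assert(rb != NULL && i < rb->count);
	return rb->events[(rb->head + i) % EXAMPLE_CAPACITY];
}
```

Writing events 1 through 6 into this four-slot buffer leaves 3, 4, 5, 6 in overwrite mode and 1, 2, 3, 4 otherwise, which mirrors the "1" and "0" settings.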

ftrace_enabled
--------------
@@ -607,10 +502,10 @@ an example:
# echo irqsoff > current_tracer
# echo latency-format > trace_options
# echo 0 > tracing_max_latency
# echo 1 > tracing_enabled
# echo 1 > tracing_on
# ls -ltr
[...]
# echo 0 > tracing_enabled
# echo 0 > tracing_on
# cat trace
# tracer: irqsoff
#
@@ -715,10 +610,10 @@ is much like the irqsoff tracer.
# echo preemptoff > current_tracer
# echo latency-format > trace_options
# echo 0 > tracing_max_latency
# echo 1 > tracing_enabled
# echo 1 > tracing_on
# ls -ltr
[...]
# echo 0 > tracing_enabled
# echo 0 > tracing_on
# cat trace
# tracer: preemptoff
#
@@ -863,10 +758,10 @@ tracers.
# echo preemptirqsoff > current_tracer
# echo latency-format > trace_options
# echo 0 > tracing_max_latency
# echo 1 > tracing_enabled
# echo 1 > tracing_on
# ls -ltr
[...]
# echo 0 > tracing_enabled
# echo 0 > tracing_on
# cat trace
# tracer: preemptirqsoff
#
@@ -1026,9 +921,9 @@ Instead of performing an 'ls', we will run 'sleep 1' under
# echo wakeup > current_tracer
# echo latency-format > trace_options
# echo 0 > tracing_max_latency
# echo 1 > tracing_enabled
# echo 1 > tracing_on
# chrt -f 5 sleep 1
# echo 0 > tracing_enabled
# echo 0 > tracing_on
# cat trace
# tracer: wakeup
#
@@ -1140,9 +1035,9 @@ ftrace_enabled is set; otherwise this tracer is a nop.

# sysctl kernel.ftrace_enabled=1
# echo function > current_tracer
# echo 1 > tracing_enabled
# echo 1 > tracing_on
# usleep 1
# echo 0 > tracing_enabled
# echo 0 > tracing_on
# cat trace
# tracer: function
#
@@ -1180,7 +1075,7 @@ int trace_fd;
[...]
int main(int argc, char *argv[]) {
[...]
trace_fd = open(tracing_file("tracing_enabled"), O_WRONLY);
trace_fd = open(tracing_file("tracing_on"), O_WRONLY);
[...]
if (condition_hit()) {
write(trace_fd, "0", 1);
@@ -1631,9 +1526,9 @@ If I am only interested in sys_nanosleep and hrtimer_interrupt:
# echo sys_nanosleep hrtimer_interrupt \
> set_ftrace_filter
# echo function > current_tracer
# echo 1 > tracing_enabled
# echo 1 > tracing_on
# usleep 1
# echo 0 > tracing_enabled
# echo 0 > tracing_on
# cat trace
# tracer: ftrace
#
@@ -1879,9 +1774,9 @@ different. The trace is live.
# echo function > current_tracer
# cat trace_pipe > /tmp/trace.out &
[1] 4153
# echo 1 > tracing_enabled
# echo 1 > tracing_on
# usleep 1
# echo 0 > tracing_enabled
# echo 0 > tracing_on
# cat trace
# tracer: function
#
16 changes: 15 additions & 1 deletion Documentation/trace/kprobetrace.txt
@@ -42,11 +42,25 @@ Synopsis of kprobe_events
+|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)
NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
(u8/u16/u32/u64/s8/s16/s32/s64) and string are supported.
(u8/u16/u32/u64/s8/s16/s32/s64), "string" and bitfield
are supported.

(*) only for return probe.
(**) this is useful for fetching a field of data structures.

Types
-----
Several types are supported for fetch-args. Kprobe tracer will access memory
by given type. Prefix 's' and 'u' means those types are signed and unsigned
respectively. Traced arguments are shown in decimal (signed) or hex (unsigned).
String type is a special type, which fetches a "null-terminated" string from
kernel space. This means it will fail and store NULL if the string container
has been paged out.
Bitfield is another special type, which takes 3 parameters, bit-width, bit-
offset, and container-size (usually 32). The syntax is;

b<bit-width>@<bit-offset>/<container-size>
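
To make the three bitfield parameters concrete, here is a minimal sketch of extracting such a field from a container value. It assumes the bit-offset is counted from the least significant bit of the container, and the helper name example_extract_bitfield is hypothetical; it is not the kprobe tracer's own fetch code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the `width`-bit field that starts
 * `offset` bits above the least significant bit of a container that
 * is `container_bits` wide (assumption: LSB-relative offsets).
 */
uint64_t example_extract_bitfield(uint64_t container, unsigned int width,
				  unsigned int offset,
				  unsigned int container_bits)
{
	assert(container_bits >= 1 && container_bits <= 64);
	assert(width >= 1 && width + offset <= container_bits);

	/* Build a mask of `width` one-bits without shifting by 64. */
	uint64_t mask = (width == 64) ? ~UINT64_C(0)
				      : ((UINT64_C(1) << width) - 1);

	return (container >> offset) & mask;
}
```

Under these assumptions, a fetch-arg type of b4@8/32 applied to the 32-bit value 0x00000a00 would correspond to example_extract_bitfield(0x00000a00, 4, 8, 32), which selects bits 8 through 11.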


Per-Probe Event Filtering
-------------------------
2 changes: 2 additions & 0 deletions arch/x86/ia32/ia32entry.S
@@ -25,6 +25,8 @@
#define sysretl_audit ia32_ret_from_sys_call
#endif

.section .entry.text, "ax"

#define IA32_NR_syscalls ((ia32_syscall_end - ia32_sys_call_table)/8)

.macro IA32_ARG_FIXUP noebp=0
2 changes: 2 additions & 0 deletions arch/x86/include/asm/cpufeature.h
@@ -160,6 +160,7 @@
#define X86_FEATURE_NODEID_MSR (6*32+19) /* NodeId MSR */
#define X86_FEATURE_TBM (6*32+21) /* trailing bit manipulations */
#define X86_FEATURE_TOPOEXT (6*32+22) /* topology extensions CPUID leafs */
#define X86_FEATURE_PERFCTR_CORE (6*32+23) /* core performance counter extensions */

/*
* Auxiliary flags: Linux defined - For features scattered in various
@@ -279,6 +280,7 @@ extern const char * const x86_power_flags[32];
#define cpu_has_xsave boot_cpu_has(X86_FEATURE_XSAVE)
#define cpu_has_hypervisor boot_cpu_has(X86_FEATURE_HYPERVISOR)
#define cpu_has_pclmulqdq boot_cpu_has(X86_FEATURE_PCLMULQDQ)
#define cpu_has_perfctr_core boot_cpu_has(X86_FEATURE_PERFCTR_CORE)

#if defined(CONFIG_X86_INVLPG) || defined(CONFIG_X86_64)
# define cpu_has_invlpg 1
1 change: 0 additions & 1 deletion arch/x86/include/asm/kdebug.h
@@ -13,7 +13,6 @@ enum die_val {
DIE_PANIC,
DIE_NMI,
DIE_DIE,
DIE_NMIWATCHDOG,
DIE_KERNELDEBUG,
DIE_TRAP,
DIE_GPF,
3 changes: 3 additions & 0 deletions arch/x86/include/asm/msr-index.h
@@ -52,6 +52,9 @@
#define MSR_IA32_MCG_STATUS 0x0000017a
#define MSR_IA32_MCG_CTL 0x0000017b

#define MSR_OFFCORE_RSP_0 0x000001a6
#define MSR_OFFCORE_RSP_1 0x000001a7

#define MSR_IA32_PEBS_ENABLE 0x000003f1
#define MSR_IA32_DS_AREA 0x00000600
#define MSR_IA32_PERF_CAPABILITIES 0x00000345
1 change: 0 additions & 1 deletion arch/x86/include/asm/nmi.h
@@ -7,7 +7,6 @@

#ifdef CONFIG_X86_LOCAL_APIC

extern void die_nmi(char *str, struct pt_regs *regs, int do_panic);
extern int avail_to_resrv_perfctr_nmi_bit(unsigned int);
extern int reserve_perfctr_nmi(unsigned int);
extern void release_perfctr_nmi(unsigned int);
10 changes: 10 additions & 0 deletions arch/x86/include/asm/smp.h
@@ -17,10 +17,20 @@
#endif
#include <asm/thread_info.h>
#include <asm/cpumask.h>
#include <asm/cpufeature.h>

extern int smp_num_siblings;
extern unsigned int num_processors;

static inline bool cpu_has_ht_siblings(void)
{
bool has_siblings = false;
#ifdef CONFIG_SMP
has_siblings = cpu_has_ht && smp_num_siblings > 1;
#endif
return has_siblings;
}

DECLARE_PER_CPU(cpumask_var_t, cpu_sibling_map);
DECLARE_PER_CPU(cpumask_var_t, cpu_core_map);
DECLARE_PER_CPU(u16, cpu_llc_id);