aboutsummaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorThomas Richter <tmricht@linux.ibm.com>2022-03-17 16:53:46 +0100
committerArnaldo Carvalho de Melo <acme@redhat.com>2022-03-24 17:36:54 -0300
commitd0a0a511493d269514fcbd852481cdca32c95350 (patch)
tree6e9c921a9d8a5164141c58c1d04bcb5081019f3e
parent61726144c9c91c54cb1e8126017ba0fc5a7999e3 (diff)
downloadlinux-d0a0a511493d269514fcbd852481cdca32c95350.tar.gz
perf stat: Fix forked applications enablement of counters
I have run into the following issue: # perf stat -a -e new_pmu/INSTRUCTION_7/ -- mytest -c1 7 Performance counter stats for 'system wide': 0 new_pmu/INSTRUCTION_7/ 0.000366428 seconds time elapsed # The new PMU for s390 counts the execution of certain CPU instructions. The root cause is the extremely small run time of the mytest program. It just executes some assembly instructions and then exits. In above invocation the instruction is executed exactly one time (-c1 option). The PMU is expected to report this one time execution by a counter value of one, but fails to do so in some cases, not all. Debugging reveals the invocation of the child process is done *before* the counter events are installed and enabled. Tracing reveals that sometimes the child process starts and exits before the event is installed on all CPUs. The more CPUs the machine has, the more often this miscount happens. Fix this by reversing the start of the work load after the events have been installed on the specified CPUs. Now the comment also matches the code. Output after: # perf stat -a -e new_pmu/INSTRUCTION_7/ -- mytest -c1 7 Performance counter stats for 'system wide': 1 new_pmu/INSTRUCTION_7/ 0.000366428 seconds time elapsed # Now the correct result is reported rock solid all the time regardless how many CPUs are online. Reviewers notes: Jiri: Right, without -a the event has enable_on_exec so the race does not matter, but it's a problem for system wide with fork. Namhyung: Agreed. Also we may move the enable_counters() and the clock code out of the if block to be shared with the else block. Fixes: acf2892270dcc428 ("perf stat: Use perf_evlist__prepare/start_workload()") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20220317155346.577384-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-rw-r--r--tools/perf/builtin-stat.c2
1 files changed, 1 insertions, 1 deletions
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 3f98689dd6878..60baa3dadc4b6 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -955,10 +955,10 @@ try_again_reset:
* Enable counters and exec the command:
*/
if (forks) {
- evlist__start_workload(evsel_list);
err = enable_counters();
if (err)
return -1;
+ evlist__start_workload(evsel_list);
t0 = rdclock();
clock_gettime(CLOCK_MONOTONIC, &ref_time);