author | Tony Luck <tony.luck@intel.com> | 2023-07-19 11:05:10 -0700 |
---|---|---|
committer | Tony Luck <tony.luck@intel.com> | 2023-07-19 11:05:10 -0700 |
commit | 17b57df4684ac939e6db67b715b322e47c3c0961 (patch) | |
tree | 50fb0d8879a708de0e60f4c0d52681ea0d2a1161 | |
parent | 7f52fe31180e3ffb2378e929d3f4b464f0d48717 (diff) | |
download | ras-tools-17b57df4684ac939e6db67b715b322e47c3c0961.tar.gz |
einj_mem_uc: Check if kernel has CMCI disabled
On Intel there is a race between a memory controller reporting, via
CMCI, that it saw an error and the consumption of the uncorrected
error being reported via machine check. If the CMCI wins the race,
Linux takes the page offline before any consumption can occur, so
there may be no machine check.
Some users want to explicitly test the #MC recovery case. They
disable CMCI in the kernel with the boot flag "mce=no_cmci".
In this case there will always be a machine check. But the test
reports "fail" because it was expecting to see a CMCI.
Add a check to see if CMCI is disabled. If it is, mask out the
F_CMCI expectation.
Signed-off-by: Tony Luck <tony.luck@intel.com>
-rw-r--r-- | einj_mem_uc.c | 16 |
1 files changed, 16 insertions, 0 deletions
diff --git a/einj_mem_uc.c b/einj_mem_uc.c
index 72e26df..7978674 100644
--- a/einj_mem_uc.c
+++ b/einj_mem_uc.c
@@ -976,6 +976,19 @@ void kick_by_file(struct test *t, char *addr)
 {
 	t->trigger(addr);
 }
+static int cmci_is_disabled(void)
+{
+	FILE *fp = popen("grep -c mce=no_cmci /proc/cmdline", "r");
+	int disabled = 0;
+
+	if (fp) {
+		fscanf(fp, "%d", &disabled);
+		pclose(fp);
+	}
+
+	return disabled;
+}
+
 int main(int argc, char **argv)
 {
 	int c, i;
@@ -1040,6 +1053,9 @@ int main(int argc, char **argv)
 	else
 		t = tests;
 
+	if (cmci_is_disabled())
+		t->flags &= ~F_CMCI;
+
 	if ((t->flags & F_FATAL) && !force_flag) {
 		fprintf(stderr, "%s: selected test may be fatal. Use '-f' flag if you really want to do this\n", progname);
 		exit(1);
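
For comparison, the check made by cmci_is_disabled() in the patch above could also be done without spawning grep. The following is only a sketch and not part of this commit: the helper names cmdline_has_option() and cmci_is_disabled_in() are hypothetical, the 4096-byte buffer is an assumed upper bound on the command line length, and matching "mce=no_cmci" as a whole whitespace-separated token (rather than as a substring, as grep does) is an assumption about the desired behavior.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `opt` appears as a whole whitespace-separated token in `cmdline`. */
static int cmdline_has_option(const char *cmdline, const char *opt)
{
	size_t optlen = strlen(opt);
	const char *p = cmdline;

	while (*p) {
		/* Skip separators, then measure the next token. */
		p += strspn(p, " \t\n");
		size_t toklen = strcspn(p, " \t\n");

		if (toklen == optlen && strncmp(p, opt, optlen) == 0)
			return 1;
		p += toklen;
	}
	return 0;
}

/* Hypothetical helper: read the first line of `path` and look for mce=no_cmci. */
static int cmci_is_disabled_in(const char *path)
{
	char buf[4096];	/* assumed to be large enough for the command line */
	FILE *fp = fopen(path, "r");
	int disabled = 0;

	if (!fp)
		return 0;	/* unreadable file: treat CMCI as enabled */
	if (fgets(buf, sizeof(buf), fp))
		disabled = cmdline_has_option(buf, "mce=no_cmci");
	fclose(fp);
	return disabled;
}
```

A caller would use cmci_is_disabled_in("/proc/cmdline") where the patch calls cmci_is_disabled(). Taking the path as a parameter is a design choice made here so the parsing can be exercised against a test file.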