|
The current sighandler only restores the previously saved environment, so
we cannot tell why SIGBUS was raised. Therefore, explicitly print si_code
as we do in tsimpleinj: 4 for BUS_MCEERR_AR and 5 for BUS_MCEERR_AO.
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
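For illustration, a handler that reports si_code might look like the minimal sketch below. It assumes an SA_SIGINFO handler that returns through a saved sigjmp_buf; recover_env, bus_code_name() and install_sigbus_handler() are hypothetical names, not the actual test code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Linux values, in case the libc headers predate these codes. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

static sigjmp_buf recover_env;	/* hypothetical: environment saved before the access */

static const char *bus_code_name(int code)
{
	switch (code) {
	case BUS_MCEERR_AR:
		return "BUS_MCEERR_AR";	/* action required */
	case BUS_MCEERR_AO:
		return "BUS_MCEERR_AO";	/* action optional */
	default:
		return "other";
	}
}

static void sighandler(int sig, siginfo_t *si, void *ctx)
{
	(void)ctx;
	/* printf() is not async-signal-safe; tolerable in a test program */
	printf("signal %d code %d (%s) addr %p\n", sig, si->si_code,
	       bus_code_name(si->si_code), si->si_addr);
	siglongjmp(recover_env, 1);
}

static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sighandler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```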
|
|
Using a post-decrement of count as the while-loop condition leaves count
equal to -1 when the loop ends, so the timeout check fails to catch it.
Use a pre-decrement instead.
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
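The difference can be sketched as follows, assuming a polling loop whose timeout is detected by checking count == 0 afterwards; the callback type and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for "has the injected error been seen yet?". */
static bool never_triggered(void) { return false; }
static bool triggered_now(void) { return true; }

/* Buggy: if nothing happens, the final test of the condition takes
 * count from 0 to -1, so "count == 0" never reports the timeout. */
static bool timed_out_post(bool (*triggered)(void), int count)
{
	while (count--)
		if (triggered())
			break;
	return count == 0;
}

/* Fixed: count ends at exactly 0 on timeout. Note that this polls at
 * most count - 1 times and assumes count >= 1 on entry. */
static bool timed_out_pre(bool (*triggered)(void), int count)
{
	while (--count)
		if (triggered())
			break;
	return count == 0;
}
```

With count = 3 and no event, the post-decrement version leaves the loop with count == -1 and reports no timeout, while the pre-decrement version leaves it with count == 0.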
|
|
List some situations in which the EDAC test is likely to fail, and give
the corresponding solutions.
Signed-off-by: Jin Wen <wen.jin@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
When eMCA is enabled in the BIOS settings on some platforms, two similar
EDAC messages may be received for one address; one of them contains incomplete
EDAC information from the BIOS, such as invalid Machine Check bank information.
For example, the following is received on CLX-4S:
EDAC skx MC4: CPU 0: Machine Check Event: 0 Bank 255:940000000000009f
This message should be ignored and not added to the reference file;
otherwise it will affect the test result.
Signed-off-by: Jin Wen <wen.jin@intel.com>
Signed-off-by: Weihong Zhang <weihong.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
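A minimal sketch of such a filter, assuming (from the CLX-4S sample only) that the incomplete message can be recognized by its invalid bank number 255; is_invalid_bank_msg() is a hypothetical name, and the real test script may use a different criterion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumption: the incomplete BIOS-sourced duplicate carries bank 255. */
static bool is_invalid_bank_msg(const char *line)
{
	const char *p = strstr(line, "Bank ");
	unsigned int bank;

	if (!p || sscanf(p, "Bank %u", &bank) != 1)
		return false;
	return bank == 255;
}
```

Lines for which this returns true would simply be skipped before the reference file is written.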
|
|
Update the code that writes the log in the EDAC test, and
add a retry when testing a specified address.
Signed-off-by: Jin Wen <wen.jin@intel.com>
Signed-off-by: Weihong Zhang <weihong.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Remove temporary files before exiting, and remove a partially created
reference file before exiting on SIGINT or similar signals.
Signed-off-by: Weihong Zhang <weihong.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Clean up the EDAC test script to make it more readable, remove the 'exit'
after the reference file is created so that the script suits automated
testing, and return a meaningful exit status so other scripts can call it.
Signed-off-by: Jin Wen <wen.jin@intel.com>
Signed-off-by: Weihong Zhang <weihong.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Remove the original delay code and instead add enough delay for every error injection.
After triggering an injection, add a delay to capture the full kernel message.
Add a progress display for error injection.
Some miscellaneous whitespace/tab cleanup.
Signed-off-by: Weihong Zhang <weihong.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
1) A copy/paste error makes the error message print "put_semaphore"
instead of "get_semaphore".
2) A spurious ";" after an "if" statement makes the following statement
execute unconditionally.
Signed-off-by: Weihong Zhang <weihong.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
In v5.13 the upstream kernel changed the return value of madvise(2)
for a page that is already poisoned, with
commit 47af12bae17f ("mm,hwpoison: return -EHWPOISON to denote that the page has already been poisoned")
Check for the EHWPOISON error code and treat it the same as success.
Signed-off-by: Tony Luck <tony.luck@intel.com>
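A minimal sketch of that check, assuming the test poisons a page with madvise(MADV_HWPOISON); hwpoison_ok() and poison_page() are hypothetical helper names, and the fallback EHWPOISON value is the asm-generic one used on x86.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_HWPOISON
#define MADV_HWPOISON 100
#endif
#ifndef EHWPOISON
#define EHWPOISON 133	/* asm-generic value, used on x86 */
#endif

/* 0 means poisoned now; EHWPOISON (v5.13+) means it already was. */
static bool hwpoison_ok(int ret, int err)
{
	return ret == 0 || (ret == -1 && err == EHWPOISON);
}

/* Needs CAP_SYS_ADMIN and CONFIG_MEMORY_FAILURE; really poisons memory. */
static int poison_page(void *page, size_t pagesize)
{
	int ret = madvise(page, pagesize, MADV_HWPOISON);

	return hwpoison_ok(ret, errno) ? 0 : -1;
}
```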
|
|
Sleep one second every ten error injections to avoid triggering a CMCI
storm.
Signed-off-by: Jin Wen <wen.jin@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
On Red Hat, the output of the "find | md5sum" command for a directory often
differs from that for its duplicate, because find lists entries in directory
order, which need not be the same in the copy. This makes the STRESS-HWPOISON-SOFT
test fail, although it wasn't seen on Ubuntu. Changing "find | md5sum" to
"find | sort | md5sum" in k_tree_diff() gives the expected result on both OSes.
Signed-off-by: Jin Wen <wen.jin@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
EDAC drivers sometimes claim to handle memory errors. When this happens,
those errors do not appear in mcelog. We already have tests to check
for the sb_edac driver. Add a check for the Skylake driver (skx_edac).
Signed-off-by: Dezhu Zhang <dezhux.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Missing "echo" in failure path
Signed-off-by: Dezhu Zhang <dezhux.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
The "SRAR DCU" test case failed on a CLX-AP server. The root cause is
that gcc v8.2.1 optimized out the access to the injected location.
Move "total" from a local variable to a global one to prevent this
compiler optimization. This ensures that the poisoned data is consumed
and triggers the machine check recovery process.
Before applying the patch:
run : ./srar_recovery.sh -d
log : The poisoned process can't be killed by kernel automatically. Test fails!
After applying the patch:
run : ./srar_recovery.sh -d
log : ./srar_recovery.sh: line 80: 11650 Broken pipe tail -f trigger --pid=$$
11651 Bus error (core dumped) | victim $1 > log
SRAR/DCU test passes!
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
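The idea behind the fix can be sketched as below; consume() and the loop shape are assumptions, not the exact srar test code. Because a global with external linkage is visible outside the function, the compiler must actually perform the loads that feed it. Qualifying the buffer as volatile would be a stricter alternative that forces every individual access.

```c
#include <assert.h>
#include <stddef.h>

/* Global, not local: the stored sum is observable outside consume(),
 * so the reads of buf (including the poisoned line) cannot be dropped. */
long total;

void consume(const char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		total += buf[i];
}
```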
|
|
Currently, the kernel can correctly handle/recover a user-space error
page: e.g., set the "PG_hwpoison" flag, detach the error page from
the address mapping, and kill the related processes if it is a
dirty page. For some kernel error pages, however, the kernel can only
set the "PG_hwpoison" flag and return an "-EBUSY" error code to the
system call (hoping the kernel error page is not touched in the
future). The table below shows how kernel error codes map to
madvise() system call error codes when handling an error page.
+--------------------------------------------+
| Kernel error code | System call error code |
|-------------------|------------------------|
| MF_IGNORE[1] | EBUSY |
|-------------------|------------------------|
| MF_FAILED[2] | EBUSY |
|-------------------|------------------------|
| MF_DELAYED[2] | 0 (SUCCESS) |
|-------------------|------------------------|
| MF_RECOVERED[2] | 0 (SUCCESS) |
|--------------------------------------------|
| [1] For reserved/slab kernel error pages. |
| [2] For other error pages. |
+--------------------------------------------+
No existing system error code is more suitable than "EBUSY" for
"MF_IGNORE". So, as the table above shows, the "EBUSY" system call
error code can mean either that the kernel ignored a reserved
kernel error page (expected failure) or that it failed to handle
an error page (real failure).
The page for the "clean-anonymous" sub-test, obtained from the system
call "mmap(..., MAP_PRIVATE|MAP_ANONYMOUS, ...)", is a reserved, zeroed
kernel page with a copy-on-write mapping, which the kernel can't recover.
So the "EBUSY" error code for this sub-test indicates an expected
failure in the hardware poison test.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
EDAC drivers sometimes claim to handle memory errors. When this
happens, those errors do not appear in mcelog. We already have
tests to check for the sb_edac driver. Add a check for the Skylake
driver (skx_edac).
Signed-off-by: Zhang Dezhu <dezhux.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Some OSes, such as Clear Linux, have no /etc/issue. Print /etc/os-release
instead of /etc/issue.
Signed-off-by: Dezhu Zhang <dezhux.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This test verifies the EDAC driver by checking whether its output
stays correct across kernel releases, comparing it against a
reference result from an earlier run or an earlier kernel version.
Signed-off-by: Jin Wen <wenx.jin@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Signed-off-by: Wen Jin <wenx.jin@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
On systems such as the 0-day system, the S3/S4 mode test is not required; set the
environment variable MCE_TEST_SKIP=s3_s4_test to skip it.
Signed-off-by: Wen Jin <wenx.jin@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Use files under the current directory as the injection target to avoid
test failures on systems whose rootfs is set up on a ramdisk.
Signed-off-by: Wen Jin <wenx.jin@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Kernel commit:
b37ff71cc626 ("mm: hwpoison: change PageHWPoison behavior on hugetlb pages")
modified the hwpoison behavior for hugetlb pages: when one hugepage is poisoned,
the poisoned-page count increases by only one.
Signed-off-by: Wen Jin <wenx.jin@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
1. Get hugepage information from /proc/meminfo. 1G hugepages are not supported yet.
2. When vm.memory_failure_early_kill = 0, the prctl command can toggle
the kill policy correctly between early kill and late kill.
Signed-off-by: Wen Jin <wenx.jin@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
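The per-process kill policy that the prctl command toggles can be set and read back with the PR_MCE_KILL and PR_MCE_KILL_GET options of prctl(2). A sketch with hypothetical wrapper names; the fallback constants are the values from <linux/prctl.h>.

```c
#include <assert.h>
#include <sys/prctl.h>

/* Values from <linux/prctl.h>, in case the libc headers lack them. */
#ifndef PR_MCE_KILL
#define PR_MCE_KILL		33
#define PR_MCE_KILL_CLEAR	0
#define PR_MCE_KILL_SET		1
#define PR_MCE_KILL_LATE	0
#define PR_MCE_KILL_EARLY	1
#define PR_MCE_KILL_DEFAULT	2
#define PR_MCE_KILL_GET		34
#endif

/* Override vm.memory_failure_early_kill for the calling process only. */
static int set_kill_policy(int policy)
{
	return prctl(PR_MCE_KILL, (unsigned long)PR_MCE_KILL_SET,
		     (unsigned long)policy, 0UL, 0UL);
}

/* Returns PR_MCE_KILL_EARLY, PR_MCE_KILL_LATE or PR_MCE_KILL_DEFAULT. */
static int get_kill_policy(void)
{
	return prctl(PR_MCE_KILL_GET, 0UL, 0UL, 0UL, 0UL);
}
```

Because the per-process setting overrides the sysctl, toggling works regardless of the global vm.memory_failure_early_kill value.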
|
|
1. Merge pfa.c into victim.c because some functions, such as
vtop, are duplicated.
2. Run mcelog as a daemon to collect mcelog information in the background.
3. Rewrite some code in victim.c.
Signed-off-by: Wen Jin <wenx.jin@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
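A vtop-style helper typically reads /proc/self/pagemap, where each virtual page has a 64-bit entry with the PFN in bits 0-54 and a "present" flag in bit 63. The sketch below assumes that layout, as described in the kernel's pagemap documentation; PM_PRESENT, PM_PFN_MASK and pagemap_to_phys() are hypothetical names, not necessarily those used in victim.c.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PM_PRESENT	(1ULL << 63)		/* page present in RAM */
#define PM_PFN_MASK	((1ULL << 55) - 1)	/* bits 0-54: PFN */

/* Decode one pagemap entry; returns 0 if the page is not present. */
static uint64_t pagemap_to_phys(uint64_t entry, uintptr_t vaddr, long pagesize)
{
	if (!(entry & PM_PRESENT))
		return 0;
	return (entry & PM_PFN_MASK) * (uint64_t)pagesize +
	       vaddr % (uintptr_t)pagesize;
}

/* Without CAP_SYS_ADMIN, recent kernels report the PFN as 0. */
static uint64_t vtop(const void *addr)
{
	long ps = sysconf(_SC_PAGESIZE);
	uintptr_t va = (uintptr_t)addr;
	off_t off = (off_t)(va / (uintptr_t)ps * sizeof(uint64_t));
	uint64_t entry = 0;
	int fd = open("/proc/self/pagemap", O_RDONLY);

	if (fd < 0)
		return 0;
	if (pread(fd, &entry, sizeof(entry), off) != (ssize_t)sizeof(entry))
		entry = 0;
	close(fd);
	return pagemap_to_phys(entry, va, ps);
}
```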
|
|
1. Add --pid so that tail exits automatically after
the script ends.
2. Add a delay to ensure the physical address has been obtained from victim.
Signed-off-by: Wen Jin <wenx.jin@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Remove simple_process and related codes.
Signed-off-by: Wen Jin <wenx.jin@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
1. Use "victim" instead of "simple_process" for the KVM test case.
2. Remove the MAP_LOCKED flag from mmap() to avoid failures.
Signed-off-by: Wen Jin <wenx.jin@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
1. Use "victim" instead of "simple_process".
2. Remove mcelog-related code, which is useless for the test result.
3. Check dmesg on every iteration of the wait loop to get the test result.
4. Increase the delay because of the kernel's print rate limiting.
Signed-off-by: Wen Jin <wenx.jin@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
When mce-test is used together with LTP, their output is connected by
a pipe. In this situation "fsck" does nothing because it is not
running in an interactive terminal.
To make mce-test work in this situation, add the "-p" (automatic repair)
option.
For example:
[root@localhost tmp]# fsck.ext4 /dev/sdb1
e2fsck 1.42.11 (09-Jul-2014)
ALLEN: clean, 11/610800 files, 76472/2441216 blocks
[root@localhost tmp]# fsck.ext4 /dev/sdb1 | tee -a tmp
e2fsck 1.42.11 (09-Jul-2014)
e2fsck: need terminal for interactive repairs
[root@localhost tmp]# fsck.ext4 /dev/sdb1 -p | tee -a tmp
ALLEN: clean, 11/610800 files, 76472/2441216 blocks
[root@localhost tmp]#
comments massaged by Gong.
Signed-off-by: Yilong Ren <yilongx.ren@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
fsck.xfs is a no-op; use xfs_repair to check/repair the filesystem.
Signed-off-by: Yilong Ren <yilongx.ren@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
The current code in fs-metadata.sh has two problems:
* when the "-d" option is used to specify the test disk, the test disk is
mounted in $K_CWD/../hwpoison instead of $K_CWD.
* the free space calculation for the test disk is wrong:
"local free_space=$( df . -m | awk '{ print $3}' | tail -1)"
takes df's third column, "Used", not the "Available" (free space) column.
Comments are rewritten by Gong.
Signed-off-by: Yilong Ren <yilongx.ren@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
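For comparison, a minimal C sketch of the same quantity via statvfs(3); free_space_mb() is a hypothetical name. It uses f_bavail (blocks available to unprivileged users), which is what df shows in its "Available" column, although df may round slightly differently.

```c
#include <assert.h>
#include <sys/statvfs.h>

/* Space available to unprivileged users, in MiB, or -1 on error. */
static long long free_space_mb(const char *path)
{
	struct statvfs st;
	unsigned long long bytes;

	if (statvfs(path, &st) != 0)
		return -1;
	bytes = (unsigned long long)st.f_bavail * st.f_frsize;
	return (long long)(bytes / (1024 * 1024));
}
```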
|
|
This patch tests SRAR error recovery in QEMU/KVM.
It uses EINJ instead of mce-inject as the injection
tool to ensure that the error happens in the QEMU context.
Minor update by Gong.
Signed-off-by: Wen Jin <wenx.jin@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
We have two similar "victim" programs to be injected, which
is redundant, so merge them into one. Also add a new
function that lets the user choose whether to inject an
error by hand.
Minor fix by Gong.
Signed-off-by: Wen Jin <wenx.jin@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
page-types is a utility used to translate addresses from
VA to PA. It has been updated in the upstream kernel to accommodate
changes in the kernel, so it should be updated in mce-test, too.
Signed-off-by: Wen Jin <wenx.jin@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Victim will be used as a new test case for error injection. It provides
a unified interface for exporting the physical address for CE/PFA/IFU/DCU
tests, and even for eMCA.
Signed-off-by: zhilongx.liu <zhilongx.liu@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
|
|
casefile records which test cases will finally be run, so
a proper introduction to it is necessary.
Also fix a spelling mistake in runmcetest.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
|
|
Fix two bugs in two test cases.
1) The disk-file soft offline test often fails because the file
is mmapped in shared mode. Change it to private mode so that
it works in more test environments.
2) A spelling mistake in run_soft.sh makes some test cases
fail.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
|
|
If the BIOS is bogus so that error injection can't be executed
as expected, the current test case fails. Fix this bug.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
|
|
Too many BIOSes are bogus, so we have to disable the
auto-trigger mechanism for the PFA test case.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
|
|
eMCA is a new mechanism for reporting H/W errors, introduced with
the IVB-EX platform. So far only eMCA Gen1 is supported, which
means only CE errors can be reported through this path.
Signed-off-by: Liu, ZhilongX <zhilongx.liu@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
|
|
Add a load check for the hwpoison-inject module to all other hwpoison
test cases besides run_hugepage_overcommit.sh.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
|
|
Add a load check for the hwpoison-inject module to the test case
run_hugepage_overcommit.sh.
NOTE: Gong revised this patch slightly.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
|
|
After a successful hugetlb page migration by soft offline, the source
page is freed either into hugepage_freelists or into the buddy allocator
(for an overcommitted page). If the page is in buddy, page_hstate(page)
is NULL, which causes a NULL pointer dereference in
dequeue_hwpoisoned_huge_page().
[ 890.677918] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
[ 890.685741] IP: [<ffffffff81163761>] dequeue_hwpoisoned_huge_page+0x131/0x1d0
[ 890.692861] PGD c23762067 PUD c24be2067 PMD 0
[ 890.697314] Oops: 0000 [#1] SMP
This test case targets the bug reported by Jianguo Wu, where a
NULL pointer is accessed when the source hugepage has to be freed
under overcommit.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
|
|
Remove any loaded EDAC driver to avoid interference.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
"&>>" is not recognized on some Linux distributions, such as SuSE, because
they ship an older bash version, so use the equivalent ">> file 2>&1" form instead.
Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
1. Use $TMP_DIR instead of $ROOT to locate the BSP directory.
2. Change the order in which the variables NUM_FAIL_CPU/NUM_PASS_CPU
are used, to avoid any complaints.
Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
To avoid saving temporary files in the wrong directory when the test
script is executed from its own directory, determine the TMP_DIR path
before the test.
Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Missing double quotes lead to a syntax error.
Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Fix incomplete dmesg output, which is used for result analysis.
Put the related dmesg/mcelog logs under path/to/apei-inj/log/.
Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
This test covers regular EINJ error injection and
Vendor Extension Specific Error Injection on a BIOS with ACPI 5.0 enabled.
Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
tinjpage and ttranshuge can receive SIGCHLD (CLD_DUMPED) from their child
processes, but they currently check only for CLD_KILLED, so the tests fail.
This kernel behavior is not necessarily wrong, because the default
action of SIGBUS is 'coredump', not 'terminate' (see the comments in
include/linux/signal.h).
With this patch, SIGCHLD (CLD_DUMPED) is accepted as correct behavior.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Code displacement:
- moved common code into helper.sh to avoid duplicates,
- merged run-huge-test.sh into run_hugepage.sh and
run-transhuge-test.sh into run_thp.sh.
Minor improvements:
- added sysctl vm.memory_failure_early_kill=0 in the setup of each
testcase (some testcases change this global parameter, so it's safe
to reset it to 0 to avoid interference between testcases),
- added freeing resources (shmems, semaphores) and unpoisoning
in the cleanup of each testcase,
- added counter check ("HardwareCorrupted:" in /proc/meminfo)
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
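A minimal parser for such counters, assuming the usual "Key: value kB" line format of /proc/meminfo; meminfo_field() and read_meminfo_field() are hypothetical helper names. The same parser works for hugepage fields such as "Hugepagesize:".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Value of a "Key: value kB" line, or -1 if the key is absent. Only
 * matches at the start of a line, so "Pages_Total:" won't hit
 * "HugePages_Total:". */
static long meminfo_field(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		if (strncmp(line, key, klen) == 0) {
			long val;

			return sscanf(line + klen, "%ld", &val) == 1 ? val : -1;
		}
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

static long read_meminfo_field(const char *key)
{
	char buf[16384];
	size_t n;
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return meminfo_field(buf, key);
}
```

A test would read "HardwareCorrupted:" before and after an injection and compare the two values.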
|
|
The new page-types fixes some bugs and supports THP, so update this
tool in mce-test.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
A missing dot in the Makefile prevents GDB from getting the symbol
table from the binary when debugging.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
The type field of a mount entry is arbitrary, especially for pseudo
filesystems, so we don't want to hardcode it.
Signed-off-by: Lans Zhang <jia.zhang@windriver.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
The anonymous, file-backed and shared-memory hugepage tests
need a mounted hugetlbfs.
Signed-off-by: Lans Zhang <jia.zhang@windriver.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Add the missing file attributes for the BSP test case.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
The basic BSP online/offline tests include 3 modes: PER-CPU mode,
GROUP-CPU mode, and S3/S4 with CPU0 online or offline,
respectively.
Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Coverage test cases are only for white-box testing during the development
of some RAS features in the kernel, and are now totally obsolete.
Mask these test cases to avoid confusing users.
They will be removed from the test suite after some time if no one
complains.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
This fixes a compile warning.
open(2) manpage says:
... mode specifies the permissions to use in case a new
file is created. This argument must be supplied when
O_CREAT is specified in flags; ...
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
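A minimal sketch of the corrected call; create_log() and the 0600 permissions are illustrative assumptions, not the values used in the test.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* With O_CREAT, open(2) needs the third "mode" argument; without it the
 * new file's permissions come from whatever value happens to occupy that
 * argument slot, and _FORTIFY_SOURCE builds flag it at compile time. */
static int create_log(const char *path)
{
	return open(path, O_CREAT | O_WRONLY | O_TRUNC, S_IRUSR | S_IWUSR);
}
```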
|
|
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
_XOPEN_SOURCE=500 must be defined for pread
but this will result in MAP_ANONYMOUS not being defined
-> also define _BSD_SOURCE for MAP_ANONYMOUS
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
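The resulting header setup can be sketched as below. The note about _DEFAULT_SOURCE reflects glibc 2.20 and later, which deprecates _BSD_SOURCE; anon_map() and read_at() are hypothetical helpers.

```c
/* pread() needs _XOPEN_SOURCE >= 500, but defining it alone hides
 * MAP_ANONYMOUS. _BSD_SOURCE brings it back on older glibc; glibc 2.20+
 * deprecates _BSD_SOURCE in favour of _DEFAULT_SOURCE, and defining both
 * avoids the deprecation warning. */
#define _XOPEN_SOURCE 500
#define _DEFAULT_SOURCE
#define _BSD_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Anonymous private mapping, as used for test pages. */
static void *anon_map(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/* Positioned read that does not move the file offset. */
static ssize_t read_at(int fd, void *buf, size_t len, off_t off)
{
	return pread(fd, buf, len, off);
}
```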
|
|
Some scripts are missing the shebang line in the bash header.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Add some information pointing out possible reasons
when failures occur.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Some dialog versions put double quotes around the output even when
--separate-output is used. If so, strip them so that the output
looks like regular dialog output.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
On some platforms the OS doesn't support the notrigger parameter.
In this situation the injection procedure is dangerous
because it may cause a system oops/crash. If this parameter is
missing, the test should be terminated.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
On some platforms PFA is never triggered, so the PFA test
can't finish. A timeout is therefore necessary to avoid an
endless PFA test.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
If the ERST table is full, the test can't begin. To avoid
this potential issue, if an ERST record exists, erase
one record to release storage space and let the test
go on.
Because the ERST test may damage the data in the ERST
table, please back up any valid ERST data to another
safe place first.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Some EDAC modules prevent mcelog from collecting the error log from
the kernel mcelog buffer, which breaks the mcelog PFA function.
To avoid interference from the EDAC module, remove the specific EDAC
module before the test and restore it after the test.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
On some platforms the original PFA case doesn't work well because
no actual read/write access happens in time. This patch strengthens
the read/write operations to ensure that the error is triggered.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Some test scripts are not interpreted correctly on some Linux distributions,
such as Ubuntu, where /bin/sh is dash. Change the default *sh* to *bash*.
Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com>
Signed-off-by: Chen Gong <gong.chen@intel.com>
|
|
This patch adds two SRAR functional test cases (DCU & IFU). The
SRAR test is highly BIOS-dependent, so if the BIOS is bogus, the system
will hang or panic. These two test cases are disabled by default;
to test SRAR, enable them.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
On some platforms the old method can't find debugfs correctly,
so the debugfs path is now looked up via /proc/mounts.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
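A minimal sketch of such a lookup using the glibc getmntent(3) API; find_mount_point() is a hypothetical name. It matches on the filesystem type (e.g. "debugfs") rather than on a hardcoded path.

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Copy the mount point of the first filesystem of type fstype into out.
 * Returns 0 on success, -1 if no such mount exists. */
static int find_mount_point(const char *fstype, char *out, size_t outlen)
{
	FILE *f = setmntent("/proc/mounts", "r");
	struct mntent *m;
	int ret = -1;

	if (!f)
		return -1;
	while ((m = getmntent(f)) != NULL) {
		if (strcmp(m->mnt_type, fstype) == 0) {
			snprintf(out, outlen, "%s", m->mnt_dir);
			ret = 0;
			break;
		}
	}
	endmntent(f);
	return ret;
}
```

For example, find_mount_point("debugfs", dir, sizeof(dir)) would yield the debugfs mount point wherever it happens to be mounted.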
|
|
Many minor fixes are added. Some for compatibility, some for
enhancement, and the others for bug fixes.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
The old case-selection logic filtered out comment lines and any words
containing the letters "on"/"off" in case list files.
Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
mcemenu and runmcetest are shell scripts and should have the 'x' bit set.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This new design reorganizes the entire structure of MCE-test. With
the new structure, MCE-test has a new, unified output format
and interface.
In principle, this change introduces no functional change. Only some
minor fixes and updates are added; a few new test cases, such as PFA,
are also merged. Other test cases will be applied after this
change is merged into the current MCE-test.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
param_extension is a new module parameter that supports
param1/param2 as a BIOS extension for specific vendors.
By default, the tests need to enable this parameter to
get param1/param2.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This is part of the SRAR test cases. It is used
to test DCU error happening under user land and
other CPUs working in the user context, kernel
context, NMI context and IRQ context.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Reformat erst-inject.c to follow UNIX style.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
1) In the last patch, after updating the Makefile rule, I forgot to update
the corresponding shell script. The shell script's mode attribute was
also incorrect.
2) Update the erst-inject tool to provide a friendlier prompt.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This case tests read/write/clear operations
on ERST.
Note: use this case only on kernels >= 2.6.39-rc1.
For more details, please refer to the test case itself.
This case doesn't handle situations such as duplicate
or missing IDs, because current firmware has bugs. It will be
updated after the firmware fixes this issue.
V3 -> V2: Makefile without recursive make
V2 -> V1: add copyright information
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
guest_tmp is used incorrectly. The code assumes the same directory
exists on both the host and the guest, but the definition is only
correct for the guest system. As a result, the file guest_tmp can't
be transferred to the host correctly.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
When first connecting to the guest OS, the guest sends its public
key fingerprint to the host OS. To avoid interactive operation
during the test procedure, strict host key checking is not needed.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
1) The latest QEMU monitor output format has changed, so
update the condition check.
2) It seems the first anonymous memory addresses of simple_process
can't be used as injection addresses, so skip them.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Here is the fix list:
1) rc3.d shouldn't be the default start position; it should be
chosen according to /etc/inittab.
2) When the test case quits unexpectedly, qemu should be killed, too.
3) Delete an extra local variable.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Add more content to the README to make it more readable and
usable. Besides the README update, some related
patches are added to the mce-test suite, too.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Some steps in creating the guest image can be done
automatically, such as copying simple_process and the
page-types tool into the guest image.
Another update concerns the public/private keys. The original usage
may break the path relationship because the user can set the
public/private key file paths independently of HOST_DIR.
These settings are useless, so delete these options.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Here is the list:
1) EARLYKILL is defined but not used
2) some echo messages outside functions include "!", which
confuses the shell and gives wrong output
3) add a page-types check on the host side
4) some $mnt usages are dangerous; e.g., $mnt$get_tmp
returns a wrong path
5) fix a spelling error in the variable QEMU_PID
6) update p2v -> x-gpa2hva according to Ying's latest QEMU patch
7) the usage says host_run.sh can be executed directly, but in fact
it can't. Add execute permission to it.
8) add a "-h" description; option "h" should not be followed by a ":"
9) make the "-m" option behave consistently with the other options
10) add more condition checks before the tests
11) simplify some statements
12) automatically load the mce_inject module
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
This reverts commit 916cfd584ec37aa3dec3aae25b265e2701b35246.
|
|
This reverts commit b09f37e5d0d93d33fd5930222cc106708d85e1ed.
|
|
This reverts commit 5c854ab100dcbd6a445a0c07e2f35f40fefe2a59.
|
|
Add more content to the README to make it more readable and
usable. Besides the README update, some related
patches are added to the mce-test suite, too.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Some steps in creating the guest image can be done
automatically, such as copying simple_process and the
page-types tool into the guest image.
Another update concerns the public/private keys. The original usage
may break the path relationship because the user can set the
public/private key file paths independently of HOST_DIR.
These settings are useless, so delete these options.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
I did a quick review and found some defects. Here is the list:
1) EARLYKILL is defined but not used
2) some echo messages outside functions include "!", which
confuses the shell and gives wrong output
3) add a page-types check on the host side
4) some $mnt usages are dangerous; e.g., $mnt$get_tmp
returns a wrong path
5) fix a spelling error in the variable QEMU_PID
6) update p2v -> x-gpa2hva according to Ying's latest QEMU patch
7) the usage says host_run.sh can be executed directly, but in fact
it can't. Add execute permission to it.
8) add a "-h" description; option "h" should not be followed by a ":"
9) make the "-m" option behave consistently with the other options
10) add more condition checks before the tests
11) simplify some statements
12) automatically load the mce_inject module
None of these fixes touches actual functionality.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
THP is supported since v2.6.38-rc1, so add a hwpoison test to make testing it easier.
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Make the parameters of write/read_hugepage() easier to understand,
and add comments for write/read_hugepage().
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
The addr parameter of write/read_hugepage() is the mapping address of the file.
So no matter how many hugepages are mapped, addr is the start
address of the whole set of hugepages.
The avoid parameter of write/read_hugepage() is the address that must not
be touched, so it can be the head address of any hugepage.
Therefore the check addr == avoid in write/read_hugepage() only matches
when avoid is the address of the first hugepage.
This patch fixes it.
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
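A simplified sketch of the corrected comparison, assuming hugepages are laid out contiguously from addr; the signature of write_hugepages() here is hypothetical and differs from the test's write_hugepage().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill nr hugepages starting at the mapping address addr, skipping the
 * single hugepage that starts at avoid (which may be any of them). */
static void write_hugepages(char *addr, size_t nr, size_t hpsize,
			    const char *avoid)
{
	for (size_t i = 0; i < nr; i++) {
		char *page = addr + i * hpsize;

		if (page == avoid)	/* compare each page, not just addr */
			continue;
		memset(page, 'a', hpsize);
	}
}
```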
|
|
When cowflag is set, the child process should copy all the hugepages of
its parent. But currently, whatever the value of cowflag, the child process
never performs a copy-on-write operation, because the size==0 argument
to write_hugepage() makes write_hugepage() do nothing.
This problem was introduced by
commit c6a4c3d950385063db705e520bc9b6cda9587f57
Author: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
With this patch, the states of the parent and child processes are as follows:
         Before this patch                        After this patch
NO-COW   Parent and child processes are killed.   Same as before.
COW      Parent and child processes are killed.   Only the parent process is killed.
(Here "killed" means killed by memory-failure.)
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
The si_addr_lsb check in sighandler() is also extended to the hugepage shift.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
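A minimal sketch of such a check. Per sigaction(2), si_addr_lsb gives the least significant bit of the reported address, i.e. log2 of the corrupted unit; for a poisoned hugepage that unit is the hugepage, so the expected value is the hugepage shift. `page_shift_of()` and `addr_lsb_ok()` are hypothetical helpers, and the page sizes are passed in rather than queried from the system.

```c
#include <stddef.h>

/* Hypothetical helper: log2 of a power-of-two page size,
 * e.g. 4096 -> 12, 2 MiB -> 21. */
static int page_shift_of(size_t pagesize)
{
    int shift = 0;
    while (((size_t)1 << shift) < pagesize)
        shift++;
    return shift;
}

/* Hypothetical check: the reported lsb must match the shift of the unit
 * that was poisoned (base page or hugepage). */
static int addr_lsb_ok(short addr_lsb, int is_hugepage,
                       size_t pagesize, size_t hugepagesize)
{
    size_t unit = is_hugepage ? hugepagesize : pagesize;
    return addr_lsb == page_shift_of(unit);
}
```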
|
|
Soft offlining is driven by the options '-O' and '-x'.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Add three testcases for hugepage soft offlining.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Add routines allocating/freeing hugepages of the following types:
- hugepage on shared memory,
- anonymous hugepage,
- filebacked hugepage.
And also add read/write helper functions.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This patch makes the following changes to the mce-test suite's kvm test.
(git://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git)
. Re-enable the late kill option (-l) on host_run.sh.
. Add a virtual guest RAM size option (-m) to host_run.sh that is passed to
qemu-system-x86_64. This allows testing guests >= 4069M in size.
. Allow for guest .img files to consist of LVM partitions.
Signed-off-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Add failure statistics and an exit value check, so that
it is easy to run automated tests.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
If hwpoison.sh is executed from the top-level
Makefile, it doesn't compile/install the required binaries. The Makefile
in mce-test/stress works correctly.
...
Test aborted by unexpected error!
[error] !!! no bin subdir there !!!
Reported-by: Evan McNabb <emcnabb@redhat.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This patch fixes the following problem:
...
result summary:
fs_metadata -- no test finished
details: /root/git/mce-test/stress/log/fs_metadata/fs_metadata.log
fsck.ext3 -- fsck on /dev/loop5 got pass
totally 1 task-groups report failures
...
...
[04-05 16:29:08] thread 0 starts with pid 25027
tee: ./hwpoison/fs_metadata/k-threads.pid: No such file or directory
25027
Signed-off-by: Evan McNabb <emcnabb@redhat.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Supply correct err() and errmsg() macros; don't use the implicit
ones from glibc, which have a different prototype.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
So far not hooked up to standard "make test" because
the kernel patches are not in yet.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
In case something goes wrong in the kernel with the poisoned
mappings
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
If I run hwpoison.sh without -C option I get the following errors:
./hwpoison.sh: line 366: [: -eq: unary operator expected
./hwpoison.sh: line 371: [: -gt: unary operator expected
./hwpoison.sh: line 372: [: -eq: unary operator expected
The reason is that g_children is empty (unset) in this case, but it should be zero.
Reported-by: Evan McNabb <emcnabb@redhat.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Not strictly needed due to line buffering, but more
future-proof.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This way the child won't fail if there were already other
errors.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Based on a report from Evan McNabb
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
$ltp_root/pan/
- use invalid() to exit when the error is related to a command option.
- add die() so that the stress tester works with the common function check_debugfs().
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
On Ubuntu, sh is linked to dash, so none of these
shell scripts run correctly. They need to use
bash instead.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Add clearing and backing up of old logs for the kdump driver. Because the test
cases cause a reboot and the script is re-run after each reboot, the test ended
up in an infinite loop (as the setup stamp is moved).
The second fix concerns loading the mce-inject module. The kdump test driver is
apparently run with "set -ex", so every line that can return non-zero (and
should not stop script execution) must be used only as part of a
conditional.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This patch is to add KVM RAS test suite into mce-test, which is a
collection of test scripts for testing the Linux kernel MCE processing
features in KVM guest system.
Signed-off-by: Jiajia Zheng <jiajia.zheng@intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
|
|
1. auto-load the einj module before the apei test begins, and move the APEI_IF
definition to a proper place
2. fix typos in check_debugfs
3. enhance the module check before the stress test
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
1. the test path shouldn't be placed under "/" in stress/hwpoison.sh
2. to clear the log history, back up old test logs with different names.
3. add execute permission for the apei test case
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Clean up some confusing execution paths.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
g_ltppan needs to be updated after g_ltproot is set.
BTW, I think g_ltppan should be directly under g_ltproot. It is
clearer.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
1. more graceful output in check_debugfs
2. eliminate trivial use of the "debugfs" parameter in hwpoison.sh
3. add some additional checks before the driver kicks off; otherwise
one may see "Failed: MCE log is different from input",
when in fact it is only because the mce_inject module isn't inserted.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Update the definition of APEI_IF. Now it can be located anywhere.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
check_debugfs should not serve only mce.
So add a new function dedicated to mce.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This essentially renames test-simple to test.
Also some minor fixes to the Makefile.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This runs all the standard functional tests for a quick test in hwpoison
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
- Add way to specify random seed
- Add timeout
- Various new checks to be more user friendly
- Use standard option parsing
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This is to handle kernels where the filter defaults to off.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
- Fix indentation
- Always report failure to parent
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This patch adds the testcases for mmap/IPV shared pages.
The purpose of these testcases is as follows:
- We can check whether a process A is killed as expected when it accesses
the page shared with and hwpoisoned by another process B
(in the late killing case).
- We can check whether a process A is killed at once when another process B
injects hwpoison into the page shared by both of them
(in the early killing case).
ChangeLog:
- Add synchronization code between parent and child process with a semaphore.
- Share the common function do_shared() between the mmap case and the IPV case.
- Add error check code.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
This is the fs-metadata workload. fs-metadata is designed to test i-node
operations under a heavy workload and make sure every i-node operation gets
the expected result. In detail, it first generates a huge directory
hierarchy on the target disk, then performs unlink operations on this
directory hierarchy and duplicates a copy of the directory, and finally
checks whether the two directories are the same as expected.
Acked-by: Andi Kleen <andi.kleen@intel.com>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
The page-poisoning test program is an extension of the tinjpage test program
with a multi-process model. It spawns thousands of processes that inject HWPoison
errors into various pages simultaneously through the madvise syscall. Then it
checks whether these errors are handled correctly, i.e. whether each test process
receives or doesn't receive a SIGBUS signal as expected.
In detail, page-poisoning is designed to cover all possible userspace page
types via the following two test operations:
- anonymous page operations.
- file data operations.
Acked-by: Andi Kleen <andi.kleen@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Documentation of MCE stress test suite.
Reviewed-by: Jiajia Zheng <jiajia.zheng@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
The MCE stress test suite is a collection of tools and test scripts intended
for stress testing the Linux kernel's high-level MCA handlers, which
include HWPoison page recovery, soft page offlining, and so on.
In general, this test suite is designed to do stress testing through various
test interfaces, i.e. the madvise syscall, the HWPoison page injector, and the
APEI injector (see the ACPI 4.0 spec). It supports most popular
Linux file systems (FS); there is an option for the user to specify which
FS type the test should run on.
The MCE stress test suite consists of four parts: test driver, workload
controller, customized workloads, and background workloads.
The main test idea is described below:
- The test driver launches various customized workloads to continuously generate
lots of pages with expected page states. Note: all of these workloads know
their expected results, which should not be affected by the Linux MCE
high-level handlers.
- The test driver then injects MCE errors into these pages through the madvise
syscall, the HWPoison injector, or the APEI injector. While the Linux kernel
handles these MCE errors, all the workloads continue running normally.
- After running for a long time, the test driver collects the test result of each
workload to see whether any unexpected failures happened. In this way, it can
decide whether a bug has been found.
- If any system panic or FS corruption happens, there must be a
bug. This is the bottom line for deciding whether the test passes.
The test driver (a.k.a. hwpoison.sh) drives the whole test procedure. It is
responsible for managing the test environment, setting up the error injection
interface, controlling test progress, launching workloads, injecting page
errors, as well as recording test logs and reporting test results.
Acked-by: Andi Kleen <andi.kleen@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
My debugging shows that mmap() with the MLOCKED flag sets the page dirty again,
so the kernel handler never enters the clean page handling logic.
So use fsync() after mmap() to make the page clean.
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Tested-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
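A minimal sketch of the mmap-then-fsync sequence described above, assuming the "MLOCKED flag" corresponds to Linux's MAP_LOCKED mmap flag. `map_clean_locked_page()` is a hypothetical helper, not code from the test suite. It maps one page of a file locked into memory, then calls fsync() on the file so the page is written back and clean before an error is injected into it.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first page of 'fd' shared and locked (MAP_LOCKED), then fsync()
 * the file so any dirty state is written back and the page is clean.
 * Returns the mapping, or NULL on failure. */
static char *map_clean_locked_page(int fd, size_t pagesize)
{
    char *p = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_LOCKED, fd, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (fsync(fd) < 0) {          /* write back so the page ends up clean */
        munmap(p, pagesize);
        return NULL;
    }
    return p;
}
```

Locking a single page normally stays within the default RLIMIT_MEMLOCK; larger mappings may need the limit raised.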
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
(requires hwpoison-2.6.32)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
tsrc tests hwpoison
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Update howto.txt.
- recommend stopping cron before mce testing.
- add an introduction to loop-mce-test as well.
Signed-off-by: Zheng Jiajia <jiajia.zheng@intel.com>
|
|
Some parameter changes and other minor changes.
Signed-off-by: Huang Ying <ying.huang@intel.com>
|
|
Rename to tools/loop-mce-test.sh to follow naming convention. chmod +x.
Signed-off-by: Huang Ying <ying.huang@intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Based on work from Dean Nelson
Signed-off-by: Zheng Jiajia <jiajia.zheng@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
- all second errors are optional because the VFS reports only once
- hole errors are optional because we can't propagate errors for holes
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Also add Fengguang as author
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
|
On machines with SER_P, machine_check_poll in the kernel filters out MCEs
with MCI_STATUS_S instead of MCI_STATUS_UC. So for test cases that run
on machines both with and without SER_P, both UC and S should be set.
Signed-off-by: Huang Ying <ying.huang@intel.com>
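A simplified model of the poll-path filter described above, to show why setting both bits makes a test case behave the same with and without SER_P. The bit positions follow the IA32_MCi_STATUS layout in the Intel SDM; `poll_skips()` is a hypothetical helper that ignores the other conditions the kernel checks.

```c
#include <stdint.h>

/* IA32_MCi_STATUS bit positions (Intel SDM, machine-check architecture). */
#define MCI_STATUS_VAL (1ULL << 63)   /* register contents valid */
#define MCI_STATUS_UC  (1ULL << 61)   /* uncorrected error */
#define MCI_STATUS_EN  (1ULL << 60)   /* error reporting enabled */
#define MCI_STATUS_S   (1ULL << 56)   /* signaled (SER_P only) */

/* Simplified model: the poll path skips an event if S is set on SER_P
 * machines, or if UC is set otherwise. Returns 1 if skipped. */
static int poll_skips(uint64_t status, int ser_p)
{
    return (status & (ser_p ? MCI_STATUS_S : MCI_STATUS_UC)) != 0;
}
```

With both UC and S set, the event is skipped whether or not SER_P is present; with UC alone, only non-SER_P machines skip it.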
|
|
SLE11 changed the kdump name from "kdump" to "boot.kdump", so
handle both names accordingly.
Signed-off-by: Chen Gong <gong.chen@intel.com>
|
|
Update the documentation for testing with the kdump test driver.
Signed-off-by: Jiajia Zheng <jiajia.zheng@intel.com>
|
|
Add a new test group -- poll_noser, add three cases -- fatal_poll,
srar_poll and uc_poll to test the conditional control statement in
machine_check_poll.
Signed-off-by: Jiajia Zheng <jiajia.zheng@intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
|