Age  Commit message  Author  Files  Lines
2023-06-01tprctl: enhance sighandler to explicitly print si_codeHEADmasterShuai Xue1-2/+8
The current sighandler only restores the previously saved environment, so we cannot tell the reason for the SIGBUS. Therefore, explicitly print si_code as we do in tsimpleinj: 4 for BUS_MCEERR_AR and 5 for BUS_MCEERR_AO. Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2022-10-28tools: victim: fix timeout count with pre-decrementShuai Xue1-1/+1
Using a post-decrement of count as the while loop condition leaves count equal to -1 when the loop ends, so the trigger timeout is never caught. Use pre-decrement instead. Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
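The difference can be seen in a small sketch. It assumes the timeout is detected afterwards by testing the counter against zero; that check, and all function names below, are illustrative and not taken from victim.c.

```c
#include <stdbool.h>

/* Hypothetical stand-in for "has the error been triggered yet?".
 * It never becomes true here, so both loops run until the retry
 * budget is exhausted. */
static bool triggered(void)
{
    return false;
}

/* Post-decrement: the condition tests the old value and then decrements,
 * so after the final (failing) test of 0 the counter is left at -1. */
int wait_post_decrement(int count)
{
    while (count-- && !triggered())
        ; /* poll */
    return count;
}

/* Pre-decrement: the counter is decremented first, so the loop ends
 * with the counter at exactly 0. */
int wait_pre_decrement(int count)
{
    while (--count && !triggered())
        ; /* poll */
    return count;
}

/* Assumed timeout check: the counter reached zero. */
bool timed_out(int remaining)
{
    return remaining == 0;
}
```

With a budget of 3, `wait_post_decrement(3)` leaves the counter at -1 and `timed_out()` misses the timeout, while `wait_pre_decrement(3)` leaves it at 0 and the timeout is caught.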
2022-08-25edac/README: Some notes for EDAC testJin Wen1-0/+28
List some situations in which the EDAC test will likely fail and give the corresponding solutions. Signed-off-by: Jin Wen <wen.jin@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2022-08-25edac.sh: Filter out incomplete EDAC message in EDAC testWeihong Zhang1-9/+39
When eMCA is enabled in the BIOS settings on some platforms, two similar EDAC messages may be received for one address. These messages can include incomplete EDAC information from the BIOS, such as invalid Machine Check Bank information. For example, the following is received on CLX-4S: "EDAC skx MC4: CPU 0: Machine Check Event: 0 Bank 255:940000000000009f". Such a message should be ignored and not added to the reference file; otherwise it will affect the test result. Signed-off-by: Jin Wen <wen.jin@intel.com> Signed-off-by: Weihong Zhang <weihong.zhang@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2022-08-25edac.sh: Save more information to log and add retry in test specify addressWeihong Zhang1-19/+42
Update the code that writes the log in the EDAC test, and add a retry when testing a specified address. Signed-off-by: Jin Wen <wen.jin@intel.com> Signed-off-by: Weihong Zhang <weihong.zhang@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2022-08-25edac.sh: Remove temporary files and reference file when abnormal exitWeihong Zhang1-2/+9
Remove temporarily created files before exit, and remove a partially created reference file before exit on receiving SIGINT or similar signals. Signed-off-by: Weihong Zhang <weihong.zhang@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2022-08-25edac.sh: Check memory config before test and clean scriptWeihong Zhang1-15/+38
Clean up the EDAC test script to make it more readable, remove the 'exit' after creating the reference file to make the script suitable for automated testing, and add a meaningful return value so that it can be called by other scripts. Signed-off-by: Jin Wen <wen.jin@intel.com> Signed-off-by: Weihong Zhang <weihong.zhang@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2022-08-24edac.sh: Update delay code and add progress displayWeihong Zhang1-28/+49
Remove the original delay code and add enough delay for every error injection. After triggering an injection, add a delay to get the full kernel message. Add a progress display prompt for error injection. Some miscellaneous whitespace/tab cleanup. Signed-off-by: Weihong Zhang <weihong.zhang@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2022-08-24mce-test: Fix some typos in thugetlb.cWeihong Zhang1-2/+3
1) copy/paste in error message prints "put_semaphore" instead of "get_semaphore" 2) Spurious ";" on "if" statement means incorrect execution. Signed-off-by: Weihong Zhang <weihong.zhang@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2021-11-16Add wrapper to madvise() calls to handle new kernel error codeTony Luck5-5/+42
Upstream kernel changed the return value from madvise(2) in the case where a page is already poisoned in v5.13 with commit 47af12bae17f ("mm,hwpoison: return -EHWPOISON to denote that the page has already been poisoned") Check for the EHWPOISON error code and treat the same as success. Signed-off-by: Tony Luck <tony.luck@intel.com>
2021-07-20Add delay in EDAC test to avoid triggering CMCI stormJin Wen1-0/+16
Sleep one second every ten error injections to avoid triggering CMCI storm. Signed-off-by: Jin Wen <wen.jin@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2019-09-17k_tree_diff: Remove dependency on output order from find(1) commandJin Wen1-2/+2
The output of "find | md5sum" under one directory often differs from that under its duplicate directory on Red Hat, which causes the STRESS-HWPOISON-SOFT test to fail; this is not seen on Ubuntu. Changing "find | md5sum" to "find | sort | md5sum" in k_tree_diff() gives the expected result on both OSes. Signed-off-by: Jin Wen <wen.jin@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2019-07-29mce-test: Unload possible EDAC drivers to avoid interference.Dezhu Zhang3-24/+15
EDAC drivers sometimes claim to handle memory errors. When this happens, those errors do not appear in mcelog. We already have tests to check for the sb_edac driver. Add a check for the Skylake driver (skx_edac). Signed-off-by: Dezhu Zhang <dezhux.zhang@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2019-07-29mce-test: Fix typo issue in run_hugepage_overcommit.shDezhu Zhang1-1/+1
Missing "echo" in failure path Signed-off-by: Dezhu Zhang <dezhux.zhang@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2019-07-16Avoid optimizing out the access to the injected location by compilerQiuxu Zhuo1-1/+10
The "SRAR DCU" test case failed on a CLX-AP server. The root cause is that gcc v8.2.1 optimized out the access to the injected location. Move "total" from a local variable to a global one to prevent the compiler from optimizing the access away. This ensures that the poisoned data is consumed, which triggers the machine check recovery process.

Before applying the patch:
run : ./srar_recovery.sh -d
log : The poisoned process can't be killed by kernel automatically. Test fails!

After applying the patch:
run : ./srar_recovery.sh -d
log : ./srar_recovery.sh: line 80: 11650 Broken pipe tail -f trigger --pid=$$
      11651 Bus error (core dumped) | victim $1 > log
      SRAR/DCU test passes!

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
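The effect can be sketched as follows. This is not the patched test code; only the variable name `total` comes from the commit message, and `consume_buffer` is a hypothetical function. When the sum goes into a local that is never used afterwards, the compiler is free to drop the loads entirely. Accumulating into a global with external linkage makes the result observable, so the reads have to be performed (assuming no whole-program optimization proves the global unused).

```c
#include <stddef.h>

/* Global accumulator: because other translation units could read it,
 * the compiler cannot treat the additions below as dead code. */
long total;

/* Hypothetical consumer: touch every byte of the buffer so that any
 * poisoned location in it is actually loaded. */
long consume_buffer(const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        total += buf[i];
    return total;
}
```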
2019-07-08Add an "EBUSY" check for the "clean-anonymous" sub-testQiuxu Zhuo1-12/+65
Currently, the kernel can correctly handle/recover a user-space error page, e.g., set the "PG_hwpoison" flag, detach the error page from the address mapping, and kill the related processes if it is a dirty page. For some kernel error pages, however, the kernel can only set the "PG_hwpoison" flag and return the "-EBUSY" error code for the system request (hoping to be lucky and not touch the kernel error page in the future).

The table below shows the mapping from the kernel error code to the system call error code of madvise() when handling an error page.

+--------------------------------------------+
| Kernel error code | System call error code |
|-------------------|------------------------|
| MF_IGNORE[1]      | EBUSY                  |
|-------------------|------------------------|
| MF_FAILED[2]      | EBUSY                  |
|-------------------|------------------------|
| MF_DELAYED[2]     | 0 (SUCCESS)            |
|-------------------|------------------------|
| MF_RECOVERED[2]   | 0 (SUCCESS)            |
|--------------------------------------------|
| [1] For reserved/slab kernel error pages.  |
| [2] For other error pages.                 |
+--------------------------------------------+

There isn't an existing system error code more suitable than "EBUSY" to map "MF_IGNORE" to. As the table shows, the "EBUSY" system call error code can indicate either that the kernel ignores a reserved kernel error page (expected failure) or that it fails to handle an error page (real failure).

The page for the "clean-anonymous" sub-test, obtained from the system call "mmap(..., MAP_PRIVATE|MAP_ANONYMOUS,...)", is a reserved kernel zeroed page with a copy-on-write mapping, which the kernel can't recover. So the "EBUSY" error code for this sub-test indicates an expected failure when doing the hardware poison test.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2019-03-27mce-test: Unload possible EDAC drivers to avoid interference.Zhang Dezhu3-0/+9
EDAC drivers sometimes claim to handle memory errors. When this happens, those errors do not appear in mcelog. We already have tests to check for the sb_edac driver. Add a check for the Skylake driver (skx_edac). Signed-off-by: Zhang Dezhu <dezhux.zhang@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2019-03-27mce-test: Don't depend on /etc/issueDezhu Zhang3-3/+3
Some OSes, such as Clear Linux, have no /etc/issue. Print /etc/os-release instead of /etc/issue. Signed-off-by: Dezhu Zhang <dezhux.zhang@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2018-09-17Add 'EDAC' regression test caseJin Wen3-0/+310
This test verifies the EDAC driver by checking that its output stays correct across kernel releases, comparing it against a reference result from an earlier run or an earlier kernel version. Signed-off-by: Jin Wen <wenx.jin@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2017-10-30screen command will terminate if stdin is not terminalWen Jin1-1/+6
Signed-off-by: Wen Jin <wenx.jin@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2017-10-30If environment variable MCE_TEST_SKIP=s3_s4_test is set then skip s3_s4_testWen Jin1-1/+7
On systems such as 0-day systems, the S3/S4 mode test is not required; set the environment variable MCE_TEST_SKIP=s3_s4_test to skip it. Signed-off-by: Wen Jin <wenx.jin@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2017-09-27Changed the injected target filesWen Jin1-2/+2
Use files under the current directory as the injection target to avoid test failures on systems where the rootfs is set up on a ramdisk. Signed-off-by: Wen Jin <wenx.jin@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2017-09-27When one hugepage poisoned, increase num_hwpoison_pages only by 1Wen Jin1-3/+3
Kernel commit b37ff71cc626 ("mm: hwpoison: change PageHWPoison behavior on hugetlb pages") modified the hwpoison behavior for hugetlb pages: the poisoned page count increases by only one when one hugepage is poisoned. Signed-off-by: Wen Jin <wenx.jin@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2016-04-11Update hugepage poison test case in hwpoisonWen Jin2-8/+13
1. Get hugepage information from /proc/meminfo; 1G hugepages are not supported yet. 2. When vm.memory_failure_early_kill = 0, the prctl command can toggle the kill policy correctly between early_kill and late_kill. Signed-off-by: Wen Jin <wenx.jin@intel.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2016-04-11Merge pfa.c into victim.cWen Jin5-204/+140
1. Merge pfa.c into victim.c because some functions, such as vtop, are duplicated. 2. Run mcelog as a daemon to collect mcelog information in the background. 3. Rewrite some code in victim.c. Signed-off-by: Wen Jin <wenx.jin@intel.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2016-04-11Support daemon exit automaticallyWen Jin1-2/+4
1. Add --pid so that tail exits automatically after the script ends. 2. Add a delay to ensure the physical address is obtained from victim. Signed-off-by: Wen Jin <wenx.jin@intel.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2016-04-11Remove unused codes about simple_processWen Jin6-78/+5
Remove simple_process and related code. Signed-off-by: Wen Jin <wenx.jin@intel.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2016-04-11Replace simple_process in test case for KVMWen Jin2-45/+26
1. Use "victim" to replace "simple_process" in the test case for KVM. 2. Remove the MAP_LOCKED flag from the mmap() call to avoid failure. Signed-off-by: Wen Jin <wenx.jin@intel.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2016-04-11Drop obsolete test case "simple_process"Wen Jin3-29/+59
1. Use "victim" to replace "simple_process". 2. Remove mcelog-related code, which is useless for the test result. 3. Check dmesg on every iteration of the wait loop to get the test result. 4. Increase the delay time because of print rate limiting in the kernel. Signed-off-by: Wen Jin <wenx.jin@intel.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2016-04-11hwpoison.sh: add -p option with e2fsck commandYilong Ren1-0/+1
When mce-test is used along with ltp, a pipe is used between them. "fsck" does nothing in this situation because it is not an interactive terminal environment. To make mce-test work in this situation, add the "-p" option. For example:

[root@localhost tmp]# fsck.ext4 /dev/sdb1
e2fsck 1.42.11 (09-Jul-2014)
ALLEN: clean, 11/610800 files, 76472/2441216 blocks
[root@localhost tmp]# fsck.ext4 /dev/sdb1 | tee -a tmp
e2fsck 1.42.11 (09-Jul-2014)
e2fsck: need terminal for interactive repairs
[root@localhost tmp]# fsck.ext4 /dev/sdb1 -p | tee -a tmp
ALLEN: clean, 11/610800 files, 76472/2441216 blocks
[root@localhost tmp]#

Comments massaged by Gong. Signed-off-by: Yilong Ren <yilongx.ren@intel.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2016-04-11hwpoison.sh: use xfs_repair instead of fsck.xfsYilong Ren1-0/+1
fsck.xfs is a no-op, use xfs_repair to check/repair filesystem. Signed-off-by: Yilong Ren <yilongx.ren@intel.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2016-04-11Minor bug fix about fs-metadata.shYilong Ren1-2/+2
The current code in fs-metadata.sh has two problems:
* When the "-d" option is used to specify the test disk, the test disk is mounted in $K_CWD/../hwpoison instead of $K_CWD.
* The free space calculation for the test disk is wrong: in "local free_space=$( df . -m | awk '{ print $3}' | tail -1)" the variable "free_space" holds the "Used" column, not the free space.
Comments are rewritten by Gong. Signed-off-by: Yilong Ren <yilongx.ren@intel.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2015-12-25Support SRAR error injection in QEMU/KVMWen Jin6-239/+485
This patch is used to test SRAR error recovery in QEMU/KVM. It also uses EINJ instead of mce-inject as the injection tool to ensure the error happens in the QEMU context. Minor update by Gong. Signed-off-by: Wen Jin <wenx.jin@intel.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2015-12-25Unify test case for error injectionWen Jin5-353/+212
We have two similar to-be-injected "victim" programs, which is a little redundant. Merge them into one. Also add a new function that lets the user choose whether to inject an error by hand. Minor fix by Gong. Signed-off-by: Wen Jin <wenx.jin@intel.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2015-12-25Update page-typesWen Jin5-114/+460
page-types is a utility used to translate addresses from VA to PA. It has been updated in the upstream kernel to accommodate kernel changes, so it should be updated in mce-test, too. Signed-off-by: Wen Jin <wenx.jin@intel.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2015-01-21Add victim to replace simple_procee and page_typezhilongx.liu3-0/+260
Victim will be used as the new test case for error injection. It provides a unified interface to export the physical address for CE/PFA/IFU/DCU tests, and even for eMCA. Signed-off-by: zhilongx.liu <zhilongx.liu@intel.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
2015-01-21Add new doc to introduce how to create case fileChen, Gong2-2/+10
The casefile records which test cases will be run, so a proper introduction is necessary. Also fix a spelling mistake in runmcetest. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
2015-01-21Fix the bugs in hwpoison related test casesChen, Gong2-4/+4
Fix two bugs in two test cases. 1) The test for disk file soft off-line often fails because the file is mmaped in shared mode. Change it to private mode to fit a wider range of test environments. 2) In run_soft.sh there is a spelling mistake that causes some test cases to fail. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
2015-01-21core_recovery: Fix the bug in DCU/IFU test caseChen, Gong1-2/+5
If the BIOS is bogus so that error injection can't be executed as expected, the current test case will fail. Fix this bug. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
2015-01-21Enable notrigger for PFA test caseChen, Gong1-0/+1
Too many BIOSes are bogus so that we have to disable auto trigger mechanism for PFA test case. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
2015-01-21Add eMCA test caseChen, Gong4-0/+190
eMCA is a kind of new mechanism to report H/W errors since IVB-EX platform. By now only eMCA Gen1 is supported, which means only CE error can be reported from this path. Signed-off-by: Liu, ZhilongX <zhilongx.liu@intel.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
2015-01-21Add extra hwpoison-inject load checkChen, Gong4-0/+20
Add load checker of hwpoison-inject module for all other hwpoison test cases besides run_hugepage_overcommit.sh. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
2015-01-21Add hwpoison-inject load checkNaoya Horiguchi2-0/+18
Add load checker of hwpoison-inject module for test case run_hugepage_overcommit.sh. NOTE: Gong revisits this patch a little bit. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
2015-01-21Add test case 'hugepage_overcommit'Naoya Horiguchi4-1/+71
After a successful hugetlb page migration by soft offline, the source page will either be freed into hugepage_freelists or buddy (over-commit page). If page is in buddy, page_hstate(page) will be NULL. It will hit a NULL pointer dereference in dequeue_hwpoisoned_huge_page().

[ 890.677918] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
[ 890.685741] IP: [<ffffffff81163761>] dequeue_hwpoisoned_huge_page+0x131/0x1d0
[ 890.692861] PGD c23762067 PUD c24be2067 PMD 0
[ 890.697314] Oops: 0000 [#1] SMP

This test case is targeted for the bug reported by Jianguo Wu, where we have NULL pointer access when we have to free source hugepage under overcommitting situation.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
2015-01-21Improve test reliability for SRAR caseChen Gong1-0/+13
Remove possible EDAC driver to avoid interference. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21fix "syntax error near unexpected token `>'"Shaoyong Wang2-4/+4
"&>>" can't be recognized on some Linux OS such as SuSE because it uses older BASH version, So use substitute mode. Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com> Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21Update the test script for BSP test caseShaoyong Wang2-11/+13
1. Don't use $ROOT to locate BSP directory, $TMP_DIR instead 2. Change the invoke sequence of variables (NUM_FAIL_CPU/NUM_PASS_CPU) to avoid any complaint. Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com> Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21check TMP_DIR path for all runtest.sh scriptsShaoyong Wang10-2/+66
To avoid temporary files being saved in the wrong directory when a test script is executed from its own directory, the TMP_DIR path should be determined before the test. Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com> Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21Fix "too many arguments" issue in mcemenuShaoyong Wang1-2/+2
The lack of double quotes leads to a syntax error. Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com> Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21Update apei-inj test caseShaoyong Wang2-22/+43
Fix incomplete dmesg information which is used for result analysis. Put related dmesg/mcelog log under path/to/apei-inj/log/. Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com> Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21Add test case for ACPI5.0 extension support for EINJShaoyong Wang3-0/+208
This test includes regular EINJ error injection test and Vendor Extension Specific Error Injection test with ACPI5.0 enabled BIOS. Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com> Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21hwpoison: check with CLD_KILLED|CLD_DUMPED instead of just CLD_KILLEDNaoya Horiguchi2-3/+5
tinjpage and ttranshuge can get SIGCHLD(CLD_DUMPED) from their child processes, but currently they only check for CLD_KILLED, so the tests fail. This kernel behavior might not be wrong, because the default action of SIGBUS is 'coredump', not 'terminate' (see comments in include/linux/signal.h). With this patch, we accept SIGCHLD(CLD_DUMPED) as correct behavior. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21Clean up hwpoison functional testsNaoya Horiguchi7-314/+341
Code displacement:
- moved common code into helper.sh to avoid duplicates,
- merged run-huge-test.sh into run_hugepage.sh and run-transhuge-test.sh into run_thp.sh.
Minor improvements:
- added sysctl vm.memory_failure_early_kill=0 in the setup of each testcase (some testcases change this global parameter, so it's safe to reset it to 0 to avoid interference between testcases),
- added freeing resources (shmems, semaphores) and unpoisoning in the cleanup of each testcase,
- added counter check ("HardwareCorrupted:" in /proc/meminfo)
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21Update page-types toolChen Gong1-15/+116
The new page-types fixes some bugs and supports THP, so update this tool for mce-test. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21Update Makefile for hwpoison test caseChen Gong1-1/+1
One dot is missed in the Makefile so that GDB can't get symbol table from the binary when debugging. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21Fix hugetlbfs mount path detectionLans Zhang1-2/+3
The type parameter in mount entry is random especially for pseudo filesystem, thus, we don't want a hardcode on it. Signed-off-by: Lans Zhang <jia.zhang@windriver.com> Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21Mount hugetlbfs for hwposion-hard testLans Zhang1-0/+6
anonymous hugepage, file backed hugepage and shared memory hugepage need a mounted hugetlbfs. Signed-off-by: Lans Zhang <jia.zhang@windriver.com> Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21Update missed file attributionChen Gong2-0/+0
Add missed file attribution for BSP test case. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21Add BSP online/offline test caseShaoyong Wang3-0/+358
Basic BSP online/offline tests include 3 modes: PER-CPU mode GROUP-CPU mode and S3/S4 with CPU0 onlined or offlined, respectively. Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com> Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21Remove coverage test casesChen Gong2-2/+3
Coverage test cases are only for white-box testing during development of some RAS features in the kernel. By now they are totally obsolete. Mask these test cases to avoid confusing users. They will be removed from the test suite after some time if no one complains. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21mce-test: Add file attributes to openThomas Renninger1-1/+1
This fixes a compile warning. open(2) manpage says: ... mode specifies the permissions to use in case a new file is created. This argument must be supplied when O_CREAT is specified in flags; ... Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
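As a small illustration of the rule quoted from the man page, the sketch below passes an explicit mode whenever O_CREAT is used. The helper name `create_log_file` and the 0644-style permissions are assumptions for the example, not the values used in the patch.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (or truncate) a file for writing.
 * Because O_CREAT is given, the third argument (mode) is mandatory;
 * without it the permissions of a new file would be unspecified. */
int create_log_file(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_TRUNC,
                S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
}
```

The mode requested here is still filtered through the process umask, so the resulting permissions may be narrower than the ones requested.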
2015-01-21mce-test: OpenSUSE Build Service check wants to have the she-bang on topThomas Renninger1-1/+1
Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21mce-test: Fix pread and MAP_ANONYMOUS usageThomas Renninger2-2/+8
_XOPEN_SOURCE=500 must be defined for pread but this will result in MAP_ANONYMOUS not being defined -> also define _BSD_SOURCE for MAP_ANONYMOUS Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21mce-test: Fix she-bang in fs-metadata.sh and k-thread.shThomas Renninger2-4/+4
Some she-bang are missed in the bash header. Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21Update test result report information for ERST testChen Gong1-2/+3
Add some information about possible reasons when failures occur. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21Add workaround for dialog output because of its versionChen Gong1-0/+5
The output from certain dialog versions contains double quotes even if --separate-output is used. If so, strip them so that the output matches regular dialog output. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21add check for parameter notrigger in APEI/SRAR test caseChen Gong1-1/+3
On some platforms the OS doesn't support the notrigger parameter. In this situation, the injection procedure is dangerous because it may cause a system oops/crash. If this parameter is absent, the test should be terminated. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21Add timeout condition in the PFA testChen Gong1-0/+6
On some platforms PFA will not be triggered so that the PFA test can't finish. So the timeout functionality is necessary to avoid endless PFA test. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21Clear existed ERST record before the testChen Gong1-3/+15
If the ERST table is full, the test can't begin. To avoid this potential issue, if an ERST record exists, erase one record to release storage space and let the test go on. Because the ERST test may damage the data in the ERST table, please save the valid data in the ERST to another safe place first. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21Auto remove EDAC module for PFA testChen Gong1-0/+14
Some EDAC modules prevent mcelog from collecting the error log from the kernel mcelog buffer, which breaks the mcelog PFA function. To avoid this interference, remove the specific EDAC module before the test and restore it after the test. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21Update pfa test caseChen Gong1-3/+20
On some platforms the original PFA case doesn't work well because no actual read/write happens in time. This patch enhances the read/write operations to ensure the error can be triggered. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21Use BASH as default shell script interpreterShaoyong Wang14-14/+14
Some test scripts can't be recognized well on some Linux OS, such as Ubuntu. Change default *sh* to *bash*. Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com> Signed-off-by: Chen Gong <gong.chen@intel.com>
2015-01-21Add SRAR DCU/IFU functional test caseChen Gong7-0/+331
This patch adds two SRAR functional test cases (DCU & IFU). The SRAR test is highly BIOS dependent, so if the BIOS is bogus, the system will hang or panic. These two test cases are disabled by default; to test SRAR, please enable them. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21update the check method for debugfsChen Gong4-15/+8
On some platforms old methods can't find debugfs correctly, so a new way via /proc/mounts is used to find debugfs path. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21Minor fixes for MCE-testChen Gong5-6/+26
Many minor fixes are added. Some for compatibility, some for enhancement, and the others for bug fixes. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21Fix matching issue when selecting test cases from the menuShaoyong Wang1-1/+1
Old logic will filter out comment lines and the words containing on/off letters in case list files when executing case selecting. Signed-off-by: Shaoyong Wang <shaoyongx.wang@intel.com> Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2015-01-21update missed 'x' operation bitv1.0Chen Gong2-0/+0
mcemenu and runmcetest are shell files and should own 'x' bit. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2012-04-24Readd x bits for shell scriptsAndi Kleen39-0/+0
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2012-04-13Reorganize mce-testChen Gong203-1661/+4261
This new design reorganizes the entire structure of MCE-test. With the new structure, MCE-test has a new unified output format and interface. In principle, this change makes no functional change. Only some minor fixes and updates are added; in addition, a few new test cases such as PFA are merged. Other test cases will be applied after this change is merged into the current MCE-test. Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2012-02-21add param_extension as default module parameter for EINJChen Gong2-2/+2
param_extension is a new module parameter to support param1/param2 as a BIOS extension for specific vendors. By default the tests need to enable this parameter to get param1/param2. Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2012-01-12Clarify README a bitAndi Kleen1-3/+7
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2012-01-12add SRAR test case in the user context situationChen Gong5-1/+75
This is part of the SRAR test cases. It is used to test DCU error happening under user land and other CPUs working in the user context, kernel context, NMI context and IRQ context. Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2011-05-19format erst-inject.cChen Gong1-154/+154
reformat erst-inject.c to make it to follow UNIX style Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2011-05-19minor fixes for erst testChen Gong2-2/+6
1) In the last patch, after updating the makefile rule, I forgot to update the corresponding shell script, and the shell script's mode attribute is not correct either. 2) Update the erst-inject tool to provide a friendlier prompt. Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2011-04-29Add ERST functional test case (V3)Chen Gong6-3/+607
This case tests read/write/clear operations on ERST. Note that it should be used on kernels >= 2.6.39-rc1. For more details, please refer to the test case itself. This case doesn't consider situations such as duplicate or missing ids because the current firmware has bugs. It will be updated after the firmware fixes this issue. V3 -> V2: Makefile without recursive make V2 -> V1: add copyright information Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2011-04-20fix guest_tmp usage in the kvm SRAO test caseChen Gong1-3/+3
The guest_tmp usage is totally wrong. It assumes that the same directory exists on the host and the guest. In fact, the definition is only correct for the guest system. Otherwise, the file guest_tmp can't be transferred to the host correctly. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2011-04-20no strict check when ssh/scp in the kvm SRAO testChen Gong1-2/+3
when first connecting to guest OS, guest OS will transfer its public key fingerprint to the host OS. To avoid interactive operation in the test procedure, no strict check is necessary. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2011-04-20minor fixes for kvm SRAO test againChen Gong2-2/+2
1) The output format of the latest qemu monitor has changed, so update the condition check. 2) It appears that the starting anonymous memory addresses of simple_process can't be used as injection addresses, so skip them. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2011-04-20minor fixes for KVM SRAO test caseChen Gong1-3/+7
Here is the fix list: 1) rc3.d shouldn't be the default start position; it should be assigned according to /etc/inittab. 2) When the test case quits unexpectedly, qemu should be killed, too. 3) Delete an extra local parameter. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
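One way to derive the rc directory from /etc/inittab, as the first fix describes, is to read the `initdefault` entry. The sketch below is an assumption about how that could be done, not the code from the commit; `default_runlevel` is a hypothetical helper, and it expects the classic SysV `id:N:initdefault:` line format.

```shell
#!/bin/bash
# Hypothetical helper: print the default runlevel from an inittab-style file.
# Lines look like "id:3:initdefault:"; commented-out lines are skipped.
default_runlevel() {
    awk -F: '$1 !~ /^[[:space:]]*#/ && $3 == "initdefault" { print $2; exit }' "$1"
}

# Only meaningful on systems that still ship /etc/inittab.
if [ -r /etc/inittab ]; then
    level="$(default_runlevel /etc/inittab)"
    # Fall back to runlevel 3 when no initdefault entry is present.
    echo "start scripts live in rc${level:-3}.d"
fi
```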
2011-04-20update the KVM-SRAO test README file and related filesChen Gong3-15/+243
Add more content to make it more readable and operable. Besides the README update, some related patches are added to the mce-test suite, too. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2011-04-20enhance the test flexibility for auto-testChen Gong3-32/+1028
Some operations in the procedure of creating the guest image can be done automatically, such as copying simple_process and the page-types tool into the guest image. Another update concerns the public/private keys. The original usage may break the path relationship because the user can set the public/private key file paths independently, without HOST_DIR involved. These settings are useless, so delete these options. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2011-04-20Some minor fixes for KVM SRAO test casesChen Gong2-27/+61
Here is the list:
1) EARLYKILL is defined but not used
2) some echo output outside functions includes "!", which confuses the shell and gives wrong output
3) add page-types check on the host side
4) some $mnt usages are dangerous; for example, $mnt$get_tmp returns the wrong path
5) fix a spelling error in the variable QEMU_PID
6) update p2v -> x-gpa2hva according to Ying's latest QEMU patch
7) the usage text says host_run.sh can be executed directly, but in fact it can't; add execute permission for it
8) add "-h" description; option "h" should not be given a ":"
9) make the "-m" option behave consistently with the other options
10) add more condition checks before tests
11) simplify some statements
12) automatically load the mce_inject module
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
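Item 8 refers to how `getopts` option strings work: a trailing ":" means the option takes an argument. The sketch below illustrates the rule with a hypothetical `parse_opts` function; it is not the parser from host_run.sh, and the default value for `-m` is made up for the example.

```shell
#!/bin/bash
# Hypothetical option parser: -m takes an argument (trailing ":"), -h does not.
parse_opts() {
    local opt OPTIND=1 mem="default" help=0
    while getopts "hm:" opt; do
        case "$opt" in
            h) help=1 ;;
            m) mem="$OPTARG" ;;
            *) return 1 ;;
        esac
    done
    echo "help=$help mem=$mem"
}

parse_opts -h -m 512
```

If the option string were "h:m:" instead, `-h` would consume `-m` as its argument and the memory size would never be parsed.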
2011-04-11Revert "Some minor fixes for KVM SRAO test cases"Andi Kleen2-59/+26
This reverts commit 916cfd584ec37aa3dec3aae25b265e2701b35246.
2011-04-11Revert "enhance the test flexibility for auto-test"Andi Kleen3-1028/+32
This reverts commit b09f37e5d0d93d33fd5930222cc106708d85e1ed.
2011-04-11Revert "update the KVM-SRAO test README file and related files"Andi Kleen3-211/+15
This reverts commit 5c854ab100dcbd6a445a0c07e2f35f40fefe2a59.
2011-04-05update the KVM-SRAO test README file and related filesChen Gong3-15/+211
Add more content to make it more readable and operable. Besides the README update, some related patches are added to the mce-test suite, too. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2011-04-05enhance the test flexibility for auto-testChen Gong3-32/+1028
Some operations in the procedure of creating the guest image can be done automatically, such as copying simple_process and the page-types tool into the guest image. Another update concerns the public/private keys. The original usage may break the path relationship because the user can set the public/private key file paths independently, without HOST_DIR involved. These settings are useless, so delete these options. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2011-04-05Some minor fixes for KVM SRAO test casesChen Gong2-26/+59
I took a quick look and found some defects. Here is the list:
1) EARLYKILL is defined but not used
2) some echo output outside functions includes "!", which confuses the shell and gives wrong output
3) add page-types check on the host side
4) some $mnt usages are dangerous; for example, $mnt$get_tmp returns the wrong path
5) fix a spelling error in the variable QEMU_PID
6) update p2v -> x-gpa2hva according to Ying's latest QEMU patch
7) the usage text says host_run.sh can be executed directly, but in fact it can't; add execute permission for it
8) add "-h" description; option "h" should not be given a ":"
9) make the "-m" option behave consistently with the other options
10) add more condition checks before tests
11) simplify some statements
12) automatically load the mce_inject module
None of these fixes touch actual functionality. Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
2011-02-11page-poisoning.c: fix build warningAndi Kleen1-0/+1
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2011-02-11Add hwpoison test for THP.Jin Dongming3-0/+530
THP is supported since v2.6.38-rc1, so add a hwpoison test to make testing it easier. Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-11-24Cleanup write/read_hugepage()Jin Dongming1-7/+24
Make the parameters of write/read_hugepage() easier to understand, and add comments for write/read_hugepage(). Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-11-24Fix unsuitable avoid checking of write/read_hugepage().Jin Dongming1-2/+2
The addr parameter of write/read_hugepage() is the mapping address of the file, so no matter how many hugepages are mapped, addr is the head address of all hugepages. The avoid parameter is the address that should not be touched, which can be the head address of any hugepage. So the check addr == avoid in write/read_hugepage() is never true unless avoid is the address of the first hugepage. This patch fixes it. Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-11-24Fix write/read_hugepage() for copy-on-write.Jin Dongming1-2/+2
When cowflag is set, the child process should copy all hugepages of its parent. Currently, regardless of cowflag, the child process never performs a copy-on-write operation, because write_hugepage() is called with size==0 and therefore does nothing. This problem was introduced by commit c6a4c3d950385063db705e520bc9b6cda9587f57 (Author: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>). With this patch, the state of the parent and child processes is as follows:

        Before this patch                        After this patch
NO-COW  Parent and child processes are killed.   Same as before.
COW     Parent and child processes are killed.   Only parent process is killed.

(Here a process is killed by memory-failure.) Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-11-08Merge branch 'master' of git://git.kernel.org/pub/scm/utils/cpu/mce/mce-testAndi Kleen5-122/+471
2010-10-29Add missing utils.hAndi Kleen2-7/+1
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-29tinjpage: add hugepage testcasesNaoya Horiguchi2-24/+170
si_addr_lsb check in sighandler() is also extended to hugepage shift. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-29thugetlb: add soft offline code to thugetlb.cNaoya Horiguchi1-89/+47
Soft offlining is driven by using options '-O' and '-x' Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-29tsoftinj: add hugetlb code on tsoftinj.cNaoya Horiguchi1-11/+66
Add three testcases for hugepage soft offlining. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-29add header file giving utility functions for hugepageNaoya Horiguchi1-0/+162
Add routines allocating/freeing hugepages of the following types: - hugepage on shared memory, - anonymous hugepage, - filebacked hugepage. And also add read/write helper functions. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-06Make addr_lsb failure a warning only for nowAndi Kleen1-2/+1
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-06tinjpage: Test for correct si_addr_lsb field in signalsAndi Kleen1-0/+35
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-09-28KVM test fixesDean Nelson1-14/+75
This patch makes the following changes to the mce-test suite's kvm test (git://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git):
. Re-enable the late kill option (-l) on host_run.sh.
. Add a virtual guest RAM size option (-m) to host_run.sh that gets passed to qemu-system-x86_64. This allows for testing guests >= 4069M in size.
. Allow for guest .img files to consist of LVM partitions.
Signed-off-by: Dean Nelson <dnelson@redhat.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-05-25tsimpleinj: enhance automatic test for hwpoison testChen Gong1-2/+13
add failure statistic and exit value check, so that it is easy to run automatic test. Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-05-20thugetlb.c: avoid extra newline in errorsAndi Kleen1-1/+1
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-05-20run-huge-test.sh: Fix typo in usage stringAndi Kleen1-1/+1
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-05-20stress, Makefile: make test should depend on make allLi, Haicheng1-1/+1
If hwpoison.sh is executed from the top level Makefile, it doesn't compile/install the required binaries. The Makefile in mce-test/stress works correctly. ... Test aborted by unexpected error! [error] !!! no bin subdir there !!! Reported-by: Evan McNabb <emcnabb@redhat.com> Signed-off-by: Haicheng Li <haicheng.li@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-05-20stress, hwpoison.sh: fix fs_metadata workload for local test dirLi, Haicheng1-1/+1
This patch is to fix below problem: ... result summary: fs_metadata -- no test finished details: /root/git/mce-test/stress/log/fs_metadata/fs_metadata.log fsck.ext3 -- fsck on /dev/loop5 got pass totally 1 task-groups report failures ... ... [04-05 16:29:08] thread 0 starts with pid 25027 tee: ./hwpoison/fs_metadata/k-threads.pid: No such file or directory 25027 Signed-off-by: Evan McNabb <emcnabb@redhat.com> Signed-off-by: Haicheng Li <haicheng.li@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-05-20thugetlb: Declare wait()Andi Kleen1-0/+1
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-05-20thugetlb.c: Fix error reportingAndi Kleen1-2/+5
Supply correct err() and errmsg macro, don't use implicit ones from glibc with different prototype Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-05-20thugetlb.c: Fix printf format stringAndi Kleen1-1/+1
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-05-20Add hugetlb test for hugetlb mca recovery testingNaoya Horiguchi3-1/+528
So far not hooked up to standard "make test" because the kernel patches are not in yet. Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-05-20random_offline: avoid extra unpoison pass on timeoutAndi Kleen1-0/+1
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-05-20random_offline: fix endless run without -t argumentAndi Kleen1-4/+6
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-05-20random_offline: give total success/failure statistics for testAndi Kleen1-2/+17
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-04-11tinjpage: Add more error checks for memory unmapsAndi Kleen1-6/+12
In case something goes wrong in the kernel with the poisoned mappings Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-04-11hwpoison.sh: use default child process num for each workload.Haicheng Li1-1/+2
If I run hwpoison.sh without -C option I get the following errors: ./hwpoison.sh: line 366: [: -eq: unary operator expected ./hwpoison.sh: line 371: [: -gt: unary operator expected ./hwpoison.sh: line 372: [: -eq: unary operator expected The reason is g_children is NULL, which should be zero. Reported-by: Evan McNabb <emcnabb@redhat.com> Signed-off-by: Haicheng Li <haicheng.li@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
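The errors quoted above are what bash reports when an unquoted empty variable leaves a numeric test with a single operand, e.g. `[ $g_children -eq 0 ]` collapsing to `[ -eq 0 ]`. A minimal sketch of the defaulting idea follows; `children_or_default` is a hypothetical helper, not the code added to hwpoison.sh, and the fallback of 0 follows the commit message.

```shell
#!/bin/bash
# Hypothetical helper: return the child count, falling back to 0 when the
# value is unset or empty (mirrors "g_children is NULL, which should be zero").
children_or_default() {
    local g_children="$1"
    echo "${g_children:-0}"
}

# With the default applied, the numeric test always has two operands.
if [ "$(children_or_default "")" -eq 0 ]; then
    echo "using the default child process number"
fi
```

Quoting the expansion and supplying a default with `${var:-0}` are independent safeguards; using both keeps the test well-formed whether the variable is unset, empty, or set.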
2010-04-11tinjpage: Flush stdout in shared page childAndi Kleen1-0/+1
Not strictly needed due to line buffering, but more future proof. Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-04-11tinjpage: Reset failure counter in childAndi Kleen1-0/+2
This way the child won't fail if there were already other errors. Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-04-11Clean up IPC resources on shared memory tests in tinjpageAndi Kleen1-16/+35
Based on a report from Evan McNabb Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-03-13stress: add make test support.Haicheng Li2-1/+16
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-03-13page-poisoning: code cleanup, free unused shared mem timely.Haicheng Li1-21/+30
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-03-13page-poisoning: fix inaccurate result checking.Haicheng Li1-3/+2
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-03-13page-poisoning: fix Bad Address issue in file_clean case.Haicheng Li1-1/+1
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-03-13hwpoison.sh: test mode, to run test in local dir other than on target device.Haicheng Li1-24/+56
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-03-13hwpoison.sh: minor fixes. - as ltp source shows, ltp-pan must be under ↵Haicheng Li1-9/+15
$ltp_root/pan/ - use invalid() to exit when error is related to command option. - add die() to let stress tester work fine with common func check_debugfs(). Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-03-13hwpoison.sh: code cleanup to show more clear log and usage help.Haicheng Li1-12/+18
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-03-13hwpoison.sh: improve show_progress() to show more friendly logs.Haicheng Li1-3/+11
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-03-13hwpoison.sh: regular page unpoisoning support.Haicheng Li1-3/+42
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-03-13hwpoison.sh: to support Page Soft-Offlining testing.Haicheng Li1-20/+57
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-03-13hwpoison.sh: to support nfs, cifs, ocfs2, reiserfs, btrfs, and xfs.Haicheng Li1-39/+64
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-03-13hwpoison.sh: avoid unexpected page-state changing while stress testing.Haicheng Li1-0/+14
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-03-02Work around ubuntu incompatibilities to the linux standardChen Gong13-14/+14
On Ubuntu, sh is linked to dash, so these shell scripts can't run correctly. The interpreter needs to be replaced with bash. Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-02-19There is problem with commit 138d18351a725e7ef43e6ae4fb7c9405718a797d thatCyril Hrubis1-8/+6
added clearing and backing up old logs for kdump driver. As the testcases cause a reboot and the script is re-run after each reboot, the test ends up in an infinite loop (as the setup stamp is moved). The second problem is with loading the mce-inject module. The kdump test driver is apparently run with "set -ex", so all lines that can return non-zero (and should not stop script execution) must be used only as part of a conditional. Signed-off-by: Andi Kleen <ak@linux.intel.com>
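The rule about `set -e` can be illustrated with a small sketch: a command that fails inside an `if` condition does not abort the script, while the same command on a line of its own would. `try_load` is a hypothetical helper used only for this illustration; the real driver loads the mce-inject module rather than running `true` or `false`.

```shell
#!/bin/bash
# Hypothetical helper: run a command that may fail without tripping `set -e`,
# because the command is evaluated as the condition of an `if`.
try_load() {
    if "$@" 2>/dev/null; then
        echo loaded
    else
        echo skipped
    fi
}

set -e
try_load false    # prints "skipped"; the script keeps running
try_load true     # prints "loaded"
set +e
```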
2010-02-19Handle case of multiple debugfs being mountedAndi Kleen1-1/+1
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-02-04Add KVM RAS test suiteJiajia Zheng6-2/+422
This patch is to add KVM RAS test suite into mce-test, which is a collection of test scripts for testing the Linux kernel MCE processing features in KVM guest system. Signed-off-by: Jiajia Zheng <jiajia.zheng@intel.com> Signed-off-by: Huang Ying <ying.huang@intel.com>
2010-01-12Better precondition checking in the test suiteChen Gong3-7/+24
1. auto-load einj module before apei test begins and update APEI_IF definition to a proper place 2. fix typos in the check_debugfs 3. enhance the module check before stress test Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-24Fix debugfs mountingAndi Kleen1-2/+2
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-24some code cleanup againChen Gong4-1/+7
1. The test path shouldn't be placed under "/" in stress/hwpoison.sh. 2. To clear the log history, back up old test logs under different names. 3. Add the execute attribute for the apei test case. Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-24makefile cleanupChen Gong6-5/+12
Clean up some confusing execution paths. Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-21path of g_ltppan should be updated after ltp root path setChen Gong1-2/+2
g_ltppan needs to be updated after g_ltproot is set. Also, I think g_ltppan should be directly under g_ltproot, which is clearer. Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-21some codes cleanupChen Gong5-22/+41
1. More graceful output in check_debugfs. 2. Eliminate the trivial usage of parameter "debugfs" in hwpoison.sh. 3. Add some additional checks before the driver kicks off; otherwise one may see messages such as "Failed: MCE log is different from input" when in fact the only problem is that the mce_inject module isn't inserted. Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-18The position of APEI_IF is not a const anymoreChen Gong2-2/+2
Update the definition of APEI_IF. Now it can be located anywhere. Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-18update the definition of check_debugfsChen Gong1-2/+12
check_debugfs should not serve only mce, so make it generic and add a new function dedicated to mce. Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-16Fix english/line wrapping in gcov warningAndi Kleen2-4/+6
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-16Automatically mount debugfs in mce testerAndi Kleen1-0/+7
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-14Add make test-kernel to tsrcAndi Kleen1-0/+3
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-14Add quickstart note to the READMEAndi Kleen1-0/+5
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-14Don't run kdump test in make test, but in make test-kdumpAndi Kleen2-6/+11
This essentially renames test-simple to test. Also some minor fixes to the Makefile. Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-14Add standard "make test" in tsrc / run from top level MakefileAndi Kleen3-1/+24
This runs all the standard functional tests for a quick test in hwpoison Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-14random_offline improvementsAndi Kleen1-5/+72
- Add way to specify random seed - Add timeout - Various new checks to be more user friendly - Use standard option parsing Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-14Add comment in tsimpleinj.cAndi Kleen1-0/+3
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-14Rename tinjpage-working to tsimpleinjAndi Kleen2-1/+1
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-14Fix warning/add comment in tring.cAndi Kleen1-2/+4
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-14Add copyright headersAndi Kleen2-1/+24
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-14Clean x.html in tsrc tooAndi Kleen1-0/+1
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-14Random soft offline test casesAndi Kleen4-1/+230
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-08Enable hwpoison filter if neededAndi Kleen1-0/+2
This is to handle kernel where the filter defaults to off. Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-02Fix warnings in simple_process.cAndi Kleen1-2/+5
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-02Fix makefile rules for simple_processAndi Kleen2-5/+13
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-02APEI injection support for mce-testJiajia Zheng9-11/+324
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-10-28Fix tsrc Makefile clean target to clean everythingAndi Kleen1-4/+10
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-10-28tinjpage: add status printf to second test loopNaoya Horiguchi1-1/+3
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-10-28tinjpage: minor changes to shared memory test functionsAndi Kleen1-3/+4
- Fix indentation - Always report failure to parent Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-10-28tinjpage: add test case for mmap/ipv shared pagesNaoya Horiguchi1-0/+160
This patch adds the testcases for mmap/IPV shared pages. The purpose of these testcases is as follows:
- Check whether a process A is killed as expected when it accesses a page shared with, and hwpoisoned by, another process B (the late killing case).
- Check whether a process A is killed at once when another process B injects hwpoison into a page shared by both of them (the early killing case).
ChangeLog:
- Add synchronization code between parent and child process with semaphore.
- Share the common function do_shared() between mmap case and IPV case.
- Add error check code.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-10-27Add another missing ruleAndi Kleen1-1/+1
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-10-27Add proper dependencies to stress MakefilesAndi Kleen4-13/+21
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-10-27fs-metadata workload: file system metadata test program.Haicheng Li6-0/+588
This is the fs-metadata workload. fs-metadata is designed to test i-node operations under heavy workload and make sure every i-node operation gets the expected result. In detail, it first generates a huge directory hierarchy on the target disk, then performs unlink operations on this directory hierarchy and duplicates a copy of the directory, and finally checks whether the two directories are the same, as expected. Acked-by: Andi Kleen <andi.kleen@intel.com> Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com> Signed-off-by: Haicheng Li <haicheng.li@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-10-27page-poisoning workload: multi-process based test program thru madvise syscall.Haicheng Li4-0/+925
The page-poisoning test program is an extension of the tinjpage test program with a multi-process model. It spawns thousands of processes that inject HWPoison errors into various pages simultaneously through the madvise syscall. Then it checks whether these errors get handled correctly, i.e. whether each test process receives or doesn't receive the SIGBUS signal as expected. In detail, page-poisoning is designed to cover all possible userspace page types via the following two test operations: - anonymous page operations. - file data operations. Acked-by: Andi Kleen <andi.kleen@intel.com> Signed-off-by: Haicheng Li <haicheng.li@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-10-27HOWTO: documentation of MCE stress test suite.Haicheng Li2-0/+344
Documentation of MCE stress test suite. Reviewed-by: Jiajia Zheng <jiajia.zheng@intel.com> Signed-off-by: Haicheng Li <haicheng.li@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-10-27hwpoison.sh: test driver of MCE stress test suiteHaicheng Li4-0/+969
The MCE stress test suite is a collection of tools and test scripts intended to stress-test the Linux kernel MCA high-level handlers, including HWPoison page recovery, soft page offline, and so on. In general, this test suite is designed to do stress testing through various test interfaces, i.e. the madvise syscall, the HWPoison page injector, and the APEI injector (see the ACPI 4.0 spec). It supports most popular Linux file systems (FS): there is an option for the user to specify which FS type the test should run on.

The MCE stress test suite consists of four parts: test driver, workload controller, customized workloads, and background workloads. The main test idea is as follows:
- The test driver launches various customized workloads to continuously generate lots of pages with expected page states. Note that all of these workloads know their expected results, which should not be affected by the Linux MCE high-level handlers.
- Then the test driver injects MCE errors into these pages through the madvise syscall, the HWPoison injector, or the APEI injector. While the Linux kernel handles these MCE errors, all the workloads continue running normally.
- After a long run, the test driver collects the test result of each workload to see whether any unexpected failures happened. In this way, it can decide whether a bug has been found.
- If any system panic or FS corruption happens, there must be a bug. This is the bottom line for deciding whether the test passes.

The test driver (a.k.a. hwpoison.sh) drives the whole test procedure. It is responsible for managing the test environment, setting up the error injection interface, controlling test progress, launching workloads, injecting page errors, recording test logs, and reporting test results. Acked-by: Andi Kleen <andi.kleen@intel.com> Signed-off-by: Haicheng Li <haicheng.li@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-10-26Avoid tsyncpage failures on NFSHaicheng Li1-1/+1
My debugging shows that mmap() with the MLOCKED flag sets the page dirty again, so the kernel handler never enters the clean page handling logic. So use fsync() after mmap() to make the page clean. Signed-off-by: Haicheng Li <haicheng.li@intel.com> Tested-by: Dean Nelson <dnelson@redhat.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-10-08Fix tprctl tester to actually workAndi Kleen1-5/+5
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-10-08Add prctl tester for hwpoisonAndi Kleen2-1/+99
(requires hwpoison-2.6.32) Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-10-07Add a symlink tsrc -> hwpoison to make it clearAndi Kleen1-0/+1
tsrc tests hwpoison Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-09-22Update the documentation howto.txtZheng Jiajia1-2/+17
Update the howto.txt. -recommend to stop cron before mce testing. -add an introduction to loop-mce-test as well. Signed-off-by: Zheng Jiajia <jiajia.zheng@intel.com>
2009-09-18Minor changes to tools/loop-mce-test.shHuang Ying1-2/+11
Some parameter changes and other minor changes. Signed-off-by: Huang Ying <ying.huang@intel.com>
2009-09-18Rename and chmod tools/loop-mce-testHuang Ying1-0/+0
Rename to tools/loop-mce-test.sh to follow naming convention. chmod +x. Signed-off-by: Huang Ying <ying.huang@intel.com>
2009-09-17loop-mce-test: Exit with error code on failureDean Nelson1-1/+1
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-09-17loop-mce-test: Add a test tool for running the mca test cases in a loop.Zheng Jiajia1-0/+36
Based on work from Dean Nelson Signed-off-by: Zheng Jiajia <jiajia.zheng@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-09-16Merge branch 'master' of git://git.kernel.org/pub/scm/utils/cpu/mce/mce-testAndi Kleen18-20/+28
2009-09-16tinjpage: fix another printf to new formatAndi Kleen1-1/+2
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-09-16tinjpage: mark currently broken in kernel errors optionalAndi Kleen1-4/+10
- all second errors are optional because the VFS reports only once - hole errors are optional because we can't propagate errors for holes Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-09-16tinjpage: clean up output to be easier readableAndi Kleen1-11/+14
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-09-16Add option to injpage to enable sniperAndi Kleen1-3/+34
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-09-16Add GPL copyright header to tinjpageAndi Kleen1-1/+16
Also add Fengguang as author Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-09-16Add mlock test cases to tinjpageAndi Kleen1-11/+43
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-09-16Fix duplicated mcelog records for some cases on machine with SER_PHuang Ying17-17/+17
On machine with SER_P, machine_check_poll in kernel filters out MCE with MCI_STATUS_S instead of MCI_STATUS_UC. So for some test cases run on machine with/without SER_P, both UC and S should be set. Signed-off-by: Huang Ying <ying.huang@intel.com>
2009-09-16fix kdump name definitionChen Gong1-3/+11
SLE11 changed the kdump service name from "kdump" to "boot.kdump", so handle both names in a general way. Signed-off-by: Chen Gong <gong.chen@intel.com>
2009-09-04Update howto for kdump test driverJiajia Zheng1-6/+30
update the document for test with kdump test driver. Signed-off-by: Jiajia Zheng <jiajia.zheng@intel.com>
2009-08-31New test case to test conditional control in machine_check_pollJiajia Zheng10-3/+84
Add a new test group -- poll_noser, add three cases -- fatal_poll, srar_poll and uc_poll to test the conditional control statement in machine_check_poll. Signed-off-by: Jiajia Zheng <jiajia.zheng@intel.com> Signed-off-by: Huang Ying <ying.huang@intel.com>