aboutsummaryrefslogtreecommitdiffstats
path: root/fs/jbd2/recovery.c
AgeCommit message (Collapse)AuthorFilesLines
2024-01-04jbd2: add errseq to detect client fs's bdev writeback errorZhihao Cheng1-6/+1
Add errseq in journal, so that JBD2 can detect whether metadata is successfully written to fs bdev. This patch adds detection in recovery process to replace original solution(using local variable wb_err). Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231213013224.2100050-2-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-10-05jbd2: fix potential data lost in recovering journal raced with synchronizing ↵Zhihao Cheng1-0/+8
fs bdev JBD2 makes sure journal data is fallen on fs device by sync_blockdev(), however, other process could intercept the EIO information from bdev's mapping, which leads journal recovering successful even EIO occurs during data written back to fs device. We found this problem in our product, iscsi + multipath is chosen for block device of ext4. Unstable network may trigger kpartx to rescan partitions in device mapper layer. Detailed process is shown as following: mount kpartx irq jbd2_journal_recover do_one_pass memcpy(nbh->b_data, obh->b_data) // copy data to fs dev from journal mark_buffer_dirty // mark bh dirty vfs_read generic_file_read_iter // dio filemap_write_and_wait_range __filemap_fdatawrite_range do_writepages block_write_full_folio submit_bh_wbc >> EIO occurs in disk << end_buffer_async_write mark_buffer_write_io_error mapping_set_error set_bit(AS_EIO, &mapping->flags) // set! filemap_check_errors test_and_clear_bit(AS_EIO, &mapping->flags) // clear! err2 = sync_blockdev filemap_write_and_wait filemap_check_errors test_and_clear_bit(AS_EIO, &mapping->flags) // false err2 = 0 Filesystem is mounted successfully even data from journal is failed written into disk, and ext4/ocfs2 could become corrupted. Fix it by comparing the wb_err state in fs block device before recovering and after recovering. A reproducer can be found in the kernel bugzilla referenced below. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217888 Cc: stable@vger.kernel.org Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230919012525.1783108-1-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-10-05jbd2: fix printk format type for 'io_block' in do_one_pass()Ye Bin1-1/+1
'io_block' is unsinged long but print it by '%ld'. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230904105817.1728356-3-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-10-05jbd2: print io_block if check data block checksum failed when do recoveryYe Bin1-1/+2
Now, if check data block checksum failed only print data's block number then skip write data. However, one data block may in more than one transaction. In some scenarios, offline analysis is inconvenient. As a result, it is difficult to locate the areas where data is faulty. So print 'io_block' if check data block checksum failed. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230904105817.1728356-2-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-08-27jbd2: correct the end of the journal recovery scan rangeZhang Yi1-9/+3
We got a filesystem inconsistency issue below while running generic/475 I/O failure pressure test with fast_commit feature enabled. Symlink /p3/d3/d1c/d6c/dd6/dce/l101 (inode #132605) is invalid. If fast_commit feature is enabled, a special fast_commit journal area is appended to the end of the normal journal area. The journal->j_last point to the first unused block behind the normal journal area instead of the whole log area, and the journal->j_fc_last point to the first unused block behind the fast_commit journal area. While doing journal recovery, do_one_pass(PASS_SCAN) should first scan the normal journal area and turn around to the first block once it meet journal->j_last, but the wrap() macro misuse the journal->j_fc_last, so the recovering could not read the next magic block (commit block perhaps) and would end early mistakenly and missing tN and every transaction after it in the following example. Finally, it could lead to filesystem inconsistency. | normal journal area | fast commit area | +-------------------------------------------------+------------------+ | tN(rere) | tN+1 |~| tN-x |...| tN-1 | tN(front) | .... | +-------------------------------------------------+------------------+ / / / start journal->j_last journal->j_fc_last This patch fix it by use the correct ending journal->j_last. Fixes: 5b849b5f96b4 ("jbd2: fast commit recovery path") Cc: stable@kernel.org Reported-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/linux-ext4/20230613043120.GB1584772@mit.edu/ Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230626073322.3956567-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26jbd2: continue to record log between each mountZhang Yi1-5/+17
For a newly mounted file system, the journal committing thread always record new transactions from the start of the journal area, no matter whether the journal was clean or just has been recovered. So the logdump code in debugfs cannot dump continuous logs between each mount, it is disadvantageous to analysis corrupted file system image and locate the file system inconsistency bugs. If we get a corrupted file system in the running products and want to find out what has happened, besides lookup the system log, one effective way is to backtrack the journal log. But we may not always run e2fsck before each mount and the default fsck -a mode also cannot always checkout all inconsistencies, so it could left over some inconsistencies into the next mount until we detect it. Finally, transactions in the journal may probably discontinuous and some relatively new transactions has been covered, it becomes hard to analyse. If we could record transactions continuously between each mount, we could acquire more useful info from the journal. Like this: |Previous mount checkpointed/recovered logs|Current mount logs | |{------}{---}{--------} ... {------}| ... |{======}{========}...000000| And yes the journal area is limited and cannot record everything, the problematic transaction may also be covered even if we do this, but this is still useful for fuzzy tests and short-running products. This patch save the head blocknr in the superblock after flushing the journal or unmounting the file system, let the next mount could continue to record new transaction behind it. This change is backward compatible because the old kernel does not care about the head blocknr of the journal. It is also fine if we mount a clean old image without valid head blocknr, we fail back to set it to s_first just like before. Finally, for the case of mount an unclean file system, we could also get the journal head easily after scanning/replaying the journal, it will continue to record new transaction after the recovered transactions. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230322013353.1843306-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-10-10Merge tag 'mm-stable-2022-10-08' of ↵Linus Torvalds1-6/+10
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in linux-next for a couple of months without, to my knowledge, any negative reports (or any positive ones, come to that). - Also the Maple Tree from Liam Howlett. An overlapping range-based tree for vmas. It it apparently slightly more efficient in its own right, but is mainly targeted at enabling work to reduce mmap_lock contention. Liam has identified a number of other tree users in the kernel which could be beneficially onverted to mapletrees. Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat at [1]. This has yet to be addressed due to Liam's unfortunately timed vacation. He is now back and we'll get this fixed up. - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses clang-generated instrumentation to detect used-unintialized bugs down to the single bit level. KMSAN keeps finding bugs. New ones, as well as the legacy ones. - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of memory into THPs. - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support file/shmem-backed pages. - userfaultfd updates from Axel Rasmussen - zsmalloc cleanups from Alexey Romanov - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure - Huang Ying adds enhancements to NUMA balancing memory tiering mode's page promotion, with a new way of detecting hot pages. - memcg updates from Shakeel Butt: charging optimizations and reduced memory consumption. - memcg cleanups from Kairui Song. - memcg fixes and cleanups from Johannes Weiner. - Vishal Moola provides more folio conversions - Zhang Yi removed ll_rw_block() :( - migration enhancements from Peter Xu - migration error-path bugfixes from Huang Ying - Aneesh Kumar added ability for a device driver to alter the memory tiering promotion paths. For optimizations by PMEM drivers, DRM drivers, etc. - vma merging improvements from Jakub Matěn. - NUMA hinting cleanups from David Hildenbrand. - xu xin added aditional userspace visibility into KSM merging activity. - THP & KSM code consolidation from Qi Zheng. - more folio work from Matthew Wilcox. - KASAN updates from Andrey Konovalov. - DAMON cleanups from Kaixu Xia. - DAMON work from SeongJae Park: fixes, cleanups. - hugetlb sysfs cleanups from Muchun Song. - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core. Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1] * tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits) hugetlb: allocate vma lock for all sharable vmas hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer hugetlb: fix vma lock handling during split vma and range unmapping mglru: mm/vmscan.c: fix imprecise comments mm/mglru: don't sync disk for each aging cycle mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol mm: memcontrol: use do_memsw_account() in a few more places mm: memcontrol: deprecate swapaccounting=0 mode mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled mm/secretmem: remove reduntant return value mm/hugetlb: add available_huge_pages() func mm: remove unused inline functions from include/linux/mm_inline.h selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd selftests/vm: add thp collapse shmem testing selftests/vm: add thp collapse file and tmpfs testing selftests/vm: modularize thp collapse memory operations selftests/vm: dedup THP helpers mm/khugepaged: add tracepoint to hpage_collapse_scan_file() mm/madvise: add file and shmem support to MADV_COLLAPSE ...
2022-09-30jbd2: add miss release buffer head in fc_do_one_pass()Ye Bin1-0/+1
In fc_do_one_pass() miss release buffer head after use which will lead to reference count leak. Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220917093805.1782845-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-11jbd2: replace ll_rw_block()Zhang Yi1-6/+10
ll_rw_block() is not safe for the sync read path because it cannot guarantee that submitting read IO if the buffer has been locked. We could get false positive EIO after wait_on_buffer() if the buffer has been locked by others. So stop using ll_rw_block() in journal_get_superblock(). We also switch to new bh_readahead_batch() for the buffer array readahead path. Link: https://lkml.kernel.org/r/20220901133505.2510834-7-yi.zhang@huawei.com Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-04Merge tag 'ext4_for_linus' of ↵Linus Torvalds1-15/+15
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Add new ioctls to set and get the file system UUID in the ext4 superblock and improved the performance of the online resizing of file systems with bigalloc enabled. Fixed a lot of bugs, in particular for the inline data feature, potential races when creating and deleting inodes with shared extended attribute blocks, and the handling of directory blocks which are corrupted" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (37 commits) ext4: add ioctls to get/set the ext4 superblock uuid ext4: avoid resizing to a partial cluster size ext4: reduce computation of overhead during resize jbd2: fix assertion 'jh->b_frozen_data == NULL' failure when journal aborted ext4: block range must be validated before use in ext4_mb_clear_bb() mbcache: automatically delete entries from cache on freeing mbcache: Remove mb_cache_entry_delete() ext2: avoid deleting xattr block that is being reused ext2: unindent codeblock in ext2_xattr_set() ext2: factor our freeing of xattr block reference ext4: fix race when reusing xattr blocks ext4: unindent codeblock in ext4_xattr_block_set() ext4: remove EA inode entry from mbcache on inode eviction mbcache: add functions to delete entry if unused mbcache: don't reclaim used entries ext4: make sure ext4_append() always allocates new block ext4: check if directory block is within i_size ext4: reflect mb_optimize_scan value in options file ext4: avoid remove directory when directory is corrupted ext4: aligned '*' in comments ...
2022-08-02jbd2: rename jbd_debug() to jbd2_debug()Jan Kara1-15/+15
The name of jbd_debug() is confusing as all functions inside jbd2 have jbd2_ prefix. Rename jbd_debug() to jbd2_debug(). No functional changes. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Link: https://lore.kernel.org/r/20220608112355.4397-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-07-14fs/buffer: Combine two submit_bh() and ll_rw_block() argumentsBart Van Assche1-2/+2
Both submit_bh() and ll_rw_block() accept a request operation type and request flags as their first two arguments. Micro-optimize these two functions by combining these first two arguments into a single argument. This patch does not change the behavior of any of the modified code. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jan Kara <jack@suse.cz> Acked-by: Song Liu <song@kernel.org> (for the md changes) Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-48-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-10jbd2: clean up two gcc -Wall warnings in recovery.cTheodore Ts'o1-3/+3
Fix a signed vs unsigned and a void * pointer arithmetic warning. This cleanup is also in e2fsprogs commit aec460db9a93 ("e2fsck: clean up two gcc -Wall warnings in recovery.c"). Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-10jbd2: fix clang warning in recovery.cTheodore Ts'o1-1/+0
Remove unused variable store which was never used. This fix is also in e2fsprogs commit 99a2294f85f0 ("e2fsck: value stored to err is never read"). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-10jbd2: fix portability problems caused by unaligned accessesTheodore Ts'o1-11/+11
This commit applies the e2fsck/recovery.c portions of commit 1e0c8ca7c08a ("e2fsck: fix portability problems caused by unaligned accesses) from the e2fsprogs git tree. The on-disk format for the ext4 journal can have unaigned 32-bit integers. This can happen when replaying a journal using a obsolete checksum format (which was never popularly used, since the v3 format replaced v2 while the metadata checksum feature was being stablized). Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-09ext4: fix debug format string warningArnd Bergmann1-3/+2
Using no_printk() for jbd_debug() revealed two warnings: fs/jbd2/recovery.c: In function 'fc_do_one_pass': fs/jbd2/recovery.c:256:30: error: format '%d' expects a matching 'int' argument [-Werror=format=] 256 | jbd_debug(3, "Processing fast commit blk with seq %d"); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ fs/ext4/fast_commit.c: In function 'ext4_fc_replay_add_range': fs/ext4/fast_commit.c:1732:30: error: format '%d' expects argument of type 'int', but argument 2 has type 'long unsigned int' [-Werror=format=] 1732 | jbd_debug(1, "Converting from %d to %d %lld", The first one was added incorrectly, and was also missing a few newlines in debug output, and the second one happened when the type of an argument changed. Reported-by: kernel test robot <lkp@intel.com> Fixes: d556435156b7 ("jbd2: avoid -Wempty-body warnings") Fixes: 6db074618969 ("ext4: use BIT() macro for BH_** state bits") Fixes: 5b849b5f96b4 ("jbd2: fast commit recovery path") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20210409201211.1866633-1-arnd@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-01-27block: use an on-stack bio in blkdev_issue_flushChristoph Hellwig1-1/+1
There is no point in allocating memory for a synchronous flush. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-06jbd2: rename j_maxlen to j_total_len and add jbd2_journal_max_txn_bufsHarshad Shirwadkar1-3/+3
The on-disk superblock field sb->s_maxlen represents the total size of the journal including the fast commit area and is no more the max number of blocks available for a transaction. The maximum number of blocks available to a transaction is reduced by the number of fast commit blocks. So, this patch renames j_maxlen to j_total_len to better represent its intent. Also, it adds a function to calculate max number of bufs available for a transaction. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-6-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-21jbd2: fast commit recovery pathHarshad Shirwadkar1-4/+53
This patch adds fast commit recovery support in JBD2. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201015203802.3597742-7-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18jbd2: avoid transaction reuse after reformattingchangfengnan1-12/+66
When ext4 is formatted with lazy_journal_init=1 and transactions from the previous filesystem are still on disk, it is possible that they are considered during a recovery after a crash. Because the checksum seed has changed, the CRC check will fail, and the journal recovery fails with checksum error although the journal is otherwise perfectly valid. Fix the problem by checking commit block time stamps to determine whether the data in the journal block is just stale or whether it is indeed corrupt. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Fengnan Chang <changfengnan@hikvision.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201012164900.20197-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19jbd2: clean up checksum verification in do_one_pass()Shijie Luo1-34/+12
Remove the unnecessary chksum_err and checksum_seen variables as well as some redundant code to make the function easier to understand. [ With changes suggested by jack@ and tytso@ ] Signed-off-by: Shijie Luo <luoshijie1@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200819122955.33526-1-luoshijie1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-05-22block: remove the error_sector argument to blkdev_issue_flushChristoph Hellwig1-1/+1
The argument isn't used by any caller, and drivers don't fill out bi_sector for flush requests either. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-18jbd2: clarify bad journal block checksum messageTheodore Ts'o1-2/+2
There were two error messages emitted by jbd2, one for a bad checksum for a jbd2 descriptor block, and one for a bad checksum for a jbd2 data block. Change the data block checksum error so that the two can be disambiguated. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-12-17ext4: fix up remaining files with SPDX cleanupsTheodore Ts'o1-4/+1
A number of ext4 source files were skipped due because their copyright permission statements didn't match the expected text used by the automated conversion utilities. I've added SPDX tags for the rest. While looking at some of these files, I've noticed that we have quite a bit of variation on the licenses that were used --- in particular some of the Red Hat licenses on the jbd2 files use a GPL2+ license, and we have some files that have a LGPL-2.1 license (which was quite surprising). I've not attempted to do any license changes. Even if it is perfectly legal to relicense to GPL 2.0-only for consistency's sake, that should be done with ext4 developer community discussion. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-06-07fs: have ll_rw_block users pass in op and flags separatelyMike Christie1-2/+2
This has ll_rw_block users pass in the operation and flags separately, so ll_rw_block can setup the bio op and bi_rw flags on the bio that is submitted. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-18Merge branch 'master' into for-nextJiri Kosina1-26/+5
Sync with Linus' tree so that patches against newer codebase can be applied. Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-04-18Doc: treewide : Fix typos in DocBook/filesystem.xmlMasanari Iida1-1/+1
This patch fix spelling typos found in DocBook/filesystem.xml. It is because the file was generated from comments in code, I have to fix the comments in codes, instead of xml file. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-02-22jbd2: unify revoke and tag block checksum handlingJan Kara1-26/+5
Revoke and tag descriptor blocks are just different kinds of descriptor blocks and thus have checksum in the same place. Unify computation and checking of checksums for these. Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-10-17jbd2: clean up feature test macros with predicate functionsDarrick J. Wong1-11/+7
Create separate predicate functions to test/set/clear feature flags, thereby replacing the wordy old macros. Furthermore, clean out the places where we open-coded feature tests. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-10-17ext4: call out CRC and corruption errors with specific error codesDarrick J. Wong1-4/+4
Instead of overloading EIO for CRC errors and corrupt structures, return the same error codes that XFS returns for the same issues. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-05-14jbd2: fix r_count overflows leading to buffer overflow in journal recoveryDarrick J. Wong1-1/+9
The journal revoke block recovery code does not check r_count for sanity, which means that an evil value of r_count could result in the kernel reading off the end of the revoke table and into whatever garbage lies beyond. This could crash the kernel, so fix that. However, in testing this fix, I discovered that the code to write out the revoke tables also was not correctly checking to see if the block was full -- the current offset check is fine so long as the revoke table space size is a multiple of the record size, but this is not true when either journal_csum_v[23] are set. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org
2015-01-19jbd2: complain about descriptor block checksum errorsDarrick J. Wong1-0/+3
We should complain in dmesg when journal recovery fails on account of the descriptor block being corrupt, so that the diagnostic data can be recovered. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-16jbd2: free bh when descriptor block checksum failsDarrick J. Wong1-0/+1
Free the buffer head if the journal descriptor block fails checksum verification. This is the jbd2 port of the e2fsprogs patch "e2fsck: free bh on csum verify error in do_one_pass". Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Cc: stable@vger.kernel.org
2014-08-28jbd2: fix descriptor block size handling errors with journal_csumDarrick J. Wong1-11/+15
It turns out that there are some serious problems with the on-disk format of journal checksum v2. The foremost is that the function to calculate descriptor tag size returns sizes that are too big. This causes alignment issues on some architectures and is compounded by the fact that some parts of jbd2 use the structure size (incorrectly) to determine the presence of a 64bit journal instead of checking the feature flags. Therefore, introduce journal checksum v3, which enlarges the descriptor block tag format to allow for full 32-bit checksums of journal blocks, fix the journal tag function to return the correct sizes, and fix the jbd2 recovery code to use feature flags to determine 64bitness. Add a few function helpers so we don't have to open-code quite so many pieces. Switching to a 16-byte block size was found to increase journal size overhead by a maximum of 0.1%, to convert a 32-bit journal with no checksumming to a 32-bit journal with checksum v3 enabled. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reported-by: TR Reardon <thomas_reardon@hotmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-08-28jbd2: fix infinite loop when recovering corrupt journal blocksDarrick J. Wong1-2/+5
When recovering the journal, don't fall into an infinite loop if we encounter a corrupt journal block. Instead, just skip the block and return an error, which fails the mount and thus forces the user to run a full filesystem fsck. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2013-12-08jbd2: rename obsoleted msg JBD->JBD2Dmitry Monakhov1-1/+1
Rename performed via: perl -pi -e 's/JBD:/JBD2:/g' fs/jbd2/*.c Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2013-08-28jbd2: Fix endian mixing problems in the checksumming codeDarrick J. Wong1-12/+12
In the jbd2 checksumming code, explicitly declare separate variables with endianness information so that we don't get confused and screw things up again. Also fixes sparse warnings. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-05-28jbd2: fix block tag checksum verification brokennessDarrick J. Wong1-6/+5
Al Viro complained of a ton of bogosity with regards to the jbd2 block tag header checksum. This one checksum is 16 bits, so cut off the upper 16 bits and treat it as a 16-bit value and don't mess around with be32* conversions. Fortunately metadata checksumming is still "experimental" and not in a shipping e2fsprogs, so there should be few users affected by this. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2012-08-17jbd2: check return value of blkdev_issue_flush()Theodore Ts'o1-2/+5
blkdev_issue_flush() can fail; make sure the error gets properly propagated. This is a port of the equivalent jbd patch from commit 349ecd6a3c0e. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-27jbd2: checksum data blocks that are stored in the journalDarrick J. Wong1-0/+30
Calculate and verify checksums of each data block being stored in the journal. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-27jbd2: checksum commit blocksDarrick J. Wong1-0/+31
Calculate and verify the checksum of commit blocks. In checksum v2, deprecate most of the checksum v1 commit block checksum fields, since each block has its own checksum. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-27jbd2: checksum descriptor blocksDarrick J. Wong1-1/+36
Calculate and verify a checksum of each descriptor block. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-27jbd2: checksum revocation blocksDarrick J. Wong1-0/+22
Compute and verify revoke blocks inside the journal. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-22jbd2: change disk layout for metadata checksummingDarrick J. Wong1-3/+3
Define flags and allocate space in on-disk journal structures to support checksumming of journal metadata. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-03-13jbd2: issue cache flush after checkpointing even with internal journalJan Kara1-1/+4
When we reach jbd2_cleanup_journal_tail(), there is no guarantee that checkpointed buffers are on a stable storage - especially if buffers were written out by jbd2_log_do_checkpoint(), they are likely to be only in disk's caches. Thus when we update journal superblock effectively removing old transaction from journal, this write of superblock can get to stable storage before those checkpointed buffers which can result in filesystem corruption after a crash. Thus we must unconditionally issue a cache flush before we update journal superblock in these cases. A similar problem can also occur if journal superblock is written only in disk's caches, other transaction starts reusing space of the transaction cleaned from the log and power failure happens. Subsequent journal replay would still try to replay the old transaction but some of it's blocks may be already overwritten by the new transaction. For this reason we must use WRITE_FUA when updating log tail and we must first write new log tail to disk and update in-memory information only after that. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-11-01jbd2: Unify log messages in jbd2 codeEryu Guan1-14/+14
Some jbd2 code prints out kernel messages with "JBD2: " prefix, at the same time other jbd2 code prints with "JBD: " prefix. Unify the prefix to "JBD2: ". Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-12-18jbd2: move debug message into debug #ifdefTheodore Ts'o1-1/+1
This is a port to jbd2 of a patch which Namhyung Kim <namhyung@gmail.com> originally made to fs/jbd. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-06-14ext4: remove initialized but not read variablesAndi Kleen1-8/+2
No real bugs found, just removed some dead code. Found by gcc 4.6's new warnings. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo1-1/+0
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2008-10-10jbd2: fix error handling for checkpoint ioHidehiro Kawai1-2/+5
When a checkpointing IO fails, current JBD2 code doesn't check the error and continue journaling. This means latest metadata can be lost from both the journal and filesystem. This patch leaves the failed metadata blocks in the journal space and aborts journaling in the case of jbd2_log_do_checkpoint(). To achieve this, we need to do: 1. don't remove the failed buffer from the checkpoint list where in the case of __try_to_free_cp_buf() because it may be released or overwritten by a later transaction 2. jbd2_log_do_checkpoint() is the last chance, remove the failed buffer from the checkpoint list and abort the journal 3. when checkpointing fails, don't update the journal super block to prevent the journaled contents from being cleaned. For safety, don't update j_tail and j_tail_sequence either 4. when checkpointing fails, notify this error to the ext4 layer so that ext4 don't clear the needs_recovery flag, otherwise the journaled contents are ignored and cleaned in the recovery phase 5. if the recovery fails, keep the needs_recovery flag 6. prevent jbd2_cleanup_journal_tail() from being called between __jbd2_journal_drop_transaction() and jbd2_journal_abort() (a possible race issue between jbd2_log_do_checkpoint()s called by jbd2_journal_flush() and __jbd2_log_wait_for_space()) Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2008-06-06jbd2: If a journal checksum error is detected, propagate the error to ext4Theodore Ts'o1-7/+4
If a journal checksum error is detected, the ext4 filesystem will call ext4_error(), and the mount will either continue, become a read-only mount, or cause a kernel panic based on the superblock flags indicating the user's preference of what to do in case of filesystem corruption being detected. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-05-26jbd2: Fix memory leak when verifying checksums in the journalTheodore Ts'o1-0/+1
Cc: Andreas Dilger <adilger@clusterfs.com> Cc: Girish Shilamkar <girish@clusterfs.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-03-19jbd2: correctly unescape journal data blocksDuane Griffin1-1/+1
Fix a long-standing typo (predating git) that will cause data corruption if a journal data block needs unescaping. At the moment the wrong buffer head's data is being unescaped. To test this case mount a filesystem with data=journal, start creating and deleting a bunch of files containing only JBD2_MAGIC_NUMBER (0xc03b3998), then pull the plug on the device. Without this patch the files will contain zeros instead of the correct data after recovery. Signed-off-by: Duane Griffin <duaneg@dghda.com> Acked-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05JBD2: Use the incompat macro for testing the incompat feature.Aneesh Kumar K.V1-1/+1
JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT needs to be checked with JBD2_HAS_INCOMPAT_FEATURE Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-06BKL-removal: remove incorrect comment refering to lock_kernel() from jbd/jbd2Andi Kleen1-1/+1
None of the callers of this function does actually take the BKL as far as I can see. So remove the comment refering to the BKL. Signed-off-by: Andi Kleen <ak@suse.de> Cc: <linux-ext4@vger.kernel.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-28ext4: Add the journal checksum featureGirish Shilamkar1-8/+143
The journal checksum feature adds two new flags i.e JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and JBD2_FEATURE_COMPAT_CHECKSUM. JBD2_FEATURE_CHECKSUM flag indicates that the commit block contains the checksum for the blocks described by the descriptor blocks. Due to checksums, writing of the commit record no longer needs to be synchronous. Now commit record can be sent to disk without waiting for descriptor blocks to be written to disk. This behavior is controlled using JBD2_FEATURE_ASYNC_COMMIT flag. Older kernels/e2fsck should not be able to recover the journal with _ASYNC_COMMIT hence it is made incompat. The commit header has been extended to hold the checksum along with the type of the checksum. For recovery in pass scan checksums are verified to ensure the sanity and completeness(in case of _ASYNC_COMMIT) of every transaction. Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Girish Shilamkar <girish@clusterfs.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>
2007-10-17jbd2: JBD_XXX to JBD2_XXX naming cleanupMingming Cao1-1/+1
change JBD_XXX macros to JBD2_XXX in JBD2/Ext4 Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2007-07-19fix ext4/JBD2 build warningsMingming Cao1-3/+3
Looking at the current linus-git tree jbd_debug() define in include/linux/jbd2.h extern u8 journal_enable_debug; #define jbd_debug(n, f, a...) \ do { \ if ((n) <= journal_enable_debug) { \ printk (KERN_DEBUG "(%s, %d): %s: ", \ __FILE__, __LINE__, __FUNCTION__); \ printk (f, ## a); \ } \ } while (0) > fs/ext4/inode.c: In function ‘ext4_write_inode’: > fs/ext4/inode.c:2906: warning: comparison is always true due to limited > range of data type > > fs/jbd2/recovery.c: In function ‘jbd2_journal_recover’: > fs/jbd2/recovery.c:254: warning: comparison is always true due to > limited range of data type > fs/jbd2/recovery.c:257: warning: comparison is always true due to > limited range of data type > > fs/jbd2/recovery.c: In function ‘jbd2_journal_skip_recovery’: > fs/jbd2/recovery.c:301: warning: comparison is always true due to > limited range of data type > Noticed all warnings are occurs when the debug level is 0. Then found the "jbd2: Move jbd2-debug file to debugfs" patch http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0f49d5d019afa4e94253bfc92f0daca3badb990b changed the jbd2_journal_enable_debug from int type to u8, makes the jbd_debug comparision is always true when the debugging level is 0. Thus the compile warning occurs. Thought about changing the jbd2_journal_enable_debug data type back to int, but can't, because the jbd2-debug is moved to debug fs, where calling debugfs_create_u8() to create the debugfs entry needs the value to be u8 type. Even if we changed the data type back to int, the code is still buggy, kernel should not print jbd2 debug message if the jbd2_journal_enable_debug is set to 0. But this is not the case. The fix is change the level of debugging to 1. The same should fixed in ext3/JBD, but currently ext3 jbd-debug via /proc fs is broken, so we probably should fix it all together. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Cc: Jeff Garzik <jeff@garzik.org> Cc: Theodore Tso <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-18jbd2: Fix CONFIG_JBD_DEBUG ifdef to be CONFIG_JBD2_DEBUGJose R. Santos1-1/+1
When the JBD code was forked to create the new JBD2 code base, the references to CONFIG_JBD_DEBUG where never changed to CONFIG_JBD2_DEBUG. This patch fixes that. Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2007-05-09fix file specification in commentsUwe Kleine-König1-1/+1
Many files include the filename at the beginning, serveral used a wrong one. Signed-off-by: Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-11[PATCH] jbd2: switch blks_type from sector_t to ullMingming Cao1-6/+6
Similar to ext4, change blocks in JBD2 from sector_t to unsigned long long. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11[PATCH] jbd2: sector_t conversionMingming Cao1-4/+4
JBD layer in-kernel block varibles type fixes to support >32 bit block number and convert to sector_t type. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11[PATCH] 64-bit jbd2 coreZach Brown1-13/+30
Here is the patch to JBD to handle 64 bit block numbers, originally from Zach Brown. This patch is useful only after adding support for 64-bit block numbers in the filesystem. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11[PATCH] jbd2: rename jbd2 symbols to avoid duplication of jbd symbolsMingming Cao1-23/+23
Mingming Cao originally did this work, and Shaggy reproduced it using some scripts from her. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11[PATCH] jbd2: initial copy of files from jbdDave Kleikamp1-0/+592
This is a simple copy of the files in fs/jbd to fs/jbd2 and /usr/incude/linux/[ext4_]jbd.h to /usr/include/[ext4_]jbd2.h Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>