Age | Commit message (Collapse) | Author | Files | Lines |
|
Source kernel commit: 131fa58d391fc0939f6c66b23776ad5df5db20f9
Don't use u32, use uint32_t, because this won't work in xfsprogs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
[sandeen: no-op commit, fixed previously to keep build working]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 6d8a45ce29c7d67cc4fc3016dc2a07660c62482a
xfs_bmap_btalloc is given a range of file offset blocks that must be
allocated to some data/attr/cow fork. If the fork has an extent size
hint associated with it, the request will be enlarged on both ends to
try to satisfy the alignment hint. If free space is fragmentated,
sometimes we can allocate some blocks but not enough to fulfill any of
the requested range. Since bmapi_allocate always trims the new extent
mapping to match the originally requested range, this results in
bmapi_write returning zero and no mapping.
The consequences of this vary -- buffered writes will simply re-call
bmapi_write until it can satisfy at least one block from the original
request. Direct IO overwrites notice nmaps == 0 and return -ENOSPC
through the dio mechanism out to userspace with the weird result that
writes fail even when we have enough space because the ENOSPC return
overrides any partial write status. For direct CoW writes the situation
was disastrous because nobody notices us returning an invalid zero-length
wrong-offset mapping to iomap and the write goes off into space.
Therefore, if free space is so fragmented that we managed to allocate
some space but not enough to map into even a single block of the
original allocation request range, we should break the alignment hint in
order to guarantee at least some forward progress for the direct write.
If we return a short allocation to iomap_apply it'll call back about the
remaining blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 4b4c1326fd7c7210d23d9dd3bfc51f2b6477bb9e
Since the CoW fork only exists in memory, it is incorrect to update the
on-disk quota block counts when we modify the CoW fork. Unlike the data
fork, even real extents in the CoW fork are only delalloc-style
reservations (on-disk they're owned by the refcountbt) so they must not
be tracked in the on disk quota info. Ensure the i_delayed_blks
accounting reflects this too.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 751f3767c245f9adf4f0a4f8f04aae9ae1d675a0
Move all the inode and quota accounting updates out of xfs_bmap_btalloc
in preparation for fixing some quota accounting problems with copy on
write.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 22431bf3dfbf44d7356933776eb486a6a01dea6f
Refactor inode verifier error reporting into a non-libxfs function so
that we aren't encoding the message format in libxfs. This also
changes the kernel dmesg output to resemble buffer verifier errors
more closely.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 6ca30729c206d62d88730a904af7d543a56273d8
Remove the extent size hint and realtime inode relevant code from
the xfs_bmapi_reserve_delalloc since it is not called on the inode
with extent size hint set or on a realtime inode.
Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: fb1755a645972ed096047583600838f6cf414e2b
By splitting the b_fspriv field into two different fields (b_log_item
and b_li_list). It's possible to get rid of an old ABI workaround, by
using the new b_log_item field to store xfs_buf_log_item separated from
the log items attached to the buffer, which will be linked in the new
b_li_list field.
This way, there is no more need to reorder the log items list to place
the buf_log_item at the beginning of the list, simplifying a bit the
logic to handle buffer IO.
This also opens the possibility to change buffer's log items list into a
proper list_head.
b_log_item field is still defined as a void *, because it is still used
by the log buffers to store xlog_in_core structures, and there is no
need to add an extra field on xfs_buf just for xlog_in_core.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: minor style changes]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[sandeen: b_li_list unused in userspace]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: f0e28280629e0ec7921f3179409a179b1ea41f24
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 4bb73d014785cc55225686f9f46e7192fb59d26b
Currently, we don't check sb_agblocks or sb_agblklog when we validate
the superblock, which means that we can fuzz garbage values into those
values and the mount succeeds. This leads to all sorts of UBSAN
warnings in xfs/350 since we can then coerce other parts of xfs into
shifting by ridiculously large values.
Once we've validated agblocks, make sure the agcount makes sense.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
[sandeen: fix up u32 usage now so we keep building]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: be78ff0e72778eb4df4aac66edb9e97462bfe00d
Eryu Guan reported seeing occasional hangs when running generic/269 with
a new fsstress that supports clonerange/deduperange. The cause of this
hang is an infinite loop when we convert the CoW fork extents from
unwritten to real just prior to writing the pages out; the infinite
loop happens because there's nothing in the CoW fork to convert, and so
it spins forever.
The fundamental issue here is that when we go to perform these CoW fork
conversions, we're supposed to have an extent waiting for us, but the
low space CoW reaper has snuck in and blown them away! There are four
conditions that can dissuade the reaper from touching our file -- no
reflink iflag; dirty page cache; writeback in progress; or directio in
progress. We check the four conditions prior to taking the locks, but
we neglect to recheck them once we have the locks, which is how we end
up whacking the writeback that's in progress.
Therefore, refactor the four checks into a helper function and call it
once again once we have the locks to make sure we really want to reap
the inode. While we're at it, add an ASSERT for this weird condition so
that we'll fail noisily if we ever screw this up again.
Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Tested-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 55e45429ce3e4ac9dd2bf4937b1a499a69ccc4ca
A btree format inode fork with zero records makes no sense, so reject it
if we see it, or else we can miscalculate memory allocations. Found by
zeroes fuzzing {a,u3}.bmbt.numrecs in xfs/{374,378,412} with KASAN.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 79a69bf8dc240ebeb105226a8a8540df136bf987
In the attribute leaf verifier, we can check for obviously bad values of
firstused and count so that later attempts at lasthash don't run off the
end of the memory buffer. Found by ones fuzzing hdr.count in xfs/400 with
KASAN.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: ce92d29ddf9908d397895c46b7c78e9db8df414d
In xfs_scrub_dir_rec, we must walk through the directory block entries
to arrive at the offset given by the hash structure. If we blindly
trust the hash address, we can end up midway into a directory entry and
stray outside the block. Found by lastbit fuzzing lents[3].address in
xfs/390 with KASAN enabled.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 46d9bfb5e706493777b9dfed666cd8967f69e6fd
While we're scrubbing various btrees, cross-reference the records
with the other metadata.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 49db55eca5665e32c9d3e67a7d5694bcc6c274de
Add a couple of functions to the refcount btrees that will be used
to cross-reference metadata against the refcountbt.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: ed7c52d4bf92ac1f05b8c251a44a8bf4688f8786
Add a couple of functions to the rmap btrees that will be used
to cross-reference metadata against the rmapbt.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 2e001266b67c865ad904e1889658282d0773b207
Add a couple of functions to the inode btrees that will be used
to cross-reference metadata against the inobt.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: ce1d802e6a889b8ee53b3444c6d7e8cfecadac50
Add a couple of functions to the free space btrees that will be used
to cross-reference metadata against the bnobt/cntbt, and a generic
btree function that provides the real implementation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: c468562879a766de2c2fbedd41b653a7bf4c157d
Chris Dunlop reports a problem where an xattr operation fails,
reports the following error to syslog and hangs during unmount:
================================================
[ BUG: lock held when returning to user space! ]
...
------------------------------------------------
<PID> is leaving the kernel with locks still held!
1 lock held by <PID>:
#0: (sb_internal){......}, at: [<ffffffffa07692a3>] xfs_trans_alloc+0xe3/0x130 [xfs]
The failure/shutdown occurs during deferred ops processing which
leads to an error return from xfs_defer_finish() via
xfs_attr_leaf_addname(). While the root cause of the failure is
unknown corruption, the cause of the subsequent BUG above and
unmount hang is failure to cancel the transaction before returning
to userspace.
The transaction is not cancelled because the out_defer_cancel error
handling paths in the xfs_attr_[leaf|node]_[add|remove]name()
functions clear args.trans without releasing the transaction. The
callers therefore lose the reference to the transaction and fail to
cancel it.
Since xfs_attr_[set|remove]() always cancel args.trans when != NULL
and xfs_defer_finish()->...->xfs_trans_roll() should always return
with a valid transaction, update the leaf/node xattr functions to
not reset args.trans in the error path responsible for cancelling
deferred ops.
Reported-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: ad90bb585c45917b6c1bb01c812fba337e689362
XFS started using the perag metadata reservation pool for free inode
btree blocks in commit 76d771b4cbe33 ("xfs: use per-AG reservations
for the finobt"). To handle backwards compatibility, finobt blocks
are accounted against the pool so long as the full reservation is
available at mount time. Otherwise the ->m_inotbt_nores flag is set
and the filesystem falls back to the traditional per-transaction
finobt reservation.
This commit has two problems:
- finobt blocks are always accounted against the metadata
reservation on allocation, regardless of ->m_inotbt_nores state
- finobt blocks are never returned to the reservation pool on free
The first problem affects reflink+finobt filesystems where the full
finobt reservation is not available at mount time. finobt blocks are
essentially stolen from the reflink reservation, putting refcountbt
management at risk of allocation failure. The second problem is an
unconditional leak of metadata reservation whenever finobt is
enabled.
Update the finobt block allocation callouts to consider
->m_inotbt_nores and account blocks appropriately. Blocks should be
consistently accounted against the metadata pool when
->m_inotbt_nores is false and otherwise tagged as RESV_NONE.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: a8789a5ae28f69d7f3791a0e74f8c44222f3108b
It appears that the check for versions 4 or more is incorrect and is
off-by-one. Fix this.
Detected by CoverityScan, CID#1463775 ("Logically dead code")
Fixes: ac503a4cc9e8 ("xfs: refactor the geometry structure filling function")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: c96900435fa9fdfd9702a60cd765bd85e380303e
Starting with commit 57e734423ad ("vsprintf: refactor %pK code out of
pointer"), the behavior of the raw '%p' printk format specifier was
changed to print a 32-bit hash of the pointer value to avoid leaking
kernel pointers into dmesg. For most situations that's good.
This is /undesirable/ behavior when we're trying to debug XFS, however,
so define a PTR_FMT that prints the actual pointer when we're in debug
mode.
Note that %p for tracepoints still prints the raw pointer, so in the
long run we could consider rewriting some of these messages as
tracepoints.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 3d170aa24283568b1ed92a09daa0e05a8788c6a4
Since %p prepends "0x" to the outputted string, we can drop the prefix.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 46c59736d8090e602f960aeaf1c6b8292151bf38
If a malicious filesystem image contains a block+ format directory
wherein the directory inode's core.mode is set such that
S_ISDIR(core.mode) == 0, and if there are subdirectories of the
corrupted directory, an attempt to traverse up the directory tree will
crash the kernel in __xfs_dir3_data_check. Running the online scrub's
parent checks will tend to do this.
The crash occurs because the directory inode's d_ops get set to
xfs_dir[23]_nondir_ops (it's not a directory) but the parent pointer
scrubber's indiscriminate call to xfs_readdir proceeds past the ASSERT
if we have non fatal asserts configured.
Fix the null pointer dereference crash in __xfs_dir3_data_check by
looking for S_ISDIR or wrong d_ops; and teach the parent scrubber
to bail out if it is fed a non-directory "parent".
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: ac503a4cc9e8ab574032e3e217ffb555f5bf2341
Refactor the geometry structure filling function to use the superblock
to fill the fields. While we're at it, make the function less indenty
and use some whitespace to make the function easier to read.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: c368ebcd4cc3bbc08602adce083ad3cc76a15258
Move xfs_fs_geometry to libxfs so that we can clean up the fs geometry
reporting in xfsprogs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: b872af2c8700e9d64af8e13811b7679ede26ca00
At each mount, emit the transaction reservation type information via
tracepoints. This makes it easier to compare the log reservation info
calculated by the kernel and xfsprogs so that we can more easily diagnose
minimum log size failures on freshly formatted filesystems.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: eebf3cab9c5eac7fdb54fb9e9fb38c06f46f17f3
Rename xfs_dqcheck to xfs_dquot_verify and make it return an
xfs_failaddr_t like every other structure verifier function.
This enables us to check on-disk quotas in the same way that we check
everything else. Callers are now responsible for logging errors, as
XFS_QMOPT_DOWARN goes away.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: eeea79802871fef82a8ca6ab1220515855e5cdcc
Move the dquot repair code into a separate function and remove
XFS_QMOPT_DQREPAIR in favor of calling the helper directly. Remove
other dead code because quotacheck is the only caller of DQREPAIR.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: b55725974c9d3a5afcdf83daff6fba7d3f91ffca
Expose all metadata structure buffer verifier functions via buf_ops.
These will be used by the online scrub mechanism to look for problems
with buffers that are already sitting around in memory.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 8ba92d43d499f4920af983a7c16e02304dd36932
If the xattr leaf block looks corrupt, return -EFSCORRUPTED to userspace
instead of ASSERTing on debug kernels or running off the end of the
buffer on regular kernels.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 9cfb9b47479e237d217dbcfafe034cbf98f45909
Replace the current haphazard dir2 shortform verifier callsites with a
centralized verifier function that can be called either with the default
verifier functions or with a custom set. This helps us strengthen
integrity checking while providing us with flexibility for repair tools.
xfs_repair wants this to be able to supply its own verifier functions
when trying to fix possibly corrupt metadata.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: dc042c2d8ff629dd411e9a60bce9c379e2f8aaf8
Change the short form directory structure verifier function to return
the instruction pointer of a failing check or NULL if everything's ok.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 0795e004fd4f2723f3dbf09a195cd7ccf3c74c58
Create a function to check the structure of short form symlink targets.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 1e1bbd8e7ee0624034e9bf1e91ac11a7aaa2f8a6
Create a function to perform structure verification for short form
extended attributes.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 71493b839e294065ba63bd6f8d07263f3afee8c6
Consolidate the fork size and format verifiers to xfs_dinode_verify so
that we can reject bad inodes earlier and in a single place.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 50aa90ef03007beca2c9108993f5b4f2bb4f0a66
Move the v3 inode integrity information (crc, owner, metauuid) before we
look at anything else in the inode so that we don't waste time on a torn
write or a totally garbled block. This makes xfs_dinode_verify more
consistent with the other verifiers.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: bc1a09b8e334bf5fca1d6727aec538dcff957961
Refactor the callers of verifiers to print the instruction address of a
failing check.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: a6a781a58befcbd467ce843af4eaca3906aa1f08
Modify each function that checks the contents of a metadata buffer to
return the instruction address of the failing test so that we can report
more precise failure errors to the log.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 31ca03c92c329525ee3a97d99c47f1ebbaed5d63
Since all verification errors also mark the buffer as having an error,
we can combine these two calls. Later we'll add a xfs_failaddr_t
parameter to promote the idea of reporting corruption errors and the
address of the failing check to enable better debugging reports.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 9101d3707b9acae8bbb0d82d47e99cf5c60b3ee5
Since __xfs_dir3_data_check verifies on-disk metadata, we can't have it
noisily blowing asserts and hanging the system on corrupt data coming in
off the disk. Instead, have it return a boolean like all the other
checker functions, and only have it noisily fail if we fail in debug
mode.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: e1e55aaf1cc646b736439cbd5af229759029ae34
Now that we have xfs_verify_agbno, use it to verify short form btree
pointers instead of open-coding them.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 8368a6019d5bbb8b56c140029dcf5ea570b638f1
Create two helper functions to verify the headers of a long format
btree block. We'll use this later for the realtime rmapbt.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 59f6fec3bdb2aafc84d39f34000819d232182d71
We already have a function to verify fsb pointers, so get rid of the
last users of the (less robust) macro.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: c017cb5ddfd6326032570d5eba83308c8a9c13a9
The create transaction reservation calculation has two different
branches of code depending on whether the filesystem is a v5 format
fs or older. Each branch considers the max reservation between the
allocation case (new chunk allocation + record insert) and the
modify case (chunk exists, record modification) of inode allocation.
The modify case is the same for both superblock versions with the
exception of the finobt. The finobt helper checks the feature bit,
however, and so the modify case already shares the same code.
Now that inode chunk allocation has been refactored into a helper
that checks the superblock version to calculate the appropriate
reservation for the create transaction, the only remaining
difference between the create and icreate branches is the call to
the finobt helper. As noted above, the finobt helper is a no-op when
the feature is not enabled. Therefore, these branches are
effectively duplicate and can be condensed.
Remove the xfs_calc_create_*() branch of functions and update the
various callers to use the xfs_calc_icreate_*() variant. The latter
creates the same reservation size for v4 create transactions as the
removed branch. As such, this patch does not result in transaction
reservation changes.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 57af33e451b73f56feb428f5856cdf6e4e0c60cd
The reservation for the various forms of inode allocation is
scattered across several different functions. This includes two
variants of chunk allocation (v5 icreate transactions vs. older
create transactions) and the inode free transaction.
To clean up some of this code and clarify the purpose of specific
allocfree reservations, continue the pattern of defining helper
functions for smaller operational units of broader transactions.
Refactor the reservation into an inode chunk alloc/free helper that
considers the various conditions based on filesystem format.
An inode chunk free involves an extent free and buffer
invalidations. The latter requires reservation for log headers only.
An inode chunk allocation modifies the free space btrees and logs
the chunk on v4 supers. v5 supers initialize the inode chunk using
ordered buffers and so do not log the chunk.
As a side effect of this refactoring, add one more allocfree res to
the ifree transaction. Technically this does not serve a specific
purpose because inode chunks are freed via deferred operations and
thus occur after a transaction roll. tr_ifree has a bit of a history
of tx overruns caused by too many agfl fixups during sustained file
deletion workloads, so add this extra reservation as a form of
padding nonetheless.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: f03c78f39710995d2766236f229295d91b8de9dd
Analysis of recent reports of log reservation overruns and code
inspection has uncovered that the reservations associated with inode
operations may not cover the worst case scenarios. In particular,
many cases only include one allocfree res. for a particular
operation even though said operations may also entail AGFL fixups
and inode btree block allocations in addition to the actual inode
chunk allocation. This can easily turn into two or three block
allocations (or frees) per operation.
In theory, the only way to define the worst case reservation is to
include an allocfree res for each individual allocation in a
transaction. Since that is impractical (we can perform multiple agfl
fixups per tx and not every allocation results in a full tree
operation), we need to find a reasonable compromise that addresses
the deficiency in practice without blowing out the size of the
transactions.
Since the inode btrees are not filled by the AGFL, record insertion
and removal can directly result in block allocations and frees
depending on the shape of the tree. These allocations and frees
occur in the same transaction context as the inobt update itself,
but are separate from the allocation/free that might be required for
an inode chunk. Therefore, it makes sense to assume that an [f]inobt
insert/remove can directly result in one or more block allocations
on behalf of the tree.
Refactor the inode transaction reservations to include one allocfree
res. per inode btree modification to cover allocations required by
the tree itself. This separates the reservation required to allocate
the inode chunk from the reservation required for inobt record
insertion/removal. Apply the same logic to the finobt. This results
in killing off the finobt modify condition because we no longer
assume that the broader transaction reservation will cover finobt
block allocations and finobt shape changes can occur in either of
the inobt allocation or modify situations.
Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: a606ebdb859e78beb757dfefa08001df366e2ef5
The truncate transaction does not ever modify the inode btree, but
includes an associated log reservation. Update
xfs_calc_itruncate_reservation() to remove the reservation
associated with inobt updates.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: e8341d9f6348640dff01d8c4a33695dc82bab5a3
The current AGI unlinked list addition and removal reservations do
not reflect the worst case log usage. An unlinked list removal can
log up to two on-disk inode clusters but only includes reservation
for one. An unlinked list addition logs the on-disk cluster but
includes reservation for an in-core inode.
Update the AGI unlinked list reservation helpers to calculate the
correct worst case reservation for the associated operations.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: a6f485908d5210a5662f7a031bd1deeb3867e466
The tr_ifree transaction handles inode unlinks and inode chunk
frees. The current transaction calculation does not accurately
reflect worst case changes to the inode btree, however. The inobt
portion of the current transaction reservation only covers
modification of a single inobt buffer (for the particular inode
record). This is a historical artifact from the days before XFS
supported full inode chunk removal.
When support for inode chunk removal was added in commit
254f6311ed1b ("Implement deletion of inode clusters in XFS."), the
additional log reservation required for chunk removal was not added
correctly. The new reservation only considered the header overhead
of associated buffers rather than the full contents of the btrees
and AGF and AGFL buffers affected by the transaction. The
reservation for the free space btrees was subsequently fixed up in
ITRUNCATE log reservation"), but the res. for full inobt joins has
never been added.
Further review of the ifree reservation uncovered a couple more
problems:
- The undocumented +2 blocks are intended for the AGF and AGFL, but
are also not sized correctly and should be logged as full sectors
(not FSBs).
- The additional single block header is undocumented and serves no
apparent purpose.
Update xfs_calc_ifree_reservation() to include a full inobt join in
the reservation calculation. Refactor the undocumented blocks
appropriately and fix up the comments to reflect the current
calculation.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Update all the necessary files for a 4.15.1 release.
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Polish translation update for xfsprogs 4.15.0
Signed-off-by: Jakub Bogusz <qboosh@pld-linux.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
If we're upgrading a systemd-enabled chroot we'll fail because systemctl
can't connect to a running systemd (nor should it). We don't need to
issue daemon-reload inside a chroot that doesn't have a running systemd,
so we can ignore the return value.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Nathan Scott <nathans@debian.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Since the configure scripts now depend on pkg-config to autodetect where
systemd service files go, we need to list pkg-config as a build
dependency.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Update all the necessary files for a 4.15.0 release.
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-off-by: <nathans@debian.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-off-by: Nathan Scott <nathans@debian.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-off-by: Nathan Scott <nathans@debian.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-off-by: Nathan Scott <nathans@debian.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Switch to Debian packaging features available in more
recent years to resolve some long-standing issues.
In particular, using the quilt format gives non-native
package builds finally, while keeping the ability for
developers to do upstream deb builds. Also split the
binary-arch and binary-indep debian/rules targets as
is now mandated, and update to latest standard version.
Mark a bunch of long-resolved bugs as fixed in the deb
changelog so they are automatically closed by the next
update.
Signed-off-by: Nathan Scott <nathans@debian.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Update all the necessary files for a 4.15.0-rc1 release.
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Move all the printing of the scrub outcome into a separate helper to
declutter the main function.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[sandeen: put "Unmount ..." on its own line]
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Make sure we initialize the overall phase state before we start
executing any code that can end up in the report-status-and-exit paths.
Otherwise if debugging is turned on we get garbage io/cpu stat reports.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Some of the warning messages are actually runtime errors in optional
components, so turn them into informational messages.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
If the program encounters runtime errors, these should be noted as
information. Because these errors abort the execution flow (which is
counted as a runtime error), we need only call str_info to log the
event.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
If the kernel doesn't have the SCRUB_METADATA ioctl that's a runtime
error, not a fs error. Account it as such.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
While it's true that the kernel can tell us whether something needs
repairs or it needs optimizing, from the admin's perspective there's
no point in having an optimize-only mode -- either fix everything, or
don't. This is what xfs_repair does w.r.t. -n, so let's do the same
thing too.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Today, xfs_mdrestore from stdin will fail if the -i flag is
specified, because it attempts to rewind the stream after
the initial read of the metablock. This fails, and
results in an abort with "specified file is not a metadata
dump."
Read the metablock exactly once in main(), validate the magic,
print informational flags if requested, and then pass it to
perform_restore() which will then continue the restore process.
Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Marco A Benatto <marco.antonio.780@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Currently we are missing -i option from usage().
This patch adds it to this biult-in help.
Signed-off-by: Marco A Benatto <marco.antonio.780@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
glibc 2.27 has a copy_file_range wrapper, so we need to change our
internal function out of the way to avoid compiler warnings.
Reported-by: fredrik@crux.nu
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
We can't reach the return mess at the bottom of __xfs_scrub_test so get
rid of it.
Fixes-coverity-id: 1428798
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
If we don't get a directory pointer, close dir_fd before jumping out.
Fixes-coverity-id: 1428799
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
We don't support reflink on the realtime device, so don't let people
create such things.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
If xfs_scrub is run today against a 4.15 kernel, it fails with
EXPERIMENTAL xfs_scrub program in use! Use at your own risk!
Error: /home: Kernel metadata optimization facility is required.
Info: /home: Scrub aborted after phase 1.
/home: 2 errors found.
Be a bit kinder to the user and suggest a path forward. By the
time we fail for missing preen or repair functionality, we do
know that scrub is available, so suggest it.
Further, rather than stating what is required, state what was not
found ... we're failing, so state what was missing, vs. what is
required - seems a bit more definitive.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Create the mechanism we need to actually call the kernel's online repair
functionality. The interface will consume a repair description; the
descriptor management will follow in the next patch.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
libreadline5-dev hasn't existed as a package for quite some time now;
even Debian "oldoldstable" doesn't know what that is. Drop it in favor
of libreadline-gplv2-dev.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Create a systemd service unit so that we can run the online scrubber
under systemd with (somewhat) appropriate containment.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Create an xfs_scrub_all command to find all XFS filesystems
and run an online scrub against them all.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Implement a progress indicator.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
If the filesystem scan comes out clean or fixes all the problems, call
fstrim to clean out the free areas (if it's an ssd/thinp/whatever).
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Make sure the filesystem summary counters are somewhat close to what
we can find by scanning the filesystem.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
If we sense that we're talking to a raw SCSI disk, use the SCSI READ
VERIFY command to ask the disk to verify a disk internally. This can
sharply reduce the runtime of the data block verification phase on
devices whose internal bandwidth exceeds their link bandwidth.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Read all data blocks from the disk, hoping to catch IO errors.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Manage the scheduling, issuance, and reporting of data block
verification reads. This enables us to combine adjacent (or nearly
adjacent) read requests, and to take advantage of high-IOPS devices by
issuing IO from multiple threads.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Create an efficient tree-based bitmap data structure. We will use this
during the data block scan to record the LBAs of IO errors so that we
can report broken files to userspace.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Iterate all directory and xattr names to look for name collisions
amongst Unicode normalized names. This is generally a sign of buggy
programs or malicious duplicate files.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Look for control characters and punctuation that interfere with shell
globbing in directory entry names and extended attribute key names.
Technically these aren't filesystem corruptions because names are
arbitrary sequences of bytes, but they've been known to cause problems
in the Unix environment so warn if we see them.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Opening directories by file handle will cause the kernel to perform
parent lookups all the way to the root directory. Take advantage of
this to ensure that directories actually connect to the root. Some
day we'll have parent pointers and can make this more comprehensive.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Scan all the inodes in the system for problems.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Create a threaded stats counter that we'll use to track scan progress.
This includes things like how much of the disk blocks we've scanned,
or later how much progress we've made in each phase.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Scrub the filesystem and per-AG metadata.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Create some wrappers to call the scrub ioctls.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Add a couple of helper functions to estimate the inode and block
counters on the filesystem.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
These helpers enable userspace to iterate all the space map information
for a file. The iteration function uses GETBMAPX.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
These helpers enable userspace to iterate all the space map information
in a filesystem. The iteration function uses GETFSMAP.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
These helpers enable userspace to count or iterate all inodes in a
filesystem. The counting function uses INUMBERS, while the inode
iterator uses INUMBERS and BULKSTAT to iterate over every inode that
should be in the filesystem.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Discover the geometry of the XFS filesystem that we've been told to
scan, and set up some common functions that will be used by the
scrub phases.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Create an abstraction to handle all of our low level disk operations.
We'll eventually use it to bind to a fs mount point and block device.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Create the plumbing to figure out how many threads we're going to want
to do all of our scrubbing.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Create the dispatching routines that we'll use to call out to each
separate phase of the program.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Parse command line options in order to set up the context in which we
will scrub the filesystem.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Standardize how we record and report errors.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Create the foundations of a filesystem scrubbing tool that asks the
kernel to inspect all metadata in the filesystem and (ultimately) to
repair anything that's broken. Also create the man page for the
utility.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
This fixes 2 issues with stripe geometry validation.
# mkfs.xfs -d sunit=64,swidth=0 ...
both data sunit and data swidth options must be specified
But I did specify it, I specified 0!
So use cli_opt_set() to detect that it was specified.
But we can't allow the above configuration (in fact it causes
a % 0 later in mkfs), so catch it in the "swidth must be a
multiple of sunit" test a bit further down.
(sunit=0,swidth=0 /is/ valid, it's used to override disk
geometry if desired.)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Always explain why invalid numeric inputs are not valid, and in a
complete sentence since that's what illegal_optio() sets us up for.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Specifying invalid inputs to mkfs does not break any (reasonable) laws,
so the error message should complain about invalid inputs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
I ran mkfs.xfs -d su=1048576,sw=$((18 * 1048576)), forgetting that sw
takes a multiple of su (unlike swidth which takes any space unit). I
was surprised when we hit a floating point exception, which I traced
back to an integer overflow when we calculate swidth from dsw.
So, do the 64-bit multiplication so we can detect the overflow and
complain about it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Fix a few things the undefined behavior sanitizer complained about.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
The Linux kernel treats core.*time.sec as a signed integer value, so
xfs_db should do likewise, or else files will have inconsistent times
if the seconds count is negative.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Since oss.sgi.com is dead and xfs.org is slowly migrating to
xfs.wiki.kernel.org, update all the documentation links to point to the
current landing pads.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Not sure how this was missed for so long, but to handle CRC
filesystems, this ASSERT on block magic must accept CRC magic
as well.
Reported-by: Radek Burkat <radek@pinkbike.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Line up flags2/cowextsize line with all the others, using
tabs.
Fixes: 1fe708d60 ("xfs_logprint: support cowextsize reporting in log contents")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
In addition to more closely matching the kernel, this will
help us catch any leaks from these allocations.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
In addition to more closely matching the kernel, this will
help us catch any leaks from these allocations.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
The buf_fsprivate3 field has no actual use, other than a pointless
"if it's not set, set it" in xfs_buf_item_init; nobody cares after
that. Remove it.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit bf9d9013a2a559858efb590bf922377be9d6d969
Replace the typeless b_fspriv2 and the ugly macros around it with a properly
typed transaction pointer. As a fallout the log buffer state debug checks
are also removed. We could have kept them using casts, but as they do
not have a real purpose we can as well just remove them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit adadbeefb34f755a3477da51035eeeec2c1fde38
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
One of the grep patterns in find-api-violations is mistaken for a
(broken) range specifier when LC_ALL=C, so fix it to work properly.
This was found by wiring up the script to xfstests.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Call libxfs_ functions, not xfs_ functions.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
If I run the following command:
xfs_db /dev/sdf -x -c 'agf 0' -c 'addr refcntroot' -c 'addr ptrs[1]\'
it errors out with "bad character in field \" and
then ftok_free crashes on an invalid free() because picking up the
previous token (the closing bracket) xrealloc'd the token array to be 5
elements long but never set the last element's tok pointer.
Consequently the ftok_free tries to free whatever garbage pointer is in
that last element and kaboom.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
[sandeen: slightly clarify commit log]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Teach xfs_check to record cow staging extents correctly. This means that
we strip off the high bit before using startblock.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Set fdhash_head to zero once we've destroyed the handle list to avoid
dangling pointer problems.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Add a new 'log_writes' command to xfs_io so that we can add dm-log-writes
log marks. It's helpful to allow users of xfs_io to adds these marks from
within xfs_io instead of waiting until after xfs_io exits because then they
are able to replay the dm-log-writes log up to immediately after another
xfs_io operation such as mwrite. This isolates the log replay from other
operations that happen as part of xfs_io exiting (file handles being
closed, mmaps being torn down, etc.). This also allows users to insert
multiple marks between different xfs_io commands.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Add support for a new -S flag to xfs_io's mmap command. This opens the
mapping with the (MAP_SYNC | MAP_SHARED_VALIDATE) flags instead of the
standard MAP_SHARED flag.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Todauy this works, with last-parsed-wins semantics:
mkfs.xfs -f -l logdev=/dev/sda1,name=/dev/sda2 /dev/sda3
Disallow it to avoid ambiguity.
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Parsing did this sort of thing:
case D_AGCOUNT:
cli->agcount = getnum(value, opts, D_AGCOUNT);
which was just begging for a cut and paste error between the
case value and the enum passed into getnum/getstr. Pass
"subopt" instead so that it is always consistent with the case.
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Remove logarithm-based options from usage() and manpage.
Fixes: 70f72d5 "mkfs: remove logarithm based CLI options"
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Very few people use the log2 based size options for various mkfs
parameters and they just clutter up the code. Get rid of them.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Now we have a two dimensional conflict array, convert the sector
size CLI option conflict determination to use it. To get the error
specification just right, we also need to tweak how we store
and validate the sector size CLI parameter state in the options
table.
Old:
$ mkfs.xfs -N -s size=4k -d sectsize=512 /dev/pmem0
Cannot specify both -d sectsize and -d sectlog
.....
New:
$ mkfs.xfs -N -s size=4k -d sectsize=512 /dev/pmem0
Cannot specify both -s size and -d sectsize
.....
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Replace the nasty #define + implicit array index definitions with
pre-declared enums and index specific name array declarations.
This cleans up the code quite a bit and the pre-declaration of the
enums allows tables to use indexes from other tables in things like
conflict specifications.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Currently the conflict table is a single dimension, allowing
conflicts to be specified in the same option table. however, we
have conflicts that span option tables (e.g. sector size) and
so we need to encode both the table and the option that conflicts.
Add support for a two dimensional conflict definition and convert
all the code over to use it.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
mkfs jumps through hoops to call libxfs_log_calc_minimum_size() to
set the minimum log size. We already have a xfs_mount at this point,
we just need to set the superblock up slightly earlier and then mkfs
can call libxfs_log_calc_minimum_size() directly. This means we can
remove mkfs/maxtrres.c completely.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Rather than hard coding the global table variable into the
parsing functions.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
sb_feat was a weird mishmash of hardcoded defaults and macros
(which were used in only this place).
Make it consistent by removing the use-once macros, and remove
the unused XFS_DFL_LOG_SIZE while we're in here.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Some of these are more self-explanatory than others ("nci?").
Just mention the bit each one controls, and put them in order
while we're at it. There are more comments about each bit
near its definition.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
It's a bit nuts that we have a projid32bit mkfs option, but
we carry around the inverse of its value in "projid16bit" -
just flip it around for sanity.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Until we have more than one possible source of configuration,
there is no need to emit the only possibility and clutter
the output. We can decide how it should all look when we
get more than one source.
Apply some i18n to the config description, though.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
|
|
Source kernel commit: 68c58e9b9a88c1a9d0c2eaf6c7acefb00f5fbbfb
For rmap removal, refactor the rmap owner checks into a separate
function, then skip the checks if we are performing an unknown-owner
removal.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 33df3a9cf925183a6a169bc3eff2bd0febd1298a
Calling xfs_rmap_free with an unknown owner is supposed to remove any
rmaps covering that range regardless of owner. This is used by the EFI
recovery code to say "we're freeing this, it mustn't be owned by
anything anymore", but for whatever reason xfs_free_ag_extent filters
them out.
Therefore, remove the filter and make xfs_rmap_unmap actually treat it
as a wildcard owner -- free anything that's already there, and if
there's no owner at all then that's fine too.
There are two existing callers of bmap_add_free that take care the rmap
deferred ops themselves and use OWN_UNKNOWN to skip the EFI-based rmap
cleanup; convert these to use OWN_NULL (via helpers), and now we really
require that an RUI (if any) gets added to the defer ops before any EFI.
Lastly, now that xfs_free_extent filters out OWN_NULL rmap free requests,
growfs will have to consult directly with the rmap to ensure that there
aren't any rmaps in the grown region.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
order
Source kernel commit: 0525e952dcceb9fc947c6d395de7f72220c7d081
Under the deferred rmap operation scheme, there's a certain order in
which the rmap deferred ops have to be queued to maintain integrity
during log replay. For alloc/map operations that order is cui -> rui;
for free/unmap operations that order is cui -> rui -> efi. However, the
initial refcount code got the ordering wrong in the free side of things
because it queued refcount free op and an EFI and the refcount free op
queued a rmap free op, resulting in the order cui -> efi -> rui.
If we fail before the efd finishes, the efi recovery will try to do a
wildcard rmap removal and the subsequent rui will fail to find the rmap
and blow up. This didn't ever happen due to other screws up in handling
unknown owner rmap removals, but those other screw ups broke recovery in
other ways, so fix the ordering to follow the intended rules.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: c54854a437a447a6bb1dcb11f60dd01cef3fa597
Move the tracepoint in xfs_iext_insert to after the point where we've
inserted the extent because otherwise we report stale extent data in
the ftrace output.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 8c57b88637d78a723e0854fc3d06c6d4c31a1e0c
In e1a4e37cc7b665 ("xfs: try to avoid blowing out the transaction
reservation when bunmaping a shared extent"), we try to constrain the
amount of real extents we unmap from the data fork in a given call so
that we don't blow out transaction reservations.
However, not all bunmapi operations require a transaction -- if we're
only removing a delalloc extent, no transaction is needed, so we have to
code against that.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
of an attribute
Source kernel commit: 6e643cd094de3bd0f97edcc1db0089afa24d909f
The new attribute leaf buffer is not held locked across the transaction
roll between the shortform->leaf modification and the addition of the
new entry. As a result, the attribute buffer modification being made is
not atomic from an operational perspective. Hence the AIL push can grab
it in the transient state of "just created" after the initial
transaction is rolled, because the buffer has been released. This leads
to xfs_attr3_leaf_verify() asserting that hdr.count is zero, treating
this as in-memory corruption, and shutting down the filesystem.
Darrick ported the original patch to 4.15 and reworked it use the
xfs_defer_bjoin helper and hold/join the buffer correctly across the
second transaction roll.
Signed-off-by: Alex Lyakas <alex@zadarastorage.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: b7b2846fe26f2c0d7f317c874a13d3ecf22670ff
In certain cases, defer_ops callers will lock a buffer and want to hold
the lock across transaction rolls. Similar to ijoined inodes, we want
to dirty & join the buffer with each transaction roll in defer_finish so
that afterwards the caller still owns the buffer lock and we haven't
inadvertently pinned the log.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: 9684010d38eccda733b61106765e9357cf436f65
xfs_trans_log_buf() is responsible for logging the dirty segments of
a buffer along with setting all of the necessary state on the
transaction, buffer, bli, etc., to ensure that the associated items
are marked as dirty and prepared for I/O. We have a couple use cases
that need to to dirty a buffer in a transaction without actually
logging dirty ranges of the buffer. One existing use case is
ordered buffers, which are currently logged with arbitrary ranges to
accomplish this even though the content of ordered buffers is never
written to the log. Another pending use case is to relog an already
dirty buffer across rolled transactions within the deferred
operations infrastructure. This is required to prevent a held
(XFS_BLI_HOLD) buffer from pinning the tail of the log.
Refactor xfs_trans_log_buf() into a new function that contains all
of the logic responsible to dirty the transaction, lidp, buffer and
bli. This new function can be used in the future for the use cases
outlined above. This patch does not introduce functional changes.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: f59cf5c29919d17b61913c3360a7bd29b72975c1
If we create a new file we will need an inode, and usually some metadata
in the parent direction. Aiming for everything to go well despite the
lack of a reservation leads to dirty transactions cancelled under a heavy
create/delete load. This patch removes those nospace transactions, which
will lead to slightly earlier ENOSPC on some workloads, but instead
prevent file system shutdowns due to cancelling dirty transactions for
others.
A customer could observe assertations failures and shutdowns due to
cancelation of dirty transactions during heavy NFS workloads as shown
below:
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728125] XFS: Assertion failed: error != -ENOSPC, file: fs/xfs/xfs_inode.c, line: 1262
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728222] Call Trace:
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728246] [<ffffffff81795daf>] dump_stack+0x63/0x81
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728262] [<ffffffff810a1a5a>] warn_slowpath_common+0x8a/0xc0
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728264] [<ffffffff810a1b8a>] warn_slowpath_null+0x1a/0x20
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728285] [<ffffffffa01bf403>] asswarn+0x33/0x40 [xfs]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728308] [<ffffffffa01bb07e>] xfs_create+0x7be/0x7d0 [xfs]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728329] [<ffffffffa01b6ffb>] xfs_generic_create+0x1fb/0x2e0 [xfs]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728348] [<ffffffffa01b7114>] xfs_vn_mknod+0x14/0x20 [xfs]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728366] [<ffffffffa01b7153>] xfs_vn_create+0x13/0x20 [xfs]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728380] [<ffffffff81231de5>] vfs_create+0xd5/0x140
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728390] [<ffffffffa045ddb9>] do_nfsd_create+0x499/0x610 [nfsd]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728396] [<ffffffffa0465fa5>] nfsd3_proc_create+0x135/0x210 [nfsd]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728401] [<ffffffffa04561e3>] nfsd_dispatch+0xc3/0x210 [nfsd]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728416] [<ffffffffa03bfa43>] svc_process_common+0x453/0x6f0 [sunrpc]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728423] [<ffffffffa03bfdf3>] svc_process+0x113/0x1f0 [sunrpc]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728427] [<ffffffffa0455bcf>] nfsd+0x10f/0x180 [nfsd]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728432] [<ffffffffa0455ac0>] ? nfsd_destroy+0x80/0x80 [nfsd]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728438] [<ffffffff810c0d58>] kthread+0xd8/0xf0
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728441] [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728451] [<ffffffff8179d962>] ret_from_fork+0x42/0x70
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728453] [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728454] ---[ end trace f9822c842fec81d4 ]---
2017-05-30 21:17:06 kernel: ALERT: [ 2670.728477] XFS (sdb): Internal error xfs_trans_cancel at line 983 of file fs/xfs/xfs_trans.c. Caller xfs_create+0x4ee/0x7d0 [xfs]
2017-05-30 21:17:06 kernel: ALERT: [ 2670.728684] XFS (sdb): Corruption of in-memory data detected. Shutting down filesystem
2017-05-30 21:17:06 kernel: ALERT: [ 2670.728685] XFS (sdb): Please umount the filesystem and rectify the problem(s)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Source kernel commit: d41c6172bd4031979eab722c265a2e5764383c3c
Use _GOTO instead of _RETURN so we can free the allocated
cursor on error.
Fixes: bf80628 ("xfs: remove xfs_bmse_shift_one")
Fixes-coverity-id: 1423813, 1423676
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Prevent libxfs_nproc from returning a negative/zero CPU count if
platform_nproc happens to error out.
Fixes-coverity-id: 1425909
Fixes-coverity-id: 1425910
Fixes-coverity-id: 1425913
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Put all the internal macro definitions at the top of the file so
they can be easily found and remove teh ones that aren't used
anymore.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Move all the error and helper functions like conflict() to the top
of the file so we can get rid of the forward declarations.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Now the factoring is complete, we can remove the remaining temporary
code that was used to isolate the factoring from the rest of the
code.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
The data and log stripe calculations a spaghettied all over the mkfs
code. This patch pulls all of the different chunks of code together
into calc_stripe_factors() and removes all the redundant/repeated
checks and calculations that are made.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Hiding references to variables inside macros instead of passing them
as parameters is just plain nasty. Fix it before going any further.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
This also adds a check that disallows changing the sector size on
internal logs.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Start factoring all the sector size validation code into a
function that takes cli, dft and cfg structures. This starts
removing option flags and some of the temporary code in the input
parsing structures.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Names like "qflag" and "Nflag" don't tell us what the flag is
supposed to do, so rename them to be obvious in function.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
mkfs has lots of options that require default values. Some of these
are centralised, but others aren't. Introduce a new structure
designed to hold default values for all the parameters that need
defaults in one place.
This structure also provides a mechanism for providing mkfs defaults
from a config file. This is not implemented in this series, but a
comment is left where it is expected this functionality will hook
in.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
There are some slight changes to the way log alignment is calculated
in the change. Instead of using a flag, it checks the log start
block to see if it's different to the first free block in the log
AG, and if it is different then does the aligned setup. This means
we no longer have to care if the log is aligned or not, the code
will do the right thing in all cases.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Prior to formating the device(s), we have to take several steps to
prepare them and check that they are appropriate for the formatting
that is about to take place. Pull all this into a single function
that is run before mounting the libxfs infrastructure.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Formatting the on disk XFS structures requires a certain set of
validated and calculated parameters. By the time we start writing
information to disk this has all been done. Abstract this information
out into a separate structures and initialise it with all the
calculated parameters so we can factor the mkfs formatting code
to use it.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Abstract out the common subopt parsing code into a common function
and type table so we can factor the parsing code. Add the function
stubs in preparation for factoring.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
We need to hold the values set from command line options so they can
later be validated and discriminated from the default values that
might be set. This structure will form a connector between the input
parsing and the rest of the mkfs code.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Use const for all the tables to remove most of the (char **) casts.
This adds a couple of temporary (const char **) casts that go away
as the input parsing is factored.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
If the log is on the data device (i.e. internal) then it should
match the sector size the data device is using. If they don't match,
then one or the other doesn't have atomic sector writes and we could
have crash consistency problems. Not to mention that it's simply
wrong to have two different sector sizes for the same device.
Hence enforce the requirement that an internal log device always has
the same sector size as the data device.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Several data structures are missing padding fields from their field
definitions. Add them so that they can be printed out if explicitly
requested.
Fix the AGI field order to be consistent with the structure
definition while we're at it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
[sandeen: make them available but not printed by default]
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
We are very inconsistent about how we print padding fields in on-disk
structures -- sometimes we hide it from printall, sometimes we deviate
from unsigned hex values, etc. Make this all consistent -- always hide
padding values when printing the whole structure, always print them as
unsigned hex integers when explicitly requested.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
[sandeen: switch to never-print instead of always-print]
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Now that we've made a generic workqueue in libfrog, we can remove the
implementation in xfs_repair and turn the old functions into wrappers
that call do_error if they fail. There are no functional changes in
this patch, though some of the names and types have changed.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Make it so that we can tear down the file descriptor hash table.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Add a function to tear down the fs_table when we're done
messing with paths.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Move the fs_table code into libfrog since it's not really a command.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
|
Move all the conversion functions out of libxcmd since they'll be used
by scrub, which doesn't have a commandline.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|