linux

mirror of https://github.com/torvalds/linux synced 2024-09-29 07:50:45 +00:00

Author	SHA1	Message	Date
Darrick J. Wong	9cfb9b4747	xfs: provide a centralized method for verifying inline fork data Replace the current haphazard dir2 shortform verifier callsites with a centralized verifier function that can be called either with the default verifier functions or with a custom set. This helps us strengthen integrity checking while providing us with flexibility for repair tools. xfs_repair wants this to be able to supply its own verifier functions when trying to fix possibly corrupt metadata. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-08 10:54:47 -08:00
Darrick J. Wong	dc042c2d8f	xfs: refactor short form directory structure verifier function Change the short form directory structure verifier function to return the instruction pointer of a failing check or NULL if everything's ok. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-08 10:54:46 -08:00
Darrick J. Wong	0795e004fd	xfs: create structure verifier function for short form symlinks Create a function to check the structure of short form symlink targets. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-08 10:54:46 -08:00
Darrick J. Wong	1e1bbd8e7e	xfs: create structure verifier function for shortform xattrs Create a function to perform structure verification for short form extended attributes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-08 10:54:46 -08:00
Darrick J. Wong	71493b839e	xfs: move inode fork verifiers to xfs_dinode_verify Consolidate the fork size and format verifiers to xfs_dinode_verify so that we can reject bad inodes earlier and in a single place. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-08 10:54:46 -08:00
Darrick J. Wong	50aa90ef03	xfs: verify dinode header first Move the v3 inode integrity information (crc, owner, metauuid) before we look at anything else in the inode so that we don't waste time on a torn write or a totally garbled block. This makes xfs_dinode_verify more consistent with the other verifiers. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-08 10:54:46 -08:00
Darrick J. Wong	bc1a09b8e3	xfs: refactor verifier callers to print address of failing check Refactor the callers of verifiers to print the instruction address of a failing check. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-08 10:54:46 -08:00
Darrick J. Wong	a6a781a58b	xfs: have buffer verifier functions report failing address Modify each function that checks the contents of a metadata buffer to return the instruction address of the failing test so that we can report more precise failure errors to the log. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-08 10:54:46 -08:00
Darrick J. Wong	31ca03c92c	xfs: refactor xfs_verifier_error and xfs_buf_ioerror Since all verification errors also mark the buffer as having an error, we can combine these two calls. Later we'll add a xfs_failaddr_t parameter to promote the idea of reporting corruption errors and the address of the failing check to enable better debugging reports. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-08 10:54:45 -08:00
Darrick J. Wong	9101d3707b	xfs: remove XFS_WANT_CORRUPTED_RETURN from dir3 data verifiers Since __xfs_dir3_data_check verifies on-disk metadata, we can't have it noisily blowing asserts and hanging the system on corrupt data coming in off the disk. Instead, have it return a boolean like all the other checker functions, and only have it noisily fail if we fail in debug mode. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-08 10:54:45 -08:00
Darrick J. Wong	e1e55aaf1c	xfs: refactor short form btree pointer verification Now that we have xfs_verify_agbno, use it to verify short form btree pointers instead of open-coding them. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-08 10:54:45 -08:00
Darrick J. Wong	8368a6019d	xfs: refactor long-format btree header verification routines Create two helper functions to verify the headers of a long format btree block. We'll use this later for the realtime rmapbt. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-08 10:54:45 -08:00
Darrick J. Wong	59f6fec3bd	xfs: remove XFS_FSB_SANITY_CHECK We already have a function to verify fsb pointers, so get rid of the last users of the (less robust) macro. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-08 10:54:45 -08:00
Darrick J. Wong	d658e72b4a	xfs: distinguish between corrupt inode and invalid inum in xfs_scrub_get_inode In xfs_scrub_get_inode, we don't do a good enough job distinguishing EINVAL returns from xfs_iget w/ IGET_UNTRUSTED -- this can happen if the passed in inode number is invalid (past eofs, inobt says it isn't an inode) or if the inum is actually valid but the inode buffer fails verifier. In the first case we still want to return ENOENT, but in the second case we want to capture the corruption error. Therefore, if xfs_iget returns EINVAL, try the raw imap lookup. If that succeeds, we conclude it's a corruption error, otherwise we just bounce out to userspace. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-08 10:49:04 -08:00
Darrick J. Wong	1ad1205e71	xfs: always grab transaction when scrubbing inode Always allocate a transaction for inode scrubbing, even if the _iget fails. This is something that is nice to have now for consistency with the other scrubbers but will become critical when we get to online repair where we'll actually use the transaction + raw buffer read to fix the verifier errors. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-08 10:49:03 -08:00
Darrick J. Wong	2b9e9b5771	xfs: xfs_scrub_bmap should use for_each_xfs_iext Refactor xfs_scrub_bmap to use for_each_xfs_iext now that it exists. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-08 10:49:03 -08:00
Darrick J. Wong	e5b37faa93	xfs: catch a few more error codes when scrubbing secondary sb The superblock validation routines return a variety of error codes to reject a mount request. For scrub we can assume that the mount succeeded, so if we see these things appear when scrubbing secondary sb X, we can treat them all like corruption. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-08 10:49:02 -08:00
Darrick J. Wong	5a0f433745	xfs: ignore agfl read errors when not scrubbing agfl In xfs_scrub_ag_read_headers, if we're not scrubbing the AGFL but hit a read error reading the AGFL, we should reset the error code so that it doesn't propagate up into the caller. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-08 10:49:02 -08:00
Brian Foster	c017cb5ddf	xfs: eliminate duplicate icreate tx reservation functions The create transaction reservation calculation has two different branches of code depending on whether the filesystem is a v5 format fs or older. Each branch considers the max reservation between the allocation case (new chunk allocation + record insert) and the modify case (chunk exists, record modification) of inode allocation. The modify case is the same for both superblock versions with the exception of the finobt. The finobt helper checks the feature bit, however, and so the modify case already shares the same code. Now that inode chunk allocation has been refactored into a helper that checks the superblock version to calculate the appropriate reservation for the create transaction, the only remaining difference between the create and icreate branches is the call to the finobt helper. As noted above, the finobt helper is a no-op when the feature is not enabled. Therefore, these branches are effectively duplicate and can be condensed. Remove the xfs_calc_create_() branch of functions and update the various callers to use the xfs_calc_icreate_() variant. The latter creates the same reservation size for v4 create transactions as the removed branch. As such, this patch does not result in transaction reservation changes. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-01-08 10:41:38 -08:00
Brian Foster	57af33e451	xfs: refactor inode chunk alloc/free tx reservation The reservation for the various forms of inode allocation is scattered across several different functions. This includes two variants of chunk allocation (v5 icreate transactions vs. older create transactions) and the inode free transaction. To clean up some of this code and clarify the purpose of specific allocfree reservations, continue the pattern of defining helper functions for smaller operational units of broader transactions. Refactor the reservation into an inode chunk alloc/free helper that considers the various conditions based on filesystem format. An inode chunk free involves an extent free and buffer invalidations. The latter requires reservation for log headers only. An inode chunk allocation modifies the free space btrees and logs the chunk on v4 supers. v5 supers initialize the inode chunk using ordered buffers and so do not log the chunk. As a side effect of this refactoring, add one more allocfree res to the ifree transaction. Technically this does not serve a specific purpose because inode chunks are freed via deferred operations and thus occur after a transaction roll. tr_ifree has a bit of a history of tx overruns caused by too many agfl fixups during sustained file deletion workloads, so add this extra reservation as a form of padding nonetheless. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-01-08 10:41:38 -08:00
Brian Foster	f03c78f397	xfs: include an allocfree res for inobt modifications Analysis of recent reports of log reservation overruns and code inspection has uncovered that the reservations associated with inode operations may not cover the worst case scenarios. In particular, many cases only include one allocfree res. for a particular operation even though said operations may also entail AGFL fixups and inode btree block allocations in addition to the actual inode chunk allocation. This can easily turn into two or three block allocations (or frees) per operation. In theory, the only way to define the worst case reservation is to include an allocfree res for each individual allocation in a transaction. Since that is impractical (we can perform multiple agfl fixups per tx and not every allocation results in a full tree operation), we need to find a reasonable compromise that addresses the deficiency in practice without blowing out the size of the transactions. Since the inode btrees are not filled by the AGFL, record insertion and removal can directly result in block allocations and frees depending on the shape of the tree. These allocations and frees occur in the same transaction context as the inobt update itself, but are separate from the allocation/free that might be required for an inode chunk. Therefore, it makes sense to assume that an [f]inobt insert/remove can directly result in one or more block allocations on behalf of the tree. Refactor the inode transaction reservations to include one allocfree res. per inode btree modification to cover allocations required by the tree itself. This separates the reservation required to allocate the inode chunk from the reservation required for inobt record insertion/removal. Apply the same logic to the finobt. This results in killing off the finobt modify condition because we no longer assume that the broader transaction reservation will cover finobt block allocations and finobt shape changes can occur in either of the inobt allocation or modify situations. Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-01-08 10:41:37 -08:00
Brian Foster	a606ebdb85	xfs: truncate transaction does not modify the inobt The truncate transaction does not ever modify the inode btree, but includes an associated log reservation. Update xfs_calc_itruncate_reservation() to remove the reservation associated with inobt updates. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-01-08 10:41:37 -08:00
Brian Foster	e8341d9f63	xfs: fix up agi unlinked list reservations The current AGI unlinked list addition and removal reservations do not reflect the worst case log usage. An unlinked list removal can log up to two on-disk inode clusters but only includes reservation for one. An unlinked list addition logs the on-disk cluster but includes reservation for an in-core inode. Update the AGI unlinked list reservation helpers to calculate the correct worst case reservation for the associated operations. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-01-08 10:41:36 -08:00
Brian Foster	a6f485908d	xfs: include inobt buffers in ifree tx log reservation The tr_ifree transaction handles inode unlinks and inode chunk frees. The current transaction calculation does not accurately reflect worst case changes to the inode btree, however. The inobt portion of the current transaction reservation only covers modification of a single inobt buffer (for the particular inode record). This is a historical artifact from the days before XFS supported full inode chunk removal. When support for inode chunk removal was added in commit 254f6311ed1b ("Implement deletion of inode clusters in XFS."), the additional log reservation required for chunk removal was not added correctly. The new reservation only considered the header overhead of associated buffers rather than the full contents of the btrees and AGF and AGFL buffers affected by the transaction. The reservation for the free space btrees was subsequently fixed up in commit 5fe6abb82f76 ("Add space for inode and allocation btrees to ITRUNCATE log reservation"), but the res. for full inobt joins has never been added. Further review of the ifree reservation uncovered a couple more problems: - The undocumented +2 blocks are intended for the AGF and AGFL, but are also not sized correctly and should be logged as full sectors (not FSBs). - The additional single block header is undocumented and serves no apparent purpose. Update xfs_calc_ifree_reservation() to include a full inobt join in the reservation calculation. Refactor the undocumented blocks appropriately and fix up the comments to reflect the current calculation. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-01-08 10:41:36 -08:00
Brian Foster	2c8f626539	xfs: print transaction log reservation on overrun The transaction dump code displays the content and reservation consumption of a particular transaction in the event of an overrun. It currently displays the reservation associated with the transaction ticket, but not the original reservation attached to the transaction. The latter value reflects the original transaction reservation calculation before additional reservation overhead is assigned, such as for the CIL context header and potential split region headers. Update xlog_print_trans() to also print the original transaction reservation in the event of overrun. This provides a reference point to identify how much reservation overhead was added to a particular ticket by xfs_log_calc_unit_res(). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-01-08 10:41:35 -08:00
Darrick J. Wong	29c1c123a3	xfs: scrub inode nsec fields Check that the nanosecond fields in each timestamp aren't larger than a billion. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2018-01-08 10:41:35 -08:00
Eric Sandeen	8e63083762	xfs: move all scrub input checking to xfs_scrub_validate There were ad-hoc checks for some scrub types but not others; mark each scrub type with ... it's type, and use that to validate the allowed and/or required input fields. Moving these checks out of xfs_scrub_setup_ag_header makes it a thin wrapper, so unwrap it in the process. Signed-off-by: Eric Sandeen <sandeen@redhat.com> [darrick: add xfs_ prefix to enum, check scrub args after checking type] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-01-08 10:41:34 -08:00
Eric Sandeen	0a085ddf0e	xfs: factor out scrub input checking Do this before adding more core checks. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-01-08 10:41:34 -08:00
Eric Sandeen	bfb3e9b926	xfs: explicitly initialize meta_scrub_ops array by type An implicit mapping to type by order of initialization seems error-prone, and doesn't lend itself to cscope-ing. Also add sanity checks about size of array vs. max types, and a defensive check that ->scrub exists before using it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-01-08 10:41:33 -08:00
Richard Wareing	a015831596	xfs: Show realtime device stats on statfs calls if realtime flags set - Reports realtime device free blocks in statfs calls if (realtime) inheritance bit is set on the inode of directory, or realtime flag in the case of files. This is a bit more intuitive, especially for use-cases which are using a much larger device for the realtime device. - Add XFS_IS_REALTIME_MOUNT option to gate based on the existence of a realtime device on the mount, similar to the XFS_IS_REALTIME_INODE option. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Richard Wareing <rwareing@fb.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-01-08 10:41:33 -08:00
Linus Torvalds	12e971b652	Changes since last update: - Fix resource cleanup of failed quota initialization - Fix integer overflow problems wrt s_maxbytes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABCgAGBQJaS8yUAAoJEPh/dxk0SrTrrNMP+gLCitWenObhf6uA0Aysb3Vr EnhNFaqZA7RRLbQRwLESblvhExp9WTrtFmWOAFh1Q0ETBEIazIGkXfKDeOChxaCY LMPb83vQarZoV++HoiBeFbShf39dFw2ufGHyveZwvxk4kgYgQRFzIVZbRTg7CA/C nMLPZ9IBDBhEwnCVpH+gKJMcU6j5I9IIePwaEIKnB0o99fsEgZfnM0B4Wl0DRrzn nE6DOvkGZiNF4on1J2KgL2rB0r+VEyyMtBTCRs519rEaa8ACFUQDqEqoUIC92SnS pD/n9S2JwVH1dLX7cRoiMQcX/r4do83LlK0IvMswApMuNqYRQU6332lwosdgo7KQ 8+antAlVKuqMAGNvhVWMy1DuaRO5gCqRwL1wpzebNHsw4eRsDD2MNkeLXbM2P2oL 5OflIrPLMlLORlPtwbJclm8CcnQzQGMAa5yEDJcU1PIWH/urdRd+KqWQ+N0Zfj6m J3L4tXDY61hqwZ8BISe+/9iFDooGV/6Ri4mbez4UWiN6UfaKKokaFZzbo2n3VTb9 Htx5KsrzslfGWAnoeIT9GnyFhT4te9IHT69jl2AorvxpmdXdfOI8TgrzS8TzuKGD N6TadC4IZGLLpww+rND6Bywdc8/garmFbck+/nVdMRwNAsZUE+m08OrNFMCqmYms p9jIA2tRh94Hu4Awi8hG =2rs/ -----END PGP SIGNATURE----- Merge tag 'xfs-4.15-fixes-10' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull XFS fixes from Darrick Wong: "I have just a few fixes for bugs and resource cleanup problems this week: - Fix resource cleanup of failed quota initialization - Fix integer overflow problems wrt s_maxbytes" * tag 'xfs-4.15-fixes-10' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix s_maxbytes overflow problems xfs: quota: check result of register_shrinker() xfs: quota: fix missed destroy of qi_tree_lock	2018-01-05 12:59:32 -08:00
Darrick J. Wong	b4d8ad7fd3	xfs: fix s_maxbytes overflow problems Fix some integer overflow problems if offset + count happen to be large enough to cause an integer overflow. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2018-01-02 10:16:32 -08:00
Aliaksei Karaliou	3a3882ff26	xfs: quota: check result of register_shrinker() xfs_qm_init_quotainfo() does not check result of register_shrinker() which was tagged as __must_check recently, reported by sparse. Signed-off-by: Aliaksei Karaliou <akaraliou.dev@gmail.com> [darrick: move xfs_qm_destroy_quotainos nearer xfs_qm_init_quotainos] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-01-02 10:16:32 -08:00
Aliaksei Karaliou	2196881566	xfs: quota: fix missed destroy of qi_tree_lock xfs_qm_destroy_quotainfo() does not destroy quotainfo->qi_tree_lock while destroys quotainfo->qi_quotaofflock. Signed-off-by: Aliaksei Karaliou <akaraliou.dev@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-01-02 10:16:32 -08:00
Linus Torvalds	fca0e39b2b	Changes since last update: - Fix a locking problem during xattr block conversion that could lead to the log checkpointing thread to try to write an incomplete buffer to disk, which leads to a corruption shutdown - Fix a null pointer dereference when removing delayed allocation extents - Remove post-eof speculative allocations when reflinking a block past current inode size so that we don't just leave them there and assert on inode reclaim - Relax an assert which didn't accurately reflect the way locking works and would trigger under heavy io load - Avoid infinite loop when cancelling copy on write extents after a writeback failure - Try to avoid copy on write transaction reservation overflows when remapping after a successful write - Fix various problems with the copy-on-write reservation automatic garbage collection not being cleaned up properly during a ro remount - Fix problems with rmap log items being processed in the wrong order, leading to corruption shutdowns - Fix problems with EFI recovery wherein the "remove any rmapping if present" mechanism wasn't actually doing anything, which would lead to corruption problems later when the extent is reallocated, leading to multiple rmaps for the same extent -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABCgAGBQJaO+dwAAoJEPh/dxk0SrTrY8YP/R9AXH3Wt6S2QGGjZfXURa22 /cioJKFl8hWay00ZT8Zcj4Pdx6R+stvausj5ECDvpdWZG+d28e61c1bxg+bqRYO5 JWXikWnAa80RQ5uEjOXHoUjAgk6u6YYuQHEuHH/xA0nL4Cw98WLSzLjqk7ZU53rx P17dgUWWHta/w8OpxG9UG5pxvNW3VRitiyCMWxa2gzBPncHnCk3fu9lInpDzH9S+ xakwCRtfiAykoOG/O5pnMg6vw5r6ENwK7DymxXgqF+Vv/HzgMbeJs+9UON2eACtp ECHGffN4pXpqWVcGDMs5cWCOfLUEjxCrotMLYpIrdZs5DptmOcOWpQpHWl4JiaXB rqAxx3D0Yo+00ENponM01un8UgCXF5gqsDGyTzn99aPpDVqxCJw1XmSdOXRhcnnF At2raUkXF+nbqaVwL3Y7ZJuOKs1hi3HpsYwwfvClR8cTFk/BaY6sQ4QnVR0Ggkg6 8lZxeDb8VdoUjWO11sX1edwGtR8g+p3PSHiUFSnh1JsbP2I0R+TV+j5Y9rMotxFT Eq6+Ehp889GeSpEBCrDpMgNIABMjBxoi5JvOwXSUNhF5Rh/1Vf//7v31nXcyVlah a95IhCYfQLFMtaYaGr2ElvdO+Qs1+ppsD207I4H86XotjRkvD7U+mJoYm9EaujQX jgUDdZEsP5h5DX524VHU =i51V -----END PGP SIGNATURE----- Merge tag 'xfs-4.15-fixes-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Darrick Wong: "Here are some XFS fixes for 4.15-rc5. Apologies for the unusually large number of patches this late, but I wanted to make sure the corruption fixes were really ready to go. Changes since last update: - Fix a locking problem during xattr block conversion that could lead to the log checkpointing thread to try to write an incomplete buffer to disk, which leads to a corruption shutdown - Fix a null pointer dereference when removing delayed allocation extents - Remove post-eof speculative allocations when reflinking a block past current inode size so that we don't just leave them there and assert on inode reclaim - Relax an assert which didn't accurately reflect the way locking works and would trigger under heavy io load - Avoid infinite loop when cancelling copy on write extents after a writeback failure - Try to avoid copy on write transaction reservation overflows when remapping after a successful write - Fix various problems with the copy-on-write reservation automatic garbage collection not being cleaned up properly during a ro remount - Fix problems with rmap log items being processed in the wrong order, leading to corruption shutdowns - Fix problems with EFI recovery wherein the "remove any rmapping if present" mechanism wasn't actually doing anything, which would lead to corruption problems later when the extent is reallocated, leading to multiple rmaps for the same extent" * tag 'xfs-4.15-fixes-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: only skip rmap owner checks for unknown-owner rmap removal xfs: always honor OWN_UNKNOWN rmap removal requests xfs: queue deferred rmap ops for cow staging extent alloc/free in the right order xfs: set cowblocks tag for direct cow writes too xfs: remove leftover CoW reservations when remounting ro xfs: don't be so eager to clear the cowblocks tag on truncate xfs: track cowblocks separately in i_flags xfs: allow CoW remap transactions to use reserve blocks xfs: avoid infinite loop when cancelling CoW blocks after writeback failure xfs: relax is_reflink_inode assert in xfs_reflink_find_cow_mapping xfs: remove dest file's post-eof preallocations before reflinking xfs: move xfs_iext_insert tracepoint to report useful information xfs: account for null transactions in bunmapi xfs: hold xfs_buf locked between shortform->leaf conversion and the addition of an attribute xfs: add the ability to join a held buffer to a defer_ops	2017-12-22 12:27:27 -08:00
Darrick J. Wong	68c58e9b9a	xfs: only skip rmap owner checks for unknown-owner rmap removal For rmap removal, refactor the rmap owner checks into a separate function, then skip the checks if we are performing an unknown-owner removal. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2017-12-21 08:48:38 -08:00
Darrick J. Wong	33df3a9cf9	xfs: always honor OWN_UNKNOWN rmap removal requests Calling xfs_rmap_free with an unknown owner is supposed to remove any rmaps covering that range regardless of owner. This is used by the EFI recovery code to say "we're freeing this, it mustn't be owned by anything anymore", but for whatever reason xfs_free_ag_extent filters them out. Therefore, remove the filter and make xfs_rmap_unmap actually treat it as a wildcard owner -- free anything that's already there, and if there's no owner at all then that's fine too. There are two existing callers of bmap_add_free that take care the rmap deferred ops themselves and use OWN_UNKNOWN to skip the EFI-based rmap cleanup; convert these to use OWN_NULL (via helpers), and now we really require that an RUI (if any) gets added to the defer ops before any EFI. Lastly, now that xfs_free_extent filters out OWN_NULL rmap free requests, growfs will have to consult directly with the rmap to ensure that there aren't any rmaps in the grown region. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2017-12-21 08:48:38 -08:00
Darrick J. Wong	0525e952dc	xfs: queue deferred rmap ops for cow staging extent alloc/free in the right order Under the deferred rmap operation scheme, there's a certain order in which the rmap deferred ops have to be queued to maintain integrity during log replay. For alloc/map operations that order is cui -> rui; for free/unmap operations that order is cui -> rui -> efi. However, the initial refcount code got the ordering wrong in the free side of things because it queued refcount free op and an EFI and the refcount free op queued a rmap free op, resulting in the order cui -> efi -> rui. If we fail before the efd finishes, the efi recovery will try to do a wildcard rmap removal and the subsequent rui will fail to find the rmap and blow up. This didn't ever happen due to other screws up in handling unknown owner rmap removals, but those other screw ups broke recovery in other ways, so fix the ordering to follow the intended rules. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2017-12-21 08:48:38 -08:00
Darrick J. Wong	86d692bfad	xfs: set cowblocks tag for direct cow writes too If a user performs a direct CoW write, we end up loading the CoW fork with preallocated extents. Therefore, we must set the cowblocks tag so that they can be cleared out if we run low on space. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2017-12-21 08:47:37 -08:00
Darrick J. Wong	10ddf64e42	xfs: remove leftover CoW reservations when remounting ro When we're remounting the filesystem readonly, remove all CoW preallocations prior to going ro. If the fs goes down after the ro remount, we never clean up the staging extents, which means xfs_check will trip over them on a subsequent run. Practically speaking, the next mount will clean them up too, so this is unlikely to be seen. Since we shut down the cowblocks cleaner on remount-ro, we also have to make sure we start it back up if/when we remount-rw. Found by adding clonerange to fsstress and running xfs/017. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2017-12-21 08:47:32 -08:00
Darrick J. Wong	363e59baa4	xfs: don't be so eager to clear the cowblocks tag on truncate Currently, xfs_itruncate_extents clears the cowblocks tag if i_cnextents is zero. This is wrong, since i_cnextents only tracks real extents in the CoW fork, which means that we could have some delayed CoW reservations still in there that will now never get cleaned. Fix a further bug where we /don't/ clear the reflink iflag if there are any attribute blocks -- really, it's only safe to clear the reflink flag if there are no data fork extents and no cow fork extents. Found by adding clonerange to fsstress in xfs/017. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2017-12-21 08:47:28 -08:00
Darrick J. Wong	91aae6be41	xfs: track cowblocks separately in i_flags The EOFBLOCKS/COWBLOCKS tags are totally separate things, so track them with separate i_flags. Right now we're abusing IEOFBLOCKS for both, which is totally bogus because we won't tag the inode with COWBLOCKS if IEOFBLOCKS was set by a previous tagging of the inode with EOFBLOCKS. Found by wiring up clonerange to fsstress in xfs/017. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2017-12-20 17:11:48 -08:00
Darrick J. Wong	a192de265b	xfs: allow CoW remap transactions to use reserve blocks Since we as yet have no way of holding on to the indlen blocks that are reserved as part of CoW fork delalloc reservations, let the CoW remap transaction dip into the reserves so that we avoid failing writes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2017-12-14 09:20:11 -08:00
Darrick J. Wong	9d40fba8b2	xfs: avoid infinite loop when cancelling CoW blocks after writeback failure When we're cancelling a cow range, we don't always delete each extent that we iterate, so we have to move icur backwards in the list to avoid an infinite loop. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2017-12-14 09:20:11 -08:00
Darrick J. Wong	73353f486c	xfs: relax is_reflink_inode assert in xfs_reflink_find_cow_mapping We don't hold the ilock through the entire sequence of xfs_writepage_map -> xfs_map_cow -> xfs_reflink_find_cow_mapping. This means that we can race with another thread that is trying to clear the inode reflink flag, with the result that the flag is set for the xfs_map_cow check but cleared before we get to the assert in find_cow_mapping. When this happens, we blow the assert even though everything is fine. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2017-12-14 09:20:11 -08:00
Darrick J. Wong	5c989a0ee0	xfs: remove dest file's post-eof preallocations before reflinking If we try to reflink into a file with post-eof preallocations at an offset well past the preallocations, we increase i_size as one would expect. However, those allocations do not have page cache backing them, so they won't get cleaned out on their own. This leads to asserts in the collapse/insert range code and xfs_destroy_inode when they encounter delalloc extents they weren't expecting to find. Since there are plenty of other places where we dump those post-eof blocks, do the same to the reflink destination file before we start remapping extents. This was found by adding clonerange support to fsstress and running it in write-only mode. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2017-12-14 09:20:11 -08:00
Darrick J. Wong	c54854a437	xfs: move xfs_iext_insert tracepoint to report useful information Move the tracepoint in xfs_iext_insert to after the point where we've inserted the extent because otherwise we report stale extent data in the ftrace output. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2017-12-14 09:20:11 -08:00
Darrick J. Wong	8c57b88637	xfs: account for null transactions in bunmapi In `e1a4e37cc7` ("xfs: try to avoid blowing out the transaction reservation when bunmaping a shared extent"), we try to constrain the amount of real extents we unmap from the data fork in a given call so that we don't blow out transaction reservations. However, not all bunmapi operations require a transaction -- if we're only removing a delalloc extent, no transaction is needed, so we have to code against that. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2017-12-14 09:20:10 -08:00
Darrick J. Wong	6e643cd094	xfs: hold xfs_buf locked between shortform->leaf conversion and the addition of an attribute The new attribute leaf buffer is not held locked across the transaction roll between the shortform->leaf modification and the addition of the new entry. As a result, the attribute buffer modification being made is not atomic from an operational perspective. Hence the AIL push can grab it in the transient state of "just created" after the initial transaction is rolled, because the buffer has been released. This leads to xfs_attr3_leaf_verify() asserting that hdr.count is zero, treating this as in-memory corruption, and shutting down the filesystem. Darrick ported the original patch to 4.15 and reworked it use the xfs_defer_bjoin helper and hold/join the buffer correctly across the second transaction roll. Signed-off-by: Alex Lyakas <alex@zadarastorage.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2017-12-14 09:18:12 -08:00
Darrick J. Wong	b7b2846fe2	xfs: add the ability to join a held buffer to a defer_ops In certain cases, defer_ops callers will lock a buffer and want to hold the lock across transaction rolls. Similar to ijoined inodes, we want to dirty & join the buffer with each transaction roll in defer_finish so that afterwards the caller still owns the buffer lock and we haven't inadvertently pinned the log. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2017-12-14 09:17:35 -08:00

1 2 3 4 5 ...

4869 commits