Commit graph

5136 commits

Author SHA1 Message Date
Val Packett daa2c99c89 p9fs: implement working putpages (fix mmap write)
Mostly copied from smbfs. This driver in its current state has the exact
same issue that prevents the generic putpages implementation from
working.

Sponsored by:		https://www.patreon.com/valpackett
Reviewed by:		dfr
Differential Revision:	https://reviews.freebsd.org/D45639
MFC after:		3 months
2024-06-24 17:11:47 +01:00
Alan Somers 6efba04df3 fusefs: fix two bugs regarding _PC_MIN_HOLE_SIZE
Background:

If a user does pathconf(_, _PC_MIN_HOLE_SIZE) on a fusefs file system,
the kernel must actually issue a FUSE_LSEEK operation in order to
determine whether the server supports it.  We cache that result, so we
only have to send FUSE_LSEEK the first time that _PC_MIN_HOLE_SIZE is
requested on any given mountpoint.
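The per-mountpoint caching described above can be modelled in a few lines. This is a sketch under stated assumptions, not the fusefs code: `struct mount_model`, `send_fuse_lseek()` and `min_hole_size_supported()` are hypothetical names, and the FUSE_LSEEK request is reduced to a counter plus a boolean answer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state cache; none of these names come from fusefs. */
enum lseek_support { LSEEK_UNKNOWN, LSEEK_SUPPORTED, LSEEK_UNSUPPORTED };

struct mount_model {
	enum lseek_support lseek_support;	/* cached probe result */
	int lseek_probes;			/* FUSE_LSEEK requests "sent" */
};

/* Stand-in for issuing FUSE_LSEEK to the server. */
static bool
send_fuse_lseek(struct mount_model *mp, bool server_supports)
{
	mp->lseek_probes++;
	return (server_supports);
}

/* Answer a _PC_MIN_HOLE_SIZE query, probing the server only once. */
bool
min_hole_size_supported(struct mount_model *mp, bool server_supports)
{
	if (mp->lseek_support == LSEEK_UNKNOWN)
		mp->lseek_support = send_fuse_lseek(mp, server_supports) ?
		    LSEEK_SUPPORTED : LSEEK_UNSUPPORTED;
	assert(mp->lseek_support != LSEEK_UNKNOWN);
	return (mp->lseek_support == LSEEK_SUPPORTED);
}
```

The first query on a mount pays for one probe; later queries reuse the cached answer and ignore what the server would say now.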

Problem 1:

Unlike fpathconf, pathconf operates on files that may not be open.  But
FUSE_LSEEK requires the file to be open.  As described in PR 278135,
FUSE_LSEEK cannot be sent for unopened files, causing _PC_MIN_HOLE_SIZE
to wrongly report EINVAL.  We never noticed that before because the
fusefs test suite only uses fpathconf, not pathconf.  Fix this bug by
opening the file if necessary.

Problem 2:

On a completely sparse file, with no data blocks at all, FUSE_LSEEK with
SEEK_DATA would fail with ENXIO.  That's correct behavior, but
fuse_vnop_pathconf wrongly interpreted that as "FUSE_LSEEK not
supported".  Fix the interpretation.
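A hypothetical helper illustrates the corrected interpretation. It assumes the probe result arrives as an errno-style integer (0 on success); `lseek_reply_means_supported()` is an illustrative name, not a fusefs function, and lumping every other error in with "unsupported" is a simplification of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Does a FUSE_LSEEK(SEEK_DATA) reply show that the server supports it? */
bool
lseek_reply_means_supported(int error)
{
	assert(error >= 0);
	switch (error) {
	case 0:		/* data found: clearly supported */
	case ENXIO:	/* no data at/after offset, e.g. a fully sparse file */
		return (true);
	case ENOSYS:	/* FUSE servers report unimplemented ops with ENOSYS */
	default:	/* simplification: other errors treated as unsupported */
		return (false);
	}
}
```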

PR:		278135
MFC after:	1 week
Sponsored by:	Axcient
Differential Revision: https://reviews.freebsd.org/D44618
2024-06-24 10:02:02 -06:00
Doug Rabson 56e4622588 p9fs: fix lookup of "." for lib9p-based 9P servers
The lib9p implementation takes a strict interpretation of the Twalk RPC
call and returns an error for attempts to look up ".".  The workaround is
to fake the lookup locally.
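A minimal model of the local workaround, assuming a lookup routine that would otherwise always send Twalk. `struct p9node`, `send_twalk()` and `p9_lookup_model()` are hypothetical; the only point is that "." short-circuits to the directory itself before any RPC is issued.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins; not the p9fs data structures. */
struct p9node {
	const char *name;
	int twalks_sent;	/* Twalk RPCs issued from this directory */
};

/* Stand-in for a Twalk RPC; a strict (lib9p-like) server rejects ".". */
static int
send_twalk(struct p9node *dir, const char *name)
{
	dir->twalks_sent++;
	return (strcmp(name, ".") == 0 ? -1 : 0);	/* -1: server error */
}

/* Resolve "." locally; everything else goes to the server. */
struct p9node *
p9_lookup_model(struct p9node *dir, const char *name, struct p9node *child)
{
	assert(dir != NULL && name != NULL);
	if (strcmp(name, ".") == 0)
		return (dir);
	return (send_twalk(dir, name) == 0 ? child : NULL);
}
```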

Reviewed by: Val Packett <val@packett.cool>
MFC after: 3 months
2024-06-24 14:40:06 +01:00
Rick Macklem 67284d32e5 nfsd: Make modifying vfs.nfsd.enable_locallocks safe
Commit dfaeeacc2c modified clientID handling so that it could be done
with only a mutex lock held when vfs.nfsd.enable_locallocks is 0.
This makes it unsafe to change the setting of vfs.nfsd.enable_locallocks
when nfsd threads are active.

This patch forces all nfsd threads to be blocked when the value
of vfs.nfsd.enable_locallocks is changed, so that it is done safely.

MFC after:	1 month
2024-06-23 15:47:22 -07:00
Rick Macklem dfaeeacc2c nfsd: Allow a mutex lock for clientID handling
On Feb. 28, a problem was reported on freebsd-stable@ where a
nfsd thread processing an ExchangeID operation was blocked for
a long time by another nfsd thread performing a copy_file_range.
This occurred because the copy_file_range was taking a long time,
but also because handling a clientID requires that all other nfsd
threads be blocked via an exclusive lock, as required by ExchangeID.

This patch allows clientID handling to be done with only a mutex
held (instead of an exclusive lock that blocks all other nfsd threads)
when vfs.nfsd.enable_locallocks is 0.  For the case of
vfs.nfsd.enable_locallocks set to 1, the exclusive lock that
blocks all nfsd threads is still required.
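The lock choice reduces to a test of the sysctl value. This sketch uses hypothetical names (`enum clientid_lock`, `clientid_lock_needed()`) and deliberately ignores the actual nfsd locking primitives.

```c
#include <assert.h>

/* Hypothetical model; the real nfsd lock primitives differ. */
enum clientid_lock { CLIENTID_MUTEX, CLIENTID_EXCLUSIVE };

/*
 * With local locking disabled, a mutex suffices and other nfsd threads
 * keep running; with it enabled, all nfsd threads must still be blocked.
 */
enum clientid_lock
clientid_lock_needed(int enable_locallocks)
{
	assert(enable_locallocks == 0 || enable_locallocks == 1);
	return (enable_locallocks ? CLIENTID_EXCLUSIVE : CLIENTID_MUTEX);
}
```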

This patch does make changing the value of vfs.nfsd.enable_locallocks
somewhat racy.  A future commit will ensure any change is done when
all nfsd threads are blocked to avoid this raciness.

MFC after:	1 month
2024-06-22 15:56:40 -07:00
Rick Macklem a7de510685 nfsd: Fix nfsrv_cleanclient so that it can be called with a mutex
On Feb. 28, a problem was reported on freebsd-stable@ where a
nfsd thread processing an ExchangeID operation was blocked for
a long time by another nfsd thread performing a copy_file_range.
This occurred because the copy_file_range was taking a long time,
but also because handling a clientID requires that all other nfsd
threads be blocked via an exclusive lock, as required by ExchangeID.

This patch adds two arguments to nfsrv_cleanclient() so that it
can optionally be called with a mutex held.  For this patch, the
first of these arguments is "false" and, as such, there is no
change in semantics.  However, this change will allow a future
commit to modify handling of the clientID so that it can be done
with a mutex held while other nfsd threads continue to process
NFS RPCs.

MFC after:	1 month
2024-06-21 15:08:48 -07:00
Doug Rabson b2ebcd19f4 p9fs: Fix the build for 32-bit kernels
MFC after: 3 months
2024-06-19 15:16:38 +01:00
Doug Rabson e97ad33a89 Add an implementation of the 9P filesystem
This is derived from swills@ fork of the Juniper virtfs with many
changes by me including bug fixes, style improvements, clearer layering
and more consistent logging. The filesystem is renamed to p9fs to better
reflect its function and to prevent possible future confusion with
virtio-fs.

Several updates and fixes from Juniper have been integrated into this
version by Val Packett and these contributions along with the original
Juniper authors are credited below.

To use this with bhyve, add 'virtio_p9fs_load=YES' to loader.conf. The
bhyve virtio-9p device allows access from the guest to files on the host
by mapping a 'sharename' to a host path. It is possible to use p9fs as a
root filesystem by adding this to /boot/loader.conf:

	vfs.root.mountfrom="p9fs:sharename"

for non-root filesystems add something like this to /etc/fstab:

	sharename /mnt p9fs rw 0 0

In both examples, substitute the share name used on the bhyve command
line.

The 9P filesystem protocol relies on stateful file opens which map
protocol-level FIDs to host file descriptors. The FreeBSD vnode
interface doesn't really support this and we use heuristics to guess the
right FID to use for file operations.  This can be confused by privilege
lowering and does not guarantee that the FID created for a given file
open is always used for file operations, even if the calling process is
using the file descriptor from the original open call. Improving this
would involve changes to the vnode interface which is out-of-scope for
this import.

Differential Revision: https://reviews.freebsd.org/D41844
Reviewed by: kib, emaste, dch
MFC after: 3 months
Co-authored-by: Val Packett <val@packett.cool>
Co-authored-by: Ka Ho Ng <kahon@juniper.net>
Co-authored-by: joyu <joyul@juniper.net>
Co-authored-by: Kumara Babu Narayanaswamy <bkumara@juniper.net>
2024-06-19 13:12:04 +01:00
Rick Macklem bb53f071e8 nfscl: Add support for read delegations and atomic upgrade
For NFSv4.1/4.2, an atomic upgrade of a delegation from a
read delegation to a write delegation is allowed and can
result in significantly improved performance.
This patch adds this upgrade to the NFSv4.1/4.2 client and
enables use of read delegations.

For a test case of building a FreeBSD kernel (sources and
output objects) over an NFSv4.2 mount, these changes reduced
the elapsed time by 30% and included an 80% reduction in
RPC count when delegations were enabled.  As such, with this
patch there are at least certain cases where enabling
delegations seems to be worth the increased complexity they
bring.

This patch should only affect the NFSv4.1/4.2 behaviour
when delegations are enabled, which is not the default.

MFC after:	1 month
2024-06-12 16:41:12 -07:00
Rick Macklem 4308d6e0fc nfscl: Add a check for VREG for delegations
Since delegations are only issued for regular files, check
v_type to see if the query is for a regular file.  This is
a simple optimization for the non-VREG case.
While here, fix a couple of global variable declarations.
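The VREG shortcut is an early return placed before the delegation search. The types below are hypothetical stand-ins (`struct vnode_model`, `VREG_M`) loosely mirroring FreeBSD's `enum vtype`; the search itself is a placeholder that only counts invocations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Loose mirror of a subset of FreeBSD's enum vtype. */
enum vtype_model { VNON_M, VREG_M, VDIR_M, VLNK_M };

struct vnode_model {
	enum vtype_model v_type;
	int deleg_lookups;	/* times the delegation list was searched */
};

/* Hypothetical: does the client hold a delegation for this vnode? */
bool
has_delegation(struct vnode_model *vp)
{
	assert(vp != NULL);
	/* Delegations are only issued for regular files: skip the search. */
	if (vp->v_type != VREG_M)
		return (false);
	vp->deleg_lookups++;
	return (false);		/* placeholder for the real list search */
}
```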

This patch should only affect the NFSv4.1/4.2 behaviour
when delegations are enabled, which is not the default.

MFC after:	1 month
2024-06-12 16:17:23 -07:00
Rick Macklem ec1f285f2e nfscl: Add support for the NFSv4.1/4.2 WANT_xxx flags
NFSv4.1/4.2 defined new OPEN_WANT_xxx flags that a client
can use to hint to the server that delegations are or are
not wanted.  This patch adds use of those flags to
the client.
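A sketch of how a client might fold such a hint into the OPEN share_access word. The constant values are as I recall them from RFC 8881 and should be checked against the RFC's XDR; `open_share_access()` and its policy (ask for a write delegation when opening for write, a read delegation otherwise) are assumptions for illustration, not the nfscl logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* share_access bits per RFC 8881 (OPEN); verify against the RFC. */
#define OPEN4_SHARE_ACCESS_READ			0x00000001u
#define OPEN4_SHARE_ACCESS_WRITE		0x00000002u
#define OPEN4_SHARE_ACCESS_WANT_DELEG_MASK	0x0000FF00u
#define OPEN4_SHARE_ACCESS_WANT_READ_DELEG	0x00000100u
#define OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG	0x00000200u
#define OPEN4_SHARE_ACCESS_WANT_NO_DELEG	0x00000400u

/* Hypothetical helper: add a delegation hint to the access bits. */
uint32_t
open_share_access(uint32_t access, bool want_deleg)
{
	assert((access & OPEN4_SHARE_ACCESS_WANT_DELEG_MASK) == 0);
	if (!want_deleg)
		return (access | OPEN4_SHARE_ACCESS_WANT_NO_DELEG);
	return (access | ((access & OPEN4_SHARE_ACCESS_WRITE) ?
	    OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG :
	    OPEN4_SHARE_ACCESS_WANT_READ_DELEG));
}
```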

This patch should only affect the NFSv4.1/4.2 behaviour
when delegations are enabled, which is not the default.

MFC after:	1 month
2024-06-12 16:11:10 -07:00
Rick Macklem 13a51233e4 nfsd: Delete an unused VNET global variable
During code inspection, I noticed that
NFSD_VNET_DEFINE(nfsrv_dontlisthead)
is unused, so delete it.

MFC after:	2 weeks
2024-06-08 16:40:52 -07:00
Rick Macklem dbe7ff254e nfsd: Update a file missed by commit e2c9fad2e0
MFC after:	1 month
2024-06-04 18:54:15 -07:00
Rick Macklem e2c9fad2e0 nfsd: Fix delegation handling for atomic upgrade
For NFSv4.1/4.2, an atomic upgrade of a delegation from a
read delegation to a write delegation is allowed and can
result in significantly improved performance.

This patch adds support for this atomic upgrade, plus fixes
a couple of other delegation related bugs.  Since there were
three cases where delegations were being issued, the patch
factors this out into a separate function called
nfsrv_issuedelegations().

This patch should only affect the NFSv4.1/4.2 behaviour
when delegations are enabled, which is not the default.

MFC after:	1 month
2024-06-04 18:46:41 -07:00
Ryan Libby 6bd3f23a2a tmpfs_node_init: use MTX_NEW on lock from uninitialized memory
Reported by:	netchild
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D45364
2024-05-26 10:20:52 -07:00
Rick Macklem c68db4608e Revert "nfscl: Do not do readahead for directories"
The PR reported hangs that were avoided when this commit was
reverted.  Since it was only a cleanup, revert it.
The LORs in the PR need further investigation, since I think
readahead only hides the problem.

PR:	279138
This reverts commit fbe965591f.
2024-05-26 08:02:30 -07:00
Pawel Jakub Dawidek 56a8aca83a Stop treating size 0 as unknown size in vnode_create_vobject().
Whenever a file is created, the vnode_create_vobject() function will
try to determine its size by calling vn_getsize_locked(), as size 0
is ambiguous: it means either the file size is 0 or the file size
is unknown.

Introduce a special value for the size argument: VNODE_NO_SIZE.
Only when it is given will vnode_create_vobject() try to obtain
the file's size on its own.
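The new sentinel contract can be illustrated as follows. `NO_SIZE_SENTINEL` stands in for VNODE_NO_SIZE, whose actual value is not given here, and `query_size()` is a hypothetical stand-in for vn_getsize_locked().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel; the kernel's VNODE_NO_SIZE value may differ. */
#define NO_SIZE_SENTINEL ((int64_t)-1)

/* Stand-in for vn_getsize_locked(); counts how often it is consulted. */
static int getsize_calls;
static int64_t
query_size(int64_t actual_size)
{
	getsize_calls++;
	return (actual_size);
}

/* A caller-supplied size, including 0, is trusted; only the sentinel
 * triggers a lookup of the real size. */
int64_t
object_size(int64_t size_arg, int64_t actual_size)
{
	assert(size_arg >= 0 || size_arg == NO_SIZE_SENTINEL);
	if (size_arg == NO_SIZE_SENTINEL)
		return (query_size(actual_size));
	return (size_arg);
}
```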

Introduce a dedicated vnode_disk_create_vobject() for use by
g_vfs_open(), so we don't have to call vn_isdisk() in the common case
(for regular files).

Handle the case of mediasize==0 in g_vfs_open().

Reviewed by: alc, kib, markj, olce
Approved by: oshogbo (mentor), allanjude (mentor)
Differential Revision: https://reviews.freebsd.org/D45244
2024-05-23 06:08:14 +00:00
Pawel Jakub Dawidek ff4fc43afd Fix build.
2024-05-22 03:56:59 +00:00
Pawel Jakub Dawidek 31223e68e2 Simplify the code.
Obtained from: Fudo Security
Reviewed by: asomers, imp
Approved by: oshogbo (mentor)
Differential Revision: https://reviews.freebsd.org/D45247
2024-05-22 03:01:24 +00:00
Konstantin Belousov ff4480baf6 nfs client comment typo fix
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2024-05-19 01:49:59 +03:00
Konstantin Belousov 4681194979 tmpfs_destroy_vobject(): clear v_object under the object lock
Which allows tmpfs_pager_writecount_recalc() to reliably detect
reclaimed vnode and make its accesses to object->un_pager.swp.private
(== vp) safe against reclaim.  Note that vnode instantiation already
assigns v_object under the object lock.

Reviewed by:	markj
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D45119
2024-05-13 21:33:59 +03:00
Konstantin Belousov 6ada4e8a0a swap-like pagers: assert that writemapping decrease does not pass zero
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D45119
2024-05-13 21:33:29 +03:00
Konstantin Belousov 58d7ac11e7 tmpfs: recalculate OBJ_TMPFS_VREF on reinstantiating node's vnode
Reported and tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D45119
2024-05-13 21:33:29 +03:00
Konstantin Belousov 6d79564fe3 devfs_allocv(): style
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-05-12 04:13:00 +03:00
John Baldwin 473c90ac04 uio: Use switch statements when handling UIO_READ vs UIO_WRITE
This is mostly to reduce the diff with CheriBSD which adds additional
constants to enum uio_rw, but also matches the normal style used for
uio_segflg.
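A user-space model of the switch style, using a local copy of the two `enum uio_rw` values (the `_M` suffixes mark them as stand-ins) and `memcpy()` in place of the kernel's copy routines. `uiomove_model()` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in for enum uio_rw; only two values modelled. */
enum uio_rw_model { UIO_READ_M, UIO_WRITE_M };

/* Copy between a kernel buffer and a caller buffer in direction rw. */
void
uiomove_model(char *kbuf, char *ubuf, size_t len, enum uio_rw_model rw)
{
	switch (rw) {
	case UIO_READ_M:	/* data flows to the caller */
		memcpy(ubuf, kbuf, len);
		break;
	case UIO_WRITE_M:	/* data flows from the caller */
		memcpy(kbuf, ubuf, len);
		break;
	default:
		assert(0 && "unknown uio_rw");
	}
}
```

With a switch, a fork that adds more `enum uio_rw` constants, as CheriBSD does, only needs to add cases rather than restructure if/else chains.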

Reviewed by:	kib, emaste
Obtained from:	CheriBSD
Differential Revision:	https://reviews.freebsd.org/D45142
2024-05-10 13:43:36 -07:00
Rick Macklem fbe965591f nfscl: Do not do readahead for directories
For a very long time, the NFS client has done readahead for
directory blocks.  Unlike data blocks, the readahead cannot
begin until the Readdir RPC reply has been received, since
the directory offset cookie in that Readdir RPC reply is needed.
As such, the readahead is serialized and does not seem to
provide any real benefit.

Recent testing/benchmarking shows that removing this
readahead code for Readdir does not have a negative impact
on performance.

Therefore, this patch deletes the readahead code for Readdir,
which simplifies the code and may make future changes simpler.

MFC after:	1 month
2024-05-09 18:35:10 -07:00
Rick Macklem 3f65000b6b nfsd: Fix Link conformance with RFC8881 for delegations
RFC8881 specifies that, when a Link operation occurs on an
NFSv4 file, file delegations issued to other clients must
be recalled.  Discovered during a recent discussion on nfsv4@ietf.org.

Although I have not observed a problem caused by not doing
the required delegation recall, it is definitely required
by the RFC, so this patch makes the server do the recall.
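A sketch of the recall rule, assuming a flat array of delegation records for the file being linked. `struct deleg_model` and `link_recall_delegations()` are hypothetical, and "recalling" is reduced to setting a flag instead of sending CB_RECALL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical delegation record; not the nfsd structures. */
struct deleg_model {
	int clientid;	/* client holding the delegation */
	int recalled;	/* set once a recall has been "sent" */
};

/* Recall delegations held by clients other than the one doing the Link. */
int
link_recall_delegations(struct deleg_model *d, size_t n, int caller)
{
	int count = 0;

	assert(d != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (d[i].clientid != caller && !d[i].recalled) {
			d[i].recalled = 1;
			count++;
		}
	}
	return (count);
}
```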

Tested during a recent NFSv4 IETF Bakeathon event.

MFC after:	1 week
2024-05-04 14:30:07 -07:00
Jason A. Harmening 05e8ab627b unionfs_rename: fix numerous locking issues
There are a few places in which unionfs_rename() accesses fvp's private
data without holding the necessary lock/interlock.  Moreover, the
implementation completely fails to handle the case in which fdvp is not
the same as tdvp; in this case it simply fails to lock fdvp at all.
Finally, it locks fvp while potentially already holding tvp's lock, but
makes no attempt to deal with possible LOR there.

Fix this by optimistically using the vnode interlock to protect
the short accesses to fdvp and fvp private data, sequentially.
If a file copy or shadow directory creation is required to prepare
the upper FS for the rename operation, the interlock must be dropped
and fdvp/fvp locked as necessary.

Additionally, use ERELOOKUP (as suggested by kib@) to simplify the
locking logic and eliminate unionfs_relookup() calls for file-copy/
shadow-directory cases that require tdvp's lock to be dropped.

Reviewed by:		kib (earlier version), olce
Tested by:		pho
Differential Revision:	https://reviews.freebsd.org/D44788
2024-04-28 20:19:48 -05:00
Rick Macklem 03a39a1708 nfscl: Clear out a lot of cruft related to B_DIRECT
There is only one place in the unpatched sources where B_DIRECT is
set in the NFS client, and that code is never executed.  As such, this
patch removes the code that depends on B_DIRECT, since B_DIRECT should
never be set.

During an IETF testing event this week, I saw a crash in ncl_doio_directwrite(),
but this function is only called if B_DIRECT is set.
I cannot explain how ncl_doio_directwrite() got called, but once this patch
was applied to the sources, the crash did not recur. This is not surprising,
since this patch deleted the function.

Reviewed by:	kib, markj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D44980
2024-04-27 17:10:48 -07:00
Rick Macklem 6251027c42 nfscl: Do not use nfso_own for delayed nfsrpc_doclose()
When an initial attempt to close an NFSv4 open returns NFSERR_DELAY,
the open structure is put on a list for delayed closing.  When this
is done, the nfso_own field is set to NULL, so it cannot be used by
nfsrpc_doclose().

Without this patch, the NFSv4 client can crash when an NFSv4 server
replies NFSERR_DELAY to a Close operation.  Fortunately, most extant
NFSv4 servers do not do this.  This patch avoids the crash for any
that do return NFSERR_DELAY for Close.

Found during an IETF bakeathon testing event this week.

MFC after:	5 days
2024-04-25 20:58:21 -07:00
Rick Macklem 8efba70d79 nfscl: Revert part of commit 196787f79e
Commit 196787f79e erroneously assumed that the client code for
Open/Claim_deleg_cur_FH was broken, but it was not.
It was actually wireshark that was broken and indicated
that the correct XDR was bogus.

This reverts the part of 196787f79e that changed the arguments for
Open/Claim_deleg_cur_FH.

Found during the IETF bakeathon testing event this week.

MFC after:	3 days
2024-04-25 12:32:02 -07:00
Rick Macklem 54c3aa02e9 Revert "nfsd: Fix NFSv4.1/4.2 Claim_Deleg_Cur_FH"
This reverts commit f300335d9a.

It turns out that the old code was correct and it was wireshark
that was broken and indicated that the RPC's XDR was bogus.
Found during IETF bakeathon testing this week.
2024-04-25 09:41:23 -07:00
Mark Johnston 78c51db3c4 udf: uma_zcreate() does not fail
While here remove an old comment regarding preallocation; it appears to
refer to an optimization that is almost certainly irrelevant at this
point.

No functional change intended.

MFC after:	1 week
2024-04-24 08:45:59 -04:00
Mark Johnston 6d5ce2bb63 nfsserver: Default to nfs_reserved_port_only="YES"
This setting causes the NFS server to check that all RPCs are sent from
a privileged (<= 1023) port, rejecting those that are not.  This
slightly raises the bar for a user with network access to an
unauthenticated NFS server to access exported NFS filesystems.
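The check itself is a single comparison. This sketch assumes the source port is already in host byte order; `nfs_port_acceptable()` is a hypothetical name, and the limit of 1024 matches the traditional IPPORT_RESERVED value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Ports below this are privileged (traditional IPPORT_RESERVED). */
#define RESERVED_PORT_LIMIT 1024

/* Accept an RPC only from a privileged source port when enabled. */
bool
nfs_port_acceptable(uint16_t src_port, bool reserved_port_only)
{
	assert(src_port != 0);		/* 0 is never a valid source port */
	if (!reserved_port_only)
		return (true);
	return (src_port < RESERVED_PORT_LIMIT);
}
```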

Users that use traditional NFS clients (e.g., those provided by FreeBSD
or Linux) should not see any difference, assuming that unprivileged
filesystem mounting is disallowed.

Note that the setting is per-VNET, so may be overridden in VNET jails
without affecting the rest of the system.

Discussed with:	freebsd-arch@
Reviewed by:	rmacklem, bz, emaste
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D44906
2024-04-23 12:54:46 -04:00
Gordon Bergling 742f4b7758 tarfs(5): Grammar fix for a source code comment
- s/the the/of the/

MFC after:	3 days
2024-04-20 11:21:54 +02:00
Mark Johnston b7e4666d7b nfsserver: Rate-limit messages about requests from unprivileged ports
If access from unreserved ports is disabled, then a remote host can
cause an NFS server to log a message by sending a packet.  This is
useful for diagnosing problems but bad for resiliency in the case where
the server is being spammed with a large number of rejected requests.

Limit prints to once per second (racily).
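A racy once-per-second limiter in the same spirit. The current time is passed in so the sketch stays deterministic; `should_log()` is a hypothetical name. FreeBSD's kernel also provides ratecheck(9) and ppsratecheck(9) for this purpose, though this sketch does not claim which mechanism the commit uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * At most one message per distinct second.  Racy by design: two threads
 * may both read a stale `last` and both log.
 */
bool
should_log(time_t now)
{
	static time_t last = (time_t)-1;

	assert(now >= 0);
	if (now == last)
		return (false);
	last = now;
	return (true);
}
```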

Reviewed by:	rmacklem, emaste
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D44819
2024-04-17 10:36:58 -04:00
Brooks Davis 6bb132ba1e Reduce reliance on sys/sysproto.h pollution
Add sys/errno.h, sys/malloc.h, sys/queue.h, and vm/uma.h as needed.

sys/sysproto.h currently includes sys/acl.h which currently includes
sys/param.h, sys/queue.h, and vm/uma.h which in turn bring in
sys/errno.h and sys/malloc.h.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D44465
2024-04-15 21:35:40 +01:00
Dag-Erling Smørgrav 2b258dd17c nullfs: Show correct exported flag.
MFC after:	3 days
Reviewed by:	allanjude, kib
Differential Revision:	https://reviews.freebsd.org/D44773
2024-04-13 17:21:01 +02:00
Zaphrod Beeblebrox d00c64bb23 nfscl: Purge name cache when readdir_plus is done
The author reported that this patch was needed to avoid
crashes on a fairly busy RISC-V system.  The author did not
provide details w.r.t. the crashes.  Although I
have not seen any such crash, the patch looks reasonable
and I have not found any regressions when testing it.

Since "rdirplus" is not a default option, the patch is
only needed if you are doing NFS mounts with the "rdirplus"
mount option and seeing crashes related to the name cache.

MFC after:	1 week
2024-04-11 13:27:27 -07:00
Jason A. Harmening b18029bc59 unionfs_lookup(): fix wild accesses to vnode private data
There are a few spots in which unionfs_lookup() accesses unionfs vnode
private data without holding the corresponding vnode lock or interlock.

Reviewed by:		kib, olce
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D44601
2024-04-09 17:36:59 -05:00
Mark Johnston a0895e394d tarfs: Implement VOP_BMAP
This lets tarfs provide readahead/behind hints to the VFS, which helps
memory-mapped I/O performance.  That matters when faulting in
executables out of a tarfs mount, as one might if tarfs is used to back
the root filesystem, for example.  The improvement is particularly
noticeable when the backing tarball is zstd-compressed.

The implementation simply returns the extent of the virtual block
containing the target offset, clamped by the maximum I/O size.  This is
perhaps simplistic; it effectively just chooses values that would
correspond to a single VOP_READ call in tarfs_read_file().
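The extent arithmetic can be sketched in byte units. This is an assumption-laden model: VOP_BMAP actually reports read-behind/read-ahead as block counts, and `bmap_hint()`, `bsize` (the virtual block size) and `maxio` (the maximum I/O size) are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

struct io_hint {
	uint64_t behind;	/* bytes before `off` within its block */
	uint64_t ahead;		/* bytes from `off` to the block's end */
};

/* Hint covering the virtual block containing `off`, clamped to maxio. */
struct io_hint
bmap_hint(uint64_t off, uint64_t bsize, uint64_t maxio)
{
	struct io_hint h;
	uint64_t start, end;

	assert(bsize > 0 && maxio > 0);
	start = off - off % bsize;	/* first byte of the block */
	end = start + bsize;		/* one past its last byte */
	h.behind = off - start;
	h.ahead = end - off;
	if (h.behind > maxio)
		h.behind = maxio;
	if (h.ahead > maxio)
		h.ahead = maxio;
	return (h);
}
```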

Reviewed by:	des, kib
MFC after:	1 month
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D44626
2024-04-05 11:14:36 -04:00
Alan Somers c1326c01df fusefs: correct a comment
[skip ci]

MFC after:	1 week
Sponsored by:	Axcient
2024-04-04 14:18:56 -06:00
Mark Johnston 91eca18551 tarfs: Inherit mnt_iosize_max from the lower vnode
There is no obvious reason to use a value smaller than that.

Reviewed by:	des, kib
MFC after:	1 week
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D44627
2024-04-04 10:54:06 -04:00
Dag-Erling Smørgrav 0238d3711d tarfs: Fix 32-bit build.
MFC after:	3 days
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Reviewed by:	bapt
Differential Revision:	https://reviews.freebsd.org/D44613
2024-04-03 16:24:05 +02:00
Dag-Erling Smørgrav 584e1c355a tarfs: Ignore global extended headers.
Previously, we would error out if we encountered a global extended
header, because we don't know what it means.  This doesn't really
matter though, and traditionally, tar implementations have either
ignored them or treated them as plain files, so just ignore them.
This allows tarfs to mount tar files created by `git archive`.
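In pax terms, typeflag 'g' marks a global extended header and 'x' a per-entry one. A hypothetical classifier and the 512-byte padding rule used to skip an entry's data might look like this; other typeflags are lumped together as ordinary entries purely to keep the sketch short.

```c
#include <assert.h>
#include <stdint.h>

enum hdr_action { HDR_FILE, HDR_EXTHDR, HDR_SKIP };

/* Classify a header by typeflag; 'g' is skipped rather than an error. */
enum hdr_action
classify_typeflag(char typeflag)
{
	switch (typeflag) {
	case 'x':
		return (HDR_EXTHDR);	/* applies to the next entry only */
	case 'g':
		return (HDR_SKIP);	/* global header: ignore it */
	default:
		return (HDR_FILE);
	}
}

/* Bytes occupied by an entry's data: `size` rounded up to 512. */
uint64_t
entry_data_span(uint64_t size)
{
	assert(size <= UINT64_MAX - 511);	/* avoid overflow */
	return ((size + 511) / 512 * 512);
}
```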

MFC after:	3 days
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Reviewed by:	kevans
Differential Revision:	https://reviews.freebsd.org/D44600
2024-04-03 11:55:06 +02:00
Dag-Erling Smørgrav b1fd95c9e2 tarfs: Support paths that spill into exthdrs.
MFC after:	3 days
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Reviewed by:	kevans
Differential Revision:	https://reviews.freebsd.org/D44599
2024-04-03 11:55:01 +02:00
Jason A. Harmening eee6217b40 unionfs: implement VOP_UNP_* and remove special VSOCK vnode handling
unionfs has a bunch of clunky special-case code to avoid creating
unionfs wrapper vnodes for AF_UNIX sockets.  This was added in 2008
to address PR 118346, but in the intervening years the VOP_UNP_*
operations have been added to provide a clean interface to allow
sockets to work in the presence of stacked filesystems.

PR:			275871
Reviewed by:		kib (prior version), olce
Tested by:		Karlo Miličević <karlo98.m@gmail.com>
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D44288
2024-03-23 21:10:53 -05:00
Konstantin Belousov d3efbe0132 cdevpriv(9): add iterator
Reviewed by:	christos
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D44469
2024-03-23 08:59:00 +02:00
Rick Macklem 748f56c53f nfsd: Add a sysctl to limit NFSv4.2 Copy RPC size
NFSv4.2 supports a Copy operation, which avoids file data being
read to the client and then written back to the server, if both
input and output files are on the same NFSv4.2 mount for
copy_file_range(2).

Unfortunately, this Copy operation can take a long time under
certain circumstances.  If this occurs concurrently with an RPC
that requires an exclusive lock on the nfsd such as ExchangeID
done for a new mount, the result can be an nfsd "stall" until
the Copy completes.

This patch adds a sysctl that can be set to limit the size of
a Copy operation or, if set to 0, disable Copy operations.
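Assuming the server honours the limit by performing a short Copy and leaving the client to issue further Copy RPCs for the remainder, the clamp looks like this. `copy_len_allowed()` is a hypothetical name, and the sketch assumes the requested count is nonzero (in NFSv4.2 a count of 0 means "copy to end of file").

```c
#include <assert.h>
#include <stdint.h>

/* `limit` mirrors the sysctl: 0 disables Copy; returns bytes to copy. */
uint64_t
copy_len_allowed(uint64_t requested, uint64_t limit)
{
	assert(requested > 0);	/* count already resolved, not "to EOF" */
	if (limit == 0)
		return (0);
	return (requested < limit ? requested : limit);
}
```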

The use of this sysctl and other ways to avoid Copy operations
taking too long will be documented in the nfsd.4 man page by
a separate commit.

MFC after:	2 weeks
2024-03-15 18:04:37 -07:00
Jason A. Harmening 6c8ded0015 unionfs: accommodate underlying FS calls that may re-lock
Since non-doomed unionfs vnodes always share their primary lock with
either the lower or upper vnode, any forwarded call to the base FS
which transiently drops that upper or lower vnode lock may result in
the unionfs vnode becoming completely unlocked during that transient
window.  The unionfs vnode may then become doomed by a concurrent
forced unmount, which can lead to either or both of the following:

--Complete loss of the unionfs lock: in the process of being
  doomed, the unionfs vnode switches back to the default vnode lock,
  so even if the base FS VOP reacquires the upper/lower vnode lock,
  that no longer translates into the unionfs vnode being relocked.
  This will then violate that caller's locking assumptions as well
  as various assertions that are enabled with DEBUG_VFS_LOCKS.

--Complete loss of reference on the upper/lower vnode: the caller
  normally holds a reference on the unionfs vnode, while the unionfs
  vnode in turn holds references on the upper/lower vnodes.  But in
  the course of being doomed, the unionfs vnode will drop the latter
  set of references, which can effectively lead to the base FS VOP
  executing with no references at all on its vnode, violating the
  assumption that vnodes can't be recycled during these calls and
  (if lucky) violating various assertions in the base FS.

Fix this by adding two new functions, unionfs_forward_vop_start_pair()
and unionfs_forward_vop_finish_pair(), which are intended to bookend
any forwarded VOP which may transiently unlock the relevant vnode(s).
These functions are currently only applied to VOPs that modify file
state (and require vnode reference and lock state to be identical at
call entry and exit), as the common reason for transiently dropping
locks is to update filesystem metadata.

Reviewed by:	olce
Tested by:	pho
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D44076
2024-03-09 19:54:04 -06:00