Commit graph

5124 commits

Author SHA1 Message Date
Rick Macklem dbe7ff254e nfsd: Update a file missed by commit e2c9fad2e0
MFC after:	1 month
2024-06-04 18:54:15 -07:00
Rick Macklem e2c9fad2e0 nfsd: Fix delegation handled for atomic upgrade
For NFSv4.1/4.2, an atomic upgrade of a delegation from a
read delegation to a write delegation is allowed and can
result in signoficantly improved performance.

This patch adds support for this atomic upgrade, plus fixes
a couple of other delegation related bugs.  Since there were
three cases where delegations were being issued, the patch
factors this out into a separate function called
nfsrv_issuedelegations().

This patch should only affect the NFSv4.1/4.2 behaviour
when delegations are enabled, which is not the default.

MFC after:	1 month
2024-06-04 18:46:41 -07:00
Ryan Libby 6bd3f23a2a tmpfs_node_init: use MTX_NEW on lock from uninitialized memory
Reported by:	netchild
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D45364
2024-05-26 10:20:52 -07:00
Rick Macklem c68db4608e Revert "nfscl: Do not do readahead for directories"
The PR reported hangs that were avoided when this commit was
reverted.  Since it was only a cleanup, revert it.
The LORs in the PR need further investigation, since I think
readahead only hides the problem.

PR:	279138
This reverts commit fbe965591f.
2024-05-26 08:02:30 -07:00
Pawel Jakub Dawidek 56a8aca83a Stop treating size 0 as unknown size in vnode_create_vobject().
Whenever file is created, the vnode_create_vobject() function will
try to determine its size by calling vn_getsize_locked() as size 0
is ambigious: it means either the file size is 0 or the file size
is unknown.

Introduce special value for the size argument: VNODE_NO_SIZE.
Only when it is given, the vnode_create_vobject() will try to obtain
file's size on its own.

Introduce dedicated vnode_disk_create_vobject() for use by
g_vfs_open(), so we don't have to call vn_isdisk() in the common case
(for regular files).

Handle the case of mediasize==0 in g_vfs_open().

Reviewed by: alc, kib, markj, olce
Approved by: oshogbo (mentor), allanjude (mentor)
Differential Revision: https://reviews.freebsd.org/D45244
2024-05-23 06:08:14 +00:00
Pawel Jakub Dawidek ff4fc43afd Fix build. 2024-05-22 03:56:59 +00:00
Pawel Jakub Dawidek 31223e68e2 Simplify the code.
Obtained from: Fudo Security
Reviewed by: asomers, imp
Approved by: oshogbo (mentor)
Differential Revision: https://reviews.freebsd.org/D45247
2024-05-22 03:01:24 +00:00
Konstantin Belousov ff4480baf6 nfs client comment typo fix
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2024-05-19 01:49:59 +03:00
Konstantin Belousov 4681194979 tmpfs_destroy_vobject(): clear v_object under the object lock
Which allows tmpfs_pager_writecount_recalc() to reliably detect
reclaimed vnode and make its accesses to object->un_pager.swp.private
(== vp) safe against reclaim.  Note that vnode instantiation already
assigns v_object under the object lock.

Reviewed by:	markj
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D45119
2024-05-13 21:33:59 +03:00
Konstantin Belousov 6ada4e8a0a swap-like pagers: assert that writemapping decrease does not pass zero
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D45119
2024-05-13 21:33:29 +03:00
Konstantin Belousov 58d7ac11e7 tmpfs: recalculate OBJ_TMPFS_VREF on reinstantiating node' vnode
Reported and tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D45119
2024-05-13 21:33:29 +03:00
Konstantin Belousov 6d79564fe3 devfs_allocv(): style
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-05-12 04:13:00 +03:00
John Baldwin 473c90ac04 uio: Use switch statements when handling UIO_READ vs UIO_WRITE
This is mostly to reduce the diff with CheriBSD which adds additional
constants to enum uio_rw, but also matches the normal style used for
uio_segflg.

Reviewed by:	kib, emaste
Obtained from:	CheriBSD
Differential Revision:	https://reviews.freebsd.org/D45142
2024-05-10 13:43:36 -07:00
Rick Macklem fbe965591f nfscl: Do not do readahead for directories
For a very long time, the NFS client has done readahead for
directory blocks.  Unlike data blocks, the readahead cannot
begin until the Readdir RPC reply has been received, since
the directory offset cookie in that Readdir RPC reply is needed.
As such, the readahead is serialized and does not seem to
provide any real benefit.

Recent testing/benchmarking shows that removing this
readahead code for Readdir does not have a negative impact
on performance.

Therefore, this patch deletes the readahead code for Readdir,
which simplifies the code and may make future changes simpler.

MFC after:	1 month
2024-05-09 18:35:10 -07:00
Rick Macklem 3f65000b6b nfsd: Fix Link conformance with RFC8881 for delegations
RFC8881 specifies that, when a Link operation occurs on an
NFSv4, that file delegations issued to other clients must
be recalled.  Discovered during a recent discussion on nfsv4@ietf.org.

Although I have not observed a problem caused by not doing
the required delegation recall, it is definitely required
by the RFC, so this patch makes the server do the recall.

Tested during a recent NFSv4 IETF Bakeathon event.

MFC after:	1 week
2024-05-04 14:30:07 -07:00
Jason A. Harmening 05e8ab627b unionfs_rename: fix numerous locking issues
There are a few places in which unionfs_rename() accesses fvp's private
data without holding the necessary lock/interlock.  Moreover, the
implementation completely fails to handle the case in which fdvp is not
the same as tdvp; in this case it simply fails to lock fdvp at all.
Finally, it locks fvp while potentially already holding tvp's lock, but
makes no attempt to deal with possible LOR there.

Fix this by optimistically using the vnode interlock to protect
the short accesses to fdvp and fvp private data, sequentially.
If a file copy or shadow directory creation is required to prepare
the upper FS for the rename operation, the interlock must be dropped
and fdvp/fvp locked as necessary.

Additionally, use ERELOOKUP (as suggested by kib@) to simplify the
locking logic and eliminate unionfs_relookup() calls for file-copy/
shadow-directory cases that require tdvp's lock to be dropped.

Reviewed by:		kib (earlier version), olce
Tested by:		pho
Differential Revision:	https://reviews.freebsd.org/D44788
2024-04-28 20:19:48 -05:00
Rick Macklem 03a39a1708 nfscl: Clear out a lot of cruft related to B_DIRECT
There is only one place in the unpatched sources where B_DIRECT is
set in the NFS client and this code is never executed. As such, this patch
removes this code that is never executed, since B_DIRECT should never
be set.

During a IETF testing event this week, I saw a crash in ncl_doio_directwrite(),
but this function is only called if B_DIRECT is set.
I cannot explain how ncl_doio_directwrite() got called, but once this patch
was applied to the sources, the crash did not recur. This is not surprising,
since this patch deleted the function.

Reviewed by:	kib, markj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D44980
2024-04-27 17:10:48 -07:00
Rick Macklem 6251027c42 nfscl: Do not use nfso_own for delayed nfsrpc_doclose()
When an initial attempt to close an NFSv4 lock returns NFSERR_DELAY,
the open structure is put on a list for delayed closing.  When this
is done, the nfso_own field is set to NULL, so it cannot be used by
nfsrpc_doclose().

Without this patch, the NFSv4 client can crash when a NFSv4 server
replies NFSERR_DELAY to a Close operation.  Fortunately, most extant
NFSv4 servers do not do this.  This patch avoids the crash for any
that do return NFSERR_DELAY for Close.

Found during a IETF bakeathon testing event this week.

MFC after:	5 days
2024-04-25 20:58:21 -07:00
Rick Macklem 8efba70d79 nfscl: Revert part of commit 196787f79e
Commit 196787f79e erroneously assumed that the client code for
Open/Claim_deleg_cur_FH was broken, but it was not.
It was actually wireshark that was broken and indicated
that the correct XDR was bogus.

This reverts the part of 196787f79e that changed the arguments for
Open/Claim_deleg_cur_FH.

Found during the IETF bakeathon testing event this week.

MFC after:	3 days
2024-04-25 12:32:02 -07:00
Rick Macklem 54c3aa02e9 Revert "nfsd: Fix NFSv4.1/4.2 Claim_Deleg_Cur_FH"
This reverts commit f300335d9a.

It turns out that the old code was correct and it was wireshark
that was broken and indicated that the RPC's XDR was bogus.
Found during IETF bakeathon testing this week.
2024-04-25 09:41:23 -07:00
Mark Johnston 78c51db3c4 udf: uma_zcreate() does not fail
While here remove an old comment regarding preallocation; it appears to
refer to an optimization that is almost certainly irrelevant at this
point.

No functional change intended.

MFC after:	1 week
2024-04-24 08:45:59 -04:00
Mark Johnston 6d5ce2bb63 nfsserver: Default to nfs_reserved_port_only="YES"
This setting causes the NFS server to check that all RPCs are sent from
a privileged (<= 1023) port, rejecting those that are not.  This
slightly raises the bar for a user with network access to an
unauthenticated NFS server to access exported NFS filesystems.

Users that use traditional NFS clients (e.g., those provided by FreeBSD
or Linux) should not see any difference, assuming that unprivileged
filesystem mounting is disallowed.

Note that the setting is per-VNET, so may be overridden in VNET jails
without affecting the rest of the system.

Discussed with:	freebsd-arch@
Reviewed by:	rmacklem, bz, emaste
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D44906
2024-04-23 12:54:46 -04:00
Gordon Bergling 742f4b7758 tarfs(5): Grammar fix for a source code comment
- s/the the/of the/

MFC after:i	3 days
2024-04-20 11:21:54 +02:00
Mark Johnston b7e4666d7b nfsserver: Rate-limit messages about requests from unprivileged ports
If access from unreserved ports is disabled, then a remote host can
cause an NFS server to log a message by sending a packet.  This is
useful for diagnosing problems but bad for resiliency in the case where
the server is being spammed with a large number of rejected requests.

Limit prints to once per second (racily).

Reviewed by:	rmacklem, emaste
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D44819
2024-04-17 10:36:58 -04:00
Brooks Davis 6bb132ba1e Reduce reliance on sys/sysproto.h pollution
Add sys/errno.h, sys/malloc.h, sys/queue.h, and vm/uma.h as needed.

sys/sysproto.h currently includes sys/acl.h which currently includes
sys/param.h, sys/queue.h, and vm/uma.h which in turn bring in
sys/errno.h sys/malloc.h.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D44465
2024-04-15 21:35:40 +01:00
Dag-Erling Smørgrav 2b258dd17c nullfs: Show correct exported flag.
MFC after:	3 days
Reviewed by:	allanjude, kib
Differential Revision:	https://reviews.freebsd.org/D44773
2024-04-13 17:21:01 +02:00
Zaphrod Beeblebrox d00c64bb23 nfscl: Purge name cache when readdir_plus is done
The author reported that this patch was needed to avoid
crashes on a fairly busy RISC-V system.  The author did not
provide details w.r.t. the crashes.  Although I
have not seen any such crash, the patch looks reasonable
and I have not found any regressions when testing it.

Since "rdirplus" is not a default option, the patch is
only needed if you are doing NFS mounts with the "rdirplus"
mount option and seeing crashes related to the name cache.

MFC after:	1 week
2024-04-11 13:27:27 -07:00
Jason A. Harmening b18029bc59 unionfs_lookup(): fix wild accesses to vnode private data
There are a few spots in which unionfs_lookup() accesses unionfs vnode
private data without holding the corresponding vnode lock or interlock.

Reviewed by:		kib, olce
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D44601
2024-04-09 17:36:59 -05:00
Mark Johnston a0895e394d tarfs: Implement VOP_BMAP
This lets tarfs provide readahead/behind hints to the VFS, which helps
memory-mapped I/O performance, important when running faulting in
executables out of a tarfs mount as one might if tarfs is used to back
the root filesystem, for example.  The improvement is particularly
noticeable when the backing tarball is zstd-compressed.

The implementation simply returns the extent of the virtual block
containing the target offset, clamped by the maximum I/O size.  This is
perhaps simplistic; it effectively just chooses values that would
correspond to a single VOP_READ call in tarfs_read_file().

Reviewed by:	des, kib
MFC after:	1 month
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D44626
2024-04-05 11:14:36 -04:00
Alan Somers c1326c01df fusefs: correct a comment
[skip ci]

MFC after:	1 week
Sponsored by:	Axcient
2024-04-04 14:18:56 -06:00
Mark Johnston 91eca18551 tarfs: Inherit mnt_iosize_max from the lower vnode
There is no obvious reason to use a value smaller than that.

Reviewed by:	des, kib
MFC after:	1 week
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D44627
2024-04-04 10:54:06 -04:00
Dag-Erling Smørgrav 0238d3711d tarfs: Fix 32-bit build.
MFC after:	3 days
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Reviewed by:	bapt
Differential Revision:	https://reviews.freebsd.org/D44613
2024-04-03 16:24:05 +02:00
Dag-Erling Smørgrav 584e1c355a tarfs: Ignore global extended headers.
Previously, we would error out if we encountered a global extended
header, because we don't know what it means.  This doesn't really
matter though, and traditionally, tar implementations have either
ignored them or treated them as plain files, so just ignore them.
This allows tarfs to mount tar files created by `git archive`.

MFC after:	3 days
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Reviewed by:	kevans
Differential Revision:	https://reviews.freebsd.org/D44600
2024-04-03 11:55:06 +02:00
Dag-Erling Smørgrav b1fd95c9e2 tarfs: Support paths that spill into exthdrs.
MFC after:	3 days
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Reviewed by:	kevans
Differential Revision:	https://reviews.freebsd.org/D44599
2024-04-03 11:55:01 +02:00
Jason A. Harmening eee6217b40 unionfs: implement VOP_UNP_* and remove special VSOCK vnode handling
unionfs has a bunch of clunky special-case code to avoid creating
unionfs wrapper vnodes for AF_UNIX sockets.  This was added in 2008
to address PR 118346, but in the intervening years the VOP_UNP_*
operations have been added to provide a clean interface to allow
sockets to work in the presence of stacked filesystems.

PR:			275871
Reviewed by:		kib (prior version), olce
Tested by:		Karlo Miličević <karlo98.m@gmail.com>
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D44288
2024-03-23 21:10:53 -05:00
Konstantin Belousov d3efbe0132 cdevpriv(9): add iterator
Reviewed by:	christos
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D44469
2024-03-23 08:59:00 +02:00
Rick Macklem 748f56c53f nfsd: Add a sysctl to limit NFSv4.2 Copy RPC size
NFSv4.2 supports a Copy operation, which avoids file data being
read to the client and then written back to the server, if both
input and output files are on the same NFSv4.2 mount for
copy_file_range(2).

Unfortunately, this Copy operation can take a long time under
certain circumstances.  If this occurs concurrently with a RPC
that requires an exclusive lock on the nfsd such as ExchangeID
done for a new mount, the result can be an nfsd "stall" until
the Copy completes.

This patch adds a sysctl that can be set to limit the size of
a Copy operation or, if set to 0, disable Copy operations.

The use of this sysctl and other ways to avoid Copy operations
taking too long will be documented in the nfsd.4 man page by
a separate commit.

MFC after:	2 weeks
2024-03-15 18:04:37 -07:00
Jason A. Harmening 6c8ded0015 unionfs: accommodate underlying FS calls that may re-lock
Since non-doomed unionfs vnodes always share their primary lock with
either the lower or upper vnode, any forwarded call to the base FS
which transiently drops that upper or lower vnode lock may result in
the unionfs vnode becoming completely unlocked during that transient
window.  The unionfs vnode may then become doomed by a concurrent
forced unmount, which can lead to either or both of the following:

--Complete loss of the unionfs lock: in the process of being
  doomed, the unionfs vnode switches back to the default vnode lock,
  so even if the base FS VOP reacquires the upper/lower vnode lock,
  that no longer translates into the unionfs vnode being relocked.
  This will then violate that caller's locking assumptions as well
  as various assertions that are enabled with DEBUG_VFS_LOCKS.

--Complete less of reference on the upper/lower vnode: the caller
  normally holds a reference on the unionfs vnode, while the unionfs
  vnode in turn holds references on the upper/lower vnodes.  But in
  the course of being doomed, the unionfs vnode will drop the latter
  set of references, which can effectively lead to the base FS VOP
  executing with no references at all on its vnode, violating the
  assumption that vnodes can't be recycled during these calls and
  (if lucky) violating various assertions in the base FS.

Fix this by adding two new functions, unionfs_forward_vop_start_pair()
and unionfs_forward_vop_finish_pair(), which are intended to bookend
any forwarded VOP which may transiently unlock the relevant vnode(s).
These functions are currently only applied to VOPs that modify file
state (and require vnode reference and lock state to be identical at
call entry and exit), as the common reason for transiently dropping
locks is to update filesystem metadata.

Reviewed by:	olce
Tested by:	pho
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D44076
2024-03-09 19:54:04 -06:00
Konstantin Belousov 4e8d264b00 nullfs_mount(): fix whitespace 2024-03-08 20:51:39 +02:00
Konstantin Belousov 8921216dbe nullfs: add -o cache
to allow overwrite global default if needed.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-03-08 20:44:21 +02:00
Konstantin Belousov 0724293331 nullfs_mount(): remove unneeded cast
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-03-08 20:44:21 +02:00
Seigo Tanimura c849eb8f19 nullfs: Add the vfs.nullfs.cache_nodes sysctl to control nocache default
Differential revision:	https://reviews.freebsd.org/D44217
MFC after:	1 week
2024-03-07 17:19:18 +02:00
Dag-Erling Smørgrav cbddb2f02c tarfs: Fix checksum on 32-bit platforms.
MFC after:	3 days
Fixes:		b56872332e47786afc09515a4daaf1388da4d73c
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Reviewed by:	bapt
Differential Revision:	https://reviews.freebsd.org/D44261
2024-03-07 09:15:54 +01:00
Dag-Erling Smørgrav 0118b0c8e5 tarfs: Fix checksum calculation.
The checksum code assumed that struct ustar_header filled an entire
block and calculcated the checksum based on the size of the structure.
The header is in fact only 500 bytes long while the checksum covers
the entire block (“logical record” in POSIX terms).  Add padding and
an assertion, and clean up the checksum code.

MFC after:	3 days
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D44226
2024-03-06 17:14:01 +01:00
Dag-Erling Smørgrav e212f0c066 tarfs: Remove unnecessary hack and obsolete comment.
MFC after:	3 days
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Reviewed by:	allanjude
Differential Revision:	https://reviews.freebsd.org/D44203
2024-03-06 17:13:57 +01:00
Dag-Erling Smørgrav c291b7914e tarfs: Avoid overflow in exthdr calculation.
MFC after:	3 days
PR:		277420
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D44202
2024-03-06 17:13:54 +01:00
Dag-Erling Smørgrav 8427d94ce0 tarfs: Improve validation of numeric fields.
MFC after:	3 days
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Reviewed by:	sjg, allanjude
Differential Revision:	https://reviews.freebsd.org/D44166
2024-03-06 17:13:51 +01:00
Dag-Erling Smørgrav 38b3683592 tarfs: Fix two input validation issues.
* Reject hard or soft links with an empty target path.  Currently, a
  debugging kernel will hit an assertion in tarfs_lookup_path() while
  a non-debugging kernel will happily create a link to the mount root.

* Use a temporary variable to store the result of the link target path,
  and copy it to tnp->other only once we have found it to be valid.
  Otherwise we error out after creating a reference to the target but
  before incrementing the target's reference count, which results in a
  use-after-free situation in the cleanup code.

* Correctly return ENOENT from tarfs_lookup_path() if the requested
  path was not found and create_dirs is false.  Luckily, existing
  callers did not rely solely on the return value.

MFC after:	3 days
PR:		277360
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Reviewed by:	sjg
Differential Revision:	https://reviews.freebsd.org/D44161
2024-03-06 17:13:42 +01:00
Konstantin Belousov 0085afdceb fs/msdosfs fatblock: use ulmin() rather than min()
to avoid truncation of pmp->pm_FATsecs.

Submitted by:	Robert Morris <rtm@lcs.mit.edu>
PR:	277237
MFC after:	1 week
2024-02-23 19:37:52 +02:00
Stefan Eßer 445d3d227e msdosfs: fix potential inode collision on FAT12 and FAT16
FAT file systems do not use inodes, instead all file meta-information
is stored in directory entries.

FAT12 and FAT16 use a fixed size area for root directories, with
typically 512 entries of 32 bytes each (for a total of 16 KB) on hard
disk formats. The file system data is stored in clusters of typically
512 to 4096 bytes, depending on the size of the file system.

The current code uses the offset of a DOS 8.3 style directory entry as
a pseudo-inode, which leads to inode values of 0 to 16368 for typical
root directories with 512 entries.

Sub-directories use 2 cluster length plus the byte offset of the
directory entry in the data area for the pseudo-inode, which may be
as low as 1024 in case of 512 byte clusters. A sub-directory in
cluster 2 and with 512 byte clusters will therefore lead to a
re-use of inode 1024 when there are at least 32 DOS 8.3 style
filenames in the root directory (or 11 14-character Windows
long file names, each of which takes up 3 directory entries).

FAT32 file systems are not affected by this issue and FAT12/FAT16
file systems with larger cluster sizes are unlikely to have as
many directory entries in the root directory as are required to
cause the collision.

This commit leads to inode numbers that are guaranteed to not collide
for all valid FAT12 and FAT16 file system parameters. It does also
provide a small speed-up due to more efficient use of the vnode cache.

Approved by:	mckusick
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D43978
2024-02-20 13:02:24 +01:00