Commit graph

485 commits

Author SHA1 Message Date
Konstantin Belousov 21ccdb4119 vfs_domount_update(): postpone setting MNT_UNION until VFS_MOUNT() is done
The file system that handles updating the mount point might do lookups
during the update, in which case it could find the flag MNT_UNION set on
the mp while mount point is still not updated.  In particular, the
rootvp->v_mount->mnt_vnodecovered is not yet set.

Delay setting MNT_UNION until the mount is performed.

PR:	265311
Reported by:	Robert Morris <rtm@lcs.mit.edu>
Reviewed by:	mckusick, olce
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D45208
2024-05-16 04:00:26 +03:00
Konstantin Belousov 5a061a38cd vfs_domount_update(): style, use space instead of tab
Noted by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2024-05-16 04:00:26 +03:00
Alfredo Mazzinghi 61cc4830a7 Abstract UIO allocation and deallocation.
Introduce the allocuio() and freeuio() functions to allocate and
deallocate struct uio. This hides the actual allocator interface, so it
is easier to modify the sub-allocation layout of struct uio and the
corresponding iovec array.

Obtained from:	CheriBSD
Reviewed by:	kib, markj
MFC after:	2 weeks
Sponsored by:	CHaOS, EPSRC grant EP/V000292/1
Differential Revision:	https://reviews.freebsd.org/D43711
2024-02-10 11:38:04 -05:00
Mark Johnston 099d25c354 nmount: Ignore errors when copying out an error string
In general we copy error strings as part of reporting an error from
lower layers, so if the copyout() fails there's nothing to do since we'd
prefer to preserve the original error.

This is in preparation for annotating copyin() and related functions
with __result_use_check.

Reviewed by:	olce, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D43147
2023-12-25 21:04:01 -05:00
Andrew Gierth 2a1d50fc12 vfs_domount_update(): correct fsidcmp() usage
MFC after:	3 days
2023-12-26 03:35:46 +02:00
Warner Losh fdafd315ad sys: Automated cleanup of cdefs and other formatting
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by:		Netflix
2023-11-26 22:24:00 -07:00
Rick Macklem f5f277728a nfsd: Fix NFS access to .zfs/snapshot snapshots
When a process attempts to access a snapshot under
/<dataset>/.zfs/snapshot, the snapshot is automounted.
However, without this patch, the automount does not
set mnt_exjail, which results in the snapshot not being
accessible over NFS.

This patch defines a new function called vfs_exjail_clone()
which sets mnt_exjail from another mount point and
then uses that function to set mnt_exjail in the snapshot
automount.  A separate patch that is currently a pull request
for OpenZFS, calls this function to fix the problem.

PR:	275200
Reviewed by:	markj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D42672
2023-11-23 07:23:33 -08:00
John Baldwin 3eed4803f9 vfs mount: Consistently use ENODEV internally for an invalid fstype
Change vfs_byname_kld to always return an error value of ENODEV to
indicate an unsupported fstype leaving ENOENT to indicate errors such
as a missing mount point or invalid path.  This allows nmount(2) to
better distinguish these cases and avoid treating a missing device
node as an invalid fstype after commit 6e8272f317.

While here, change mount(2) to return EINVAL instead of ENODEV for an
invalid fstype to match nmount(2).

PR:		274600
Reviewed by:	pstef, markj
Differential Revision:	https://reviews.freebsd.org/D42327
2023-11-18 11:08:34 -08:00
Konstantin Belousov ede4c412b3 vfs_domount_update(): ensure that 'goto end' works
We need to vfs_op_enter()/vn_seqc_write_start() before jumping to
cleanup.

PR:	274992
Reported by:	trasz
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Fixes:	9ef7a491a4
2023-11-09 22:18:47 +02:00
Konstantin Belousov 9ef7a491a4 nmount(MNT_UPDATE): add optional generid fsid parameter
to check looked up path against specific mounted filesystem.

Reviewed by:	mjg
Tested by:	Andrew Gierth <andrew@tao146.riddles.org.uk>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D42023
2023-10-17 19:40:12 +03:00
Konstantin Belousov c584bb9cac vfs_remount_ro(): mnt_lockref should be only accessed after vfs_op_enter()
PR:	273953
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2023-09-21 03:59:11 +03:00
Warner Losh 685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Olivier Certner 2544b8e00c vfs: Rename vfs_emptydir() to vn_dir_check_empty()
No functional change.  While here, adapt comments to style(9).

Reviewed by:    kib
MFC after:      1 week
2023-04-28 22:37:35 +03:00
Konstantin Belousov bb24eaea49 vn_lock_pair(): allow to request shared locking
If either of vnodes is shared locked, lock must not be recursed.

Requested by:	rmacklem
Reviewed by:	markj, rmacklem
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39444
2023-04-08 01:58:26 +03:00
Rick Macklem 4bbbd5875d vfs_mount.c: Allow mountd(8) to do exports in a vnet prison
To run mountd in a vnet prison, three checks in vfs_domount()
and vfs_domount_update() related to doing exports needed
to be changed, so that a file system visible within the
prison but mounted outside the prison can be exported.

I did all three in a minimal way, only changing the checks for
the specific case of a process (typically mountd) doing exports
within a vnet prison and not updating the mount point in other
ways.  The changes are:
- Ignore the error return from vfs_suser(), since the file
  system being mounted outside the prison will cause it to fail.
- Use the priv_check(PRIV_NFS_DAEMON) for this specific case
  within a prison.
- Skip the call to VFS_MOUNT(), since it will return an error,
  due to the "from" argument not being set correctly.  VFS_MOUNT()
  does not appear to do anything for the case of doing exports only.

Reviewed by:	markj
MFC after:	3 months
Differential Revision:	https://reviews.freebsd.org/D37741
2023-03-02 13:09:01 -08:00
Rick Macklem 88175af8b7 vfs_export: Add mnt_exjail to control exports done in prisons
If there are multiple instances of mountd(8) (in different
prisons), there will be confusion if they manipulate the
exports of the same file system.  This patch adds mnt_exjail
to "struct mount" so that the credentials (and, therefore,
the prison) that did the exports for that file system can
be recorded.  If another prison has already exported the
file system, vfs_export() will fail with an error.
If mnt_exjail == NULL, the file system has not been exported.
mnt_exjail is checked by the NFS server, so that exports done
from within a different prison will not be used.

The patch also implements vfs_exjail_destroy(), which is
called from prison_cleanup() to release all the mnt_exjail
credential references, so that the prison can be removed.
Mainly to avoid doing a scan of the mountlist for the case
where there were no exports done from within the prison,
a count of how many file systems have been exported from
within the prison is kept in pr_exportcnt.

Reviewed by:	markj
Discussed with:	jamie
Differential Revision:	https://reviews.freebsd.org/D38371
MFC after:	3 months
2023-02-21 13:00:42 -08:00
Rick Macklem db5655124c vfs_mount.c: Free exports structures in vfs_destroy_mount()
During testing of exporting file systems in jails, I
noticed that the export structures on a mount
were not being free'd when the mount is dismounted.

This bug appears to have been in the system for a
very long time.  It would have resulted in a slow memory
leak when exported file systems were dismounted.

Prior to r362158, freeing the structures during dismount
would not have been safe, since VFS_CHECKEXP() returned
a pointer into an export structure, which might still have been
used by the NFS server for an in-progress RPC when the file system
is dismounted.  r362158 fixed this, so it should now be safe
to free the structures in vfs_mount_destroy(), which is what
this patch does.

Reviewed by:	kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D38385
2023-02-04 14:45:23 -08:00
Doug Rabson 71e9be1bd5 Don't allow stacking of file mounts
Reviewed by:    mjg, kib
Tested by:      pho
2022-12-19 16:46:27 +00:00
Doug Rabson a1d74b2dab Allow realpath to work for file mounts
For file mounts, the directory vnode is not available from namei and this
prevents the use of vn_fullpath_hardlink. In this case, we can use the
vnode which was covered by the file mount with vn_fullpath.

This also disallows file mounts over files with link counts greater than
one to ensure a deterministic path to the mount point.

Reviewed by:    mjg, kib
Tested by:      pho
2022-12-19 16:46:27 +00:00
Doug Rabson 521fbb722c Add support for mounting single files in nullfs
The main use-case for this is to support mounting config files and
secrets into OCI containers. My current workaround copies the files into
the container which is messy and risks secrets leaking into container
images if the cleanup fails.

This adds a VFCF flag to indicate whether the filesystem supports file
mounts and allows fspath to be either a directory or a file if the flag
is set.

Test Plan:
$ sudo mkdir -p /mnt
$ sudo touch /mnt/foo
$ sudo mount -t nullfs /COPYRIGHT /mnt/foo

Reviewed by:    mjg, kib
Tested by:      pho
2022-12-19 16:46:13 +00:00
Rick Macklem 195f1b124d vfs_mount.c: fix vfs_domount() for PRIV_VFS_MOUNT_EXPORTED
It appears that, prior to r158857 vfs_domount() checked
suser() when MNT_EXPORTED was specified.

r158857 appears to have broken this, since MNT_EXPORTED
was no longer set when mountd.c was converted to use nmount(2).
r164033 replaced the suser() check with
priv_check(td, PRIV_VFS_MOUNT_EXPORTED), which does the
same thing (ie. checks for effective uid == 0 assuming suses_enabled
is set).

This patch restores this check by setting MNT_EXPORTED when the
"export" mount option is specified to nmount().

I think this is reasonable since only mountd(8) should be setting
exports and I doubt any non-root mounted file system would
be setting its own exports.

Reviewed by:	kib, markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D37718
2022-12-16 13:01:23 -08:00
Konstantin Belousov 6b69465efb vfs_domount(): ensure that v_mountedhere and VIRF_MOUNTPOINT are set under the vnode lock
Fixes:	f7833196bd
Reported and tested by:	pho
Reviewed by:	jah, markj (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37198
2022-10-29 14:29:55 +03:00
Mateusz Guzik 61a1d5dde2 vfs: stop using the V_MNTREF flag
Reviewed by:	kib, mckusick
Differential Revision:	https://reviews.freebsd.org/D36521
2022-09-14 18:16:23 +00:00
Konstantin Belousov ad175a107b vfs_mount.c: convert explicit panics and KASSERTs to MPASSERT/MPPASS
Reviewed by:	imp, mjg
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D35652
2022-06-29 21:31:47 +03:00
Konstantin Belousov 1e54362824 vfs_op_exit(): assert that mnt_vfs_ops stays non-zero for unmount or suspend
Reviewed by:	mjg
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D35639
2022-06-29 21:31:47 +03:00
Doug Ambrisko ce00b11940 mount: revert the active vnode reporting feature
Revert the computing of active vnode reporting since statfs is used
by a lot of tools.  Only report the vnodes used.

Reported by:	mjg
2022-06-15 07:24:55 -07:00
Mark Johnston 7565431f30 mount: Fix an incorrect assertion in kernel_mount()
The pointer to the mount values may be null if an error occurred while
copying them in, so fix the assertion condition to reflect that
possibility.

While here, move some initialization code into the error == 0 block.  No
functional change intended.

Reported by:	syzkaller
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2022-06-14 12:00:59 -04:00
Doug Ambrisko 6468cd8e0e mount: add vnode usage per file system with mount -v
This avoids the need to drop into the ddb to figure out vnode
usage per file system.  It helps to see if they are or are not
being freed.  Suggestion to report active vnode count was from
kib@

Reviewed by:   	kib
Differential Revision: https://reviews.freebsd.org/D35436
2022-06-13 07:56:38 -07:00
Dmitry Chagin 31d1b816fe sysent: Get rid of bogus sys/sysent.h include.
Where appropriate hide sysent.h under proper condition.

MFC after:	2 weeks
2022-05-28 20:52:17 +03:00
Mateusz Guzik bb92cd7bcd vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd) 2022-03-24 10:20:51 +00:00
Konstantin Belousov 4a4b059a97 Add vfs_remount_ro()
a helper to remount filesystem from rw to ro.

Tested by:	pho
Reviewed by:	markj, mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33721
2022-01-08 05:41:44 +02:00
Mateusz Guzik 7e1d3eefd4 vfs: remove the unused thread argument from NDINIT*
See b4a58fbf64 ("vfs: remove cn_thread")

Bump __FreeBSD_version to 1400043.
2021-11-25 22:50:42 +00:00
Robert Wing 8981a100e6 mount: retire kernel_vmount()
The last usage of this function was removed in e3b1c847a4.

There are no in-tree consumers of kernel_vmount().

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D32607
2021-11-20 10:22:28 -09:00
Kirk McKusick f10a8d0971 Allow the MNT_FORCE flag to be passed through to an initial mount.
When doing an initial mount(8) with its -f (force) flag, the MNT_FORCE
flag is not passed through to the underlying filesystem mount routine.
MNT_FORCE is only passed through on later updates to an existing
mount. With this commit the MNT_FORCE flag is now passed through on the
initial mount.

Sanity check: kib
Sponsored by: Netflix
2021-11-15 15:45:56 -08:00
Mark Johnston 03d5820f73 mount: Check for !VDIR mount points before handling -o emptydir
To implement -o emptydir, vfs_emptydir() checks that the passed
directory is empty.  This should be done after checking whether the
vnode is of type VDIR, though, or vfs_emptydir() may end up calling
VOP_READDIR on a non-directory.

Reported by:	syzbot+4006732c69fb0f792b2c@syzkaller.appspotmail.com
Reviewed by:	kib, imp
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32475
2021-10-13 09:33:35 -04:00
Piotr Pawel Stefaniak 6e8272f317 mount: improve error message for invalid filesystem names
For an invalid filesystem name used like this:
mount -t asdfs /dev/ada1p5 /usr/obj

emit an error message like this:
mount: /dev/ada1p5: Invalid fstype: Invalid argument

instead of:
mount: /dev/ada1p5: Operation not supported by device

Differential Revision:	https://reviews.freebsd.org/D31540
2021-09-15 16:25:31 +02:00
Mateusz Guzik f1e2cc1c66 vfs: drop dedicated sysinit for mountlist_mtx
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-08-26 20:52:03 +02:00
Mateusz Guzik 0d28d014c8 vfs: refactor kern_unmount
Split unmounting by path and id in preparation for other changes.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-08-26 13:58:28 +02:00
Mateusz Guzik 7b2561b46b vfs: stop open-coding vfs_getvfs in kern_unmount
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-08-26 11:38:31 +00:00
Mateusz Guzik 614faa3269 vfs: fix cache-relatecd LOR introduced in the previous change
Reported by:	kib
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-08-22 16:20:07 +00:00
Jason A. Harmening e81e71b0e9 Use interruptible wait for blocking recursive unmounts
Now that we allow recursive unmount attempts to be abandoned upon
exceeding the retry limit, we should avoid leaving an unkillable
thread when a synchronous unmount request was issued against the
base filesystem.

Reviewed by:	kib (earlier revision), mkusick
Differential Revision:  https://reviews.freebsd.org/D31450
2021-08-20 13:21:56 -07:00
Jason A. Harmening a8c732f4e5 VFS: add retry limit and delay for failed recursive unmounts
A forcible unmount attempt may fail due to a transient condition, but
it may also fail due to some issue in the filesystem implementation
that will indefinitely prevent successful unmount.  In such a case,
the retry logic in the recursive unmount facility will cause the
deferred unmount taskqueue to execute constantly.

Avoid this scenario by imposing a retry limit, with a default value
of 10, beyond which the recursive unmount facility will emit a log
message and give up.  Additionally, introduce a grace period, with
a default value of 1s, between successive unmount retries on the
same mount.

Create a new sysctl node, vfs.deferred_unmount, to export the total
number of failed recursive unmount attempts since boot, and to allow
the retry limit and retry grace period to be tuned.

Reviewed by:	kib (earlier revision), mkusick
Differential Revision:  https://reviews.freebsd.org/D31450
2021-08-20 13:20:50 -07:00
Mateusz Guzik dbc689cdef vfs: use vn_lock_pair to avoid establishing an ordering on mount
This fixes some of the LORs seen on mount/unmount.

Complete fix will require taking care of unmount as well.

Reviewed by:	kib
Tested by:	pho (previous version)
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31611
2021-08-20 17:52:24 +00:00
Jason A. Harmening c746ed724d Allow stacked filesystems to be recursively unmounted
In certain emergency cases such as media failure or removal, UFS will
initiate a forced unmount in order to prevent dirty buffers from
accumulating against the no-longer-usable filesystem.  The presence
of a stacked filesystem such as nullfs or unionfs above the UFS mount
will prevent this forced unmount from succeeding.

This change addreses the situation by allowing stacked filesystems to
be recursively unmounted on a taskqueue thread when the MNT_RECURSE
flag is specified to dounmount().  This call will block until all upper
mounts have been removed unless the caller specifies the MNT_DEFERRED
flag to indicate the base filesystem should also be unmounted from the
taskqueue.

To achieve this, the recently-added vfs_pin_from_vp()/vfs_unpin() KPIs
have been combined with the existing 'mnt_uppers' list used by nullfs
and renamed to vfs_register_upper_from_vp()/vfs_unregister_upper().
The format of the mnt_uppers list has also been changed to accommodate
filesystems such as unionfs in which a given mount may be stacked atop
more than one lower mount.  Additionally, management of lower FS
reclaim/unlink notifications has been split into a separate list
managed by a separate set of KPIs, as registration of an upper FS no
longer implies interest in these notifications.

Reviewed by:	kib, mckusick
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D31016
2021-07-24 12:52:00 -07:00
Warner Losh 6475667f7b devctl: don't publish the mount options
Mount options aren't solely ASCII strings. In addition, experience to
date suggests that the mount options are much less useful than was
originally supposed and the mount flags suffice to make decisions. Drop
the reporting of options for the mount/remount/unmount events.

Reviewed by:		markj
Reported by:		KASAN
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D31287
2021-07-24 09:03:53 -06:00
Jason A. Harmening 59409cb90f Add a generic mechanism for preventing forced unmount
This is aimed at preventing stacked filesystems like nullfs and unionfs
from "losing" their lower mounts due to forced unmount.  Otherwise,
VFS operations that are passed through to the lower filesystem(s) may
crash or otherwise cause unpredictable behavior.

Introduce two new functions: vfs_pin_from_vp() and vfs_unpin().
which are intended to be called on the lower mount(s) when the stacked
filesystem is mounted and unmounted, respectively.
Much as registration in the mnt_uppers list previously did, pinning
will prevent even forced unmount of the lower FS and will allow the
stacked FS to freely operate on the lower mount either by direct
use of the struct mount* or indirect use through a properly-referenced
vnode's v_mount field.

vfs_pin_from_vp() is modeled after vfs_ref_from_vp() in that it uses
the mount interlock coupled with re-checking vp->v_mount to ensure
that it will fail in the face of a pending unmount request, even if
the concurrent unmount fully completes.

Adopt these new functions in both nullfs and unionfs.

Reviewed By:	kib, markj
Differential Revision: https://reviews.freebsd.org/D30401
2021-06-05 18:20:36 -07:00
Mark Johnston 2425f5e912 mount: Disallow mounting over a jail root
Discussed with:	jamie
Approved by:	so
Security:	CVE-2020-25584
Security:	FreeBSD-SA-21:10.jail_mount
2021-04-06 14:49:36 -04:00
Mateusz Guzik a15f787adb vfs: add vfs_ref_from_vp
This generalizes what vop_stdgetwritemount used to be doing.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28695
2021-02-21 00:43:05 +00:00
Mateusz Guzik 82397d7919 vfs: denote vnode being a mount point with VIRF_MOUNTPOINT
Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D27794
2021-01-03 06:50:06 +00:00
Konstantin Belousov 164438a7b9 More careful handling of the mount failure.
- VFS_UNMOUNT() requires vn_start_write() around it [*].
- call VFS_PURGE() before unmount.
- do not destroy mp if cleanup unmount did not succeed.
- set MNTK_UNMOUNT, and indicate forced unmount with MNTK_UNMOUNTF
  for VFS_UNMOUNT() in cleanup.

PR:	251320 [*]
Reported by:	Tong Zhang <ztong0001@gmail.com>
Reviewed by:	markj, mjg
Discussed with:	rmacklem
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27327
2020-11-26 18:08:42 +00:00