The file system that handles updating the mount point might do lookups
during the update, in which case it could find the flag MNT_UNION set on
the mp while mount point is still not updated. In particular, the
rootvp->v_mount->mnt_vnodecovered is not yet set.
Delay setting MNT_UNION until the mount is performed.
PR: 265311
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: mckusick, olce
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D45208
Introduce the allocuio() and freeuio() functions to allocate and
deallocate struct uio. This hides the actual allocator interface, so it
is easier to modify the sub-allocation layout of struct uio and the
corresponding iovec array.
Obtained from: CheriBSD
Reviewed by: kib, markj
MFC after: 2 weeks
Sponsored by: CHaOS, EPSRC grant EP/V000292/1
Differential Revision: https://reviews.freebsd.org/D43711
In general we copy error strings as part of reporting an error from
lower layers, so if the copyout() fails there's nothing to do since we'd
prefer to preserve the original error.
This is in preparation for annotating copyin() and related functions
with __result_use_check.
Reviewed by: olce, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D43147
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.
Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/
Sponsored by: Netflix
When a process attempts to access a snapshot under
/<dataset>/.zfs/snapshot, the snapshot is automounted.
However, without this patch, the automount does not
set mnt_exjail, which results in the snapshot not being
accessible over NFS.
This patch defines a new function called vfs_exjail_clone()
which sets mnt_exjail from another mount point and
then uses that function to set mnt_exjail in the snapshot
automount. A separate patch that is currently a pull request
for OpenZFS, calls this function to fix the problem.
PR: 275200
Reviewed by: markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D42672
Change vfs_byname_kld to always return an error value of ENODEV to
indicate an unsupported fstype leaving ENOENT to indicate errors such
as a missing mount point or invalid path. This allows nmount(2) to
better distinguish these cases and avoid treating a missing device
node as an invalid fstype after commit 6e8272f317.
While here, change mount(2) to return EINVAL instead of ENODEV for an
invalid fstype to match nmount(2).
PR: 274600
Reviewed by: pstef, markj
Differential Revision: https://reviews.freebsd.org/D42327
We need to vfs_op_enter()/vn_seqc_write_start() before jumping to
cleanup.
PR: 274992
Reported by: trasz
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Fixes: 9ef7a491a4
to check looked up path against specific mounted filesystem.
Reviewed by: mjg
Tested by: Andrew Gierth <andrew@tao146.riddles.org.uk>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42023
If either of vnodes is shared locked, lock must not be recursed.
Requested by: rmacklem
Reviewed by: markj, rmacklem
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D39444
To run mountd in a vnet prison, three checks in vfs_domount()
and vfs_domount_update() related to doing exports needed
to be changed, so that a file system visible within the
prison but mounted outside the prison can be exported.
I did all three in a minimal way, only changing the checks for
the specific case of a process (typically mountd) doing exports
within a vnet prison and not updating the mount point in other
ways. The changes are:
- Ignore the error return from vfs_suser(), since the file
system being mounted outside the prison will cause it to fail.
- Use the priv_check(PRIV_NFS_DAEMON) for this specific case
within a prison.
- Skip the call to VFS_MOUNT(), since it will return an error,
due to the "from" argument not being set correctly. VFS_MOUNT()
does not appear to do anything for the case of doing exports only.
Reviewed by: markj
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D37741
If there are multiple instances of mountd(8) (in different
prisons), there will be confusion if they manipulate the
exports of the same file system. This patch adds mnt_exjail
to "struct mount" so that the credentials (and, therefore,
the prison) that did the exports for that file system can
be recorded. If another prison has already exported the
file system, vfs_export() will fail with an error.
If mnt_exjail == NULL, the file system has not been exported.
mnt_exjail is checked by the NFS server, so that exports done
from within a different prison will not be used.
The patch also implements vfs_exjail_destroy(), which is
called from prison_cleanup() to release all the mnt_exjail
credential references, so that the prison can be removed.
Mainly to avoid doing a scan of the mountlist for the case
where there were no exports done from within the prison,
a count of how many file systems have been exported from
within the prison is kept in pr_exportcnt.
Reviewed by: markj
Discussed with: jamie
Differential Revision: https://reviews.freebsd.org/D38371
MFC after: 3 months
During testing of exporting file systems in jails, I
noticed that the export structures on a mount
were not being free'd when the mount is dismounted.
This bug appears to have been in the system for a
very long time. It would have resulted in a slow memory
leak when exported file systems were dismounted.
Prior to r362158, freeing the structures during dismount
would not have been safe, since VFS_CHECKEXP() returned
a pointer into an export structure, which might still have been
used by the NFS server for an in-progress RPC when the file system
is dismounted. r362158 fixed this, so it should now be safe
to free the structures in vfs_mount_destroy(), which is what
this patch does.
Reviewed by: kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D38385
For file mounts, the directory vnode is not available from namei and this
prevents the use of vn_fullpath_hardlink. In this case, we can use the
vnode which was covered by the file mount with vn_fullpath.
This also disallows file mounts over files with link counts greater than
one to ensure a deterministic path to the mount point.
Reviewed by: mjg, kib
Tested by: pho
The main use-case for this is to support mounting config files and
secrets into OCI containers. My current workaround copies the files into
the container which is messy and risks secrets leaking into container
images if the cleanup fails.
This adds a VFCF flag to indicate whether the filesystem supports file
mounts and allows fspath to be either a directory or a file if the flag
is set.
Test Plan:
$ sudo mkdir -p /mnt
$ sudo touch /mnt/foo
$ sudo mount -t nullfs /COPYRIGHT /mnt/foo
Reviewed by: mjg, kib
Tested by: pho
It appears that, prior to r158857 vfs_domount() checked
suser() when MNT_EXPORTED was specified.
r158857 appears to have broken this, since MNT_EXPORTED
was no longer set when mountd.c was converted to use nmount(2).
r164033 replaced the suser() check with
priv_check(td, PRIV_VFS_MOUNT_EXPORTED), which does the
same thing (ie. checks for effective uid == 0 assuming suses_enabled
is set).
This patch restores this check by setting MNT_EXPORTED when the
"export" mount option is specified to nmount().
I think this is reasonable since only mountd(8) should be setting
exports and I doubt any non-root mounted file system would
be setting its own exports.
Reviewed by: kib, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D37718
The pointer to the mount values may be null if an error occurred while
copying them in, so fix the assertion condition to reflect that
possibility.
While here, move some initialization code into the error == 0 block. No
functional change intended.
Reported by: syzkaller
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
This avoids the need to drop into the ddb to figure out vnode
usage per file system. It helps to see if they are or are not
being freed. Suggestion to report active vnode count was from
kib@
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D35436
a helper to remount filesystem from rw to ro.
Tested by: pho
Reviewed by: markj, mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33721
The last usage of this function was removed in e3b1c847a4.
There are no in-tree consumers of kernel_vmount().
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D32607
When doing an initial mount(8) with its -f (force) flag, the MNT_FORCE
flag is not passed through to the underlying filesystem mount routine.
MNT_FORCE is only passed through on later updates to an existing
mount. With this commit the MNT_FORCE flag is now passed through on the
initial mount.
Sanity check: kib
Sponsored by: Netflix
To implement -o emptydir, vfs_emptydir() checks that the passed
directory is empty. This should be done after checking whether the
vnode is of type VDIR, though, or vfs_emptydir() may end up calling
VOP_READDIR on a non-directory.
Reported by: syzbot+4006732c69fb0f792b2c@syzkaller.appspotmail.com
Reviewed by: kib, imp
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32475
For an invalid filesystem name used like this:
mount -t asdfs /dev/ada1p5 /usr/obj
emit an error message like this:
mount: /dev/ada1p5: Invalid fstype: Invalid argument
instead of:
mount: /dev/ada1p5: Operation not supported by device
Differential Revision: https://reviews.freebsd.org/D31540
Now that we allow recursive unmount attempts to be abandoned upon
exceeding the retry limit, we should avoid leaving an unkillable
thread when a synchronous unmount request was issued against the
base filesystem.
Reviewed by: kib (earlier revision), mkusick
Differential Revision: https://reviews.freebsd.org/D31450
A forcible unmount attempt may fail due to a transient condition, but
it may also fail due to some issue in the filesystem implementation
that will indefinitely prevent successful unmount. In such a case,
the retry logic in the recursive unmount facility will cause the
deferred unmount taskqueue to execute constantly.
Avoid this scenario by imposing a retry limit, with a default value
of 10, beyond which the recursive unmount facility will emit a log
message and give up. Additionally, introduce a grace period, with
a default value of 1s, between successive unmount retries on the
same mount.
Create a new sysctl node, vfs.deferred_unmount, to export the total
number of failed recursive unmount attempts since boot, and to allow
the retry limit and retry grace period to be tuned.
Reviewed by: kib (earlier revision), mkusick
Differential Revision: https://reviews.freebsd.org/D31450
This fixes some of the LORs seen on mount/unmount.
Complete fix will require taking care of unmount as well.
Reviewed by: kib
Tested by: pho (previous version)
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31611
In certain emergency cases such as media failure or removal, UFS will
initiate a forced unmount in order to prevent dirty buffers from
accumulating against the no-longer-usable filesystem. The presence
of a stacked filesystem such as nullfs or unionfs above the UFS mount
will prevent this forced unmount from succeeding.
This change addreses the situation by allowing stacked filesystems to
be recursively unmounted on a taskqueue thread when the MNT_RECURSE
flag is specified to dounmount(). This call will block until all upper
mounts have been removed unless the caller specifies the MNT_DEFERRED
flag to indicate the base filesystem should also be unmounted from the
taskqueue.
To achieve this, the recently-added vfs_pin_from_vp()/vfs_unpin() KPIs
have been combined with the existing 'mnt_uppers' list used by nullfs
and renamed to vfs_register_upper_from_vp()/vfs_unregister_upper().
The format of the mnt_uppers list has also been changed to accommodate
filesystems such as unionfs in which a given mount may be stacked atop
more than one lower mount. Additionally, management of lower FS
reclaim/unlink notifications has been split into a separate list
managed by a separate set of KPIs, as registration of an upper FS no
longer implies interest in these notifications.
Reviewed by: kib, mckusick
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D31016
Mount options aren't solely ASCII strings. In addition, experience to
date suggests that the mount options are much less useful than was
originally supposed and the mount flags suffice to make decisions. Drop
the reporting of options for the mount/remount/unmount events.
Reviewed by: markj
Reported by: KASAN
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D31287
This is aimed at preventing stacked filesystems like nullfs and unionfs
from "losing" their lower mounts due to forced unmount. Otherwise,
VFS operations that are passed through to the lower filesystem(s) may
crash or otherwise cause unpredictable behavior.
Introduce two new functions: vfs_pin_from_vp() and vfs_unpin().
which are intended to be called on the lower mount(s) when the stacked
filesystem is mounted and unmounted, respectively.
Much as registration in the mnt_uppers list previously did, pinning
will prevent even forced unmount of the lower FS and will allow the
stacked FS to freely operate on the lower mount either by direct
use of the struct mount* or indirect use through a properly-referenced
vnode's v_mount field.
vfs_pin_from_vp() is modeled after vfs_ref_from_vp() in that it uses
the mount interlock coupled with re-checking vp->v_mount to ensure
that it will fail in the face of a pending unmount request, even if
the concurrent unmount fully completes.
Adopt these new functions in both nullfs and unionfs.
Reviewed By: kib, markj
Differential Revision: https://reviews.freebsd.org/D30401
- VFS_UNMOUNT() requires vn_start_write() around it [*].
- call VFS_PURGE() before unmount.
- do not destroy mp if cleanup unmount did not succeed.
- set MNTK_UNMOUNT, and indicate forced unmount with MNTK_UNMOUNTF
for VFS_UNMOUNT() in cleanup.
PR: 251320 [*]
Reported by: Tong Zhang <ztong0001@gmail.com>
Reviewed by: markj, mjg
Discussed with: rmacklem
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27327