There must be no callers of VOP_COPY_FILE_RANGE() except
vn_copy_file_range(), which does enough to find the write-vnodes where
to call the VOP.
Reviewed by: markj, Olivier Certner <olce.freebsd@certner.fr>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42603
It is similar to VOP_GETWRITEMOUNT(), and for given vnode vp should
return the lower vnode which would actually handle write to vp.
Flags allow to specify FREAD or FWRITE for benefit of possible unionfs
implementation.
Reviewed by: markj, Olivier Certner <olce.freebsd@certner.fr>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42603
Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.
Sponsored by: Netflix
The MNT_IGNORE flag can be used to mark certain filesystem mounts so
that utilities such as df(1) and mount(8) can filter out those mounts by
default. This can be used, for instance, to reduce the noise from
running container workloads inside jails which often have at least three
and sometimes as many as ten mounts per container.
The flag is supplied by the nmount(2) system call and is recorded so
that it can be reported by statfs(2). Unfortunately several filesystems
override the default behaviour and mask out the flag, defeating its
purpose. This change preserves the MNT_IGNORE flag for those filesystems
so that it can be reported correctly.
MFC after: 1 week
To quote from a comment above vput_final:
<quote>
* XXX Some filesystems pass in an exclusively locked vnode and strongly depend
* on the lock being held all the way until VOP_INACTIVE. This in particular
* happens with UFS which adds half-constructed vnodes to the hash, where they
* can be found by other code.
</quote>
As is there is no mechanism which allows filesystems to denote that a
vnode is fully initialized, consequently problems like the above are
only found the hard way(tm).
Add rudimentary support for state transitions, which in particular allow
to assert the vnode is not legally unlocked until its fate is decided
(either construction finishes or vgone is called to abort it).
The new field lands in a 1-byte hole, thus it does not grow the struct.
Bump __FreeBSD_version to 1400077
Reviewed by: kib (previous version)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D37759
The main use-case for this is to support mounting config files and
secrets into OCI containers. My current workaround copies the files into
the container which is messy and risks secrets leaking into container
images if the cleanup fails.
This adds a VFCF flag to indicate whether the filesystem supports file
mounts and allows fspath to be either a directory or a file if the flag
is set.
Test Plan:
$ sudo mkdir -p /mnt
$ sudo touch /mnt/foo
$ sudo mount -t nullfs /COPYRIGHT /mnt/foo
Reviewed by: mjg, kib
Tested by: pho
When the lower filesystem directory hierarchy is the same as the nullfs
mount point (admittedly not likely to be a useful situation in
practice), nullfs is subject to the exact deadlock between the busy
count drain and the covered vnode lock that VV_CROSSLOCK is intended
to address.
Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D37458
null_nodeget() needs a valid mount point data, otherwise we might
race and dereference NULL.
Using MBF_NOWAIT makes non-forced unmount non-transparent for
vn_fullpath() over nullfs, but we make no guarantee that fullpath
calculation succeeds anyway.
Reported and tested by: pho
Reviewed by: jah
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35477
Also remove once-used functions to clean up after failed insmntque1(),
which were destructor callbacks in previous life.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D34071
I was somehow convinced that insmntque calls insmntque1 with a NULL
destructor. Unfortunately this worked well enough to not immediately
blow up in simple testing.
Keep not using the destructor in previously patched filesystems though
as it avoids unnecessary casts.
Noted by: kib
Reported by: pho
fdvp and fvp vnodes are not locked, and race with reclaim cannot be handled
by the generic bypass routine.
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31310
Caller of VOP_LOOKUP() passes dvp locked and expect it locked on return.
Relock of lower vnode in any case could leave upper vnode reclaimed and
unlocked.
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31310
The upper vnode reference to the lower vnode is the only reference that
keeps our pointer to the lower vnode alive. If lower vnode is relocked
during the VOP call, upper vnode might become unlocked and reclaimed,
which invalidates our reference.
Add a transient vhold around VOP call.
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31310
The advlock VOP takes the vnode unlocked, which makes the normal bypass
function racy. Same as null_pgcache_read(), nullfs implementation needs
to take interlock and reference lower vnode under it.
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31310
In certain emergency cases such as media failure or removal, UFS will
initiate a forced unmount in order to prevent dirty buffers from
accumulating against the no-longer-usable filesystem. The presence
of a stacked filesystem such as nullfs or unionfs above the UFS mount
will prevent this forced unmount from succeeding.
This change addreses the situation by allowing stacked filesystems to
be recursively unmounted on a taskqueue thread when the MNT_RECURSE
flag is specified to dounmount(). This call will block until all upper
mounts have been removed unless the caller specifies the MNT_DEFERRED
flag to indicate the base filesystem should also be unmounted from the
taskqueue.
To achieve this, the recently-added vfs_pin_from_vp()/vfs_unpin() KPIs
have been combined with the existing 'mnt_uppers' list used by nullfs
and renamed to vfs_register_upper_from_vp()/vfs_unregister_upper().
The format of the mnt_uppers list has also been changed to accommodate
filesystems such as unionfs in which a given mount may be stacked atop
more than one lower mount. Additionally, management of lower FS
reclaim/unlink notifications has been split into a separate list
managed by a separate set of KPIs, as registration of an upper FS no
longer implies interest in these notifications.
Reviewed by: kib, mckusick
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D31016
This is aimed at preventing stacked filesystems like nullfs and unionfs
from "losing" their lower mounts due to forced unmount. Otherwise,
VFS operations that are passed through to the lower filesystem(s) may
crash or otherwise cause unpredictable behavior.
Introduce two new functions: vfs_pin_from_vp() and vfs_unpin().
which are intended to be called on the lower mount(s) when the stacked
filesystem is mounted and unmounted, respectively.
Much as registration in the mnt_uppers list previously did, pinning
will prevent even forced unmount of the lower FS and will allow the
stacked FS to freely operate on the lower mount either by direct
use of the struct mount* or indirect use through a properly-referenced
vnode's v_mount field.
vfs_pin_from_vp() is modeled after vfs_ref_from_vp() in that it uses
the mount interlock coupled with re-checking vp->v_mount to ensure
that it will fail in the face of a pending unmount request, even if
the concurrent unmount fully completes.
Adopt these new functions in both nullfs and unionfs.
Reviewed By: kib, markj
Differential Revision: https://reviews.freebsd.org/D30401
Instead of requiring all implementations of vfs_quotactl to unbusy
the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param
to VFS_QUOTACTL(9). The implementation may then indicate to the caller
whether it needed to unbusy the mount.
Also, add stbool.h to libprocstat modules which #define _KERNEL
before including sys/mount.h. Otherwise they'll pull in sys/types.h
before defining _KERNEL and therefore won't have the bool definition
they need for mp_busy.
Reviewed By: kib, markj
Differential Revision: https://reviews.freebsd.org/D30556
Parts of libprocstat like to pretend they're kernel components for the
sake of including mount.h, and including sys/types.h in the _KERNEL
case doesn't fix the build for some reason. Revert both the
VFS_QUOTACTL() change and the follow-up "fix" for now.
Instead of requiring all implementations of vfs_quotactl to unbusy
the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param
to VFS_QUOTACTL(9). The implementation may then indicate to the caller
whether it needed to unbusy the mount.
Reviewed By: kib, markj
Differential Revision: https://reviews.freebsd.org/D30218
Otherwise pages are cleaned some time later when the lower fs decides
that it is time to do it. This mostly manifests itself as delayed
mtime update, e.g. breaking make-like programs.
Reported by: mav
Tested by: mav, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The VFS conventions is that VOP_LOOKUP() methods do not need to handle
ISDOTDOT lookups for VV_ROOT vnodes (since they cannot, after all). Nullfs
bypasses VOP_LOOKUP() to lower filesystem, and there, due to user actions,
it is possible to get into situation where
- upper vnode does not have VV_ROOT set
- lower vnode is root
- ISDOTDOT is requested
User just needs to nullfs-mount non-root of some filesystem, and then move
some directory under mount, out of mount, using lower filesystem.
In this case, nullfs cannot do much, but we still should and can ensure
internal kernel structures are consistent. Avoid ISDOTDOT lookup forwarding
when VV_ROOT is set on lower dvp, return somewhat arbitrary ENOENT.
PR: 253593
Reported by: Gregor Koscak <elogin41@gmail.com>
Test by: Patrick Sullivan <sulli00777@gmail.com>
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
We might own the last use reference, and then vrele() at the end would
need to take the dvp vnode lock to inactivate, which causes deadlock
with vp. We cannot vrele() dvp from start since this might unlock ldvp.
Handle it by holding the vnode and dropping use ref after lowerfs
VOP_VPUT_PAIR() ended. This effectivaly requires unlock of the vp vnode
after VOP_VPUT_PAIR(), so the call is changed to set unlock_vp to true
unconditionally. This opens more opportunities for vp to be reclaimed,
if lvp is still alive we reinstantiate vp with null_nodeget().
Reported and tested by: pho
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D29178
Generic bypass cannot understand the rules of liveness for the VOP.
Reviewed by: chs, mckusick
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Normal bypass expects locked vnode, which is not true for
VOP_READ_PGCACHE(). Ensure liveness of the lower vnode by taking the
upper vnode interlock, which is also taked by null_reclaim() when
setting v_data to NULL.
Reported and tested by: pho
Reviewed by: markj, mjg
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27327
instead of mount_nullfs(8).
Obviously you'd need to force mount(8) to not call
mount_nullfs(8) to make use of it.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26934
If lower VOP relocked the lower vnode, it is possible that nullfs
vnode was reclaimed meantime. In this case nullfs vnode no longer
shares lock with lower vnode, which breaks locking protocol.
Check for the condition and acquire nullfs vnode lock if detected.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week