Commit graph

227 commits

Author SHA1 Message Date
Warner Losh 29363fb446 sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by:		Netflix
2023-11-26 22:23:30 -07:00
Doug Rabson b5c4616582 Fix MNT_IGNORE for devfs, fdescfs and nullfs
The MNT_IGNORE flag can be used to mark certain filesystem mounts so
that utilities such as df(1) and mount(8) can filter out those mounts by
default. This can be used, for instance, to reduce the noise from
running container workloads inside jails which often have at least three
and sometimes as many as ten mounts per container.

The flag is supplied by the nmount(2) system call and is recorded so
that it can be reported by statfs(2). Unfortunately several filesystems
override the default behaviour and mask out the flag, defeating its
purpose. This change preserves the MNT_IGNORE flag for those filesystems
so that it can be reported correctly.

MFC after:	1 week
2023-08-26 12:08:37 +01:00
Warner Losh 95ee2897e9 sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:11 -06:00
Konstantin Belousov 3905309dfe fdescfs: add a mount option rdlnk
which changes /dev/fd/N files types to symbolic link with the behavior
of symbolic links.

PR:	272127
Reported by:	Peter Eriksson <pen@lysator.liu.se>
Reviewed by:	dchagin
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D40969
2023-07-13 04:14:20 +03:00
Konstantin Belousov 9c3bfe2ad0 Revert "VFS: Remove VV_READLINK flag" and "fdescfs: improve linrdlnk mount option"
This reverts commits 4a402dfe0b and
3bffa22623.

The fix will be implemented in somewhat different manner.  The semantic
adjustment is incompatible with linuxolator expectations.

Reported and reviewed by:	dchagin
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D40969
2023-07-13 04:14:12 +03:00
Konstantin Belousov 3bffa22623 fdescfs: improve linrdlnk mount option
Instead of using VV_READLINK vnode flag and checking it in one place,
just assign VLNK type to the Fdesc vnodes for linrdlnk mounts.  Then all
places where symlinks needs to be followed, e.g. lookup(), are handled.

PR:	272127
Reported by:	Peter Eriksson <pen@lysator.liu.se>
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D40700
2023-06-27 13:43:17 +03:00
Stefan Eßer 88a795e80c sys/fs: do not report blocks allocated for synthetic file systems
The pseudo file systems (devfs, fdescfs, procfs, etc.) report total
and available blocks and inodes despite being synthetic with no
underlying storage device to which those values could be applied.

The current code of these file systems tends to report a fixed number
of total blocks but no free blocks, and in the case of procfs,
libprocfs, linsysfs also no free inodes.

This can be irritating in e.g. the "df" output, since 100% of the
resources seem to be in use, but it can also create warnings in
monitoring tools used for capacity management.

This patch makes these file systems return the same value for the
total and free parameters, leading to 0% in use being displayed by
"df". Since there is no resource that can be exhausted, this appears
to be a sensible result.

Reviewed by:	mckusick
Differential Revision:	https://reviews.freebsd.org/D39442
2023-04-25 09:59:15 +02:00
Konstantin Belousov 13262b07a0 fdesc_lookup(): drop fdropped
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39207
2023-03-24 19:47:22 +02:00
Konstantin Belousov 7dca8fd1cb fdesc_lookup(): the condition to use vn_vget_ino() is always true
The ix number for the fdescfs root is 1, while any fd vnode has the ix
value at least 3.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39207
2023-03-24 19:47:16 +02:00
Konstantin Belousov 8d97282a39 fdesc_lookup(): no need to vhold the dvp vnode
It is already referenced by the VOP_LOOKUP() caller, otherwise vdrop()
after vn_lock() is invalid anyway.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39207
2023-03-24 19:47:07 +02:00
Konstantin Belousov 51b8ffb95c fdesc_allocvp(): fix potential use after free
Just owning the interlock is not enough for vget() to operate on the
vnode race-free with vgone(), the vnode should be held.  Use
vget_prep()/vget_finish() to avoid vholding the vnode explicitly, and
drop LK_INTERLOCK.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39207
2023-03-24 19:46:53 +02:00
Konstantin Belousov fa3ea81b77 fdescfs: remove useless XXX comment, unwrap line
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39207
2023-03-24 19:46:47 +02:00
Mark Johnston 0f5b6f9a04 fdescfs: Fix a file ref leak
In fdesc_lookup(), vn_vget_ino_gen() may fail without invoking the
callback, in which case the ref on fp is leaked.  This happens if the
fdescfs mount is being concurrently unmounted.  Moreover, we cannot
safely drop the ref while the dvp is locked.

So:
- Use a flag variable to indicate whether the ref is dropped.
- Reorganize things to handle the leak.

Reported by:	C Turt <ecturt@gmail.com>
Reviewed by:	mjg, kib
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39189
2023-03-22 09:19:27 -04:00
Mateusz Guzik 829f0bcb5f vfs: add the concept of vnode state transitions
To quote from a comment above vput_final:
<quote>
* XXX Some filesystems pass in an exclusively locked vnode and strongly depend
* on the lock being held all the way until VOP_INACTIVE. This in particular
* happens with UFS which adds half-constructed vnodes to the hash, where they
* can be found by other code.
</quote>

As is there is no mechanism which allows filesystems to denote that a
vnode is fully initialized, consequently problems like the above are
only found the hard way(tm).

Add rudimentary support for state transitions, which in particular allow
to assert the vnode is not legally unlocked until its fate is decided
(either construction finishes or vgone is called to abort it).

The new field lands in a 1-byte hole, thus it does not grow the struct.

Bump __FreeBSD_version to 1400077

Reviewed by:	kib (previous version)
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D37759
2022-12-26 17:35:12 +00:00
Mateusz Guzik a75d1ddd74 vfs: introduce V_PCATCH to stop abusing PCATCH 2022-09-17 15:41:37 +00:00
Konstantin Belousov 156745b42d fdescfs: allow chown/utime etc on fdescfs fd for underlying files opened with O_PATH
Reported and tested by:	dchagin
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D35410
2022-06-06 22:27:36 +03:00
Konstantin Belousov 66c5fbca77 insmntque1(): remove useless arguments
Also remove once-used functions to clean up after failed insmntque1(),
which were destructor callbacks in previous life.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D34071
2022-01-31 16:49:08 +02:00
Mateusz Guzik 2a7e4cf843 Revert b58ca5df0b ("vfs: remove the now unused insmntque1")
I was somehow convinced that insmntque calls insmntque1 with a NULL
destructor. Unfortunately this worked well enough to not immediately
blow up in simple testing.

Keep not using the destructor in previously patched filesystems though
as it avoids unnecessary casts.

Noted by:	kib
Reported by:	pho
2022-01-27 16:32:22 +00:00
Mateusz Guzik ade1367ba8 fdescfs: stop using insmntque1
It adds nothing of value over insmntque.
2022-01-27 00:54:38 +01:00
Mateusz Guzik 35b12b8711 fdescfs: plug a set-but-not-unused var
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-11-24 21:25:24 +00:00
Mateusz Guzik b4a58fbf64 vfs: remove cn_thread
It is always curthread.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D32453
2021-10-11 13:21:47 +00:00
Konstantin Belousov f9b1e711f0 fdescfs: add an option to return underlying file vnode on lookup
The 'nodup' option forces fdescfs to return real vnode behind file
descriptor instead of the fdescfs fd vnode, on lookup. The end result
is that e.g. stat("/dev/fd/3") returns the stat data for the underlying
vnode, if any.  Similarly, fchdir(2) works in the expected way.

For open(2), if applied over file descriptor opened with O_PATH, it
effectively re-open that vnode into normal file descriptor which has the
specified access mode, assuming the current vnode permissions allow it.

If the file descriptor does not reference vnode, the behavior is unchanged.

This is done by a mount option, because permission check on open(2) breaks
established fdescfs open semantic of dup(2)-ing the descriptor.  So it
is not suitable for /dev/fd mount.

Tested by:	Andrew Walker <awalker@ixsystems.com>
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30140
2021-06-04 03:30:12 +03:00
Mateusz Guzik 4fe925b81e fdescfs: allow shared locking of root vnode
Eliminates fdescfs from lock profile when running poudriere.
2021-05-19 17:58:54 +00:00
Mateusz Guzik 6b3a9a0f3d Convert remaining cap_rights_init users to cap_rights_init_one
semantic patch:

@@

expression rights, r;

@@

- cap_rights_init(&rights, r)
+ cap_rights_init_one(&rights, r)
2021-01-12 13:16:10 +00:00
Mateusz Guzik 586ee69f09 fs: clean up empty lines in .c and .h files 2020-09-01 21:18:40 +00:00
Mateusz Guzik feabaaf995 cache: drop the always curthread argument from reverse lookup routines
Note VOP_VPTOCNP keeps getting it as temporary compatibility for zfs.

Tested by:	pho
2020-08-24 08:57:02 +00:00
Mateusz Guzik a92a971bbb vfs: remove the thread argument from vget
It was already asserted to be curthread.

Semantic patch:

@@

expression arg1, arg2, arg3;

@@

- vget(arg1, arg2, arg3)
+ vget(arg1, arg2)
2020-08-16 17:18:54 +00:00
Mateusz Guzik b249ce48ea vfs: drop the mostly unused flags argument from VOP_UNLOCK
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427
2020-01-03 22:29:58 +00:00
Mateusz Guzik 6fa079fc3f vfs: flatten vop vectors
This eliminates the following loop from all VOP calls:

while(vop != NULL && \
    vop->vop_spare2 == NULL && vop->vop_bypass == NULL)
        vop = vop->vop_default;

Reviewed by:	jeff
Tesetd by:	pho
Differential Revision:	https://reviews.freebsd.org/D22738
2019-12-16 00:06:22 +00:00
Mateusz Guzik abd80ddb94 vfs: introduce v_irflag and make v_type smaller
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.

Reviewed by:	kib, jeff
Differential Revision:	https://reviews.freebsd.org/D22715
2019-12-08 21:30:04 +00:00
Mark Johnston 6d2e2df764 Ensure that directory entry padding bytes are zeroed.
Directory entries must be padded to maintain alignment; in many
filesystems the padding was not initialized, resulting in stack
memory being copied out to userspace.  With the ino64 work there
are also some explicit pad fields in struct dirent.  Add a subroutine
to clear these bytes and use it in the in-tree filesystems.  The
NFS client is omitted for now as it was fixed separately in r340787.

Reported by:	Thomas Barabosch, Fraunhofer FKIE
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2018-11-23 22:24:59 +00:00
Konstantin Belousov 1c4ca77890 Add d_off support for multiple filesystems.
The d_off field has been added to the dirent structure recently.
Currently filesystems don't support this feature.  Support has been
added and tested for zfs, ufs, ext2fs, fdescfs, msdosfs and unionfs.
A stub implementation is available for cd9660, nandfs, udf and
pseudofs but hasn't been tested.

Motivation for this feature: our usecase is for a userspace nfs server
(nfs-ganesha) with zfs.  At the moment we cache direntry offsets by
calling lseek once per entry, with this patch we can get the offset
directly from getdirentries(2) calls which provides a significant
speedup.

Submitted by:	Jack Halford <jack@gandi.net>
Reviewed by:	mckusick, pfg, rmacklem (previous versions)
Sponsored by:	Gandi.net
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17917
2018-11-14 14:18:35 +00:00
Matt Macy cbd92ce62e Eliminate the overhead of gratuitous repeated reinitialization of cap_rights
- Add macros to allow preinitialization of cap_rights_t.

- Convert most commonly used code paths to use preinitialized cap_rights_t.
  A 3.6% speedup in fstat was measured with this change.

Reported by:	mjg
Reviewed by:	oshogbo
Approved by:	sbruno
MFC after:	1 month
2018-05-09 18:47:24 +00:00
Jamie Gritton 0e5c6bd436 Make it easier for filesystems to count themselves as jail-enabled,
by doing most of the work in a new function prison_add_vfs in kern_jail.c
Now a jail-enabled filesystem need only mark itself with VFCF_JAIL, and
the rest is taken care of.  This includes adding a jail parameter like
allow.mount.foofs, and a sysctl like security.jail.mount_foofs_allowed.
Both of these used to be a static list of known filesystems, with
predefined permission bits.

Reviewed by:	kib
Differential Revision:	D14681
2018-05-04 20:54:27 +00:00
Hajimu UMEMOTO 9f5fab694c Fix Bad file descriptor error.
MFC after:	1 week
2018-03-09 04:45:24 +00:00
John Baldwin b1288166e0 Use long for the last argument to VOP_PATHCONF rather than a register_t.
pathconf(2) and fpathconf(2) both return a long.  The kern_[f]pathconf()
functions now accept a pointer to a long value rather than modifying
td_retval directly.  Instead, the system calls explicitly store the
returned long value in td_retval[0].

Requested by:	bde
Reviewed by:	kib
Sponsored by:	Chelsio Communications
2018-01-17 22:36:58 +00:00
John Baldwin dd688800e1 Add a custom VOP_PATHCONF method for fdescfs.
The method handles NAME_MAX and LINK_MAX explicitly.  For all other
pathconf variables, the method passes the request down to the underlying
file descriptor.  This requires splitting a kern_fpathconf() syscallsubr
routine out of sys_fpathconf().  Also, to avoid lock order reversals with
vnode locks, the fdescfs vnode is unlocked around the call to
kern_fpathconf(), but with the usecount of the vnode bumped.

MFC after:	1 month
Sponsored by:	Chelsio Communications
2017-12-19 18:20:38 +00:00
Pedro F. Giffuni 51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Dmitry Chagin 77d3337c9f Implement proper Linux /dev/fd and /proc/self/fd behavior by adding
Linux specific things to the native fdescfs file system.

Unlike FreeBSD, the Linux fdescfs is a directory containing a symbolic
links to the actual files, which the process has open.
A readlink(2) call on this file returns a full path in case of regular file
or a string in a special format (type:[inode], anon_inode:<file-type>, etc..).
As well as in a FreeBSD, opening the file in the Linux fdescfs directory is
equivalent to duplicating the corresponding file descriptor.

Here we have mutually exclusive requirements:
- in case of readlink(2) call fdescfs lookup() method should return VLNK
vnode otherwise our kern_readlink() fail with EINVAL error;
- in the other calls fdescfs lookup() method should return non VLNK vnode.

For what new vnode v_flag VV_READLINK was added, which is set if fdescfs has beed
mounted with linrdlnk option an modified kern_readlinkat() to properly handle it.

For now For Linux ABI compatibility mount fdescfs volume with linrdlnk option:

    mount -t fdescfs -o linrdlnk null /compat/linux/dev/fd

Reviewed by:	kib@
MFC after:	1 week
Relnotes:	yes
2017-08-01 03:40:19 +00:00
Dmitry Chagin f80dc46c3d Replace unnecessary _KERNEL by double-include protection.
MFC after:	2 week
2017-07-25 06:59:35 +00:00
Dmitry Chagin d499e3f58e Style(9). Whitespace.
MFC after:	3 weeks
2017-07-09 14:18:22 +00:00
Dmitry Chagin 16e3859b47 Eliminate the bogus casts.
MFC after:	3 weeks
2017-07-09 14:15:51 +00:00
Dmitry Chagin 46d186a9b4 Don't initialize error in declaration.
MFC after:	3 weeks
2017-07-08 21:15:46 +00:00
Dmitry Chagin 18a9ea872a Eliminate the bogus cast.
MFC after:	3 weeks
2017-07-08 21:13:25 +00:00
Dmitry Chagin 11fc6c6dac Eliminate the bogus cast.
MFC after:	3 weeks
2017-07-08 21:12:00 +00:00
Dmitry Chagin b9d3485fb4 Don't take a lock around atomic operation.
MFC after:	3 weeks
2017-07-08 21:08:22 +00:00
Dmitry Chagin 073b14b469 Remove init from declaration, collapse two int vars declarations into single.
MFC after:	3 weeks
2017-07-08 21:05:28 +00:00
Dmitry Chagin 1901d0d8d3 Remove init from declaration.
MFC after:	3 weeks
2017-07-08 21:04:09 +00:00
Dmitry Chagin a15cf51f0a Style(9). Add blank line aftr {.
MFC after:	3 weeks
2017-07-08 21:02:40 +00:00
Konstantin Belousov 6992112349 Commit the 64-bit inode project.
Extend the ino_t, dev_t, nlink_t types to 64-bit ints.  Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment.  Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.

ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks.  Unfortunately, not everything can be
fixed, especially outside the base system.  For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.

Kinfo sysctl MIBs ABI is changed in backward-compatible way, but
there is no general mechanism to handle other sysctl MIBS which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.

Struct xvnode changed layout, no compat shims are provided.

For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.

Update note: strictly follow the instructions in UPDATING.  Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.

Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb).  Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver.  Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem).  Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion were done by Konstantin Belousov (kib).

Sponsored by:	The FreeBSD Foundation (emaste, kib)
Differential revision:	https://reviews.freebsd.org/D10439
2017-05-23 09:29:05 +00:00