Commit graph

393 commits

Author SHA1 Message Date
Pawel Jakub Dawidek 56a8aca83a Stop treating size 0 as unknown size in vnode_create_vobject().
Whenever file is created, the vnode_create_vobject() function will
try to determine its size by calling vn_getsize_locked() as size 0
is ambigious: it means either the file size is 0 or the file size
is unknown.

Introduce special value for the size argument: VNODE_NO_SIZE.
Only when it is given, the vnode_create_vobject() will try to obtain
file's size on its own.

Introduce dedicated vnode_disk_create_vobject() for use by
g_vfs_open(), so we don't have to call vn_isdisk() in the common case
(for regular files).

Handle the case of mediasize==0 in g_vfs_open().

Reviewed by: alc, kib, markj, olce
Approved by: oshogbo (mentor), allanjude (mentor)
Differential Revision: https://reviews.freebsd.org/D45244
2024-05-23 06:08:14 +00:00
Konstantin Belousov b068bb09a1 Add vnode_pager_clean_{a,}sync(9)
Bump __FreeBSD_version for ZFS use.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D43356
2024-01-11 18:44:53 +02:00
Konstantin Belousov ed1a88a311 vnode_pager_generic_putpages(): rename maxblksz local to max_offset
Requested by:	markj
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D43358
2024-01-11 11:49:37 +02:00
Konstantin Belousov bdb46c21a3 vnode_pager_generic_putpages(): correctly handle clean block at EOF
The loop 'skip clean blocks' checking for the clean blocks in the dirty
pages might end up setting the in_hole to true when exactly at EOF at
the middle of the block, without advancing the prev_offset value. Then
the next block is not dirty, and next_offset is clipped back to poffset
+ maxsize, equal to prev_offset, failing the assertion.

Instead of asserting prev_offset < next_offset, we must skip the write.

Reported by:	asomers
PR:	276191
Reviewed by:	alc, markj
Tested by:	asomers
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D43358
2024-01-11 11:49:37 +02:00
Warner Losh 29363fb446 sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by:		Netflix
2023-11-26 22:23:30 -07:00
Warner Losh 685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Konstantin Belousov 28f957b8b3 vnode_pager_input: return runningbufspace back
Both vnode_pager_input_smlfs() and vnode_pager_generic_getpages()
increment runningbufspace, but also both delegate io completion handling
on the pbuf to either plain bdone() or filesystem-specific strategy
routine. Accidentally, for e.g. UFS it is g_vfs_strategy()/g_vfs_done().
The later calls bufdone() which handles runningbufspace reclamation.

For plain bdone() io done handler, nothing would return
accounted b_runningbufspace back. Do it in the new
helper vnode_pager_input_bdone(), as well as in
vnode_pager_generic_getpages_done() explicitly.

Note that potential multiple calls to runningbufwakeup() for the same
pbuf or buf completion are safe. runningbufwakeup() clears accounting
for the buffer, so second and later calls are nop.

The problem was found due to tarfs using small vnode pager input but not
g_vfs_strategy().

Reported by:	des
Reviewed by:	markj, sjg
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39263
2023-03-26 00:55:29 +02:00
Mateusz Guzik f45feecfb2 vfs: add vn_getsize
getattr is very expensive and in important cases only gets called to get
the size. This can be optimized with a dedicated routine which obtains
that statistic.

As a step towards that goal make size-only consumers use a dedicated
routine.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D37885
2022-12-28 22:43:49 +00:00
John Baldwin b8ebd99aa5 vm: Use __diagused for variables only used in KASSERT(). 2022-04-13 16:08:20 -07:00
Ka Ho Ng de2e152959 Add vnode_pager_purge_range(9) KPI
This KPI is created in addition to the existing vnode_pager_setsize(9)
KPI. The KPI is intended for file systems that are able to turn a range
of file into sparse range, also known as hole-punching.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D27194
2021-08-05 22:52:26 +08:00
Konstantin Belousov 00a3fe968b vm_object_kvme_type(): reimplement by embedding kvme_type into pagerops
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30168
2021-05-13 20:10:35 +03:00
Konstantin Belousov d474440ab3 Constify vm_pager-related virtual tables.
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30070
2021-05-07 17:08:03 +03:00
Konstantin Belousov 192112b74f Add pgo_getvp method
This eliminates the staircase of conditions in vm_map_entry_set_vnode_text().

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30070
2021-05-07 17:08:03 +03:00
Konstantin Belousov c23c555bc1 Add pgo_mightbedirty method
Used to implement vm_object_mightbedirty()

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30070
2021-05-07 17:08:03 +03:00
Konstantin Belousov 180bcaa46c vm_pager: add pgo_set_writeable_dirty method
specialized for swap and vnode pagers, and used to implement
vm_object_set_writeable_dirty().

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30070
2021-05-07 17:08:03 +03:00
Bryan Drewery a771bf748f Remove unused obj variable missed in r354870.
Sponsored by:	Dell EMC
2021-03-17 15:29:15 -07:00
Konstantin Belousov cd85379104 Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

Make b_pages[] array in struct buf flexible.  Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*).  Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys.  Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight.  Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.

Suggested by: mav (*)
Reviewed by:	imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27225
2020-11-28 12:12:51 +00:00
Eric van Gyzen f9cc8410e1 vm_ooffset_t is now unsigned
vm_ooffset_t is now unsigned. Remove some tests for negative values,
or make other adjustments accordingly.

Reported by:	Coverity
Reviewed by:	kib markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D26214
2020-09-18 16:48:08 +00:00
Mateusz Guzik c3aa3bf97c vm: clean up empty lines in .c and .h files 2020-09-01 21:20:45 +00:00
Mateusz Guzik 7ad2a82da2 vfs: drop the error parameter from vn_isdisk, introduce vn_isdisk_error
Most consumers pass NULL.
2020-08-19 02:51:17 +00:00
Konstantin Belousov 419e5698a0 Atomically update vm_object vnp_size, where atomic is available.
This will be used later, where it matters on 32bit arches.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D25968
2020-08-16 20:52:24 +00:00
Mark Johnston efec381dd1 Remove most lingering references to the page lock in comments.
Finish updating comments to reflect new locking protocols introduced
over the past year.  In particular, vm_page_lock is now effectively
unused.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25868
2020-08-04 14:59:43 +00:00
Chuck Silvers 1bd12a3bb2 Fix vnode_pager handling of read ahead/behind pages when a disk read fails.
Rather than marking the read ahead/behind pages valid even though they were
not initialized, free them using the new function vm_page_free_invalid().

Reviewed by:	markj, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D25430
2020-07-17 23:10:35 +00:00
Chuck Silvers c3dbadc1fd Revert my change from r361855 in favor of a better fix.
Reviewed by:	markj, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D25430
2020-07-17 23:08:01 +00:00
Chuck Silvers bd7d64f548 Don't mark pages as valid if reading the contents from disk fails.
Instead, just skip marking pages valid if the read fails.  Future
attempts to access such pages will notice that they are not marked valid
and try to read them from disk again.

Reviewed by:	kib, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D25138
2020-06-06 00:47:59 +00:00
Konstantin Belousov abfdf76791 VOP_GETPAGES_ASYNC(): consistently call iodone() callback in case of error.
Reviewed by:	glebius, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D24038
2020-03-30 21:44:30 +00:00
Warner Losh cafbf0c664 Don't convert all lower-layer errors to EIO.
Don't convert all lower layer errors to EIO. Instead, pass the actual error up
the stack. This will allow the upper layers that look for ENXIO to react
properly to that signal from the lower layers and, for UFS, unmount the
filesystem.

Reviewed by: kib@
Differential Revision:  https://reviews.freebsd.org/D23755
2020-02-20 01:33:01 +00:00
Warner Losh 65252dc903 Don't spam the console with an additional, and useless, error message.
There's no need to spam the console with this error message. If there's an I/O
error, the disk/cam driver will report it at the lower levels. If that's an
actual problem, the upper layers will report that.

Reviewed by: kib@
Differential Revision:  https://reviews.freebsd.org/D23756
2020-02-20 00:34:46 +00:00
Mateusz Guzik f1fa1ba3d0 Fix up various vnode-related asserts which did not dump the used vnode 2020-02-03 14:25:32 +00:00
Jeff Roberson d6e13f3b4d Don't hold the object lock while calling getpages.
The vnode pager does not want the object lock held.  Moving this out allows
further object lock scope reduction in callers.  While here add some missing
paging in progress calls and an assert.  The object handle is now protected
explicitly with pip.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23033
2020-01-19 23:47:32 +00:00
Jeff Roberson 9c83ff2d86 It has not been possible to recursively terminate a vnode object for some time
now.  Eliminate the dead code that supports it.

Approved by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D22908
2020-01-19 18:36:03 +00:00
Mateusz Guzik a314aba874 vm: add missing CLTFLAG_MPSAFE annotations
This covers all vm/* files.
2020-01-12 05:08:57 +00:00
Mateusz Guzik b249ce48ea vfs: drop the mostly unused flags argument from VOP_UNLOCK
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427
2020-01-03 22:29:58 +00:00
Mateusz Guzik abd80ddb94 vfs: introduce v_irflag and make v_type smaller
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.

Reviewed by:	kib, jeff
Differential Revision:	https://reviews.freebsd.org/D22715
2019-12-08 21:30:04 +00:00
Jeff Roberson a67d540832 Use atomics in more cases for object references. We now can completely
omit the object lock if we are above a certain threshold.  Hold only a
single vnode reference when the vnode object has any ref > 0.  This
allows us to only lock the object and vnode on 0-1 and 1-0 transitions.

Differential Revision:	https://reviews.freebsd.org/D22452
2019-11-27 00:39:23 +00:00
Jeff Roberson 7f935055d3 Remove unnecessary object locking from the vnode pager. Recent changes to
busy/valid/dirty locking make these acquires redundant.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D22186
2019-11-19 23:30:09 +00:00
Jeff Roberson 51df53213c Use atomics and a shared object lock to protect the object reference count.
Certain consumers still need to guarantee a stable reference so we can not
switch entirely to atomics yet.  Exclusive lock holders can still modify
and examine the refcount without using the ref api.

Reviewed by:	kib
Tested by:	pho
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D21598
2019-10-29 20:58:46 +00:00
Mark Johnston 2f81c92e55 Check for bogus_page in vnode_pager_generic_getpages_done().
We now assert that a page is busy when updating its validity-tracking
state, but bogus_page is not busied during a getpages operation.

Reported by:	syzkaller
Reviewed by:	alc, kib
Discussed with:	jeff
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22124
2019-10-23 18:00:22 +00:00
Konstantin Belousov 5b87ecc643 Assert that vnode_pager_setsize() is called with the vnode exclusively locked
except for filesystems that set the MNTK_VMSETSIZE_BUG,  Set the flag for ZFS.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D21883
2019-10-22 16:21:24 +00:00
Konstantin Belousov 208b81bb05 Add VV_VMSIZEVNLOCK flag.
The flag specifies that vm_fault() handler should check the vnode'
vm_object size under the vnode lock.  It is converted into the object'
OBJ_SIZEVNLOCK flag in vnode_pager_alloc().

Tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D21883
2019-10-22 16:09:25 +00:00
Jeff Roberson 0012f373e4 (4/6) Protect page valid with the busy lock.
Atomics are used for page busy and valid state when the shared busy is
held.  The details of the locking protocol and valid and dirty
synchronization are in the updated vm_page.h comments.

Reviewed by:    kib, markj
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:        https://reviews.freebsd.org/D21594
2019-10-15 03:45:41 +00:00
Kyle Evans fe7bcbaf50 vm pager: writemapping accounting for OBJT_SWAP
Currently writemapping accounting is only done for vnode_pager which does
some accounting on the underlying vnode.

Extend this to allow accounting to be possible for any of the pager types.
New pageops are added to update/release writecount that need to be
implemented for any pager wishing to do said accounting, and we implement
these methods now for both vnode_pager (unchanged) and swap_pager.

The primary motivation for this is to allow other systems with OBJT_SWAP
objects to check if their objects have any write mappings and reject
operations with EBUSY if so. posixshm will be the first to do so in order to
reject adding write seals to the shmfd if any writable mappings exist.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D21456
2019-09-03 20:31:48 +00:00
Konstantin Belousov 6470c8d3db Rework v_object lifecycle for vnodes.
Current implementation of vnode_create_vobject() and
vnode_destroy_vobject() is written so that it prepared to handle the
vm object destruction for live vnode.  Practically, no filesystems use
this, except for some remnants that were present in UFS till today.
One of the consequences of that model is that each filesystem must
call vnode_destroy_vobject() in VOP_RECLAIM() or earlier, as result
all of them get rid of the v_object in reclaim.

Move the call to vnode_destroy_vobject() to vgonel() before
VOP_RECLAIM().  This makes v_object stable: either the object is NULL,
or it is valid vm object till the vnode reclamation.  Remove code from
vnode_create_vobject() to handle races with the parallel destruction.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21412
2019-08-29 07:50:25 +00:00
Konstantin Belousov 783a68aa33 Move OBJT_VNODE specific code from vm_object_terminate() to
vnode_destroy_vobject().

Reviewed by:	alc, jeff (previous version), markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21357
2019-08-25 13:26:06 +00:00
Jeff Roberson 4153054a7c Permit vm_pager_has_page() to run with a shared lock. Introduce
VM_OBJECT_DROP/VM_OBJECT_PICKUP to handle functions that are called with
uncertain lock state.

Reviewed by:	kib, markj
Tested by:	pho
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21310
2019-08-19 22:25:28 +00:00
Doug Moore 0cab71bcee Fix style(9) violations involving division by PAGE_SIZE.
Reviewed by: alc
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D20847
2019-07-06 15:55:16 +00:00
Conrad Meyer daec92844e Include ktr.h in more compilation units
Similar to r348026, exhaustive search for uses of CTRn() and cross reference
ktr.h includes.  Where it was obvious that an OS compat header of some kind
included ktr.h indirectly, .c files were left alone.  Some of these files
clearly got ktr.h via header pollution in some scenarios, or tinderbox would
not be passing prior to this revision, but go ahead and explicitly include it
in files using it anyway.

Like r348026, these CUs did not show up in tinderbox as missing the include.

Reported by:	peterj (arm64/mp_machdep.c)
X-MFC-With:	r347984
Sponsored by:	Dell EMC Isilon
2019-05-21 20:38:48 +00:00
Konstantin Belousov 78022527bb Switch to use shared vnode locks for text files during image activation.
kern_execve() locks text vnode exclusive to be able to set and clear
VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0
condition.

The change removes VV_TEXT, replacing it with the condition
v_writecount <= -1, and puts v_writecount under the vnode interlock.
Each text reference decrements v_writecount.  To clear the text
reference when the segment is unmapped, it is recorded in the
vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and
v_writecount is incremented on the map entry removal

The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that
v_writecount does not contradict the desired change.  vn_writecheck()
is now racy and its use was eliminated everywhere except access.
Atomic check for writeability and increment of v_writecount is
performed by the VOP.  vn_truncate() now increments v_writecount
around VOP_SETATTR() call, lack of which is arguably a bug on its own.

nullfs bypasses v_writecount to the lower vnode always, so nullfs
vnode has its own v_writecount correct, and lower vnode gets all
references, since object->handle is always lower vnode.

On the text vnode' vm object dealloc, the v_writecount value is reset
to zero, and deadfs vop_unset_text short-circuit the operation.
Reclamation of lowervp always reclaims all nullfs vnodes referencing
lowervp first, so no stray references are left.

Reviewed by:	markj, trasz
Tested by:	mjg, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D19923
2019-05-05 11:20:43 +00:00
Jason A. Harmening 40a5168449 Fix incorrect assertion in vnode_pager_generic_getpages()
Reviewed by:	kib, glebius
MFC after:	1 week
2019-02-26 04:50:46 +00:00
Gleb Smirnoff 66fb0b1ad7 For 32-bit machines rollback the default number of vnode pager pbufs
back to the lever before r343030.  For 64-bit machines reduce it slightly,
too.  Together with r343030 I bumped the limit up to the value we use at
Netflix to serve 100 Gbit/s of sendfile traffic, and it probably isn't a
good default.

Provide a loader tunable to change vnode pager pbufs count. Document it.
2019-02-15 23:36:22 +00:00