Mdoc Janitor:
* Fix hard sentence breaks.
commit 3a858f3798
parent 4c7b4d14a2
Notes:
svn2git
2020-12-20 02:59:44 +00:00
svn path=/head/; revision=121384
5 changed files with 75 additions and 40 deletions
@@ -40,76 +40,101 @@
 .Sh DESCRIPTION
 The kernel implements a KVM abstraction of the buffer cache which allows it
 to map potentially disparate vm_page's into contiguous KVM for use by
-(mainly file system) devices and device I/O. This abstraction supports
+(mainly file system) devices and device I/O.
+This abstraction supports
 block sizes from DEV_BSIZE (usually 512) to upwards of several pages or more.
 It also supports a relatively primitive byte-granular valid range and dirty
-range currently hardcoded for use by NFS. The code implementing the
+range currently hardcoded for use by NFS.
+The code implementing the
 VM Buffer abstraction is mostly concentrated in
 .Pa /usr/src/sys/kern/vfs_bio.c .
 .Pp
 One of the most important things to remember when dealing with buffer pointers
 (struct buf) is that the underlying pages are mapped directly from the buffer
-cache. No data copying occurs in the scheme proper, though some file systems
-such as UFS do have to copy a little when dealing with file fragments. The
-second most important thing to remember is that due to the underlying page
+cache.
+No data copying occurs in the scheme proper, though some file systems
+such as UFS do have to copy a little when dealing with file fragments.
+The second most important thing to remember is that due to the underlying page
 mapping, the b_data base pointer in a buf is always *page* aligned, not
-*block* aligned. When you have a VM buffer representing some b_offset and
+*block* aligned.
+When you have a VM buffer representing some b_offset and
 b_size, the actual start of the buffer is (b_data + (b_offset & PAGE_MASK))
-and not just b_data. Finally, the VM system's core buffer cache supports
-valid and dirty bits (m->valid, m->dirty) for pages in DEV_BSIZE chunks. Thus
+and not just b_data.
+Finally, the VM system's core buffer cache supports
+valid and dirty bits (m->valid, m->dirty) for pages in DEV_BSIZE chunks.
+Thus
 a platform with a hardware page size of 4096 bytes has 8 valid and 8 dirty
-bits. These bits are generally set and cleared in groups based on the device
-block size of the device backing the page. Complete page's worth are often
+bits.
+These bits are generally set and cleared in groups based on the device
+block size of the device backing the page.
+Complete page's worth are often
 referred to using the VM_PAGE_BITS_ALL bitmask (i.e. 0xFF if the hardware page
 size is 4096).
 .Pp
 VM buffers also keep track of a byte-granular dirty range and valid range.
-This feature is normally only used by the NFS subsystem. I'm not sure why it
+This feature is normally only used by the NFS subsystem.
+I'm not sure why it
 is used at all, actually, since we have DEV_BSIZE valid/dirty granularity
-within the VM buffer. If a buffer dirty operation creates a 'hole',
-the dirty range will extend to cover the hole. If a buffer validation
+within the VM buffer.
+If a buffer dirty operation creates a 'hole',
+the dirty range will extend to cover the hole.
+If a buffer validation
 operation creates a 'hole' the byte-granular valid range is left alone and
-will not take into account the new extension. Thus the whole byte-granular
+will not take into account the new extension.
+Thus the whole byte-granular
 abstraction is considered a bad hack and it would be nice if we could get rid
 of it completely.
 .Pp
 A VM buffer is capable of mapping the underlying VM cache pages into KVM in
 order to allow the kernel to directly manipulate the data associated with
-the (vnode,b_offset,b_size). The kernel typically unmaps VM buffers the moment
+the (vnode,b_offset,b_size).
+The kernel typically unmaps VM buffers the moment
 they are no longer needed but often keeps the 'struct buf' structure
 instantiated and even bp->b_pages array instantiated despite having unmapped
-them from KVM. If a page making up a VM buffer is about to undergo I/O, the
+them from KVM.
+If a page making up a VM buffer is about to undergo I/O, the
 system typically unmaps it from KVM and replaces the page in the b_pages[]
-array with a place-marker called bogus_page. The place-marker forces any kernel
+array with a place-marker called bogus_page.
+The place-marker forces any kernel
 subsystems referencing the associated struct buf to re-lookup the associated
-page. I believe the place-marker hack is used to allow sophisticated devices
+page.
+I believe the place-marker hack is used to allow sophisticated devices
 such as file system devices to remap underlying pages in order to deal with,
 for example, re-mapping a file fragment into a file block.
 .Pp
-VM buffers are used to track I/O operations within the kernel. Unfortunately,
+VM buffers are used to track I/O operations within the kernel.
+Unfortunately,
 the I/O implementation is also somewhat of a hack because the kernel wants
 to clear the dirty bit on the underlying pages the moment it queues the I/O
-to the VFS device, not when the physical I/O is actually initiated. This
+to the VFS device, not when the physical I/O is actually initiated.
+This
 can create confusion within file system devices that use delayed-writes because
-you wind up with pages marked clean that are actually still dirty. If not
+you wind up with pages marked clean that are actually still dirty.
+If not
 treated carefully, these pages could be thrown away! Indeed, a number of
 serious bugs related to this hack were not fixed until the 2.2.8/3.0 release.
 The kernel uses an instantiated VM buffer (i.e. struct buf) to place-mark pages
-in this special state. The buffer is typically flagged B_DELWRI. When a
-device no longer needs a buffer it typically flags it as B_RELBUF. Due to
+in this special state.
+The buffer is typically flagged B_DELWRI.
+When a
+device no longer needs a buffer it typically flags it as B_RELBUF.
+Due to
 the underlying pages being marked clean, the B_DELWRI|B_RELBUF combination must
 be interpreted to mean that the buffer is still actually dirty and must be
-written to its backing store before it can actually be released. In the case
+written to its backing store before it can actually be released.
+In the case
 where B_DELWRI is not set, the underlying dirty pages are still properly
 marked as dirty and the buffer can be completely freed without losing that
 clean/dirty state information. ( XXX do we have to check other flags in
 regards to this situation ??? ).
 .Pp
 The kernel reserves a portion of its KVM space to hold VM Buffer's data
-maps. Even though this is virtual space (since the buffers are mapped
+maps.
+Even though this is virtual space (since the buffers are mapped
 from the buffer cache), we cannot make it arbitrarily large because
 instantiated VM Buffers (struct buf's) prevent their underlying pages in the
-buffer cache from being freed. This can complicate the life of the paging
+buffer cache from being freed.
+This can complicate the life of the paging
 system.
 .Pp
 .\" .Sh SEE ALSO
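The B_DELWRI|B_RELBUF rule in the same hunk is easy to express in code. A hedged sketch (the helper is hypothetical; the real logic lives in vfs_bio.c, though B_DELWRI and B_RELBUF themselves are real b_flags values):

    #include <sys/param.h>
    #include <sys/buf.h>

    /*
     * B_DELWRI|B_RELBUF set together: the pages were marked clean when
     * the delayed write was queued, so only the buffer flags still know
     * the data must reach backing store before the buffer is released.
     */
    static int
    must_write_before_release(struct buf *bp)
    {
        return ((bp->b_flags & (B_DELWRI | B_RELBUF)) ==
            (B_DELWRI | B_RELBUF));
    }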
@@ -61,7 +61,8 @@
 The
 .Nm
 functions are designed to copy contiguous data from one address
-to another. All but
+to another.
+All but
 .Fn copystr
 copy data from user-space to kernel-space or vice-versa.
 .Pp
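As a usage note for the hunk above: these functions return 0 on success or an errno value such as EFAULT when the user address is bad. A sketch of the canonical pattern, assuming a hypothetical handler that copies an argument structure in, updates it, and copies it back out:

    #include <sys/types.h>
    #include <sys/systm.h>

    struct frob_args {
        int value;
    };

    static int
    frob_value(void *uaddr)
    {
        struct frob_args a;
        int error;

        /* Pull the argument structure in from user space. */
        error = copyin(uaddr, &a, sizeof(a));
        if (error != 0)
            return (error);

        a.value++;      /* operate on the kernel-space copy */

        /* Push the result back out to the same user address. */
        return (copyout(&a, uaddr, sizeof(a)));
    }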
@@ -48,9 +48,11 @@ the vnode to remove from the free list
 if non-zero, the vnode will also be locked
 .El
 .Pp
-When not in use, vnodes are kept on a free list. The vnodes still
+When not in use, vnodes are kept on a free list.
+The vnodes still
 reference valid files but may be reused to refer to a new file at any
-time. Often, these vnodes are also held in caches in the system, such
+time.
+Often, these vnodes are also held in caches in the system, such
 as the name cache.
 .Pp
 When a vnode which is on the free list is used again, for instance if
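To make the lock flag concrete: callers revive a vnode from the free list and usually lock it in the same step, then undo both when done. A sketch assuming the vget() interface of roughly this era (the exact signature has changed across FreeBSD versions, so treat the parameter list as illustrative):

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/vnode.h>

    static int
    use_cached_vnode(struct vnode *vp, struct thread *td)
    {
        int error;

        /* Take vp off the free list, locked (the non-zero flag case). */
        error = vget(vp, LK_EXCLUSIVE, td);
        if (error != 0)
            return (error); /* the vnode may have been recycled */

        /* ... operate on the file behind vp ... */

        vput(vp);           /* unlock and release the reference */
        return (0);
    }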
@@ -50,13 +50,15 @@ Each vnode has three reference counts,
 and
 .Va v_writecount .
 The first is the number of clients within the kernel which are
-using this vnode. This count is maintained by
+using this vnode.
+This count is maintained by
 .Xr vref 9 ,
 .Xr vrele 9
 and
 .Xr vput 9 .
 The second is the number of clients within the kernel who veto
-the recycling of this vnode. This count is
+the recycling of this vnode.
+This count is
 maintained by
 .Xr vhold 9
 and
@@ -73,7 +75,8 @@ The transition to and from the freelist is handled by
 and
 .Xr vbusy 9 .
 The third is a count of the number of clients which are writing into
-the file. It is maintained by the
+the file.
+It is maintained by the
 .Xr open 2
 and
 .Xr close 2
@@ -85,7 +88,8 @@ Any call which returns a vnode (e.g.\&
 etc.)
 will increase the
 .Va v_usecount
-of the vnode by one. When the caller is finished with the vnode, it
+of the vnode by one.
+When the caller is finished with the vnode, it
 should release this reference by calling
 .Xr vrele 9
 (or
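The distinction between the first two counts is easiest to see as code. A sketch with hypothetical helpers: vref()/vrele() bracket active use of the vnode (v_usecount), while vhold() and its counterpart vdrop() only veto recycling (v_holdcnt):

    #include <sys/param.h>
    #include <sys/vnode.h>

    static void
    pin_for_cache(struct vnode *vp)
    {
        vhold(vp);  /* v_holdcnt++: vp must not be recycled ... */
    }

    static void
    unpin_from_cache(struct vnode *vp)
    {
        vdrop(vp);  /* ... until the cache lets go of it */
    }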
@@ -113,23 +117,25 @@ functionality.
 .It Dv VNON
 No type.
 .It Dv VREG
-A regular file; may be with or without VM object backing. If you want
-to make sure this get a backing object, call
+A regular file; may be with or without VM object backing.
+If you want to make sure this get a backing object, call
 .Xr vfs_object_create 9 .
 .It Dv VDIR
 A directory.
 .It Dv VBLK
-A block device; may be with or without VM object backing. If you want
-to make sure this get a backing object, call
+A block device; may be with or without VM object backing.
+If you want to make sure this get a backing object, call
 .Xr vfs_object_create 9 .
 .It Dv VCHR
 A character device.
 .It Dv VLNK
 A symbolic link.
 .It Dv VSOCK
-A socket. Advisory locking won't work on this.
+A socket.
+Advisory locking won't work on this.
 .It Dv VFIFO
-A FIFO (named pipe). Advisory locking won't work on this.
+A FIFO (named pipe).
+Advisory locking won't work on this.
 .It Dv VBAD
 An old style bad sector map
 .El
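Code typically dispatches on these constants via vp->v_type. A hedged sketch of the advisory-locking caveat from the list above (the helper is hypothetical, not from the tree):

    #include <sys/param.h>
    #include <sys/errno.h>
    #include <sys/vnode.h>

    static int
    supports_advisory_locking(struct vnode *vp)
    {
        switch (vp->v_type) {
        case VSOCK:
        case VFIFO:
            return (EOPNOTSUPP);    /* advisory locking won't work */
        default:
            return (0);
        }
    }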
@@ -49,7 +49,8 @@ the vnode to increment
 .El
 .Pp
 Each vnode maintains a reference count of how many parts of the system
-are using the vnode. This allows the system to detect when a vnode is
+are using the vnode.
+This allows the system to detect when a vnode is
 no longer being used and can be safely recycled for a different file.
 .Pp
 Any code in the system which is using a vnode (e.g. during the
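The pattern this page implies, as a sketch (the helper name is hypothetical): any code using a vnode holds a reference for the duration of the use so the vnode cannot be recycled underneath it.

    #include <sys/param.h>
    #include <sys/vnode.h>

    static void
    operate_on(struct vnode *vp)
    {
        vref(vp);   /* one more part of the system is using vp */
        /* ... read or modify the file behind vp ... */
        vrele(vp);  /* done; at zero, vp may be safely recycled */
    }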