freebsd-src/lib/libsys/mmap.2

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

565 lines
15 KiB
Groff
Raw Normal View History

1994-05-27 05:00:24 +00:00
.\" Copyright (c) 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
1994-05-27 05:00:24 +00:00
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd August 14, 2023
1994-05-27 05:00:24 +00:00
.Dt MMAP 2
.Os
1994-05-27 05:00:24 +00:00
.Sh NAME
.Nm mmap
.Nd allocate memory, or map files or devices into memory
.Sh LIBRARY
.Lb libc
1994-05-27 05:00:24 +00:00
.Sh SYNOPSIS
.In sys/mman.h
.Ft void *
.Fn mmap "void *addr" "size_t len" "int prot" "int flags" "int fd" "off_t offset"
1994-05-27 05:00:24 +00:00
.Sh DESCRIPTION
The
.Fn mmap
system call causes the pages starting at
1994-05-27 05:00:24 +00:00
.Fa addr
and continuing for at most
.Fa len
bytes to be mapped from the object described by
.Fa fd ,
starting at byte offset
.Fa offset .
If
.Fa len
is not a multiple of the page size, the mapped region may extend past the
1994-05-27 05:00:24 +00:00
specified range.
Any such extension beyond the end of the mapped object will be zero-filled.
1994-05-27 05:00:24 +00:00
.Pp
If
.Fa fd
references a regular file or a shared memory object, the range of
bytes starting at
.Fa offset
and continuing for
.Fa len
bytes must be legitimate for the possible (not necessarily
current) offsets in the object.
In particular, the
.Fa offset
value cannot be negative.
If the object is truncated and the process later accesses a page that
is wholly within the truncated region, the access is aborted and a
.Dv SIGBUS
signal is delivered to the process.
.Pp
If
.Fa fd
references a device file, the interpretation of the
.Fa offset
value is device specific and defined by the device driver.
The virtual memory subsystem does not impose any restrictions on the
.Fa offset
value in this case, passing it unchanged to the driver.
.Pp
If
1994-05-27 05:00:24 +00:00
.Fa addr
is non-zero, it is used as a hint to the system.
(As a convenience to the system, the actual address of the region may differ
from the address supplied.)
If
.Fa addr
is zero, an address will be selected by the system.
The actual starting address of the region is returned.
A successful
.Fa mmap
deletes any previous mapping in the allocated address range.
.Pp
The protections (region accessibility) are specified in the
.Fa prot
argument by
.Em or Ns 'ing
the following values:
.Pp
.Bl -tag -width PROT_WRITE -compact
.It Dv PROT_NONE
Pages may not be accessed.
1994-05-27 05:00:24 +00:00
.It Dv PROT_READ
Pages may be read.
.It Dv PROT_WRITE
Pages may be written.
.It Dv PROT_EXEC
Pages may be executed.
1994-05-27 05:00:24 +00:00
.El
.Pp
Extend mmap/mprotect API to specify the max page protections. A new macro PROT_MAX() alters a protection value so it can be OR'd with a regular protection value to specify the maximum permissions. If present, these flags specify the maximum permissions. While these flags are non-portable, they can be used in portable code with simple ifdefs to expand PROT_MAX() to 0. This change allows (e.g.) a region that must be writable during run-time linking or JIT code generation to be made permanently read+execute after writes are complete. This complements W^X protections allowing more precise control by the programmer. This change alters mprotect argument checking and returns an error when unhandled protection flags are set. This differs from POSIX (in that POSIX only specifies an error), but is the documented behavior on Linux and more closely matches historical mmap behavior. In addition to explicit setting of the maximum permissions, an experimental sysctl vm.imply_prot_max causes mmap to assume that the initial permissions requested should be the maximum when the sysctl is set to 1. PROT_NONE mappings are excluded from this for compatibility with rtld and other consumers that use such mappings to reserve address space before mapping contents into part of the reservation. A final version this is expected to provide per-binary and per-process opt-in/out options and this sysctl will go away in its current form. As such it is undocumented. Reviewed by: emaste, kib (prior version), markj Additional suggestions from: alc Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D18880
2019-06-20 18:24:16 +00:00
In addition to these protection flags,
.Fx
provides the ability to set the maximum protection of a region allocated by
.Nm
and later altered by
.Xr mprotect 2 .
This is accomplished by
.Em or Ns 'ing
one or more
.Dv PROT_
values wrapped in the
.Dv PROT_MAX()
macro into the
.Fa prot
argument.
.Pp
1994-05-27 05:00:24 +00:00
The
.Fa flags
2002-12-19 09:40:28 +00:00
argument specifies the type of the mapped object, mapping options and
1994-05-27 05:00:24 +00:00
whether modifications made to the mapped copy of the page are private
to the process or are to be shared with other references.
Sharing, mapping type and options are specified in the
.Fa flags
argument by
.Em or Ns 'ing
the following values:
.Bl -tag -width MAP_PREFAULT_READ
.It Dv MAP_32BIT
Request a region in the first 2GB of the current process's address space.
If a suitable region cannot be found,
.Fn mmap
will fail.
.It Dv MAP_ALIGNED Ns Pq Fa n
Align the region on a requested boundary.
If a suitable region cannot be found,
.Fn mmap
will fail.
The
.Fa n
argument specifies the binary logarithm of the desired alignment.
.It Dv MAP_ALIGNED_SUPER
Align the region to maximize the potential use of large
.Pq Dq super
pages.
If a suitable region cannot be found,
.Fn mmap
will fail.
The system will choose a suitable page size based on the size of
mapping.
The page size used as well as the alignment of the region may both be
affected by properties of the file being mapped.
In particular,
the physical address of existing pages of a file may require a specific
alignment.
The region is not guaranteed to be aligned on any specific boundary.
1994-05-27 05:00:24 +00:00
.It Dv MAP_ANON
Map anonymous memory not associated with any specific file.
The file descriptor used for creating
.Dv MAP_ANON
must be \-1.
The
.Fa offset
argument must be 0.
1994-05-27 05:00:24 +00:00
.\".It Dv MAP_FILE
.\"Mapped from a regular file or character-special device memory.
.It Dv MAP_ANONYMOUS
This flag is identical to
.Dv MAP_ANON
and is provided for compatibility.
.It Dv MAP_EXCL
This flag can only be used in combination with
.Dv MAP_FIXED .
Please see the definition of
.Dv MAP_FIXED
for the description of its effect.
1994-05-27 05:00:24 +00:00
.It Dv MAP_FIXED
Do not permit the system to select a different address than the one
specified.
If the specified address cannot be used,
.Fn mmap
1994-05-27 05:00:24 +00:00
will fail.
If
.Dv MAP_FIXED
is specified,
1994-05-27 05:00:24 +00:00
.Fa addr
must be a multiple of the page size.
If
.Dv MAP_EXCL
is not specified, a successful
2005-11-17 13:00:00 +00:00
.Dv MAP_FIXED
request replaces any previous mappings for the process'
pages in the range from
.Fa addr
to
2005-11-17 13:00:00 +00:00
.Fa addr
+
.Fa len .
In contrast, if
.Dv MAP_EXCL
is specified, the request will fail if a mapping
already exists within the range.
.It Dv MAP_GUARD
Instead of a mapping, create a guard of the specified size.
Guards allow a process to create reservations in its address space,
which can later be replaced by actual mappings.
.Pp
.Fa mmap
will not create mappings in the address range of a guard unless
the request specifies
.Dv MAP_FIXED .
Guards can be destroyed with
.Xr munmap 2 .
Any memory access by a thread to the guarded range results
in the delivery of a
.Dv SIGSEGV
signal to that thread.
.It Dv MAP_NOCORE
Region is not included in a core file.
.It Dv MAP_NOSYNC
Causes data dirtied via this VM map to be flushed to physical media
only when necessary (usually by the pager) rather than gratuitously.
Typically this prevents the update daemons from flushing pages dirtied
through such maps and thus allows efficient sharing of memory across
unassociated processes using a file-backed shared memory map.
Without
this option any VM pages you dirty may be flushed to disk every so often
(every 30-60 seconds usually) which can create performance problems if you
do not need that to occur (such as when you are using shared file-backed
mmap regions for IPC purposes).
Dirty data will be flushed automatically when all mappings of an object are
removed and all descriptors referencing the object are closed.
Note that VM/file system coherency is
maintained whether you use
.Dv MAP_NOSYNC
or not.
This option is not portable
across
.Ux
platforms (yet), though some may implement the same behavior
by default.
.Pp
.Em WARNING !
Extending a file with
.Xr ftruncate 2 ,
thus creating a big hole, and then filling the hole by modifying a shared
.Fn mmap
can lead to severe file fragmentation.
In order to avoid such fragmentation you should always pre-allocate the
file's backing store by
.Fn write Ns ing
zero's into the newly extended area prior to modifying the area via your
.Fn mmap .
The fragmentation problem is especially sensitive to
.Dv MAP_NOSYNC
pages, because pages may be flushed to disk in a totally random order.
.Pp
The same applies when using
.Dv MAP_NOSYNC
to implement a file-based shared memory store.
It is recommended that you create the backing store by
.Fn write Ns ing
zero's to the backing file rather than
.Fn ftruncate Ns ing
it.
You can test file fragmentation by observing the KB/t (kilobytes per
transfer) results from an
.Dq Li iostat 1
while reading a large file sequentially, e.g.,\& using
.Dq Li dd if=filename of=/dev/null bs=32k .
.Pp
The
.Xr fsync 2
system call will flush all dirty data and metadata associated with a file,
including dirty NOSYNC VM data, to physical media.
The
.Xr sync 8
command and
.Xr sync 2
system call generally do not flush dirty NOSYNC VM data.
The
.Xr msync 2
system call is usually not needed since
.Bx
implements a coherent file system buffer cache.
However, it may be
used to associate dirty VM pages with file system buffers and thus cause
them to be flushed to physical media sooner rather than later.
.It Dv MAP_PREFAULT_READ
Immediately update the calling process's lowest-level virtual address
translation structures, such as its page table, so that every memory
resident page within the region is mapped for read access.
Ordinarily these structures are updated lazily.
The effect of this option is to eliminate any soft faults that would
otherwise occur on the initial read accesses to the region.
Although this option does not preclude
.Fa prot
from including
.Dv PROT_WRITE ,
it does not eliminate soft faults on the initial write accesses to the
region.
.It Dv MAP_PRIVATE
Modifications are private.
.It Dv MAP_SHARED
Modifications are shared.
.It Dv MAP_STACK
Creates both a mapped region that grows downward on demand and an
adjoining guard that both reserves address space for the mapped region
to grow into and limits the mapped region's growth.
Together, the mapped region and the guard occupy
.Fa len
bytes of the address space.
The guard starts at the returned address, and the mapped region ends at
the returned address plus
.Fa len
bytes.
Upon access to the guard, the mapped region automatically grows in size,
and the guard shrinks by an equal amount.
Essentially, the boundary between the guard and the mapped region moves
downward so that the access falls within the enlarged mapped region.
However, the guard will never shrink to less than the number of pages
specified by the sysctl
.Dv security.bsd.stack_guard_page ,
thereby ensuring that a gap for detecting stack overflow always exists
between the downward growing mapped region and the closest mapped region
beneath it.
.Pp
.Dv MAP_STACK
implies
.Dv MAP_ANON
and
.Fa offset
of 0.
2002-12-19 09:40:28 +00:00
The
.Fa fd
2002-12-19 09:40:28 +00:00
argument
must be -1 and
.Fa prot
must include at least
.Dv PROT_READ
and
.Dv PROT_WRITE .
The size of the guard, in pages, is specified by sysctl
.Dv security.bsd.stack_guard_page .
1994-05-27 05:00:24 +00:00
.El
.Pp
The
.Xr close 2
system call does not unmap pages, see
1994-05-27 05:00:24 +00:00
.Xr munmap 2
for further information.
.Sh NOTES
Although this implementation does not impose any alignment restrictions on
the
.Fa offset
argument, a portable program must only use page-aligned values.
.Pp
Large page mappings require that the pages backing an object be
aligned in matching blocks in both the virtual address space and RAM.
The system will automatically attempt to use large page mappings when
mapping an object that is already backed by large pages in RAM by
aligning the mapping request in the virtual address space to match the
alignment of the large physical pages.
The system may also use large page mappings when mapping portions of an
object that are not yet backed by pages in RAM.
The
.Dv MAP_ALIGNED_SUPER
flag is an optimization that will align the mapping request to the
size of a large page similar to
.Dv MAP_ALIGNED ,
except that the system will override this alignment if an object already
uses large pages so that the mapping will be consistent with the existing
large pages.
This flag is mostly useful for maximizing the use of large pages on the
first mapping of objects that do not yet have pages present in RAM.
1994-05-27 05:00:24 +00:00
.Sh RETURN VALUES
Upon successful completion,
.Fn mmap
1994-05-27 05:00:24 +00:00
returns a pointer to the mapped region.
Otherwise, a value of
.Dv MAP_FAILED
is returned and
1994-05-27 05:00:24 +00:00
.Va errno
is set to indicate the error.
.Sh ERRORS
The
.Fn mmap
system call
1994-05-27 05:00:24 +00:00
will fail if:
.Bl -tag -width Er
.It Bq Er EACCES
The flag
.Dv PROT_READ
was specified as part of the
.Fa prot
2002-12-19 09:40:28 +00:00
argument and
1994-05-27 05:00:24 +00:00
.Fa fd
was not open for reading.
The flags
.Dv MAP_SHARED
and
.Dv PROT_WRITE
were specified as part of the
1994-05-27 05:00:24 +00:00
.Fa flags
and
.Fa prot
2002-12-19 09:40:28 +00:00
argument and
1994-05-27 05:00:24 +00:00
.Fa fd
was not open for writing.
.It Bq Er EBADF
2002-12-19 09:40:28 +00:00
The
2000-12-29 14:08:20 +00:00
.Fa fd
2002-12-19 09:40:28 +00:00
argument
1994-05-27 05:00:24 +00:00
is not a valid open file descriptor.
.It Bq Er EINVAL
An invalid (negative) value was passed in the
.Fa offset
argument, when
.Fa fd
referenced a regular file or shared memory.
.It Bq Er EINVAL
An invalid value was passed in the
.Fa prot
argument.
.It Bq Er EINVAL
An undefined option was set in the
.Fa flags
argument.
.It Bq Er EINVAL
Both
.Dv MAP_PRIVATE
and
.Dv MAP_SHARED
were specified.
.It Bq Er EINVAL
None of
.Dv MAP_ANON ,
.Dv MAP_GUARD ,
.Dv MAP_PRIVATE ,
.Dv MAP_SHARED ,
or
.Dv MAP_STACK
was specified.
At least one of these flags must be included.
.It Bq Er EINVAL
.Dv MAP_STACK
was specified and
.Va len
is less than or equal to the guard size.
.It Bq Er EINVAL
1994-05-27 05:00:24 +00:00
.Dv MAP_FIXED
was specified and the
.Fa addr
2002-12-19 09:40:28 +00:00
argument was not page aligned, or part of the desired address space
resides out of the valid address space for a user process.
.It Bq Er EINVAL
Both
.Dv MAP_FIXED
and
.Dv MAP_32BIT
were specified and part of the desired address space resides outside
of the first 2GB of user address space.
.It Bq Er EINVAL
2002-12-19 09:40:28 +00:00
The
.Fa len
argument
was equal to zero.
.It Bq Er EINVAL
.Dv MAP_ALIGNED
was specified and the desired alignment was either larger than the
virtual address size of the machine or smaller than a page.
.It Bq Er EINVAL
.Dv MAP_ANON
was specified and the
.Fa fd
2002-12-19 09:40:28 +00:00
argument was not -1.
.It Bq Er EINVAL
.Dv MAP_ANON
was specified and the
.Fa offset
argument was not 0.
.It Bq Er EINVAL
Both
.Dv MAP_FIXED
and
.Dv MAP_EXCL
were specified, but the requested region is already used by a mapping.
.It Bq Er EINVAL
.Dv MAP_EXCL
was specified, but
.Dv MAP_FIXED
was not.
.It Bq Er EINVAL
.Dv MAP_GUARD
was specified, but the
.Fa offset
argument was not zero, the
.Fa fd
argument was not -1, or the
.Fa prot
argument was not
.Dv PROT_NONE .
.It Bq Er EINVAL
.Dv MAP_GUARD
was specified together with one of the flags
.Dv MAP_ANON ,
.Dv MAP_PREFAULT ,
.Dv MAP_PREFAULT_READ ,
.Dv MAP_PRIVATE ,
.Dv MAP_SHARED ,
.Dv MAP_STACK .
.It Bq Er ENODEV
.Dv MAP_ANON
has not been specified and
.Fa fd
did not reference a regular or character special file.
1994-05-27 05:00:24 +00:00
.It Bq Er ENOMEM
.Dv MAP_FIXED
was specified and the
.Fa addr
2002-12-19 09:40:28 +00:00
argument was not available.
1994-05-27 05:00:24 +00:00
.Dv MAP_ANON
was specified and insufficient memory was available.
.It Bq Er ENOTSUP
The
.Fa prot
argument contains protections which are not a subset of the specified
maximum protections.
2000-12-29 14:08:20 +00:00
.El
.Sh SEE ALSO
1994-05-27 05:00:24 +00:00
.Xr madvise 2 ,
.Xr mincore 2 ,
.Xr minherit 2 ,
.Xr mlock 2 ,
1997-01-20 23:23:22 +00:00
.Xr mprotect 2 ,
.Xr msync 2 ,
.Xr munlock 2 ,
1997-01-20 23:23:22 +00:00
.Xr munmap 2 ,
.Xr getpagesize 3 ,
.Xr getpagesizes 3
.Sh HISTORY
The
.Nm
system call was first documented in
.Bx 4.2
and implemented in
.Bx 4.4 .
.\" XXX: lots of missing history of FreeBSD additions.
.Pp
The
.Dv PROT_MAX
functionality was introduced in
.Fx 13 .