mirror of
https://github.com/SerenityOS/serenity
synced 2024-10-07 00:19:27 +00:00
Base: Document futex(2) :^)
This commit is contained in:
parent
0a6690b745
commit
32de3ddd33
193
Base/usr/share/man/man2/futex.md
Normal file
193
Base/usr/share/man/man2/futex.md
Normal file
|
@ -0,0 +1,193 @@
|
|||
## Name
|
||||
|
||||
futex - low-level synchronization primitive
|
||||
|
||||
## Synopsis
|
||||
|
||||
```c++
|
||||
#include <serenity.h>
|
||||
|
||||
// Raw syscall.
|
||||
int futex(uint32_t* userspace_address, int futex_op, uint32_t value, const struct timespec* timeout, uint32_t* userspace_address2, uint32_t value3);
|
||||
|
||||
// More convenient wrappers.
|
||||
int futex_wait(uint32_t* userspace_address, uint32_t value, const struct timespec* abstime, int clockid, int process_shared);
|
||||
int futex_wake(uint32_t* userspace_address, uint32_t count, int process_shared);
|
||||
```
|
||||
|
||||
## Description
|
||||
|
||||
The `futex()` system call provides a low-level synchronization primitive,
|
||||
essentially exposing the kernel's internal thread synchronization primitives
|
||||
to userspace.
|
||||
|
||||
While the `futex()` API is powerful and generic, it is complex and cumbersome
|
||||
to use, and notoriously tricky to use *correctly*. For this reason, it is not
|
||||
intended to be used by application code directly, but rather to serve as
|
||||
a building block for more specialized and easier to use synchronization
|
||||
primitives implemented in user space, such as mutexes and semaphores.
|
||||
Specifically, the `futex()` API is designed to enable userspace synchronization
|
||||
primitives to have a *fast path* that does not involve calling into the kernel
|
||||
at all in the common uncontended case, avoiding the cost of making a syscall
|
||||
completely.
|
||||
|
||||
*A futex* is a single 32-bit integer cell located anywhere in the address space
|
||||
of a process (identified by its address), as well as an associated kernel-side
|
||||
queue of waiting threads. The kernel-side resources associated with a futex are
|
||||
created and destroyed implicitly when a futex is used; in other words, any
|
||||
32-bit integer can be used as a futex without any specific setup, and a futex
|
||||
on which no threads are waiting is no different to any other integer. The
|
||||
kernel does not assign any meaning to the value of the futex integer; it is up
|
||||
to userspace to make use of the value for its own logic.
|
||||
|
||||
The `futex()` API provides a number of *operations*, the most basic ones being
|
||||
_waiting_ and _waking_:
|
||||
|
||||
* `FUTEX_WAKE` / `futex_wake()`: wake up to `count` threads waiting on the
|
||||
futex (in the raw `futex()` syscall, `count` is passed as the `value`
|
||||
argument). The two most common values for `count` are 1 (wake a single
|
||||
thread) and `UINT32_MAX` (wake all threads).
|
||||
* `FUTEX_WAIT` / `futex_wait()`: wait on the futex, but only if the current
|
||||
value of the futex integer matches the specified `value`. The value
|
||||
comparison and blocking is done atomically: if another thread changes the
|
||||
value before the calling thread starts waiting, the calling thread will not
|
||||
begin waiting at all, and the `futex_wait()` call will return `EAGAIN`
|
||||
immediately. A waiting thread may wake up spuriously, without a matching call
|
||||
to `futex_wake()`.
|
||||
* `FUTEX_WAKE_BITSET`: like `FUTEX_WAKE`, but only consider waiting threads
|
||||
that have specified a matching bitset (passed in `value3`). Two bitsets match
|
||||
if their *bitwise and* is non-zero. A thread that has not specified a bitset
|
||||
is treated as having a bitset with all bits set (`FUTEX_BITSET_MATCH_ANY`,
|
||||
equal to `0xffffffff`).
|
||||
* `FUTEX_WAIT_BITSET`: like `FUTEX_WAIT`, but the thread will only get woken by
|
||||
wake operations specifying a matching bitset.
|
||||
* `FUTEX_REQUEUE`: wake up to `value` threads waiting on the futex, and requeue
|
||||
up to `value2` (passed instead of the `timeout` argument) of the remaining
|
||||
waiting threads to wait on another futex specified by `userspace_address2`,
|
||||
without waking them up. Waking and requeueing threads is done atomically.
|
||||
|
||||
Requeueing threads without waking them up is useful to avoid "thundering
|
||||
herd" issues with synchronization primitives like condition variables, where
|
||||
multiple threads may wait for an event, but an event can only be handled by a
|
||||
single thread at a time.
|
||||
* `FUTEX_CMP_REQUEUE`: like `FUTEX_REQUEUE`, but only if the current value of
|
||||
the futex integer matches the specified `value3`. The value comparison,
|
||||
waking and requeueing threads are all done atomically.
|
||||
* `FUTEX_WAKE_OP`: modify the value of the futex specified by
|
||||
`userspace_address2`, wake up to `value` threads waiting on the futex, and
|
||||
optionally up to `value2` (passed instead of the `timeout` argument) threads
|
||||
waiting on the futex specified by `userspace_address2`.
|
||||
|
||||
The details of this operation are not currently documented here, see the
|
||||
implementation for details.
|
||||
|
||||
Additionally, the `FUTEX_PRIVATE_FLAG` flag can be *or*'ed in with one of the
|
||||
*operation* values listed above. This flag restricts the call to only work on
|
||||
other threads of the same process (as opposed to any threads in the system that
|
||||
may have the same memory page mapped into their address space, possibly at a
|
||||
different address), which enables additional optimizations in the syscall
|
||||
implementation. The inverse of this flag is exposed as the `process_shared`
|
||||
argument in `futex_wait()` and `futex_wake()` wrapper functions.
|
||||
|
||||
## Return value
|
||||
|
||||
* `FUTEX_WAKE`, `FUTEX_WAKE_BITSET`, `FUTEX_WAKE_OP`: the number of the waiting
|
||||
threads that have been woken up, which may be 0 or a positive number.
|
||||
* `FUTEX_WAIT`, `FUTEX_WAIT_BITSET`: 0 if blocked and got woken up by an
|
||||
explicit wake call or woke up spuriously, an error otherwise.
|
||||
* `FUTEX_REQUEUE`, `FUTEX_CMP_REQUEUE`: the total number of threads woken up
|
||||
and requeued.
|
||||
|
||||
## Errors
|
||||
|
||||
* `EAGAIN`: for wait operations, did not begin waiting, because the futex value
|
||||
has already been changed.
|
||||
* `ETIMEDOUT`: for wait operations with a timeout, timed out.
|
||||
* `EFAULT`: the specified futex address is invalid.
|
||||
* `ENOSYS`: `FUTEX_CLOCK_REALTIME` was specified, but the operation is not
|
||||
`FUTEX_WAIT` or `FUTEX_WAIT_BITSET`.
|
||||
* `EINVAL`: The arithmetic-logical operation for `FUTEX_WAKE_OP` is invalid.
|
||||
|
||||
## Examples
|
||||
|
||||
The following program demonstrates how futexes can be used to implement a
|
||||
simple "event" synchronization primitive. An event has a boolean state: it can
|
||||
be *set* or *unset*; the initial state being unset. The two operations on an
|
||||
event are *waiting* until it is set, and *setting* it (which wakes up any
|
||||
threads that were waiting for the event to get set).
|
||||
|
||||
Such a synchronization primitive could be used, for example, to notify threads
|
||||
that are waiting for another thread to perform some sort of complex
|
||||
initialization.
|
||||
|
||||
The implementation features two fast paths: both setting an event that no
|
||||
thread is waiting on, and trying to wait on an event that has already been set,
|
||||
are performed entirely in userspace without calling into the kernel. For this
|
||||
to work, the value of the futex integer is used to track both the state of the
|
||||
event (whether it has been set) and whether any threads are waiting on it.
|
||||
|
||||
```c++
|
||||
#include <AK/Atomic.h>
|
||||
#include <serenity.h>
|
||||
|
||||
class Event {
|
||||
private:
|
||||
enum State : u32 {
|
||||
UnsetNoWaiters,
|
||||
UnsetWithWaiters,
|
||||
Set,
|
||||
};
|
||||
|
||||
AK::Atomic<State> m_state { UnsetNoWaiters };
|
||||
|
||||
u32* state_futex_ptr() { return reinterpret_cast<u32*>(const_cast<State*>(m_state.ptr())); }
|
||||
|
||||
public:
|
||||
void set()
|
||||
{
|
||||
State previous_state = m_state.exchange(Set, AK::memory_order_release);
|
||||
// If there was anyone waiting, wake them all up.
|
||||
// Fast path: no one was waiting, so we're done.
|
||||
if (previous_state == UnsetWithWaiters)
|
||||
futex_wake(state_futex_ptr(), UINT32_MAX, false);
|
||||
}
|
||||
|
||||
void wait()
|
||||
{
|
||||
// If the state is UnsetNoWaiters, set it to UnsetWithWaiters.
|
||||
State expected_state = UnsetNoWaiters;
|
||||
bool have_exchanged = m_state.compare_exchange_strong(
|
||||
expected_state, UnsetWithWaiters,
|
||||
AK::memory_order_acquire);
|
||||
if (have_exchanged)
|
||||
expected_state = UnsetWithWaiters;
|
||||
|
||||
// We need to check the state in a loop and not just once
|
||||
// because of the possibility of spurious wakeups.
|
||||
// Fast path: if the state was already Set, we're done.
|
||||
while (expected_state != Set) {
|
||||
futex_wait(state_futex_ptr(), expected_state, nullptr, 0, false);
|
||||
expected_state = m_state.load(AK::memory_order_acquire);
|
||||
}
|
||||
}
|
||||
};
|
||||
```
|
||||
|
||||
## History
|
||||
|
||||
The name "futex" stands for "fast userspace mutex".
|
||||
|
||||
The `futex()` system call originally apeared in Linux. Since then, many other
|
||||
kernels implemented support for futex-like operations, under various names, in
|
||||
particular:
|
||||
* Darwin (XNU) has private `ulock_wait()` and `ulock_wake()` API;
|
||||
* Windows (NT) apparently has `WaitOnAddress()`, `WakeByAddressSingle()` and
|
||||
`WakeByAddressAll()`;
|
||||
* FreeBSD and DargonFly BSD have `umtx`;
|
||||
* OpenBSD has Linux-like `futex()`;
|
||||
* GNU Hurd has `gsync_wait()`, `gsync_wake()`, and `gsync_requeue()`.
|
||||
|
||||
## Further reading
|
||||
|
||||
* [Futexes Are Tricky](https://akkadia.org/drepper/futex.pdf) by Ulrich Drepper
|
||||
* [Locking in WebKit](https://webkit.org/blog/6161/locking-in-webkit/) by Filip Pizlo
|
Loading…
Reference in a new issue