2023-09-18 11:33:06 +00:00
|
|
|
|
---
|
2023-09-19 20:22:49 +00:00
|
|
|
|
title: File Descriptor Store
|
2023-09-18 11:33:06 +00:00
|
|
|
|
category: Interfaces
|
|
|
|
|
layout: default
|
|
|
|
|
SPDX-License-Identifier: LGPL-2.1-or-later
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
# The File Descriptor Store
|
|
|
|
|
|
|
|
|
|
*TL;DR: The systemd service manager may optionally maintain a set of file
|
2023-09-19 20:22:49 +00:00
|
|
|
|
descriptors for each service. Those file descriptors are under control of the
|
|
|
|
|
service. Storing file descriptors in the manager makes is easier to restart
|
|
|
|
|
services without dropping connections or losing state.*
|
2023-09-18 11:33:06 +00:00
|
|
|
|
|
|
|
|
|
Since its inception `systemd` has supported the *socket* *activation*
|
|
|
|
|
mechanism: the service manager creates and listens on some sockets (and similar
|
|
|
|
|
UNIX file descriptors) on behalf of a service, and then passes them to the
|
|
|
|
|
service during activation of the service via UNIX file descriptor (short: *fd*)
|
|
|
|
|
passing over `execve()`. This is primarily exposed in the
|
|
|
|
|
[.socket](https://www.freedesktop.org/software/systemd/man/systemd.socket.html)
|
|
|
|
|
unit type.
|
|
|
|
|
|
|
|
|
|
The *file* *descriptor* *store* (short: *fdstore*) extends this concept, and
|
|
|
|
|
allows services to *upload* during runtime additional fds to the service
|
|
|
|
|
manager that it shall keep on its behalf. File descriptors are passed back to
|
|
|
|
|
the service on subsequent activations, the same way as any socket activation
|
|
|
|
|
fds are passed.
|
|
|
|
|
|
|
|
|
|
If a service fd is passed to the fdstore logic of the service manager it only
|
|
|
|
|
maintains a duplicate of it (in the sense of UNIX
|
|
|
|
|
[`dup(2)`](https://man7.org/linux/man-pages/man2/dup.2.html)), the fd remains
|
|
|
|
|
also in possession of the service itself, and it may (and is expected to)
|
|
|
|
|
invoke any operations on it that it likes.
|
|
|
|
|
|
2023-09-19 20:22:49 +00:00
|
|
|
|
The primary use-case of this logic is to permit services to restart seamlessly
|
2023-09-18 11:33:06 +00:00
|
|
|
|
(for example to update them to a newer version), without losing execution
|
|
|
|
|
context, dropping pinned resources, terminating established connections or even
|
|
|
|
|
just momentarily losing connectivity. In fact, as the file descriptors can be
|
2023-09-19 20:22:49 +00:00
|
|
|
|
uploaded freely at any time during the service runtime, this can even be used
|
|
|
|
|
to implement services that robustly handle abnormal termination and can recover
|
2023-09-18 11:33:06 +00:00
|
|
|
|
from that without losing pinned resources.
|
|
|
|
|
|
|
|
|
|
Note that Linux supports the
|
|
|
|
|
[`memfd`](https://man7.org/linux/man-pages/man2/memfd_create.2.html) concept
|
|
|
|
|
that allows associating a memory-backed fd with arbitrary data. This may
|
|
|
|
|
conveniently be used to serialize service state into and then place in the
|
|
|
|
|
fdstore, in order to implement service restarts with full service state being
|
|
|
|
|
passed over.
|
|
|
|
|
|
2023-11-06 13:50:03 +00:00
|
|
|
|
## Basic Mechanism
|
2023-09-18 11:33:06 +00:00
|
|
|
|
|
|
|
|
|
The fdstore is enabled per-service via the
|
|
|
|
|
[`FileDescriptorStoreMax=`](https://www.freedesktop.org/software/systemd/man/systemd.service.html#FileDescriptorStoreMax=)
|
|
|
|
|
service setting. It defaults to zero (which means the fdstore logic is turned
|
|
|
|
|
off), but can take an unsigned integer value that controls how many fds to
|
|
|
|
|
permit the service to upload to the service manager to keep simultaneously.
|
|
|
|
|
|
|
|
|
|
If set to values > 0, the fdstore is enabled. When invoked the service may now
|
|
|
|
|
(asynchronously) upload file descriptors to the fdstore via the
|
|
|
|
|
[`sd_pid_notify_with_fds()`](https://www.freedesktop.org/software/systemd/man/sd_pid_notify_with_fds.html)
|
2023-09-19 20:22:49 +00:00
|
|
|
|
API call (or an equivalent re-implementation). When uploading the fds it is
|
2023-09-18 11:33:06 +00:00
|
|
|
|
necessary to set the `FDSTORE=1` field in the message, to indicate what the fd
|
|
|
|
|
is intended for. It's recommended to also set the `FDNAME=…` field to any
|
|
|
|
|
string of choice, which may be used to identify the fd later.
|
|
|
|
|
|
|
|
|
|
Whenever the service is restarted the fds in its fdstore will be passed to the
|
|
|
|
|
new instance following the same protocol as for socket activation fds. i.e. the
|
|
|
|
|
`$LISTEN_FDS`, `$LISTEN_PIDS`, `$LISTEN_FDNAMES` environment variables will be
|
|
|
|
|
set (the latter will be populated from the `FDNAME=…` field mentioned
|
|
|
|
|
above). See
|
|
|
|
|
[`sd_listen_fds()`](https://www.freedesktop.org/software/systemd/man/sd_listen_fds.html)
|
|
|
|
|
for details on receiving such fds in a service. (Note that the name set in
|
|
|
|
|
`FDNAME=…` does not need to be unique, which is useful when operating with
|
|
|
|
|
multiple fully equivalent sockets or similar, for example for a service that
|
|
|
|
|
both operates on IPv4 and IPv6 and treats both more or less the same.).
|
|
|
|
|
|
|
|
|
|
And that's already the gist of it.
|
|
|
|
|
|
2023-11-06 13:50:03 +00:00
|
|
|
|
## Seamless Service Restarts
|
2023-09-18 11:33:06 +00:00
|
|
|
|
|
|
|
|
|
A system service that provides a client-facing interface that shall be able to
|
|
|
|
|
seamlessly restart can make use of this in a scheme like the following:
|
|
|
|
|
whenever a new connection comes in it uploads its fd immediately into its
|
2023-09-19 08:02:05 +00:00
|
|
|
|
fdstore. At appropriate times it also serializes its state into a memfd it
|
2023-09-18 11:33:06 +00:00
|
|
|
|
uploads to the service manager — either whenever the state changed
|
|
|
|
|
sufficiently, or simply right before it terminates. (The latter of course means
|
|
|
|
|
that state only survives on *clean* restarts and abnormal termination implies the
|
|
|
|
|
state is lost completely — while the former would mean there's a good chance the
|
|
|
|
|
next restart after an abnormal termination could continue where it left off
|
|
|
|
|
with only some context lost.)
|
|
|
|
|
|
|
|
|
|
Using the fdstore for such seamless service restarts is generally recommended
|
|
|
|
|
over implementations that attempt to leave a process from the old service
|
|
|
|
|
instance around until after the new instance already started, so that the old
|
|
|
|
|
then communicates with the new service instance, and passes the fds over
|
|
|
|
|
directly. Typically service restarts are a mechanism for implementing *code*
|
|
|
|
|
updates, hence leaving two version of the service running at the same time is
|
|
|
|
|
generally problematic. It also collides with the systemd service manager's
|
|
|
|
|
general principle of guaranteeing a pristine execution environment, a pristine
|
|
|
|
|
security context, and a pristine resource management context for freshly
|
2023-09-19 20:22:49 +00:00
|
|
|
|
started services, without uncontrolled "leftovers" from previous runs. For
|
2023-09-18 11:33:06 +00:00
|
|
|
|
example: leaving processes from previous runs generally negatively affects
|
2023-11-06 13:52:01 +00:00
|
|
|
|
lifecycle management (i.e. `KillMode=none` must be set), which disables large
|
2023-09-18 11:33:06 +00:00
|
|
|
|
parts of the service managers state tracking, resource management (as resource
|
|
|
|
|
counters cannot start at zero during service activation anymore, since the old
|
|
|
|
|
processes remaining skew them), security policies (as processes with possibly
|
2023-09-19 08:02:05 +00:00
|
|
|
|
out-of-date security policies – SElinux, AppArmor, any LSM, seccomp, BPF — in
|
2023-09-18 11:33:06 +00:00
|
|
|
|
effect remain), and similar.
|
|
|
|
|
|
2023-11-06 13:50:03 +00:00
|
|
|
|
## File Descriptor Store Lifecycle
|
2023-09-18 11:33:06 +00:00
|
|
|
|
|
|
|
|
|
By default any file descriptor stored in the fdstore for which a `POLLHUP` or
|
|
|
|
|
`POLLERR` is seen is automatically closed and removed from the fdstore. This
|
2023-09-19 20:22:49 +00:00
|
|
|
|
behavior can be turned off, by setting the `FDPOLL=0` field when uploading the
|
2023-09-18 11:33:06 +00:00
|
|
|
|
fd via `sd_notify_with_fds()`.
|
|
|
|
|
|
|
|
|
|
The fdstore is automatically closed whenever the service is fully deactivated
|
|
|
|
|
and no jobs are queued for it anymore. This means that a restart job for a
|
|
|
|
|
service will leave the fdstore intact, but a separate stop and start job for
|
|
|
|
|
it — executed synchronously one after the other — will likely not.
|
|
|
|
|
|
2023-09-19 20:22:49 +00:00
|
|
|
|
This behavior can be modified via the
|
2023-09-18 11:33:06 +00:00
|
|
|
|
[`FileDescriptorStorePreserve=`](https://www.freedesktop.org/software/systemd/man/systemd.service.html#FileDescriptorStorePreserve=)
|
|
|
|
|
setting in service unit files. If set to `yes` the fdstore will be kept as long
|
|
|
|
|
as the service definition is loaded into memory by the service manager, i.e. as
|
|
|
|
|
long as at least one other loaded unit has a reference to it.
|
|
|
|
|
|
|
|
|
|
The `systemctl clean --what=fdstore …` command may be used to explicitly clear
|
|
|
|
|
the fdstore of a service. This is only allowed when the service is fully
|
|
|
|
|
deactivated, and is hence primarily useful in case
|
|
|
|
|
`FileDescriptorStorePreserve=yes` is set (because the fdstore is otherwise
|
|
|
|
|
fully closed anyway in this state).
|
|
|
|
|
|
|
|
|
|
Individual file descriptors may be removed from the fdstore via the
|
|
|
|
|
`sd_notify()` mechanism, by sending an `FDSTOREREMOVE=1` message, accompanied
|
|
|
|
|
by an `FDNAME=…` string identifying the fds to remove. (The name does not have
|
|
|
|
|
to be unique, as mentioned, in which case *all* matching fds are
|
|
|
|
|
closed). Generally it's a good idea to send such messages to the service
|
|
|
|
|
manager during initialization of the service whenever an unrecognized fd is
|
|
|
|
|
received, to make the service robust for code updates: if an old version
|
|
|
|
|
uploaded an fd that the new version doesn't recognize anymore it's good idea to
|
|
|
|
|
close it both in the service and in the fdstore.
|
|
|
|
|
|
2023-09-19 20:22:49 +00:00
|
|
|
|
Note that storing a duplicate of an fd in the fdstore means the resource pinned
|
|
|
|
|
by the fd remains pinned even if the service closes its duplicate of the
|
|
|
|
|
fd. This in particular means that peers on a connection socket uploaded this
|
|
|
|
|
way will not receive an automatic `POLLHUP` event anymore if the service code
|
|
|
|
|
issues `close()` on the socket. It must accompany it with an `FDSTOREREMOVE=1`
|
|
|
|
|
notification to the service manager, so that the fd is comprehensively closed.
|
2023-09-18 11:33:06 +00:00
|
|
|
|
|
2023-11-06 13:50:03 +00:00
|
|
|
|
## Access Control
|
2023-09-18 11:33:06 +00:00
|
|
|
|
|
|
|
|
|
Access to the fds in the file descriptor store is generally restricted to the
|
|
|
|
|
service code itself. Pushing fds into or removing fds from the fdstore is
|
|
|
|
|
subject to the access control restrictions of any other `sd_notify()` message,
|
|
|
|
|
which is controlled via
|
|
|
|
|
[`NotifyAccess=`](https://www.freedesktop.org/software/systemd/man/systemd.service.html#NotifyAccess=).
|
|
|
|
|
|
|
|
|
|
By default only the main service process hence can push/remove fds, but by
|
2023-10-17 10:54:07 +00:00
|
|
|
|
setting `NotifyAccess=all` this may be relaxed to allow arbitrary service
|
2023-09-18 11:33:06 +00:00
|
|
|
|
child processes to do the same.
|
|
|
|
|
|
2023-11-06 13:50:03 +00:00
|
|
|
|
## Soft Reboot
|
2023-09-18 11:33:06 +00:00
|
|
|
|
|
|
|
|
|
The fdstore is particularly interesting in [soft
|
|
|
|
|
reboot](https://www.freedesktop.org/software/systemd/man/systemd-soft-reboot.service.html)
|
|
|
|
|
scenarios, as per `systemctl soft-reboot` (which restarts userspace like in a
|
|
|
|
|
real reboot, but leaves the kernel running). File descriptor stores that remain
|
|
|
|
|
loaded at the very end of the system cycle — just before the soft-reboot – are
|
|
|
|
|
passed over to the next system cycle, and propagated to services they originate
|
|
|
|
|
from there. This enables updating the full userspace of a system during
|
|
|
|
|
runtime, fully replacing all processes without losing pinning resources,
|
|
|
|
|
interrupting connectivity or established connections and similar.
|
|
|
|
|
|
|
|
|
|
This mechanism can be enabled either by making sure the service survives until
|
|
|
|
|
the very end (i.e. by setting `DefaultDependencies=no` so that it keeps running
|
|
|
|
|
for the whole system lifetime without being regularly deactivated at shutdown)
|
2023-09-19 20:22:49 +00:00
|
|
|
|
or by setting `FileDescriptorStorePreserve=yes` (and referencing the unit
|
2023-09-19 08:02:05 +00:00
|
|
|
|
continuously).
|
2023-09-18 11:33:06 +00:00
|
|
|
|
|
2023-09-19 20:22:49 +00:00
|
|
|
|
For further details see [Resource
|
|
|
|
|
Pass-Through](https://www.freedesktop.org/software/systemd/man/systemd-soft-reboot.service.html#Resource%20Pass-Through).
|
|
|
|
|
|
2023-11-06 13:50:03 +00:00
|
|
|
|
## Initrd Transitions
|
2023-09-19 20:22:49 +00:00
|
|
|
|
|
2023-09-22 13:14:40 +00:00
|
|
|
|
The fdstore may also be used to pass file descriptors for resources from the
|
|
|
|
|
initrd context to the main system. Restarting all processes after the
|
|
|
|
|
transition is important as code running in the initrd should generally not
|
|
|
|
|
continue to run after the switch to the host file system, since that pins
|
|
|
|
|
backing files from the initrd, and the initrd might contain different versions
|
|
|
|
|
of programs than the host.
|
2023-09-19 20:22:49 +00:00
|
|
|
|
|
|
|
|
|
Any service that still runs during the initrd→host transition will have its
|
|
|
|
|
fdstore passed over the transition, where it will be passed back to any queued
|
|
|
|
|
services of the same name.
|
|
|
|
|
|
|
|
|
|
The soft reboot cycle transition and the initrd→host transition are
|
|
|
|
|
semantically very similar, hence similar rules apply, and in both cases it is
|
|
|
|
|
recommended to use the fdstore if pinned resources shall be passed over.
|
|
|
|
|
|
2023-11-06 13:50:03 +00:00
|
|
|
|
## Debugging
|
2023-09-18 11:33:06 +00:00
|
|
|
|
|
|
|
|
|
The
|
|
|
|
|
[`systemd-analyze`](https://www.freedesktop.org/software/systemd/man/systemd-analyze.html#systemd-analyze%20fdstore%20%5BUNIT...%5D)
|
|
|
|
|
tool may be used to list the current contents of the fdstore of any running
|
|
|
|
|
service.
|
|
|
|
|
|
|
|
|
|
The
|
|
|
|
|
[`systemd-run`](https://www.freedesktop.org/software/systemd/man/systemd-run.html)
|
|
|
|
|
tool may be used to quickly start a testing binary or similar as a service. Use
|
|
|
|
|
`-p FileDescriptorStore=4711` to enable the fdstore from `systemd-run`'s
|
|
|
|
|
command line. By using the `-t` switch you can even interactively communicate
|
|
|
|
|
via processes spawned that way, via the TTY.
|