systemd/man/systemd-nsresourced.service.xml
Lennart Poettering 8aee931e7a nsresourced: add new daemon for granting clients user namespaces and assigning resources to them
This adds a small, socket-activated Varlink daemon that can delegate UID
ranges for user namespaces to clients asking for it.

The primary call is AllocateUserRange() where the user passes in an
uninitialized userns fd, which is then set up.

There are other calls that allow assigning a mount fd to a userns
allocated that way, to set up permissions for a cgroup subtree, and to
allocate a veth for such a user namespace.

Since the UID assignments are supposed to be transient, i.e. not
permanent, care is taken to ensure that users cannot create inodes owned
by these UIDs, so that persistency cannot be acquired. This is
implemented via a BPF-LSM module that ensures that any member of a
userns allocated that way cannot create files unless the mount it
operates on is owned by the userns itself, or is explicitly
allowlisted.

BPF LSM program with contributions from Alexei Starovoitov.
2024-04-06 16:08:24 +02:00


<?xml version='1.0'?> <!--*-nxml-*-->
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<!-- SPDX-License-Identifier: LGPL-2.1-or-later -->
<refentry id="systemd-nsresourced.service" conditional='ENABLE_NSRESOURCED'>
<refentryinfo>
<title>systemd-nsresourced.service</title>
<productname>systemd</productname>
</refentryinfo>
<refmeta>
<refentrytitle>systemd-nsresourced.service</refentrytitle>
<manvolnum>8</manvolnum>
</refmeta>
<refnamediv>
<refname>systemd-nsresourced.service</refname>
<refname>systemd-nsresourced</refname>
<refpurpose>User Namespace Resource Delegation Service</refpurpose>
</refnamediv>
<refsynopsisdiv>
<para><filename>systemd-nsresourced.service</filename></para>
<para><filename>/usr/lib/systemd/systemd-nsresourced</filename></para>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para><command>systemd-nsresourced</command> is a system service that permits transient delegation of a
UID/GID range to a user namespace (see <citerefentry
project='man-pages'><refentrytitle>user_namespaces</refentrytitle><manvolnum>7</manvolnum></citerefentry>)
allocated by a client, via a Varlink IPC API.</para>
<para>Unprivileged clients may allocate a user namespace and then request, via this service, that a
UID/GID range be assigned to it. The user namespace may then be used to run containers and other
sandboxes, and/or be applied to an id-mapped mount.</para>
<para>Allocations of UIDs/GIDs made this way are transient: when a user namespace goes away, its UID/GID
range is returned to the pool of available ranges. To ensure that clients cannot gain persistence in their
transient UID/GID range, a BPF-LSM based policy is enforced that permits user namespaces set up this way to
write only to file systems they allocated themselves or that have been explicitly allowlisted via
<command>systemd-nsresourced</command>.</para>
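<para>The write rule can be read as a small decision function. The C sketch below is a hypothetical
userspace model of the policy as described on this page, not the BPF-LSM program itself: the structures
<structname>mount_info</structname> and <structname>delegated_userns</structname> and the function
<function>may_write()</function> are invented for illustration, and user namespaces and mounts are
identified by plain integers.</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of the write policy; not the BPF-LSM program.
 * User namespaces and mounts are identified by plain integers here. */
struct mount_info {
        uint64_t mount_id;      /* identifies the mount */
        uint64_t owner_userns;  /* user namespace that owns the mount */
};

struct delegated_userns {
        uint64_t userns_id;             /* user namespace set up via the service */
        const uint64_t *allowed_mounts; /* mounts explicitly allowlisted for it */
        size_t n_allowed_mounts;
};

/* Returns true if a task inside 'ns' may create inodes on mount 'm'. */
bool may_write(const struct delegated_userns *ns, const struct mount_info *m) {
        assert(ns && m);

        /* File systems owned by the user namespace itself are writable. */
        if (m->owner_userns == ns->userns_id)
                return true;

        /* Anything else must have been explicitly allowlisted. */
        for (size_t i = 0; i < ns->n_allowed_mounts; i++)
                if (ns->allowed_mounts[i] == m->mount_id)
                        return true;

        /* Default deny, e.g. for the host's root file system. */
        return false;
}
```
]]></programlisting>
<para>The default is to deny: a mount neither owned by the user namespace nor allowlisted for it stays
read-only to its members, which is what keeps the transient IDs from leaving files behind.</para>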
<para><command>systemd-nsresourced</command> automatically ensures that any registered UID ranges show up
in the system's NSS database via the <ulink url="https://systemd.io/USER_GROUP_API">User/Group Record
Lookup API via Varlink</ulink>.</para>
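<para>Because the ranges are exposed through NSS, consumers can resolve them with the ordinary libc calls,
assuming the <literal>systemd</literal> module (see
<citerefentry><refentrytitle>nss-systemd</refentrytitle><manvolnum>8</manvolnum></citerefentry>) is listed
for <literal>passwd</literal> in <filename>/etc/nsswitch.conf</filename>. The following C sketch uses
<function>getpwuid_r()</function>; the helper name <function>lookup_user_name()</function> is hypothetical,
and the UID to pass would be one taken from a range allocated by the service.</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: resolve a UID (e.g. one from a range allocated by
 * systemd-nsresourced) to a user name through the regular NSS stack.
 * Returns 0 on success, -ENOENT if no record exists, -ENOBUFS if 'name'
 * is too small, or another negative errno value. */
int lookup_user_name(uid_t uid, char *name, size_t name_size) {
        assert(name || name_size == 0);

        long bufsize = sysconf(_SC_GETPW_R_SIZE_MAX);
        if (bufsize <= 0)
                bufsize = 1024; /* no limit reported; start small and grow */

        for (;;) {
                struct passwd pw, *result = NULL;
                char *buf = malloc((size_t) bufsize);
                if (!buf)
                        return -ENOMEM;

                /* Consults every source configured for "passwd" in nsswitch.conf,
                 * which covers userdb records if nss-systemd is listed there. */
                int r = getpwuid_r(uid, &pw, buf, (size_t) bufsize, &result);
                if (r == ERANGE && bufsize < (1L << 20)) {
                        /* Record did not fit into the buffer; retry with a larger one. */
                        free(buf);
                        bufsize *= 2;
                        continue;
                }

                int ret;
                if (r != 0)
                        ret = -r;
                else if (!result)
                        ret = -ENOENT;
                else if (strlen(pw.pw_name) >= name_size)
                        ret = -ENOBUFS;
                else {
                        strcpy(name, pw.pw_name);
                        ret = 0;
                }

                free(buf);
                return ret;
        }
}
```
]]></programlisting>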
<para>Currently, only UID/GID ranges consisting of either exactly 1 or exactly 65536 UIDs/GIDs can be
registered with this service. Moreover, UIDs and GIDs are always allocated together, and
symmetrically.</para>
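<para>The two allocation rules above can be expressed as a validity check. The C sketch below is
hypothetical: the constants, <structname>id_range</structname>, <function>id_range_size_valid()</function>
and <function>id_range_make()</function> are not part of the service's API. It only encodes that a range
holds exactly 1 or exactly 65536 IDs and that the GID range mirrors the UID range.</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical names; the sizes are the two permitted by the service. */
#define ID_RANGE_SIZE_SINGLE UINT32_C(1)
#define ID_RANGE_SIZE_FULL   UINT32_C(65536) /* 2^16 IDs, enough for a container's 16-bit ID range */

struct id_range {
        uid_t uid_start;
        gid_t gid_start;
        uint32_t size;   /* shared by the UID and the GID range */
};

/* Only ranges of exactly 1 or exactly 65536 IDs may be registered. */
bool id_range_size_valid(uint32_t size) {
        return size == ID_RANGE_SIZE_SINGLE || size == ID_RANGE_SIZE_FULL;
}

/* Fills in a range whose UIDs and GIDs are allocated together and
 * symmetrically: both start at 'start' and span 'size' IDs.
 * Returns false if the size is not permitted or the range would reach
 * (uid_t) -1, which is reserved as the invalid ID. */
bool id_range_make(uint32_t start, uint32_t size, struct id_range *ret) {
        assert(ret);

        if (!id_range_size_valid(size))
                return false;
        if ((uint64_t) start + size > UINT32_MAX)
                return false;

        ret->uid_start = (uid_t) start;
        ret->gid_start = (gid_t) start;
        ret->size = size;
        return true;
}
```
]]></programlisting>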
<para>The service provides API calls to allowlist mounts (referenced via their mount file descriptors, as
per the Linux <function>fsmount()</function> API), to pass ownership of a cgroup subtree to the user
namespace, and to delegate a virtual Ethernet device pair to the user namespace. Used in combination, these
calls are sufficient to implement fully unprivileged container environments, as implemented by
<citerefentry><refentrytitle>systemd-nspawn</refentrytitle><manvolnum>1</manvolnum></citerefentry>, fully
unprivileged <varname>RootImage=</varname> (see
<citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry>), and
fully unprivileged disk image tools such as
<citerefentry><refentrytitle>systemd-dissect</refentrytitle><manvolnum>1</manvolnum></citerefentry>.</para>
<para>This service provides one <ulink url="https://varlink.org/">Varlink</ulink> service:
<constant>io.systemd.NamespaceResource</constant> allows registering user namespaces and assigning mounts,
cgroups and network interfaces to them.</para>
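<para>On the wire, a Varlink call is a JSON object carrying a <literal>method</literal> and its
<literal>parameters</literal>, terminated by a NUL byte and sent over a stream socket (typically
<constant>AF_UNIX</constant>). The C sketch below only builds such a request for the
<function>AllocateUserRange()</function> call. The parameter names <varname>name</varname>,
<varname>size</varname> and <varname>userNamespaceFileDescriptor</varname> and the helper
<function>build_allocate_request()</function> are assumptions for illustration; check them against the
interface description, e.g. with <command>varlinkctl introspect</command> (see
<citerefentry><refentrytitle>varlinkctl</refentrytitle><manvolnum>1</manvolnum></citerefentry>). Connecting
to the service and passing the user namespace file descriptor alongside the message are not shown.</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Interface name from this page, method name from the AllocateUserRange() call.
 * The parameter names used below are assumptions, not a verified schema. */
#define NSRESOURCE_ALLOCATE "io.systemd.NamespaceResource.AllocateUserRange"

/* The name is inserted verbatim into JSON, so only accept characters that
 * never need escaping. */
static bool name_is_plain(const char *s) {
        if (*s == '\0')
                return false;
        for (; *s; s++)
                if (!isalnum((unsigned char) *s) && *s != '-' && *s != '_')
                        return false;
        return true;
}

/* Writes one Varlink request into 'buf'. Varlink frames each JSON message
 * with a trailing NUL byte, so the returned length (bytes to send) includes
 * it. 'fd_index' is assumed to be the position of the user namespace fd
 * among the descriptors passed with the message (SCM_RIGHTS).
 * Returns -1 if the name is unsuitable or 'buf' is too small. */
int build_allocate_request(char *buf, size_t bufsize, const char *name,
                           unsigned size, unsigned fd_index) {
        assert(buf && name);

        if (!name_is_plain(name))
                return -1;

        int n = snprintf(buf, bufsize,
                         "{\"method\":\"" NSRESOURCE_ALLOCATE "\","
                         "\"parameters\":{\"name\":\"%s\",\"size\":%u,"
                         "\"userNamespaceFileDescriptor\":%u}}",
                         name, size, fd_index);
        if (n < 0 || (size_t) n >= bufsize)
                return -1;

        return n + 1; /* snprintf() already wrote the terminating NUL */
}
```
]]></programlisting>
<para>The returned length deliberately includes the NUL byte: without it the service would keep waiting
for the end of the message.</para>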
</refsect1>
<refsect1>
<title>See Also</title>
<para>
<citerefentry><refentrytitle>systemd</refentrytitle><manvolnum>1</manvolnum></citerefentry>,
<citerefentry><refentrytitle>systemd-mountfsd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>,
<citerefentry><refentrytitle>systemd-nspawn</refentrytitle><manvolnum>1</manvolnum></citerefentry>,
<citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
<citerefentry><refentrytitle>systemd-dissect</refentrytitle><manvolnum>1</manvolnum></citerefentry>,
<citerefentry project='man-pages'><refentrytitle>user_namespaces</refentrytitle><manvolnum>7</manvolnum></citerefentry>
</para>
</refsect1>
</refentry>