Commit graph

72391 commits

Author SHA1 Message Date
Yu Watanabe 6952ebae3b tree-wide: drop several remaining license headers
And downgrade the license of utf8.c to LGPL-2.0-or-later, to follow the
original license.
2024-04-08 10:14:50 +02:00
Yu Watanabe caaf95985f mountfsd: fix typo
Follow-up for 702a52f4b5.
2024-04-08 09:22:06 +09:00
Yu Watanabe 1ea9151e6c nsresourced: fix typo
Follow-up for 8aee931e7a.
2024-04-08 09:20:20 +09:00
Yu Watanabe a1952a5c79 dissect: fix typo
Follow-up for 9444e54e56.
2024-04-08 09:17:53 +09:00
Yu Watanabe 693a28d748 nspawn: fix typo
Follow-up for 0af7e29434.
2024-04-08 09:12:08 +09:00
Luca Boccassi 69484aa6c2
Merge pull request #32136 from YHNdnzj/nextroot-auto-mountpoint
systemctl-logind: auto soft-reboot only if /run/nextroot/ is mountpoint
2024-04-07 23:32:18 +01:00
Luca Boccassi 1b0cc135d0 test-execute: check for s390x first and duplicate test
s390x will define both s390x and s390, so exec-personality-s390.service is run
in both cases but fails on s390x, as the personality returned is s390x.
Split the test and check specifically for s390x.
2024-04-08 07:29:06 +09:00
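The ordering problem described in the commit above can be sketched as follows. This is a minimal illustration, not the code from test-execute: the helper name native_personality_name() is hypothetical, and it assumes the usual GCC behaviour that s390x builds predefine both __s390x__ and __s390__, so the more specific macro has to be checked first.

```c
#include <string.h>

/* Hypothetical helper, not taken from the systemd tree: returns a name for
 * the architecture the code was compiled for. On s390x both __s390x__ and
 * __s390__ are assumed to be defined, so __s390x__ must be tested first;
 * otherwise the s390 branch would also match on s390x. */
static const char *native_personality_name(void) {
#if defined(__s390x__)
        return "s390x";
#elif defined(__s390__)
        return "s390";
#elif defined(__x86_64__)
        return "x86-64";
#else
        return "other";
#endif
}
```

A test that wants to run an s390-only unit would then compare the returned name against "s390" rather than relying on the __s390__ macro alone.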
Mike Yuan b8b0704ce9 hibernate-util: check 'noresume' before reading resume setting
Also hibernation_is_safe() should really take this
into consideration too.
2024-04-07 23:28:56 +01:00
Luca Boccassi 7a5edb0795
Merge pull request #26826 from poettering/mntfsd
unprivileged DDI mounts + dynamic userns range allocation via IPC
2024-04-07 19:43:34 +01:00
Daan De Meyer 84affd46d5 mkosi: Install dnf5 in Fedora image 2024-04-07 19:09:11 +02:00
Luca Boccassi 7190be5dd4
Merge pull request #32135 from keszybz/compiler-warning-cleanup
Compiler warning cleanup
2024-04-07 16:33:38 +01:00
Mike Yuan 9eb7f4cebf
systemctl-logind: auto soft-reboot only if /run/nextroot/ is mountpoint
Consider the following case: a user sets up a minimal rootfs for
file system maintenance work directly in the /run/nextroot/ dir. When
they're done, they expect 'systemctl reboot' to perform a full reboot.
But they keep soft-rebooting back to the tmpfs root, until they
find out about $SYSTEMCTL_SKIP_AUTO_SOFT_REBOOT.

So currently, when /run/nextroot/ is a normal dir, pid1 automatically
turns it into a bind mount to soft-reboot into. This is good, but when
combined with automatic soft-reboot it has an arguably unexpected
behavior, since /run/nextroot/ can never go away in such a case.
OTOH, if /run/nextroot/ is a mountpoint in the first place, the mount
is *moved* so a second reboot would not trigger auto soft-reboot.
Let's just make things more friendly to users, and do auto soft-reboot
only if /run/nextroot/ is also a mountpoint.
2024-04-07 20:02:40 +08:00
Mike Yuan 706e9a4bc7
logind-dbus: use FLAGS_SET more 2024-04-07 19:56:58 +08:00
Mike Yuan b7e4e152cf core: use log_unit_debug in *_set_state 2024-04-07 10:20:39 +01:00
Luca Boccassi 9dd174dc23 run: query for SoftRebootsCount only for system scope runs
Only the system manager records soft reboots, and the user session is
restarted anyway so it doesn't suffer from the ID clash issue

Follow-up for ed35851693
2024-04-07 10:20:04 +01:00
Zbigniew Jędrzejewski-Szmek 41733186c4 sd-bus: rework assert to make the gcc happy
With gcc-14.0.1-0.13.fc40, when compiling with -O2, the compiler doesn't understand
that sd_bus_error_setf() always returns negative on error when <name> is provided:

[28/576] Compiling C object systemd-resolved.p/src_resolve_resolved-bus.c.o
../src/resolve/resolved-bus.c: In function ‘call_link_method’:
../src/resolve/resolved-bus.c:1763:16: warning: ‘l’ may be used uninitialized [-Wmaybe-uninitialized]
 1763 |         return handler(message, l, error);
      |                ^~~~~~~~~~~~~~~~~~~~~~~~~~
../src/resolve/resolved-bus.c:1749:15: note: ‘l’ was declared here
 1749 |         Link *l;
      |               ^
../src/resolve/resolved-bus.c: In function ‘bus_method_get_link’:
../src/resolve/resolved-bus.c:1822:13: warning: ‘l’ may be used uninitialized [-Wmaybe-uninitialized]
 1822 |         p = link_bus_path(l);
      |             ^~~~~~~~~~~~~~~~
../src/resolve/resolved-bus.c:1810:15: note: ‘l’ was declared here
 1810 |         Link *l;
      |               ^
...

Let's make the assertion a bit more explicit. With this, the warning goes away,
but I think it's more obvious to a human reader too.
2024-04-07 11:15:19 +02:00
Zbigniew Jędrzejewski-Szmek 6a4607a3c8 ask-password: minor shortening 2024-04-07 11:15:19 +02:00
Zbigniew Jędrzejewski-Szmek 741f6ae39b core: silence gcc warning about uninitialized variable
When compiled with -O2, the compiler is not happy about dynamic_user_pop() and
would warn about the output variables not being set. It does have a point:
we were doing a cast from ssize_t to int, and theoretically there could be
wraparound. So let's add an explicit check that the cast to int is fine.

[540/2509] Compiling C object src/core/libsystemd-core-256.so.p/dynamic-user.c.o
../src/core/dynamic-user.c: In function ‘dynamic_user_close.isra’:
../src/core/dynamic-user.c:580:9: warning: ‘uid’ may be used uninitialized [-Wmaybe-uninitialized]
  580 |         unlink_uid_lock(lock_fd, uid, d->name);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/core/dynamic-user.c:560:15: note: ‘uid’ was declared here
  560 |         uid_t uid;
      |               ^~~
../src/core/dynamic-user.c: In function ‘dynamic_user_realize’:
../src/core/dynamic-user.c:476:29: warning: ‘new_uid’ may be used uninitialized [-Wmaybe-uninitialized]
  476 |                         num = new_uid;
      |                         ~~~~^~~~~~~~~
../src/core/dynamic-user.c:398:23: note: ‘new_uid’ was declared here
  398 |                 uid_t new_uid;
      |                       ^~~~~~~
2024-04-07 11:15:19 +02:00
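The explicit range check mentioned in the commit above might look roughly like this. It is a sketch under assumptions, not the change made to dynamic_user_pop(): the helper name ssize_to_int_checked() and the choice of -ERANGE as the error code are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <sys/types.h>

/* Hypothetical helper: narrow an ssize_t to int, refusing values that would
 * wrap around. On success the output parameter is always initialized, which
 * is the property the compiler could not prove for an unchecked cast. */
static int ssize_to_int_checked(ssize_t n, int *ret) {
        assert(ret);

        if (n < INT_MIN || n > INT_MAX)
                return -ERANGE;

        *ret = (int) n;
        return 0;
}
```

With the check in place, a caller that tests the return value before using the output never reads an unset variable, and the narrowing cast can no longer turn a large positive value into a negative one.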
Lennart Poettering 18098d7dec update TODO 2024-04-06 16:09:10 +02:00
Lennart Poettering 625646815b test: add integration test for unpriv mountfsd/nsresourced 2024-04-06 16:09:10 +02:00
Lennart Poettering 0af7e29434 nspawn: make nspawn work without privileges 2024-04-06 16:08:24 +02:00
Lennart Poettering 046a1487db core: implement RootImage= via mountfsd in unprivileged environments 2024-04-06 16:08:24 +02:00
Lennart Poettering fdec6d1560 dissect-tool: allow systemd-dissect to talk to mountfsd 2024-04-06 16:08:24 +02:00
Lennart Poettering fe7d8235e1 dissect-image: add a generic varlink client side for mountfsd 2024-04-06 16:08:24 +02:00
Lennart Poettering 702a52f4b5 mountfsd: add new systemd-mountfsd component 2024-04-06 16:08:24 +02:00
Lennart Poettering 54452c7b2a nsresourced: add client-side helpers around nsresourced APIs
This adds simple functions that wrap the Varlink IPC calls.
2024-04-06 16:08:24 +02:00
Lennart Poettering 8aee931e7a nsresourced: add new daemon for granting clients user namespaces and assigning resources to them
This adds a small, socket-activated Varlink daemon that can delegate UID
ranges for user namespaces to clients asking for it.

The primary call is AllocateUserRange() where the user passes in an
uninitialized userns fd, which is then set up.

There are other calls that allow assigning a mount fd to a userns
allocated that way, to set up permissions for a cgroup subtree, and to
allocate a veth for such a user namespace.

Since the UID assignments are supposed to be transient, i.e. not
permanent, care is taken to ensure that users cannot create inodes owned
by these UIDs, so that persistence cannot be acquired. This is
implemented via a BPF-LSM module that ensures that any member of a
userns allocated that way cannot create files unless the mount it
operates on is owned by the userns itself, or is explicitly
allowlisted.

BPF LSM program with contributions from Alexei Starovoitov.
2024-04-06 16:08:24 +02:00
Lennart Poettering 593428680c build-sys: pick up vmlinux.h from running kernel BTF or user 2024-04-06 16:08:24 +02:00
Lennart Poettering 5300fe74a1 dissect-image: document one more dissected_image_decrypt() error code 2024-04-06 16:08:23 +02:00
Lennart Poettering 44e3097dff dissect-image: make dissected_image_acquire_metadata() operate within a userns if possible
This opens the door for making the call work without privileges: if we
pass in a userns fd and DissectedImage that has mount fds then we can
acquire all information without privs.
2024-04-06 16:08:23 +02:00
Lennart Poettering 77740bddbe dissect-image: add a new helper that checks if VeritySettings has anything set at all 2024-04-06 16:08:23 +02:00
Lennart Poettering 9444e54e56 dissect-image: add dissected_image_close() that closes all references to resources 2024-04-06 16:08:23 +02:00
Lennart Poettering f7178a04db discover-image: export search paths array
This way we can use it to validate image paths later.
2024-04-06 16:08:23 +02:00
Lennart Poettering b2dcfd8e11 cgroup-setup: add fd-based version of cg_attach() 2024-04-06 16:08:23 +02:00
Lennart Poettering 3b2874952f cgroup-util: add helpers for opening cgroup by id 2024-04-06 16:08:23 +02:00
Lennart Poettering cb1b813f0d lock-util: make global lock return parameter to image_path_lock() optional
When adding unprivileged nspawn support we don't really want a global
lock file, since we cannot even access the dir they are stored in, hence
make the concept optional.

Some minor other modernizations.
2024-04-06 16:08:23 +02:00
Lennart Poettering 0f716ace41 bpf-dlopen: pick up more symbols from libbpf 2024-04-06 16:08:23 +02:00
Lennart Poettering e4f62e7a12 namespace-util: add new helper is_our_namespace() 2024-04-06 16:08:23 +02:00
Lennart Poettering 574a07c79d namespace-util: add namespace_open_by_type() helper 2024-04-06 16:08:23 +02:00
Lennart Poettering 2ad2f0c89e namespace-util: add detach_mount_namespace_userns() 2024-04-06 16:08:23 +02:00
Lennart Poettering e02fb2099c namespace-util: add helper for allocating an empty userns fd 2024-04-06 16:08:23 +02:00
Lennart Poettering 5783b4a954 namespace-util: add detach_mount_namespace_harder()
This is just like detach_mount_namespace() but if need be uses unpriv
user namespaces to be able to execute CLONE_NEWNS.
2024-04-06 16:08:23 +02:00
Lennart Poettering afdd0efa63 uid-range: add some basic operations on UidRange objects
Helpers to compare and get size, and whether the object is empty.
2024-04-06 16:08:23 +02:00
Lennart Poettering 20ba086e77 uid-range: add new uid_range_load_userns_by_fd() helper
This is similar to uid_range_load_userns() but instead of reading the
uid_map off a process it reads it off a userns fd.

(Of course the kernel has no API for this right now, hence we fork off a
throw-away process which joins the user namespace, and then read off the
data from there.)
2024-04-06 16:08:23 +02:00
Lennart Poettering 6ebb53d945 uid-range: optionally load outside view of UID range from uid_map procfs file 2024-04-06 16:08:23 +02:00
Lennart Poettering 5bff40e719 uid-range: add uid_range_overlaps() helper 2024-04-06 16:08:23 +02:00
Lennart Poettering 2251e4ef90 image-policy: add a new image_policy_intersect() call
This new call takes two image policy objects and generates an
"intersection" policy, i.e. only allows what is allowed by both. Or in
other words it conceptually implements a binary AND of the policy flags.
(Except that it's a bit harder, due to normalization, and underspecified
flags).

We can use this later for mountfsd: a client can specify a policy, and
mountfsd can specify another policy, and we'll then apply only what both
allow.

Note that a policy generated like this might be invalid. For example, if
one policy says root must exist and be verity or luks protected, and the
other policy says root must be absent, then the intersection is invalid,
since one policy only allows what the other prohibits and vice versa.
We'll return a clear error code in that case (ENAVAIL). (This is because
we simply don't allow encoding such impossible policies in an
ImagePolicy structure, for good reasons.)
2024-04-06 16:08:23 +02:00
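The "binary AND" idea from the commit above can be illustrated with a deliberately simplified sketch. The flag names and the function policy_flags_intersect() are hypothetical and operate on a single partition's mask; the real image_policy_intersect() also has to deal with normalization and underspecified flags, which this sketch ignores.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified per-partition flags; the real ImagePolicy flags
 * differ. */
enum {
        POLICY_VERITY      = 1 << 0,
        POLICY_ENCRYPTED   = 1 << 1,
        POLICY_UNPROTECTED = 1 << 2,
        POLICY_ABSENT      = 1 << 3,
};

/* Intersect two flag masks for one partition: keep only what both policies
 * allow. An empty result means the two policies contradict each other, which
 * is reported as -ENAVAIL, the error code named in the commit message. */
static int policy_flags_intersect(uint32_t a, uint32_t b, uint32_t *ret) {
        uint32_t both = a & b;

        assert(ret);

        if (both == 0)
                return -ENAVAIL;

        *ret = both;
        return 0;
}
```

For instance, intersecting "verity or encrypted" with "absent" leaves no permitted state for the partition, which corresponds to the invalid case described in the message.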
Lennart Poettering b219dcd45a varlink: add varlink_peek_dup_fd() helper
This new call is like varlink_peek_fd() (i.e. gets an fd out of the
connection but leaving it also in there), and combines it with
F_DUPFD_CLOEXEC to make a copy of it.

We previously already had varlink_dup_fd() which was a duplicating
version for pushing an fd *into* the connection. To reduce confusion,
let's rename that one varlink_push_dup_fd() to make the symmetry to
varlink_push_fd() clear so that we have now:

varlink_push_fd()        → put fd in without dup'ing
varlink_push_dup_fd()    → same with F_DUPFD_CLOEXEC
varlink_peek_fd()        → get fd out without dup'ing
varlink_peek_dup_fd()    → same with F_DUPFD_CLOEXEC
2024-04-06 16:08:23 +02:00
Lennart Poettering 52bd61373b varlink: add varlink_get_peer_gid() helper 2024-04-06 16:08:23 +02:00
Frantisek Sumsal b3a8264831 test: improve debug-ability of test-execute
Since e56a8790a0 debugging test-execute failures has been a royal PITA, since
we ditch all potentially useful output from the test units (that, for
the most part, run `sh -x ...`). Let's improve the situation a bit by
setting EXEC_OUTPUT_NULL only when running the single test case that
needs it, and inheriting stdout otherwise.

For example, with a purposefully introduced error we get this output
with this patch:
exec-personality-x86-64.service: About to execute: sh -x -c "c=\$\$(uname -m); test \"\$\$c\" = \"foo_bar\""
Serializing sd-executor-state to memfd.
...
        Personality: x86-64
        LockPersonality: no
        SystemCallErrorNumber: kill
++ uname -m
+ c=x86_64
+ test x86_64 = foo_bar
Received SIGCHLD from PID 1520588 (sh).
Child 1520588 (sh) died (code=exited, status=1/FAILURE)
exec-personality-x86-64.service: Child 1520588 belongs to exec-personality-x86-64.service.
exec-personality-x86-64.service: Main process exited, code=exited, status=1/FAILURE
exec-personality-x86-64.service: Failed with result 'exit-code'.
...
        Exit Status: 1
src/test/test-execute.c:456:test_exec_personality: exec-personality-x86-64.service: can_unshare=yes: exit status 1, expected 0
(test-execute-root) terminated by signal ABRT.
Assertion 'r >= 0' failed at src/test/test-execute.c:1433, function prepare_ns(). Aborting.
Aborted

But without it, we'd miss the most important part:
exec-personality-x86-64.service: About to execute: sh -x -c "c=\$\$(uname -m); test \"\$\$c\" = \"foo_bar\""
Serializing sd-executor-state to memfd.
...
        Personality: x86-64
        LockPersonality: no
        SystemCallErrorNumber: kill
Received SIGCHLD from PID 1521365 (sh).
Child 1521365 (sh) died (code=exited, status=1/FAILURE)
exec-personality-x86-64.service: Child 1521365 belongs to exec-personality-x86-64.service.
exec-personality-x86-64.service: Main process exited, code=exited, status=1/FAILURE
exec-personality-x86-64.service: Failed with result 'exit-code'.
...
        Exit Status: 1
src/test/test-execute.c:456:test_exec_personality: exec-personality-x86-64.service: can_unshare=yes: exit status 1, expected 0
(test-execute-root) terminated by signal ABRT.
Assertion 'r >= 0' failed at src/test/test-execute.c:1433, function prepare_ns(). Aborting.
Aborted
2024-04-06 13:24:36 +01:00