Commit graph

1642 commits

Author SHA1 Message Date
Daan De Meyer df4d09f78e units: Order ldconfig after systemd-tmpfiles-setup.service
tmpfiles might be linking the configuration for ldconfig into /etc
so make sure it runs after it so that the configuration is guaranteed
to be in place.
2024-09-24 15:32:58 +02:00
Daan De Meyer 939137abb4 firstboot: Prompt for keymap
It's rather crucial to have a good firstboot experience that you
can immediately set the right keymap so let's make sure we prompt
for it.
2024-09-20 04:47:27 +09:00
Daan De Meyer e196136bc5 units: Order ldconfig.service after systemd-confext.service
The configuration files required by ldconfig could be put into
place by systemd-confext.service (ldconfig only looks in /etc) so
let's order the service after systemd-confext.service to make sure
any config files are in place before the service runs.
2024-09-12 20:20:53 +02:00
Matteo Croce 6d9ef22acd emit a warning in networkd if managed sysctls are changed
Monitor the sysctl set by networkd for writes, if a sysctl is
overwritten with a different value than the one we set, emit a warning.
Writes are detected with an eBPF program attached as BPF_CGROUP_SYSCTL
which reports the sysctl writes only in net/.

The eBPF program only reports sysctl writes from a different cgroup than networkd.
To do this, it uses the `bpf_current_task_under_cgroup_proto()` helper,
which will be available allowed in BPF_CGROUP_SYSCTL from kernel 6.12[1].

Loading a BPF_CGROUP_SYSCTL program requires the CAP_SYS_ADMIN capability,
so drop it just after the program load, whether it loads successfully or not.

Writes are logged but permitted, in future the functionality can be
extended to also deny writes to managed sysctls.

[1] https://lore.kernel.org/bpf/20240819162805.78235-3-technoboy85@gmail.com/
2024-09-11 23:07:00 +02:00
Yu Watanabe 214c2508f3
Merge pull request #34336 from yuwata/nspawn-fuse-follow-ups
nspawn: follow-ups for FUSE support
2024-09-10 14:32:09 +09:00
Yu Watanabe b86b90cec5 nspawn: sync DeviceAllow= setting with systemd-nspawn@.service
Follow-up for dc3223919f.
Addresses https://github.com/systemd/systemd/pull/34067#discussion_r1748592958.

Otherwise, containers started with and without --keep-unit option run in
different device policies.
2024-09-10 04:38:11 +09:00
Lennart Poettering 229d4a9806 shell: define three system credentials we can propagate into shell prompts and welcome messages 2024-09-09 19:03:48 +02:00
Luke T. Shumaker dc3223919f nspawn: enable FUSE in containers
Linux kernel v4.18 (2018-08-12) added user-namespace support to FUSE, and
bumped the FUSE version to 7.27 (see: da315f6e0398 (Merge tag
'fuse-update-4.18' of
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse, Linus Torvalds,
2018-06-07).  This means that on such kernels it is safe to enable FUSE in
nspawn containers.

In outer_child(), before calling copy_devnodes(), check the FUSE version to
decide whether enable (>=7.27) or disable (<7.27) FUSE in the container.  We
look at the FUSE version instead of the kernel version in order to enable FUSE
support on older-versioned kernels that may have the mentioned patchset
backported ([as requested by @poettering][1]).  However, I am not sure that
this is safe; user-namespace support is not a documented part of the FUSE
protocol, which is what FUSE_KERNEL_VERSION/FUSE_KERNEL_MINOR_VERSION are meant
to capture.  While the same patchset
 - added FUSE_ABORT_ERROR (which is all that the 7.27 version bump
   is documented as including),
 - bumped FUSE_KERNEL_MINOR_VERSION from 26 to 27, and
 - added user-namespace support
these 3 things are not inseparable; it is conceivable to me that a backport
could include the first 2 of those things and exclude the 3rd; perhaps it would
be safer to check the kernel version.

Do note that our get_fuse_version() function uses the fsopen() family of
syscalls, which were not added until Linux kernel v5.2 (2019-07-07); so if
nothing has been backported, then the minimum kernel version for FUSE-in-nspawn
is actually v5.2, not v4.18.

Pass whether or not to enable FUSE to copy_devnodes(); have copy_devnodes()
copy in /dev/fuse if enabled.

Pass whether or not to enable FUSE back over fd_outer_socket to run_container()
so that it can pass that to append_machine_properties() (via either
register_machine() or allocate_scope()); have append_machine_properties()
append "DeviceAllow=/dev/fuse rw" if enabled.

For testing, simply check that /dev/fuse can be opened for reading and writing,
but that actually reading from it fails with EPERM.  The test assumes that if
FUSE is supported (/dev/fuse exists), then the testsuite is running on a kernel
with FUSE >= 7.27; I am unsure how to go about writing a test that validates
that the version check disables FUSE on old kernels.

[1]: https://github.com/systemd/systemd/issues/17607#issuecomment-745418835

Closes #17607
2024-09-07 10:18:35 -06:00
Etienne Cordonnier 4ac1755be2 coredump: set ProtectHome to read-only
In 924453c225
ProtectHome was set to true for systemd-coredump in order to reduce risk, since an attacker could craft a malicious binary in order to compromise systemd-coredump.
At that point the object analysis was done in the main systemd-coredump process.
Because of this systemd-coredump is unable to product symbolicated call-stacks for binaries running under /home ("n/a" is shown instead of function names).

However, later in 61aea456c1 systemd-coredump was changed to do the object analysis in a forked process,
covering those security concerns.

Let's set ProtectHome to read-only so that systemd-coredump produces symbolicated call-stacks for processes running under /home.
2024-09-06 13:30:36 +02:00
Lennart Poettering e7c8d78e7f
units: don't set LISTEN_FDNAMES for varlink services explicitly
Now that FileDescriptorName= is properly honored by Accept=yes sockets,
this explicit override is pointless.
2024-08-26 15:40:15 +02:00
Adrian Vovk bf2c741fd7
sysupdate: Implement systemd-sysupdated dbus service
Co-authored-by: Tom Coldrick <thomas.coldrick@codethink.co.uk>
Co-authored-by: Abderrahim Kitouni <abderrahim.kitouni@codethink.co.uk>
2024-08-21 09:31:41 +01:00
Ronan Pigott 3d2157e707 units: drop "-p" flag from agetty's login options
This flag was added in db6aedab92 with the justification that locale
environment variables should be preserved by the user session. However,
the companion patch to drop the UnsetEnvironment= directive blocking
these variables was never merged, so the intended change was never
effected.

While the patch was ineffective toward its stated goal, the "-p" option
does have material negative consequences for the user session in
systemd — environment variables to support the use of
credentials and memory pressure directives, such as
$CREDENTIALS_DIRECTORY and $MEMORY_PRESSURE_WATCH, which are now
directly used by agetty and login, get leaked into the user session
potentially breaking applications that rely on these values.

E.g. systemd-ask-password fails from the tty when $CREDENTIALS_DIRECTORY
has been leaked from agetty, because it expects to be able to access
credentials in $CREDENTIALS_DIRECTORY.

This effectively reverts db6aedab92.

References: db6aedab92 (units: Tell login to preserve environment (#6023), 2017-05-24)
2024-08-15 16:49:02 +09:00
Daan De Meyer e97429902d units: Import tty specific credentials for each getty unit
As explained in the previous commit, this allows us to configure
agetty and login for individual ttys instead of globally.
2024-07-31 15:52:29 +02:00
Lennart Poettering 56ea3c262c units: bring agetty command lines back into sync
Let's always rely on our own TTY reset logic and tty disallocation/clear
screen logic, thus always pass --noclear and --noreset.

Also, bring the list of baud rates to try into sync for console-getty
and serial-getty (the former might or might not be connected to rs232,
we can't know, hence assume the worst, and copy what
serial-getty@.service does)
2024-07-19 11:44:04 +02:00
Luca Boccassi bbb0b72849 fsck: do not pull down mount units on soft-reboot
Otherwise they will pull down the disk too, which we don't want on soft-reboot
2024-07-09 20:59:35 +02:00
fwfy 496b4fa0e9
Remove extra period at the end of systemd-bsod's unit description. (#33632)
* Remove extra period at end of unit description.

Having an extra period at the end of this unit description makes log entries pertaining to it appear weirdly, as it seems the default expectation is that there is not to be a period at the end of a unit description.

e.g.: `systemd[1]: Started Displays emergency message in full screen..`
2024-07-06 10:17:20 +01:00
Lennart Poettering 29294d21cf units: add dep on systemd-logind.service by user@.service
Let's make sure logind is accessible by the time user@.service runs, and
that logind stays around as long as it does so.

Addresses an issue reported here:

https://lists.freedesktop.org/archives/systemd-devel/2024-June/050468.html

This addresses an issued introduced by
278e815bfa, which dropped the a dependency
from user@.service systemd-user-sessions.service without replacement.
While dropping that dependency does make sense, it should have been
replaced with the weaker dependency on systemd-logind.service, hence fix
that now.

user@.service is after all a logind concept, hence logind really should
be around for its lifetime.

systemd-user-sessions.service is a later milestone that only really
should apply to regular users (not root), hence it's too strong a
requirement.
2024-07-01 18:52:35 +02:00
Lennart Poettering f596658811 importd: allow activation in early boot, and make it socket activatable
Previously, importd was only accessible via D-Bus, which required it to
be a late boot service. Now that we have Varlink we can rearrange things
to become early-boot activated, just after the image directories are
mounted.

This will later allow us to have generator that auto-downloads images on
boot.
2024-06-25 09:57:42 +02:00
Lennart Poettering ec67cc9785 units: register vmspawn VMs started via systemd-vmspawn@.service by default with machined 2024-06-21 17:49:26 +02:00
Mike Yuan b5c8cc0a3b man,units: drop "temporary" from description of systemd-tmpfiles
Historically, systemd-tmpfiles was designed to manager temporary
files, but nowadays it has become a generic tool for managing
all kinds of files. To avoid user confusion, let's remove "temporary"
from the tool's description.

As discussed in #33349
2024-06-15 19:08:35 +02:00
Daan De Meyer d6518003f8 tpm2-setup: Don't fail if we can't access the TPM due to authorization failure
The TPM might be password/pin protected for various reasons even if
there is no SRK yet. Let's handle those cases gracefully instead of
failing the unit as it is enabled by default.
2024-06-12 18:31:21 +09:00
Daan De Meyer 4861eace12 presets: Don't enable systemd-homed-firstboot.service by default
Enabling this service by default means every CI image without a
regular user now gets stuck on first boot due to the password prompt
from systemd-homed-firstboot.service. Let's not enable the service
by default but instead require users to enable it explicitly if they
want its behavior.

Fixes #33249
2024-06-08 11:29:55 +01:00
Luca Boccassi d6243ebedd journald: enable persistent FD Store to fix logging during soft-reboot
A unit with StandardOutput=journal (the default) will get its stdout/stderr sockets
disconnected when journald stops, as the file descriptors on journald's side are
not preserved (it works on restart, as the FD Store keeps them open during restarts).
Set FileDescriptorStorePreserve=yes so that the journal FD's stay open during a soft
reboot, and applications don't get broken stdout/stderr.
2024-06-03 16:30:54 +01:00
Zbigniew Jędrzejewski-Szmek a37454bd90 man: update links to "API File Systems" 2024-05-28 14:48:56 +02:00
Zbigniew Jędrzejewski-Szmek d5c17aceb3 various: update links to more wiki pages 2024-05-28 14:48:53 +02:00
Yu Watanabe 7ae27cefd7 unit: also stop systemd-journal-flush.service on soft-reboot
After soft-reboot, /var/log/journal may be initially read-only,
and becomes writable a bit later. In such case, runtime journal is
initially opened by journald. Hence, we need to flush to /var when it is
ready.
2024-05-26 03:11:24 +09:00
Yu Watanabe 37143fdf5a units: stop systemd-journald before systemd-soft-reboot.service
Typically, soft-reboot.target is never reached. So, without this change,
systemd-journald may be killed by PID1 on soft-reboot, and may cause
journal corruption.
2024-05-23 00:08:14 +09:00
Yu Watanabe 03a41c41ee Revert "units: do not soft-reboot before soft-reboot.target reached"
This reverts commit 4263d7617f.

Still I think this is the way to go. But the change was merged after -rc2,
and still discussion is continued. So, at least now let's revert it,
and do that after v256-final is released if approved.
2024-05-23 00:06:30 +09:00
Yu Watanabe 1cd904bbe0 units: add JobTimeoutAction= to exit.target and friends
For consistency with other targets, e.g. poweroff.target or
reboot.target.
2024-05-18 01:28:14 +09:00
Yu Watanabe 4263d7617f units: do not soft-reboot before soft-reboot.target reached
Otherwise, at the time systemd-soft-reboot.service succeeds,
services which has Conflicts= and Before=soft-reboot.target may
not be stopped yet, and may be SIGKILLed.

Especially, systemd-journald.service has the dependencies, thus
journal may be corrupted. See #32223.

Follow-up for 13ffc60749.

Fixes #32834.
2024-05-17 12:31:00 +09:00
Yu Watanabe 11a55b15bf units: drop dependencies of soft-reboot.target from systemd-journald@.service
The service deos not have DefaultDependencies=no. Hence it has dependencies
of shutdown.target, and dependencies of soft-reboot.target are not
necessary.

Follow-up for f89985ca49.
2024-05-17 11:57:53 +09:00
Yu Watanabe 61628287bd journal: explicitly sync namespaced journals before stopping socket units
Otherwise, if a service unit that requests LogNamespace= stopped before
systemd-journald@.service is started, logs generated by the service will be
lost, as systemd-journald@.socket is stopped and
systemd-journald@.service will never started.

To prevent the issue, let's introduce another implicit dependency to
a oneshot service that explicitly synchronizes a namespaced journal file
when the log namespace is not needed anymore.

Fixes #32604.
2024-05-02 19:41:01 +02:00
Dmitry V. Levin c309b9e9c3 treewide: fix a few typos in NEWS, docs, comments, and log messages 2024-04-27 12:11:13 +02:00
Luca Boccassi 1ac79a1937 units: add Before=shutdown.target to systemd-networkd-persistent-storage.service
It's ordered with networkd, but just in case. Lintian complains:

W: systemd: systemd-service-file-shutdown-problems [usr/lib/systemd/system/systemd-networkd-persistent-storage.service]

Follow-up for 91676b6458
2024-04-26 22:16:33 +02:00
Lennart Poettering ad7ac02035 units: merge two After= lines 2024-04-22 15:15:05 +02:00
Lennart Poettering a6e9c37f5e tpm2-setup-early: order against pcrphase-initrd
Right now systemd-tpm2-setup-early and systemd-pcrphase-initrd.service
are not ordered against each other. However, they require the same slow
resource to operate: the TPM2. If we allow them to access the device
simultaneously, the kernel resource manager like has to save/restore TPM
state while they operate, slowing things down further.

hence, let's avoid all this mess, and just order them against each other
so that the shared resource is first used in full by one and then by the
other.

I opted to order systemd-pcrphase-initrd before
systemd-tpm2-setup-early, since there's value in having the former as
early as possible in userspace, to be a good marker for the transition
from kernel to first userspace. I can see no benefit in the opposite
order however.
2024-04-22 14:47:58 +02:00
Yu Watanabe 5700e755a9 units: introduce systemd-udev-load-credentials.service 2024-04-16 09:45:43 +09:00
Lennart Poettering 27dd678d2d units: order repart after systemd-tpm2-setup-early.service
This mimics what we do for systemd-cryptsetup@.service (see
src/shared/generator.c), and makes sense since repart might lock up the
root volume against a TPM, which ideally has its SRK already set up by
then.

More importantly though, this ensures that we ordered correctly after
tpm2.target (which systemd-tpm2-setup-early.service has a dependency
on), for systems where the TPM drivers are not compiled into the kernel.

See: https://lists.freedesktop.org/archives/systemd-devel/2024-April/050201.html
2024-04-15 22:33:45 +02:00
Mike Yuan 40611863e4
units/systemd-boot-check-no-failures.service: drop unneeded dep on shutdown.target 2024-04-10 23:40:53 +08:00
Lennart Poettering 702a52f4b5 mountfsd: add new systemd-mountfsd component 2024-04-06 16:08:24 +02:00
Lennart Poettering 8aee931e7a nsresourced: add new daemon for granting clients user namespaces and assigning resources to them
This adds a small, socket-activated Varlink daemon that can delegate UID
ranges for user namespaces to clients asking for it.

The primary call is AllocateUserRange() where the user passes in an
uninitialized userns fd, which is then set up.

There are other calls that allow assigning a mount fd to a userns
allocated that way, to set up permissions for a cgroup subtree, and to
allocate a veth for such a user namespace.

Since the UID assignments are supposed to be transitive, i.e. not
permanent, care is taken to ensure that users cannot create inodes owned
by these UIDs, so that persistancy cannot be acquired. This is
implemented via a BPF-LSM module that ensures that any member of a
userns allocated that way cannot create files unless the mount it
operates on is owned by the userns itself, or is explicitly
allowelisted.

BPF LSM program with contributions from Alexei Starovoitov.
2024-04-06 16:08:24 +02:00
Mike Yuan dfad86b838
units: introduce systemd-hibernate-clear.service that clears
stale HibernateLocation EFI variable

Currently, if the HibernateLocation EFI variable exists,
but we failed to resume from it, the boot carries on
without clearing the stale variable. Therefore, the subsequent
boots would still be waiting for the device timeout,
unless the variable is purged manually.

There's no point to keep trying to resume after a successful
switch-root, because the hibernation image state
would have been invalidated by then. OTOH, we don't
want to clear the variable prematurely either,
i.e. in initrd, since if the resume device is the same
as root one, the boot won't succeed and the user might
be able to try resuming again. So, let's introduce a
unit that only runs after switch-root and clears the var.

Fixes #32021
2024-04-03 22:07:43 +08:00
Mike Yuan 4f156b1078
units: remove implicit RequiresMountsFor= 2024-04-01 19:44:51 +08:00
Yu Watanabe d30d0b04ae
Merge pull request #31951 from bluca/resolve_reload
resolved: support reloading configuration at runtime
2024-03-27 02:37:52 +09:00
Luca Boccassi 14a5217679 resolved: support reloading configuration at runtime
Drop connections and caches and reload config from files, to allow
for low-interruptions updates, and hook up to the usual SIGHUP and
ExecReload=. Mark servers and services configured directly via D-Bus
so that they can be kept around, and only the configuration file
settings are dropped and reloaded.

Fixes https://github.com/systemd/systemd/issues/17503
Fixes https://github.com/systemd/systemd/issues/20604
2024-03-26 13:36:42 +00:00
Mike Yuan 20ce9fecaa
units: sort lists in meson.build 2024-03-26 21:08:49 +08:00
Zbigniew Jędrzejewski-Szmek c38e4e2fda
Merge pull request #29721 from poettering/systemd-project
New capsule@.service feature
2024-03-26 13:19:33 +01:00
Zbigniew Jędrzejewski-Szmek d1f3cd7aaa units: add one more equivalency of '-' in '_' on kernel cmdline
c0aeff4b99 added this in one unit file, but the
same problem occurs here. (There are no other files where this would apply.)
I think we should solve this systematically somehow, but it's not clear how to
do that, so until we have that better solution, let's apply the manual solution
so that our units work as expected.
2024-03-19 13:06:44 +00:00
Yu Watanabe a9e7894d38 unit/network: use ProtectSystem=strict again
Now, networkd accesses the state directory through the file descriptor
passed from systemd-networkd-persistent-storage.service.
Hence, the networkd itself does not need to access the state directory
through its path, and we can use more stronger mode for ProtectSystem=.
2024-03-19 15:15:32 +09:00
Daan De Meyer 966e05af04 tpm2-setup: Add --graceful
Currently the associated units fail if full tpm support is not available
on the system. Similar to systemd-pcrextend, let's add a --graceful option
that exits gracefully if no full TPM support is detected and use it in both
units.
2024-03-17 13:34:51 +01:00