Otherwise,
1. X.path triggered X.service, and the service has waiting start job,
2. systemctl stop X.service
3. the waiting start job is cancelled to install new stop job,
4. path_trigger_notify() is called, and may reinstall new start job,
5. the stop job cannot be installed, and triggeres assertion.
So, instead, let's add a defer event source, then enqueue the new start
job after the stop (or any other type) job finished.
Fixes https://github.com/systemd/systemd/issues/24577#issuecomment-1522628906.
reload/reexec currently used a separate implementation of the /run/ disk
space check, different from the one used for switch-root, even though
the code is mostly the same. The one difference is that the former
checks are authoritative, the latter are just informational (that's
because refusing a reload/reexec is relatively benign, but refusing a
switch-root quite troublesome, since this code is entered when it's
already "too late" to turn turn back, i.e. when the preparatory
transaction to initiate the switch root are already fully executed.
Let's share some code, and unify codepaths.
(This is preparation for later addition of a "userspace reboot" concept)
No change in behaviour, just refactoring.
We hardcode the path the initrd uses to prepare the final mount point at
so many places, let's also imply it in "systemctl switch-root" if not
specified.
This adds the fallback both to systemctl and to PID 1 (this is because
both to — different – checks on the path).
Even with Storage=journal we would still attempt to open the final
dmesg.txt file which causes a lot of noise in the journal:
```
[ 5.764111] H testsuite-82.sh[658]: + systemctl start systemd-pstore
[ 5.806385] H systemd[1]: Starting modprobe@efi_pstore.service...
[ 5.808656] H systemd[1]: modprobe@efi_pstore.service: Deactivated successfully.
[ 5.808971] H systemd[1]: Finished modprobe@efi_pstore.service.
[ 5.818845] H kernel: audit: type=1130 audit(1682630623.637:114): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=modprobe@efi_pstore comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? termin>
[ 5.818865] H kernel: audit: type=1131 audit(1682630623.637:115): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=modprobe@efi_pstore comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? termin>
[ 5.816052] H systemd[1]: Starting systemd-pstore.service...
[ 5.840703] H systemd-pstore[806]: PStore dmesg-efi-168263062313014.
[ 5.841239] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.841428] H systemd-pstore[806]: PStore dmesg-efi-168263062312014.
[ 5.841575] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.841712] H systemd-pstore[806]: PStore dmesg-efi-168263062311014.
[ 5.841839] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.841989] H systemd-pstore[806]: PStore dmesg-efi-168263062310014.
[ 5.842141] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.842274] H systemd-pstore[806]: PStore dmesg-efi-168263062309014.
[ 5.842423] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.842589] H systemd-pstore[806]: PStore dmesg-efi-168263062308014.
[ 5.842722] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.842865] H systemd-pstore[806]: PStore dmesg-efi-168263062307014.
[ 5.843003] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.843153] H systemd-pstore[806]: PStore dmesg-efi-168263062306014.
[ 5.843280] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.843434] H systemd-pstore[806]: PStore dmesg-efi-168263062305014.
[ 5.843570] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.843702] H systemd-pstore[806]: PStore dmesg-efi-168263062304014.
[ 5.843831] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.843958] H systemd-pstore[806]: PStore dmesg-efi-168263062303014.
[ 5.844093] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.844250] H systemd-pstore[806]: PStore dmesg-efi-168263062302014.
[ 5.844412] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.844619] H systemd-pstore[806]: PStore dmesg-efi-168263062301014.
[ 5.844781] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.844956] H systemd-pstore[806]: PStore dmesg-efi-168263062300014.
[ 5.845168] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.851101] H systemd[1]: Finished systemd-pstore.service.
```
The commit b640e274a7 introduced reflink()
and reflink_full(). We usually name function xyz_full() for fully
parameterized version of xyz(), and xyz() is typically a inline alias of
xyz_full(). But in this case, reflink() and reflink_full() call
different ioctl().
Moreover, reflink_full() does partial reflink, while reflink() does full
file reflink. That's super confusing.
Let's rename reflink_full() to reflink_range(), the new name is
consistent with ioctl name, and should be fine.
Autostart files which contain the line gnome-autostart-phase are currently
completely skipped by systemd. This is because these are handled internally by
gnome startup through other means.
The problem is a number of desktop files that need to run on KDE too have this
flag set. Ideally they should just create systemd user units, but we're not at
this point universally yet.
This patch changes the logic so if the flag is set, we set NotShowIn-gnome,
which in turn would just not load decided at runtime.
As an optimisation if we would get conflicting OnlyShowIn lines we still
skip the file completely.
Example:
$ rg 'Exec|Autostart-Phase' /etc/xdg/autostart/gnome-keyring-pkcs11.desktop
Exec=/usr/bin/gnome-keyring-daemon --start --components=pkcs11
X-GNOME-Autostart-Phase=PreDisplayServer
$ cat '/tmp/xxx/app-gnome\x2dkeyring\x2dpkcs11@autostart.service'
# Automatically generated by systemd-xdg-autostart-generator
[Unit]
SourcePath=/etc/xdg/autostart/gnome-keyring-pkcs11.desktop
...
[Service]
...
ExecCondition=/usr/lib/systemd/systemd-xdg-autostart-condition "Unity:MATE" "GNOME"
Co-authored-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
The kernel has had filesystem independent reflink ioctls for a
while now, let's try to use them and fall back to the btrfs specific
ones if they're not supported.
This adds support for systematically destroying connections in
pam_sm_session_open() even on failure, so that under no circumstances
unserved dbus connection are around while the invoking process waits for
the session to end. Previously we'd only do this on success, now do it
in all cases.
This matters since so far we suggested people hook pam_systemd into
their pam stacks prefixed with "-", so that login proceeds even if
pam_systemd fails. This however means that in an error case our
cached connection doesn't get disconnected even if the session then is
invoked. This fixes that.
Let's systematically avoid sharing cached busses between processes (i.e.
from parent and child after fork()), by including the PID in the field
name.
With that we're never tempted to use a bus object the parent created in
the child.
(Note this is about *use*, not about *destruction*. Destruction needs to
be checked by other means.)
Let's make use of the new DelegateSubgroup= feature and delegate the
/supervisor/ subcgroup already to nspawn, so that moving the supervisor
process there is unnecessary.
If we create a subcroup (regardless if the '.control' subgroup we
always created or one configured via DelegateSubgroup=) it's inside of
the delegated territory of the cgroup tree, hence it should be owned
fully by the unit's users. Hence do so.
We don't need to apply the journal/oomd xattrs to the subcgroups we add,
since those daemons already look for the xattrs up the tree anyway.
Hence remove this.
This is in particular relevant as it means later changes to the xattr
don#t need to be replicated on the subcgroup either.
This implements a minimal subset of #24961, but in a lot more
restrictive way: we only allow one level of subcgroup (as that's enough
to address the no-processes in inner cgroups rule), and does not change
anything about threaded cgroup logic or similar, or make any of this new
behaviour mandatory.
All this does is this: all non-control processes we invoke for a unit
we'll invoke in a subgroup by the specified name.
We'll later port all our current services that use cgroup delegation
over to this, i.e. user@.service, systemd-nspawn@.service and
systemd-udevd.service.
Let's clean up validation/escaping of cgroup names. i.e. split out code
that tests if name needs escaping. Return proper error codes, and extend
test a bit.
According to setfacl(1), "the character X stands for
the execute permission if the file is a directory
or already has execute permission for some user."
After this commit, parse_acl() would return 3 acl
objects. The newly-added acl_exec object contains
entries that are subject to conditionalized execute
bit mangling. In tmpfiles, we would iterate the acl_exec
object, check the permission of the target files,
and remove the execute bit if necessary.
Here's an example entry:
A /tmp/test - - - - u:test:rwX
Closes#25114
When encoding partition policy flags we allow parts of the flags to be
"unspecified" (i.e. entirely zeros), which when actually checking the
policy we'll automatically consider equivalent to "any" (i.e. entirely
ones). This "extension" of the flags was so far done as part of
partition_policy_normalized_flags(). Let's split this logic out into a
new function partition_policy_flags_extend() that simply sets all bits
in a specific part of the flags field if they were entirely zeroes so
far.
When comparing policy objects for equivalence we so far used
partition_policy_normalized_flags() to compare the per-designator flags,
which thus meant that "underspecified" flags, and fully specified ones
that are set to "any" were considered equivalent. Which is great.
However, we forgot to do that for the fallback policy flags, the flags
that apply to all partitions for which no explicit policy flags are
specified.
Let's use the new partition_policy_flags_extend() call to compare them
in extended form, so that there two we can hide the difference between
"underspecified" and "any" flags.
This is for the case when we fail to deserialize job ID.
In job_install_deserialized(), we also check the job type, and that is
for the case when we failed to deserialize the job.
Let's gracefully handle the failure in deserializing the job ID.
This is paranoia, and just for safety. Should not change any behavior.
When we fail to deserialize job ID, or the current_job_id is overflowed,
we may have jobs with the same ID.
This is paranoia, and just for safety.
Note, we already use hashmap_remove_value() in job_uninstall().
We translate 'all' to UNIT64_MAX, which has a lot more 'f's. Use the
helper macro, since a decimal uint64_t will always be >> than a hex
representation.
root@image:~# systemd-run -t --property CoredumpFilter=all ls /tmp
Running as unit: run-u13.service
Press ^] three times within 1s to disconnect TTY.
*** stack smashing detected ***: terminated
[137256.320511] systemd[1]: run-u13.service: Main process exited, code=dumped, status=6/ABRT
[137256.320850] systemd[1]: run-u13.service: Failed with result 'core-dump'.
If journal_file_data_payload() returns -EBADMSG or -EADDRNOTAVAIL,
we skip the entry and go to the next entry, but we never modify
the number of items that we pass to journal_file_append_entry_internal()
if that happens, which means we could try to append garbage to the
journal file.
Let's keep track of the number of fields we've appended to avoid this
problem.
When doing a conversion and the specified 'xc->xvariant' has no match, select
the x11 layout entry with a matching layout and an empty xvariant if such entry
exists. It's still better than no conversion at all.
"credential provided widget" would be better spelled as "credential-provided widget".
But let's adjust the message to name the bad credential explicitly: this
makes it easier to fix for the user.
Realistically, the only thing that the caller can do is ignore failures related
to missing credentials. If the caller requires some credentials to be present,
they should just check which output variables are not NULL. One of the callers
was already doing that, and the other wanted to, but missed -ENOENT. By
suppressing -ENOENT and -ENXIO, both callers are simplified.
Fixes a warning at boot:
systemd-vconsole-setup[221]: Failed to import credentials, ignoring: No such file or directory
This will make things a bit longer for now, but more powerful as we can
reuse the userns fd between calls to remount_idmap() if we need to
adjust multiple mounts.
No change in behaviour, just some minor refactoring.
I guess it was only a question of time until we need to add the final
frontier of notification functions: one that combines the features of
all the others:
1. specifiying a source PID
2. taking a list of fds to send along
3. accepting a format string for the status string
Hence, let's add it.
When pam_end() is called after a fork, and it cleans up caches, it sets
PAM_DATA_SILENT in error_status. FDs will be shared with the parent, so
we do not want to attempt to close them from a child process, or we'll
hit assertions. Complain loudly and skip.
it's a bit confusing that on 32bit systems we'd risk session IDs
overruns like this. Let's expose the same behaviour everywhere and stick
to 64bit ids.
Since we format the ids as strings anyway this doesn't really change
anything performance-wise, it just pushes out collisions by overrun to
basically never happen.
sd-event objects use hashmaps, which use module-global state, so it is not safe
to pass a sd-event object created by a module instance to another module instance
(e.g.: when two libraries static linking sd-event are pulled in a single process).
Initialize a random per-module origin id and store it in the object, and compare
it when entering a public API, and error out if they don't match, together with
the PID.
sd-journal objects use hashmaps, which use module-global state, so it is not safe
to pass a sd-journal object created by a module instance to another module instance
(e.g.: when two libraries static linking sd-journal are pulled in a single process).
Initialize a random per-module origin id and store it in the object, and compare
it when entering a public API, and error out if they don't match, together with
the PID.
sd-bus objects use hashmaps, which use module-global state, so it is not safe
to pass a sd-bus object created by a module instance to another module instance
(e.g.: when two libraries static linking sd-bus are pulled in a single process).
Initialize a random per-module origin id and store it in the object, and compare
it when entering a public API, and error out if they don't match, together with
the PID.
crypsetup-fido2 always depended on both libfido2 and libcryptsetup, but
0a8e026e82 forgot to make the then
implicit dependency on libcryptsetup explicit when moving it from
cryptsetup/ to shared/. This breaks builds when libfido2 is autodetected
but the system is missing libcryptsetup.
Introduce an explicit check for HAVE_LIBCRYPTSETUP such that
cryptsetup-fido2 is only built when both libraries are available.
Fixes#27374.
If we try to close after a fork, the FDs will have been cloned
too and we'll assert. This can happen for example in PAM modules.
Avoid the macro and define ref/unref by hand to do the same check.
This doesn't really matter too much as both are static functions. But
it's confusing as hell both when debugging and reading code, given that
homed actually uses mount-util.c
Hence, let's just rename one of the two, to minimize confusion.
No actual change in behaviour.
(and sooner or later we might want to export mount-util.c's version of
the function, since it's generically useful)
When doing x11->console conversions, find_converted_keymap() searches
automatically for a candidate in the converted keymap directory for a given x11
layout.
However doing console->x11 conversions, this automatic search is not done hence
simple conversion in this direction can't be achieved without populating
kbd-model-map with entries for converted keymaps.
For example, let's consider "at" layout which is not part of kbd-model-map. The
"at" x11 layout has a generated keymap
"/usr/share/kbd/keymaps/xkb/at.map.gz". If we configure "at" for the x11
layout, localed is able to automatically find the "at" converted vc layout and
the conversion just works :
$ localectl set-x11-keymap at
$ localectl
System Locale: LANG=en_US.UTF-8
VC Keymap: at
X11 Layout: at
However in the opposite direction, ie when setting the vc keymap to "at", no
conversion is done and the x11 layout is not defined:
$ localectl set-keymap at
$ localectl
System Locale: LANG=en_US.UTF-8
VC Keymap: at
X11 Layout: (unset)
This patch fixes this limitation as the implemenation is relatively simple and
it removes the need to populate kbd-model-map with (many) entries for converted
keymaps. However the patch doesn't remove the existing entries in kbd-model-map
which became unneeded after this change to be on the safe side.
Note: by default the automatically generated x11 keyboard configs use keyboard
model "microsoftpro" which should be equivalent to "pc105" model but with the
internet/media key mapping added.
When we're checking if /etc/resolv.conf exists so we can bind mount
on top of it, we care about whether the symlink itself exists if
/etc/resolv.conf exists and not the file it points to, so add
CHASE_NOFOLLOW to make sure we check existence of the symlink and
not the file it points to.
sd-bus connection is cached by the two pam modules globally, but this
can lead to issues due to hashmaps (used by sd-bus) using a global
static variable for the shared hash key, which is different per module
as both modules are loaded in the same process.
This happens because the sd-bus object is create in one module, but
used in the other, so global state does not match.
Use a different pam cache identifier for the sd-bus pointer, so that
each module uses a different sd-bus connection as a workaround.
Fixes https://github.com/systemd/systemd/issues/27216
Fixes https://github.com/systemd/systemd/issues/17266
acquire_home() takes a reference to a sd-bus object, which the open_session
hook cleans on success. But only when handling a user actually owned by homed,
it did not clean it up when skipping because it is being invoked on a system
user.
We need to be careful with sd-bus here as pam_sm_open_session is the last
hook before forking, and we want to clean up sd-bus before that happens, or
we'll have a broken reference (FDs are cloexec) in the child process, which
will then assert when attempting to close them, or leak the bus connection
which causes dbus to complain loudly:
dbus-daemon[62]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 30020ms)
This makes syntax be the same for commands which are started by the manager and
those which are spawned directly (when --scope is used).
Before:
$ systemd-run -q -t echo '$TERM'
xterm-256color
$ systemd-run -q --scope echo '$TERM'
$TERM
Now:
$ systemd-run -q --scope echo '$TERM'
xterm-256color
Previous behaviour can be restored via --expand-environment=no:
$ systemd-run -q --scope --expand-environment=no echo '$TERM'
$TERM
Fixes#22948.
At some level, this is a compat break. Fortunately --scope is not very widely
used, so I think we can get away with this. Having different syntax depending
on whether --scope was used or not was bad UX.
A NEWS entry will be required.
This uses StartExecEx to get the equivalent of ExecStart=:. StartExecEx was
added in b3d593673c, so this will not work with
older systemds.
A hint is emitted if we get an error indicating lack of support. PID1 returns
SD_BUS_ERROR_PROPERTY_READ_ONLY, but I'm checking for
SD_BUS_ERROR_UNKNOWN_PROPERTY too for safety.
Just refactoring, in preparation for future changes.
(Though I think it'd be reasonable to do anyway, those functions were
awfully long.)
'git diff' displays this badly. The middle part of start_transient_service()
is moved to make_transient_service_unit(), and the middle part of
start_transient_trigger() is moved to make_transient_trigger_unit().
start_transient_service() would return two ints: one normally and one via
*retval. We can just return one int and propagate it directly, because we
use DEFINE_MAIN_FUNCTION_WITH_POSITIVE_FAILURE().
The property name is called ExecStartEx, but we have to write it as ExecStart=
in the unit file. :(
Bug introduced in b3d593673c when ex-properties
were initially added.
In addition, we cannot escape $ as $$, because when ":" is used, we wouldn't
unescape $$ back to $.
Unfortunately we can't escape $ when ':' is used to prohibit variable expansion:
ExecStart=:echo $$
is not the same as
ExecStart=:echo $
This just adds the functionality and the unittests, without using it anywhere
for real yet.
Our escaping of '$' is '$$', not '\$'. We would write unit files that
were not valid:
$ systemd-run --user bash -c 'echo $$; sleep 1000'
Running as unit: run-r1c7c45b5b69f487c86ae205e12100808.service
$ systemctl cat --user run-r1c7c45b5b69f487c86ae205e12100808
# /run/user/1000/systemd/transient/run-r1c7c45b5b69f487c86ae205e12100808.service
...
ExecStart="/usr/bin/bash" "-c" "echo \$\$\; sleep 1000"
$ systemd-analyze verify /run/user/1000/systemd/transient/run-r1c7c45b5b69f487c86ae205e12100808.service
/run/user/1000/systemd/transient/run-r1c7c45b5b69f487c86ae205e12100808.service:7:
Ignoring unknown escape sequences: "echo \$\$\; sleep 1000"
Similarly, ';' cannot be escaped as '\;'. Only a handful of characters
listed in "Supported escapes" is allowed.
Escaping of "'" can be done, but it's not useful because we use double quotes
around the string anyway whenever we do escaping.
unit_write_setting() is called all over the place. In a great majority of
places we write either fixed strings or something that we generate ourselves,
so no escaping or quoting is needed. (And it's not allowed, e.g.
'Type="oneshot"' would not work.) But if we forgot to add escaping or quoting
for a free-style string, it would probably allow writing a unit file that would
be read completely wrong. I looked over various places where
unit_write_setting() is called, and I couldn't find any place where
quoting/escaping was forgotten. But trying to figure out the full
ramifications of this change is not easy.
__builtin_popcount() is a bit of a mouthful, so let's provide a helper.
Using _Generic has the advantage that if a type other then the ones on
the list is given, compilation will fail. This is nice, because if by any
change we pass a wider type, it is rejected immediately instead of being
truncated.
log.h is also needed. It is included transitively, but let's include it
directly.
macro.h is *not* needed.