This is closely related to the previous commit: if the credentials dir
is empty and nothing is mounted on it, let's remove it again.
This will in particular happen if we decided to not actually install the
mount we prepared for the credentials because it is empty. In that case
the mount point inode is already there, and with this we'll remove it.
Primary effect: users will see ENOENT rather than EACCES when trying to
access it, which should be preferable, given we already handle that
nicely in our credential consumption code.
This should also be useful on systems where we lack any privs to create
mounts, and thus operate on a regular dir anyway.
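A minimal sketch of the idea, in Python for brevity (the actual code is C).
The helper names remove_if_empty() and read_credential() are hypothetical
and only illustrate "drop the empty dir, let consumers treat ENOENT as
'no credential'":

```python
import errno
import os


def remove_if_empty(path):
    """Try to remove path; return True if it is gone afterwards."""
    try:
        os.rmdir(path)
    except FileNotFoundError:
        return True  # already absent
    except OSError as e:
        # ENOTEMPTY/EEXIST: it still holds entries; EBUSY: something is mounted on it
        if e.errno in (errno.ENOTEMPTY, errno.EEXIST, errno.EBUSY):
            return False
        raise
    return True


def read_credential(directory, name):
    """Consumer side: a missing dir or file simply means 'no credential'."""
    try:
        with open(os.path.join(directory, name), "rb") as f:
            return f.read()
    except FileNotFoundError:
        return None
```

With the directory removed, read_credential() returns None instead of
failing with a permission error.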
Let's avoid creating another mount in the system if it's empty anyway.
This is mostly a cosmetic thing in one (pretty common) special case: if
creds settings are used in a unit but no creds are actually available to
be passed.
(While we are at it this also does one more minor optimization: it
adjusts the MS_RDONLY/MS_NOSUID/… flags of the source mount we are about
to MS_MOVE into the right place only if we actually really move it, and
if we instead unmount it again we won't bother with the flags either)
There's no need to fchdir() out of the rootfs and back into it around
the umount2(), hence don't.
This brings the logic closer to what the pivot_root() man page suggests.
While we are at it, always operate based on fds, once we opened the
original dir, and pass the path string along only for generating
messages (i.e. as "decoration").
Add tests for both code paths: the pivot_root() one and the MS_MOVE one.
In 623a00020f code was added so that our various programs send a
notification message with their exit status on exit. This is great, but
it becomes utterly confusing in systemd-notify, whose primary purpose is
to send such messages after all: sending an implicit one in addition to
the primary one is particularly confusing when debugging things.
Let's hence just drop the implicit message. systemd-notify's exit status
is after all indicative primarily because sd_notify() failed, and hence
it's pretty pointless to then send that fact as another sd_notify()
message.
(Primary reason for this patch is simply that it confused the hell out
of me, when debugging sd_notify() issues)
Follow-up for: 623a00020f
The error handling and fchmodat() invocation are pretty much the same in
the directory and symlink branches, hence make them the same.
No real change in behaviour. Just refactoring.
This also changes the open flags from
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC|O_NOFOLLOW to
O_DIRECTORY|O_CLOEXEC. O_RDONLY is redundant, since O_RDONLY is zero
anyway, and O_DIRECTORY pins the access mode enough: it doesn't allow
read()/write() anyway when specified. O_NONBLOCK is also pointless given
that O_DIRECTORY is specified, it has no meaning on directories. (It is
useful if we don't know much about the inode we are opening, and could
be a device node or fifo, but the O_DIRECTORY excludes that case.)
O_NOFOLLOW is dropped since there's really no point in blocking out the
initial entrypoint being a symlink. Once we pinned the root of the
tree it might make sense to restrict symlink use below it, but for the
entrypoint itself it doesn't matter.
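The flag reasoning can be checked from userspace. A small sketch in Python
(the real code is C; open_dir() is a hypothetical helper name used only
here):

```python
import os


def open_dir(path):
    # O_RDONLY is 0 on Linux, so it is implied. O_DIRECTORY makes open()
    # fail with ENOTDIR for anything that is not a directory (FIFOs, device
    # nodes, regular files), which is why O_NONBLOCK adds nothing. Without
    # O_NOFOLLOW a symlink pointing to a directory is accepted as entrypoint.
    return os.open(path, os.O_DIRECTORY | os.O_CLOEXEC)
```

Calling open_dir() on a regular file raises NotADirectoryError, and calling
it on a symlink to a directory returns an fd for the target directory.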
So far, we invoked pivot_root() specifying /mnt/ as second argument,
which we then unmounted right after. We'd create /mnt/ if needed. This
sucks, because it means /mnt/ must strictly be pre-created on immutable
images.
Remove this limitation, by using pivot_root() with "." as source and
target, which will result in two stacked mounts afterwards: the new one
underneath, the old one on top. We can then simply unmount the top one,
and have what we want without needing any extra /mnt/ dir.
Since we don't need /mnt/ anymore we can get rid of the extra
unmount_old_root parameter and simply specify it as NULL if we don't
want the old mount to stick around.
Fixup for 22ad038ac6 and
3fc5eed470. It seems that the tests are
not executed properly in CI. Nevertheless, test-ukify appears in logs:
rpm-build:fedora-rawhide-x86_64:
409/1191 systemd / test-ukify OK 0.16s
This is strange.
We generally nowadays use UPPERCASE for parameters in various help texts.
Let's be consistent here too, and also drop duplicated 'usage:':
$ ukify -h
usage: ukify [options…] LINUX INITRD…
ukify -h | --help
Build and sign Unified Kernel Images
positional arguments:
LINUX vmlinuz file [.linux section]
INITRD… initrd files [.initrd section]
...
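A rough sketch of how such help output can be produced with argparse.
This assumes an argparse-based parser; create_parser() and the exact
argument definitions are illustrative, not necessarily ukify's actual
code:

```python
import argparse


def create_parser():
    p = argparse.ArgumentParser(
        prog="ukify",
        description="Build and sign Unified Kernel Images",
        # argparse prepends "usage: " itself, so it must not be repeated here
        usage="%(prog)s [options…] LINUX INITRD…\n       %(prog)s -h | --help",
    )
    # metavar= controls the (UPPERCASE) name shown in the help text
    p.add_argument("linux", metavar="LINUX",
                   help="vmlinuz file [.linux section]")
    p.add_argument("initrd", metavar="INITRD…", nargs="*",
                   help="initrd files [.initrd section]")
    return p
```

create_parser().format_help() then shows a single "usage:" prefix and the
uppercase LINUX and INITRD… names.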
We were using the wrong memory type when allocating pool memory. This
does not seem to cause a problem on x86, but the kernel will fail to
boot at least on ARM in QEMU.
This is caused by mixing different allocation types which ended up
breaking the kernel or EDK2 during boot services exit. Commit
2f3c3b0bee appears to fix this boot
failure because it was replacing the gnu-efi xpool_print with xasprintf
thereby unifying the allocation type.
But this same issue can also happen without this fix somehow when the
random-seed logic is in use.
Fixes: #27371
This ensures that systemd won't erroneously disconnect from the system
bus in case a bus recheck is triggered immediately after the bus service
emits `RELOADING=1`.
This fixes an issue where systemd-logind sometimes randomly stops
receiving `UnitRemoved` after a system update.
This also handles SERVICE_RELOAD_SIGNAL just in case somebody ever
creates a D-Bus broker implementation that uses `Type=notify-reload`.
When spawning generators within a sandbox we want a private /tmp, but it
might not exist, and on some systems we might be unable to create it
because users want a BTRFS subvolume instead.
Fixes https://github.com/systemd/systemd/issues/27436
These inodes are going to be overmounted anyway, hence let's create them
with access mode 555, so that they are as close to being immutable as
regular UNIX access modes allow them to be. In other words: this takes
the "w" mode away for root. This of course usually has little effect --
unless CAP_DAC_OVERRIDE is dropped. But at the very least it makes the
point clear that inodes should be considered immutable.
(I intended to make this 0000 originally, but that doesn't work: many
tools, including our own, have fallback paths that gracefully handle
ENOENT in /proc/. Changing the mode to 0000 would turn this into EACCES,
something they usually have no fallback path for.)
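For illustration, a tiny Python sketch of creating such a placeholder
inode (the real code is C; make_placeholder() is a hypothetical name):

```python
import os
import stat


def make_placeholder(path):
    """Create a directory that is only meant to be mounted over."""
    os.mkdir(path)
    # chmod explicitly, so that the result does not depend on the umask
    os.chmod(path, 0o555)
    return stat.S_IMODE(os.stat(path).st_mode)
```

The returned mode is 0o555: readable and searchable, but without the "w"
bit for anyone, including the owner.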
This slightly extends the symbol file test and checks which symbols are
listed in one list but missing in the other. This is tremendously useful
to quickly determine which symbols weren't exposed properly but should
have been.
(This is implemented in pure C, no systemd helpers, to ensure we see
libsystemd.so API as any other tool would.)
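The comparison itself boils down to two set differences. A sketch in
Python (the test itself is written in C; compare_symbols() and the input
lists are made up for illustration):

```python
def compare_symbols(list_a, list_b):
    """Return (only in list_a, only in list_b), each sorted."""
    a, b = set(list_a), set(list_b)
    return sorted(a - b), sorted(b - a)


# Example input: two symbol lists that should be identical but are not
missing_in_b, missing_in_a = compare_symbols(
    ["sd_booted", "sd_listen_fds", "sd_notify"],
    ["sd_booted", "sd_notify", "sd_pid_notify"],
)
```

Here missing_in_b is ["sd_listen_fds"] and missing_in_a is
["sd_pid_notify"].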
If .next_entry_array_offset points to one of the previous entries or to
the entry itself, then the loop over entry array objects may run
infinitely. Let's assume that the offsets of the entry array objects are
in increasing order, and check that in the loop.
Fixes #27470.
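A simplified model of the check, in Python (the real code is C and works
on journal file objects; here the chain is just a dict mapping an array's
offset to its next offset, with 0 terminating the chain):

```python
def walk_entry_arrays(next_offset, first):
    """Follow the chain, refusing offsets that do not strictly increase."""
    visited = []
    offset = first
    while offset != 0:
        visited.append(offset)
        nxt = next_offset[offset]
        if nxt != 0 and nxt <= offset:
            # Points at itself or backwards: would loop forever otherwise
            raise ValueError(f"entry array at {offset} points back to {nxt}")
        offset = nxt
    return visited
```

A well-formed chain is walked to its end, while a chain that points at
itself or at an earlier array is rejected instead of looping.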
Otherwise,
1. X.path triggered X.service, and the service has a waiting start job,
2. systemctl stop X.service is called,
3. the waiting start job is cancelled to install the new stop job,
4. path_trigger_notify() is called, and may reinstall a new start job,
5. the stop job cannot be installed, and triggers an assertion.
So, instead, let's add a defer event source, then enqueue the new start
job after the stop (or any other type) job has finished.
Fixes https://github.com/systemd/systemd/issues/24577#issuecomment-1522628906.
reload/reexec currently uses a separate implementation of the /run/ disk
space check, different from the one used for switch-root, even though
the code is mostly the same. The one difference is that the former
checks are authoritative, the latter are just informational (that's
because refusing a reload/reexec is relatively benign, but refusing a
switch-root is quite troublesome, since this code is entered when it's
already "too late" to turn back, i.e. when the preparatory transaction
to initiate the switch root is already fully executed).
Let's share some code, and unify codepaths.
(This is preparation for later addition of a "userspace reboot" concept)
No change in behaviour, just refactoring.
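One way to picture the shared helper is a single check with a flag that
selects between the two behaviours. A Python sketch (the real code is C;
check_free_space(), its parameters and the threshold are assumptions made
for illustration):

```python
import errno
import logging
import os


def check_free_space(path, min_free_bytes, authoritative):
    """Return True if enough space is free; otherwise fail or just warn."""
    st = os.statvfs(path)
    free = st.f_bavail * st.f_frsize
    if free >= min_free_bytes:
        return True
    msg = f"{path}: only {free} bytes free, {min_free_bytes} wanted"
    if authoritative:
        # reload/reexec case: refusing is relatively benign
        raise OSError(errno.ENOSPC, msg)
    # switch-root case: too late to turn back, so only log
    logging.warning(msg)
    return False
```

Both callers run the same measurement and differ only in whether a
shortfall is an error or a warning.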
We hardcode the path the initrd uses to prepare the final mount point in
so many places, let's also imply it in "systemctl switch-root" if not
specified.
This adds the fallback both to systemctl and to PID 1 (this is because
both do different checks on the path).
Even with Storage=journal we would still attempt to open the final
dmesg.txt file which causes a lot of noise in the journal:
```
[ 5.764111] H testsuite-82.sh[658]: + systemctl start systemd-pstore
[ 5.806385] H systemd[1]: Starting modprobe@efi_pstore.service...
[ 5.808656] H systemd[1]: modprobe@efi_pstore.service: Deactivated successfully.
[ 5.808971] H systemd[1]: Finished modprobe@efi_pstore.service.
[ 5.818845] H kernel: audit: type=1130 audit(1682630623.637:114): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=modprobe@efi_pstore comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? termin>
[ 5.818865] H kernel: audit: type=1131 audit(1682630623.637:115): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=modprobe@efi_pstore comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? termin>
[ 5.816052] H systemd[1]: Starting systemd-pstore.service...
[ 5.840703] H systemd-pstore[806]: PStore dmesg-efi-168263062313014.
[ 5.841239] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.841428] H systemd-pstore[806]: PStore dmesg-efi-168263062312014.
[ 5.841575] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.841712] H systemd-pstore[806]: PStore dmesg-efi-168263062311014.
[ 5.841839] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.841989] H systemd-pstore[806]: PStore dmesg-efi-168263062310014.
[ 5.842141] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.842274] H systemd-pstore[806]: PStore dmesg-efi-168263062309014.
[ 5.842423] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.842589] H systemd-pstore[806]: PStore dmesg-efi-168263062308014.
[ 5.842722] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.842865] H systemd-pstore[806]: PStore dmesg-efi-168263062307014.
[ 5.843003] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.843153] H systemd-pstore[806]: PStore dmesg-efi-168263062306014.
[ 5.843280] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.843434] H systemd-pstore[806]: PStore dmesg-efi-168263062305014.
[ 5.843570] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.843702] H systemd-pstore[806]: PStore dmesg-efi-168263062304014.
[ 5.843831] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.843958] H systemd-pstore[806]: PStore dmesg-efi-168263062303014.
[ 5.844093] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.844250] H systemd-pstore[806]: PStore dmesg-efi-168263062302014.
[ 5.844412] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.844619] H systemd-pstore[806]: PStore dmesg-efi-168263062301014.
[ 5.844781] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.844956] H systemd-pstore[806]: PStore dmesg-efi-168263062300014.
[ 5.845168] H systemd-pstore[806]: Failed to open file /var/lib/systemd/pstore/1682630623/014/dmesg.txt: Operation not permitted
[ 5.851101] H systemd[1]: Finished systemd-pstore.service.
```
The commit b640e274a7 introduced reflink() and reflink_full(). We
usually name a function xyz_full() if it is the fully parameterized
version of xyz(), and xyz() is typically an inline alias of xyz_full().
But in this case, reflink() and reflink_full() call different ioctl()s.
Moreover, reflink_full() does a partial reflink, while reflink() does a
full file reflink. That's super confusing.
Let's rename reflink_full() to reflink_range(). The new name is
consistent with the ioctl name, and should be fine.
Autostart files which contain the line gnome-autostart-phase are currently
completely skipped by systemd. This is because these are handled internally by
gnome startup through other means.
The problem is that a number of desktop files that need to run on KDE too have
this flag set. Ideally they should just create systemd user units, but we're not
at this point universally yet.
This patch changes the logic so that if the flag is set, we set NotShowIn=GNOME,
so that whether the unit is loaded is decided at runtime.
As an optimisation, if we would get conflicting OnlyShowIn lines, we still
skip the file completely.
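A sketch of that decision in Python (the generator is written in C). The
function name is hypothetical, and the interpretation of "conflicting"
used here, namely OnlyShowIn naming nothing but GNOME, is an assumption:

```python
def apply_gnome_autostart_phase(only_show_in, not_show_in, has_phase):
    """Return (only_show_in, not_show_in), or None to skip the file."""
    only, hidden = list(only_show_in), list(not_show_in)
    if not has_phase:
        return only, hidden
    if only and all(d == "GNOME" for d in only):
        # Assumed conflict: shown only in GNOME, but GNOME handles it itself
        return None
    if "GNOME" not in hidden:
        hidden.append("GNOME")
    return only, hidden
```

For a file with OnlyShowIn=Unity;MATE and the phase flag set, this yields
(["Unity", "MATE"], ["GNOME"]).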
Example:
$ rg 'Exec|Autostart-Phase' /etc/xdg/autostart/gnome-keyring-pkcs11.desktop
Exec=/usr/bin/gnome-keyring-daemon --start --components=pkcs11
X-GNOME-Autostart-Phase=PreDisplayServer
$ cat '/tmp/xxx/app-gnome\x2dkeyring\x2dpkcs11@autostart.service'
# Automatically generated by systemd-xdg-autostart-generator
[Unit]
SourcePath=/etc/xdg/autostart/gnome-keyring-pkcs11.desktop
...
[Service]
...
ExecCondition=/usr/lib/systemd/systemd-xdg-autostart-condition "Unity:MATE" "GNOME"
Co-authored-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
The kernel has had filesystem independent reflink ioctls for a
while now, let's try to use them and fall back to the btrfs specific
ones if they're not supported.
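A minimal sketch of the "try the generic ioctl, fall back if unsupported"
pattern in Python. Note the deviations: the real code is C, and the
fallback here is a plain copy rather than the btrfs specific ioctl. The
FICLONE value is hardcoded as it is on x86 and ARM, which is an assumption
that does not hold on every architecture; clone_or_copy() is a
hypothetical name:

```python
import fcntl
import shutil

# _IOW(0x94, 9, int) as encoded on x86/ARM (assumption, see above)
FICLONE = 0x40049409


def clone_or_copy(src_path, dst_path):
    """Return "reflink" if FICLONE worked, "copy" if we had to fall back."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            # Whole-file reflink: the source fd is passed as the argument
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return "reflink"
        except OSError:
            # e.g. EOPNOTSUPP, EXDEV, EINVAL or ENOTTY: not supported here
            shutil.copyfileobj(src, dst)
            return "copy"
```

Either way the destination ends up with the same contents as the source;
only the mechanism differs.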