In file included from ../src/basic/siphash24.h:11,
from ../src/basic/hash-funcs.h:6,
from ../src/basic/hashmap.h:8,
from ../src/shared/fdset.h:6,
from ../src/shared/bpf-program.h:9,
from ../src/core/unit.h:11,
from ../src/core/all-units.h:4,
from ../src/core/manager.c:23:
../src/basic/time-util.h: In function 'manager_dispatch_jobs_in_progress':
../src/basic/time-util.h:140:38: error: 'x' may be used uninitialized [-Werror=maybe-uninitialized]
140 | #define FORMAT_TIMESPAN(t, accuracy) format_timespan((char[FORMAT_TIMESPAN_MAX]){}, FORMAT_TIMESPAN_MAX, t, accuracy)
| ^~~~~~~~~~~~~~~
In function 'manager_print_jobs_in_progress',
inlined from 'manager_dispatch_jobs_in_progress' at ../src/core/manager.c:3007:9:
../src/core/manager.c:219:18: note: 'x' was declared here
219 | uint64_t x;
| ^
cc1: all warnings being treated as errors
For some reason this (false positive) warning starts appearing after
-ftrivial-auto-var-init is used.
When something goes awry, we would get identical log messages from all the
bpf subsystems. E.g. "Failed to load BPF object: %m" appeared 5 times in the
sources. But it is very important to know *which* object we failed to load.
This could be guessed, e.g. from surroudning messages or from filename/line
metadata, but when we get log messages in bug reports, this might not be
available. Let's make the messages distinguishable.
While at it, some messages were adjusted a bit. In particular, we shouldn't use
internal names like BPFProgram which have no meaning outside of the codebase.
Testing the error paths is very important. If we are not root, we should
try and get a failure, which we should report nicely and mark the test
as skipped. After those checks are removed, this is what seems to happen.
This way we can see what will happen e.g. in the user manager when we try
to perform some bpf ops.
$ build/test-socket-bind
...
libbpf: load bpf program failed: Operation not permitted
libbpf: failed to load program 'sd_bind4'
libbpf: failed to load object 'socket_bind_bpf'
libbpf: failed to load BPF skeleton 'socket_bind_bpf': -1
Failed to load BPF object: Operation not permitted
Now all lines with "libbpf:" are at debug level and will be hidden by
default.
Partially fixes https://bugzilla.redhat.com/show_bug.cgi?id=2084955#c14
(i.e. the error that was exposed when the initial error was fixed.)
find_socket_fd() does not expect the sender address, but the
listen-address. This is in fact the destination of the DNS packet.
Matching via sender address caused a fallback to the default stub
listener in manager_dns_stub_fd() as the sender address can never
match the proxy stub listen address.
Note that manager_dns_stub_fd() is only used for the default
listener stub and the proxy stub, that means *extra* listeners
stubs (DNSStubListenerExtra=…) have not been affected as
`struct DnsStubListenerExtra` provides a direct link to the event
source.
By using the correct fd we ensure the correct socket options
(like TTL) are used and prevent issues like #23495 in case ifindex
could not be determined.
Fixes: #23520
[zjs: I added the comment and tweaked the patch a bit.
The call to reset_scheduled_shutdown() is moved down a bit to allow the
callback to have access to information about the operation being cancelled.
This all happens within the same function, so there should be no observable
change in behaviour.]
I had a strange failure where the pager was hanging on invocation (gdm crashed
and the kernel got into a strange state where it was hanging on some tasks).
Based on the logs from 'SYSTEMCTL_LOG_LEVEL=debug journalctl', I couldn't even
tell which pager binary we're executing. So let's shorten the function a bit and
provide a bit more detail.
We had yet-another table of descriptive strings to use in error messages.
I started thinking how to synchronize them with the strings in logind, but
ultimately I think it's better to remove those altogether. Those strings
should almost never be used: normally if the call fails, logind will provide
an error message itself, which is probably more detailed than what we can
figure out on the client side. And the most important part that we want to
show here is what exactly we called, in particular RebootWithFlags vs. Reboot,
etc. By using the "descriptive strings" we were obfuscating this. So let's just
simplify our code and print the actual method name, since this is more useful
as an error statement that is googlable and unique.
While at it, let's print the correct method name ;)
Those messages simply *feel* dated: "The system is going for suspend NOW!".
Let's say "The system will suspend|power off|hibernate|… now!" instead.
The exclamation mark is enough to show the urgency.
Also, the "the" seemed out of place. We're not talking about a specific reboot.
DnsPacket.ifindex=1 (loopback) is normalized to 0 whenever a message is
received on the loopback iface, so for both listeners, 127.0.0.53 and
127.0.0.54, the ifindex will be set to 0 by manager_recv() for queries
that have a local origin.
Replies to such local messages need to set a proper ifindex in any
case, as the supplied source-address would otherwise be ignored in
manager_ipv4_send() (CMSG generation is skipped due to ifindex > 0 check).
Note that this change only forces `ifindex` to loopback if it was actually
normalized to `0` before (due to a loopback detection) in order to keep the
nat-to-127.0.0.54-from-another-interface usecase that was described in
a8d0906344 intact.
Also note that nat is not supported for the main stub 127.0.0.53 which is
why forcing LOOPBACK_IFINDEX was/is fine for that case.
Fixes#23495
Fixes#23520. Replaces #23555.
The problem started with cdf370626f and
90b1ec03b2 which together started printing the
wall message in more cases. The motivation for those change was reasonable, but
this clearly causes problems described in #23520: users are getting unexpected
wall messages. Xterm, urxvt, (anything using libutempter?), and tmux (in some
configurations), register local pty sessions in utmp.
So let's try to suppress the message for local pseudo-terminal logins. This
patch based on #23538, but instead of filtering just on /dev/pts, it uses the
.ut_addr_v6 to only filter out local entries.
utmpdump doesn't print all the details. Looking at the list if useful
when trying to tweak the wall filtering logic.
This doesn't do much, but at least it serves as a smoke test for the cleanup
functions.
The casts in this and the next few commits are curently necessary
because CHAR8 is defined as uint8_t in gnu-efi, while char is signed.
Once we switch from gnu-efi typedefs to stdint types, the casts
will be dropped.
We currently have a convoluted and complex selection of which random
numbers to use. We can simplify this down to two functions that cover
all of our use cases:
1) Randomness for crypto: this one needs to wait until the RNG is
initialized. So it uses getrandom(0). If that's not available, it
polls on /dev/random, and then reads from /dev/urandom. This function
returns whether or not it was successful, as before.
2) Randomness for other things: this one uses getrandom(GRND_INSECURE).
If it's not available it uses getrandom(GRND_NONBLOCK). And if that
would block, then it falls back to /dev/urandom. And if /dev/urandom
isn't available, it uses the fallback code. It never fails and
doesn't return a value.
These two cases match all the uses of randomness inside of systemd.
I would prefer to make both of these return void, and get rid of the
fallback code, and simply assert in the incredibly unlikely case that
/dev/urandom doesn't exist. But Luca disagrees, so this commit attempts
to instead keep case (1) returning a return value, which all the callers
already check, and fix the fallback code in (2) to be less bad than
before.
For the less bad fallback code for (2), we now use auxval and some
timestamps, together with various counters representing the invocation,
hash it all together and provide the output. Provided that AT_RANDOM is
secure, this construction is probably okay too, though notably it
doesn't have any forward secrecy. Fortunately, it's only used by
random_bytes() and not by crypto_random_bytes().
After sending a SIGKILL to a process, the process might disappear from
`cgroup.threads` but still show up in `cgroup.procs` and still remains in the
cgroup and cause migrating new processes to `Delegate=yes` cgroups to fail with
`-EBUSY`. This is especially likely for heavyweight processes that consume more
kernel CPU time to clean up.
Fix this by only returning 0 when both `cgroup.threads` and
`cgroup.procs` are empty.
When we run `portablectl detach --enable --runtime`, then it triggers
`DisableUnitFilesWithFlags` DBus method and the main unit file is
removed, but its drop-ins are not. Hence, portable_detach() failed to
list existing portable units.
This makes the loop for listing portable units also accept drop-in
directories. So, all remaining drop-in directories are correctly
removed.
Before:
```
testsuite-29.sh[600]: + portablectl detach --now --runtime --enable /tmp/rootdir minimal-app0
portablectl[1391]: (Matching unit files with prefixes 'minimal-app0'.)
portablectl[1391]: Queued /org/freedesktop/systemd1/job/1812 to call StopUnit on portable service minimal-app0-foo.service.
portablectl[1391]: Removed "/run/systemd/system.attached/minimal-app0-foo.service".
portablectl[1391]: Queued /org/freedesktop/systemd1/job/1813 to call StopUnit on portable service minimal-app0.service.
portablectl[1391]: Removed "/run/systemd/system.attached/minimal-app0.service".
portablectl[1391]: Got result done/Success for job minimal-app0-foo.service
portablectl[1391]: Got result done/Success for job minimal-app0.service
portablectl[1391]: DetachImage failed: No unit files associated with '/tmp/rootdir' found attached to the system. Image not attached?
```
After:
```
testsuite-29.sh[508]: + portablectl detach --now --runtime --enable /tmp/rootdir minimal-app0
portablectl[1076]: (Matching unit files with prefixes 'minimal-app0'.)
portablectl[1076]: Queued /org/freedesktop/systemd1/job/1946 to call StopUnit on portable service minimal-app0-foo.service.
portablectl[1076]: Removed "/run/systemd/system.attached/minimal-app0-foo.service".
portablectl[1076]: Queued /org/freedesktop/systemd1/job/1947 to call StopUnit on portable service minimal-app0.service.
portablectl[1076]: Removed "/run/systemd/system.attached/minimal-app0.service".
portablectl[1076]: Removed /run/systemd/system.attached/minimal-app0.service.d/10-profile.conf.
portablectl[1076]: Removed /run/systemd/system.attached/minimal-app0.service.d/20-portable.conf.
portablectl[1076]: Removed /run/systemd/system.attached/minimal-app0.service.d.
portablectl[1076]: Removed /run/systemd/system.attached/minimal-app0-foo.service.d/10-profile.conf.
portablectl[1076]: Removed /run/systemd/system.attached/minimal-app0-foo.service.d/20-portable.conf.
portablectl[1076]: Removed /run/systemd/system.attached/minimal-app0-foo.service.d.
portablectl[1076]: Removed /run/portables/rootdir.
portablectl[1076]: Removed /run/systemd/system.attached.
```
Having this check as part of mount_can_start() is too late because
UNIT(u)->can_start() virtual method is called after checking the active
state of unit in unit_start().
We need to hold off running mount start jobs when /p/s/mountinfo monitor
is rate limited even when given mount unit is already active.
Fixes#20329
By using __extension__, we can silence pedantic errors we cannot or
do not want to fix.
This in particular silences:
- enum values being outside of int range
- variadic macros
- long long being C99
- type of bit-field ‘type’ is a GCC extension
- use of C99 bool in public header functions
Some compiler wrappers like honggfuzz pass -fno-builtin explicitly
and because of that the tests where fabs is used fail to compile
with something like
```
FAILED: test-bus-marshal
...
/usr/bin/ld: test-bus-marshal.p/src_libsystemd_sd-bus_test-bus-marshal.c.o: undefined reference to symbol 'fabs@@GLIBC_2.2.5'
/usr/bin/ld: /usr/lib64/libm.so.6: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
```
Fun fact: it took honggfuzz less than a minute to discover
https://github.com/advisories/GHSA-gmc7-pqv9-966m used by
systemd to compress/descompress some stuff.
Currently, the systemd-hwdb --root flag only has an effect for the
'update' verb. It would be useful to be able to use the --root option
for the 'query' verb too (e.g. for testing a hwdb.bin created with
systemd-hwdb update --root <path>).
Use sd_hwdb_new_from_path to initialize the hwdb if --root is passed to
systemd-hwdb query.
Note that this functionality was not added to 'udevadm hwdb' since that
command is deprecated.
The existing sd_hwdb_new function always initializes the hwdb from the
first successful hwdb.bin it finds from hwdb_bin_paths. This means there
is currently no way to initialize a hwdb from an explicit path, which
would be useful for systemd-hwdb query.
Add sd_hwdb_new_from_path to allow a sd_hwdb to be initialized from a
custom path outside of hwdb_bin_paths.
Include this header to fix errors when including hwdb-internal.h:
../src/libsystemd/sd-hwdb/hwdb-internal.h:16:21: error: field ‘st’ has incomplete type
16 | struct stat st;
If something doesn't match, let's print the non-matching value.
If we can't query something, say what.
And make the messages in the udev and blkid paths different, so
we tell which approach failed from a log.
Since the first version in 81516adcb7,
kernel-install would "gather" a return value by summing the exit codes
of the plugins… This makes no sense, because those are not additive values.
Let's just break off immediately. We now implement cleanup via trap, so if we
break, we should leave no garbage behind.
On switching root, a device may have a persistent databse. In that case,
Device.enumerated_found may have DEVICE_FOUND_UDEV flag, and it is not
necessary to downgrade the Device.deserialized_found and
Device.deserialized_state. Otherwise, the state of the device unit may
be changed plugged -> dead -> plugged, if the device has not been mounted.
Fixes#23429.
[mwilck: cherry-picked from #23437]
dm-crypt device units generated by systemd-cryptsetup-generator
habe BindsTo= dependencies on their backend devices. The dm-crypt
devices have the db_persist flag set, and thus survive the udev db
cleanup while switching root. But backend devices usually don't survive.
These devices are neither mounted nor used for swap, thus they will
seen as DEVICE_NOT_FOUND after switching root.
The BindsTo dependency will cause systemd to schedule a stop
job for the dm-crypt device, breaking boot:
[ 68.929457] krypton systemd[1]: systemd-cryptsetup@cr_root.service: Unit is stopped because bound to inactive unit dev-disk-by\x2duuid-3bf91f73\x2d1ee8\x2d4cfc\x2d9048\x2d93ba349b786d.device.
[ 68.945660] krypton systemd[1]: systemd-cryptsetup@cr_root.service: Trying to enqueue job systemd-cryptsetup@cr_root.service/stop/replace
[ 69.473459] krypton systemd[1]: systemd-cryptsetup@cr_root.service: Installed new job systemd-cryptsetup@cr_root.service/stop as 343
Avoid this by not setting the state of the backend devices to
DEVICE_DEAD.
Fixes the LUKS setup issue reported in #23429.
It looks like garbled output… I didn't use shell-escape, because the other
characters that are special for the shell that are used in versions should
not be escaped.
So far we had the rule that '' == '', '0_' == '0', but '_' > ''. This means
that the general rule that strings are compared iteratively, and each
segment that compares equal can be dropped and the comparison resumes at
the following characters wasn't true in such cases. Similarly, '0~' < '0',
but after dropping the common segment, '~' > ''.
The special handling of empty strings is dropped, and '_' == '' and
'~' < ''.
shmat() requires the CAP_IPC_OWNER capability. When running test-seccomp
in environments with root + CAP_SYS_ADMIN, but not CAP_IPC_OWNER,
memory_deny_write_execute_shmat would fail. This fixes it.
kernel's 'make install' invokes install.sh which calls /sbin/install-kernel.
Thus we are invoked as e.g.
/sbin/installkernel 5.18.0 arch/x86/boot/bzImage System.map /boot
The last two arguments would be passed as "initrds".
Before , we would just quitely ignore
/boot, because it doesn't pass the 'test -f' test, and possibly try to do
something with System.map. 742561efbe tightened
the check, so we now throw an error.
It seems that the correct thing is to ignore those two arguments, because
our plugin syntax has no notion of System.map. And the installation directory
we can figure out ourselves better. Effectively, this makes things behave
like before, but less by accident.
Fixes#23490.
Some NEWS entries are tweaked a bit to address complaints about readability
from users.
"udev" is pronounced as /ˈjuːdɛv/, like in "user", hence "a" not "an".
When closing a loop device, the kernel will asynchronously remove
the probed partitions. This can lead to race conditions where we
try to reuse a partition device that still needs to be removed by
the kernel. To avoid such issues, let's explicitly try to remove
any partitions using BLKPG_DEL_PARTITION when we're done with an
image.
To make sure we don't try to remove partitions when we want them
to remain (e.g. systemd-dissect --mount), we add
dissected_image_relinquish() in a similar vein to loop_device_relinquish()
and decrypted_image_relinquish().
```
...
../src/cryptsetup/cryptsetup-tokens/cryptsetup-token-systemd-fido2.c:33:13: error: variable 'r' set but not used [-Werror,-Wunused-but-set-variable]
int r;
^
1 error generated.
...
../src/cryptsetup/cryptsetup-tokens/cryptsetup-token-systemd-pkcs11.c:34:13: error: variable 'r' set but not used [-Werror,-Wunused-but-set-variable]
int r;
^
1 error generated.
ninja: build stopped: subcommand failed.
+ fatal ''\''meson compile'\'' failed with -Db_ndebug=true'
```
The kernel provides a ".compat" PE section that contains a list of
compat entry points with their respective arches. This entry point
does all the heavy lifting to support running 64bit kernels when
the UEFI firmware is 32bit.
Note that the EFI handover protocol code in linux_x86.c does not
need any adjustments as it already correctly calls the 32bit handover
code.
Fixes: #17056
This is in preparation for LINUX_INITRD_MEDIA support in boot.c. One
downside is that adding or changing the used initrds by command line
editing is not possible anymore.
This also moves the message about failed image execution into
image_start() as we would otherwise show two error messages if
any of the preparatory steps failed.
Not sure when the issue was fixed.
- kernel-3.10 on CentOS 7 has the issue,
- kernel-4.18 on CentOS 8 works fine.
Note, the workaround dropped by the commit is not incomplete:
with an old kernel which has the issue, all non-prefix routes are
configured on the specified route table, but the prefix route is
configured on the main table. That should not work for most cases,
hence, the workaround is mostly meaningless.
* Avoid traling slash as most links are defined without.
* Always use https:// protocol and www. subdomain
Allows for easier tree-wide linkvalidation
for our migration to systemd.io.
3970 e = object_path_startswith(path, prefix);
(gdb) p path
$1 = 0x55c5a166f768 "/org/freedesktop/portable1/image"
(gdb) p prefix
$2 = 0x55c59ffc2928 "/org/freedesktop/portable1/image"
(gdb) p e
$1 = 0x5581a1675788 ""
This can be a bit confusing in certain cases, so add a comment and a
test to make the behaviour clearer and explicit.
Before 9e82a74cb0, we had a check like the
following:
if [[ -d /efi/loader/entries ]] || [[ -d /efi/$MACHINE_ID ]]; then
ENTRY_DIR_ABS="/efi/$MACHINE_ID/$KERNEL_VERSION"
elif [[ -d /boot/loader/entries ]] || [[ -d /boot/$MACHINE_ID ]]; then
ENTRY_DIR_ABS="/boot/$MACHINE_ID/$KERNEL_VERSION"
elif [[ -d /boot/efi/loader/entries ]] || [[ -d /boot/efi/$MACHINE_ID ]]; then
ENTRY_DIR_ABS="/boot/efi/$MACHINE_ID/$KERNEL_VERSION"
…
In stock Fedora 34-, /efi isn't used, but grub creates /boot/loader/entries and
installs kernels and initrds directly in /boot. Thus the second arm of the
check wins, and we end up with BOOT_ROOT=/boot.
After 9e82a74cb0, we iterate over the inner
directory first and over the second directory later:
[ -d /efi/<machine-id> ]
[ -d /boot/efi/<machine-id> ]
[ -d /boot/<machine-id> ]
[ -d /efi/Default ]
[ -d /boot/efi/Default ]
[ -d /boot/Default ]
[ -d /efi/loader/entries ]
[ -d /boot/efi/loader/entries ]
[ -d /boot/loader/entries ]
This was partially reverted by 447a822f8e which
removed Default from the list, and a5307e173b,
which moved checks for /boot up, so we ended up with:
[ -d /efi/<machine-id> ]
[ -d /boot/<machine-id> ]
[ -d /boot/efi/<machine-id> ]
[ -d /efi/loader/entries ]
[ -d /boot/loader/entries ]
[ -d /boot/efi/loader/entries ]
6637cf9db6 added autodetection of an entry
token, so we end up checking the following suffixes:
<machine-id>, $IMAGE_ID, $ID, Default
But the important unchanged characteristic is that we iterate over the suffix
first. Sadly this breaks Fedora, because we find /boot/efi/<machine-id> before
we could find /boot/loader/entries. It seems that every possible aspect of
behaviour matters for somebody, so we need to keep the original order of
detection.
With the patch:
[ -d /efi/<machine-id> ]
...
[ -d /efi/loader/entries ]
[ -d /boot/<machine-id> ]
...
[ -d /boot/loader/entries ]
[ -d /boot/efi/<machine-id> ]
...
[ -d /boot/efi/loader/entries ]
Note that we need to check for "loader/entries" too, even though it is not
an entry-token candidate, so that we get the same detection priority as
before.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=2071034.
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=43942 is a simple case
where a repeated entry generates a timeout. I didn't import that case, but
generated a simpler one by hand.
$ time build/fuzz-etc-hosts test/fuzz/fuzz-etc-hosts/timeout-many-entries
test/fuzz/fuzz-etc-hosts/timeout-many-entries... ok
build/fuzz-etc-hosts test/fuzz/fuzz-etc-hosts/timeout-many-entries 3.17s (old)
↓
build/fuzz-etc-hosts test/fuzz/fuzz-etc-hosts/timeout-many-entries 0.11s (new)
I considered simply disallowing too many aliases. E.g. microsoft appearently
sometimes ignores entries after the ninth [1], and other systems set stringent
limits [2,3], but the recommended way to get around that is to simply use more
lines (as is done in the sample), so this wouldn't change anything.
Even if we cannot put all those names in a reply packet, the resolution from
the alias to the address should work. I think cases where people define lots
and lots of aliases through some programmatic interface is realistic, for
example for a blocklist, and such a file shouldn't bring resolved down to its
knees.
[1] https://superuser.com/questions/932112/is-there-a-maximum-number-of-hostname-aliases-per-line-in-a-windows-hosts-file
[2] https://library.netapp.com/ecmdocs/ECMP1516135/html/GUID-C6F3B6D1-232D-44BB-A76C-3304C19607A3.html
[3] https://www.ibm.com/docs/en/zos/2.1.0?topic=optional-creating-etchosts
This will be helpful in cases where we are repeatedly adding entries
to a long strv and want to skip the iteration over old entries leading
to quadratic behaviour.
Note that we don't want to calculate the length if not necessary, so
the calculation is delayed until after we've checked that value is not
NULL.
I took inspiration from pid1:
bus_unit_find()
→ find_unit()
→ manager_load_unit_from_dbus_path()
→ unit_name_from_dbus_path()
→ !startswith(path, "/org/freedesktop/systemd1/unit/")
→ return -EINVAL
←
←
←
← if (r < 0) return 0
← 0
←
i.e. we return 0 when queried for "/org/freedesktop/systemd1/unit".
Fixes#23445.
If $BOOT_ROOT is specified, but entry-token not, we'd skip the detection
altogether, effectively defaulting to entry-token=machine-id.
The case where $BOOT_ROOT was not specied, but entry-token was configured
was handled correctly.
This patch makes the handling of both symmetrical, i.e. will only set what
wasn't configured.
No changes to behaviour, but let's print everything out as we discover it.
The docs say that BOOT_ROOT can be specified by the environment. I have
it locally in /etc/kernel/install.conf, and then the override doesn't work.
It'd be nice to handle such cases more reliably.
The docs are not entirely clear what glyphs qualify as digits.
The function is supposed to be locale-dependent, but I couldn't
get it to return true on any non-ascii digits I tried.
But it's better to be safe than sorry, let's use our trivial
replacement instead.
The interface, output, and exit status convention are all taken directly from
rpmdev-vercmp and dpkg --compare-versions. The implementation is different
though. See test-string-util for a list of known cases where we compare
strings incompatibly.
The idea is that this string comparison function will be declared as "the"
method to use for boot entry ordering in the specification and similar
uses. Thus it's nice to allow users to compare strings.
Fixes#23433
matches is plumbed through until it finally gets used in unit_match()
which can deal with NULL matches so the assert() is unnecessary and
can be removed.
The two call sites of extract_image_and_extensions() also don't
assert() on matches either.
No functional change is intended. The verbs where it wasn't immediately
clear if the success exit status is 0 or >= 0 are changed to explicitly
return 0. (I think it's better to be explicit than to rely on some call
stack always returning 0 on success.)
Some other functions are cleaned up to be more idiomatic.
We said that strverscmp_improved() is similar to rpm, so it's nice to include
their tests too so we can pin down the differences.
Our test is changed to print older<newer instead of newer>older.
(I know the computer doesn't care, but I find it much harder to think about
when newer is on the left…)
The rpm test strings are copied from
https://github.com/rpm-software-management/rpm/blob/master/tests/rpmvercmp.at.
rpmio is licensed GPL OR LGPL, so we can do that without any issue.
(I think it could be argued as "fair use" anyway, but that's not necessary
in this case.)
I kept the original form as much as possible so it'll be easy to copy things
back and forth in the future.
The SBAT section was included in a special section in the EFI code, but
the contents weren't directly visible in any way. Let's add a "test" that
prints them for visual inspection.
If there's some external linter for this format, we could hook it up in the
future.
We would return the result of strcmp(), i.e. some positive/negative value.
Now that we want to make this a documented interface for other people
to implement, let's make the implementation more contstrained, even if
we ourselves don't care about whether the specific values.
This fixes a spurious warning from the manager running in user mode:
systemd[1668]: Reached target sockets.target.
systemd[1669]: Failed to create BPF map: Operation not permitted
systemd[1669]: Finished systemd-tmpfiles-setup.service.
systemd[1669]: Listening on dbus.socket.
systemd[1669]: Reached target sockets.target.
systemd[1669]: Reached target basic.target.
systemd[1]: Started user@6.service.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=2084955.
rpms can be installed in two different modes: into a chroot, where the system
is not running, and onto a live system. In the first mode, where should create
all changes that are "permanent", and in the second mode, all changes which are
"permanent" but also those which only affect the running system. Thus, changes
like new modprobe rules, tmpfiles rules, binfmt rules, udev rules, etc., are
guarded by 'test -d "/run/systemd/system"' which is the official way to check
if systemd is running, so that they are *not* executed when installed into a
chroot. But the same logic does not apply to sysusers, hwdb, and the journal
catalog: all those files can and should result in changes being performed
immediately to the system. This makes the creation of immutable images possible
(because there are no permanent changes to executed after a reboot), and allows
other packages to depend on the the effect of those changes.
Thus, the guard to check if we're not in a chroot is dropped from triggers for
sysusers, hwdb, and the journal catalog. This means that those triggers will
execute, and no subsequent work is needed. systemd-sysusers.service,
systemd-journal-catalog-update.service, and systemd-hwdb-update.service.in all
have ConditionNeedsUpdate= so they they generally won't be invoked after a
reboot. (systemd.rpm does not touch /usr to trigger the condition, because the
%transfiletriggers make that unnecessary.)
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=2085481
"left from <something>" is not correct. "left <something>" would be the
usual form, but "left master interface" is not clear at all. So reword
those messages totally.
Follow-up for 3881fd406b.
This fixes a minor bug introduced by 10af8bb24b.
Before the commit, the interface group was set only when Group= is explicitly
specified, otherwise the interface group was kept. However, after the commit,
we need to specify Group= with an empty string to keep the current interface
group.
This way we can connect correctly to any AF_UNIX socket in the file
system, and even save some code. Yay!
This also adds some test code for this, that ensures read_file_full()
works correctly for AF_UNIX sockets that violate the 108 char limit.
Supporting sockets like this kinda matters I think, for the simple
reason that apps want to build socket paths via XDG_RUNTIME_DIR and
suchlike, and we should be able to connect to them, even via
non-normalized paths.
This is a short helper for connecting to AF_UNIX sockets in the file
system. It works around the 108ch limit of sockaddr_un, and supports
"at" style fds.
This doesn't come with a test of its own, but the next patch will add
that.
let's not make up new errors in these checks that validate if connect()
work at all. After all, we don't really know if the ENXIO we saw earlier
actually is really caused by the inode being an AF_UNIX socket, we just
have the suspicion...
This way we can implement nice fallbacks later on.
While we are at it, provide a test for this (one that is a bit over the
top, but then again, we can never have enough tests).
The issue #12953 is caused by the following:
On switching root,
- deserialized_found == DEVICE_FOUND_UDEV | DEVICE_FOUND_MOUNT,
- deserialized_state == DEVICE_PLUGGED,
- enumerated_found == DEVICE_FOUND_MOUNT,
On switching root, most devices are not found by the enumeration process.
Hence, the device state is set to plugged by device_coldplug(), and then
changed to the dead state in device_catchup(). So the corresponding
mount point is unmounted. Later when the device is processed by udevd, it
will be changed to plugged state again.
The issue #23208 is caused by the fact that generated udev database in
initramfs and the main system are often different.
So, the two issues have the same root; we should not honor
DEVICE_FOUND_UDEV bit in the deserialized_found on switching root.
This partially reverts c6e892bc0e.
Fixes#12953 and #23208.
Replaces #23215.
Co-authored-by: Martin Wilck <mwilck@suse.com>
Previously, in sd_device_new_from_subsystem_sysname(), '/' in sysname
was replaced '!' for several limited subsystems. This was based on a wrong
assumption that no sysname in e.g. driver subsystem does not contain '!'.
And the assumption is actually wrong, and trigger issue #23327.
In device_set_sysname_and_sysnum() we unconditionally replace '!' in the
filename. Hence, the translation in sd_device_new_from_subsystem_sysname()
must be also done unconditionally.
Fixes#23327.
Without the size limits, oss-fuzz creates huge samples that time out. Usually
this is because some of our code has bad algorithmic complexity. For data like
configuration samples we don't need to care about this: non-rogue configs are
rarely more than a few items, and a bit of a slowdown with a few hundred items
is acceptable. This wouldn't be OK for processing of untrusted data though.
We need to set the limit in two ways: through .options and in the code. The
first because it nicely allows libFuzzer to avoid wasting time, and the second
because fuzzers like hongfuzz and afl don't support .options.
While at it, let's fix an off-by-one (65535 is the largest offset for a
power-of-two size, but we're checking the size here).
Co-authored-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
With an intentional mistake:
../src/login/logind-dbus.c: In function ‘bus_manager_log_shutdown’:
../src/login/logind-dbus.c:1542:39: error: format ‘%s’ expects a matching ‘char *’ argument [-Werror=format=]
1542 | LOG_MESSAGE("%s %s", message),
| ^~~~~~~
Also break some long lines for more uniform formatting. No functional change.
I went over all log_struct, log_struct_errno, log_unit_struct,
log_unit_struct_errno calls, and they seem fine.
This is trivially exploitable (in the sense of causing a crash from SEGV) e.g.
by 'shutdown now "Message %s %s %n"'. The message is settable through polkit,
but is limited to auth_admin:
<action id="org.freedesktop.login1.set-wall-message">
<description gettext-domain="systemd">Set a wall message</description>
<message gettext-domain="systemd">Authentication is required to set a wall message</message>
<defaults>
<allow_any>auth_admin_keep</allow_any>
<allow_inactive>auth_admin_keep</allow_inactive>
<allow_active>auth_admin_keep</allow_active>
</defaults>
</action>
Bug introduced in 9ef15026c0
('logind/systemctl: introduce SetWallMessage and --message', 2015-09-15).
UEFI provides a "monotonic boot counter" which is supposed to increase on
each reboot. We can include this in our random seed hash logic, which
makes things more robust in case our changes to the ESP end up not
actually being as persistent as we assume. As long as the monotonic boot
counter increases we should be good, as each boot we'll anyway end up
with a new seed that way.
This in fact should also pave the way that we can eventually enable the
random seed logic even on SecureBoot enabled systems. Why that? With
this change the input for the random seed hash is now:
1. the old seed file contents
2. (optionally) some bits from the UEFI RNG
3. (optionally) a per system random "token" stored in an UEFI variable,
initialized at OS install
4. the UEFI monotonic counter
5. a counter integer used by the random seed logic.
We can ignore #5 entirely for security considerations, it's always going
to be a constant series of values determined by the random seed logic.
The #1 file is under control of the attacker. (Since it resides in the
unprotected ESP)
The #2 data is possibly low quality. (it's hard enough to trust the
quality of the Linux RNG, let's not go as far as trusting the UEFI one)
The #3 data should not be under control of the attacker, and should only
exist if explicitly set. Unless you have privileged access to the system
you should not be able to read or set it. (well, within limits of flash
chip security and its connectivity to the firmware)
The #4 data is provided by the firmware, and should not be under control
of the attacker. If it works correctly then it might still be guessable
(i.e. a new system might have the counter close to zero).
Thus: 1+2+5 are guessable/under control of attacker, but 3+4 should not
be. Thus, if 3 is not known to attacker and not guessable, and 4
strictly monotonically increasing then it should be enough to guarantee
that every boot will get a different seed passed in, that should not be
known or guessable by the attacker.
That all said, this patch does not enable the random seed logic on
SecureBoot. That is left for a later patch.
This normally wouldn't happen, but if some of those places were called
with lhs and rhs being the same object, we could unref the last ref first,
and then try to take the ref again. It's easier to be safe, and with the
helper we save some lines too.
We canonicalize repeats that cover the whole range: "0:0:0/1" → "0:0:*". But
we'd also do "0:0:0/1,0" → "0:0:*,0", which we then refuse to parse. Thus,
first go throug the whole chain, and print a '*' and nothing else if any of the
components covers the whole range.
0..3 is not the same as 0..infinity, we need to check both ends of the range.
This logic was added in 3215e35c40, and back then
the field was called .value. .stop was added later and apparently wasn't taken
into account here.
Coverage data shows that we didn't test calendar_spec_next_usec() and
associated functions at all.
The input samples so far were only used until the first NUL. We take advantage
of that by using the part until the second NUL as the starting timestamp,
retaining backwards compatibility for how the first part is used.
calendar_spec_from_string() already calls calendar_spec_normalize(), so
there is no point in calling it from the fuzzer. Once that's removed, there's
just one internal caller and it can be made static.
When the the iterator variable is declared automatically, it "inherits" the
const/non-const status from the argument. We don't need to cast a const
table to non-const. If we had a programming error and tried to modify the
string, the compiler could now catch this.
It seems that we try to create a new file, which fails with -ENOSPC, and we
later fail when reading a file with ENODATA. journal_file_open() will return
-ENODATA if the file is too short or if journal_file_verify_header() fails.
We'll unlink a file we newly created if we fail to initialize it immediately
after creation. I'm not sure if the file we fail to open is the one we newly
created and e.g. failed to create the arena and such, or if it's the file we
were trying to rotate away from. Either way, I think we should be OK with
with a non-fully-initialized journal file.
Failed to create rotated journal: No space left on device
Failed to write entry of 2 bytes: No space left on device
sd_journal_open_files(["/tmp/fuzz-journal-remote.vELRpI.journal"]) failed: No data available
Assertion 'IN_SET(r, -ENOMEM, -EMFILE, -ENFILE)' failed at src/journal-remote/fuzz-journal-remote.c:70, function int LLVMFuzzerTestOneInput(const uint8_t *, size_t)(). Aborting.
oss-fuzz-39238: https://oss-fuzz.com/issue/4609851129462784
oss-fuzz reports timeouts which are created by appending to a very long strv.
The code is indeed not very efficient, but it's designed for normal
command-line use, where we don't expect more than a dozen of entries. The fact
that it is slow with ~100k entries is not particularly interesting.
In the future we could rework the code to have better algorithmic complexity.
But let's at least stop oss-fuzz from wasting more time on such examples.
(My first approach was to set max_len in .options, but apparently this doesn't
work for hongfuzz and and AFL.)
oss-fuzz-34527: https://oss-fuzz.com/issue/5722283944574976
The reallocation of memory and counter incrementation is moved from
the only caller to the function. This way the callers can remain oblivious
of the BootConfig internals.
We also shorten the logic by getting rid of the validate_node()
function. An extra check is added to verify we're dealing with
a device before calling sd_device_new_from_devname() since that
will return -EINVAL if anything other than a device is passed.
Instead of retrieving the new sysfs path in device_process_new(),
let's pass the syspath we retrieved earlier to device_process_new()
similar to how we do for other functions in core/device.c.
Previously, we manage DnsAnswerItem by an array and Set,
The array was used for the order of the items, and the set is used to
dedup items.
Let's use OrderedSet, then we can simplify the logic.
This fixes dns_answer_remove_by_key() and dns_answer_remove_by_rr()
which makes the set in a broken state.
In the documentation, using the term "managed" for both the RA flag and
the DHCPv6 mode is confusing because the mode is referred to as
"solicit" both in the official DHCPv6 documentation (see RFC 8415) and
in the WithoutRA option.
Furthermore, calling the other RA flag "other information" or "other
address configuration" is confusing because its official name is simply
"other configuration" (see RFC 4861 and RFC 5175) and it isn't used to
assign IP addresses.
Rewrite the documentation for DHCPv6Client and WithoutRA to make it
clear that getting the "managed" RA flag triggers the same kind of DHCP
request as WithoutRA=solicit, whereas getting the "other configuration"
RA flag triggers the same kind of DHCP request as
WithoutRA=information-request.
Plain strv_split() should not care if the strings contains backslashes
or quote characters. But extract_first_word() interprets backslashes
unless EXTRACT_RETAIN_ESCAPE is given.
I wonder how it's possible that nobody noticed this before. I think this
code was introduced in 0645b83a40.
Fixup for a5efbf468c: if $COLORTERM was set, we'd
unconditionally turn on colors, which is unexpected and wrong. It even breaks
our own tests when executed in gnome-terminal.
This reverts commit d6c9411072.
I still think this is something that needs to be done, but we're hitting some
unexplained failures, e.g. https://github.com/systemd/systemd/issues/22920.
So let's revert this for now, so -rc2 can be released, with a plan to return
to this after a release.
Closes#22920.
With a recent change paths leaving the statically known lookup paths would be
treated differently then those that remained within those. That was done
(AFAIK) to consistently handle alias names. Unfortunately that means that on
some distributions, especially those where /etc/ consists mostly of symlinks,
would trigger that new detection for every single unit in /etc/systemd/system.
The reason for that is that the units directory itself is already a symlink.
Rebased-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
The LLDP spec (IEEE 802.1AB) requires the three mandatory TLVs (Chassis
ID, Port ID, and TTL) to be the first three TLVs in the packet, in that
specific order, whereas systemd put the TTL near the end of the packet.
This violation caused the ethernet switch in our office to discard these
packets as malformed, and Wireshark's packet parser also chokes on them.
```
timedatectl list-timezones --no-pager
...
==164329==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 8192 byte(s) in 1 object(s) allocated from:
#0 0x7fe8a74b6f8c in reallocarray (/lib64/libasan.so.6+0xaef8c)
#1 0x7fe8a63485dc in strv_push ../src/basic/strv.c:419
#2 0x7fe8a6349419 in strv_consume ../src/basic/strv.c:490
#3 0x7fe8a634958d in strv_extend ../src/basic/strv.c:542
#4 0x7fe8a643d787 in bus_message_read_strv_extend ../src/libsystemd/sd-bus/bus-message.c:5606
#5 0x7fe8a643db9d in sd_bus_message_read_strv ../src/libsystemd/sd-bus/bus-message.c:5628
#6 0x4085fb in list_timezones ../src/timedate/timedatectl.c:314
#7 0x7fe8a61ef3e1 in dispatch_verb ../src/shared/verbs.c:103
#8 0x410f91 in timedatectl_main ../src/timedate/timedatectl.c:1025
#9 0x41111c in run ../src/timedate/timedatectl.c:1043
#10 0x411242 in main ../src/timedate/timedatectl.c:1046
#11 0x7fe8a489df1f in __libc_start_call_main (/lib64/libc.so.6+0x40f1f)
```
No need to involve a trivial shell script for this.
We could call the compiler directly, but test() expects arguments
to be passed separately and cc.cmd_array() can contain arguments
itself. Using env is easier than manually slicing the array because
meson has no builtins for that.
This is a follow-up for f470cb6d13 which in
turn is a follow-up for a068aceafb.
The latter started to honour hidden files when deciding whether a
directory is empty. The former reverted to the old behaviour to fix
issue #23220.
It introduced a bug though: when a directory contains a larger number of
hidden entries the getdents64() buffer will not suffice to read them,
since we just allocate three entries for it (which is definitely enough
if we just ignore the . + .. entries, but not ig we ignore more).
I think it's a bit confusing that dir_is_empty() can return true even if
rmdir() on the dir would return ENOTEMPTY. Hence, let's rework the
function to make it optional whether hidden files are ignored or not.
After all, I looking at the users of this function I am pretty sure in
more cases we want to honour hidden files.
Even if the device node is the same, devnum (thus, device ID) and
syspath may be different. If a 'remove' and 'add' events for the same
device node but with different devnum and syspath are queued,
previously, we might process them in parallel. And, udev_watch_end() for
the 'remove' event and udev_watch_begin() for the 'add' event may
interfere each other.
Previously, a device has DEVPATH_OLD is blocked by a previous event
whose devpath is equivalent to the DEVPATH_OLD.
This extends the condtion.
1. an event has DEVPATH_OLD is blocked by a previous event whose
devpath is a parent of, child of, or equivalent to the DEVPATH_OLD.
2. an event is blocked by a previous event whose DEVPATH_OLD is a
parent of, child of, or equivalent to the devpath of the new event.
I am not sure such check is really necessary. But, the cost of the check
is expected to be extremely small, as device renaming does not occur so
frequently. Hence, it should not introduce any significant performance
regression.
If two devices have the same devnum and subsystem (more specifically,
if the device is block or not), or have the same ifindex, then IDs of
these devices are equivalent.
Hence, the previous conditions are covered by comparing device IDs.
Of course, events with a same ID should be already blocked by the
devpath check. So, this should not change anything.
However, udevd saves many kinds of data under /run/udev named with
the device ID. If multiple workers processes events for the same device
ID, then the database may become corrupted.
Let's explicitly check the device IDs for safety and simplicity.
A compile time option is added to select behaviour: by default
UNIT_FILE_PRESET_ENABLE_ONLY is still used, but the intent is to change to
UNIT_FILE_PRESET_FULL at some point in the future. Distros that want to
opt-in can use the config option to change the behaviour.
(The option is just a boolean: it would be possible to make it multi-valued,
and allow full, enable-only, disable-only, none. But so far nobody has asked
for this, and it's better not to complicate things needlessly.)
With the configuration option flipped, instead of only doing enablements,
perform a full preset on first boot. The reason is that although
`/etc/machine-id` might be missing, there may be other files provisioned in
`/etc` (in fact, this use case is mentioned in `log_execution_mode`). Some of
those possible files include enablement symlinks even if presets dictate it
should be disabled.
Such a seemingly contradictory situation occurs in {RHEL,Fedora} CoreOS,
where we ship `/etc` as if `preset-all` were called. However, we want to
allow users to disable default-enabled services via Ignition, which does
this by creating preset dropins before switchroot. (For why we do
`preset-all` at compose time, see:
https://github.com/coreos/fedora-coreos-config/pull/77).
For example, the composed FCOS image has a `enable zincati.service`
preset and an enablement for that in `/etc`, while at boot time when we
switch root, there may be a `disable zincati.service` preset with higher
precedence. In that case, we want systemd to disable the service.
This is essentially a revert of 304b3079a2. It seems like systemd
*used* to do this, but it was changed to try to make the container
workflow a bit faster.
Resolves: https://github.com/coreos/fedora-coreos-tracker/issues/392
Co-authored-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
We have vendor presets, and local admin presets, and runtime presets
(under /usr/lib, /usr/local/lib and /etc, /run, respectively). When we
display preset state, it can be configured in any of those places, so
we shouldn't say anything about the origin.
(Another nice advantage is that it improves alignment:
[root@f36 ~]# systemctl list-unit-files multipathd.service
UNIT FILE STATE VENDOR PRESET
multipathd.service enabled enabled
^ this looks we have a "PRESET" column that is empty.)