Commit graph

67524 commits

Author SHA1 Message Date
Lennart Poettering 495e75ed5c core: move pid watch/unwatch logic of the service manager to pidfd
This makes sure unit_watch_pid() and unit_unwatch_pid() will track
processes by pidfd if supported. Also ports over some related code.
Should not really change behaviour.

Note that this does *not* add support for waiting for POLLIN on the pidfds
as additional exit notification. This is left for a later commit (this
commit is already large enough), in particular as that would add new
logic and not just convert existing logic.
2023-09-28 23:22:58 +02:00
Lennart Poettering c407bfa68f test-watch-pid: use a real PID, not a made up one
This matters once we track processes with pidfds rather than just pid_t,
because made up PIDs likely won't exist.

The essence of the test remains unmodified, we just use a real, existing
PID instead of 4711.
2023-09-28 23:22:58 +02:00
Lennart Poettering ec8dc83530 pidref: add pidref_verify() helper
This new helper can be used after reading process info from procfs, to
verify that the data that was just read actually matches the pidfd, and
does not belong to some new process that just reused the numeric PID of
the process we originally pinned.
2023-09-28 23:22:58 +02:00
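A rough sketch of the pin-then-verify pattern that ec8dc83530 describes, not systemd's actual code. The names `example_pin_process()` and `example_verify_pinned()` are hypothetical, and the sketch assumes a Linux kernel and libc headers that provide the `pidfd_open` and `pidfd_send_signal` system calls:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: pin a process by opening a pidfd for it.
 * Returns the file descriptor on success, or a negative errno value. */
int example_pin_process(pid_t pid) {
        long fd = syscall(SYS_pidfd_open, pid, 0);
        return fd < 0 ? -errno : (int) fd;
}

/* Hypothetical helper: check that the pinned process still exists.
 * Signal 0 delivers nothing and only performs the existence and
 * permission checks. Returns 0 if the process is still there,
 * -ESRCH once it is gone, or another negative errno value on failure. */
int example_verify_pinned(int pidfd) {
        assert(pidfd >= 0);

        if (syscall(SYS_pidfd_send_signal, pidfd, 0, NULL, 0) < 0)
                return -errno;

        return 0;
}
```

A caller would pin the process first, read whatever it needs from procfs by numeric PID, and then run the verify step. If that step reports `-ESRCH`, the data may belong to a different process that reused the PID and should be discarded.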
Lennart Poettering 9cb7e49f11 pidref: add pidref_hash_ops
This adds a "hash_ops" structure, which allows using PidRef structures
as keys in Hashmap and Set objects.
2023-09-28 23:22:58 +02:00
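To illustrate what a pair of hash and compare callbacks for such a key could look like, here is a minimal sketch. `ExamplePidRef`, `example_pidref_hash()` and `example_pidref_compare()` are hypothetical stand-ins, not the definitions added by 9cb7e49f11. The sketch assumes that only the numeric PID takes part in hashing and comparison, so that a key with and a key without a pidfd refer to the same entry:

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical stand-in for PidRef: a numeric PID plus an optional pidfd. */
typedef struct ExamplePidRef {
        pid_t pid;
        int fd;         /* -1 if no pidfd was acquired */
} ExamplePidRef;

/* Hash only the numeric PID (assumption, see above). The mixing steps are
 * the splitmix64 finalizer, standing in for whatever hash the real table uses. */
uint64_t example_pidref_hash(const ExamplePidRef *p) {
        assert(p);

        uint64_t x = (uint64_t) p->pid;
        x ^= x >> 30;
        x *= 0xbf58476d1ce4e5b9ULL;
        x ^= x >> 27;
        x *= 0x94d049bb133111ebULL;
        x ^= x >> 31;
        return x;
}

/* Three-way comparison on the numeric PID: negative, zero or positive. */
int example_pidref_compare(const ExamplePidRef *a, const ExamplePidRef *b) {
        assert(a);
        assert(b);

        return (a->pid > b->pid) - (a->pid < b->pid);
}
```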
Lennart Poettering 837659825f pidref: add helpers for managing PidRef on the heap
Usually we want to embed PidRef in other structures, but sometimes it
makes sense to allocate it on the heap in case it should be used
standalone. Add helpers for that.

Primary use case: use as key in Hashmap objects that, for example, map
processes to unit objects in PID 1.

This adds pidref_free()/pidref_freep() for freeing such an allocated
struct, as well as pidref_dup() (for duplicating an existing PidRef
on the heap 1:1), and pidref_new_pid() (for allocating a new PidRef from a
PID).
2023-09-28 23:22:58 +02:00
Lennart Poettering dcfcea6d02 pidref: add PIDREF_MAKE_FROM_PID()
This helper turns a pid_t into a PidRef. It's different from
pidref_set_pid() in being "passive", i.e. it does not attempt to acquire
a pidfd for the pid.

This is useful when using the PidRef as a lookup key that shall also
work after a process is already dead, and hence no conversion to a pidfd
is possible anymore.
2023-09-28 23:22:58 +02:00
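A "passive" conversion of this kind can be sketched with a C99 compound literal. `ExamplePidRef` and `EXAMPLE_PIDREF_MAKE_FROM_PID()` below are hypothetical names used for illustration only; the real macro and struct layout in dcfcea6d02 may differ:

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical stand-in for PidRef: a numeric PID plus an optional pidfd. */
typedef struct ExamplePidRef {
        pid_t pid;
        int fd;         /* -1 if no pidfd was acquired */
} ExamplePidRef;

/* Build a reference from a bare PID without any system call. No pidfd is
 * opened, so this also works for a process that has already exited. */
#define EXAMPLE_PIDREF_MAKE_FROM_PID(p) \
        ((ExamplePidRef) { .pid = (p), .fd = -1 })
```

Because no pidfd is acquired, the resulting value needs no cleanup and can be built on the stack just for the duration of a lookup.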
Lennart Poettering 12c7d27b65 cgroup-util: add cg_read_pidref() helper
Just like cg_read_pid() but returns a PidRef
2023-09-28 23:22:58 +02:00
Luca Boccassi 76dc9e249f
Merge pull request #29249 from poettering/pid1-error-message
pid1: refactoring of unit state machine logging and unit timer refactoring
2023-09-28 22:18:15 +01:00
Bertrand Jacquin 7406ebd5b6 resolved: register ipv4only.arpa as private domain
From RFC 8880:

Because the 'ipv4only.arpa' zone has to be an insecure delegation,
DNSSEC cannot be used to protect these answers from tampering by
malicious devices on the path.

Consequently, the 'ipv4only.arpa' zone MUST be an insecure delegation to
give DNS64/NAT64 gateways the freedom to synthesize answers to those
queries at will, without the answers being rejected by DNSSEC-capable
resolvers. DNSSEC-capable resolvers that follow this specification MUST
NOT attempt to validate answers received in response to queries for the
IPv6 AAAA address records for 'ipv4only.arpa'. Note that the name
'ipv4only.arpa' has no use outside of being used for this special DNS
pseudo-query used to learn the DNS64/NAT64 address synthesis prefix, so
the lack of DNSSEC security for that name is not a problem.

See: https://datatracker.ietf.org/doc/html/rfc8880#name-security-considerations
2023-09-28 21:55:00 +01:00
Luca Boccassi 081c50ed3c
Merge pull request #29361 from keszybz/kernel-install-work
Advertise installkernel ↔ kernel-install duality
2023-09-28 17:16:15 +01:00
Daan De Meyer eafa923f81 Remove json_variant_merge_pair() in favor of json_variant_set_field_non_null() 2023-09-28 17:13:11 +01:00
Luca Boccassi 1e49f4ed8b
Merge pull request #28545 from bluca/softreboot_survive
pid1: add SurviveFinalKillSignal= to skip units on final sigterm/sigkill spree
2023-09-28 17:12:03 +01:00
Daan De Meyer 67c92f3fec kmod-setup: Load virtiofs and virtio_pci early
There's no way for us to wait for specific virtiofs tags to appear,
so we have to try and make sure that the tags are all available by
the time we try to mount any virtiofs tag. Let's try to do that by
loading the necessary modules as early as we can.
2023-09-28 15:21:50 +01:00
наб e2e0125921
show-logs: add assert and fix local variable type
Follow-up for: 0693e6b246

#29355
2023-09-28 15:21:15 +01:00
Mike Yuan a82b8b3dc8 core: mark units as needing daemon-reload if unit file operations are performed

systemctl would issue daemon-reload after unit file operations
(enable/disable/preset/...) succeed. However, such operations
are not atomic, meaning that the unit file state could still change
even if the operation generally fails, and the unit_file_state
cached by manager becomes outdated.

Fixes #29341
2023-09-28 15:19:24 +01:00
Luca Boccassi 13b3af4aa2 core: improve error message when setting up service mounts
Right now we include the private working directory when we say some files
were not found, which is confusing. Strip it from the error string.

For example, with a BindPaths=/var/bar that does not exist on the host:

Before:

  foo.service: Failed to set up mount namespacing: /run/systemd/unit-root/var/bar: No such file or directory

After:

  foo.service: Failed to set up mount namespacing: /var/bar: No such file or directory
2023-09-28 15:19:03 +01:00
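The prefix stripping described in 13b3af4aa2 can be sketched as follows. `example_strip_root()` is a hypothetical helper written for this illustration and is not the function used in the commit:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: if `path` lies at or below `root`, return the part of
 * `path` relative to `root` (with its leading slash), otherwise return `path`
 * unchanged. No memory is allocated; the result points into `path` or at a
 * string literal. */
const char *example_strip_root(const char *path, const char *root) {
        assert(path);
        assert(root);

        size_t n = strlen(root);
        if (n == 0 || strncmp(path, root, n) != 0)
                return path;

        if (path[n] == '/')
                return path + n;        /* "/root/var/bar" -> "/var/bar" */
        if (path[n] == '\0')
                return "/";             /* the root directory itself */

        return path;    /* e.g. "/rootfoo" merely shares a string prefix */
}
```

With `root` set to `/run/systemd/unit-root`, the path from the "Before" message maps to the `/var/bar` shown in the "After" message.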
Luca Boccassi 3cb5d34ce0
Merge pull request #29295 from valentindavid/valentindavid/sysupdate-patterns-in-directory
sysupdate: Allow patterns to match path with directories
2023-09-28 15:18:45 +01:00
Luca Boccassi 8e78e3b620
Merge pull request #29359 from poettering/bootctl-uki-measured
bootctl: show whether we booted in a measured UKI in status output (plus some minor other stuff)
2023-09-28 15:18:12 +01:00
Luca Boccassi 2c0ca3e398 docs: note root storage daemons can now also use SurviveFinalKillSignal=yes 2023-09-28 13:48:14 +01:00
Frantisek Sumsal d37b9154a7 test: check soft-reboot behavior wrt argv[0][0] == '@' 2023-09-28 13:48:14 +01:00
Luca Boccassi 559214cbbd pid1: add SurviveFinalKillSignal= to skip units on final sigterm/sigkill spree
Add a new boolean for units, SurviveFinalKillSignal=yes/no. Units that
set it will not have their processes receive the final sigterm/sigkill in
the shutdown phase.

This is implemented by checking if a process is part of a cgroup marked
with a user.survive_final_kill_signal xattr (or a trusted xattr if we
can't set a user one; user xattrs on cgroups were added only in kernel
v5.7 and are not supported in CentOS 8).
2023-09-28 13:48:14 +01:00
Lennart Poettering 69feab97f9 update TODO 2023-09-28 13:22:45 +02:00
Zbigniew Jędrzejewski-Szmek 9ec4f7c7a4 exec-util: print executed commands in do_execute()
kernel-install uses do_execute(). We would log whenever a spawned child
finished, but we would not log anything when the child is launched. When the
children log output without a prefix (as the kernel-install plugins do), it
is hard to see where that output is coming from.
2023-09-28 12:46:22 +02:00
Zbigniew Jędrzejewski-Szmek eb25844f83 kernel-install: describe usage as installkernel
For us, this is a compatibility mode, but most likely it is there to stay: the
kernel Makefile's install target expects to be able to call /bin/installkernel.
We want people who build their own kernels to use this, so that they use
kernel-install and get support for all the functionality provided by it,
including building of UKIs and other new features. So let's actually advertise
that this exists and works.
2023-09-28 12:40:28 +02:00
Bertrand Jacquin bdf58b47c3 resolved: never respond to .alt pseudo-TLD.
From RFC 9476:

Because names beneath .alt are in an alternative namespace, they have no
significance in the regular DNS context. DNS stub and recursive
resolvers do not need to look them up in the DNS context.

See: https://datatracker.ietf.org/doc/html/rfc9476#name-the-alt-namespace
2023-09-28 12:07:47 +02:00
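A minimal sketch of the kind of name check this implies, assuming names are plain ASCII strings with an optional trailing dot. `example_name_is_below_alt()` is a hypothetical function and does not reflect how resolved represents or matches DNS names internally:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: returns true if `name` is "alt" itself or ends in the
 * label "alt", compared case-insensitively. A single trailing dot (fully
 * qualified form) is ignored. */
bool example_name_is_below_alt(const char *name) {
        assert(name);

        size_t n = strlen(name);
        if (n > 0 && name[n - 1] == '.')
                n--;

        if (n < 3 || strncasecmp(name + n - 3, "alt", 3) != 0)
                return false;

        /* Require a label boundary so that "example.salt" does not match. */
        return n == 3 || name[n - 4] == '.';
}
```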
Lennart Poettering 4ed9e2619c bootctl: highlight SecureBoot enabled state in green 2023-09-28 12:07:15 +02:00
Lennart Poettering a730a8f608 bootctl: if we can't access the ESP, show this in regular status output 2023-09-28 12:07:15 +02:00
Mike Yuan b92abd712e
Merge pull request #29333 from YHNdnzj/systemctl-warn-half-masked
systemctl-enable: warn if disabled/masked unit has active triggering units
2023-09-28 17:58:21 +08:00
Valentin David deafbeb0b9
sysupdate: Add documentation for new MatchPattern behavior 2023-09-28 11:41:29 +02:00
Valentin David 8b051623cd
sysupdate: Allow patterns to match path with directories
`MatchPattern` for regular-file and directory targets can now match
subdirectories. This is useful to install files, for example, in `.extra.d`
directories:

```
[Target]
Type=regular-file
Path=/EFI/Linux
PathRelativeTo=boot
MatchPattern=gnomeos_@v.efi.extra.d/apparmor.addon.efi
```

If the directories in the pattern do not exist, they will be created, whereas
the part in `Path` is not created.
2023-09-28 11:41:29 +02:00
Lennart Poettering fa1f3aec33 bootctl: report if we have been booted with a measured UKI
Just expose the result of efi_measured_uki() to the user.
2023-09-28 10:33:00 +02:00
Mike Yuan d708bb7c02
systemctl-enable: warn if disabled/masked units have active triggering units
Closes #311
2023-09-28 05:24:51 +08:00
Mike Yuan 0b675f97d6
systemctl-start: suppress the triggering unit warning when --no-warn 2023-09-28 05:24:51 +08:00
Mike Yuan 002db03f54
systemctl: clean up check_triggering_units
Preparation for #311
2023-09-28 05:24:51 +08:00
Mike Yuan 6ea32f61f3
systemctl: make unit_is_masked always query manager 2023-09-28 05:24:51 +08:00
Mike Yuan c36c81e467
systemctl: don't duplicate string needlessly 2023-09-28 05:14:42 +08:00
Mike Yuan 1f998158a9
systemctl: reflect that statically enabled units can be in .upholds/
Follow-up for 38f901791f
2023-09-28 05:14:42 +08:00
Luca Boccassi 89e7b9652b
Merge pull request #29353 from YHNdnzj/nft-followup
man/org.freedesktop.systemd1: add version info for NFTSet
2023-09-27 21:02:43 +01:00
Mike Yuan 05ae788d28
Merge pull request #29265 from YHNdnzj/sleep-util-refactor
sleep-util: split into three and first round of cleanups
2023-09-28 03:06:48 +08:00
Mike Yuan 6bd8340d11
man/org.freedesktop.systemd1: add version info for NFTSet
Follow-up for dc7d69b3c1
2023-09-28 03:04:28 +08:00
Mike Yuan 95f7492875
core/unit: use RET_GATHER in one more function 2023-09-28 03:00:13 +08:00
Topi Miettinen 435d523956 test: testing for core NFTSet= feature 2023-09-27 18:10:11 +00:00
Topi Miettinen 3bb48b19bd core: add user and group to NFTSet=
The benefit of using this setting is that user and group IDs, especially dynamic and random
IDs used by DynamicUser=, can be used in firewall configuration easily.

Example:

```
[Service]
NFTSet=user:inet:filter:serviceuser
```

Corresponding NFT rules:

```
table inet filter {
        set serviceuser {
                typeof meta skuid
        }
        chain service_output {
                meta skuid @serviceuser accept
                drop
        }
}
```

```
$ cat /etc/systemd/system/dunft.service
[Service]
DynamicUser=yes
NFTSet=user:inet:filter:serviceuser
ExecStart=/bin/sleep 1000

[Install]
WantedBy=multi-user.target
$ sudo nft list set inet filter serviceuser
table inet filter {
        set serviceuser {
                typeof meta skuid
                elements = { 64864 }
        }
}
$ ps -n --format user,group,pid,command -p `systemctl show dunft.service -P MainPID`
    USER    GROUP     PID COMMAND
   64864    64864   55158 /bin/sleep 1000
```
2023-09-27 18:10:11 +00:00
Topi Miettinen dc7d69b3c1 core: firewall integration of cgroups with NFTSet=
The new directive `NFTSet=` provides a method for integrating dynamic cgroup IDs
into firewall rules with NFT sets. The benefit of this setting is that a
control group can easily be used as a selector in firewall rules, which in
turn allows more fine-grained filtering. NFT rules for cgroup matching
use numeric cgroup IDs, which change every time a service is restarted, making
them hard to use in a systemd environment.

This option expects a whitespace separated list of NFT set definitions. Each
definition consists of a colon-separated tuple of source type (only "cgroup"),
NFT address family (one of "arp", "bridge", "inet", "ip", "ip6", or "netdev"),
table name and set name. The names of tables and sets must conform to lexical
restrictions of NFT table names. The type of the element used in the NFT filter
must be "cgroupsv2". When a control group for a unit is realized, the cgroup ID
will be appended to the NFT sets and it will be removed when the control
group is removed.  systemd only inserts elements to (or removes from) the sets,
so the related NFT rules, tables and sets must be prepared elsewhere in
advance.  Failures to manage the sets will be ignored.

If the firewall rules are reinstalled so that the contents of NFT sets are
destroyed, the command `systemctl daemon-reload` can be used to refill the sets.

Example:

```
table inet filter {
...
        set timesyncd {
                type cgroupsv2
        }

        chain ntp_output {
                socket cgroupv2 != @timesyncd counter drop
                accept
        }
...
}
```

/etc/systemd/system/systemd-timesyncd.service.d/override.conf
```
[Service]
NFTSet=cgroup:inet:filter:timesyncd
```

```
$ sudo nft list set inet filter timesyncd
table inet filter {
        set timesyncd {
                type cgroupsv2
                elements = { "system.slice/systemd-timesyncd.service" }
        }
}
```
2023-09-27 18:10:11 +00:00
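The colon-separated tuple format described in dc7d69b3c1 (and extended with the "user" and "group" source types in 3bb48b19bd) can be sketched as a small parser. `ExampleNFTSet`, `example_parse_nft_set()` and the 64-byte field limit are hypothetical choices for this illustration, not systemd's parser. The sketch checks the source type and address family but does not enforce the lexical restrictions on table and set names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define EXAMPLE_FIELD_MAX 64    /* arbitrary limit chosen for this sketch */

typedef struct ExampleNFTSet {
        char source[EXAMPLE_FIELD_MAX];
        char family[EXAMPLE_FIELD_MAX];
        char table[EXAMPLE_FIELD_MAX];
        char set[EXAMPLE_FIELD_MAX];
} ExampleNFTSet;

/* Returns true if `s` equals one of the strings in the NULL-terminated list. */
static bool example_in_list(const char *s, const char *const *list) {
        for (; *list; list++)
                if (strcmp(s, *list) == 0)
                        return true;
        return false;
}

/* Parse "source:family:table:set". Returns 0 on success, -1 on malformed
 * input. On failure the contents of *ret are unspecified. */
int example_parse_nft_set(const char *s, ExampleNFTSet *ret) {
        static const char *const sources[] = { "cgroup", "user", "group", NULL };
        static const char *const families[] = { "arp", "bridge", "inet", "ip", "ip6", "netdev", NULL };

        assert(s);
        assert(ret);

        char *dst[4] = { ret->source, ret->family, ret->table, ret->set };

        for (size_t i = 0; i < 4; i++) {
                size_t n = strcspn(s, ":");     /* length of the current field */
                if (n == 0 || n >= EXAMPLE_FIELD_MAX)
                        return -1;

                memcpy(dst[i], s, n);
                dst[i][n] = '\0';
                s += n;

                if (i < 3) {
                        if (*s != ':')          /* too few fields */
                                return -1;
                        s++;
                } else if (*s != '\0')          /* too many fields */
                        return -1;
        }

        if (!example_in_list(ret->source, sources) ||
            !example_in_list(ret->family, families))
                return -1;

        return 0;
}
```

Since the option takes a whitespace-separated list, a caller would split the value on whitespace first and pass each definition, such as `cgroup:inet:filter:timesyncd`, to the parser separately.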
Lennart Poettering b28bd48238 update TODO 2023-09-27 19:08:56 +02:00
Lennart Poettering e92768004e core: generalize service_arm_timer() for all unit types 2023-09-27 17:37:02 +02:00
Lennart Poettering c5acfe18fb scope: also modernize state machine logging 2023-09-27 17:35:50 +02:00
Lennart Poettering 4d7da557e2 path: also modernize path state machine logging 2023-09-27 17:34:42 +02:00
Lennart Poettering e7912a08b4 timer: also modernize timer state machine error logging 2023-09-27 17:33:30 +02:00
Lennart Poettering bfeb10911e automount: also modernize log logic 2023-09-27 17:32:36 +02:00