Commit graph

99 commits

Author SHA1 Message Date
Zbigniew Jędrzejewski-Szmek a37454bd90 man: update links to "API File Systems" 2024-05-28 14:48:56 +02:00
Zbigniew Jędrzejewski-Szmek d5c17aceb3 various: update links to more wiki pages 2024-05-28 14:48:53 +02:00
Daan De Meyer 7a66f21556 core: Add systemd.crash_action= kernel command line argument
Required for integration tests to power off on PID 1 crashes. We
deprecate systemd.crash_reboot and related options by removing them
from the documentation but still parsing them.
2024-04-29 14:34:22 +02:00
Jakub Sitnicki 97df75d7bd socket: pass socket FDs to all ExecXYZ= commands but ExecStartPre=
Today listen file descriptors created by socket unit don't get passed to
commands in Exec{Start,Stop}{Pre,Post}= socket options.

This prevents ExecXYZ= commands from accessing the created socket FDs to do
any kind of system setup which involves the socket but is not covered by
existing socket unit options.

One concrete example is to insert a socket FD into a BPF map capable of
holding socket references, such as BPF sockmap/sockhash [1] or
reuseport_sockarray [2]. Or, similarly, send the file descriptor with
SCM_RIGHTS to another process, which has access to a BPF map for storing
sockets.

To unblock this use case, pass ListenXYZ= file descriptors to ExecXYZ=
commands as listen FDs [4]. As an exception, ExecStartPre= command does not
inherit any file descriptors because it gets invoked before the listen FDs
are created.

This new behavior can potentially break existing configurations. Commands
invoked from ExecXYZ= might not expect to inherit file descriptors through
sd_listen_fds protocol.

To prevent breakage, add a new socket unit parameter,
PassFileDescriptorsToExec=, to control whether ExecXYZ= programs inherit
listen FDs.

[1] https://docs.kernel.org/bpf/map_sockmap.html
[2] https://lore.kernel.org/r/20180808075917.3009181-1-kafai@fb.com
[3] https://man.archlinux.org/man/socket.7#SO_INCOMING_CPU
[4] https://www.freedesktop.org/software/systemd/man/latest/sd_listen_fds.html
2024-03-27 01:41:26 +08:00
Sam Leonard f31cff849d
journald: implement socket forwarding
This commit adds a new way of forwarding journal messages - forwarding
over a socket.

The socket can be any of AF_INET, AF_INET6, AF_UNIUX or AF_VSOCK.

The address to connect to is retrieved from the "journald.forward_address" credential.

It can also be specified in systemd-journald's unit file with ForwardAddress=
2024-02-15 14:08:20 +00:00
Nick Cao 4be1fc8443 network: Add L3MasterDevice= into routing policy 2024-01-19 00:17:50 +00:00
Raito Bezarius b49595503d networkd: support proxy_arp_pvlan sysctl
The proxy ARP private VLAN sysctl is useful for VLAN aggregation, see
https://sysctl-explorer.net/net/ipv4/proxy_arp_pvlan/ for details.
2023-12-24 03:40:03 +09:00
Luca Boccassi 9e615fa3aa core: add WantsMountsFor=
This is the equivalent of RequiresMountsFor=, but adds Wants= instead
of Requires=. It will be useful for example for the autogenerated
systemd-cryptsetup units.

Fixes https://github.com/systemd/systemd/issues/11646
2023-11-29 11:04:59 +00:00
Zbigniew Jędrzejewski-Szmek 37edb704f9 test: shorten sample names, drop numerical prefixes
We don't care about the ordering, so we may just as well drop the numerical
prefixes that we normally use for sorting. Also rename some other samples
to keep width of output down to reasonable width.
2023-09-02 17:32:19 +03:00
Daan De Meyer 9c0c670125 core: Add RootEphemeral= setting
This setting allows services to run in an ephemeral copy of the root
directory or root image. To make sure the ephemeral copies are always
cleaned up, we add a tmpfiles snippet to unconditionally clean up
/var/lib/systemd/ephemeral. To prevent in use ephemeral copies from
being cleaned up by tmpfiles, we use the newly added COPY_LOCK_BSD
and BTRFS_SNAPSHOT_LOCK_BSD flags to take a BSD lock on the ephemeral
copies which instruct tmpfiles to not touch those ephemeral copies as
long as the BSD lock is held.
2023-06-21 12:48:46 +02:00
Zbigniew Jędrzejewski-Szmek e2e736cbbd fuzz: rename long samples
With those long filenames, output doesn't fit on the terminal.
2023-05-18 15:23:27 +02:00
Mike Yuan b5b1351317
test: add tests for UpheldBy= in [Install] section 2023-05-15 15:04:38 +08:00
Yu Watanabe 054749e413 core: add missing MemoryPressureWatch= and MemoryPressureThresholdSec= setting
Follow-up for #26393.

Addresses https://github.com/systemd/systemd/pull/26393#issuecomment-1458655798.
2023-03-09 23:43:04 +09:00
Quentin Deslandes 523ea1237a journal: log filtering options support in PID1
Define new unit parameter (LogFilterPatterns) to filter logs processed by
journald.

This option is used to store a regular expression which is carried from
PID1 to systemd-journald through a cgroup xattrs:
`user.journald_log_filter_patterns`.
2022-12-15 09:57:39 +00:00
Pasha Vorobyev d7fe0a6723 MemoryZSwapMax directive to configure new memory.zswap.max cgroup file 2022-11-15 21:15:37 +01:00
Michal Koutný 7e343b530e meson: Generate fuzzer inputs with directives
The lists of directives for fuzzer tests are maintained manually in the
repo. There is a tools/check-directives.sh script that runs during test
phase and reports stale directive lists.
Let's rework the script into a generator so that these directive files
are created on-the-flight and needn't be updated whenever a unit file
directives change. The scripts is rewritten in Python to get rid of gawk
dependency and each generated file is a separate meson target so that
incremental builds refresh what is just necessary (and parallelize
(negligible)).

Note: test/fuzz/fuzz-unit-file/directives-all.slice is kept since there
is not automated way to generate it (it is not covered by the check
script neither).
2022-10-20 14:43:50 +02:00
Lennart Poettering 351f7d5143 fuzz: add ConditionCredential= to fuzz files, and sort their sections 2022-07-15 10:53:45 +02:00
nl6720 0e68582323 tree-wide: link to docs.kernel.org for kernel documentation
https://www.kernel.org/ links to https://docs.kernel.org/ for the documentation.
See https://git.kernel.org/pub/scm/docs/kernel/website.git/commit/?id=ebc1c372850f249dd143c6d942e66c88ec610520

These URLs are shorter and nicer looking.
2022-07-04 19:56:53 +02:00
Yu Watanabe b48ed70c79 Revert NFTSet feature
This reverts PR #22587 and its follow-up commit. More specifically,
2299b1cae3 (partially),
e176f85527,
ceb46a31a0, and
51bb9076ab.

The PR was merged without final approval, and has several issues:
- OSS fuzz reported issues in the conf parser,
- It calls synchrnous netlink call, it should not be especially in PID1,
- The importance of NFTSet for CGroup and DynamicUser may be
  questionable, at least, there was no justification PID1 should support
  it.
- For networkd, it should be implemented with Request object,
- There is no test for the feature.

Fixes #23711.
Fixes #23717.
Fixes #23719.
Fixes #23720.
Fixes #23721.
Fixes #23759.
2022-06-22 22:23:58 +09:00
Topi Miettinen 46c3b1ff88 core: firewall integration with DynamicUserNFTSet=
New directive `DynamicUserNFTSet=` provides a method for integrating
configuration of dynamic users into firewall rules with NFT sets.

Example:
```
table inet filter {
        set u {
                typeof meta skuid
        }

        chain service_output {
                meta skuid != @u drop
                accept
        }
}
```

```
/etc/systemd/system/dunft.service
[Service]
DynamicUser=yes
DynamicUserNFTSet=inet:filter:u
ExecStart=/bin/sleep 1000

[Install]
WantedBy=multi-user.target
```

```
$ sudo nft list set inet filter u
table inet filter {
        set u {
                typeof meta skuid
                elements = { 64864 }
        }
}
$ ps -n --format user,group,pid,command -p `pgrep sleep`
    USER    GROUP     PID COMMAND
   64864    64864   55158 /bin/sleep 1000
```
2022-06-08 16:12:25 +00:00
Topi Miettinen c0548df0a2 core: firewall integration with ControlGroupNFTSet=
New directive `ControlGroupNFTSet=` provides a method for integrating services
into firewall rules with NFT sets.

Example:

```
table inet filter {
...
        set timesyncd {
                type cgroupsv2
        }

        chain ntp_output {
                socket cgroupv2 != @timesyncd counter drop
                accept
        }
...
}
```

/etc/systemd/system/systemd-timesyncd.service.d/override.conf
```
[Service]
ControlGroupNFTSet=inet:filter:timesyncd
```

```
$ sudo nft list set inet filter timesyncd
table inet filter {
        set timesyncd {
                type cgroupsv2
                elements = { "system.slice/systemd-timesyncd.service" }
        }
}
```
2022-06-08 16:12:25 +00:00
Benjamin Franzke 92897d768d tree-wide: replace obsolete wiki links with systemd.io/manpages
All wiki pages that contain a deprecation banner
pointing to systemd.io or manpages are updated to
point to their replacements directly.

Helpful command for identification of available links:
git grep freedesktop.org/wiki | \
    sed "s#.*\(https://www.freedesktop.org/wiki[^ $<'\\\")]*\)\(.*\)#\\1#" | \
    sort | uniq
2022-05-21 14:29:14 +02:00
Eduard Tolosa bb5824c9ab
Add ConditionCPUFeature to load-fragment-gperf.gperf (#23076)
Fixes #23075
2022-04-14 15:30:03 +09:00
Luca Boccassi aff3a9e1fa watchdog: add setting to configure pretimeout governor 2022-02-22 17:19:54 +00:00
Curtis Klein 5717062e93 watchdog: Add watchdog pretimeout support
Add support for managing and configuring watchdog pretimeout values if
the watchdog hardware supports it. The ping interval is adjusted to
account for a pretimeout so that it will still ping at half the timeout
interval before a pretimeout event would be triggered. By default the
pretimeout defaults to 0s or disabled.

The RuntimeWatchdogPreSec config option is added to allow the pretimeout
to be specified (similar to RuntimeWatchdogSec). The
RuntimeWatchdogPreUSec dbus property is added to override the pretimeout
value at runtime (similar to RuntimeWatchdogUSec). Setting the
pretimeout to 0s will disable the pretimeout.
2022-02-22 17:19:54 +00:00
Alvin Šipraga 19ff06b3a4
udev/net: support Match.Firmware= in .link files (#22462)
In cbcdcaaa0e ("Add support for conditions on the machines firmware")
a new Firmware= directive was added for .netdev and .network files.
While it was also documented to work on .link files, in actual fact the
support was missing. Add that one extra line to make it work, and also
update the fuzzer directives.
2022-02-10 16:19:28 +09:00
Santa Wiryaman 97f27f8a16 Add support for isolated parameter
Add the "Isolated" parameter in the *.network file, e.g.,

[Bridge]
Isolated=true|false

When the Isolated parameter is true, traffic coming out of this port
will only be forward to other ports whose Isolated parameter is false.

When Isolated is not specified, the port uses the kernel default
setting (false).

The "Isolated" parameter was introduced in Linux 4.19.
See man bridge(8) for more details.
But even though the kernel and bridge/iproute2 recognize the "Isolated"
parameter, systemd-networkd did not have a way to set it.
2022-02-09 17:37:37 +09:00
Luca Boccassi a07b992606 core: add ExtensionDirectories= setting
Add a new setting that follows the same principle and implementation
as ExtensionImages, but using directories as sources.
It will be used to implement support for extending portable images
with directories, since portable services can already use a directory
as root.
2022-01-21 22:53:12 +09:00
Luca Boccassi 47dba9fb09 path unit: add TriggerLimitBurst= and TriggerLimitIntervalSec=
Given there's now a default for these settings, also allow users to configure
them, matching socket units
2021-12-18 23:17:53 +00:00
Luca Boccassi 81513b382b core: add Condition[Memory/CPU/IO]Pressure
By default checks PSI on /proc/pressure, and causes a unit to be skipped
if the threshold is above the given configuration for the avg300
measurement.
Also allow to pass a custom timespan, and a particular slice unit to
check under.

Fixes #20139
2021-12-01 09:53:18 +01:00
Andrew Stone 7c5cef2211 core/automount: Add ExtraOptions field 2021-11-23 09:44:35 +01:00
Slava Bacherikov af493fb742 network: Add SuppressInterfaceGroup= into routing policy
This adds SuppressInterfaceGroup= option in the [RoutingPolicyRule] section
which has the same semantics as suppress_ifgroup in `ip rule` command.
2021-11-16 01:54:07 +09:00
Zbigniew Jędrzejewski-Szmek e2de2d28f4
Merge pull request #20813 from unusual-thoughts/exittype_v2
Reintroduce ExitType
2021-11-08 15:06:37 +01:00
Henri Chain 596e447076 Reintroduce ExitType
This introduces `ExitType=main|cgroup` for services.
Similar to how `Type` specifies the launch of a service, `ExitType` is
concerned with how systemd determines that a service exited.

- If set to `main` (the current behavior), the service manager will consider
  the unit stopped when the main process exits.

- The `cgroup` exit type is meant for applications whose forking model is not
  known ahead of time and which might not have a specific main process.
  The service will stay running as long as at least one process in the cgroup
  is running. This is intended for transient or automatically generated
  services, such as graphical applications inside of a desktop environment.

Motivation for this is #16805. The original PR (#18782) was reverted (#20073)
after realizing that the exit status of "the last process in the cgroup" can't
reliably be known (#19385)

This version instead uses the main process exit status if there is one and just
listens to the cgroup empty event otherwise.

The advantages of a service with `ExitType=cgroup` over scopes are:
- Integrated logging / stdout redirection
- Avoids the race / synchronisation issue between launch and scope creation
- More extensive use of drop-ins and thus distro-level configuration:
  by moving from scopes to services we can have drop ins that will affect
  properties that can only be set during service creation,
  like `OOMPolicy` and security-related properties
- It makes systemd-xdg-autostart-generator usable by fixing [1], as obviously
  only services can be used in the generator, not scopes.

[1] https://bugs.kde.org/show_bug.cgi?id=433299
2021-11-08 10:15:23 +01:00
Daan De Meyer 51462135fb exec: Add TTYRows and TTYColumns properties to set TTY dimensions 2021-11-05 21:32:14 +00:00
Iago López Galeiras e59ccd035c core: add RestrictFileSystems= fragment parser
It takes an allow or deny list of filesystems services should have
access to.
2021-10-06 10:52:14 +02:00
Albert Brox 5918a93355 core: implement RuntimeMaxDeltaSec directive 2021-09-28 16:46:20 +02:00
alexlzhu 8c35c10d20 core: Add ExecSearchPath parameter to specify the directory relative to which binaries executed by Exec*= should be found
Currently there does not exist a way to specify a path relative to which
all binaries executed by Exec should be found. The only way is to
specify the absolute path.

This change implements the functionality to specify a path relative to which
binaries executed by Exec*= can be found.

Closes #6308
2021-09-28 14:52:27 +01:00
Peter Morrow 1b75e5f343 fuzz: list directives in alphabetical order 2021-09-24 14:43:01 +01:00
Peter Morrow 88a56dc8d6 fuzz: add StartupAllowedCPUs and StartupAllowedMemoryNodes to directives
Signed-off-by: Peter Morrow <pemorrow@linux.microsoft.com>
2021-09-15 09:52:12 +01:00
Mauricio Vásquez 4f0c25c794 core: add load fragment implementation for RestrictNetworkInterfaces=
Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
2021-08-18 15:55:53 -05:00
Lennart Poettering 43144be4a1 pid1: add support for encrypted credentials 2021-07-08 09:30:56 +02:00
Zbigniew Jędrzejewski-Szmek abaf5edd08 Revert "Introduce ExitType"
This reverts commit cb0e818f7c.

After this was merged, some design and implementation issues were discovered,
see the discussion in #18782 and #19385. They certainly can be fixed, but so
far nobody has stepped up, and we're nearing a release. Hopefully, this feature
can be merged again after a rework.

Fixes #19345.
2021-06-30 21:56:47 +02:00
Luca Boccassi 1e26f8a60b core: add ConditionOSRelease= directive 2021-06-24 13:57:48 +01:00
Lennart Poettering 0bc488c99a core: implement Uphold= dependency type
This is like a really strong version of Wants=, that keeps starting the
specified unit if it is ever found inactive.

This is an alternative to Restart= inside a unit, acknowledging the fact
that whether to keep restarting the unit is sometimes not a property of
the unit itself but the state of the system.

This implements a part of what #4263 requests. i.e. there's no
distinction between "always" and "opportunistic". We just dumbly
implement "always" and become active whenever we see no job queued for
an inactive unit that is supposed to be upheld.
2021-05-25 16:03:03 +02:00
Lennart Poettering 294446dcb9 core: add new OnSuccess= dependency type
This is similar to OnFailure= but is activated whenever a unit returns
into inactive state successfully.

I was always afraid of adding this, since it effectively allows building
loops and makes our engine Turing complete, but it pretty much already
was it was just hidden.

Given that we have per-unit ratelimits as well as an event loop global
ratelimit I feel safe to add this finally, given it actually is useful.

Fixes: #13386
2021-05-25 16:03:03 +02:00
Lennart Poettering ffec78c05b core: add new PropagateStopTo= dependency (and inverse)
This takes inspiration from PropagatesReloadTo=, but propagates
stop jobs instead of restart jobs.

This is defined based on exactly two atoms: UNIT_ATOM_PROPAGATE_STOP +
UNIT_ATOM_RETROACTIVE_STOP_ON_STOP. The former ensures that when the
unit the dependency is originating from is stopped based on user
request, we'll propagate the stop job to the target unit, too. In
addition, when the originating unit suddenly stops from external causes
the stopping is propagated too. Note that this does *not* include the
UNIT_ATOM_CANNOT_BE_ACTIVE_WITHOUT atom (which is used by BoundBy=),
i.e. this dependency is purely about propagating "edges" and not
"levels", i.e. it's about propagating specific events, instead of
continious states.

This is supposed to be useful for dependencies between .mount units and
their backing .device units. So far we either placed a BindsTo= or
Requires= dependency between them. The former gave a very clear binding
of the to units together, however was problematic if users establish
mounnts manually with different block device sources than our
configuration defines, as we there might come to the conclusion that the
backing device was absent and thus we need to umount again what the user
mounted. By combining Requires= with the new StopPropagatedFrom= (i.e.
the inverse PropagateStopTo=) we can get behaviour that matches BindsTo=
in every single atom but one: UNIT_ATOM_CANNOT_BE_ACTIVE_WITHOUT is
absent, and hence the level-triggered logic doesn't apply.

Replaces: #11340
2021-05-25 16:03:03 +02:00
Zbigniew Jędrzejewski-Szmek 3968ccd0cd core: fix crash in BPFProgram parsing
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=33270.
2021-05-05 17:15:04 +02:00
Zbigniew Jędrzejewski-Szmek cc87b3f68f core: fix crash in parsing of SocketBind{Allow,Deny}=
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=33876.
2021-05-05 17:14:58 +02:00
Uwe Kleine-König cbcdcaaa0e Add support for conditions on the machines firmware
This allows to limit units to machines that run on a certain firmware
type. For device tree defined machines checking against the machine's
compatible is also possible.
2021-04-28 10:55:55 +02:00