We want to guard against concurrent modifications of profiles. We cannot
lock profiles, so what we instead do is expose (and bump) a version ID.
The user can check the version ID, plan ahead what to do, and tell
NetworkManager to only make the modification if no concurrent
modification was done. The conflict can be detected via the version ID.
The Update2() D-Bus call gets a parameter to only allow the request if
the version ID still matches.
nmcli should use this, but it is quite some effort to retry upon
concurrent modification. This is still to do.
Note that the user might make a decision that is based on multiple
profiles. As the new version-id is only per-profile, we cannot guard
against such inter-profile modifications. What would be needed, is a
UpdateMany() call, where we could modify multiple profiles at once, and
the action only takes effect if all version IDs show no concurrent
modification. That's not done yet, and maybe never will be.
By default, bond/bridge/team devices ignore carrier, and so do their
ports. However, it can make sense to set '[device*].ignore-carrier' for
the controller device. Meaningfully support that.
This is a follow up to commit 8c91422954 ('device: handle carrier
changes for master device differently'), which didn't fully solve the
problem.
What already works, is that when you set ignore-carrier for the
controller, then after loss of carrier and a carrier wait timeout, the
controller and ports go down. If both the controller and port profiles
have autoconnect disabled, they stay down and that's it. It works as
expected, but is not very useful, because when we want to automatically
react on carrier loss, we also want to automatically reconnect.
For controller profiles, carrier only makes sense when ports are
attached. However, we can (auto) activate controller profiles without
ports. So when the user enables autoconnect for the controller profile,
then the profile will eagerly reconnect. That means, after loss of
carrier, the device goes down and reconnects right away. It means, when
configuring a bond with ignore-carrier=no and autoconnect=yes, then
the sensible thing happens (an immediate reconnect). That is just not
a useful configuration.
The useful way to configure configure ignore-carrier=no for a controller
device, autoconnect on the master must be disabled while being enabled
on the ports. After all, it's the ports that will autoconnect based on
the carrier state and bring up the controller with them.
Note that at the moment when a port decides to autoconnect, the
controller profile is not yet selected. That only happens later during
_internal_activate_device() after searching it with find_master(). At
that point, the port profile checks whether it should autoconnect based
on its own carrier state, and abort if not.
If autoconnect is aborted due to lack of carrier, the profile gets
blocked from autoconnect with reason "failed". Hence, when the carrier
returns, we need to clear any "failed" blocked reasons and schedule
another autoconnect check,
Note that this really only works if the port is itself a simple device,
like an ethernet. If the port is itself a software device (like a bond,
or a VLAN), then the carrier state in _internal_activate_device() is
unknown, and we cannot avoid autoconnect. It's unclear how that could
make sense, if at all.
This setup can be combined with "connection.autoconnect-slaves=yes". In
that case, we have the first port to autoconnect when they get carrier,
bringing up the controller too. Usually the other ports that don't have
carrier would not autoconnect, but with autoconnect-slaves they will.
The effect is, that we autoconnect whenever any of the ports has
carrier, and then we immediately also bring up the ports that don't have
carrier (which we usually would not).
https://bugzilla.redhat.com/show_bug.cgi?id=2156684https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1658
Between ppp 2.4.8 and 2.4.9, "rp-pppoe.so" was renamed to "pppoe.so" (and a
symlink created). Between 2.4.9 and 2.5.0, the symlink was dropped.
See-also: b2c36e6c0e
I guess, NetworkManager always meant to use ppp's "(rp-)pppoe.so"
plugin, and never the library from the rp-pppoe project.
If a user actually wants to use the plugin from rp-pppoe project, then
this is going to break.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1312
Fixes: afe80171b2 ('ppp: move ppp code to "nm-pppd-compat.c"')
Previously, the ppp version was only detected (and used) at one place,
in "nm-pppd-compat.c", after including the ppp headers. That was nice
and easy.
However, with that way, we could only detect it after including ppp
headers, and given the ugliness of ppp headers, we only want to include
them in "nm-pppd-compat.c" (and nowhere else).
In particular, 'nm-pppd-compat.c" uses symbols from the ppp daemon, it
thus can only be linked into a ppp plugin, not in NetworkManager core
itself. But at some places we will need to know the ppp version, outside
of the ppp plugin and "nm-pppd-compat.c".
Additionally, detect it at configure time and place it in "config.h".
There is a static assert that we are in agreement with the two ways of
detection.
This makes the code more generic, where _match_section_infos_lookup()
accepts a match_data argument. Previously, it supported two hard-coded
approaches (from-device, from-platform).
Note that _match_section_infos_lookup() still accepts either a
"match_data" or a "device" argument. It might be nicer, if
_match_section_infos_lookup() would only accept "match_data". If a
caller wants lookup by "device", they would need to call
nm_match_spec_device_data_init_from_device() first. However, it's done
this way with a separate "device" argument, because often we don't have
any configuration to search for, and the initialization of the
match-data can be saved. So for the common case where we want to lookup
by "device", we initialize the "match-data" lazy and on demand.
Struct allow named arguments, which seems easier to maintain instead of
a function with many arguments. Also, adding a new parameter does not
require changes to most of the callers.
The real advantage of this is that we encode all the search parameters
in one argument. And we can add that argument to
_match_section_infos_lookup(), alongside lookup by NMDevice or
NMPlatformLink.
All callers eventually want a boolean instead of a NMMatchSpecMatchType.
I think the NMMatchSpecMatchType enum still has value at the lower
layers, where the enum values are clearer (when reading the code). So
don't drop NMMatchSpecMatchType entirely.
However, let's add nm_match_spec_match_type_to_bool() to convert the
match-type to a boolean to avoid duplicating the code.
Arguably, a kernel link is needed for DHCP and so the interface name
univocally identifies a device (for example, the OVS interface). But
for consistency and clarity, store the device type to be used for
logging.
When logging, messages include the interface name to specify what
device they refer to. In most case the interface name is unique.
There are some devices that don't have a kernel link associated, and
their interface name is not guaranteed to be unique. This is currently
the case for OVS bridges and OVS ports. When reading a log with
duplicate interface names, it is difficult to understand what is
happening. And this is made worse by the fact that it is common
practice to assign the same name to all devices in a OVS hierarchy
(bridge, port, interface).
To make logs unambiguous, we want to print the device type together
with the name; however we don't want to *always* print the type
because in most cases it's not useful and it would consume valuable
real estate on the screen. Adopt a simple heuristic of showing the
type only for OVS devices.
This commit adds a helper function to return the device type to show
in logs, when it is needed.
This is rather bad, because if we reach the "goto again" case,
the error variable is not cleared. Subsequently passing the
error location to nm_device_reapply_finish() will trigger a glib
warning.
Fixes: 29b0420be7 ('nm-cloud-setup: set preserve-external-ip flag during reapply')
Once we start reconfiguring the system, we need to finish on all
interfaces. Otherwise, we might reconfigure some interfaces, abort
and leave the network broken. When that happens, a subsequent run
might also be unable to recover, because we are unable to reach the
HTTP meta data service.
https://bugzilla.redhat.com/show_bug.cgi?id=2207812
Fixes: 69f048bf0c ('cloud-setup: add tool for automatic IP configuration in cloud')
network-online.target should not be reached before nm-cloud-setup
completes configuring the network, which may make user service get
started before the network is fully configured.
Setting nm-cloud-setup.service as "Before=network-online.target" would
maybe have already achieved that. However, also use a pre-up dispatcher
script, so that the device activation in NetworkManager is also waiting
for nm-cloud-setup to complete.
https://bugzilla.redhat.com/show_bug.cgi?id=2151040https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1653