Background
==========
Imagine you run a container on your machine. Then the routing table
might look like:
default via 10.0.10.1 dev eth0 proto dhcp metric 100
10.0.10.0/28 dev eth0 proto kernel scope link src 10.0.10.5 metric 100
[...]
10.42.0.0/24 via 10.42.0.0 dev flannel.1 onlink
10.42.1.2 dev cali02ad7e68ce1 scope link
10.42.1.3 dev cali8fcecf5aaff scope link
10.42.2.0/24 via 10.42.2.0 dev flannel.1 onlink
10.42.3.0/24 via 10.42.3.0 dev flannel.1 onlink
That is, there are another interfaces with subnets and specific routes.
If nm-cloud-setup now configures rules:
0: from all lookup local
30400: from 10.0.10.5 lookup 30400
32766: from all lookup main
32767: from all lookup default
and
default via 10.0.10.1 dev eth0 table 30400 proto static metric 10
10.0.10.1 dev eth0 table 30400 proto static scope link metric 10
then these other subnets will also be reached via the default route.
This container example is just one case where this is a problem. In
general, if you have specific routes on another interface, then the
default route in the 30400+ table will interfere badly.
The idea of nm-cloud-setup is to automatically configure the network for
secondary IP addresses. When the user has special requirements, then
they should disable nm-cloud-setup and configure whatever they want.
But the container use case is popular and important. It is not something
where the user actively configures the network. This case needs to work better,
out of the box. In general, nm-cloud-setup should work better with the
existing network configuration.
Change
======
Add new routing tables 30200+ with the individual subnets of the
interface:
10.0.10.0/24 dev eth0 table 30200 proto static metric 10
[...]
default via 10.0.10.1 dev eth0 table 30400 proto static metric 10
10.0.10.1 dev eth0 table 30400 proto static scope link metric 10
Also add more important routing rules with priority 30200+, which select
these tables based on the source address:
30200: from 10.0.10.5 lookup 30200
These will do source based routing for the subnets on these
interfaces.
Then, add a rule with priority 30350
30350: lookup main suppress_prefixlength 0
which processes the routes from the main table, but ignores the default
routes. 30350 was chosen, because it's in between the rules 30200+ and
30400+, leaving a range for the user to configure their own rules.
Then, as before, the rules 30400+ again look at the corresponding 30400+
table, to find a default route.
Finally, process the main table again, this time honoring the default
route. That is for packets that have a different source address.
This change means that the source based routing is used for the
subnets that are configured on the interface and for the default route.
Whereas, if there are any more specific routes in the main table, they will
be preferred over the default route.
Apparently Amazon Linux solves this differently, by not configuring a
routing table for addresses on interface "eth0". That might be an
alternative, but it's not clear to me what is special about eth0 to
warrant this treatment. It also would imply that we somehow recognize
this primary interface. In practise that would be doable by selecting
the interface with "iface_idx" zero.
Instead choose this approach. This is remotely similar to what WireGuard does
for configuring the default route ([1]), however WireGuard uses fwmark to match
the packets instead of the source address.
[1] https://www.wireguard.com/netns/#improved-rule-based-routing
The table number is chosen as 30400 + iface_idx. That is, the range is
limited and we shouldn't handle more than 100 devices. Add a check for
that and error out.
The routes/rules that are configured are independent of the
order in which we process the devices. That is, because they
use the "iface_idx" for cases where there is ambiguity.
Still, it feels nicer to always process them in a defined order.
Sorted by iface_idx. The iface_idx is probably something useful and
stable, provided by the provider. E.g. it's the order in which
interfaces are exposed on the meta data.
get-config() gives a NMCSProviderGetConfigResult structure, and the
main part of data is the GHashTable of MAC addresses and
NMCSProviderGetConfigIfaceData instances.
Let NMCSProviderGetConfigIfaceData also have a reference to the MAC
address. This way, I'll be able to create a (sorted) list of interface
datas, that also contain the MAC address.
nm-cloud-setup automatically configures the network. That may conflict
with what the user wants. In case the user configures some specific
setup, they are encouraged to disable nm-cloud-setup (and its
automatism).
Still, what we do by default matters, and should play as well with
user's expectations. Configuring policy routing and a higher priority
table (30400+) that hijacks the traffic can cause problems.
If the system only has one IPv4 address and one interface, then there
is no point in configuring policy routing at all. Detect that, and skip
the change in that case.
Note that of course we need to handle the case where previously multiple
IP addresses were configured and an update gives only one address. In
that case we need to clear the previously configured rules/routes. The
patch achieves this.
Now that we return a struct from get_config(), we can have system-wide
properties returned.
Let it count and cache the number of valid iface_datas.
Currently that is not yet used, but it will be.
Returning a struct seems easier to understand, because then the result
is typed.
Also, we might return additional results, which are system wide and not
per-interface.
The majority of times when I call this script, I want it to do the reformatting,
not the check-only mode. This is also because we use git, so I start with a
clean working directory and run the reformatting code. In the best case, there
is nothing to reformat, and all is good. I seldom want to only check.
Change the default of the script.
"nm-code-format.sh" is going to change the default behavior from "-n" to
"-i", that is, from check-only to reformat. Explicitly pass "-n" where
we want it.
"nm-code-format.sh" is going to change the default behavior from "-n" to
"-i", that is, from check-only to reformat. Explicitly pass "-n" where
we want it.
"assuming" means to gracefully take over after restart. The result
should be a working configuration with a device fully managed by
NetworkManager.
If we are assuming, and the interface is down we still want to set it
up.
I don't think this warrants a warning. It's important to keep the number
of warnings and errors in the log low, and only print such messages if
there is really something that requires attention by the user. If you
run without /etc/network/interfaces, then this is pretty much expected
and the warning isn't going to tell you anything useful.
NML3Cfg tracks the state of each object (that is addresses and routes).
Previously, it had a boolean flag "os_in_platform", that should be
true if (and only if) we have a corresponding NMPObject in the platform
cache.
But NMPObjects are immutable and ref-counted. That means, we can just as
well track the reference to the NMPObject from the cache. The advantage
is that we have an index (dictionary) to find the object state, and by
tracking the platform object, we have it easily accessible.
NML3ConfigData is supposed to be immutable. It can be initialized from a
NMConnection, and its DNS priority property might be zero.
For the DNS priority, the value can be overwritten by global defaults.
We thus need to inject the default value at the right place.
We will use these values from NML3Cfg, and it seems wrong that NML3Cfg
would include "dns/nm-dns-manager.h" for this.
Enums are very "static". They have no logic, and there is less need to
separate the code well. Meaning, it doesn't hurt to define this enum
in "libnm-base/nm-base.h" which can be included by (almost) anybody.
There was always the idea that you could pass paths and filenames
to "nm-code-format.sh" to format only a subset. However, the script
also needs to honor files that should be excluded and don't need
formatting.
Previously, what was implemented via `git ls-files -- ':(exclude)...'`
command, but git-ls-files has a bug ([1]) and might not list all files.
Refactor and do the filtering ourselves.
[1] https://www.spinics.net/lists/git/msg397982.html
Starting with OVS plugin installed but OVS service stopped, would lead to
<trace> [1631531732.8896] ovsdb: connect: opening /run/openvswitch/db.sock failed ("error connecting socket (No such file or directory)"). Retry with nm-sudo
...
<trace> [1631531732.9751] ovsdb: connect: failure to get FD from nm-sudo: GDBus.Error:org.gtk.GDBus.UnmappedGError.Quark._g_2dio_2derror_2dquark.Code1: error connecting socket (No such file or directory)
If we already know that the socket file does not exist, we don't need to ask nm-sudo.
That would only make sense, if nm-sudo somehow saw a different file systemd than
NetworkManager, but that is (currently) not the case.
The memory layout of the NMPlatformIPAddress structure changed. The unit test
needs to be adjusted.
Fixes: 9ec9a92f17 ('platform: avoid bitfield at end of __NMPlatformIPAddress_COMMON macro')
NetworkManager (the daemon) has no defined working directory, so
it can only handle absolute path names. This is in general and also for
the LoadConnections() D-Bus call.
That means, nmcli should make relative paths absolute.
We don't use g_canonicalize_filename() because that also cleans up
double slash and "/./". I don't think we should do that in this case, we
should only prepend $PWD to make the path absolute.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/794
Due to this, `nmcli connection load` would also not print a warning
about failure to load obviously bogus files:
$ nmcli connection load /bogus
Note that load is also used to unload files, so if the file name is a
possibly valid name for a non-existing file, there is no failure. For
example, we get no warning for
$ nmcli connection load /etc/NetworkManager/system-connections/bogus
Even if currently no such file is loaded, then the operation would still
silently succeed, instead of succeeding the first time only. That is because
load should be idempotent.
[thaller@redhat.com: rewrote commit message]
Fixes: 4af6219226 ('libnm: implement nm_client_load_connections_async() by using GDBusConnection directly')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/794https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/979
NMPlatformIPAddress, NMPlatformIP4Address and NMPlatformIP6Address are supposed
to have a common first part, which is address family agnostic. For that, the
is the macro __NMPlatformIPAddress_COMMON which defines the first fields.
Something similar is also done for routes and object types that have an ifindex.
Anyway, __NMPlatformIPAddress_COMMON used to have a bitfield as last element.
In particular NMPlatformIP4Address then has a bitfield as first IPv4 specific
field. With this it's not clear to me that the alignment is guaranteed
to be the same for all structs.
Avoid the trailing bitfield at __NMPlatformIPAddress_COMMON to workaround
this potential problem.
This helper class is supposed to encapsulate most logic about
configuring IPv6 link local addresses and exposes a simpler API in order
to simplify NMDevice. Currently this logic is spread out in NMDevice.
Also, NML3IPv6LL directly uses NML3Cfg, thereby freeing NMDevice to care
about that too much.
For several reasons, NML3IPv6LL works different than NML3IPv4LL.
For one, with IPv6 we need to configure the address in kernel, which does
DAD for us. So, NML3IPv6LL will tell NML3Cfg to configure those
addresses that it wants to probe. For IPv4, it only tells NML3Cfg to do
ACD, without configuring anything yet. That is left to the caller.
NML3Cfg tracks state about all addresses/routes. It needs that (at
least) for the following reaons:
1) if a address/route gets added by NetworkManager and then gets
externally removed then it is presumed that the user did this. In this
case, we remember that ("externally-removed") to not re-add the
address/route, until we do a full reapply. This was previously
tracked as "externally_removed_objs_hash".
2) when NML3Cfg configures a address/route in kernel, and later the
address/route is no longer to be configured, then NML3Cfg needs to
delete it again. It thus needs to remember which addresses/routes
it configured earlier to remove them. This was previously tracked via
"last_addresses_x" and "last_routes_x".
3) kernel rejects configuring certain routes while a related IPv6
address is still tentative. That means, NML3Cfg needs to detect that,
remember it, and retry later. That is previously tracked as
"routes_temporary_not_available_hash".
4) during NM_L3_CFG_COMMIT_TYPE_ASSUME, we don't remove extraneous
and don't add missing addresses/routes. This commit mode is done
while assuming a device, that is, gracefully taking over after
a restart. However, sometimes while assuming a device we forcefully
want to configure an address/route. That happens for example if we
do IPv6 link local addressing. Then we really want to add that
address/route, even in assume mode. That is what the
NM_L3CFG_CONFIG_FLAGS_ASSUME_CONFIG_ONCE flag does, and to implement
that we need to track whether we already tried to add the
address/route previously. This is something new.
Consolidate these various states in a new "obj_state_hash" and
"ObjStateData" structure. This solves above points the following way:
1) to track externally removed objects, we have a flag in ObjStateData
that indicates whether the object was every configured and whether
it currently is configured. Based on that we make decisions to
configure (or not) an address. See "_obj_states_sync_filter()".
2) we now mark objects that NML3Cfg configured, which are still in platform
and which are no longer to be configured as "zombies".
3) this is now tracked via ObjStateData's "os_temporary_not_available_lst".
4) with the available ObjStateData we can make appropriate decisions
in "_obj_states_sync_filter()".
It's a bit tricky how this flag works. It's needed for IPv6
link local addresses, which commits changes in %NM_L3_CFG_COMMIT_TYPE_ASSUME
mode. See the code comments how it works.
This commit only adds the flags and let's the NMPlatformIP{Address,Route}
properly track it. What is still needed is to actually implement any
meaning to that during the sync.
This flag is only relevant for IPv4. That is, because the way we do
ACD/DAD is fundamentally different between IPv4 and IPv6. For IPv4, we
use libn-acd while IPv6 we configure the address in kernel and wait for
the tentative flag to go away.
NMPlatformIP{Address,Route} are mainly the structs that we receive via
netlink and get cached in the NMPlatform cache.
However, the same structures are also used by the upper layers to track
which addresses to add.
Add a flag to addresses and routes, for a certain behavior, relevant
during NML3Cfg commit. The idea is that during commits for NML3Cfg of
type NM_L3_CFG_COMMIT_TYPE_ASSUME, no new addresses are added that
are not already configured. In some cases, we want to override that,
and need a flag to track that. More about that later.