The old Linux used 8-bit rtm_table field of the RTM_NEWROUTE message to
specify routing table id. Modern netlink uses RTA_TABLE 32-bit attribute.
Unfortunately, there is modern software (namely bird) that would prefer
the old API as long as the routing table id fits into 8-bit.
PR: 279662
These take struct ifreq and struct in6_ifreq respectively. Passing struct
in_aliasreq or struct in6_aliasreq means we're supplying a shorter object than
expected. While this doesn't actively break things on most architectures other
than CHERI it is still wrong.
Reported by: CheriBSD
Event: Kitchener-Waterloo Hackathon 202406
Although these particular constants aren't supported, the incorrect
values break bird 2.15 operation.
PR: 277618
Reported by: Ondrej Zajicek <santiago@crfreenet.org>
Implement Netlink socket receive buffer as a simple TAILQ of nl_buf's,
same part of struct sockbuf that is used for send buffer already.
This shaves a lot of code and a lot of extra processing. The pcb rids
of the I/O queues as the socket buffer is exactly the queue. The
message writer is simplified a lot, as we now always deal with linear
buf. Notion of different buffer types goes away as way as different
kinds of writers. The only things remaining are: a socket writer and
a group writer.
The impact on the network stack is that we no longer use mbufs, so
a workaround from d187154750 disappears.
Note on message throttling. Now the taskqueue throttling mechanism
needs to look at both socket buffers protected by their respective
locks and on flags in the pcb that are protected by the pcb lock.
There is definitely some room for optimization, but this changes tries
to preserve as much as possible.
Note on new nl_soreceive(). It emulates soreceive_generic(). It
must undergo further optimization, see large comment put in there.
Note on tests/sys/netlink/test_netlink_message_writer.py. This test
boiled down almost to nothing with mbufs removed. However, I left
it with minimal functionality (it basically checks that allocating N
bytes we get N bytes) as it is one of not so many examples of ktest
framework that allows to test KPIs with python.
Note on Linux support. It got much simplier: Netlink message writer
loses notion of Linux support lifetime, it is same regardless of
process ABI. On socket write from Linux process we perform
conversion immediately in nl_receive_message() and on an output
conversion to Linux happens in in nl_send_one(). XXX: both
conversions use M_NOWAIT allocation, which used to be the case
before this change, too.
Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D42524
route add <host> -iface <netif>" for a netif without an IPv4/IPv6
address fails with EINVAL. Need to use a link-level ifaddr for gw if
an ifaddr for dst is not found as the rtsock-based implementation does.
PR: 275341
Reported by: Sean Cody <sean@tinfoilhat.ca>
Reviewed by: rcm
Tested by: rcm
Approved by: kp (mentor)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D41330
The netlink newneigh handler has the potential to leak the lock on
llentry objects in the kernel. This patch reconciles several paths
through the newneigh handler that could result in a lock leak.
MFC after: 1 week
Reviewed by: markj, kp
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D42307
Move the NETLINK define into opt_global.h so we can rely on it being
set correctly, without having to remember to include opt_netlink.h.
This ensures that the NETLINK define is correctly set. If not we
may end up with unloadable modules, due to missing symbols (such as
nlmsg_get_group_writer).
PR: 274306
Reviewed by: imp, markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D42179
The check for if_addrlen in dump_iface() is not sufficient to determine
if we still have a valid if_addr. Rather than directly accessing if_addr
check the STAILQ (for the first entry).
This avoids panics when destroying cloned interfaces as experienced with
net80211 wlan ones.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Reviewed by: jhibbits (earlier version), kp
Differential Revision: https://reviews.freebsd.org/D42027
This change exports interface capabilities using the standard
Netlink attribute type, bitset, and switches `ifconfig(8)` to use
it when displaying interface data.
Bitset comes in two representations. The first one is "compact",
where the bits are exported via two arrays - "mask" listing the
"valid" bits and "values, providing the values for those bits.
The second one is more verbose, listing each bit as a separate item,
with its name, id and value. The latter option is handy when submitting
update requests.
The support for setting capabilities will be added in the upcoming diffs.
Differential Revision: https://reviews.freebsd.org/D40331
Some context on the current IPv6 interface setup & address management:
There are two data path for IPv6 initialisation in context of assigning
LL addresses:
1) Userland explicitly requests IFF_UP for the interface w/o any addresses.
if_up() then calls in6_if_up(), which calls in6_ifattach().
The latter sets up some initial ND/IN6 state and disables IPv6 for the
interface if it’s not loopback. If the interface is loopback, then it
adds ::1/128 and LL addresses via in6_ifattach_loopback().
Then, devd notification is generated (if the VNET is the default one),
which triggers rc.network ifconfig_up(), causing ifdisabled to be removed
via SIOCSIFINFO_IN6 from ifconfig. The kernel SIOCSIFINFO_IN6 handler
calls in6_if_up() once again and it assigns the interface link-local address.
2) Userland adds IPv4 or IPv6 address to the interface. SIOCAIFADDR[_IN6]
kernel handler calls IPv4/IPv6 protocol handler to add the address.
Both then call if_ioctl() with SIOCSIFADDR. Ethernet/loopback ioctl handlers
silently sets IFF_UP for the interface. Finally, if.c:ifioctl() wrapper code
compares old and new interface flags and, if IFF_UP is added, it explicitly
calls in6_if_up(), which adds link-local address if either the original
address is IPv6 or the interface is loopback.
In the latter case, “formal” interface-up notifications are missing.
The kernel does not trigger event handler event, does not call carp hook
and does not provide any userland notification.
This diff unifies the event handling in both scenarios, providing the
necessary notifications to the kernel and userland.
Reviewed By: kp
Differential Revision: https://reviews.freebsd.org/D40332
MFC after: 2 weeks
Adding P2P addresses is complex in both ioctl and Netlink.
In the ioctl interface, "broadcast" field is the same field as the
"peer". In is possible to specify non-p2p address for the p2p
interface in IPv6, but not in IPv4.
In the Netlink interface, "address" field means "peer" address.
As a result, a common notion for the Netlink users is to submit
same address/peer for non-P2P interfaces.
This change customises mapping the attribute on per-family basis.
Specifically,
for IPv4 - if the interface is P2P, assume "address" is p2p and
"local" is the address. If the interfase is non-p2p, use "local"
attribute as the address. If it's not set, use "address" attribute.
for IPv6 - start with "local" attribute as the address. If it's not set,
use use "address" attribute. If both are set and both are the same,
assume non p2p, otherwise add as p2p.
MFC after: 2 weeks
Reported by: jkim
up.
This change fixes the case when the first address added to the interface
is IPv6 GU address. Before the change, IPv6 LL addition was not
triggered.
PR: 271661
MFC after: 2 weeks
This provides compatibility with ifioctl() version of SIOCAIFADDR.
This change is temporary until the IPv4/IPv6 address handling code
is moved to netinet[6].
It is primarily used for adding scopeid to the IPv6 link-local
sockaddrs. Having proper sockaddrs after parsing minimises the
possibility of human mistake when using the parsing.
MFC after: 2 weeks
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
Reduce the default log level for netlink to LOG_INFO. This removes a
number of messages such as
> [nl_iface] dump_sa: unsupported family: 0, skipping
or
> [nl_iface] get_operstate_ether: error calling SIOCGIFMEDIA on vlan0: 22
that are useful for debugging, but not for most users.
Reviewed by: melifaro
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D40062
* Fill in IFA_CACHEINFO with prefix lifetime data
* Map IPv6 IN6_IFF_ flags to Netlink IFA_F_ flags
* Store original ia6_flags in the FreeBSD-specific IFAF_FLAGS field
MFC after: 2 weeks
* Move LLT_ADDEDPROXY handling into lltable_link_entry() to
reduct duplication
* Use standard lltable_delete_addr() for entry deletion
* Add (forgotten) call to llt_post_resolved handler after
adding the entry via netlink.
MFC after: 2 weeks
This change adds netlink create/modify/dump interfaces to the `if_clone.c`.
The previous attempt with storing the logic inside `netlink/route/iface_drivers.c`
did not quite work, as, for example, dumping interface-specific state
(like vlan id or vlan parent) required some peeking into the private interfaces.
The new interfaces are added in a compatible way - callers don't have to do anything
unless they are extended with Netlink.
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D39032
MFC after: 1 month
Use route destination sockaddr when the gateway is eiter AF_LINK or
has the different family (IPv4 over IPv6). This change ensures
the nexthop IFA has the same family as the destination.
Reported by: Dmitriy Smirnov <fox@sage.su>
Tested by: Dmitriy Smirnov <fox@sage.su>
MFC after: 3 days
Use already-existing RTM_F_PREFIX rtm_flag to indicate that the
request assumes exact-prefix lookup instead of the
longest-prefix-match.
MFC after: 2 weeks
Use full-featured ifa_ifwithroute() to guess route ifa/ifp
instead of ifa_ifwithnet(). This change makes the route addition
logic closer to the rt_getifa_fib() used by rtsock.
Reported by: glebius
Tested by: glebius
Differential Revision: https://reviews.freebsd.org/D39335
MFC after: 2 weeks
This change does the following:
Base Netlink KPIs (ability to register the family, parse and/or
write a Netlink message) are always present in the kernel. Specifically,
* Implementation of genetlink family/group registration/removal,
some base accessors (netlink_generic_kpi.c, 260 LoC) are compiled in
unconditionally.
* Basic TLV parser functions (netlink_message_parser.c, 507 LoC) are
compiled in unconditionally.
* Glue functions (netlink<>rtsock), malloc/core sysctl definitions
(netlink_glue.c, 259 LoC) are compiled in unconditionally.
* The rest of the KPI _functions_ are defined in the netlink_glue.c,
but their implementation calls a pointer to either the stub function
or the actual function, depending on whether the module is loaded or not.
This approach allows to have only 1k LoC out of ~3.7k LoC (current
sys/netlink implementation) in the kernel, which will not grow further.
It also allows for the generic netlink kernel customers to load
successfully without requiring Netlink module and operate correctly
once Netlink module is loaded.
Reviewed by: imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D39269