Commit graph

21 commits

Author SHA1 Message Date
Gleb Smirnoff f75d7fac10 netlink: avoid putting empty mbufs on the socket queue
When processing incoming Netlink messages in nl_process_nbuf(), the kernel
always allocates a writer with a buffer to hold the generated reply.
However, certain messages get no reply.  That makes nlmsg_flush()
put an empty buffer on the socket.  Avoid doing that, because avoiding it
is much easier than dealing with empty buffers on the receiver side.
2024-01-10 20:51:53 -08:00
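
The guard described in f75d7fac10 can be sketched in userland C as follows. This is only an illustration: struct buf, struct sock_queue and flush_buf() are hypothetical names, not the kernel's nl_buf or nlmsg_flush().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real nl_buf and socket queue differ. */
struct buf {
	size_t datalen;		/* bytes of reply data written so far */
};

struct sock_queue {
	size_t nbufs;		/* buffers handed to the receiver */
	size_t nbytes;		/* total bytes queued */
};

/*
 * Queue the buffer only when it carries data.  Returns true if the
 * buffer was queued, false if it was empty and should be freed instead.
 */
static bool
flush_buf(struct sock_queue *q, const struct buf *b)
{
	assert(q != NULL && b != NULL);
	if (b->datalen == 0)
		return (false);
	q->nbufs++;
	q->nbytes += b->datalen;
	return (true);
}
```

Checking on the producer side keeps every consumer free of a zero-length special case.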
Gleb Smirnoff 09fa78d438 netlink: fix regression with group writers
Refactoring of the argument list of nl_send_one() led to dereferencing
the wrong union member.  Rename nl_send_one() to a more generic name,
isolate a new nl_send_one() as the callback only for the normal
writer, and provide the correct argument to nl_send() from nl_send_group().

Fixes:	ff5ad900d2
2024-01-09 13:01:28 -08:00
Gleb Smirnoff 17083b94a9 netlink: use protocol specific receive buffer
Implement the Netlink socket receive buffer as a simple TAILQ of nl_buf's,
the same part of struct sockbuf that is already used for the send buffer.
This shaves a lot of code and a lot of extra processing.  The pcb gets rid
of the I/O queues, as the socket buffer is exactly the queue.  The
message writer is simplified a lot, as we now always deal with a linear
buf.  The notion of different buffer types goes away, as do different
kinds of writers.  The only things remaining are a socket writer and
a group writer.
The impact on the network stack is that we no longer use mbufs, so
the workaround from d187154750 disappears.

Note on message throttling.  Now the taskqueue throttling mechanism
needs to look at both socket buffers, protected by their respective
locks, and at flags in the pcb that are protected by the pcb lock.
There is definitely some room for optimization, but this change tries
to preserve as much as possible.

Note on new nl_soreceive().  It emulates soreceive_generic().  It
must undergo further optimization, see large comment put in there.

Note on tests/sys/netlink/test_netlink_message_writer.py. This test
boiled down almost to nothing with mbufs removed.  However, I left
it with minimal functionality (it basically checks that when allocating N
bytes we get N bytes), as it is one of the few examples of the ktest
framework, which allows testing KPIs with Python.

Note on Linux support. It got much simpler: the Netlink message writer
loses the notion of Linux support lifetime, it is the same regardless of
process ABI.  On a socket write from a Linux process we perform
conversion immediately in nl_receive_message(), and on output the
conversion to Linux happens in nl_send_one(). XXX: both
conversions use M_NOWAIT allocation, which used to be the case
before this change, too.

Reviewed by:		melifaro
Differential Revision:	https://reviews.freebsd.org/D42524
2024-01-02 13:04:01 -08:00
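
The "simple TAILQ of linear buffers" idea from 17083b94a9 can be illustrated with the sys/queue.h macros in userland. This is a minimal sketch under stated assumptions: struct lbuf, queue_append() and queue_receive() are hypothetical names, there is no locking, and an oversized record is simply truncated; the real nl_buf and nl_soreceive() are more involved.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical linear buffer; field names are illustrative only. */
struct lbuf {
	TAILQ_ENTRY(lbuf) link;
	size_t	datalen;
	char	data[];
};

TAILQ_HEAD(lbuf_queue, lbuf);

/* Copy len bytes into a new linear buffer and append it to the queue. */
static int
queue_append(struct lbuf_queue *q, const void *src, size_t len)
{
	struct lbuf *b;

	assert(q != NULL);
	b = malloc(sizeof(*b) + len);
	if (b == NULL)
		return (-1);
	b->datalen = len;
	memcpy(b->data, src, len);
	TAILQ_INSERT_TAIL(q, b, link);
	return (0);
}

/* Pop the oldest buffer into dst; returns bytes copied, 0 if empty. */
static size_t
queue_receive(struct lbuf_queue *q, void *dst, size_t dstlen)
{
	struct lbuf *b;
	size_t len;

	b = TAILQ_FIRST(q);
	if (b == NULL)
		return (0);
	len = b->datalen < dstlen ? b->datalen : dstlen;
	memcpy(dst, b->data, len);
	TAILQ_REMOVE(q, b, link);
	free(b);
	return (len);
}
```

Because each element is one contiguous buffer, the receive path is a single copy with no chain walking.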
Gleb Smirnoff 67d9023f07 netlink: uninline some KPI functions that work with struct nl_writer
These functions work with a buffer embedded into nl_writer, which
is going to go opaque with upcoming changes.  Make them private to
the netlink module.  No functional change intended.

Reviewed by:		melifaro
Differential Revision:	https://reviews.freebsd.org/D42523
2024-01-02 13:03:40 -08:00
Warner Losh fdafd315ad sys: Automated cleanup of cdefs and other formatting
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by:		Netflix
2023-11-26 22:24:00 -07:00
Kristof Provost ab393e9548 netlink: move NETLINK define to opt_global.h
Move the NETLINK define into opt_global.h so we can rely on it being
set correctly, without having to remember to include opt_netlink.h.
This ensures that the NETLINK define is correctly set. If not we
may end up with unloadable modules, due to missing symbols (such as
nlmsg_get_group_writer).

PR:		274306
Reviewed by:	imp, markj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D42179
2023-10-13 09:23:47 +02:00
Warner Losh 685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Alexander V. Chernikov c1839039b1 netlink: use netlink mbufs in the mbuf chains.
Continue D40356 and switch the remaining parts of mbuf-related
code to the Netlink mbufs.

Reviewed By: gallatin
Differential Revision: https://reviews.freebsd.org/D40368
MFC after:	2 weeks
2023-06-02 13:14:20 +00:00
Alexander V. Chernikov d187154750 netlink: use custom uma zone for the mbuf storage.
Netlink communicates with userland via sockets, utilising
 MCLBYTES-sized mbufs to append data to the socket buffers.
These mbufs are never transmitted via logical or physical network.

It may be possible that the 2k mbuf zone is temporarily exhausted
 due to DDoS-style traffic, leading to Netlink failing to
 respond to requests.

To address it, this change introduces a custom Netlink-specific
 zone for the mbuf storage. It has the following benefits:
* no precious memory from UMA_ZONE_CONTIG zones is utilized for Netlink
* Netlink becomes (more) independent from the traffic spikes and
 other related network "corner" conditions.
* Netlink allocations are now isolated within a specific zone, making it
 easier to track Netlink mbuf usage and attribute mbufs.

Reviewed by:	gallatin, adrian
Differential Revision: https://reviews.freebsd.org/D40356
MFC after:	2 weeks
2023-06-01 06:43:39 +00:00
Warner Losh 4d846d260e spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with:		pfg
MFC After:		3 days
Sponsored by:		Netflix
2023-05-12 10:44:03 -06:00
Kristof Provost fa554de774 netlink: reduce default log levels
Reduce the default log level for netlink to LOG_INFO. This removes a
number of messages such as

> [nl_iface] dump_sa: unsupported family: 0, skipping
or
> [nl_iface] get_operstate_ether: error calling SIOCGIFMEDIA on vlan0: 22

that are useful for debugging, but not for most users.

Reviewed by:	melifaro
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D40062
2023-05-12 14:32:57 +02:00
Alexander V. Chernikov 9f324d8ac2 netlink: make netlink work correctly on CHERI.
Current Netlink message writer code relies on executing callbacks
 with arbitrary data (pointer or integer) to flush the completed
 messages.
This arbitrary data is stored as a union of { void *, uint64_t }.
At some stage, the message flushing code copied this data using
 direct uint64_t assignment instead of copying the union. It led
 to failure on CHERI, as sizeof(pointer) == 16 there.

Fix the code by making union non-anonymous and copying it entirely.

Reviewed by:	br, jhb, jrtc27
Differential Revision: https://reviews.freebsd.org/D39557
MFC after:	2 weeks
2023-04-14 16:33:43 +00:00
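
The fix in 9f324d8ac2 comes down to copying the union as a whole. A minimal sketch, assuming hypothetical names (union writer_arg, struct writer, writer_copy_arg()) rather than the actual nl_writer layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical named union mirroring the { void *, uint64_t } pair. */
union writer_arg {
	void		*ptr;
	uint64_t	 num;
};

struct writer {
	union writer_arg arg;
};

/*
 * Copy the whole union.  Assigning only 'num' would copy 8 bytes, which
 * truncates the pointer wherever sizeof(void *) > sizeof(uint64_t).
 */
static void
writer_copy_arg(struct writer *dst, const struct writer *src)
{
	assert(dst != NULL && src != NULL);
	dst->arg = src->arg;
}
```

On platforms where pointers are 8 bytes both forms happen to behave the same, which is why the bug only showed up on CHERI.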
Alexander V. Chernikov 19e43c163c netlink: add netlink KPI to the kernel by default
This change does the following:

Base Netlink KPIs (ability to register the family, parse and/or
 write a Netlink message) are always present in the kernel. Specifically,
* Implementation of genetlink family/group registration/removal,
  some base accessors (netlink_generic_kpi.c, 260 LoC) are compiled in
  unconditionally.
* Basic TLV parser functions (netlink_message_parser.c, 507 LoC) are
  compiled in unconditionally.
* Glue functions (netlink<>rtsock), malloc/core sysctl definitions
 (netlink_glue.c, 259 LoC) are compiled in unconditionally.
* The rest of the KPI _functions_ are defined in the netlink_glue.c,
 but their implementation calls a pointer to either the stub function
 or the actual function, depending on whether the module is loaded or not.

This approach allows keeping only 1k LoC out of ~3.7k LoC (current
 sys/netlink implementation) in the kernel, which will not grow further.
It also allows for the generic netlink kernel customers to load
 successfully without requiring Netlink module and operate correctly
 once Netlink module is loaded.

Reviewed by:	imp
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D39269
2023-03-27 13:55:44 +00:00
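
The stub-or-real dispatch described in 19e43c163c can be sketched with a provider table behind an always-present entry point. All names here (struct kpi_provider, kpi_set_provider(), kpi_send_msg(), real_provider) are hypothetical, and the sketch ignores locking and module reference counting.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical provider table; a real KPI would have many more entries. */
struct kpi_provider {
	int (*send_msg)(const void *msg, size_t len);
};

/* Stub used while the module is absent: fail without side effects. */
static int
stub_send_msg(const void *msg, size_t len)
{
	(void)msg;
	(void)len;
	return (ENOTSUP);
}

/* Stand-in for the implementation a loaded module would supply. */
static int
real_send_msg(const void *msg, size_t len)
{
	assert(msg != NULL && len > 0);
	return (0);
}

static const struct kpi_provider stub_provider = { .send_msg = stub_send_msg };
static const struct kpi_provider real_provider = { .send_msg = real_send_msg };
static const struct kpi_provider *cur_provider = &stub_provider;

/* Called by the module on load (p != NULL) and on unload (p == NULL). */
static void
kpi_set_provider(const struct kpi_provider *p)
{
	cur_provider = (p != NULL) ? p : &stub_provider;
}

/* Always-present entry point: forwards to whichever provider is active. */
static int
kpi_send_msg(const void *msg, size_t len)
{
	return (cur_provider->send_msg(msg, len));
}
```

Callers link against kpi_send_msg() unconditionally, so they load whether or not the module is present and start working once it is.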
Alexander V. Chernikov eccccd657f netlink: make nlattr_add_in[6]_addr inline
MFC after:	2 weeks
2023-03-27 11:53:34 +00:00
Kristof Provost 137818006d carp: support unicast
Allow users to configure the address to send carp messages to. This
allows carp to be used in unicast mode, which is useful in certain
virtual configurations (e.g. AWS, VMWare ESXi, ...)

Reviewed by:	melifaro
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D38940
2023-03-20 14:37:09 +01:00
Alexander V. Chernikov 25c2dd2f2c netlink: return optional metadata with the operation result.
Some operations like interface creation may need to return metadata
 - in this case, interface name - back to the caller if the operation
 is successful.
This change implements attaching an `NLMSGERR_ATTR_COOKIE` nla to the
operation reply message via `nlmsg_report_cookie()`.
Additionally, on successful interface creation, interface index and
 interface name are returned in the `IFLA_NEW_IFINDEX` and `IFLA_IFNAME`
 TLVs, encapsulated in the `NLMSGERR_ATTR_COOKIE`.

Reviewed By: pauamma
Differential Revision: https://reviews.freebsd.org/D38283
MFC after:	1 week
2023-02-09 15:30:00 +00:00
Mark Johnston 35472cb60a netlink: Fix indentation in netlink_message_writer.c
This file is indented with a mixture of tabs and spaces.  No functional
change intended.

Reviewed by:	melifaro
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D38100
2023-01-17 09:37:33 -05:00
Mark Johnston e262610007 netlink: Make the writers function table static and const
No functional change intended.

Reviewed by:	melifaro
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D38099
2023-01-17 09:37:21 -05:00
Mark Johnston d91be0f121 netlink: Zero-initialize mbuf messages
Some users of nlmsg_reserve_object() and nlmsg_reserve_data() are not
careful to fully initialize pad and reserved fields, allowing
uninitialized bytes to leak to userspace.  For example, dump_nhgrp()
doesn't set nhm->resvd = 0.

Meanwhile, nlmsg_get_ns_buf() and nlmsg_get_ns_lbuf() zero-initialize
the buffer, so nlmsg_get_ns_mbuf() is inconsistent.  Let's just make
them all behave the same here.

Reported by:	KMSAN
Reviewed by:	melifaro
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D38098
2023-01-17 09:36:54 -05:00
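
The hardening in d91be0f121 amounts to zeroing space at reservation time. A minimal sketch, assuming a hypothetical fixed-size writer (struct zwriter, reserve_zeroed()) rather than the real nlmsg_reserve_data():

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size writer; not the real struct nl_writer. */
struct zwriter {
	char	buf[64];
	size_t	offset;
};

/*
 * Reserve len bytes and zero them before handing them out, so padding
 * and reserved fields the caller never touches cannot leak stale data.
 */
static void *
reserve_zeroed(struct zwriter *w, size_t len)
{
	void *p;

	assert(w != NULL && w->offset <= sizeof(w->buf));
	if (len > sizeof(w->buf) - w->offset)
		return (NULL);
	p = w->buf + w->offset;
	memset(p, 0, len);
	w->offset += len;
	return (p);
}
```

Zeroing in one place is cheaper to audit than relying on every caller to initialize each pad field.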
Alexander V. Chernikov f4d3aa7490 netlink: suppress sending NLMSG_ERROR if NLMSG_DONE is already sent
Netlink has a confirmation/error reporting mechanism for the sent
messages. The kernel explicitly acks each message if requested (NLM_F_ACK)
 or if message processing results in an error.
Similarly, for multipart messages - typically dumps, where each message
 represents a single object like an interface or a route - another
 message, NLMSG_DONE, is used to indicate the end of the dump and the
 resulting status.
As a result, a successful dump ends with both NLMSG_DONE and NLMSG_ERROR
 messages.
RFC 3549 does not say anything specific about such a case.
Linux adopted an optimisation which suppresses the NLMSG_ERROR message
 when NLMSG_DONE is already sent. Certain libraries/applications like
 libnl depend on such behavior.

Suppress sending NLMSG_ERROR if NLMSG_DONE is already sent, by
 setting newly-added 'suppress_ack' flag in the writer and checking
 this flag when generating ack.

This change restores libnl compatibility.

Before:
```
~ nl-link-list
Error: Unable to allocate link cache: Message sequence number mismatch
```

After:
```
~ nl-link-list
vtnet0 ether 52:54:00:14:e3:19 <broadcast,multicast,up,running>
lo0 ieee1394 <loopback,multicast,up,running>
```

Reviewed by:	bapt,pauamma
Tested by:	bapt
Differential Revision: https://reviews.freebsd.org/D37565
2022-11-30 13:24:38 +00:00
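
The suppress_ack logic from f4d3aa7490 can be sketched as a flag set by the dump terminator and checked by the ack path. Only the flag name comes from the commit message; struct ack_writer, send_done() and send_ack() are hypothetical and just count what would be sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical writer state for illustration. */
struct ack_writer {
	bool	suppress_ack;	/* set once the end-of-dump marker is sent */
	int	sent_done;	/* NLMSG_DONE-style messages emitted */
	int	sent_error;	/* NLMSG_ERROR-style messages emitted */
};

/* Emit the end-of-dump marker, which already carries the final status. */
static void
send_done(struct ack_writer *w)
{
	assert(w != NULL);
	w->sent_done++;
	w->suppress_ack = true;
}

/* Emit an ack/error reply unless the dump terminator made it redundant. */
static bool
send_ack(struct ack_writer *w)
{
	assert(w != NULL);
	if (w->suppress_ack)
		return (false);
	w->sent_error++;
	return (true);
}
```

A non-dump request never sets the flag, so it is still acknowledged as before.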
Alexander V. Chernikov 7e5bf68495 netlink: add netlink support
Netlink is a communication protocol currently used in the Linux kernel to modify,
 read and subscribe to nearly all networking state. Interfaces, addresses, routes,
 firewall, fibs, vnets, etc. are controlled via netlink.
It is an async, TLV-based protocol, providing 1-1 and 1-many communications.

The current implementation supports the subset of NETLINK_ROUTE
family. To be more specific, the following is supported:
* Dumps:
 - routes
 - nexthops / nexthop groups
 - interfaces
 - interface addresses
 - neighbors (arp/ndp)
* Notifications:
 - interface arrival/departure
 - interface address arrival/departure
 - route addition/deletion
* Modifications:
 - adding/deleting routes
 - adding/deleting nexthops/nexthop groups
 - adding/deleting neighbors
 - adding/deleting interfaces (basic support only)
* Rtsock interaction
 - route events are bridged both ways

The implementation also supports the NETLINK_GENERIC family framework.

Implementation notes:
Netlink is implemented via loadable/unloadable kernel module,
 not touching many kernel parts.
Each netlink socket uses dedicated taskqueue to support async operations
 that can sleep, such as interface creation. All message processing is
 performed within these taskqueues.

Compatibility:
Most of the Netlink data models specified above map to FreeBSD concepts
 nicely. An unmodified ip(8) binary works correctly with
interfaces, addresses, routes, nexthops and nexthop groups. Some
software such as net/bird requires header-only modifications to compile
and work with FreeBSD netlink.

Reviewed by:	imp
Differential Revision: https://reviews.freebsd.org/D36002
MFC after:	2 months
2022-10-01 14:15:35 +00:00
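
Since 7e5bf68495 describes Netlink as a TLV-based protocol, a small sketch of walking such attributes may help. It assumes the usual Netlink attribute layout (16-bit length covering header plus payload, 16-bit type, 4-byte alignment, host byte order); struct tlv_hdr, TLV_ALIGN and tlv_find() are hypothetical names, not the parser in netlink_message_parser.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Attribute header: total length first, then type. */
struct tlv_hdr {
	uint16_t len;	/* header plus payload, excluding padding */
	uint16_t type;
};

#define TLV_ALIGN(n)	(((n) + 3u) & ~3u)	/* round up to 4 bytes */

/*
 * Find the first attribute of the given type.  Returns a pointer to its
 * payload and stores the payload length, or NULL if absent or malformed.
 */
static const void *
tlv_find(const void *buf, size_t buflen, uint16_t type, size_t *plen)
{
	const unsigned char *p = buf;
	struct tlv_hdr h;

	assert(buf != NULL && plen != NULL);
	while (buflen >= sizeof(h)) {
		memcpy(&h, p, sizeof(h));
		if (h.len < sizeof(h) || h.len > buflen)
			return (NULL);
		if (h.type == type) {
			*plen = h.len - sizeof(h);
			return (p + sizeof(h));
		}
		if (TLV_ALIGN(h.len) >= buflen)
			break;
		p += TLV_ALIGN(h.len);
		buflen -= TLV_ALIGN(h.len);
	}
	return (NULL);
}
```

Validating each length against the remaining buffer before advancing is what keeps a malformed attribute from walking past the end of the message.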