freebsd-src/sys/netinet
Robert Watson 52cd27cb58 Implement a CPU-affine TCP and UDP connection lookup data structure,
struct inpcbgroup.  pcbgroups, or "connection groups", supplement the
existing inpcbinfo connection hash table, which when pcbgroups are
enabled, might now be thought of more usefully as a per-protocol
4-tuple reservation table.

Connections are assigned to connection groups based on a hash of their
4-tuple; wildcard sockets require special handling, and are members
of all connection groups.  During a connection lookup, a
per-connection group lock is employed rather than the global pcbinfo
lock.  By aligning connection groups with input path processing,
connection groups take on an effective CPU affinity, especially when
aligned with RSS work placement (see a forthcoming commit for
details).  This eliminates cache line migration associated with
global, protocol-layer data structures in steady state TCP and UDP
processing (with the exception of protocol-layer statistics; further
commit to follow).
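
As a rough user-space illustration of the scheme described above (not the
kernel code in in_pcbgroup.c), the sketch below assigns each connection to
one group by hashing its 4-tuple, lists wildcard sockets in every group, and
takes only the selected group's lock during lookup.  Every name here
(struct conngroup, conn_insert, conn_lookup, NGROUPS, MAXCONN) and the
mixing function are assumptions made for the example; a pthread mutex
stands in for the kernel lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NGROUPS 4    /* assumption: one group per input queue or CPU */
#define MAXCONN 16   /* assumption: fixed capacity keeps the sketch short */

struct tuple4 {
    uint32_t laddr, faddr;    /* local and foreign IPv4 addresses */
    uint16_t lport, fport;    /* local and foreign ports */
};

struct conn {
    struct tuple4 t;
    int wildcard;             /* listening socket: foreign end unspecified */
};

struct conngroup {
    pthread_mutex_t lock;           /* per-group lock, taken during lookup */
    struct conn *conns[MAXCONN];    /* connections hashed to this group */
    size_t nconns;
    struct conn *wild[MAXCONN];     /* wildcard sockets, listed in every group */
    size_t nwild;
};

static struct conngroup groups[NGROUPS];

void groups_init(void)
{
    for (int i = 0; i < NGROUPS; i++) {
        pthread_mutex_init(&groups[i].lock, NULL);
        groups[i].nconns = 0;
        groups[i].nwild = 0;
    }
}

/* Placeholder mixing function, not the hash any real kernel or NIC uses. */
static uint32_t tuple_hash(const struct tuple4 *t)
{
    uint32_t h = t->laddr ^ t->faddr ^ (((uint32_t)t->lport << 16) | t->fport);

    h ^= h >> 16;
    return h * 0x9e3779b1u;
}

unsigned group_index(const struct tuple4 *t)
{
    return tuple_hash(t) % NGROUPS;
}

void conn_insert(struct conn *c)
{
    if (c->wildcard) {
        /* Wildcard sockets become members of every group. */
        for (int i = 0; i < NGROUPS; i++) {
            struct conngroup *g = &groups[i];

            pthread_mutex_lock(&g->lock);
            assert(g->nwild < MAXCONN);
            g->wild[g->nwild++] = c;
            pthread_mutex_unlock(&g->lock);
        }
    } else {
        struct conngroup *g = &groups[group_index(&c->t)];

        pthread_mutex_lock(&g->lock);
        assert(g->nconns < MAXCONN);
        g->conns[g->nconns++] = c;
        pthread_mutex_unlock(&g->lock);
    }
}

/* Lookup touches exactly one group and takes only that group's lock. */
struct conn *conn_lookup(const struct tuple4 *t)
{
    struct conngroup *g = &groups[group_index(t)];
    struct conn *found = NULL;

    pthread_mutex_lock(&g->lock);
    /* Prefer an exact 4-tuple match. */
    for (size_t i = 0; i < g->nconns && found == NULL; i++) {
        const struct tuple4 *c = &g->conns[i]->t;

        if (c->laddr == t->laddr && c->faddr == t->faddr &&
            c->lport == t->lport && c->fport == t->fport)
            found = g->conns[i];
    }
    /* Otherwise fall back to a wildcard socket on the same local port. */
    for (size_t i = 0; i < g->nwild && found == NULL; i++) {
        const struct tuple4 *w = &g->wild[i]->t;

        if (w->lport == t->lport && (w->laddr == 0 || w->laddr == t->laddr))
            found = g->wild[i];
    }
    pthread_mutex_unlock(&g->lock);
    return found;
}
```

Because a wildcard socket is reachable from whichever group a packet's
4-tuple hashes to, a lookup never needs a second group or a global lock.
The sketch returns the connection after dropping the group lock; it does
not model the separate per-connection inpcb lock that the commit retains.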

Elements of this approach were inspired by Willman, Rixner, and Cox's
2006 USENIX paper, "An Evaluation of Network Stack Parallelization
Strategies in Modern Operating Systems".  However, there are also
significant differences: we maintain the inpcb lock, rather than using
the connection group lock for per-connection state.

Likewise, the focus of this implementation is alignment with NIC
packet distribution strategies such as RSS, rather than pure software
strategies.  Despite that focus, software distribution is supported
through the parallel netisr implementation, and works well in
configurations where the number of hardware threads is greater than
the number of NIC input queues, such as in the RMI XLR threaded MIPS
architecture.

Another important difference is the continued maintenance of existing
hash tables as "reservation tables" -- these are useful to distinguish
the resource allocation aspect of protocol name management from
the more common-case lookup aspect.  In configurations where
connection tables are aligned with hardware hashes, it is desirable to
use the traditional lookup tables for loopback or encapsulated traffic
rather than take the expense of hardware hashes that are hard to
implement efficiently in software (such as RSS Toeplitz).
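
To give a sense of why a software Toeplitz hash is costly, here is a
minimal bit-at-a-time sketch of the general Toeplitz construction.  It is
an illustration only, not the kernel's implementation: the function name
toeplitz_hash, its signature, and the requirement of at least four key
bytes are assumptions made for the example, and the caller is assumed to
pass the input bytes in the order the hardware would hash them.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash: for every set bit in the input (most significant bit
 * first), XOR in the 32 key bits that start at that bit's offset.
 */
uint32_t toeplitz_hash(const uint8_t *key, size_t keylen,
                       const uint8_t *data, size_t datalen)
{
    uint32_t hash = 0;
    uint32_t window;

    if (keylen < 4)
        return 0;    /* assumption: treat a too-short key as "no hash" */

    /* The window starts as the first 32 bits of the key. */
    window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
             ((uint32_t)key[2] << 8) | (uint32_t)key[3];

    for (size_t i = 0; i < datalen; i++) {
        for (int b = 0; b < 8; b++) {
            if (data[i] & (0x80u >> b))
                hash ^= window;

            /* Slide the window left by one bit, pulling in the next key bit. */
            window <<= 1;
            if (i + 4 < keylen && (key[i + 4] & (0x80u >> b)))
                window |= 1;
        }
    }
    return hash;
}
```

The inner loop runs once per input bit, so even a short address-and-port
input costs on the order of a hundred shift-and-test steps per packet,
which is the expense the traditional lookup tables avoid for loopback or
encapsulated traffic.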

Connection group support is enabled by compiling "options PCBGROUP"
into your kernel configuration; for the time being, this is an
experimental feature, and hence is not enabled by default.

Subject to the limited MFCability of change dependencies in inpcb,
and its change to the inpcbinfo init function signature, this change
in principle could be merged to FreeBSD 8.x.

Reviewed by:    bz
Sponsored by:   Juniper Networks, Inc.
2011-06-06 12:55:02 +00:00
cc Staticize malloc types. 2011-04-13 11:28:46 +00:00
ipfw Implement a CPU-affine TCP and UDP connection lookup data structure, 2011-06-06 12:55:02 +00:00
khelp Use the full and proper company name for Swinburne University of Technology 2011-04-12 08:13:18 +00:00
libalias LibAliasInit() should allocate memory with M_WAITOK flag. Modify it 2011-04-18 20:07:08 +00:00
accf_data.c Rework socket upcalls to close some races with setup/teardown of upcalls. 2009-06-01 21:17:03 +00:00
accf_dns.c Rework socket upcalls to close some races with setup/teardown of upcalls. 2009-06-01 21:17:03 +00:00
accf_http.c Rework socket upcalls to close some races with setup/teardown of upcalls. 2009-06-01 21:17:03 +00:00
cc.h Use the full and proper company name for Swinburne University of Technology 2011-04-12 08:13:18 +00:00
icmp6.h - Implement RDNSS and DNSSL options (RFC 6106, IPv6 Router Advertisement 2011-06-06 03:06:43 +00:00
icmp_var.h Many network stack subsystems use a single global data structure to hold 2009-08-02 19:43:32 +00:00
if_atm.c Bring back (most of) NATM to avoid further bitrot after r186119. 2010-12-15 22:58:45 +00:00
if_atm.h
if_ether.c - Merge changes to the base system to support OFED. These include 2011-03-21 09:40:01 +00:00
if_ether.h Add arp_update_event. This replaces route_arp_update_event, which 2009-09-08 21:17:17 +00:00
igmp.c After some off-list discussion, revert a number of changes to the 2010-11-22 19:32:54 +00:00
igmp.h These are no longer referenced in the tree, so can be safely removed. 2009-06-10 18:12:15 +00:00
igmp_var.h
in.c Supply the LLE_STATIC flag bit to in_ifscrub() when scrubbing interface 2011-05-29 02:21:35 +00:00
in.h Make the RPC specific __rpc_inet_ntop() and __rpc_inet_pton() general 2010-09-24 15:01:45 +00:00
in_cksum.c
in_debug.c Add initial inet DDB support for show in_ifaddr and show sin commands which 2010-10-24 22:02:36 +00:00
in_gif.c MFP4: @176978-176982, 176984, 176990-176994, 177441 2010-04-29 11:52:42 +00:00
in_gif.h
in_mcast.c Fix a few issues related to the legacy 4.4 BSD multicast APIs. 2010-04-10 12:05:31 +00:00
in_pcb.c Implement a CPU-affine TCP and UDP connection lookup data structure, 2011-06-06 12:55:02 +00:00
in_pcb.h Implement a CPU-affine TCP and UDP connection lookup data structure, 2011-06-06 12:55:02 +00:00
in_pcbgroup.c Implement a CPU-affine TCP and UDP connection lookup data structure, 2011-06-06 12:55:02 +00:00
in_proto.c Add FEATURE() definitions for IPv4 and IPv6 so that we can use 2011-05-25 00:34:25 +00:00
in_rmx.c After some off-list discussion, revert a number of changes to the 2010-11-22 19:32:54 +00:00
in_systm.h
in_var.h The statically configured (permanent) ARP entries are removed when an 2011-05-20 19:12:20 +00:00
ip.h use u_char instead of u_int for short bitfields. 2010-02-01 14:13:44 +00:00
ip6.h Fix more continuous/contiguous typos (cf. r215955) 2010-11-27 21:51:39 +00:00
ip_carp.c Make various (pseudo) interfaces compile without INET in the kernel 2011-04-27 19:30:44 +00:00
ip_carp.h Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, with 2010-08-11 20:18:19 +00:00
ip_divert.c Implement a CPU-affine TCP and UDP connection lookup data structure, 2011-06-06 12:55:02 +00:00
ip_divert.h Various cleanup done in ipfw3-head branch including: 2010-01-04 19:01:22 +00:00
ip_dummynet.h whitespace fixes (trailing whitespace, bad indentation 2010-04-19 16:17:30 +00:00
ip_ecn.c
ip_ecn.h
ip_encap.c (S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument. 2009-12-28 22:56:30 +00:00
ip_encap.h
ip_fastfwd.c Use correct field to track statistics counting error as bad header length. 2010-12-05 01:09:48 +00:00
ip_fw.h - Rewrite functions that copyin/out NAT configuration, so that they 2011-04-19 15:06:33 +00:00
ip_gre.c The NetBSD Foundation has granted permission to remove clause 3 and 4 from 2010-03-01 17:05:46 +00:00
ip_gre.h The NetBSD Foundation has granted permission to remove clause 3 and 4 from 2010-03-01 17:05:46 +00:00
ip_icmp.c MfP4 CH=192029: 2011-04-27 19:36:35 +00:00
ip_icmp.h MFP4: @176978-176982, 176984, 176990-176994, 177441 2010-04-29 11:52:42 +00:00
ip_id.c
ip_input.c MfP4 CH=192004: 2011-04-27 19:32:27 +00:00
ip_ipsec.c After some off-list discussion, revert a number of changes to the 2010-11-22 19:32:54 +00:00
ip_ipsec.h Remove ifdefed out part of code, which seems to have originated a decade ago 2009-11-09 19:53:34 +00:00
ip_mroute.c After some off-list discussion, revert a number of changes to the 2010-11-22 19:32:54 +00:00
ip_mroute.h Virtualize the IPv4 multicast routing code. 2010-06-02 15:44:43 +00:00
ip_options.c Use ifa_ifwithaddr_check() rather than ifa_ifwithaddr() as we are not 2010-10-14 12:32:49 +00:00
ip_options.h
ip_output.c The mbuf_frag_size always was and is file local and not queried from base 2011-04-14 09:47:09 +00:00
ip_var.h MFp4 CH=191470: 2011-04-20 08:00:29 +00:00
pim.h
pim_var.h Virtualize the IPv4 multicast routing code. 2010-06-02 15:44:43 +00:00
raw_ip.c Implement a CPU-affine TCP and UDP connection lookup data structure, 2011-06-06 12:55:02 +00:00
sctp.h Implement Resource Pooling V2 and an MPTCP like congestion 2011-05-04 21:27:05 +00:00
sctp_asconf.c Remove code without any effect. 2011-05-03 20:34:02 +00:00
sctp_asconf.h 1) Typo correction in comments and one spacing change. 2011-02-05 12:12:51 +00:00
sctp_auth.c Fix a locking issue showing up on Mac OS X when subscribing to 2011-05-08 09:11:59 +00:00
sctp_auth.h Fix a locking issue showing up on Mac OS X when subscribing to 2011-05-08 09:11:59 +00:00
sctp_bsd_addr.c Improve compilation of SCTP code without INET support. 2011-04-30 11:18:16 +00:00
sctp_bsd_addr.h 1) Typo correction in comments and one spacing change. 2011-02-05 12:12:51 +00:00
sctp_cc_functions.c Implement Resource Pooling V2 and an MPTCP like congestion 2011-05-04 21:27:05 +00:00
sctp_constants.h Tunes and fixes the new DC-CC to seem to hit the 2011-03-08 11:58:25 +00:00
sctp_crc32.c 1) Typo correction in comments and one spacing change. 2011-02-05 12:12:51 +00:00
sctp_crc32.h 1) Typo correction in comments and one spacing change. 2011-02-05 12:12:51 +00:00
sctp_dtrace_declare.h Tunes and fixes the new DC-CC to seem to hit the 2011-03-08 11:58:25 +00:00
sctp_dtrace_define.h Tunes and fixes the new DC-CC to seem to hit the 2011-03-08 11:58:25 +00:00
sctp_header.h 1) Typo correction in comments and one spacing change. 2011-02-05 12:12:51 +00:00
sctp_indata.c Get rid of unused functions. 2011-05-29 18:41:06 +00:00
sctp_indata.h Get rid of unused functions. 2011-05-29 18:41:06 +00:00
sctp_input.c Fix a locking issue showing up on Mac OS X when subscribing to 2011-05-08 09:11:59 +00:00
sctp_input.h Fix a locking issue showing up on Mac OS X when subscribing to 2011-05-08 09:11:59 +00:00
sctp_lock_bsd.h 1) Typo correction in comments and one spacing change. 2011-02-05 12:12:51 +00:00
sctp_os.h 1) Typo correction in comments and one spacing change. 2011-02-05 12:12:51 +00:00
sctp_os_bsd.h Remove some leftover debug code. 2011-04-30 11:22:30 +00:00
sctp_output.c Unbreak INET-less build. 2011-05-18 19:49:39 +00:00
sctp_output.h Fix the source address selection for boundall sockets 2011-05-14 18:22:14 +00:00
sctp_pcb.c Fix a locking issue showing up on Mac OS X when subscribing to 2011-05-08 09:11:59 +00:00
sctp_pcb.h 1) Typo correction in comments and one spacing change. 2011-02-05 12:12:51 +00:00
sctp_peeloff.c Mfp4 CH=177274,177280,177284-177285,177297,177324-177325 2011-02-16 21:29:13 +00:00
sctp_peeloff.h 1) Typo correction in comments and one spacing change. 2011-02-05 12:12:51 +00:00
sctp_ss_functions.c Fix several bugs related to stream scheduling. 2011-02-13 13:53:28 +00:00
sctp_structs.h Tunes and fixes the new DC-CC to seem to hit the 2011-03-08 11:58:25 +00:00
sctp_sysctl.c Improve compilation of SCTP code without INET support. 2011-04-30 11:18:16 +00:00
sctp_sysctl.h Implement Resource Pooling V2 and an MPTCP like congestion 2011-05-04 21:27:05 +00:00
sctp_timer.c Fix a locking issue showing up on Mac OS X when subscribing to 2011-05-08 09:11:59 +00:00
sctp_timer.h 1) Typo correction in comments and one spacing change. 2011-02-05 12:12:51 +00:00
sctp_uio.h Improvements to CC modules: 2011-02-26 15:23:46 +00:00
sctp_usrreq.c Copy out the mtu when calling getsockopt() with SCTP_GET_PEER_ADDR_INFO. 2011-05-17 15:57:31 +00:00
sctp_var.h Fix a locking issue showing up on Mac OS X when subscribing to 2011-05-08 09:11:59 +00:00
sctputil.c Get rid of unused functions. 2011-05-29 18:41:06 +00:00
sctputil.h Get rid of unused functions. 2011-05-29 18:41:06 +00:00
siftr.c Decompose the current single inpcbinfo lock into two locks: 2011-05-30 09:43:55 +00:00
tcp.h Add new, per connection, statistics for TCP, including: 2010-11-17 18:55:12 +00:00
tcp_debug.c Remove the "The option TCPDEBUG requires option INET." requirement. 2009-06-10 10:39:41 +00:00
tcp_debug.h
tcp_fsm.h
tcp_hostcache.c sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly. 2011-01-12 19:53:50 +00:00
tcp_hostcache.h
tcp_input.c Add _mbuf() variants of various inpcb-related interfaces, including lookup, 2011-06-04 16:33:06 +00:00
tcp_lro.c Port of the LRO fix from mxge driver to the generic 2011-04-07 21:20:26 +00:00
tcp_lro.h Trim extra spaces before tabs. 2011-01-07 21:40:34 +00:00
tcp_offload.c Merge the remainder of kern_vimage.c and vimage.h into vnet.c and 2009-08-01 19:26:27 +00:00
tcp_offload.h Fix typos - remove duplicate "the". 2011-02-21 09:01:34 +00:00
tcp_output.c Handle a rare edge case with nearly full TCP receive buffers. If a TCP 2011-05-02 21:05:52 +00:00
tcp_reass.c Specify a CTLTYPE_FOO so that a future sysctl(8) change does not need 2011-01-18 21:14:13 +00:00
tcp_sack.c Covers values if (BYTES_THIS_ACK(tp, th) / tp->t_maxseg) value is from 2011-03-28 19:03:56 +00:00
tcp_seq.h
tcp_subr.c Implement a CPU-affine TCP and UDP connection lookup data structure, 2011-06-06 12:55:02 +00:00
tcp_syncache.c Implement a CPU-affine TCP and UDP connection lookup data structure, 2011-06-06 12:55:02 +00:00
tcp_syncache.h Trim extra spaces before tabs. 2011-01-07 21:40:34 +00:00
tcp_timer.c Decompose the current single inpcbinfo lock into two locks: 2011-05-30 09:43:55 +00:00
tcp_timer.h Remove the TCP inflight bandwidth limiter as announced in r211315 2010-09-16 21:06:45 +00:00
tcp_timewait.c Oops, fix order of sequence numbers in KASSERT()'s to catch negative 2011-05-14 14:41:40 +00:00
tcp_usrreq.c Do not leak the pcbinfohash lock in the case where in6_pcbladdr() returns 2011-06-02 10:21:05 +00:00
tcp_var.h TCP reuses t_rxtshift to determine the backoff timer used for both the 2011-04-29 15:40:12 +00:00
tcpip.h
toedev.h
udp.h Trim extra spaces before tabs. 2011-01-07 21:40:34 +00:00
udp_usrreq.c Implement a CPU-affine TCP and UDP connection lookup data structure, 2011-06-06 12:55:02 +00:00
udp_var.h Trim extra spaces before tabs. 2011-01-07 21:40:34 +00:00