system/freebsd-src

mirror of https://github.com/freebsd/freebsd-src synced 2024-10-02 06:35:36 +00:00

Author	SHA1	Message	Date
Michael Tuexen	fe1274ee39	Fix race when accepting TCP connections. When expanding a SYN-cache entry to a socket/inp a two step approach was taken: 1) The local address was filled in, then the inp was added to the hash table. 2) The remote address was filled in and the inp was relocated in the hash table. Before the epoch changes, a write lock was held when this happens and the code looking up entries was holding a corresponding read lock. Since the read lock is gone away after the introduction of the epochs, the half populated inp was found during lookup. This resulted in processing TCP segments in the context of the wrong TCP connection. This patch changes the above procedure in a way that the inp is fully populated before inserted into the hash table. Thanks to Paul <devgs@ukr.net> for reporting the issue on the net@ mailing list and for testing the patch! Reviewed by: rrs@ MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D22971	2020-01-12 17:52:32 +00:00
Bjoern A. Zeeb	c6feea3b89	nd6_rtr: consistently use __func__ for nd6log() Over time one or two hard coded function names did not match the actual function anymore. Consistently use __func__ for nd6log() calls and re-wrap/re-format some messages for consistency. MFC after: 2 weeks	2020-01-12 17:41:09 +00:00
Bjoern A. Zeeb	25ebfe3350	nd6_rtr: make nd6_prefix_onlink() static nd6_prefix_onlink() is not used anywhere outside nd6_rtr.c. Stop exporting it and make it file local static.	2020-01-12 16:58:21 +00:00
Bjoern A. Zeeb	e1891232fc	in6_mcast: make in6_joingroup_locked() static in6_joingroup_locked() is only used file-local. No need to export it, hence make it static.	2020-01-11 18:55:12 +00:00
Alexander V. Chernikov	ead85fe415	Add fibnum, family and vnet pointer to each rib head. Having metadata such as fibnum or vnet in the struct rib_head is handy as it eases building functionality in the routing space. This change is required to properly bring back route redirect support. Reviewed by: bz MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D23047	2020-01-09 17:21:00 +00:00
Bjoern A. Zeeb	334fc5822b	vnet: virtualise more network stack sysctls. Virtualise tcp_always_keepalive, TCP and UDP log_in_vain. All three are set in the netoptions startup script, which we would love to run for VNETs as well [1]. While virtualising the log_in_vain sysctls seems pointless at first for as long as the kernel message buffer is not virtualised, it at least allows an administrator to debug the base system or an individual jail if needed without turning the logging on for all jails running on a system. PR: 243193 [1] MFC after: 2 weeks	2020-01-08 23:30:26 +00:00
Alexander V. Chernikov	e02d3fe70c	Fix rtsock route message generation for interface addresses. Reviewed by: olivier MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D22974	2020-01-07 21:16:30 +00:00
Gleb Smirnoff	e00ee1a9f4	In r343631 error code for a packet blocked by a firewall was changed from EACCES to EPERM. This change was not intentional, so fix that. Return EACCES if a firewall forbids sending. Noticed by: ae	2020-01-01 17:32:20 +00:00
Alexander V. Chernikov	bdb214a4a4	Remove useless code from in6_rmx.c The code in question walks IPv6 tree every 60 seconds and looks into the routes with non-zero expiration time (typically, redirected routes). For each such route it sets RTF_PROBEMTU flag at the expiration time. No other part of the kernel checks for RTF_PROBEMTU flag. RTF_PROBEMTU was defined 21 years ago, 30 Jun 1999, as RTF_PROTO1. RTF_PROTO1 is a de-facto standard indication of a route installed by a routing daemon for the last decade. Reviewed by: bz, ae MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D22865	2019-12-18 22:10:56 +00:00
Hans Petter Selasky	a4c5668d12	Leave multicast group before reaping and committing state for both IPv4 and IPv6. This fixes a regression issue after r349369. When trying to exit a multicast group before closing the socket, a multicast leave packet should be sent. Differential Revision: https://reviews.freebsd.org/D22848 PR: 242677 Reviewed by: bz (network) Tested by: Aleksandr Fedorov <aleksandr.fedorov@itglobal.com> MFC after: 1 week Sponsored by: Mellanox Technologies	2019-12-18 12:06:34 +00:00
Bjoern A. Zeeb	74ff87cd16	Update comment. Update the comment related to SIIT and v4mapped addresses being rejected by us when coming from the wire given we have supported IPv6-only kernels for a few years now. See also draft-itojun-v6ops-v4mapped-harmful. Suggested by: melifaro MFC after: 2 weeks	2019-12-06 16:53:42 +00:00
Bjoern A. Zeeb	b745e7623c	ip6_input: remove redundant v4mapped check In ip6_input() we apply the same v4mapped address check twice. The only case which skips the first one is M_FASTFWD_OURS which should have passed the check on the first input pass and passed the firewall. Remove the 2nd redundant check. Reviewed by: kp, melifaro MFC after: 2 weeks Sponsored by: Netflix (originally) Differential Revision: https://reviews.freebsd.org/D22462	2019-12-06 16:42:58 +00:00
Kristof Provost	200424235e	Remove useless NULL check Coverity points out that we've already dereferenced m by the time we check, so there's no reason to keep the check. Moreover, it's safe to pass NULL to m_freem() anyway. CID: 1019092	2019-12-05 16:50:54 +00:00
Bjoern A. Zeeb	0700d2c3f0	Make icmp6_reflect() static. icmp6_reflect() is not used anywhere outside icmp6.c, no reason to export it. Sponsored by: Netflix	2019-12-03 14:46:38 +00:00
Hans Petter Selasky	5b64b824b9	Use refcount from "in_joingroup_locked()" when joining multicast groups. Do not acquire additional references. This makes the IPv4 IGMP code in line with the IPv6 MLD code. Background: The IPv4 multicast code puts an extra reference on the in_multi struct when joining groups. This becomes visible when using daemons like igmpproxy from ports, that multicast entries do not disappear from the output of ifmcstat(8) when multicast streams are disconnected. This fixes a regression issue after r349762. While at it factor the ip_mfilter_insert() and ip6_mfilter_insert() calls to avoid repeated "is_new" check. Differential Revision: https://reviews.freebsd.org/D22595 Tested by: Guido van Rooij <guido@gvr.org> Reviewed by: rgrimes (network) MFC after: 1 week Sponsored by: Mellanox Technologies	2019-12-03 08:46:59 +00:00
Michael Tuexen	e25b0dab9a	Update the hostcache also for PTB messages received for SCTP/IPv6. The corresponding code for SCTP/IPv4 was introduced in https://svnweb.freebsd.org/base?view=revision&revision=317597 Submitted by: Julius Flohr MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D22605	2019-12-01 16:14:44 +00:00
Bjoern A. Zeeb	a4adf6cc65	Fix m_pullup() problem after removing PULLDOWN_TESTs and KAME EXT_*macros. r354748-354750 replaced the KAME macros with m_pulldown() calls. Contrary to the rest of the network stack m_len checks before m_pulldown() were not put in place (see r354748). Put these m_len checks in place for now (to go along with the style of the network stack since the initial commits). These are not put in for performance but to avoid an error scenario (even though it also will help performance at the moment as it avoids allocating an extra mbuf; not because of the unconditional function call). The observed error case went like this: (1) an mbuf with M_EXT arrives and we call m_pullup() unconditionally on it. (2) m_pullup() will call m_get() unless the requested length is larger than MHLEN (in which case it'll m_freem() the perfectly fine mbuf) and migrate the requested length of data and pkthdr into the new mbuf. (3) If m_get() succeeds, a further m_pullup() call going over MHLEN will fail. This was observed with failing auto-configuration as an RA packet of 200 bytes exceeded MHLEN and the m_pullup() called from nd6_ra_input() dropped the mbuf. (Re-)adding the m_len checks before m_pullup() calls avoids these problems with mbufs using external storage for now. MFC after: 3 weeks Sponsored by: Netflix	2019-12-01 00:22:04 +00:00
Ryan Libby	6afe56f9c3	in6_joingroup_locked: need if_addr_lock around in6m_disconnect_locked It looks like the call that requires the lock was introduced in r337866. Reviewed by: hselasky Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D20739	2019-11-25 22:25:10 +00:00
Bjoern A. Zeeb	f8d4f9bce9	in6: move include Move the include for sysctl.h out of the middle of the file to the includes at the beginning. This will make it easier to add new sysctls. No functional changes. MFC after: 3 weeks Sponsored by: Netflix	2019-11-19 21:14:15 +00:00
Bjoern A. Zeeb	3c5018ca10	nd6: sysctl Move the SYSCTL_DECL to the top of the file. Move the sysctl function before SYSCTL_PROC so that we don't need an extra function declaration in the middle of the file. No functional changes. MFC after: 3 weeks Sponsored by: Netflix	2019-11-19 21:08:18 +00:00
Bjoern A. Zeeb	6db6527385	nd6: make nd6_timer_ch static nd6_timer_ch is only used in file local context. There is no need to export it, so make it static. MFC after: 3 weeks Sponsored by: Netflix	2019-11-19 20:54:17 +00:00
Bjoern A. Zeeb	f77a6dbd1e	nd6_rtr: re-sort functions Resort functions within file in a way that they depend on each other as that makes it easier to rework various things. Also allows us to remove file local function declarations. No functional changes. MFC after: 3 weeks Sponsored by: Netflix	2019-11-19 20:34:33 +00:00
Bjoern A. Zeeb	b2b7a4b2ca	mld: fix epoch assertion in6ifa_ifpforlinklocal() asserts the net epoch. The test case from r354832 revealed code paths where we call into the function without having acquired the net epoch first and consequently we hit the assert. This happens in certain MLD states during VNET shutdown and most people normally do not notice this. For correctness acquire the net epoch around calls to mld_v1_transmit_report() in all cases to avoid the assertion firing. MFC after: 2 weeks Sponsored by: Netflix	2019-11-19 14:53:13 +00:00
Bjoern A. Zeeb	32af08ecad	icmpv6: Fix mbuf change in mld After r354748 mld_input() can change the mbuf. The new pointer is never returned to icmp6_input() and when passed to icmp6_rip6_input() the mbuf may no longer be valid leading to a panic. Pass a pointer to the mbuf to mld_input() so we can return an updated version in the non-error case. Add a test case sending an MLD packet which will trigger this bug. Pointyhat to: bz Reported by: gallatin, thj MFC After: 2 weeks X-MFC with: r354748 Sponsored by: Netflix	2019-11-18 21:59:47 +00:00
Bjoern A. Zeeb	808c432f62	nd6: retire defrouter_select(), use _fib() variant. Burn bridges and replace the last two calls of defrouter_select() with defrouter_select_fib(). That allows us to retire defrouter_select() and make it more clear in the calling code that it applies to all FIBs. Sponsored by: Netflix	2019-11-16 00:17:35 +00:00
Bjoern A. Zeeb	f592d0c377	nd6_rtr: Pull in the TAILQ_HEAD() as it is not needed outside nd6_rtr.c. Rename the TAILQ_HEAD() struct and the nd_defrouter variable from "nd_" to "nd6_" as they are not part of the RFC 3542 API which uses "ND_". Ideally I'd like to also rename the struct nd_defrouter {} to "nd6_*" but given that is used externally there is more work to do. No functional changes. MFC after: 3 weeks Sponsored by: Netflix	2019-11-16 00:02:36 +00:00
Bjoern A. Zeeb	63abacc204	netinet*: replace IP6_EXTHDR_GET() In a few places we have IP6_EXTHDR_GET() left in upper layer protocols. The IP6_EXTHDR_GET() macro might perform an m_pulldown() in case the data fragment is not contiguous. Convert these last remaining instances into m_pullup()s instead. In CARP, for example, we will a few lines later call m_pullup() anyway, the IPsec code coming from OpenBSD would otherwise have done the m_pullup() and are copying the data a bit later anyway, so pulling it in seems no better or worse. Note: this leaves very few m_pulldown() cases behind in the tree and we might want to consider removing them as well to make mbuf management easier again on a path to variable size mbufs, especially given m_pulldown() still has an issue not re-checking M_WRITEABLE(). Reviewed by: gallatin MFC after: 8 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D22335	2019-11-15 21:44:17 +00:00
Bjoern A. Zeeb	a61b5cfbbf	netinet6: Remove PULLDOWN_TESTs. Remove the KAME introduced PULLDOWN_TESTs which did not even have a compile-time option in sys/conf to turn them on for a custom kernel build. They made the code a lot harder to read or more complicated in a few cases. Convert the IP6_EXTHDR_CHECK() calls into FreeBSD looking code. Rather than throwing the packet away if it would not fit the KAME mbuf expectations, convert the macros to m_pullup() calls. Do not do any extra manual conditional checks upfront as to whether the m_len would suffice (), simply let m_pullup() do its work (incl. an early check). Remove extra m_pullup() calls where earlier in the function or the only caller has already done the pullup. Discussed with: rwatson () Reviewed by: ae MFC after: 8 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D22334	2019-11-15 21:40:40 +00:00
Bjoern A. Zeeb	e20b5bc485	nd6: simplify code We are taking the same actions in both cases of the branch inside the block. Simplify that code as the extra branch is not needed. MFC after: 3 weeks Sponsored by: Netflix	2019-11-15 13:45:38 +00:00
Bjoern A. Zeeb	b3a25d2993	nd6: remove unused structs and defines Remove a collections of unused structs and #defines to make it easier to understand what is actually in use. Sponsored by: Netflix	2019-11-13 14:28:07 +00:00
Bjoern A. Zeeb	d64df9a2b2	nd6: make nd6_alloc() file static nd6_alloc() is a function used only locally. Make it static and no longer export it. Keeps the KPI smaller. Sponsored by: Netflix	2019-11-13 13:53:17 +00:00
Bjoern A. Zeeb	ad675b3279	nd6 defrouter: consolidate nd_defrouter manipulations in nd6_rtr.c Move the nd_defrouter along with the sysctl handler from nd6.c to nd6_rtr.c and make the variable file static. Provide (temporary) new accessor functions for code manipulating nd_defrouter from nd6.c, and stop exporting functions no longer needed outside nd6_rtr.c. This also shuffles a few functions around in nd6_rtr.c without functional changes. Given all nd_defrouter logic is now in one place we can tidy up the code, locking, and other open items. MFC after: 3 weeks X-MFC: keep exporting the functions Sponsored by: Netflix	2019-11-13 12:05:48 +00:00
Bjoern A. Zeeb	a8fe77d877	netinet: update mp to pass the proper value back In ip6_[direct_]input() we are looping over the extension headers to deal with the next header. We pass a pointer to an mbuf pointer to the handling functions. In certain cases the mbuf can be updated there and we need to pass the new one back. That was missing in dest6_input() and route6_input(). In tcp6_input() we should also update it before we call tcp_input(). In addition to that mark the mbuf NULL all the times when we return that we are done with handling the packet and no next header should be checked (IPPROTO_DONE). This will eventually allow us to assert proper behaviour and catch the above kind of errors more easily, expecting *mp to always be set. This change is extracted from a larger patch and not an exhaustive change across the entire stack yet. PR: 240135 Reported by: prabhakar.lakhera gmail.com MFC after: 3 weeks Sponsored by: Netflix	2019-11-12 15:46:28 +00:00
Gleb Smirnoff	c17cd08f53	It is unclear why in6_pcblookup_local() would require write access to the PCB hash. The function doesn't modify the hash. It always asserted write lock historically, but with epoch conversion this fails in some special cases. Reviewed by: rwatson, bz Reported-by: syzbot+0b0488ca537e20cb2429@syzkaller.appspotmail.com	2019-11-11 06:28:25 +00:00
Bjoern A. Zeeb	c1131de6f1	frag6: properly handle atomic fragments according to RFCs. RFC 8200 says: "If the fragment is a whole datagram (that is, both the Fragment Offset field and the M flag are zero), then it does not need any further reassembly and should be processed as a fully reassembled packet (i.e., updating Next Header, adjust Payload Length, removing the Fragment header, etc.). .." That means we should remove the fragment header and make all the adjustments rather than just skipping over the fragment header. The difference should be noticeable in that a properly handled atomic fragment triggering an ICMPv6 message at an upper layer (e.g. dest unreach, unreachable port) will not include the fragment header. Update the test cases to also test for an unfragmentable part. That is needed so that the next header is properly updated (not just lengths). MFC after: 3 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D22155	2019-11-08 14:36:44 +00:00
Gleb Smirnoff	2435e507de	Now with epoch synchronized PCB lookup tables we can greatly simplify locking in udp_output() and udp6_output(). First, we select if we need read or write lock in PCB itself, we take the lock and enter network epoch. Then, we proceed for the rest of the function. In case if we need to modify PCB hash, we would take write lock on it for a short piece of code. We could exit the epoch before allocating an mbuf, but with this patch we are keeping it all the way into ip_output()/ip6_output(). Today this creates an epoch recursion, since ip_output() enters epoch itself. However, once all protocols are reviewed, ip_output() and ip6_output() would require epoch instead of entering it. Note: I'm not 100% sure that in udp6_output() the epoch is required. We don't do PCB hash lookup for a bound socket. And all branches of in6_select_src() don't require epoch, at least they lack assertions. Today inet6 address list is protected by rmlock, although it is CKLIST. AFAIU, the future plan is to protect it by network epoch. That would require epoch in in6_select_src(). Anyway, in future ip6_output() would require epoch, udp6_output() would need to enter it.	2019-11-07 21:01:36 +00:00
Gleb Smirnoff	d797164a86	Since r353292 on input path we are always in network epoch, when we lookup PCBs. Thus, do not enter epoch recursively in in_pcblookup_hash() and in6_pcblookup_hash(). Same applies to tcp_ctlinput() and tcp6_ctlinput(). This leaves several sysctl(9) handlers that return PCB credentials unprotected. Add epoch enter/exit to all of them. Differential Revision: https://reviews.freebsd.org/D22197	2019-11-07 20:49:56 +00:00
Gleb Smirnoff	cf377af6e2	Remove unnecessary recursive epoch enter via INP_INFO_RLOCK macro in icmp6_rip6_input(). It shall always run in the network epoch.	2019-11-07 20:43:12 +00:00
Gleb Smirnoff	f42347c39a	Remove unnecessary recursive epoch enter via INP_INFO_RLOCK macro in raw input functions for IPv4 and IPv6. They shall always run in the network epoch.	2019-11-07 20:40:44 +00:00
Gleb Smirnoff	8d28524a90	Remove unnecessary recursive epoch enter via INP_INFO_RLOCK macro in udp6_input(). It shall always run in the network epoch.	2019-11-07 20:38:53 +00:00
Bjoern A. Zeeb	503f4e4736	netinet*: variable cleanup In preparation for another change factor out various variable cleanups. These mainly include: (1) do not assign values to variables during declaration: this makes the code more readable and does allow for better grouping of variable declarations, (2) do not assign values to variables before need; e.g., if a variable is only used in the 2nd half of a function and we have multiple return paths before that, then do not set it before it is needed, and (3) try to avoid assigning the same value multiple times. MFC after: 3 weeks Sponsored by: Netflix	2019-11-07 18:29:51 +00:00
Gleb Smirnoff	751d8d156a	Widen network epoch coverage in nd6_prefix_onlink() as in6ifa_ifpforlinklocal() requires the epoch. Reported by: bz Reviewed by: bz	2019-11-07 17:00:20 +00:00
Gleb Smirnoff	d6dbfed81e	In nd6_timer() enter the network epoch earlier. The defrouter_del() may call into leaf functions that require epoch. Since the function is already run in non-sleepable context, it should be safe to cover it whole with epoch. Reported by: syzcaller	2019-11-04 17:35:37 +00:00
Bjoern A. Zeeb	6e6b5143f5	Properly set VNET when nuking recvif from fragment queues. In theory the eventhandler invoke should be in the same VNET as the current interface. We however cannot guarantee that for all cases in the future. So before checking if the fragmentation handling for this VNET is active, switch the VNET to the VNET of the interface to always get the one we want. Reviewed by: hselasky MFC after: 3 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D22153	2019-10-25 18:54:06 +00:00
Bjoern A. Zeeb	702828f643	frag6: do not leak counter in error cases When allocating the IPv6 fragment packet queue entry we do checks against counters and if we pass we increment one of the counters to claim the spot. Right after that we have two cases (malloc and MAC) which can both fail in which case we free the entry but never released our claim on the counter. In theory this can lead to not accepting new fragments after a long time, especially if it would be MAC "refusing" them. Rather than immediately subtracting the value in the error case, only increment it after these two cases so we can no longer leak it. MFC after: 3 weeks Sponsored by: Netflix	2019-10-25 16:29:09 +00:00
Bjoern A. Zeeb	619456bb59	frag6: prevent overwriting initial fragoff=0 packet meta-data. When we receive the packet with the first fragmented part (fragoff=0) we remember the length of the unfragmentable part and the next header (and should probably also remember ECN) as meta-data on the reassembly queue. Someone replaying this packet so far could change these 2 (3) values. While changing the next header seems more severe, for a full size fragmented UDP packet, for example, adding an extension header to the unfragmentable part would go unnoticed (as the fragmented part would be considered an exact duplicate) but make reassembly fail. So do not allow updating the meta-data after we have seen the first fragmented part anymore. The frag6_20 test case is added which failed before triggering an ICMPv6 "param prob" due to the check for each queued fragment for a max-size violation if a fragoff=0 packet was received. MFC after: 3 weeks Sponsored by: Netflix	2019-10-24 22:07:45 +00:00
Bjoern A. Zeeb	cd188da20f	frag6: handling of overlapping fragments to conform to RFC 8200 While the comment was updated in r350746, the code was not. RFC8200 says that unless fragment overlaps are exact (same fragment twice) not only the current fragment but the entire reassembly queue for this packet must be silently discarded, which we now do if fragment offset and fragment length do not match. Obtained from: jtl MFC after: 3 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D16850	2019-10-24 20:22:52 +00:00
Michael Tuexen	4a91aa8fc9	Ensure that the flags indicating IPv4/IPv6 are not changed by failing bind() calls. This would lead to inconsistent state resulting in a panic. A fix for stable/11 was committed in https://svnweb.freebsd.org/base?view=revision&revision=338986 An accelerated MFC is planned as discussed with emaste@. Reported by: syzbot+2609a378d89264ff5a42@syzkaller.appspotmail.com Obtained from: jtl@ MFC after: 1 day Sponsored by: Netflix, Inc.	2019-10-24 20:05:10 +00:00
Bjoern A. Zeeb	53707abd41	frag6: export another counter read-only by sysctl Similar to the system global counter also export the per-VNET counter "frag6_nfragpackets" detailing the current number of fragment packets in this VNET's reassembly queues. The read-only counter is helpful for in-VNET statistical monitoring and for test-cases. MFC after: 3 weeks Sponsored by: Netflix	2019-10-24 20:00:37 +00:00
Bjoern A. Zeeb	dda02192f9	frag6: fix counter leak in error case and optimise code In case the first fragmented part (off=0) arrives we check for the maximum packet size for each fragmented part we already queued with the addition of the unfragmentable part from the first one. For one we do not have to enter the loop at all if this is the first fragmented part to arrive, and we can skip the check. Should we encounter an error case we send an ICMPv6 message for any fragment exceeding the maximum length limit. While dequeueing the original packet and freeing it, statistics were not updated and leaked both the reassembly queue count for the fragment and the global fragment count. Found by code inspection and confirmed by tightening test cases checking more statistical and system counters. While here properly wrap a line. MFC after: 3 weeks Sponsored by: Netflix	2019-10-24 19:57:18 +00:00
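
The counter-leak fix described in 702828f643 (increment the counter only after every step that can fail has succeeded) can be sketched in isolation as follows. This is a minimal userland illustration, not the frag6 code: `MAX_QUEUES`, `nqueues`, `struct queue_entry`, `policy_allows()`, `queue_alloc()` and `queue_free()` are all hypothetical names, and `policy_allows()` merely stands in for a check that may refuse the entry (such as the MAC check the commit message mentions).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_QUEUES 2		/* hypothetical limit on live entries */

static unsigned int nqueues;	/* hypothetical count of live entries */

struct queue_entry {
	int id;
};

/* Hypothetical stand-in for a policy check that may refuse an entry. */
static bool
policy_allows(int id)
{
	return (id % 2 == 0);
}

/*
 * Allocate an entry.  The counter is checked first but only incremented
 * once both fallible steps (allocation and policy check) have succeeded,
 * so no error path has to remember to undo the increment.
 */
static struct queue_entry *
queue_alloc(int id)
{
	struct queue_entry *q;

	if (nqueues >= MAX_QUEUES)
		return (NULL);
	q = malloc(sizeof(*q));
	if (q == NULL)
		return (NULL);
	if (!policy_allows(id)) {
		free(q);
		return (NULL);
	}
	q->id = id;
	nqueues++;		/* claim the slot only now */
	return (q);
}

/* Release an entry and give its slot back. */
static void
queue_free(struct queue_entry *q)
{
	assert(nqueues > 0);
	free(q);
	nqueues--;
}
```

With this ordering a refused entry (an odd `id` in this sketch) leaves `nqueues` unchanged, so repeated refusals cannot exhaust the limit. In a concurrent setting the check and the increment would additionally need to happen under a lock or atomically, which this sketch leaves out.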


1985 commits