system/freebsd-src

mirror of https://github.com/freebsd/freebsd-src synced 2024-10-16 21:34:10 +00:00

Author	SHA1	Message	Date
Gleb Smirnoff	47c8550576	Whitespace.	2014-02-17 12:02:44 +00:00
Gleb Smirnoff	5ec03c2bff	Bring copyright notice to standard style.	2014-02-17 12:01:50 +00:00
Gleb Smirnoff	0ff96b4f55	o Remove at compile time the HASH_ALL code, that was never tested and is unfinished. However, I've tested my version, it works okay. As before it is unfinished: timeout aren't driven by TCP session state. To enable the HASH_ALL mode, one needs in kernel config: options FLOWTABLE_HASH_ALL o Reduce the alignment on flentry to 64 bytes. Without the FLOWTABLE_HASH_ALL option, twice less memory would be consumed by flows. o API to ip_output()/ip6_output() got even more thin: 1 liner. o Remove unused unions. Simply use fle->f_key[]. o Merge all IPv4 code into flowtable_lookup_ipv4(), and do same flowtable_lookup_ipv6(). Stop copying data to on stack sockaddr structures, simply use key[] on stack. o Move code from flowtable_lookup_common() that actually works on insertion into flowtable_insert(). Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-02-17 11:50:56 +00:00
Adrian Chadd	a2c5809961	Make sure that the flowtable flowid is only set to m_flowid if there isn't one already supplied. The previous flowtable code also did this. Reviewed by: glebius Sponsored by: Netflix, Inc.	2014-02-15 07:57:01 +00:00
Luigi Rizzo	f0ea3689a9	This new version of netmap brings you the following: - netmap pipes, providing bidirectional blocking I/O while moving 100+ Mpps between processes using shared memory channels (no mistake: over one hundred million. But mind you, i said moving not processing); - kqueue support (BHyVe needs it); - improved user library. Just the interface name lets you select a NIC, host port, VALE switch port, netmap pipe, and individual queues. The upcoming netmap-enabled libpcap will use this feature. - optional extra buffers associated to netmap ports, for applications that need to buffer data yet don't want to make copies. - segmentation offloading for the VALE switch, useful between VMs. and a number of bug fixes and performance improvements. My colleagues Giuseppe Lettieri and Vincenzo Maffione did a substantial amount of work on these features so we owe them a big thanks. There are some external repositories that can be of interest: https://code.google.com/p/netmap our public repository for netmap/VALE code, including linux versions and other stuff that does not belong here, such as python bindings. https://code.google.com/p/netmap-libpcap a clone of the libpcap repository with netmap support. With this any libpcap client has access to most netmap feature with no recompilation. E.g. tcpdump can filter packets at 10-15 Mpps. https://code.google.com/p/netmap-ipfw a userspace version of ipfw+dummynet which uses netmap to send/receive packets. Speed is up in the 7-10 Mpps range per core for simple rulesets. Both netmap-libpcap and netmap-ipfw will be merged upstream at some point, but while this happens it is useful to have access to them. And yes, this code will be merged soon. It is infinitely better than the version currently in 10 and 9. MFC after: 3 days	2014-02-15 04:53:04 +00:00
Gleb Smirnoff	f0e49f6631	Whenever flowtable lookup fails, we do route lookup and then try to insert flow entry. During the route lookup the critical section is exited. It may happen that after route lookup we will be executed on another CPU that already has such a flowentry. Before this change we simply freed the flowentry and returned to ip_output() with failure. Actually there is nothing wrong with using the previously allocated flow entry, updating it properly. Thus, make flowentry_insert() return either the new or the old fle, and make use of it. Count reuses as "collisions" and real inserts as "inserts". Reviewed by: adrian Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-02-14 10:56:26 +00:00
Gleb Smirnoff	48278b8846	Once pf became not covered by a single mutex, many counters in it became race prone. Some just gather statistics, but some are later used in different calculations. A real problem was the race provoked underflow of the states_cur counter on a rule. Once it goes below zero, it wraps to UINT32_MAX. Later this value is used in pf_state_expires() and any state created by this rule is immediately expired. Thus, make fields states_cur, states_tot and src_nodes of struct pf_rule be counter(9)s. Thanks to Dennis for providing me shell access to problematic box and his help with reproducing, debugging and investigating the problem. Thanks to: Dennis Yusupoff <dyr smartspb.net> Also reported by: dumbbell, pgj, Rambler Sponsored by: Nginx, Inc.	2014-02-14 10:05:21 +00:00
Adrian Chadd	0e778c88c9	Don't insert a flowtable entry if the lle isn't yet valid. Some of the collisions that are occurring are due to flowtable lookups that succeed but have an invalid lle - typically because the L2 adjacency lookup hasn't completed. This would lead to a follow-up insert which would then fail (ie, collision) and the code would fall through to doing a slow-path L2/L3 lookup in the netinet/netinet6 code. This patch simply aborts storing a new flowtable entry if the lle isn't yet valid. Whilst I'm here, add a new pcpu counter for the item so the number of failures can be tracked separately from generic "collisions." Reviewed by: glebius MFC after: 10 days Sponsored by: Netflix, Inc.	2014-02-14 00:05:09 +00:00
Gleb Smirnoff	25c03b9100	Remove unused FL_NOAUTO.	2014-02-13 05:19:09 +00:00
Gleb Smirnoff	4343b5fa49	o Axe non-pcpu flowtable implementation. It wasn't enabled or used, and probably is a leftover from first prototyping by Kip. The non-pcpu implementation used mutexes, so it doubtfully worked better than simple routing lookup. o Use UMA_ZONE_PCPU zone for pointers instead of [MAXCPU] arrays, use zpcpu_get() to access data in there. o Substitute own single list implementation with SLIST(). This has two functional side effects: - new flows go into head of a list, before they went to tail. - a bug when incorrect flow was deleted in flow cleaner is fixed. o Due to cache line alignment, there is no reason to keep different zones for IPv4 and IPv6 flows. Both consume one cache line, real size of allocation is equal. o Rely on that f_hash, f_rt, f_lle are stable during fle lifetime, remove useless volatile qualifiers. o More INET/INET6 splitting. Reviewed by: adrian Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-02-13 04:59:18 +00:00
Mikolaj Golub	db2f5a2461	Fixup for r261590 (vnet sysctl handlers cleanup). Reviewed by: glebius	2014-02-09 08:13:17 +00:00
Gleb Smirnoff	0d3009b37c	Remove ft_rtalloc and choose rtalloc function at compile time.	2014-02-08 22:12:00 +00:00
Gleb Smirnoff	2333893af3	Spacing.	2014-02-08 22:10:53 +00:00
Gleb Smirnoff	07d9bc0740	Revert accidentally leaked changes in r261627.	2014-02-08 09:57:52 +00:00
Gleb Smirnoff	603819bc74	Remove never set flag FL_OVERWRITE. The only place where it was checked led to lock/critnest leak.	2014-02-08 09:56:26 +00:00
Gleb Smirnoff	d60a1d1ef4	Fix comment.	2014-02-07 22:30:42 +00:00
Gleb Smirnoff	f83f97fcbc	Remove unused defines.	2014-02-07 21:56:16 +00:00
Gleb Smirnoff	5d6d7e756b	o Revamp API between flowtable and netinet, netinet6. - ip_output() and ip_output6() simply call flowtable_lookup(), passing mbuf and address family. That's the only code under #ifdef FLOWTABLE in the protocols code now. o Revamp statistics gathering and export. - Remove hand made pcpu stats, and utilize counter(9). - Snapshot of statistics is available via 'netstat -rs'. - All sysctls are moved into net.flowtable namespace, since spreading them over net.inet isn't correct. o Properly separate at compile time INET and INET6 parts. o General cleanup. - Remove chain of multiple flowtables. We simply have one for IPv4 and one for IPv6. - Flowtables are allocated in flowtable.c, symbols are static. - With proper argument to SYSINIT() we no longer need flowtable_ready. - Hash salt doesn't need to be per-VNET. - Removed rudimentary debugging, which is quite useless in the dtrace era. The runtime behavior of flowtable shouldn't be changed by this commit. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-02-07 15:18:23 +00:00
Gleb Smirnoff	b5c32cf481	Remove identical vnet sysctl handlers, and handle CTLFLAG_VNET in the sysctl_root(). Note: SYSCTL_VNET_* macros can be removed as well. All is needed to virtualize a sysctl oid is set CTLFLAG_VNET on it. But for now keep macros in place to avoid large code churn. Sponsored by: Nginx, Inc.	2014-02-07 13:47:33 +00:00
Gleb Smirnoff	7cf4986ba9	Spacing.	2014-02-07 10:05:12 +00:00
Alexander V. Chernikov	95fbe4d0cc	Simplify filling sockaddr_dl structure for if_resolvemulti() callback providers. link_init_sdl() function can be used to fill most of the parameters. Use caller stack instead of allocating / freeing memory for each request. Do not drop support for extra-long (probably non-existing) link-layer protocols by introducing link_alloc_sdl() (used by if_resolvemulti() callback) and link_free_sdl() (used by caller). Since this change breaks KBI, MFC requires slightly different approach (link_init_sdl() auto-allocating buffer if necessary to handle cases with unmodified if_resolvemulti() callers). MFC after: 2 weeks	2014-01-18 23:24:51 +00:00
Luigi Rizzo	6601501905	forgot to update this file in 2607000	2014-01-17 04:38:58 +00:00
Luigi Rizzo	b82b221181	use explicit casts with void* to compile when included by C++ code	2014-01-11 00:00:11 +00:00
Alexander V. Chernikov	d375edc9b5	Simplify inet alias handling code: if we're adding/removing alias which has the same prefix as some other alias on the same interface, use newly-added rt_addrmsg() instead of hand-rolled in_addralias_rtmsg(). This eliminates the following rtsock messages: Pinned RTM_ADD for prefix (for alias addition). Pinned RTM_DELETE for prefix (for alias withdrawal). Example (got 10.0.0.1/24 on vlan4, playing with 10.0.0.2/24): before commit, addition: got message of size 116 on Fri Jan 10 14:13:15 2014 RTM_NEWADDR: address being added to iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255 got message of size 192 on Fri Jan 10 14:13:15 2014 RTM_ADD: Add Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 10.0.0.0 10.0.0.2 (255) ffff ffff ff after commit, addition: got message of size 116 on Fri Jan 10 13:56:26 2014 RTM_NEWADDR: address being added to iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 14.0.0.2 14.0.0.255 before commit, withdrawal: got message of size 192 on Fri Jan 10 13:58:59 2014 RTM_DELETE: Delete Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 10.0.0.0 10.0.0.2 (255) ffff ffff ff got message of size 116 on Fri Jan 10 13:58:59 2014 RTM_DELADDR: address being removed from iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255 after commit, withdrawal: got message of size 116 on Fri Jan 10 14:14:11 2014 RTM_DELADDR: address being removed from iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255 Sending both RTM_ADD/RTM_DELETE messages to rtsock is completely wrong (and requires some hacks to keep prefix in route table on RTM_DELETE). I've tested this change with quagga (no change) and bird (). bird alias handling is already broken in BSD sysdep code, so nothing changes here, too. I'm going to MFC this change if there are no complaints about the behavior change. While here, fix some style(9) bugs introduced by r260488 (pointed out by glebius and bde). Sponsored by: Yandex LLC MFC after: 4 weeks	2014-01-10 12:13:55 +00:00
Alexander V. Chernikov	4cbac30b29	Split rt_newaddrmsg_fib() into two different functions. Adding/deleting interface addresses involves access to 3 different subsystems, in different parts of code. Each call can fail, so reporting successful operation by rtsock in the middle of the process is error-prone. Further split routing notification API and actual rtsock calls via creating publicly available rt_addrmsg() / rt_routemsg() functions with "private" rtsock_* backend. MFC after: 2 weeks	2014-01-09 18:13:25 +00:00
Alexander V. Chernikov	7d9b6df18b	Consistently use RT_ALL_FIBS everywhere instead of -1. MFC after: 2 weeks	2014-01-08 23:09:02 +00:00
Alexander V. Chernikov	955a2deb52	Remove dead code. Reported by: Coverity Coverity CID: 1018057 MFC after: 2 weeks	2014-01-07 19:00:40 +00:00
Alexander V. Chernikov	50da3e886d	Teach every SIOCGIFSTATUS provider to fill in ifs->ascii anyway. Remove old bits of data concat for 'ascii' field. Remove special SIOCGIFSTATUS handling from if.c (which Coverity yells at). Reported by: Coverity Coverity CID: 1147174 MFC after: 2 weeks	2014-01-07 15:59:33 +00:00
Alexander V. Chernikov	034c09ff10	Partially fix IPv4 interface routes deletion in RADIX_MPATH. Noticed by: Nikolay Denev <ndenev at gmail.com> MFC after: 1 month	2014-01-06 22:36:20 +00:00
Luigi Rizzo	17885a7bfd	It is 2014 and we have a new version of netmap. Most relevant features: - netmap emulation on any NIC, even those without native netmap support. On the ixgbe we have measured about 4Mpps/core/queue in this mode, which is still a lot more than with sockets/bpf. - seamless interconnection of VALE switch, NICs and host stack. If you disable accelerations on your NIC (say em0) ifconfig em0 -txcsum -txcsum you can use the VALE switch to connect the NIC and the host stack: vale-ctl -h valeXX:em0 allowing sharing the NIC with other netmap clients. - THE USER API HAS SLIGHTLY CHANGED (head/cur/tail pointers instead of pointers/count as before). This was unavoidable to support, in the future, multiple threads operating on the same rings. Netmap clients require very small source code changes to compile again. On the plus side, the new API should be easier to understand and the internals are a lot simpler. The manual page has been updated extensively to reflect the current features and give some examples. This is the result of work of several people including Giuseppe Lettieri, Vincenzo Maffione, Michio Honda and myself, and has been financially supported by EU projects CHANGE and OPENLAB, from NetApp University Research Fund, NEC, and of course the Universita` di Pisa.	2014-01-06 12:53:15 +00:00
Alexander V. Chernikov	5a2f4cbd92	Change semantics for rnh_lookup() function: now it performs exact match search, regardless of netmask existence. This simplifies most of rnh_lookup() consumers. Fix panic triggered by deleting non-existent host route. PR: kern/185092 Submitted by: Nikolay Denev <ndenev at gmail.com> MFC after: 1 month	2014-01-04 22:25:26 +00:00
Alexander V. Chernikov	868f984c05	Remove useless register variable modifiers. Do some more style(9). MFC after: 2 weeks	2014-01-03 14:33:25 +00:00
George V. Neville-Neil	d9168b014f	Convert #defines to enums so that the values are visible in the debugger. Requested by: gibbs MFC after: 2 weeks	2014-01-02 21:30:59 +00:00
Scott Long	1a8959dac6	Multi-queue NIC drivers and multi-port lagg tend to use the same lower bits of the flowid as each other, resulting in a poor distribution of packets among queues in certain cases. Work around this by adding a set of sysctls for controlling a bit-shift on the flowid when doing multi-port aggregation in lagg and lacp. By default, lagg/lacp will now use bits 16 and higher instead of 0 and higher. Reviewed by: max Obtained from: Netflix MFC after: 3 days	2013-12-30 01:32:17 +00:00
Alexander V. Chernikov	78aed5e800	Simplify contiguous mask checking. Suggested by: glebius MFC after: 2 weeks	2013-12-17 22:16:27 +00:00
Luigi Rizzo	f9790aeb88	split netmap code according to functions: - netmap.c base code - netmap_freebsd.c FreeBSD-specific code - netmap_generic.c emulate netmap over standard drivers - netmap_mbq.c simple mbuf tailq - netmap_mem2.c memory management - netmap_vale.c VALE switch simplify device-specific code	2013-12-15 08:37:24 +00:00
George V. Neville-Neil	4fe3b90bd3	Add constants for use in interrogating various fiber and copper connectors most often used with network interfaces. The SFF-8472 standard defines the information that can be retrieved from an optic or a copper cable plugged into a NIC, most often referred to as SFP+. Examples of values that can be read include the cable vendor's name, part number, date of manufacture as well as running data such as temperature, voltage and tx and rx power. Copious comments on how to use these values with an I2C interface are given in the header file itself. MFC after: 2 weeks	2013-11-27 20:20:02 +00:00
Gleb Smirnoff	4678c74014	Fix build.	2013-11-27 07:21:25 +00:00
Sergey Kandaurov	da162ca88f	Fix macro name in comment.	2013-11-26 15:23:56 +00:00
Andriy Gapon	d9fae5ab88	dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks	2013-11-26 08:46:27 +00:00
Craig Rodrigues	6274ce3e2b	In vnet_route_uninit(), free some memory that is allocated in vnet_route_init(). To reproduce the problem: (1) Take a GENERIC kernel config, and add options for: VIMAGE, WITNESS, INVARIANTS. (2) Run this command in a loop: jail -l -u root -c path=/ name=foo persist vnet && jexec foo ifconfig lo0 127.0.0.1/8 && jail -r foo see: http://lists.freebsd.org/pipermail/freebsd-current/2010-November/021280.html http://lists.freebsd.org/pipermail/freebsd-current/2010-November/021291.html This doesn't eliminate all the "Freed UMA keg was not empty" warning messages on the console, but it helps.	2013-11-25 20:33:33 +00:00
Attilio Rao	54366c0bd7	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
Gleb Smirnoff	d77c1b3269	To support upcoming changes change internal API for source node handling: - Removed pf_remove_src_node(). - Introduce pf_unlink_src_node() and pf_unlink_src_node_locked(). These function do not proceed with freeing of a node, just disconnect it from storage. - New function pf_free_src_nodes() works on a list of previously disconnected nodes and frees them. - Utilize new API in pf_purge_expired_src_nodes(). In collaboration with: Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de> Sponsored by: InnoGames GmbH Sponsored by: Nginx, Inc.	2013-11-22 19:16:34 +00:00
Gleb Smirnoff	3260ae00be	Add missing 'extern'.	2013-11-22 19:02:22 +00:00
Gleb Smirnoff	654957c2c8	Merge head up to r258343.	2013-11-19 12:21:47 +00:00
George V. Neville-Neil	4857f5fbbc	Allow ethernet drivers to pass in packets connected via the nextpkt pointer. Handling packets in this way allows drivers to amortize work during packet reception. Submitted by: Vijay Singh Sponsored by: NetApp	2013-11-18 22:58:14 +00:00
Gleb Smirnoff	f053058cee	- Split functions that initialize various pf parts into their vimage parts and global parts. - Since global parts appeared to be only mutex initializations, just abandon them and use MTX_SYSINIT() instead. - Kill my incorrect VNET_FOREACH() iterator and instead use correct approach with VNET_SYSINIT(). Submitted by: Nikos Vassiliadis <nvass gmx.com> Reviewed by: trociny	2013-11-18 22:18:07 +00:00
George V. Neville-Neil	1350c361f6	Clean up the macros to avoid using casts. Suggested by: bde and jhb	2013-11-15 16:03:32 +00:00
Andrey V. Elsukov	c72a5d5d89	ANSIfy function definitions.	2013-11-15 12:12:50 +00:00
George V. Neville-Neil	d4aff8d11e	Put in the correct bit shifting and add a type to prevent clang from complaining. While here fix up a grammar nit. Pointed out by: Sergey Kandaurov and bz@ respectively.	2013-11-14 21:57:37 +00:00
George V. Neville-Neil	868eef3239	Shift our OUI correctly. Pointed out by: emaste	2013-11-14 20:07:17 +00:00
George V. Neville-Neil	62f1e1b2bc	The FreeBSD Project now has its own Organizationally Unique Identifier, assigned by the IEEE. This file includes documentation on how developers must carve up the space as well as an initial allocation for bhyve. Sponsored by: The FreeBSD Foundation	2013-11-14 19:53:35 +00:00
Gleb Smirnoff	50d3286d9d	Merge head r232040 through r258006.	2013-11-11 20:33:25 +00:00
Gleb Smirnoff	555036b5f6	Remove never used ioctls that originate from KAME. The proof of their zero usage was exp-run from misc/183538.	2013-11-11 05:39:42 +00:00
Gleb Smirnoff	77b89ad837	Provide compat layer for OSIOCAIFADDR.	2013-11-06 19:46:20 +00:00
Gleb Smirnoff	af50ea380f	Axe IFF_SMART. Fortunately this layering violating flag was never used, it was just declared.	2013-11-05 12:52:56 +00:00
Gleb Smirnoff	5fb009bda7	Drop support for historic ioctls and also undefine them, so that code that checks their presence via ifdef, won't use them. Bump __FreeBSD_version as safety measure.	2013-11-05 10:29:47 +00:00
Gleb Smirnoff	9a6356bc31	As a complement to ifa_add_loopback_route() and ifa_del_loopback_route(), provide function ifa_switch_loopback_route() that will be used in case when an interface address used for a loopback route goes away, but we have another interface address with same address value and want to preserve loopback route. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-11-05 07:36:17 +00:00
Gleb Smirnoff	b1b9dcae46	Remove net.link.ether.inet.useloopback sysctl tunable. It was always on by default from the very beginning. It was placed in wrong namespace net.link.ether, originally it had been at another wrong namespace. It was incorrectly documented at incorrect manual page arp(8). Since new-ARP commit, the tunable has been consulted only on route addition, and ignored on route deletion. Behaviour of a system with tunable turned off is not fully correct, and has no advantages compared to normal behavior.	2013-11-05 07:32:09 +00:00
Adrian Chadd	dd50b3107e	Restore the entropy gathering from the m_data pointer value, not the m_data payload. After talking with markm/bde, this is what markm actually intended.	2013-11-02 15:13:02 +00:00
Luigi Rizzo	ce3ee1e7c4	update to the latest netmap snapshot. This includes the following: - use separate memory regions for VALE ports - locking fixes - some simplifications in the NIC-specific routines - performance improvements for the VALE switch - some new features in the pkt-gen test program - documentation updates There are small API changes that require programs to be recompiled (NETMAP_API has been bumped so you will detect old binaries at runtime). In particular: - struct netmap_slot now is 16 bytes to support an extra pointer, which may save one data copy when using VALE ports or VMs; - the struct netmap_if has two extra fields; MFC after: 3 days	2013-11-01 21:21:14 +00:00
Adrian Chadd	a09968c479	Convert the random entropy harvesting code to use a const void * pointer rather than just void *. Then, as part of this, convert a couple of mbuf m->m_data accesses to mtod(m, const void *). Reviewed by: markm Approved by: security-officer (delphij) Sponsored by: Netflix, Inc.	2013-11-01 20:53:49 +00:00
Gleb Smirnoff	f9b2a21c9e	Merge head r232040 through r257457. M usr.sbin/portsnap/portsnap/portsnap.8 M usr.sbin/portsnap/portsnap/portsnap.sh M usr.sbin/tcpdump/tcpdump/Makefile	2013-10-31 17:33:29 +00:00
Andre Oppermann	5b74cfe42f	Make struct ifnet readable and comprehensible again by grouping and ordering related variables, fields and locks next to each other. Add more comments to variables. Over time 'ifnet' has accumulated a lot of additional pointers and functionality in an unstructured way making it quite hard to read and understand while obfuscating relationships between fields and variables. Quantify the structure size and how bloated it has become. This is only a mechanical change in preparation for upcoming work to make ifnet opaque to drivers and to separate out the interface queuing. Sponsored by: The FreeBSD Foundation	2013-10-31 15:46:10 +00:00
Andre Oppermann	ded7d20fc5	Move all interface queue related structures, macros and definitions from net/if_var to it own new net/ifq.h. For now net/ifq.h is unconditionally included through net/if_var.h. This is a mechanical change in preparation to make struct ifnet and the individual interface queue mechanisms opaque. Discussed with: glebius Sponsored by: The FreeBSD Foundation	2013-10-29 17:48:08 +00:00
Gleb Smirnoff	eaeb0c139a	Style: s/SYS_EVENTHANDLER_H/_SYS_EVENTHANDLER_H_/g Submitted by: bde	2013-10-28 20:32:05 +00:00
Gleb Smirnoff	c29e1ad930	- Make the prophecy from 1997 happen and remove if_var.h inclusion from if.h. - Remove unnecessary includes and declarations from if.h - Remove unnecessary includes and declarations from if_var.h [1] - Mark some declarations that are about to be removed in near future with comments, explaining why this declaration is still necessary. - Protect eventhandler declarations with #ifdef SYS_EVENTHANDLER_H. Obtained from: bdeBSD [1] Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-28 08:03:40 +00:00
Gleb Smirnoff	7ced9c2f66	Instead of putting ifnet declaration into eventhandler.h, move bpf(4) and vlan(4) related event declarations to bpf.h and if_vlan_var.h. To avoid dependency on eventhandler.h, protect these declarations with ifdef SYS_EVENTHANDLER_H. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-28 07:45:03 +00:00
Gleb Smirnoff	c3322cb91c	Include necessary headers that now are available due to pollution via if_var.h. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-28 07:29:16 +00:00
Gleb Smirnoff	628c030f77	Provide forward declaration for struct ifnet. Consumers of this header don't need contents of struct.	2013-10-27 17:27:06 +00:00
Gleb Smirnoff	47bb65deb8	Almost all if_clone consumers do not care about if_clone_event. Do not force them to include sys/eventhandler.h. Those who utilize EVENTHANDLER(9), will see the declaration.	2013-10-27 17:14:33 +00:00
Gleb Smirnoff	75bf2db380	Move new pf includes to the pf directory. The pfvar.h remain in net, to avoid compatibility breakage for no sake. The future plan is to split most of non-kernel parts of pfvar.h into pf.h, and then make pfvar.h a kernel only include breaking compatibility. Discussed with: bz	2013-10-27 16:25:57 +00:00
Gleb Smirnoff	9dae57e134	Start splitting pfvar.h into internal and external parts. - Provide pf_altq.h that has only stuff needed for ALTQ. - Start pf.h, that would have all constant values and eventually non-kernel structures. - Build ALTQ w/o pfvar.h, include if_var.h, that before came via pollution. - Build tcpdump w/o pfvar.h. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 18:59:58 +00:00
Gleb Smirnoff	76039bc84f	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
Gleb Smirnoff	b9f19cb397	vnet.h needs to be included before raw_cb.h. Now it compiles due to pollution via if_var.h.	2013-10-25 19:49:03 +00:00
Peter Grehan	4083db7d5d	Fix panic in the tap driver when a tap and vmnet interface were created after each other e.g. ifconfig tap0 ifconfig vmnet0 <panic> Appears to be a cut'n'paste error from the tap code to the vmnet code where the name string wasn't updated in the call to make_dev(). Reviewed by: glebius MFC after: 3 days	2013-10-24 22:21:31 +00:00
Andrey V. Elsukov	22dc101fac	Add a note that lacp_compose_key() should be updated, when new media types will be added. Submitted by: melifaro X-MFC after: r256689	2013-10-21 07:49:36 +00:00
Gleb Smirnoff	0bfd163f52	Merge head r233826 through r256722.	2013-10-18 09:32:02 +00:00
Andrey V. Elsukov	d5773da82a	Use the same actor key for media types of the same speed. PR: 176097 MFC after: 2 weeks	2013-10-17 15:14:58 +00:00
Alexander V. Chernikov	65a17d744e	Fix long-standing issue with incorrect radix mask calculation. Usual symptoms are messages like rn_delete: inconsistent annotation rn_addmask: mask impossibly already in tree or inability to flush/delete particular prefix in ipfw table. Changes: * Assume 32 bytes as maximum radix key length * Remove rn_init() * Statically allocate rn_ones/rn_zeroes * Make separate mask tree for each "normal" tree instead of system global one * Remove "optimization" on masks reusage and key zeroying * Change rn_addmask() arguments to accept tree pointer (no users in base) PR: kern/182851, kern/169206, kern/135476, kern/134531 Found by: Slawa Olhovchenkov <slw@zxy.spb.ru> MFC after: 2 weeks Reviewed by: glebius Sponsored by: Yandex LLC	2013-10-16 12:18:44 +00:00
Alexander V. Chernikov	fa27a1fa71	Remove unused fields from radix_node_head. Sponsored by: Yandex LLC	2013-10-16 10:33:20 +00:00
Gleb Smirnoff	994409375b	Rename Free() macro to R_Free(). This matches R_Malloc() and has much lower probability to clash with other headers. Submitted by: Eric van Gyzen <eric_van_gyzen dell.com>	2013-10-16 04:59:59 +00:00
Maksim Yevmenkin	9ca42425c6	In the flowtable scanner, restart the scan at the last found position, not at position 0. Changes the scanner from O(N^2) to O(N). Submitted by: scottl Obtained from: Netflix, Inc MFC after: 3 weeks	2013-10-15 21:28:51 +00:00
Gleb Smirnoff	7caf4ab7ac	- Utilize counter(9) to accumulate statistics on interface addresses. Add four counters to struct ifaddr. This kills '+=' on variables shared between processors for every packet. - Nuke struct if_data from struct ifaddr. - In ip_input() do not put a reference on ifaddr, instead update statistics right now in place and do IN_IFADDR_RUNLOCK(). This removes atomic(9) for every packet. [1] - To properly support NET_RT_IFLISTL sysctl used by getifaddrs(3), in rtsock.c fill if_data fields using counter_u64_fetch(). - Accidentally fix bug in COMPAT_32 version of NET_RT_IFLISTL, which took if_data not from the ifaddr, but from ifaddr's ifnet. [2] Submitted by: melifaro [1], pluknet[2] Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 11:37:57 +00:00
Gleb Smirnoff	3fffa8c8ff	Push some defines under _KERNEL, improve styling and comments.	2013-10-15 10:43:26 +00:00
Gleb Smirnoff	67420bda02	Remove ifa_mtx. It was used only in one place in kernel, and ifnet's ifaddr lock can substitute it there. Discussed with: melifaro, ae Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 10:41:22 +00:00
Gleb Smirnoff	4675896098	Remove ifa_init() and provide ifa_alloc() that will allocate and setup struct ifaddr internally. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 10:31:42 +00:00
Gleb Smirnoff	6ed910fabe	Hide 'struct ifaddr' definition from userland. Two tools left that use it, namely ipftest(1) and ifmcstat(1). These sniff structure definition using _WANT_IFADDR define. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 10:19:24 +00:00
Mark Murray	72acff0f07	MFC - tracking commit.	2013-10-09 21:03:34 +00:00
Gleb Smirnoff	4cdc1f5421	There are some high performance NICs that count statistics in hardware, and there are ifnets, that do that via counter(9). Provide a flag that would skip cache line trashing '+=' operation in ether_input(). Sponsored by: Netflix Sponsored by: Nginx, Inc. Reviewed by: melifaro, adrian Approved by: re (marius)	2013-10-09 19:04:40 +00:00
Mark Murray	ad1f331196	Debug run. This now works, except that the "live" sources haven't been tested. With all sources turned on, this unlocks itself in a couple of seconds! That is on my box, and there is no guarantee that this will be the case everywhere. * Cut debug prints. * Use the same locks/mutexes all the way through. * Be a tad more conservative about entropy estimates.	2013-10-06 12:40:32 +00:00
Mark Murray	f02e47dc1e	Snapshot. This passes the build test, but has not yet been finished or debugged. Contains: * Refactor the hardware RNG CPU instruction sources to feed into the software mixer. This is unfinished. The actual harvesting needs to be sorted out. Modified by me (see below). * Remove 'frac' parameter from random_harvest(). This was never used and adds extra code for no good reason. * Remove device write entropy harvesting. This provided a weak attack vector, was not very good at bootstrapping the device. To follow will be a replacement explicit reseed knob. * Separate out all the RANDOM_PURE sources into separate harvest entities. This adds some security in the case where more than one is present. * Review all the code and fix anything obviously messy or inconsistent. Address some review concerns while I'm here, like rename the pseudo-rng to 'dummy'. Submitted by: Arthur Mesh <arthurmesh@gmail.com> (the first item)	2013-10-04 06:55:06 +00:00
Gleb Smirnoff	c7063c15b0	Clear knlist before destroying it in tap(4) and tun(4). This fixes later crash, when a kqueue descriptor tries to dereference appropriate knotes. Approved by: re (kib)	2013-10-02 20:44:36 +00:00
Gleb Smirnoff	bdad3190a2	Fix a fallout from r241610. One enc interface must be created on startup. Pointy hat to: glebius Reported by: gavin Approved by: re (gjb)	2013-09-28 14:14:23 +00:00
Gleb Smirnoff	540b1a7238	Clean up SIOCSIFDSTADDR usage from ifnet drivers. The ioctl itself is extremely outdated, and I doubt that it was ever used for ifnet drivers. It was used for AF_INET sockets in pre-FreeBSD time. Approved by: re (hrs) Sponsored by: Nginx, Inc.	2013-09-11 09:19:44 +00:00
Dag-Erling Smørgrav	1a05c762b9	Fix the length calculation for the final block of a sendfile(2) transmission which could be tricked into rounding up to the nearest page size, leaking up to a page of kernel memory. [SA-13:11] In IPv6 and NetATM, stop SIOCSIFADDR, SIOCSIFBRDADDR, SIOCSIFDSTADDR and SIOCSIFNETMASK at the socket layer rather than pass them on to the link layer without validation or credential checks. [SA-13:12] Prevent cross-mount hardlinks between different nullfs mounts of the same underlying filesystem. [SA-13:13] Security: CVE-2013-5666 Security: FreeBSD-SA-13:11.sendfile Security: CVE-2013-5691 Security: FreeBSD-SA-13:12.ifioctl Security: CVE-2013-5710 Security: FreeBSD-SA-13:13.nullfs Approved by: re	2013-09-10 10:05:59 +00:00
Mark Murray	a40c2646a4	Bring in some behind-the-scenes development, mainly by Arthur Mesh, the rest by me. o Namespace cleanup; the Yarrow name is now restricted to where it really applies; this is in anticipation of being augmented or replaced by Fortuna in the future. Fortuna is mentioned, but behind #if logic, and is ignorable for now. o The harvest queue is pulled out into its own modules. o Entropy harvesting is improved, both by being made more conservative, and by separating (a bit!) the sources. Available entropy crumbs are marginally improved. o Selection of sources is made clearer. With recent revelations, this will receive more work in the weeks and months to come. Submitted by: Arthur Mesh (partly) <arthurmesh@gmail.com>	2013-09-07 14:15:13 +00:00
Davide Italiano	ab97ad0806	Don't clear the unused SI_CHEAPCLONE flag in tap_create()/tuncreate(). Reviewed by: kib	2013-09-07 13:50:13 +00:00
Mark Murray	9d32fc31c7	MFC	2013-09-07 07:58:29 +00:00
Davide Italiano	933e681d93	Retire netisr.netisr_direct and netisr.netisr_direct_force sysctls. These were used to control/export dispatch policy but they're not anymore. This commit cannot be MFC'ed to 9 because old netstat(9) binary relies on such sysctl to work. On the other hand, there's no real reason to keep'em around in 10.	2013-09-06 21:02:43 +00:00

3170 commits