linux

mirror of https://github.com/torvalds/linux synced 2024-11-05 18:23:50 +00:00

Author	SHA1	Message	Date
Mitch Williams	f578f5f453	i40e/i40evf: pass QOS handle to VF The VF really doesn't care about the QOS handle but it will in the future. Since the VF only uses TC0, send it that handle. On the VF side, save the handle and use it to populate the QOS params when we call into the client interface. Change-ID: I76f41b070baeaa09b19383e9168bc677837e0761 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-08 16:30:13 -07:00
Mitch Williams	8ed995ff6b	i40evf: use capabilities flags properly Use the capabilities passed to us by the PF driver to control VF driver behavior. In the process, clean up the VLAN add/remove code so it's not a horrible morass of ifdefs. Change-ID: I1050eaf12b658a26fea6813047c9964163c70a73 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-08 16:27:14 -07:00
Jesse Brandeburg	a5fdaf342a	i40e: refactor code to remove indent I found a code indent that was avoidable because a whole function is inside an if block, reverse the if and move the code back a tab. Change-ID: I9989c8750ee61678fbf96a3b0fd7bf7cc7ef300a Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-08 16:24:22 -07:00
Jesse Brandeburg	6995b36c0f	i40e/i40evf: clean up some code Add missings spaces after declarations, remove another __func__ use, remove uncessary braces, remove unneeded breaks, and useless returns, and generally fix up some code. Change-ID: Ie715d6b64976c50e1c21531685fe0a2bd38c4244 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-08 16:19:55 -07:00
Mitch Williams	ee5c1e92dd	i40evf: detect reset more reliably Using VFGEN_RSTAT to detect a VF reset is an endeavor that is fraught with peril. It's entirely too easy to miss a reset because none of the bits are sticky. By the time the VF driver reads the register, the reset may have been processed and cleaned up by the PF driver, leaving the register in the same state that it was before the reset. Instead, detect a reset with the VF_ARQLEN register. When the VF is reset, the enable bit in this register is cleared, and it stays cleared until the VF driver processes the reset and re-enables the admin queue. Because we now deal with multiple registers in the reset and watchdog tasks, rename the rstat_val variable to reg_val. Change-ID: Id1df17045c0992e607da0162d31807f7fc20d199 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-08 16:08:50 -07:00
Greg Bowers	95e5613f75	i40e: Support FW CEE DCB UP to TC map nibble swap Changes parsing of AQ command Get CEE DCBX OPER CFG (0x0A07). Change is required because FW creates the oper_prio_tc nibbles reversed from those in the CEE Priority Group sub-TLV. Change-ID: I7d9d8641bb430d30e286fc3fac909866ef8a0de8 Signed-off-by: Greg Bowers <gregory.j.bowers@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-08 16:05:45 -07:00
Serey Kong	66486cd717	i40e/i40evf: Explicitly assign enum index for VSI type Ran into an issue where PF's VSI type list was different from VF's, which was resulted in different enum index. The VSI type list can be different depending on what build flag is used for PF and VF. The change is to explicitly assign enum index for each VSI type so that PF and VF always reference to the same VSI type event if the enum lists are different. Change-ID: I8c0e5fdb515f324f7964df863a458073cf467e57 Signed-off-by: Serey Kong <serey.kong@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-08 16:02:44 -07:00
Shannon Nelson	9ac7726637	i40e: add switch for link polling There's been some need for controlling the periodic link polling for debugging link issues. This patch enables switching it off and on through an ethtool private flag. The link poll remains on by default, but can be turned off with ethtool --set-priv-flags p261p1 LinkPolling off and later turned back on with ethtool --set-priv-flags p261p1 LinkPolling on To check the current status, use ethtool --show-priv-flags p261p1 Change-ID: I32e4ab654ff3eec90a06cf144899971b82d71c40 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-08 15:58:23 -07:00
Matt Jared	c156f856ad	i40e: Fix multiple link up messages This patch addresses an issue where multiple link up messages can be logged resulting from aq link status timing when link properties are changed (fc, speed, etc.); solved by using a single function to handle status printing and adding a mechanism to track whether link state (up or down) has actually changed. Change-ID: Ied6ed6e49dc397c77d992adc0bc9ed3767152b9d Signed-off-by: Matt Jared <matthew.a.jared@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-08 15:28:18 -07:00
Carolyn Wyborny	3487b6c30c	i40e: Fix for extra Flow Director filter in table after error This patch fixes a problem where the PF's fdir filter table would have an entry that the hw was unable to add. This notification happens in the hot path, so instead of trying to fix it then, we note the location in the failure case and delete it during regular fdir subtask callback. Without this patch, a case can occur where an invalid entry gets replayed and a valid one is not. Change-ID: I67831c183b5d0309876de807cc434809b74c9cb7 Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-08 15:24:28 -07:00
Neerav Parikh	1a9375eb7f	i40e/i40evf: Store CEE DCBX DesiredCfg and RemoteCfg This patch adds capability to query and store the CEE DCBX DesiredCfg and RemoteCfg data from the LLDP MIB. Added new member "desired_dcbx_config" in the i40e_hw data structure to hold CEE only DesiredCfg data. Change-ID: I19c550369594384eaff4cc63e690ca740231195d Signed-off-by: Neerav Parikh <neerav.parikh@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-08 15:20:35 -07:00
Neerav Parikh	909b2d16c4	i40e: Add parsing for CEE DCBX TLVs This patch adds parsing for CEE DCBX TLVs from the LLDP MIB. While the driver gets the DCB CEE operational configuration from Firmware using the "Get CEE DCBX Oper Config" AQ command there is a need to get the CEE DesiredCfg Tx by firmware and DCB configuration Rx from peer; for debug and other application purposes. Change-ID: I9140edf1a25a2852c7eff805d81e5eff6266178d Signed-off-by: Neerav Parikh <neerav.parikh@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-08 15:11:38 -07:00
Mitch Williams	96c8d0738a	i40e: add more verbose error messages Under certain circumstances, the device may not have enough resources to enable all of the VFs that it advertises in config space. Although the number of supported VFs is reported upon driver init, it is not obvious when this is different from the number reported in config space. To eliminate this confusion, add an error message explaining the problem. Additionally, move the 'Allocating VFs' message down below the error checks so as to prevent further confusion. Change-ID: I45b7efca53a7aebf7777be33a8bc9d615ae48ea1 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-08 15:08:04 -07:00
Jesse Brandeburg	02d109be3d	i40e: inline interrupt enable The interrupt enable function can be inlined by moving it to the header file, which decreases the function call overhead for a frequently called function. Change-ID: I3214cc99593725768642680e7b8ce7e9bba7e44d Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-08 15:04:04 -07:00
Jesse Brandeburg	1e8efb42cf	i40e: fix erroneous WARN_ON The driver was issuing a WARN_ON during ring size changes because the code was cloning the rx_ring struct but not zeroing out the pointers before allocating new memory. Zero out the pointers in the cloned copy before allocating new memory for them. In this case the code was correctly avoiding memory leaks but still triggering the warning. Change-ID: I186dd493948e9b7254ab0593d4aad8b68808918d Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-08 14:59:07 -07:00
Marcel Holtmann	f640ee98bb	Bluetooth: Fix basic debugfs entries for unconfigured controllers When the controller is unconfigured (for example it does not have a valid Bluetooth address), then the basic debugfs entries for dut_mode and vendor_diag are not creates. Ensure they are created in __hci_init and also __hci_unconf_init functions. One of them is called during setup stage of a new controller. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-10-08 15:33:18 +03:00
David S. Miller	df71842325	Merge branch 'bpf_random32' Daniel Borkmann says: ==================== BPF/random32 updates BPF update to split the prandom state apart, and to move the *once helpers to the core. For details, please see individual patches. Given the changes and since it's in the tree for quite some time, net-next is a better choice in our opinion. v1 -> v2: - Make DO_ONCE() type-safe, remove the kvec helper. Credits go to Alexei Starovoitov for the __VA_ARGS__ hint, thanks! - Add a comment to the DO_ONCE() helper as suggested by Alexei. - Rework prandom_init_once() helper to the new API. - Keep Alexei's Acked-by on the last patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 05:26:44 -07:00
Daniel Borkmann	3ad0040573	bpf: split state from prandom_u32() and consolidate {c, e}BPF prngs While recently arguing on a seccomp discussion that raw prandom_u32() access shouldn't be exposed to unpriviledged user space, I forgot the fact that SKF_AD_RANDOM extension actually already does it for some time in cBPF via commit `4cd3675ebf` ("filter: added BPF random opcode"). Since prandom_u32() is being used in a lot of critical networking code, lets be more conservative and split their states. Furthermore, consolidate eBPF and cBPF prandom handlers to use the new internal PRNG. For eBPF, bpf_get_prandom_u32() was only accessible for priviledged users, but should that change one day, we also don't want to leak raw sequences through things like eBPF maps. One thought was also to have own per bpf_prog states, but due to ABI reasons this is not easily possible, i.e. the program code currently cannot access bpf_prog itself, and copying the rnd_state to/from the stack scratch space whenever a program uses the prng seems not really worth the trouble and seems too hacky. If needed, taus113 could in such cases be implemented within eBPF using a map entry to keep the state space, or get_random_bytes() could become a second helper in cases where performance would not be critical. Both sides can trigger a one-time late init via prandom_init_once() on the shared state. Performance-wise, there should even be a tiny gain as bpf_user_rnd_u32() saves one function call. The PRNG needs to live inside the BPF core since kernels could have a NET-less config as well. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Chema Gonzalez <chema@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 05:26:39 -07:00
Daniel Borkmann	897ece56e7	random32: add prandom_init_once helper for own rngs Add a prandom_init_once() facility that works on the rnd_state, so that users that are keeping their own state independent from prandom_u32() can initialize their taus113 per cpu states. The motivation here is similar to net_get_random_once(): initialize the state as late as possible in the hope that enough entropy has been collected for the seeding. prandom_init_once() makes use of the recently introduced prandom_seed_full_state() helper and is generic enough so that it could also be used on fast-paths due to the DO_ONCE(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 05:26:38 -07:00
Daniel Borkmann	0dd50d1b0c	random32: add prandom_seed_full_state helper Factor out the full reseed handling code that populates the state through get_random_bytes() and runs prandom_warmup(). The resulting prandom_seed_full_state() will be used later on in more than the current __prandom_reseed() user. Fix also two minor whitespace issues along the way. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 05:26:37 -07:00
Hannes Frederic Sowa	c90aeb9482	once: make helper generic for calling functions once Make the get_random_once() helper generic enough, so that functions in general would only be called once, where one user of this is then net_get_random_once(). The only implementation specific call is to get_random_bytes(), all the rest of this *_once() facility would be duplicated among different subsystems otherwise. The new DO_ONCE() helper will be used by prandom() later on, but might also be useful for other scenarios/subsystems as well where a one-time initialization in often-called, possibly fast path code could occur. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 05:26:36 -07:00
Hannes Frederic Sowa	46234253b9	net: move net_get_random_once to lib There's no good reason why users outside of networking should not be using this facility, f.e. for initializing their seeds. Therefore, make it accessible from there as get_random_once(). Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 05:26:35 -07:00
Alexander Aring	4d6a6aed22	6lowpan: move shared settings to lowpan_netdev_setup This patch moves values for all lowpan interface to the shared implementation of 6lowpan. This patch also quietly fixes the forgotten IFF_NO_QUEUE flag for the bluetooth 6LoWPAN interface. An identically commit is `4afbc0d` ("net: 6lowpan: convert to using IFF_NO_QUEUE") which wasn't changed for bluetooth 6lowpan. All 6lowpan interfaces should be virtual with IFF_NO_QUEUE, using EUI64 address length, the mtu size is 1280 (IPV6_MIN_MTU) and the netdev type is ARPHRD_6LOWPAN. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-08 14:25:34 +02:00
David Ahern	28335a7445	net: Do not drop to make_route if oif is l3mdev Commit `deaa0a6a93` ("net: Lookup actual route when oif is VRF device") exposed a bug in __ip_route_output_key_hash for VRF devices: on FIB lookup failure if the oif is specified the current logic drops to make_route on the assumption that the route tables are wrong. For VRF/L3 master devices this leads to wrong dst entries and route lookups. For example: $ ip route ls table vrf-red unreachable default broadcast 10.2.1.0 dev eth1 proto kernel scope link src 10.2.1.2 10.2.1.0/24 dev eth1 proto kernel scope link src 10.2.1.2 local 10.2.1.2 dev eth1 proto kernel scope host src 10.2.1.2 broadcast 10.2.1.255 dev eth1 proto kernel scope link src 10.2.1.2 $ ip route get oif vrf-red 1.1.1.1 1.1.1.1 dev vrf-red src 10.0.0.2 cache With this patch: $ ip route get oif vrf-red 1.1.1.1 RTNETLINK answers: No route to host which is the correct response based on the default route Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 05:18:47 -07:00
Daniel Borkmann	cfc81b5038	bpf, skb_do_redirect: clear sender_cpu before xmit Similar to commit `c29390c6df` ("xps: must clear sender_cpu before forwarding"), we also need to clear the skb->sender_cpu when moving from RX to TX via skb_do_redirect() due to the shared location of napi_id (used on RX) and sender_cpu (used on TX). Fixes: `27b29f6305` ("bpf: add bpf_redirect() helper") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 05:03:08 -07:00
Arnd Bergmann	dfdd7230c5	net: hns: fix 32-bit build warning The recently added hns driver causes a build warning in ARM allmodconfig builds: drivers/net/ethernet/hisilicon/hns/hnae.c: In function 'handles_show': drivers/net/ethernet/hisilicon/hns/hnae.c:452:13: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] j, (u64)h->qs[i]->io_base); ^ This removes the pointless cast and prints the pointer address using the "%p" format string in all three locations. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:54:15 -07:00
Jon Ringle	d70e53262f	net: Microchip encx24j600 driver This ethernet driver supports the Micorchip enc424j600/626j600 Ethernet controller over a SPI bus interface. This driver makes use of the regmap API to optimize access to registers by caching registers where possible. Datasheet: http://ww1.microchip.com/downloads/en/DeviceDoc/39935b.pdf Signed-off-by: Jon Ringle <jringle@gridpoint.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:49:55 -07:00
David S. Miller	494f8eb9b6	Merge branch 'broadcom-iproc' Arun Parameswaran says: ==================== Add support for Broadcom's iProc MDIO and Cygnus Ethernet PHY This patchset adds support for the iProc MDIO interface and the Broadcom Cygnus SoC's internal Ethernet PHY. The internal Ethernet PHY(s) in the Cygnus SoC's are accessed via the MDIO interface found in most of the iProc based chips. The patch also consolidates the common API's used by the Broadcom phys to a common library. Existing Broadcom phy drivers have been modified to use the common library API's. This patch series is based on Linux v4.3-rc1 and is avaliable in: https://github.com/Broadcom/cygnus-linux/tree/cygnus-net-phy-mdio-v3 The Ethernet driver for the iProc family will be submitted soon, as will the device tree configurations for the different iProc family SoCs. Changes from v2: - Modified drivers/net/phy/Kconfig to modify the BCM_CYGNUS_PHY driver to 'depends on MDIO_BCM_IPROC' instead of 'select'. - Added github branch to the cover letter Changes from v1: - Updated device tree documentation for the iProc MDIO driver based on Florian's feedback. - Moved the core register defines from the Cygnus PHY driver to 'include/linux/brcmphy.h' based on Florian's feedback. - Created a new patch/commit to modify the bcm7xxx phy driver to use the new core register defines. - Modified the Kconfig entry for the Broadcom PHY library to 'tristate' instead of 'bool' ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:46:03 -07:00
Arun Parameswaran	9200c27a1c	net: phy: bcm7xxx: Modified to use global core register defines Modified the bcm7xxx phy driver to remove local core register defines and use the common ones from "include/linux/brcmphy.h" Signed-off-by: Arun Parameswaran <arunp@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:45:53 -07:00
Arun Parameswaran	8e185d6997	net: phy: Broadcom Cygnus internal Etherent PHY driver Add support for the Broadcom Cygnus SoCs internal PHY's. The PHYs are 1000M/100M/10M capable with support for 'EEE' and 'APD' (Auto Power Down). This driver supports the following Broadcom Cygnus SoCs: - BCM583XX (BCM58300, BCM58302, BCM58303, BCM58305) - BCM113XX (BCM11300, BCM11320, BCM11350, BCM11360) The PHY's on these SoC's require some workarounds for stable operation, both during configuration time and during suspend/resume. This driver handles the application of the workarounds. Signed-off-by: Arun Parameswaran <arunp@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:45:52 -07:00
Arun Parameswaran	a1cba5613e	net: phy: Add Broadcom phy library for common interfaces This patch adds the Broadcom phy library to consolidate common interfaces shared by Broadcom phy's. Moved the common interfaces to the 'bcm-phy-lib.c' and updated the Broadcom PHY drivers to use the new APIs. Signed-off-by: Arun Parameswaran <arunp@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:45:46 -07:00
Arun Parameswaran	ddc24ae1fd	net: phy: Broadcom iProc MDIO bus driver This patch adds support for the Broadcom iProc MDIO bus interface. The MDIO interface can be found in the Broadcom iProc family Soc's. The MDIO bus is accessed using a combination of command and data registers. This MDIO driver provides access to the Etherent GPHY's connected to the MDIO bus. Signed-off-by: Arun Parameswaran <arunp@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:44:46 -07:00
Arun Parameswaran	bb257c3813	dt-bindings: net: Broadcom iProc MDIO bus driver device tree binding Add device tree binding documentation for the Broadcom iProc MDIO bus driver. Signed-off-by: Arun Parameswaran <arunp@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:44:44 -07:00
David S. Miller	91d2f14bc3	Merge branch 'net/rds/4.3-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux Santosh Shilimkar says: ==================== RDS: connection scalability and performance improvements [v4] Re-sending the same patches from v3 again since my repost of patch 05/14 from v3 was whitespace damaged. [v3] Updated patch "[PATCH v2 05/14] RDS: defer the over_batch work to send worker" as per David Miller's comment [4] to avoid the magic value usage. Patch now makes use of already available but unused send_batch_count module parameter. Rest of the patches are same as earlier version v2 [3] [v2]: Dropped "[PATCH 05/15] RDS: increase size of hash-table to 8K" from earlier version [1]. I plan to address the hash table scalability using re-sizable hash tables as suggested by David Laight and David Miller [2] This series addresses RDS connection bottlenecks on massive workloads and improve the RDMA performance almost by 3X. RDS TCP also gets a small gain of about 12%. RDS is being used in massive systems with high scalability where several hundred thousand end points and tens of thousands of local processes are operating in tens of thousand sockets. Being RC(reliable connection), socket bind and release happens very often and any inefficiencies in bind hash look ups hurts the overall system performance. RDS bin hash-table uses global spin-lock which is the biggest bottleneck. To make matter worst, it uses rcu inside global lock for hash buckets. This is being addressed by simply using per bucket rw lock which makes the locking simple and very efficient. The hash table size is still an issue and I plan to address it by using re-sizable hash tables as suggested on the list. For RDS RDMA improvement, the completion handling is revamped so that we can do batch completions. Both send and receive completion handlers are split logically to achieve the same. RDS 8K messages being one of the key usecase, mr pool is adapted to have the 8K mrs along with default 1M mrs. And while doing this, few fixes and couple of bottlenecks seen with rds_sendmsg() are addressed. Series applies against 4.3-rc1 as well net-next. Its tested on Oracle hardware with IB fabric for both bcopy as well as RDMA mode. RDS TCP is tested with iXGB NIC. Like last time, iWARP transport is untested with these changes. The patchset is also available at below git repo: git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux.git net/rds/4.3-v3 As a side note, the IB HCA driver I used for testing misses at least 3 important patches in upstream to see the full blown IB performance and am hoping to get that in mainline with help of them. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:38:37 -07:00
David S. Miller	6b92d0c4a6	Merge branch 'pass_net_through_output_path' Eric W. Biederman says: ==================== net: Pass net through the output path v2 This is the next installment of my work to pass struct net through the output path so the code does not need to guess how to figure out which network namespace it is in, and ultimately routes can have output devices in another network namespace. The first patch in this series is a fix for a bug that came in when sk was passed through the functions in the output path, and as such is probably a candidate for net. At the same time my later patches depend on it so sending the fix separately would be confusing. The second patch in this series is another fix that for an issue that came in when sk was passed through the output path. I don't think it needs a backport as I don't think anyone uses the path where the code was incorrect. The rest of the patchset focuses on the path from xxx_local_out to dst_output and in the end succeeds in passing sock_net(sk) from the socket a packet locally originates on to the dst->output function. Given the size reduction in the code I think this counts as a cleanup as much as feature work. There remain a number of helper functions (like ip option processing) to take care of before the network stack can support destination devices in other network namespaces but with this set of changes the backbone of the work is done. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:27:13 -07:00
Eric W. Biederman	ede2059dba	dst: Pass net into dst->output The network namespace is already passed into dst_output pass it into dst->output lwt->output and friends. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:27:03 -07:00
Eric W. Biederman	33224b16ff	ipv4, ipv6: Pass net into ip_local_out and ip6_local_out Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:27:02 -07:00
Eric W. Biederman	cf91a99daa	ipv4, ipv6: Pass net into __ip_local_out and __ip6_local_out Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:27:02 -07:00
Eric W. Biederman	57c4bf859c	ipvlan: Cache net in ipvlan_process_v4_outbound and ipvlan_process_v6_outbound Compute net once in ipvlan_process_v4_outbound and ipvlan_process_v6_outbound and store it in a variable so that net does not need to be recomputed next time it is used. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:27:01 -07:00
Eric W. Biederman	a7093fefa5	ppp: Cache net in pptp_xmit Compute net and store it in a variable in pptp_xmit, so that the value can be reused the next time it is needed. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:27:00 -07:00
Eric W. Biederman	77589ce0f8	ipv4: Cache net in ip_build_and_send_pkt and ip_queue_xmit Compute net and store it in a variable in the functions ip_build_and_send_pkt and ip_queue_xmit so that it does not need to be recomputed next time it is needed. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:59 -07:00
Eric W. Biederman	f859b0f662	ipv4: Cache net in iptunnel_xmit Store net in a variable in ip_tunnel_xmit so it does not need to be recomputed when it is used again. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:59 -07:00
Eric W. Biederman	792883303c	ipv6: Merge ip6_local_out and ip6_local_out_sk Stop hidding the sk parameter with an inline helper function and make all of the callers pass it, so that it is clear what the function is doing. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:58 -07:00
Eric W. Biederman	9f8955cc46	ipv6: Merge __ip6_local_out and __ip6_local_out_sk Only __ip6_local_out_sk has callers so rename __ip6_local_out_sk __ip6_local_out and remove the previous __ip6_local_out. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:58 -07:00
Eric W. Biederman	e2cb77db08	ipv4: Merge ip_local_out and ip_local_out_sk It is confusing and silly hiding a parameter so modify all of the callers to pass in the appropriate socket or skb->sk if no socket is known. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:57 -07:00
Eric W. Biederman	b92dacd456	ipv4: Merge __ip_local_out and __ip_local_out_sk Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:57 -07:00
Eric W. Biederman	4ebdfba73c	dst: Pass a sk into .local_out For consistency with the other similar methods in the kernel pass a struct sock into the dst_ops .local_out method. Simplifying the socket passing case is needed a prequel to passing a struct net reference into .local_out. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:55 -07:00
Eric W. Biederman	13206b6bff	net: Pass net into dst_output and remove dst_output_okfn Replace dst_output_okfn with dst_output Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:54 -07:00
Eric W. Biederman	3f5312ae62	xfrm: Only compute net once in xfrm_policy_queue_process Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:53 -07:00
Eric W. Biederman	850dcc4d4d	ipv4: Fix ip_queue_xmit to pass sk into ip_local_out_sk After a packet has been encapsulated by a tunnel we should use the tunnel sockets local multicast loopback flag to control if the encapsulated packet should be locally loopback back. Pass sk into ip_local_out_sk so that in the rare case we are dealing with a tunneled packet whose tunnel destination address is a multicast address the kernel properly decides to loopback this packet. In practice I don't think this matters as ip_queue_xmit is used by tcp, l2tp and sctp none of which I am aware of uses ip level multicasting as they are all point to point communications protocols. Let's fix this before someone uses ip_queue_xmit for a tunnel protocol that does use multicast. Fixes: `aad88724c9` ("ipv4: add a sock pointer to dst->output() path.") Fixes: `b0270e9101` ("ipv4: add a sock pointer to ip_queue_xmit()") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:52 -07:00

... 3 4 5 6 7 ...

548650 commits