linux

mirror of https://github.com/torvalds/linux synced 2024-11-05 18:23:50 +00:00

Author	SHA1	Message	Date
Asbjørn Sloth Tønnesen	7ff516ffe4	net: l2tp: only set L2TP_ATTR_UDP_CSUM if AF_INET Only set L2TP_ATTR_UDP_CSUM in l2tp_nl_tunnel_send() when it's running over IPv4. This prepares the code to also have IPv6 specific attributes. Signed-off-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 18:55:36 -05:00
Asbjørn Sloth Tønnesen	3f11ec045f	net: l2tp: change L2TP_ATTR_UDP_ZERO_CSUM6_{RX, TX} attribute types The attributes L2TP_ATTR_UDP_ZERO_CSUM6_RX and L2TP_ATTR_UDP_ZERO_CSUM6_TX are used as flags, but are defined as a u8 in a comment. This patch redocuments them as flags. Adding nla_policy entries would break API, so not doing that. CC: Tom Herbert <therbert@google.com> Signed-off-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 18:55:36 -05:00
Eric Dumazet	d61d072e87	net-gro: avoid reorders Receiving a GSO packet in dev_gro_receive() is not uncommon in stacked devices, or devices partially implementing LRO/GRO like bnx2x. GRO is implementing the aggregation the device was not able to do itself. Current code causes reorders, as in the following case: For a given flow where sender sent 4 packets P1,P2,P3,P4 Receiver might receive P1 as a single packet, stored in GRO engine. Then P2-P4 are received as a single GSO packet, immediately given to upper stack, while P1 is held in GRO engine. This patch will make sure P1 is given to upper stack, then P2-P4 immediately after. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 18:48:54 -05:00
David S. Miller	8e6e596b06	Merge branch 'sfc-udp-rss' Edward Cree says: ==================== sfc: enable 4-tuple UDP RSS hashing EF10 based NICs have configurable RSS hash fields, and can be made to take the ports into the hash on UDP (they already do so for TCP). This patch series enables this, in order to improve spreading of UDP traffic. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:59:17 -05:00
Edward Cree	b718c88a62	sfc: report 4-tuple UDP hashing to ethtool, if it's enabled Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:59:17 -05:00
Edward Cree	a33a4c7381	sfc: enable 4-tuple RSS hashing for UDP This improves UDP spreading, and also slightly improves GRO performance of encapsulated TCP on 7000 series NICs. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:59:16 -05:00
David S. Miller	04b206b8a9	Merge branch 'mlx5-SRIOV-offload-tunnel_key-set-release' Saeed Mahameed says: ==================== Mellanox 100G SRIOV offloads tunnel_key set/release From Hadar Hen Zion: This series further enhances the SRIOV TC offloads of mlx5 to handle the TC tunnel_key release and set actions. This serves a common use-case in virtualization systems where the virtual switch encapsulates packets (tunnel_key set action) sent from VMs with outer headers corresponding to the local/remote host IPs and de-capsulates (tunnel_key release) outer headers before the packets are received by the VM. We use the new E-Switch switchdev mode and TC tunnel_key set/release action to achieve that also in SW defined SRIOV environments by offloading TC rules that contain these actions along with forwarding (TC mirred/redirect action) the packets. The first six patches are adding the needed support in flow dissector, flower and tc for offloading tunnel_key actions: - The first three patches are adding the needed help functions and enums - The next three patches in the series are adding UDP port attribute to tunnel_key release and set actions. The addition of UDP ports would allow the HW driver to make sure they are given (say) a VXLAN tunnel to offload (mlx5e uses that). Patches 7-10 are mlx5 preparations for tunnel_key actions offloads support. Patch #11 adds mlx5e support to offload tunnel_key release action, and the last two patches (#12-13) add mlx5e support to tc tunnel_key set action. Currently in order to offload tc tunnel_key release action, the tc rule should be placed on top of the mlx5e offloading (uplink) interface instead of the shared tunnel interface. The resolution between the tunnel interface to the HW netdevice will be implemented in a follow up series. This series was generated against commit `94edc86bf1` ("Merge branch 'dwmac-sti-refactor-cleanup'") ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:41:57 -05:00
Hadar Hen Zion	a54e20b4fc	net/mlx5e: Add basic TC tunnel set action for SRIOV offloads In mlx5 HW, encapsulation is offloaded by the steering rule having index into an encapsulation table containing the entire set of headers to be added by the HW. The driver sets these headers in a buffer when we are offloading the action. The code maintains mlx5_encap_entry for each encap header it has encountered when attempting to offload a TC tunnel set action. This entry maintains a linked list of all the flows sharing the same encap header; when the last flow is removed from the list the encap entry is removed. The actual encap_header is allocated by the driver in the hardware only if we have layer two neighbour info when the encap entry is created. While the flow is in the driver, the driver holds a reference on the neighbour. When a new flow with encap action is inserted, the code first checks if the required encap entry exists according to the tunnel set parameters. If it does the encap is shared, otherwise a new mlx5_encap_entry is created. TC action parsing implementation in the driver assumes that tunnel set action is provided in the same order set by the user, e.g. before the mirred_redirect action. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:41:57 -05:00
Hadar Hen Zion	4a25730eb2	net/mlx5e: Add ndo_udp_tunnel_add to VF representors By implementing this ndo, the host stack will set the vxlan udp port also to VF representor netdevices. This will allow the TC offload code in the driver when it gets a tunnel key set action to identify the UDP port as vxlan, and hence the rule will be a candidate for offloading. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:41:56 -05:00
Hadar Hen Zion	bbd00f7e23	net/mlx5e: Add TC tunnel release action for SRIOV offloads Enhance the parsing of offloaded TC rules to set HW matching on outer (encapsulation) headers. Parse TC tunnel release action and set it as mlx5 decap action when the required capabilities are supported. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:41:56 -05:00
Hadar Hen Zion	66958ed906	net/mlx5: Support encap id when setting new steering entry In order to support steering rules which add encapsulation headers, encap_id parameter is needed. Add new mlx5_flow_act struct which holds action related parameter: action, flow_tag and encap_id. Use mlx5_flow_act struct when adding a new steering rule. This patch doesn't change any functionality. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:41:56 -05:00
Hadar Hen Zion	c9f1b073d0	net/mlx5: Add creation flags when adding new flow table When creating flow tables, allow the caller to specify creation flags. Currently no flags are used and as such this patch doesn't add any new functionality. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:41:56 -05:00
Hadar Hen Zion	43f93839e3	net/mlx5: Check max encap header size capability Instead of comparing to a const value, check the value of max encap header size capability as reported by the Firmware. Fixes: `575ddf5888` ('net/mlx5: Introduce alloc_encap and dealloc_encap commands') Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:41:55 -05:00
Hadar Hen Zion	ae9f83ac24	net/mlx5: Move alloc/dealloc encap commands declarations to common header file The alloc and dealloc encap commands will be used in the mlx5e driver, as such, declare them in a common header file. Also, rename the functions: mlx5_cmd_{de}alloc_encap is replaced with mlx5_encap_{de}alloc. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:41:55 -05:00
Hadar Hen Zion	75bfbca01e	net/sched: act_tunnel_key: Add UDP dst port option The current tunnel set action supports only IP addresses and key options. Add UDP dst port option. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:41:55 -05:00
Hadar Hen Zion	24ba898d43	net/dst: Add dst port to dst_metadata utility functions Add dst port parameter to __ip_tun_set_dst and __ipv6_tun_set_dst utility functions. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:41:54 -05:00
Hadar Hen Zion	f4d997fd61	net/sched: cls_flower: Add UDP port to tunnel parameters The current IP tunneling classification supports only IP addresses and key. Enhance UDP based IP tunneling classification parameters by adding UDP src and dst port. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:41:54 -05:00
Hadar Hen Zion	519d10521c	net/sched: cls_flower: Allow setting encapsulation fields as used key When encapsulation field is set, mark it as used key for the flow dissector. This will be used by offloading drivers. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:41:54 -05:00
Hadar Hen Zion	9ba6a9a9f7	flow_dissector: Add enums for encapsulation keys New encapsulation keys were added to the flower classifier, which allow classification according to outer (encapsulation) headers attributes such as key and IP addresses. In order to expose those attributes outside flower, add corresponding enums in the flow dissector. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:41:54 -05:00
Hadar Hen Zion	9ce183b4c4	net/sched: act_tunnel_key: add helper inlines to access tcf_tunnel_key Needed for drivers to pick the relevant action when offloading tunnel key act. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:41:53 -05:00
Lorenzo Colitti	35b80733b3	net: core: add missing check for uid_range in rule_exists. Without this check, it is not possible to create two rules that are identical except for their UID ranges. For example: root@net-test:/# ip rule add prio 1000 lookup 300 root@net-test:/# ip rule add prio 1000 uidrange 100-200 lookup 300 RTNETLINK answers: File exists root@net-test:/# ip rule add prio 1000 uidrange 100-199 lookup 100 root@net-test:/# ip rule add prio 1000 uidrange 200-299 lookup 200 root@net-test:/# ip rule add prio 1000 uidrange 300-399 lookup 100 RTNETLINK answers: File exists Tested: https://android-review.googlesource.com/#/c/299980/ Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:28:10 -05:00
Mintz, Yuval	bb48024284	qed: Prevent stack corruption on MFW interaction Driver uses a union for copying data to & from management firmware when interacting with it. Problem is that the function always copies sizeof(union) while commit `2edbff8dcb` ("qed: Learn resources from management firmware") is casting a union element which is of smaller size [24 bytes instead of 88 bytes]. Also, the union contains some inappropriate elements which increase its size [should have been 32 bytes]. While this shouldn't corrupt other PF messages to the MFW [as management firmware enforces permissions so that each PF is allowed to write only to its own mailbox] we fix this here as well. Fixes: `2edbff8dcb` ("qed: Learn resources from management firmware") Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:27:25 -05:00
Philippe Reynes	b12ab9b119	net: 3com: typhoon: fix typhoon_get_link_ksettings When moving from typhoon_get_settings to typhoon_get_link_ksettings in the commit `f7a5537cd2` ("net: 3com: typhoon: use new api ethtool_{get|set}_link_ksettings"), we use a local variable supported but we forgot to update the struct ethtool_link_ksettings with this value. We also initialize advertising to zero, because otherwise it may be uninitialized if no case of the switch (tp->xcvr_select) is used. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:25:14 -05:00
Philippe Reynes	90fdd04e2c	net: xgbe: use new api ethtool_{get|set}_link_ksettings The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:25:13 -05:00
Philippe Reynes	ea74df816f	net: amd: pcnet32: use new api ethtool_{get|set}_link_ksettings The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:25:13 -05:00
Philippe Reynes	1435003c2c	net: amd8111e: use new api ethtool_{get|set}_link_ksettings The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:25:13 -05:00
Philippe Reynes	d17970d746	net: alteon: acenic: use new api ethtool_{get|set}_link_ksettings The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Acked-by: Jes Sorensen <Jes.Sorensen@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:25:12 -05:00
Philippe Reynes	f1cd5aa078	net: adaptec: starfire: use new api ethtool_{get|set}_link_ksettings The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:25:12 -05:00
David S. Miller	35887d3217	Merge branch 'stmmac-dwmac-rk-PM' Joachim Eastwood says: ==================== stmmac: dwmac-rk: convert to standard PM/remove functions This patch set aims to remove the init/exit callbacks from the dwmac-rk driver and instead use standard PM callbacks. Eventually the init/exit callbacks will be deprecated and removed from all drivers dwmac-* except for dwmac-generic. Drivers will be refactored to use standard PM and remove callbacks. This conversion was pretty straightforward, but it would be really nice if some chromium people could test suspend/resume with this patch set. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:21:25 -05:00
Joachim Eastwood	5a3c7805c4	Revert "net: stmmac: allow to split suspend/resume from init/exit callbacks" Instead of adding hooks inside stmmac_platform it is better to just use the standard PM callbacks within the specific dwmac-driver. This is only used by the dwmac-rk driver. This reverts commit `cecbc5563a` ("stmmac: allow to split suspend/resume from init/exit callbacks"). Signed-off-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:21:25 -05:00
Joachim Eastwood	07a5e76924	stmmac: dwmac-rk: absorb rk_gmac_init into probe Since the rk_gmac_init() only calls another function move this function call into probe so rk_gmac_init() can be removed. Since commit `cecbc5563a` ("stmmac: allow to split suspend/resume from init/exit callbacks") the init hook is no longer used in dwmac-rk so this can be removed. Signed-off-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:21:24 -05:00
Joachim Eastwood	0de8c4c9a9	stmmac: dwmac-rk: turn exit into standard driver remove callback Convert the exit hook into a standard driver remove function as the hook doesn't really buy us anything extra. Eventually the exit hook will be deprecated in favor of the driver remove function. Signed-off-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:21:24 -05:00
Joachim Eastwood	5619468a41	stmmac: dwmac-rk: turn resume/suspend into standard PM callbacks Use standard PM resume/suspend callbacks instead of the hooks in stmmac_platform. This gives the driver more control and flexibility when implementing PM functionality. The hooks in stmmac_platform also don't buy us anything extra. Signed-off-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:21:24 -05:00
David S. Miller	c68d7f1b63	Merge branch 'tcp_get_info-locking' Eric Dumazet says: ==================== tcp: tcp_get_info() locking changes This short series prepares tcp_get_info() for more detailed infos. In order to not slow down fast path, our goal is to use the normal socket spinlock instead of custom synchronization. All we need to ensure is that tcp_get_info() is not called with ehash lock, which might dead lock, since packet processing would acquire the spinlocks in reverse way. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:02:28 -05:00
Eric Dumazet	67db3e4bfb	tcp: no longer hold ehash lock while calling tcp_get_info() We had various problems in the past in tcp_get_info() and used specific synchronization to avoid deadlocks. We would like to add more instrumentation points for TCP, and avoiding grabbing socket lock in tcp_get_info() was too costly. Being able to lock the socket allows us to provide a consistent set of fields. inet_diag_dump_icsk() can make sure ehash locks are not held any more when tcp_get_info() is called. We can remove syncp added in commit `d654976cbf` ("tcp: fix a potential deadlock in tcp_get_info()"), but we need to use lock_sock_fast() instead of spin_lock_bh() since TCP input path can now be run from process context. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:02:27 -05:00
Eric Dumazet	ccbf3bfaee	tcp: shortcut listeners in tcp_get_info() Being lockless in tcp_get_info() is hard, because we need to add specific synchronization in TCP fast path, like seqcount. Following patch will change inet_diag_dump_icsk() to no longer hold any lock for non listeners, so that we can properly acquire socket lock in tcp_get_info() and let it return more consistent counters. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:02:27 -05:00
David S. Miller	721ad32144	Merge branch 'Meson-GXL-internal-phy' Neil Armstrong says: ==================== ARM64: Add Internal PHY support for Meson GXL The Amlogic Meson GXL SoCs have an internal RMII PHY that is muxed with the external RGMII pins. In order to support switching between the two PHYs links, extended registers size for mdio-mux-mmioreg must be added. The DT related patches submitted as RFC in [3] will be sent in a separate patchset due to multiple patchsets and DTSI migrations. Changes since v2 RFC patchset at : [3] - Change phy Kconfig/Makefile alphabetic order - GXL dtsi cleanup Changes since original RFC patchset at : [2] - Remove meson8b experimental phy switching - Switch to mdio-mux-mmioreg with extended size support - Add internal phy support for S905x and p231 - Add external PHY support for p230 [1] http://lkml.kernel.org/r/1477932286-27482-1-git-send-email-narmstrong@baylibre.com [2] http://lkml.kernel.org/r/1477060838-14164-1-git-send-email-narmstrong@baylibre.com [3] http://lkml.kernel.org/r/1477932987-27871-1-git-send-email-narmstrong@baylibre.com ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 12:50:56 -05:00
Neil Armstrong	7334b3e47a	net: phy: Add Meson GXL Internal PHY driver Add driver for the Internal RMII PHY found in the Amlogic Meson GXL SoCs. This PHY seems to only implement some standard registers and needs some workarounds to provide autoneg values from vendor registers. Some magic values are currently used to configure the PHY, and this is a temporary setup until clarification about these registers names and registers fields are provided by Amlogic. Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 12:50:55 -05:00
Neil Armstrong	9a4c803748	net: mdio-mux-mmioreg: Add support for 16bit and 32bit register sizes In order to support PHY switching on Amlogic GXL SoCs, add support for 16bit and 32bit registers sizes. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 12:50:55 -05:00
David S. Miller	ddc5e15729	Merge branch 'rds-tcp-fixes' Sowmini Varadhan says: ==================== RDS: TCP: bug fixes A couple of bug fixes identified during testing. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 12:47:50 -05:00
Sowmini Varadhan	117d15bbfd	RDS: TCP: start multipath acceptor loop at 0 The for() loop in rds_tcp_accept_one() assumes that the 0'th rds_tcp_conn_path is UP and starts multipath accepts at index 1. But this assumption may not always be true: if the 0'th path has failed (ERROR or DOWN state) an incoming connection request should be used to resurrect this path. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 12:47:49 -05:00
Sowmini Varadhan	1ac507d4ff	RDS: TCP: report addr/port info based on TCP socket in rds-info The socket argument passed to rds_tcp_tc_info() is a PF_RDS socket, so it is incorrect to report the address port info based on rds_getname() as part of TCP state report. Invoke inet_getname() for the t_sock associated with the rds_tcp_connection instead. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 12:47:49 -05:00
Soheil Hassas Yeganeh	f5f99309fa	sock: do not set sk_err in sock_dequeue_err_skb Do not set sk_err when dequeuing errors from the error queue. Doing so results in: a) Bugs: By overwriting existing sk_err values, it possibly hides legitimate errors. It is also incorrect when local errors are queued with ip_local_error. That happens in the context of a system call, which already returns the error code. b) Inconsistent behavior: When there are pending errors on the error queue, sk_err is sometimes 0 (e.g., for the first timestamp on the error queue) and sometimes set to an error code (after dequeuing the first timestamp). c) Suboptimality: Setting sk_err to ENOMSG on simple TX timestamps can abort parallel reads and writes. Removing this line doesn't break userspace. This is because userspace code cannot rely on sk_err for detecting whether there is something on the error queue. Except for ICMP messages received for UDP and RAW, sk_err is not set at enqueue time, and as a result sk_err can be 0 while there are plenty of errors on the error queue. For ICMP packets in UDP and RAW, sk_err is set when they are enqueued on the error queue, but that does not result in aborting reads and writes. For such cases, sk_err is only readable via getsockopt(SO_ERROR) which will reset the value of sk_err on its own. More importantly, prior to this patch, recvmsg(MSG_ERRQUEUE) has a race on setting sk_err (i.e., sk_err is set by sock_dequeue_err_skb without atomic ops or locks) which can store 0 in sk_err even when we have ICMP messages pending. Removing this line from sock_dequeue_err_skb eliminates that race. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-07 20:29:10 -05:00
David S. Miller	5f7f75027f	Merge branch 'IFF_NO_QUEUE-semantics' Jesper Dangaard Brouer says: ==================== qdisc and tx_queue_len cleanups for IFF_NO_QUEUE devices This patchset is a cleanup for IFF_NO_QUEUE devices. It will hopefully help userspace get a more consistent behavior when attaching qdisc to such virtual devices. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-07 20:15:56 -05:00
Jesper Dangaard Brouer	84c46dd865	qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device It is a clear misconfiguration to attach a qdisc to a device with tx_queue_len zero, because some qdiscs (namely, pfifo, bfifo, gred, htb, plug and sfb) inherit/copy this value as their queue length. Why should the kernel catch such a misconfiguration? Because prior to introducing the IFF_NO_QUEUE device flag, userspace found a loophole in the qdisc config system that allowed them to achieve the equivalent of IFF_NO_QUEUE, which is to remove the qdisc code path entirely from a device. The loophole on older kernels is setting tx_queue_len=0, prior to device qdisc init (the config time is significant, simply setting tx_queue_len=0 doesn't trigger the loophole). This loophole is currently used by Docker[1] to get better performance and scalability out of the veth device. The Docker developers were warned[1] that they needed to adjust the tx_queue_len if ever attaching a qdisc. The OpenShift project didn't remember this warning and attached a qdisc; this was caught and fixed in [2]. [1] https://github.com/docker/libcontainer/pull/193 [2] https://github.com/openshift/origin/pull/11126 Instead of fixing every userspace program that used this loophole and forgot to reset the tx_queue_len prior to attaching a qdisc, let's catch the misconfiguration on the kernel side. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-07 20:15:55 -05:00
Jesper Dangaard Brouer	1159708432	net/qdisc: IFF_NO_QUEUE drivers should use consistent TX queue len The flag IFF_NO_QUEUE marks virtual device drivers that don't need a default qdisc attached, given they will be backed by a physical device that already has a qdisc attached for pushback. It is still supported to attach a qdisc to an IFF_NO_QUEUE device, as this can be useful for different policy reasons (e.g. bandwidth limiting containers). For this to work, the tx_queue_len needs to have a sane value, because some qdiscs inherit/copy the tx_queue_len (namely, pfifo, bfifo, gred, htb, plug and sfb). Commit `a813104d92` ("IFF_NO_QUEUE: Fix for drivers not calling ether_setup()") caught situations where some drivers didn't initialize tx_queue_len. The problem with the commit was choosing 1 as the fallback value. A qdisc queue length of 1 causes more harm than good, because it creates hard to debug situations for userspace. It gives userspace a false sense of a working config after attaching a qdisc: low volume traffic (that doesn't activate the qdisc policy) works, like ping, while traffic that e.g. needs shaping cannot reach the configured policy levels, given the queue length is too small. This patch changes the value to DEFAULT_TX_QUEUE_LEN, given other IFF_NO_QUEUE devices (that call ether_setup()) also use this value. Fixes: `a813104d92` ("IFF_NO_QUEUE: Fix for drivers not calling ether_setup()") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-07 20:15:55 -05:00
Jesper Dangaard Brouer	d0a81f67cd	net: make default TX queue length a defined constant The default TX queue length of Ethernet devices has been a magic constant of 1000, ever since the initial git import. Looking back in historical trees[1][2] the value used to be 100, with the same comment "Ethernet wants good queues". The commit[3] that changed this from 100 to 1000 didn't describe why, but from conversations with Robert Olsson it seems that it was changed when Ethernet devices went from 100Mbit/s to 1Gbit/s: because the link speed increased x10, the queue size was also adjusted. This value later caused much heartache for the bufferbloat community. This patch merely moves the value into a defined constant. [1] https://git.kernel.org/cgit/linux/kernel/git/davem/netdev-vger-cvs.git/ [2] https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/ [3] https://git.kernel.org/tglx/history/c/98921832c232 Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-07 20:15:55 -05:00
David S. Miller	fc13fd3986	Merge branch 'udp-fwd-mem-sched-on-dequeue' Paolo Abeni says: ==================== udp: do fwd memory scheduling on dequeue After commit `850cbaddb5` ("udp: use it's own memory accounting schema"), the udp code needs to acquire twice the receive queue spinlock on dequeue. This patch series removes the need for the second lock at skb free time, moving the udp memory scheduling inside the dequeue operation; the skb destructor field is not used anymore and an additional sk argument is added to ip_cmsg_recv_offset() to cope with null skb->sk after dequeue. Many thanks to Eric Dumazet for suggesting pretty much all of the above. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-07 13:24:42 -05:00
Paolo Abeni	7c13f97ffd	udp: do fwd memory scheduling on dequeue A new argument is added to __skb_recv_datagram to provide an explicit skb destructor, invoked under the receive queue lock. The UDP protocol uses such argument to perform memory reclaiming on dequeue, so that the UDP protocol no longer sets skb->destructor. Instead explicit memory reclaiming is performed at close() time and when skbs are removed from the receive queue. The in kernel UDP protocol users now need to call a skb_recv_udp() variant instead of skb_recv_datagram() to properly perform memory accounting on dequeue. Overall, this allows acquiring only once the receive queue lock on dequeue. Tested using pktgen with random src port, 64 bytes packet, wire-speed on a 10G link as sender and udp_sink as the receiver, using an l4 tuple rxhash to stress the contention, and one or more udp_sink instances with reuseport. nr sinks vanilla patched 1 440 560 3 2150 2300 6 3650 3800 9 4450 4600 12 6250 6450 v1 -> v2: - do rmem and allocated memory scheduling under the receive lock - do bulk scheduling in first_packet_length() and in udp_destruct_sock() - avoid the typedef for the dequeue callback Suggested-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-07 13:24:41 -05:00
Paolo Abeni	ad959036a7	net/sock: add an explicit sk argument for ip_cmsg_recv_offset() So that we can use it even after orphaning the skbuff. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-07 13:24:41 -05:00
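
The reordering described in the `net-gro: avoid reorders` entry (d61d072e87) can be illustrated with a toy model. This is only a sketch and not kernel code: `ToyGro`, `receive`, `flush`, `run` and the `flush_before_gso` switch are hypothetical names made up for illustration, and the model assumes a single flow in which P1 is held by the engine when P2-P4 arrives as one GSO packet.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGro:
    """Toy stand-in for a GRO engine handling a single flow (hypothetical)."""
    flush_before_gso: bool  # True models the behaviour the commit message asks for
    held: list = field(default_factory=list)       # packets parked for aggregation
    delivered: list = field(default_factory=list)  # order seen by the upper stack

    def receive(self, packet, is_gso=False):
        if is_gso:
            if self.flush_before_gso:
                # Hand over anything already held for this flow first.
                self.flush()
            # An already-aggregated packet goes straight to the upper stack.
            self.delivered.append(packet)
        else:
            # A plain packet is held in the hope of merging with later ones.
            self.held.append(packet)

    def flush(self):
        # Deliver every held packet, oldest first, and empty the hold list.
        self.delivered.extend(self.held)
        self.held.clear()


def run(flush_before_gso):
    gro = ToyGro(flush_before_gso)
    gro.receive("P1")                  # single packet, stored in the engine
    gro.receive("P2-P4", is_gso=True)  # arrives as one GSO packet
    gro.flush()                        # end of the receive batch
    return gro.delivered
```

In this model `run(False)` delivers the GSO packet ahead of the held P1, while `run(True)` hands P1 to the upper stack first and P2-P4 immediately after, which is the ordering the commit message describes.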


634661 commits