Commit graph

738212 commits

Author SHA1 Message Date
Stephen Hemminger 82695b30ff inet: whitespace cleanup
Ran a simple script to find and remove trailing whitespace and blank lines
at EOF, because git whines about that kind of stuff and editors leave it
behind.

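The script itself is not part of the commit. As a rough illustration of the same cleanup, here is a hypothetical C helper (strip_trailing_ws() is not a kernel or git function). It trims spaces and tabs from the end of every line of a buffer and collapses blank lines at EOF down to a single newline, working in place:

```c
#include <stddef.h>
#include <string.h>

static int is_hspace(char c)
{
	return c == ' ' || c == '\t';
}

/* Hypothetical helper: strip spaces/tabs at the end of every line and
 * drop blank lines at EOF, in place.  Returns the new length. */
size_t strip_trailing_ws(char *buf, size_t len)
{
	size_t out = 0;         /* next write position (never passes i) */
	size_t line_start = 0;  /* where the current output line begins */

	for (size_t i = 0; i < len; i++) {
		if (buf[i] == '\n') {
			/* Back up over whitespace written on this line. */
			while (out > line_start && is_hspace(buf[out - 1]))
				out--;
			buf[out++] = '\n';
			line_start = out;
		} else {
			buf[out++] = buf[i];
		}
	}

	/* The last line may lack a newline: trim it too. */
	while (out > line_start && is_hspace(buf[out - 1]))
		out--;

	/* Collapse "\n\n..." at EOF down to a single newline. */
	while (out >= 2 && buf[out - 1] == '\n' && buf[out - 2] == '\n')
		out--;

	return out;
}
```
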
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 11:43:28 -05:00
Hernán Gonzalez 262c974015 emulex/benet: Constify *be_misconfig_evt_port_state[]
Note: This is compile-only tested, as I have no access to the hw.
No benefit is gained except for some self-documentation.

add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0)
Function                                     old     new   delta
Total: Before=2757703, After=2757703, chg +0.00%
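
The sketch below shows what the extra const buys, using a hypothetical table (the real benet array's contents are not shown in this commit). With const char *const, both the strings and the array slots are read-only. That allows the compiler to place the whole table in read-only data and turns accidental writes into compile errors. As the bloat-o-meter output above shows, it need not change the object size.

```c
#include <stddef.h>
#include <string.h>

/* Strings are read-only, but the pointer slots can be reassigned, so
 * the array itself must live in writable data. */
static const char *state_str_old[] = { "up", "down" };

/* Both the strings and the slots are read-only. */
static const char *const state_str[] = { "up", "down" };

void retarget_old_slot(void)
{
	state_str_old[0] = "changed";   /* allowed */
	/* state_str[0] = "changed";       would not compile */
}

const char *port_state_name(size_t idx)
{
	if (idx < sizeof(state_str) / sizeof(state_str[0]))
		return state_str[idx];
	return "unknown";
}
```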

Signed-off-by: Hernán Gonzalez <hernan@vanguardiasur.com.ar>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 11:39:40 -05:00
Hernán Gonzalez 4f4aaa1720 qlogic/qed: Constify *pkt_type_str[]
Note: This is compile-only tested, as I have no access to the hw.
Constifying and declaring as static saves 24 bytes.

add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-24 (-24)
Function                                     old     new   delta
pkt_type_str                                  24       -     -24
Total: Before=3599256, After=3599232, chg -0.00%

Signed-off-by: Hernán Gonzalez <hernan@vanguardiasur.com.ar>
Acked-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 11:39:40 -05:00
David S. Miller 2824db741b Merge branch 'SFP-updates'
Russell King says:

====================
SFP updates

Included in this series are a further few updates for SFP support:

- Adding support for Fiberstore's non-standard BiDi modules operating
  at 1310nm/1550nm wavelengths rather than the 1000BASE-BX standard of
  1310nm/1490nm.
- Adding support for negotiating the PHY interface mode with the MAC,
  so that modules supporting faster speeds and Gigabit ethernet work
  with Gigabit-only MACs.
- Adding support for high power (>1W) SFP modules.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 11:07:12 -05:00
Jon Nettleton 3bb35261c7 sfp: add high power module support
This patch is the result of work by both Jon Nettleton and Russell King.
Jon wrote the original patch, adding support for SFP modules which
require a power level greater than '1'.

Russell's changes:
- Fix the power levels for big-endian, and make the code flow better.
- Convert to use device_property_read_u8()
- Warn for power levels exceeding host level
  SFF-8431 says:

  "To avoid exceeding system power supply limits and cooling capacity,
   all modules at power up by default shall operate with up to 1.0 W.
   Hosts supporting Power Level II or III operation may enable a Power
   Level II or III module through the 2-wire interface. Power Level II
   or III modules shall assert the power level declaration bit of
   SFF-8472."

  Print a warning for modules that exceed the host power level, and
  leave them operating in power level 1.

- Fix i2c write
  The first byte of any write after the bus address is always the
  address of the register within the device.  In order to write value V
  to register address I of device D, we need to generate on the bus:

    S DDDDDDDD A IIIIIIII A VVVVVVVV A P

  where S = start, R = restart, A = ack, P = stop.  Splitting this
  as two:

    S DDDDDDDD A IIIIIIII A R DDDDDDDD A VVVVVVVV A P

  results in the device's address register being written first by I
  and then by V - the addressed register within the device is not
  written.

- Avoid power mode switching if 0xa2 is not implemented
  Some modules indicate that they support power level II or power level
  III, but do not implement address 0xa2, meaning that the bit to set
  them to high power mode is not accessible.

  These modules appear to have the sff8472_compliance field set to zero,
  and also do not implement diagnostics.  Detect this, but also ensure
  that the module does not require the address switching mode, which we
  do not implement.

- Use mW for power level rather than power level number.

- Fix high power mode transition
  We must not switch to SFP_MOD_PRESENT state until we have finished
  initialising, because the remaining state machines check for that
  state.  Add SFP_MOD_HPOWER as an intermediate state.

- Use definition for I2C register address rather than constant.
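
  The I2C framing problem can be reproduced with a small userspace
  model. The sketch assumes a device that treats the first byte of every
  write message as its internal register pointer and stores any further
  bytes starting at that pointer. struct sim_dev and sim_write_msg() are
  hypothetical stand-ins for the bus, not the kernel I2C API. Sending I
  and V as two messages moves the pointer twice and stores nothing, while
  one message carrying both bytes stores V at I.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an I2C device with an internal address pointer. */
struct sim_dev {
	uint8_t ptr;       /* internal register address */
	uint8_t mem[256];  /* register contents */
};

/* One write message: the first data byte always reloads the pointer,
 * the remaining bytes are stored at consecutive addresses. */
void sim_write_msg(struct sim_dev *dev, const uint8_t *buf, size_t len)
{
	if (len == 0)
		return;
	dev->ptr = buf[0];
	for (size_t i = 1; i < len; i++)
		dev->mem[dev->ptr++] = buf[i];
}

/* Broken: address and value sent as two messages (S ... R ... P). */
void write_split(struct sim_dev *dev, uint8_t reg, uint8_t val)
{
	sim_write_msg(dev, &reg, 1);
	sim_write_msg(dev, &val, 1);   /* V is taken as a new address */
}

/* Fixed: one message whose payload is the register address followed
 * by the data (S DDDDDDDD A IIIIIIII A VVVVVVVV A P). */
int write_combined(struct sim_dev *dev, uint8_t reg,
		   const uint8_t *data, size_t len)
{
	uint8_t buf[1 + 16];   /* small fixed limit for the sketch */

	if (len > sizeof(buf) - 1)
		return -1;
	buf[0] = reg;
	memcpy(&buf[1], data, len);
	sim_write_msg(dev, buf, len + 1);
	return 0;
}
```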

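The power-level rules in the list above can be summarised as a decision function. This is a simplified sketch, not the driver's code. The input structure and its boolean fields are hypothetical stand-ins for the parsed EEPROM fields. The 1000/1500/2000 mW values assume the SFF-8472 Power Level I/II/III limits of 1.0 W, 1.5 W and 2.0 W.

```c
#include <stdbool.h>

/* Hypothetical summary of the module EEPROM fields involved. */
struct module_info {
	bool power_level2_decl;   /* declares Power Level II */
	bool power_level3_decl;   /* declares Power Level III */
	bool sff8472_compliant;   /* sff8472_compliance != 0 */
	bool has_diagnostics;     /* implements diagnostics at 0xa2 */
	bool needs_addr_change;   /* requires address switching mode */
};

enum power_action {
	POWER_STAY_LOW,     /* leave the module at power level 1 */
	POWER_SWITCH_HIGH,  /* set the high-power bit via 0xa2 */
};

/* Power the module declares, in mW (assumed SFF-8472 limits). */
unsigned int module_power_mw(const struct module_info *m)
{
	if (m->power_level3_decl)
		return 2000;
	if (m->power_level2_decl)
		return 1500;
	return 1000;
}

enum power_action choose_power(const struct module_info *m,
			       unsigned int host_max_mw)
{
	unsigned int need = module_power_mw(m);

	if (need <= 1000)
		return POWER_STAY_LOW;   /* default level is enough */
	if (need > host_max_mw)
		return POWER_STAY_LOW;   /* warn: exceeds host power level */
	if (!m->sff8472_compliant || !m->has_diagnostics)
		return POWER_STAY_LOW;   /* 0xa2 apparently not implemented */
	if (m->needs_addr_change)
		return POWER_STAY_LOW;   /* address switching not supported */
	return POWER_SWITCH_HIGH;
}
```
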
Signed-off-by: Jon Nettleton <jon@solid-run.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 11:07:11 -05:00
Russell King 66f5325ce9 dt-bindings: add maximum power level to SFP binding
Add the new maximum power level property to the SFP binding.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 11:07:11 -05:00
Russell King a9c79364df phylink,sfp: negotiate interface format with MAC
Negotiate the interface format with the MAC rather than requiring it to
be a fixed type specified solely by the SFP module.  This allows modules
that can work with several different interface signalling formats to
select a format compatible with the MAC - for example, a fiber module
supporting Gigabit ethernet and faster connected to a Gigabit-only MAC
needs to select the 1000BASE-X mode.

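One way to picture the negotiation is as an intersection of capability sets followed by a preference order. The sketch below is hypothetical. The mode names, bit masks and the fastest-first preference are illustrative assumptions, not the phylink or sfp-bus API.

```c
#include <stddef.h>

/* Hypothetical interface modes, one bit each. */
enum if_mode {
	MODE_SGMII     = 1u << 0,
	MODE_1000BASEX = 1u << 1,
	MODE_2500BASEX = 1u << 2,
	MODE_10GBASER  = 1u << 3,
};

/* Assumed preference: fastest mode first. */
static const unsigned int preference[] = {
	MODE_10GBASER, MODE_2500BASEX, MODE_1000BASEX, MODE_SGMII,
};

/* Return the preferred mode supported by both the module and the MAC,
 * or 0 if they have nothing in common. */
unsigned int negotiate_mode(unsigned int module_modes, unsigned int mac_modes)
{
	unsigned int common = module_modes & mac_modes;

	for (size_t i = 0; i < sizeof(preference) / sizeof(preference[0]); i++)
		if (common & preference[i])
			return preference[i];
	return 0;
}
```

With these assumptions, a module offering 10GBASE-R and 1000BASE-X attached to a Gigabit-only MAC ends up in 1000BASE-X.
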
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 11:07:11 -05:00
Russell King 03145864bd sfp: support 1G BiDi (eg, FiberStore SFP-GE-BX) modules
Some BiDi modules (eg, FiberStore SFP-GE-BX) are not compliant with
1000BASE-BX as they use different wavelengths from the 1000BASE-BX
standard (eg, 1310nm/1550nm rather than 1310nm/1490nm).  These modules
support 1000BASE-X ethernet, so detect them by a failure to find any
other support, the 8B10B encoding and a bit rate that falls within the
1Gbps window.
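
The detection rule reads naturally as a predicate. In this sketch the structure and field names are hypothetical. The encoding value assumes the SFF-8472 code for 8B/10B (01h). The 1200-1300 MBd window around the 1250 MBd 1000BASE-X line rate is also an assumption.

```c
#include <stdbool.h>

#define ENCODING_8B10B 0x01   /* assumed SFF-8472 encoding code */

/* Hypothetical subset of the parsed module identification. */
struct module_id {
	bool has_known_modes;     /* any standard compliance bits matched */
	unsigned char encoding;   /* encoding field */
	unsigned int br_min_mbd;  /* minimum supported bit rate, MBd */
	unsigned int br_max_mbd;  /* maximum supported bit rate, MBd */
};

/* Treat a module as 1000BASE-X capable when nothing else matched, it
 * uses 8B/10B, and its bit-rate range overlaps the assumed window. */
bool looks_like_1000basex(const struct module_id *id)
{
	if (id->has_known_modes)
		return false;
	if (id->encoding != ENCODING_8B10B)
		return false;
	return id->br_min_mbd <= 1300 && id->br_max_mbd >= 1200;
}
```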

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 11:07:11 -05:00
Ido Schimmel 44d15d930b team: Use extack to report enslavement failures
Use extack inside team's enslavement function and also propagate it to
the netdevice notifier to allow enslaved ports to report the failure
reason. Example:

$ teamd -t team0 -d -c '{"runner": {"name": "lacp"}}'
$ ip link set dev lo master team0
Error: Loopback device can't be added as a team port.
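
The extack idea is that the code rejecting a request attaches a human-readable reason, which travels back to user space together with the error code. The userspace model below shows only that idea. struct ext_ack, SET_ERR_MSG and team_port_add_check() are hypothetical stand-ins, not the kernel's struct netlink_ext_ack API.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an extended-ack object. */
struct ext_ack {
	const char *msg;
};

/* Record a reason only if the caller supplied an ack object. */
#define SET_ERR_MSG(ack, text)               \
	do {                                 \
		if (ack)                     \
			(ack)->msg = (text); \
	} while (0)

struct port {
	bool is_loopback;
};

/* Hypothetical enslavement check: fails with -EINVAL and a reason. */
int team_port_add_check(const struct port *p, struct ext_ack *ack)
{
	if (p->is_loopback) {
		SET_ERR_MSG(ack, "Loopback device can't be added as a team port");
		return -EINVAL;
	}
	return 0;
}
```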

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 11:01:30 -05:00
David S. Miller fb66cb0775 mlx5-update-2018-02-23 (IB representors)
Merge tag 'mlx5-updates-2018-02-23' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Saeed Mahameed says:

mlx5-update-2018-02-23 (IB representors)

From: Mark Bloch <markb@mellanox.com>
=========
Add IB representor when in switchdev mode

The following series adds support for an IB (RAW Ethernet only) device
representor which is created when the user switches to switchdev mode.

Today when switching to switchdev mode the only representors which are
created are net devices. Each netdev is a representor of a virtual
function and any data sent via the representor is received on the virtual
function, and any data sent via the virtual function is received by the
representor.

For the mlx5 driver the main use of this functionality is to be able to
use Open vSwitch on the hypervisor in order to manage/control traffic
from/to the virtual functions. Open vSwitch can also work with DPDK
devices, not just net devices. This series exposes an IB device, which the
Mellanox PMD driver uses and which can then be used by Open vSwitch DPDK.

An IB device representor exposes only RAW Ethernet QP capabilities and
the ability to create flow rules to direct traffic to its RX queues. The
state of the IB device (ACTIVE/DOWN, etc.) is based on the state of the
corresponding net device representor. No other RDMA/RoCE functionality is
currently supported and no GID table is exposed.
=========

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 09:54:54 -05:00
David S. Miller 3f5a68300a Merge branch 'mlx4-misc'
Tariq Toukan says:

====================
mlx4_en misc for 4.17

This patchset contains misc enhancements from the team
to the mlx4 Eth driver.

Patch 1 by Eran adds physical layer counters.
Patch 2 by Eran cleans up a redundant warn print.
Patch 3 combines the checks of two end cases into a single if statement.
Patch 4 takes common code structures out of the #ifdef, following your
comment on a previous patch.

Series generated against net-next commit:
f74290fdb3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:53:27 -05:00
Tariq Toukan a970d8dba5 net/mlx4_en: RX csum, pre-define enabled protocols for IP status masking
Pre-define a mask for the IP status of a completion that tests
MLX4_CQE_STATUS_IPV6 only when CONFIG_IPV6 is enabled.
Use it for IP status testing upon completion, instead of separating
the datapath into two flows.
This takes common code structures (such as closing parenthesis)
back to their original place, and makes code more readable.
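
Here is a minimal sketch of the pre-defined mask, with hypothetical status bits (the real MLX4_CQE_STATUS_* values are not given in the commit). The #ifdef is confined to the mask definition, so the datapath test is a single expression whether or not CONFIG_IPV6 is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical completion status bits. */
#define CQE_STATUS_IPV4 (1u << 0)
#define CQE_STATUS_IPV6 (1u << 1)

/* Decide once, at compile time, which IP versions count. */
#ifdef CONFIG_IPV6
#define CQE_STATUS_IP_ANY (CQE_STATUS_IPV4 | CQE_STATUS_IPV6)
#else
#define CQE_STATUS_IP_ANY (CQE_STATUS_IPV4)
#endif

/* One check in the datapath instead of two #ifdef'd branches. */
bool cqe_is_ip(uint32_t status)
{
	return (status & CQE_STATUS_IP_ANY) != 0;
}
```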

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:53:26 -05:00
Tariq Toukan 1cb8b1216c net/mlx4_en: Combine checks of end-cases in RX completion function
Combine two end-cases in the same if statement with a single return value.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:53:26 -05:00
Eran Ben Elisha 4f32e1c4a9 net/mlx4_en: Remove unnecessary warn print in reset config
In mlx4_en_reset_config, there was a redundant warn print that was left
from previous versions of this function. No warn is needed anymore.

This warn can be confusing when RX-FCS is changed:
Turn OFF RX-FCS:
  mlx4_en: eth1: Changing device configuration rx filter(0) rx vlan(1)
Turn ON RX-FCS:
  mlx4_en: eth1: Changing device configuration rx filter(0) rx vlan(1)

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:53:26 -05:00
Eran Ben Elisha f26d0d2543 net/mlx4_en: Add physical RX/TX bytes/packets counters
Add physical RX/TX packets/bytes counters into ethtool output to monitor
all traffic that was received and transmitted on the port. These
counters are available only on non-VF (i.e. non-Virtual Function) devices.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:53:26 -05:00
David S. Miller 431c7ec3b3 Merge branch 'mlxsw-Offloading-encapsulated-SPAN'
Jiri Pirko says:

====================
mlxsw: Offloading encapsulated SPAN

Petr says:

This patch series introduces support for mirroring with GRE
encapsulation. It offloads tc action mirred mirror from a mlxsw port to
either a gretap or an ip6gretap netdevice.

Spectrum hardware needs to know all the details of the requested
encapsulation: source and destination MAC and IP addresses, details of
VLAN tagging, etc. The only variables are the encapsulated packet
itself, and TOS field, which may be inherited. To that end, mlxsw driver
resolves the route that encapsulated packets would take, queries the
corresponding neighbor, and with that configuration in hand, configures
the mirroring in the hardware.

The driver also hooks into event handlers for netdevice changes, FIB and
neighbor events, and reconsiders the configuration on each such change.
When the new configuration differs from the currently-offloaded one, the
existing offload is removed and replaced with a new one.

It is possible to mirror to {ip6,}gretap from a matchall rule as well as
from a flower match.

** Note that with this patch set, mlxsw build depends on NET_IPGRE and
   IPV6_GRE.

Current limitations:

- There has to be a route that directs packets to an mlxsw port. We
  intend to extend the logic to support other netdevice types in the
  future, but the eventual egress netdevice will have to be an mlxsw
  port in any case.

- Offload reconfiguration due to changes in netdevice configuration
  creates a window of time where packets are not mirrored. Under some
  circumstances this can be prevented by configuring an unused port
  analyzer and migrating mirrors over to that. However that's currently
  not implemented.

- Remote address of a tunnel device needs to be set, there may not be a
  GRE key, checksumming or sequence numbers, and TTL needs to be fixed
  (non-inherit). These are hard requirements imposed by the underlying
  hardware.

- TOS of a tunnel device needs to be "inherit". The hardware supports a
  fixed TOS, but that's currently not implemented.

The series starts with two patches, #1 and #2, that publish one function
and add support for querying IPv6 tunnel parameters.

In patches #3 and #4, we introduce helpers to GRE and tunneling code
that we will use later in the patchset from the SPAN code.

Patches #5 and #6 introduce support for encapsulated SPAN in reg.h.

The following seven patches, #7-#13, then prepare the SPAN codebase for
introduction of mirroring to netdevices that don't correspond to front
panel ports.

Then #14 and #15 pull all this together to implement mirroring to
{ip6,}gretap netdevices.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:28 -05:00
Petr Machata 8f08a528de mlxsw: spectrum_span: Support mirror to ip6gretap
Similarly to mirror-to-gretap, this enables mirroring to IPv6 gretap
netdevice.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:28 -05:00
Petr Machata 27cf76fe60 mlxsw: spectrum_span: Support mirror to gretap
When a user requests mirroring from a mlxsw physical port (possibly based
on an ACL match) to a gretap netdevice, the driver needs to resolve the
request to a particular physical port that the mirrored packets will
egress through, and a suite of configuration keys (importantly, IP and
MAC addresses). That means calling into routing and neighbor kernel code
to simulate the decisions made by the system for packets passing through
a gretap netdevice.

Add a new instance of mlxsw_sp_span_entry_ops to support this.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:28 -05:00
Petr Machata 52a6444cda mlxsw: Move a mirroring check to mlxsw_sp_span_entry_create
The check for whether a mirror port (which is a mlxsw front panel port)
belongs to the same mlxsw instance as the mirrored port is currently
only done in spectrum_acl, even though it's applicable for the matchall
case as well. Thus move it to mlxsw_sp_span_entry_create().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:27 -05:00
Petr Machata 803335acbe mlxsw: Handle config changes pertinent to SPAN
Some netdevices for which mlxsw offloads mirroring may have a
complex relationship between the declared intent and the low-level
device configuration.

Trying to accurately track which changes might influence offloading
decisions is finicky and error-prone. Instead, this patch introduces a
function mlxsw_sp_span_entry_respin(), which queries the configuration
anew and, if it differs, removes the existing offloads and installs new
ones.

Call this function strategically at event handlers that might influence
the mirroring configuration.
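
The respin idea is to re-derive the desired configuration and replace the offload only if it changed. It can be sketched as follows. The structure and function names are hypothetical simplifications; the real mlxsw_sp_span_entry_respin() and struct mlxsw_sp_span_parms carry more state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical resolved mirroring parameters. */
struct span_parms {
	int egress_port;        /* -1: cannot be offloaded */
	uint8_t dmac[6];
	uint32_t saddr, daddr;  /* IPv4 only, for brevity */
};

struct span_entry {
	struct span_parms cur;  /* what the hardware currently has */
	int hw_updates;         /* counts reconfigurations, for illustration */
};

/* Field-wise comparison: memcmp() could see uninitialized padding. */
static bool parms_equal(const struct span_parms *a, const struct span_parms *b)
{
	if (a->egress_port != b->egress_port || a->saddr != b->saddr ||
	    a->daddr != b->daddr)
		return false;
	for (int i = 0; i < 6; i++)
		if (a->dmac[i] != b->dmac[i])
			return false;
	return true;
}

/* Called from netdevice, FIB and neighbour event handlers; `fresh` is
 * the configuration re-queried from the current system state. */
void span_entry_respin(struct span_entry *e, const struct span_parms *fresh)
{
	if (parms_equal(&e->cur, fresh))
		return;  /* nothing relevant changed */
	/* Real code would deconfigure the old offload and configure the
	 * new one here; this sketch only records the change. */
	e->cur = *fresh;
	e->hw_updates++;
}
```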

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:27 -05:00
Petr Machata 169b5d95c1 mlxsw: spectrum_span: Generalize SPAN support
To support mirroring to different device types, the functions that
partake in configuring the port analyzer need to be extended to admit
non-trivial SPAN types.

Create a structure where all details of SPAN configuration are kept,
struct mlxsw_sp_span_parms. Also create struct mlxsw_sp_span_entry_ops
to keep per-SPAN-type operations.

Instantiate the latter once for MLXSW_REG_MPAT_SPAN_TYPE_LOCAL_ETH, and
once for a suite of NOP callbacks used for invalidated SPAN entries. Put
the former as the sole member of a new array mlxsw_sp_span_entry_types,
where all known SPAN types are kept. Introduce a new function,
mlxsw_sp_span_entry_ops(), to look up the right ops suite given a
netdevice.

Change mlxsw_sp_span_mirror_add() to use both parms and ops structures.
Change mlxsw_sp_span_entry_get() and mlxsw_sp_span_entry_create() to
take these as arguments. Modify mlxsw_sp_span_entry_configure() and
mlxsw_sp_span_entry_deconfigure() to dispatch to ops.
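
The per-type dispatch described above is the usual C function-pointer table pattern. The sketch uses hypothetical, heavily simplified types (struct netdev, struct span_ops, span_ops_lookup()). It shows how a lookup by netdevice selects an ops suite and how configure/deconfigure then dispatch through it.

```c
#include <stdbool.h>
#include <stddef.h>

struct netdev {
	bool is_front_panel;   /* hypothetical: a physical switch port */
};

struct span_ops {
	bool (*can_handle)(const struct netdev *dev);
	int (*configure)(const struct netdev *dev);
	void (*deconfigure)(const struct netdev *dev);
};

static bool phys_can_handle(const struct netdev *dev)
{
	return dev->is_front_panel;
}

static int phys_configure(const struct netdev *dev)
{
	(void)dev;
	return 0;   /* would program a local-Ethernet SPAN entry */
}

static void phys_deconfigure(const struct netdev *dev)
{
	(void)dev;  /* would disable the SPAN entry */
}

static const struct span_ops span_ops_phys = {
	.can_handle = phys_can_handle,
	.configure = phys_configure,
	.deconfigure = phys_deconfigure,
};

/* All known SPAN types; further types (e.g. gretap) would be appended. */
static const struct span_ops *const span_types[] = {
	&span_ops_phys,
};

const struct span_ops *span_ops_lookup(const struct netdev *dev)
{
	for (size_t i = 0; i < sizeof(span_types) / sizeof(span_types[0]); i++)
		if (span_types[i]->can_handle(dev))
			return span_types[i];
	return NULL;  /* mirroring to this device cannot be offloaded */
}
```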

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:27 -05:00
Petr Machata 079c9f393b mlxsw: spectrum: Keep mirror netdev in mlxsw_sp_span_entry
Currently the only mirror action supported by mlxsw is mirror to another
mlxsw physical port. Correspondingly, span_entry, which tracks each
mlxsw mirror in the system, currently holds a u8 number of the
destination port.

To extend this system to mirror to gretap and ip6gretap netdevices, have
struct mlxsw_sp_span_entry actually hold the destination netdevice
itself.

This change then trickles down in an obvious manner to the SPAN module
API and mirror-related interfaces in struct mlxsw_afa_ops.

To prevent use of an invalid pointer, NETDEV_UNREGISTER needs to be
hooked and the corresponding SPAN entry invalidated.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:27 -05:00
Petr Machata 7b2ef81fd2 mlxsw: spectrum_span: Extract mlxsw_sp_span_entry_{de, }configure()
Configuring the hardware for encapsulated SPAN involves more code than
the simple mirroring case. Extract the related code to a separate
function to separate it from the rest of SPAN entry creation. Extract
deconfigure as well for symmetry, even though disablement is the same
regardless of SPAN type.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:27 -05:00
Petr Machata 3546b03ffc mlxsw: spectrum_span: Initialize span_entry.id eagerly
It is known statically ahead of time which SPAN entry will have which
ID. Just initialize it eagerly in mlxsw_sp_span_init(), don't wait until
the entry is actually created. This simplifies some code in
mlxsw_sp_span_entry_create().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:27 -05:00
Petr Machata 98977089d8 mlxsw: span: Remove span_entry by span_id
Instead of removing span_entry by the port number, allow removing by
SPAN id. That simplifies some code right here, and for mirroring to soft
netdevices, avoids problems with netdevice pointer invalidation and
reuse.

Rename mlxsw_sp_span_entry_find() to mlxsw_sp_span_entry_find_by_port()
and keep it--follow-up patches will make use of it.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:26 -05:00
Petr Machata 1da93eb466 mlxsw: reg: Extend mlxsw_reg_mpat_pack()
To support encapsulated SPAN, extend mlxsw_reg_mpat_pack() with a field
to set the SPAN type.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:26 -05:00
Petr Machata 0d6cd3fcbc mlxsw: reg: Add SPAN encapsulation to MPAT register
MPAT Register is used to query and configure the Switch Port Analyzer
Table. To configure Port Analyzer to encapsulate mirrored packets,
additional fields need to be specified for the MPAT register.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:26 -05:00
Petr Machata b0066da52e ip_tunnel: Rename & publish init_tunnel_flow
Initializing struct flowi4 is useful for drivers that need to emulate
routing decisions made by a tunnel interface. Publish the
function (appropriately renamed) so that the drivers in question don't
need to cut'n'paste it around.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:26 -05:00
Petr Machata d1b2a6c4be net: GRE: Add is_gretap_dev, is_ip6gretap_dev
Determining whether a device is a GRE device is easily done by
inspecting struct net_device.type. However, for the tap variants, the
type is just ARPHRD_ETHER.

Therefore introduce two predicate functions that use netdev_ops to tell
the tap devices apart.

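The predicates work by comparing a device's operations table against the one installed by the GRE tap driver, since a plain Ethernet device and a gretap device report the same type. This is a userspace model with hypothetical structures; the kernel's struct net_device and struct net_device_ops are far larger.

```c
#include <stdbool.h>

#define DEV_TYPE_ETHER 1   /* hypothetical: shared by eth and gretap */

struct netdev_ops {
	const char *name;  /* placeholder member */
};

struct netdev {
	unsigned short type;
	const struct netdev_ops *ops;
};

/* Each driver owns exactly one ops instance, so its address
 * identifies the driver that created the device. */
const struct netdev_ops eth_ops = { "eth" };
const struct netdev_ops gretap_ops = { "gretap" };
const struct netdev_ops ip6gretap_ops = { "ip6gretap" };

bool is_gretap(const struct netdev *dev)
{
	return dev->ops == &gretap_ops;
}

bool is_ip6gretap(const struct netdev *dev)
{
	return dev->ops == &ip6gretap_ops;
}
```
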
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:26 -05:00
Petr Machata 8897207c89 mlxsw: spectrum_ipip: Support decoding IPv6 tunnel addresses
To support mirroring to ip6gretap, the SPAN module needs to be able to
decode IPv6 addresses specified at that tunnel.

Extend mlxsw_sp_ipip_netdev_saddr() and mlxsw_sp_ipip_netdev_daddr() to
support IPv6 addresses. To that end, add and publish a support function
mlxsw_sp_ipip_netdev_parms6().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:26 -05:00
Petr Machata 7e58a6c662 mlxsw: spectrum_ipip: Extract mlxsw_sp_l3addr_is_zero
Extract the logic for determining whether a given IPv4/IPv6 address is
all-zeroes from mlxsw_sp_ipip_tunnel_complete to a separate function.
Make that function public within the module.
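
Here is a sketch of such a helper. It assumes a hypothetical tagged union for IPv4/IPv6 addresses, since the driver's own types are not shown in the commit. For the IPv6 case it uses the standard IN6_IS_ADDR_UNSPECIFIED macro from <netinet/in.h>.

```c
#include <netinet/in.h>
#include <stdbool.h>

enum l3proto { L3_IPV4, L3_IPV6 };

/* Hypothetical address container. */
union l3addr {
	struct in_addr addr4;
	struct in6_addr addr6;
};

bool l3addr_is_zero(enum l3proto proto, const union l3addr *addr)
{
	switch (proto) {
	case L3_IPV4:
		return addr->addr4.s_addr == 0;                 /* 0.0.0.0 */
	case L3_IPV6:
		return IN6_IS_ADDR_UNSPECIFIED(&addr->addr6);   /* :: */
	}
	return false;
}
```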

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:25 -05:00
David S. Miller 63d638012e Merge branch 'ibmvnic-Miscellaneous-driver-fixes-and-enhancements'
Thomas Falcon says:

====================
ibmvnic: Miscellaneous driver fixes and enhancements

There is not a general theme to this patch set other than that it
fixes a few issues with the ibmvnic driver. I will just give a quick
summary of what each patch does here.

"ibmvnic: Fix TX descriptor tracking again" resolves a race condition
introduced in an earlier fix to track outstanding transmit descriptors.
This condition can throw off the tracking counter to the point that
a transmit queue will halt forever.

"ibmvnic: Allocate statistics buffers during probe" allocates queue
statistics buffers on device probe to avoid a crash when accessing
statistics of an unopened interface.

"ibmvnic: Harden TX/RX pool cleaning" includes additional checks to
avoid a bad access when cleaning RX and TX buffer pools during a device
reset.

"ibmvnic: Report queue stops and restarts as debug output" changes TX
queue state notifications from informational to debug messages. This
information is not necessarily useful to a user and under load can result
in a lot of log output.

"ibmvnic: Do not attempt to login if RX or TX queues are not allocated"
checks that device queues have been allocated successfully before
attempting device login. This resolves a panic that could occur if a
user attempted to configure a device after a failed reset.

Thanks for your attention.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:31:20 -05:00
Thomas Falcon 20a8ab744f ibmvnic: Do not attempt to login if RX or TX queues are not allocated
If a device reset fails for some reason, TX and RX queue resources
could be released. If a user attempts to open the device in this scenario,
it may result in a kernel panic as the driver tries to access this
memory. To fix this, check that the TX/RX queues still exist before
logging in and enabling the device. In addition, return a
value that can be checked in case of any errors to avoid waiting for a
completion that will never come.
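
The fix follows a common pattern: validate resources before starting an asynchronous request, and return an error rather than block on a completion that will never be signalled. This is a hypothetical userspace sketch; struct adapter, send_login() and the choice of -EIO are illustrative, not the driver's.

```c
#include <errno.h>
#include <stddef.h>

struct adapter {
	void *tx_queues;   /* NULL after a failed reset released them */
	void *rx_queues;
	int login_sent;    /* stands in for the request to firmware */
};

/* Returns 0 if the login request was sent, so the caller may wait for
 * its completion; returns a negative errno otherwise. */
int send_login(struct adapter *a)
{
	if (!a->tx_queues || !a->rx_queues)
		return -EIO;   /* queues gone: do not touch freed memory */
	a->login_sent = 1;
	return 0;
}
```

A caller would only wait for the login completion when send_login() returns 0.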

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:31:19 -05:00
Thomas Falcon 0aecb13ce3 ibmvnic: Report queue stops and restarts as debug output
It's not necessary to report each time a queue is stopped and restarted
as an informational message. Change that to be a debug message so that
it can be observed if needed but not printed by default.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:31:19 -05:00
Thomas Falcon 637f81d164 ibmvnic: Harden TX/RX pool cleaning
If the driver releases resources after a failed reset or some other
error, the driver might attempt to clean up and free memory that
isn't there anymore. Include some additional checks that RX/TX queues
along with their associated structures are still there before cleaning.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:31:19 -05:00
Thomas Falcon 53cc7721fd ibmvnic: Allocate statistics buffers during probe
Currently, buffers holding individual queue statistics are allocated
when the device is opened. If an ibmvnic interface is hotplugged or
initialized but never opened, an attempt to get statistics with
ethtool will result in a kernel panic.

Since the driver always allocates a constant number of these buffers
(one per maximum supported queue), they can be allocated during device
probe and freed when the device is hot-unplugged or the module is removed.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:31:19 -05:00
Thomas Falcon ecba616e04 ibmvnic: Fix TX descriptor tracking again
Sorry, the previous change introduced a race condition between
transmit completion processing and tracking TX descriptors. If a
completion is received before the number of descriptors is logged,
the descriptors will be added to the count but never removed. After
this happens enough times, the transmit queue could halt forever.

Log the number of descriptors used by a transmit before sending.
I stress tested the fix on two different systems running over the
weekend without any issues.
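
The race is about ordering. A sequential model makes it visible: the counter only stays balanced if the descriptor count is recorded and added before the packet can complete. The names are hypothetical, and the "completion" is called inline to force the problematic interleaving.

```c
#include <stdatomic.h>

struct tx_queue {
	atomic_int used_descs;  /* outstanding descriptors */
};

struct tx_buff {
	int num_entries;        /* descriptors used by this packet */
};

/* Completion handler: releases whatever the buffer says it used. */
void tx_complete(struct tx_queue *q, struct tx_buff *b)
{
	atomic_fetch_sub(&q->used_descs, b->num_entries);
}

/* Buggy order: the completion runs between send and bookkeeping. */
void xmit_buggy(struct tx_queue *q, struct tx_buff *b, int n)
{
	b->num_entries = 0;
	/* send(); the completion fires immediately */
	tx_complete(q, b);                    /* subtracts 0 */
	b->num_entries = n;
	atomic_fetch_add(&q->used_descs, n);  /* never subtracted */
}

/* Fixed order: record and count the descriptors before sending. */
void xmit_fixed(struct tx_queue *q, struct tx_buff *b, int n)
{
	b->num_entries = n;
	atomic_fetch_add(&q->used_descs, n);
	/* send(); the completion may now fire at any time */
	tx_complete(q, b);
}
```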

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:31:19 -05:00
David S. Miller 51846bfef6 Merge branch 'stmmac-barrier-fixes-and-cleanup'
Niklas Cassel says:

====================
stmmac barrier fixes and cleanup
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:28:11 -05:00
Niklas Cassel 1e88f6e01b net: stmmac: make dwmac4_release_tx_desc() clear all descriptor fields
Make dwmac4_release_tx_desc() clear all descriptor fields, not just
TDES2 and TDES3.

I suspect that TDES0 and TDES1 weren't cleared because the DMA
engine uses them to store the tx hardware timestamp (if PTP is enabled).

However, stmmac_tx_clean() calls stmmac_get_tx_hwtstamp(), which reads
and saves the timestamp, before it calls release_tx_desc(), so this
is not an issue.

stmmac_xmit() and stmmac_tso_xmit() both always overwrite TDES0,
however, stmmac_tso_xmit() sometimes sets TDES1, and since neither
stmmac_xmit() nor stmmac_tso_xmit() explicitly clears TDES1, both
functions might reuse a DMA descriptor with old TDES1 data.

I haven't observed any misbehavior even though TDES1 sometimes
points to an old skb. However, explicitly clearing both TDES0 and TDES1
in dwmac4_release_tx_desc() minimizes the chances of undefined behavior.

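Here is a sketch of the change. It assumes a descriptor made of four 32-bit words TDES0..TDES3, as described above; the struct and function names are hypothetical, and the driver itself works on little-endian descriptor fields.

```c
#include <stdint.h>

struct tx_desc {
	uint32_t des0, des1, des2, des3;
};

/* Before: only TDES2/TDES3 were reset, so a stale TDES1 (e.g. left by
 * a TSO transmit) could survive into the next use of the descriptor. */
void release_tx_desc_old(struct tx_desc *p)
{
	p->des2 = 0;
	p->des3 = 0;
}

/* After: every word is cleared.  Any TX timestamp in TDES0/TDES1 has
 * already been read by the clean-up path before this runs. */
void release_tx_desc(struct tx_desc *p)
{
	p->des0 = 0;
	p->des1 = 0;
	p->des2 = 0;
	p->des3 = 0;
}
```
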
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:28:10 -05:00
Niklas Cassel a6b25da5e7 net: stmmac: ensure that the device has released ownership before reading data
According to Documentation/memory-barriers.txt, we need to use a
dma_rmb() after reading the status/own bit, to ensure that all
descriptor fields are read after reading the own bit.

This way, we ensure that the DMA engine is done with the DMA
descriptor before we read the other descriptor fields, e.g. reading
the tx hardware timestamp (if PTP is enabled).
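
In portable C11 terms, the pattern is to load the ownership flag and, only once it shows that the descriptor has been handed back, issue an acquire fence before reading the other fields. This sketch is an analogy only. The kernel's dma_rmb() orders accesses to DMA-coherent memory shared with a device, which C11 fences do not formally cover. The descriptor layout and OWN bit position are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DESC_OWN (1u << 31)  /* hypothetical: set while the DMA engine owns it */

struct desc {
	_Atomic uint32_t des3;   /* status word holding the OWN bit */
	uint32_t ts_lo, ts_hi;   /* timestamp written by the "device" */
};

/* Device side: fill in the data, then release the descriptor. */
void device_complete(struct desc *d, uint64_t ts)
{
	d->ts_lo = (uint32_t)ts;
	d->ts_hi = (uint32_t)(ts >> 32);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&d->des3, 0, memory_order_relaxed);
}

/* Driver side: check OWN first, fence, then read the other fields. */
bool read_timestamp(struct desc *d, uint64_t *ts)
{
	uint32_t status = atomic_load_explicit(&d->des3, memory_order_relaxed);

	if (status & DESC_OWN)
		return false;                           /* engine not done yet */
	atomic_thread_fence(memory_order_acquire);  /* plays the role of dma_rmb() */
	*ts = ((uint64_t)d->ts_hi << 32) | d->ts_lo;
	return true;
}
```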

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:28:10 -05:00
Niklas Cassel 95eb930a40 net: stmmac: use correct barrier between coherent memory and MMIO
The last memory barrier in stmmac_xmit()/stmmac_tso_xmit() is placed
between a coherent memory write and an MMIO write:

The own bit is written in First Desc (TSO: MSS desc or First Desc).
<barrier>
The DMA engine is started by a write to the tx desc tail pointer/
enable dma transmission register, i.e. a MMIO write.

This barrier cannot be a simple dma_wmb(), since a dma_wmb() is only
used to guarantee the ordering, with respect to other writes,
to cache coherent DMA memory.

To guarantee that the cache coherent memory writes have completed
before we attempt to write to the cache incoherent MMIO region,
we need to use the more heavyweight barrier wmb().
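The write-side sequence can be sketched as below. Everything here is a userspace stand-in: example_wmb() is modeled as a full C11 fence and example_writel() as a volatile store to a fake register. C11 fences do not describe ordering against device memory, so the fence only marks where the kernel's wmb() would sit; the descriptor struct and OWN bit position are likewise assumptions.

```c
#include <stdatomic.h>
#include <stdint.h>

#define EXAMPLE_OWN_BIT (UINT32_C(1) << 31) /* assumed OWN bit position */

/* Userspace stand-in for wmb(); not the kernel definition. */
#define example_wmb() atomic_thread_fence(memory_order_seq_cst)

/* Userspace stand-in for writel(): a volatile store to a fake register. */
static inline void example_writel(uint32_t val, volatile uint32_t *reg)
{
	*reg = val;
}

struct example_desc {
	uint32_t des3; /* status word holding the OWN bit */
};

static void example_kick_tx(struct example_desc *first,
			    volatile uint32_t *tail_reg, uint32_t tail)
{
	first->des3 |= EXAMPLE_OWN_BIT; /* coherent-memory write: hand to DMA */
	example_wmb();                  /* where wmb() (not dma_wmb()) goes */
	example_writel(tail, tail_reg); /* MMIO write that starts the DMA engine */
}
```

The point of the ordering is that the doorbell write must not reach the device before the OWN bit is visible in memory, otherwise the engine may fetch a descriptor it does not yet own.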

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:28:10 -05:00
Niklas Cassel 15d2ee42a3 net: stmmac: ensure that the MSS desc is the last desc to set the own bit
A dma_wmb() is used to guarantee the ordering, with respect to
other writes, to cache coherent DMA memory.

There is a dma_wmb() in prepare_tx_desc()/prepare_tso_tx_desc() which
ensures that TDES0/1/2 is written before TDES3 (which contains the own
bit), for First Desc.

However, in the rare case that MSS changes, there will be an MSS
context descriptor in front of the regular DMA descriptors:

<MSS desc> <- DMA Next Descriptor
<First Desc>
<desc n>
<Last Desc>

Thus, for this special case, we need a dma_wmb()
after prepare_tso_tx_desc()/before writing the own bit to the MSS desc,
so that we flush the write to TDES3 for First Desc,
in order to ensure that the MSS descriptor is the last descriptor to
set the own bit.
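The hand-over order described above can be modeled as follows. This is a sketch under stated assumptions: d[0] is the MSS context descriptor, d[1] is First Desc, and d[2..n-1] are the remaining descriptors; example_dma_wmb() is a userspace fence standing in for dma_wmb(), and the log struct is a hypothetical helper that records the order in which OWN bits are set (n must not exceed 8).

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_OWN_BIT (UINT32_C(1) << 31) /* assumed OWN bit position */

/* Userspace stand-in for dma_wmb(). */
#define example_dma_wmb() atomic_thread_fence(memory_order_release)

struct example_desc {
	uint32_t des3; /* status word holding the OWN bit */
};

/* Hypothetical log of the order in which OWN bits were set. */
struct example_log {
	size_t order[8];
	size_t n;
};

static void example_set_own(struct example_desc *d, size_t idx,
			    struct example_log *log)
{
	d[idx].des3 |= EXAMPLE_OWN_BIT;
	log->order[log->n++] = idx;
}

/* Assumed layout: d[0] = MSS desc, d[1] = First Desc, d[2..n-1] = rest. */
static void example_tso_hand_over(struct example_desc *d, size_t n,
				  struct example_log *log)
{
	for (size_t i = 2; i < n; i++)
		example_set_own(d, i, log); /* later descs are handed over first */
	example_dma_wmb();                  /* TDES0..2 before TDES3 of First Desc */
	example_set_own(d, 1, log);
	example_dma_wmb();                  /* the added barrier: First Desc OWN first */
	example_set_own(d, 0, log);         /* MSS desc is the last to set OWN */
}
```

Because the engine starts at the MSS descriptor, setting its OWN bit last means it cannot begin processing the chain before every following descriptor is ready.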

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:28:10 -05:00
David S. Miller f4155eff1f Merge branch 'RDS-optimized-notification-for-zerocopy-completion'
Sowmini Varadhan says:

====================
RDS: optimized notification for zerocopy completion

Resending with acked-by additions: the previous attempt did not show
up in Patchwork. This time with a new mail Message-Id.

RDS applications predominantly use request-response, transaction
based IPC, so that ingress and egress traffic are well-balanced,
and it is possible/desirable to reduce system-call overhead by
piggybacking the notifications for zerocopy completion response
with data.

Moreover, it has been pointed out that socket functions block
if sk_err is non-zero. Thus, if the RDS code does not plan or need
to use the sk_error_queue path for completion notification, it
is preferable to remove the sk_error_queue related paths in
RDS.

Both of these goals are implemented in this series.

v2: removed sk_error_queue support
v3: incorporated additional code review comments (details in each patch)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:19:11 -05:00
Sowmini Varadhan 6f3899e602 selftests/net: reap zerocopy completions passed up as ancillary data.
PF_RDS sockets pass up cookies for zerocopy completion as ancillary
data. Update msg_zerocopy to reap this information.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:19:11 -05:00
Sowmini Varadhan 401910db4c rds: deliver zerocopy completion notification with data
This commit is an optimization over commit 01883eda72
("rds: support for zcopy completion notification") for PF_RDS sockets.

RDS applications are predominantly request-response transactions, so
it is more efficient to reduce the number of system calls and have
zerocopy completion notification delivered as ancillary data on the
POLLIN channel.

Cookies are passed up as ancillary data (at level SOL_RDS) in a
struct rds_zcopy_cookies when the value returned by recvmsg() is
greater than or equal to 0. A maximum of RDS_MAX_ZCOOKIES cookies may
be passed with each message.
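A receiver would walk the control messages after a recvmsg() call that returned a value >= 0 and copy out the cookies. The sketch below is self-contained, so the EXAMPLE_* constants and struct example_zcopy_cookies are placeholders assumed to mirror SOL_RDS, the zerocopy-completion cmsg type, RDS_MAX_ZCOOKIES and struct rds_zcopy_cookies; real code should take those definitions from the kernel UAPI headers rather than these values.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Placeholders for this sketch; use the kernel UAPI definitions in real code. */
#define EXAMPLE_SOL_RDS          276
#define EXAMPLE_CMSG_ZCOPY_COMPL 13
#define EXAMPLE_MAX_ZCOOKIES     8

/* Assumed to mirror struct rds_zcopy_cookies: a count plus a cookie array. */
struct example_zcopy_cookies {
	uint32_t num;
	uint32_t cookies[EXAMPLE_MAX_ZCOOKIES];
};

/* Copy zerocopy completion cookies out of msg's control data.
 * Returns the number of cookies written to out (at most out_len). */
static size_t example_reap_cookies(struct msghdr *msg, uint32_t *out,
				   size_t out_len)
{
	size_t n = 0;

	for (struct cmsghdr *c = CMSG_FIRSTHDR(msg); c; c = CMSG_NXTHDR(msg, c)) {
		struct example_zcopy_cookies ck;

		if (c->cmsg_level != EXAMPLE_SOL_RDS ||
		    c->cmsg_type != EXAMPLE_CMSG_ZCOPY_COMPL)
			continue;
		if (c->cmsg_len < CMSG_LEN(sizeof(ck)))
			continue; /* truncated control message */
		memcpy(&ck, CMSG_DATA(c), sizeof(ck)); /* avoid unaligned access */
		for (uint32_t i = 0; i < ck.num && i < EXAMPLE_MAX_ZCOOKIES &&
				     n < out_len; i++)
			out[n++] = ck.cookies[i];
	}
	return n;
}
```

Delivering the cookies alongside data on the POLLIN path is what lets a request-response application skip the separate MSG_ERRQUEUE read.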

This commit removes support for zerocopy completion notification on
MSG_ERRQUEUE for PF_RDS sockets.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:19:11 -05:00
Sowmini Varadhan 67490e34ba selftests/net: revert the zerocopy Rx path for PF_RDS
In preparation for optimized reception of zerocopy completion,
revert the Rx side changes introduced by commit dfb8434b0a
("selftests/net: add zerocopy support for PF_RDS test case").

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:19:10 -05:00
David S. Miller c1de13bb93 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2018-02-26

This series contains updates to i40e and i40evf only.

Mariusz adds a new ethtool private flag for forcing true link state with
the requested changes from Jakub Kicinski.

Paweł fixes an issue where we were double locking the same resource
which would generate a kernel panic after bringing an interface up for
i40evf.

Alan modifies both drivers to use software values to determine if there
are packets stalled on the ring with the added benefit of being less CPU
intensive since we do not need to reach into the hardware to get the
values.

Colin Ian King provides a few fixes for issues detected by Coverity:
passing a struct by reference rather than by value for efficiency,
verifying that the VSI pointer is not NULL before dereferencing it, and
removing redundant checks that always return true.

Dan Carpenter fixes over-indented lines of code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 12:56:36 -05:00
Heiner Kallweit 6c6aa15fde r8169: improve interrupt handling
This patch improves a few aspects of interrupt handling:
- update to the current interrupt allocation API
  (use pci_alloc_irq_vectors() instead of the deprecated pci_enable_msi())
- this implicitly allocates an MSI-X interrupt if available
- get rid of flag RTL_FEATURE_MSI
- remove some dead code that intentionally disabled the (unreliable) MSI
  support partially available on old PCI chips.
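The "implicitly allocates MSI-X" point follows from the fallback order the PCI core applies when a driver asks for any interrupt type, for example via pci_alloc_irq_vectors() with PCI_IRQ_ALL_TYPES (an assumption about the flags used here). The toy model below is not PCI core code; struct example_pci_caps and example_pick_irq() are hypothetical and only capture the preference order MSI-X, then MSI, then legacy INTx.

```c
#include <stdbool.h>

/* Hypothetical model of which interrupt mechanisms a device offers. */
enum example_irq_type {
	EXAMPLE_IRQ_NONE,
	EXAMPLE_IRQ_MSIX,
	EXAMPLE_IRQ_MSI,
	EXAMPLE_IRQ_LEGACY,
};

struct example_pci_caps {
	bool has_msix;
	bool has_msi;
	bool has_intx;
};

/* Preference order assumed for an "all types" request. */
static enum example_irq_type example_pick_irq(const struct example_pci_caps *caps)
{
	if (caps->has_msix)
		return EXAMPLE_IRQ_MSIX;
	if (caps->has_msi)
		return EXAMPLE_IRQ_MSI;
	if (caps->has_intx)
		return EXAMPLE_IRQ_LEGACY;
	return EXAMPLE_IRQ_NONE;
}
```

Under this order a driver no longer needs its own flag to remember whether MSI was enabled, which is consistent with dropping RTL_FEATURE_MSI.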

The patch works fine on an RTL8168evl (chip version 34) and on an
RTL8169SB (chip version 04).

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 11:47:41 -05:00
David Ahern a52b839752 selftests: Add fib-onlink-tests.sh to TEST_PROGS
Fixes: 153e1b84f4 ("selftests: Add FIB onlink tests")
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 11:41:26 -05:00
David S. Miller b17db8c0a1 Merge branch 'DPAA-Ethernet-fixes'
Madalin Bucur says:

====================
DPAA Ethernet fixes

Fixed an issue on the Tx path that was visible in netperf
TCP_SENDFILE tests. Addressed another issue with Rx errors
not always being counted. Added control for allmulti.

v2: rephrased commit message, reduced changes in the SG mapping fix
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 11:40:04 -05:00