Commit graph

872376 commits

Author SHA1 Message Date
Ursula Braun 81cf4f4707 net/smc: remove close abort worker
With the introduction of the link group termination worker there is
no longer a need to postpone smc_close_active_abort() to a worker.
To protect socket destruction due to normal and abnormal socket
closing, the socket refcount is increased.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22 11:23:44 -07:00
Ursula Braun f528ba24a8 net/smc: introduce link group termination worker
Use a worker for link group termination to guarantee process context.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22 11:23:44 -07:00
Ursula Braun 2a0674fffb net/smc: improve abnormal termination of link groups
If a link group and its connections must be terminated,
* wake up socket waiters
* do not enable buffer reuse

A linkgroup might be terminated while normal connection closing
is running. Avoid buffer reuse and its related LLC DELETE RKEY
call, if linkgroup termination has started. And use the earliest
indication of linkgroup termination possible, namely the removal
from the linkgroup list.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22 11:23:44 -07:00
Ursula Braun 8317976096 net/smc: tell peers about abnormal link group termination
There are lots of link group termination scenarios. Most of them
still allow informing the peer of the terminating sockets about the
abort. This patch tries to call smc_close_abort() for terminating
sockets.

And the internal TCP socket is reset with tcp_abort().

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22 11:23:43 -07:00
Ursula Braun 8e316b9e72 net/smc: improve link group freeing
Usually the freeing of link groups is delayed to enable quick
connection creation for a follow-on SMC socket. Terminated link groups
are freed faster. This patch makes sure that a fast schedule of link
group freeing is not overridden by a delayed schedule. It also makes
sure link group freeing is not rescheduled if the real freeing is
already running.
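
The scheduling rule described above can be modelled roughly as follows.
This is only a sketch in Python, not the kernel code: the class, the flag
names and the delay values are assumptions made for illustration.

```python
# Hypothetical model of the rule; names and delays are illustrative only.
class LinkGroupFreeTimer:
    def __init__(self):
        self.freeing = False    # the real freeing is already running
        self.freefast = False   # a fast free has already been scheduled
        self.delay = None       # delay of the currently scheduled free work

    def schedule_free(self, delay):
        """Normal, delayed free; must not override a fast schedule."""
        if self.freeing or self.freefast:
            return
        self.delay = delay

    def schedule_free_fast(self, fast_delay):
        """Fast free for a terminated link group."""
        if self.freeing or self.freefast:
            return
        self.freefast = True
        self.delay = fast_delay

    def start_freeing(self):
        """Mark that the real freeing has begun; later schedules are ignored."""
        self.freeing = True


timer = LinkGroupFreeTimer()
timer.schedule_free(600)       # delayed schedule
timer.schedule_free_fast(1)    # terminated: switch to the fast schedule
timer.schedule_free(600)       # ignored, the fast schedule stays in place
print(timer.delay)
```

Once either flag is set, further schedule requests are dropped, which is
the property the commit message asks for.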

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22 11:23:43 -07:00
Ursula Braun 69318b5215 net/smc: improve abnormal termination locking
Locking hierarchy requires that the link group conns_lock can be
taken if the socket lock is held, but not vice versa. Nevertheless
socket termination during abnormal link group termination should
be protected by the socket lock.
This patch reduces the time segments the link group conns_lock is
held to enable usage of lock_sock in smc_lgr_terminate().

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22 11:23:43 -07:00
Ursula Braun 8caa654451 net/smc: terminate link group without holding lgr lock
When a link group is to be terminated, it is sufficient to hold
the lgr lock when unlinking the link group from its list.
Move the lock-protected link group unlinking into smc_lgr_terminate().

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22 11:23:43 -07:00
Ursula Braun b290098092 net/smc: cancel send and receive for terminated socket
The resources for a terminated socket are being cleaned up.
This patch makes sure
* no more data is received for an actively terminated socket
* no more data is sent for an actively or passively terminated socket

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22 11:23:43 -07:00
Jakub Kicinski fe28afe23e Merge branch 'mlxsw-core-extend-qsfp-eeprom-size'
Ido Schimmel says:

====================
Vadim says:

This patch set extends the size of QSFP EEPROM for the cable types
SFF-8436 and SFF-8636 from 256 bytes to 640 bytes. This allows ethtool
to show correct information for these cable types (more details below).

Patch #1 adds a macro that computes the EEPROM page number from the
provided offset specified in the request.

Patch #2 teaches the driver to access the information stored in the
upper pages of the QSFP memory map.

Details and examples:

SFF-8436 specification defines pages 0, 1, 2 and 3. Page 0 contains
lower memory page offsets (from 0x00 to 0x7f) and upper page offsets
(from 0x80 to 0xfe). Upper pages 1, 2 and 3 are optional and can be
empty.

Page 1 is provided if upper page 0 byte 0xc3 bit 6 is set.
Page 2 is provided if upper page 0 byte 0xc3 bit 7 is set.
Page 3 is provided if lower page 0 byte 0x02 bit 2 is cleared.
Offset 0xc3 for the upper page is provided as 0x43 = 0xc3 - 0x80.
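
The three presence rules above can be checked against a dump with a few
lines of Python. This is a sketch only; the function name is made up for
illustration and is not part of the driver or of ethtool. It takes the
first 256 bytes of the module memory (lower page 0 followed by upper
page 0), as printed by 'ethtool -m <dev> hex on'.

```python
# Sketch only: optional_pages() is an illustrative name, not driver code.
def optional_pages(page0: bytes) -> dict:
    """Report which optional upper pages a module advertises.

    page0 is the first 256 bytes of the dump: lower page 0 (0x00..0x7f)
    followed by upper page 0 (0x80..0xff).
    """
    lower_byte_02 = page0[0x02]   # lower page 0, byte 0x02
    upper_byte_c3 = page0[0xC3]   # upper page 0, byte 0xc3
    return {
        1: bool(upper_byte_c3 & (1 << 6)),    # page 1: bit 6 set
        2: bool(upper_byte_c3 & (1 << 7)),    # page 2: bit 7 set
        3: not (lower_byte_02 & (1 << 2)),    # page 3: bit 2 cleared
    }


# Values taken from the hex dumps in this message:
# byte 0x02 is 0x00 and byte 0xc3 is 0x9e.
page0 = bytearray(256)
page0[0x02] = 0x00
page0[0xC3] = 0x9E
print(optional_pages(bytes(page0)))  # {1: False, 2: True, 3: True}
```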

As a result of exposing 256 bytes only, ethtool shows wrong information
for pages 1, 2 and 3. In the below hex dump from ethtool for a cable
compliant to SFF-8636 specification, it can be seen that EEPROM of this
device contains optical diagnostic page (lower page 0 byte 0x02 bit 2 is
cleared), but it is not exposed, as the length defined for this type is
256 bytes.

$ ethtool -m sfp42 hex on
Offset          Values
------          ------
0x0000:         11 07 00 ff 00 ff 00 00 00 55 55 00 00 00 00 00
0x0010:         00 00 00 00 00 00 2a 90 00 00 82 ae 00 00 00 00
0x0020:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0030:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0040:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0050:         00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00
0x0060:         00 00 ff 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0070:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0080:         11 8c 0c 80 00 00 00 00 00 00 00 05 ff 00 00 23
0x0090:         00 00 32 00 4d 65 6c 6c 61 6e 6f 78 20 20 20 20
0x00a0:         20 20 20 20 00 00 02 c9 4d 4d 41 31 42 30 30 2d
0x00b0:         53 53 31 20 20 20 20 20 41 32 42 68 0b b8 46 05
0x00c0:         02 07 f5 9e 4d 54 31 38 33 34 46 54 30 33 38 34
0x00d0:         36 20 20 20 31 38 30 37 30 33 00 00 0c 10 67 c2
0x00e0:         38 32 36 46 4d 41 32 32 36 49 30 31 31 35 20 20
0x00f0:         00 00 00 00 00 00 00 00 00 00 01 00 0e 00 00 00

After changing the length returned by get_module_info() callback from
256 bytes to 640 bytes, the upper pages 1, 2 and 3 are exposed by
ethtool. In the below hex dump from the same cable it can be seen that
the optical diagnostic page (page 3, from offset 0x0200) has non-zero
data.

$ ethtool -m sfp42 hex on
Offset          Values
------          ------
0x0000:         11 07 00 ff 00 ff 00 00 00 55 55 00 00 00 00 00
0x0010:         00 00 00 00 00 00 27 79 00 00 82 c5 00 00 00 00
0x0020:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0030:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0040:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0050:         00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00
0x0060:         00 00 ff 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0070:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0080:         11 8c 0c 80 00 00 00 00 00 00 00 05 ff 00 00 23
0x0090:         00 00 32 00 4d 65 6c 6c 61 6e 6f 78 20 20 20 20
0x00a0:         20 20 20 20 00 00 02 c9 4d 4d 41 31 42 30 30 2d
0x00b0:         53 53 31 20 20 20 20 20 41 32 42 68 0b b8 46 05
0x00c0:         02 07 f5 9e 4d 54 31 38 33 34 46 54 30 33 38 34
0x00d0:         36 20 20 20 31 38 30 37 30 33 00 00 0c 10 67 c2
0x00e0:         38 32 36 46 4d 41 32 32 36 49 30 31 31 35 20 20
0x00f0:         00 00 00 00 00 00 00 00 00 00 01 00 0e 00 00 00
0x0100:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0110:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0120:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0130:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0140:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0150:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0160:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0170:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0180:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0190:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x01a0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x01b0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x01c0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x01d0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x01e0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x01f0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0200:         50 00 f6 00 46 00 00 00 00 00 00 00 00 00 00 00
0x0210:         88 b8 79 18 87 5a 7a 76 00 00 00 00 00 00 00 00
0x0220:         00 00 00 00 00 00 00 00 00 00 18 30 0e 61 60 b7
0x0230:         87 71 01 d3 43 e2 03 a5 10 9a 0a ba 0f a0 0b b8
0x0240:         87 71 02 d4 43 e2 05 a5 00 00 00 00 00 00 00 00
0x0250:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0260:         a7 03 00 00 00 00 00 00 00 00 44 44 22 22 11 11
0x0270:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

And 'ethtool -m sfp42' shows the real values for the below fields, while
before it exposed zeros for these fields:

Laser bias current high alarm threshold   : 8.500 mA
Laser bias current low alarm threshold    : 5.492 mA
Laser bias current high warning threshold : 8.000 mA
Laser bias current low warning threshold  : 6.000 mA
Laser output power high alarm threshold   : 3.4673 mW / 5.40 dBm
Laser output power low alarm threshold    : 0.0724 mW / -11.40 dBm
Laser output power high warning threshold : 1.7378 mW / 2.40 dBm
Laser output power low warning threshold  : 0.1445 mW / -8.40 dBm
Module temperature high alarm threshold   : 80.00 degrees C / 176.00 F
Module temperature low alarm threshold    : -10.00 degrees C / 14.00 F
Module temperature high warning threshold : 70.00 degrees C / 158.00 F
Module temperature low warning threshold  : 0.00 degrees C / 32.00 F
Module voltage high alarm threshold       : 3.5000 V
Module voltage low alarm threshold        : 3.1000 V
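
As a cross-check, the values above can be recomputed from the page 3
bytes of the second hex dump. The sketch below is in Python and assumes
the usual SFF-8636 scaling: temperature as a signed 16-bit value in
units of 1/256 degrees C, voltage as an unsigned 16-bit value in units
of 100 uV, and bias current as an unsigned 16-bit value in units of
2 uA. The helper names are illustrative only.

```python
import struct

# Bytes copied from the dump: 0x0200 (temperature), 0x0210 (voltage)
# and 0x0238 (laser bias current). All values are big-endian.
TEMP_RAW = bytes.fromhex("5000 f600 4600 0000")
VOLT_RAW = bytes.fromhex("88b8 7918")
BIAS_RAW = bytes.fromhex("109a 0aba 0fa0 0bb8")


def decode_temperature(raw: bytes) -> float:
    """Signed 16-bit value in 1/256 degrees C."""
    (value,) = struct.unpack(">h", raw)
    return value / 256


def decode_voltage(raw: bytes) -> float:
    """Unsigned 16-bit value in 100 uV units, returned in volts."""
    (value,) = struct.unpack(">H", raw)
    return value / 10000


def decode_bias_ma(raw: bytes) -> float:
    """Unsigned 16-bit value in 2 uA units, returned in mA."""
    (value,) = struct.unpack(">H", raw)
    return value * 2 / 1000


def words(raw: bytes):
    """Split a byte string into consecutive 16-bit words."""
    return [raw[i:i + 2] for i in range(0, len(raw), 2)]


print([decode_temperature(w) for w in words(TEMP_RAW)])  # [80.0, -10.0, 70.0, 0.0]
print([decode_voltage(w) for w in words(VOLT_RAW)])      # [3.5, 3.1]
print([decode_bias_ma(w) for w in words(BIAS_RAW)])      # [8.5, 5.492, 8.0, 6.0]
```

The order within each group is high alarm, low alarm, high warning,
low warning, which matches the ethtool output above.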
====================

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22 11:21:00 -07:00
Vadim Pasternak a45bfb5a50 mlxsw: core: Extend QSFP EEPROM size for ethtool
Extend the size of QSFP EEPROM for the cable types SFF8436 and SFF8636
from 256 to 640 bytes in order to expose all the EEPROM pages by
ethtool.

For SFF-8636 and SFF-8436 specifications, the driver exposes 256 bytes
of data for ethtool's get_module_eeprom() callback. This is because the
driver uses the below defines to specify SFF module length in ethtool's
get_module_info() callback:
'ETH_MODULE_SFF_8636_LEN' and 'ETH_MODULE_SFF_8436_LEN' (both are 256).

As a result of exposing 256 bytes only, ethtool shows wrong "zero" info
for pages 1, 2, 3.

The patch changes the length returned by the get_module_info() callback
to the values of the following defines: 'ETH_MODULE_SFF_8636_MAX_LEN' and
'ETH_MODULE_SFF_8436_MAX_LEN' (both are 640) to allow exposing upper
pages 1, 2 and 3.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22 10:30:41 -07:00
Vadim Pasternak f366cd2a2e mlxsw: reg: Add macro for getting QSFP module EEPROM page number
Provide a macro for getting QSFP module EEPROM page number from the
optional upper page number row offset, specified in request.
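
The mapping such a macro has to perform can be sketched in Python. The
sketch assumes the 640-byte layout that ethtool exposes: 256 bytes for
lower plus upper page 0, followed by 128 bytes for each of the upper
pages 1, 2 and 3. The constant and function names are illustrative and
are not the ones used in the driver.

```python
# Illustrative constants; the names are not taken from the driver.
PAGE0_LEN = 256        # lower page 0 (128 bytes) plus upper page 0 (128 bytes)
UPPER_PAGE_LEN = 128   # each optional upper page 1, 2 and 3
MAX_LEN = PAGE0_LEN + 3 * UPPER_PAGE_LEN   # 640 bytes in total


def eeprom_page(offset: int) -> int:
    """Return the page number that a linear EEPROM offset falls into."""
    if not 0 <= offset < MAX_LEN:
        raise ValueError("offset outside of the 640-byte EEPROM map")
    if offset < PAGE0_LEN:
        return 0
    return (offset - PAGE0_LEN) // UPPER_PAGE_LEN + 1


for off in (0x0000, 0x0100, 0x0180, 0x0200):
    print(hex(off), eeprom_page(off))
```

With this layout, offset 0x0200 falls into page 3.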

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22 10:30:41 -07:00
Juergen Gross 2ac061ce97 xen/netback: cleanup init and deinit code
Do some cleanup of the netback init and deinit code:

- add an omnipotent queue deinit function usable from
  xenvif_disconnect_data() and the error path of xenvif_connect_data()
- only install the irq handlers after initializing all relevant items
  (especially the kthreads related to the queue)
- there is no need to use get_task_struct() after creating a kthread
  and using put_task_struct() again after having stopped it.
- use kthread_run() instead of kthread_create() to spare the call of
  wake_up_process().

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <pdurrant@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22 09:52:04 -07:00
Jakub Kicinski 88238d2d22 Merge branch 'r8152-phy-firmware'
Hayes Wang says:

====================
Support loading the firmware of the PHY with the type of RTL_FW_PHY_NC.
====================

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22 09:46:05 -07:00
Hayes Wang af14288f94 r8152: support firmware of PHY NC for RTL8153A
Support the firmware of PHY NC which is used to fix the issue found
for PHY. Currently, only RTL_VER_04, RTL_VER_05, and RTL_VER_06 need
it.

The order of loading PHY firmware would be

	RTL_FW_PHY_START
	RTL_FW_PHY_NC
	RTL_FW_PHY_STOP

The RTL_FW_PHY_START/RTL_FW_PHY_STOP are used to lock/unlock the PHY,
and set/clear the patch key from the firmware file.
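
A check for the block order listed above could look like the following
Python sketch. It is only a model of the ordering rule: block types are
represented as plain strings, and the function name and the
"OTHER_BLOCK" placeholder are assumptions made for illustration, not
the driver's firmware parser.

```python
# Sketch only: block types are modelled as strings for illustration.
PHY_ORDER = ("RTL_FW_PHY_START", "RTL_FW_PHY_NC", "RTL_FW_PHY_STOP")


def phy_blocks_in_order(block_types) -> bool:
    """Return True if the PHY blocks appear exactly once, in load order.

    Blocks of other types are ignored by this check.
    """
    phy_blocks = [t for t in block_types if t in PHY_ORDER]
    return tuple(phy_blocks) == PHY_ORDER


print(phy_blocks_in_order(
    ["OTHER_BLOCK", "RTL_FW_PHY_START", "RTL_FW_PHY_NC", "RTL_FW_PHY_STOP"]))
print(phy_blocks_in_order(
    ["RTL_FW_PHY_NC", "RTL_FW_PHY_START", "RTL_FW_PHY_STOP"]))
```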

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22 09:45:21 -07:00
Hayes Wang 470e39194a r8152: move r8153_patch_request forward
Move r8153_patch_request() forward for later patch.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22 09:45:21 -07:00
Hayes Wang 5a16a3d9f9 r8152: add checking fw_offset field of struct fw_mac
Make sure @fw_offset field of struct fw_mac is more than the size
of struct fw_mac.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22 09:45:21 -07:00
Hayes Wang a66edaafae r8152: rename fw_type_1 with fw_mac
The struct fw_type_1 is used by MAC only, so rename it to a meaningful one.

Besides, adjust two messages. Replace "load xxx fail" with "check xxx fail"

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-22 09:45:21 -07:00
Jakub Kicinski 39438490c9 Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2019-10-21

This series contains updates to e1000e and igc only.

Sasha adds stream control transmission protocol (SCTP) CRC checksum
support for igc.  Also added S0ix support to the e1000e driver.  Then
added multicast support by adding the address list to the MTA table and
providing the option for IPv6 address for igc.  In addition, added
receive checksum support to igc as well.  Lastly, cleaned up some code
that was not fully implemented yet for the VLAN filter table array.

v2: Dropped patch 1 & 2 from the original series.  Patch 1 is being sent
    to 'net' tree as a fix and patch 2 implementation needs to be
    re-worked.  Updated the patch to add support for S0ix to fix the
    reverse Xmas tree issues and made the entry/exit functions void
    since they constantly returned success.  All based on community
    feedback.
v3: Cleaned up patch 4 of the series based on feedback from the
    community.  Cleaned up a stray comma in a code comment and removed
    the 'inline' of a function that would be inlined by the compiler
    anyways.
====================

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-21 20:16:12 -07:00
Davide Caratti 985fd98ab5 net/sched: act_police: re-use tcf_tm_dump()
Use tcf_tm_dump(), instead of an open coded variant (no functional change
in this patch).

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 11:22:40 -07:00
David S. Miller 3e78815f75 Merge branch 'phy-marvell-support-downshift-as-PHY-tunable'
Heiner Kallweit says:

====================
net: phy: marvell: support downshift as PHY tunable

So far downshift is implemented for one small use case only and can't
be controlled from userspace. So let's implement this feature properly
as a PHY tunable so that it can be controlled via ethtool.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 10:45:03 -07:00
Heiner Kallweit e2d861cc0f net: phy: marvell: remove superseded function marvell_set_downshift
Instead of superseded function marvell_set_downshift() we can use new
function m88e1111_set_downshift() in m88e1116r_config_init().
For this m88e1116r_config_init() has to be moved in the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 10:45:03 -07:00
Heiner Kallweit a3bdfce7bf net: phy: marvell: support downshift as PHY tunable
So far downshift is implemented for one small use case only and can't
be controlled from userspace. So let's implement this feature properly
as a PHY tunable so that it can be controlled via ethtool.
More Marvell PHY's may support downshift, but I restricted it for now
to the ones where I have the datasheet.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 10:45:02 -07:00
Roman Mashak a8fad5459d tc-testing: updated pedit TDC tests
Added test cases for IP header operations:
- set tos/precedence
- add value to tos/precedence
- clear tos/precedence
- invert tos/precedence

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 10:38:51 -07:00
David S. Miller 7170debecd Merge branch 'mvneta-xdp'
Lorenzo Bianconi says:

====================
add XDP support to mvneta driver

Add XDP support to mvneta driver for devices that rely on software
buffer management. Supported verdicts are:
- XDP_DROP
- XDP_PASS
- XDP_REDIRECT
- XDP_TX
Moreover set ndo_xdp_xmit net_device_ops function pointer in order
to support redirecting from other device (e.g. virtio-net).
Convert mvneta driver to page_pool API.
This series is based on previous work done by Jesper and Ilias.
We will send follow-up patches to reduce DMA-sync operations.

Changes since v4:
- reset page_pool pointer to NULL in mvneta_rxq_drop_pkts and in
  mvneta_create_page_pool error path
- move dma sync in mvneta_rx_refill() in patch 2/7
- verify bpf prog pointer in mvneta_xdp_setup to double-check if
  stop/start is really necessary
- coding style fixes

Changes since v3:
- rename MVNETA_XDP_CONSUMED in MVNETA_XDP_DROPPED
- squash patch 4/8 and patch 3/8
- fix dma sync for XDP_TX verdict
- fix queue_index in xdp_rxq_info_reg
- cosmetics

Changes since v2:
- rely on page_pool_recycle_direct instead of xdp_return_buff for XDP_DROP
- define xdp buffer in mvneta_rx_swbm and avoid default initializations
- use dma_sync_single_for_cpu instead of dma_sync_single_range_for_cpu
- run page_pool_release_page in mvneta_swbm_add_rx_fragment even if
  the buffer contains just ETH_FCS

Changes since v1:
- sync dma buffers before refilling hw queues
- fix stats accounting

Changes since RFC:
- implement XDP_TX
- make tx pending buffer list agnostic
- code refactoring
- check if device is running in mvneta_xdp_setup
====================

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 10:36:08 -07:00
Lorenzo Bianconi b0a43db908 net: mvneta: add XDP_TX support
Implement XDP_TX verdict and ndo_xdp_xmit net_device_ops function
pointer

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 10:36:07 -07:00
Lorenzo Bianconi 9e58c8b410 net: mvneta: make tx buffer array agnostic
Allow tx buffer array to contain both skb and xdp buffers in order to
enable xdp frame recycling adding XDP_TX verdict support

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 10:36:07 -07:00
Lorenzo Bianconi fa383f6b77 net: mvneta: move header prefetch in mvneta_swbm_rx_frame
Move data buffer prefetch in mvneta_swbm_rx_frame after
dma_sync_single_range_for_cpu

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 10:36:07 -07:00
Lorenzo Bianconi 0db51da7a8 net: mvneta: add basic XDP support
Add basic XDP support to mvneta driver for devices that rely on software
buffer management. Currently supported verdicts are:
- XDP_DROP
- XDP_PASS
- XDP_REDIRECT
- XDP_ABORTED

- iptables drop:
$iptables -t raw -I PREROUTING -p udp --dport 9 -j DROP
$nstat -n && sleep 1 && nstat
IpInReceives		151169		0.0
IpExtInOctets		6953544		0.0
IpExtInNoECTPkts	151165		0.0

- XDP_DROP via xdp1
$./samples/bpf/xdp1 3
proto 0:	421419 pkt/s
proto 0:	421444 pkt/s
proto 0:	421393 pkt/s
proto 0:	421440 pkt/s
proto 0:	421184 pkt/s

Tested-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 10:36:07 -07:00
Lorenzo Bianconi 8dc9a0888f net: mvneta: rely on build_skb in mvneta_rx_swbm poll routine
Refactor mvneta_rx_swbm code introducing mvneta_swbm_rx_frame and
mvneta_swbm_add_rx_fragment routines. Rely on build_skb in order to
allocate skb since the previous patch introduced buffer recycling using
the page_pool API.
This patch also fixes an issue in the original driver where dma buffers
are accessed before dma sync.
The mvneta driver can run on devices that are not cache coherent, so it
is necessary to sync DMA buffers before sending them to the device
in order to avoid memory corruptions. Running perf analysis we can
see a performance cost associated with this DMA-sync (anyway it is
already there in the original driver code). In follow up patches we
will add more logic to reduce DMA-sync as much as possible.

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 10:36:07 -07:00
Lorenzo Bianconi 568a3fa24a net: mvneta: introduce page pool API for sw buffer manager
Use the page_pool api for allocations and DMA handling instead of
__dev_alloc_page()/dma_map_page() and free_page()/dma_unmap_page().
Pages are unmapped using page_pool_release_page before packets
go into the network stack.

The page_pool API offers buffer recycling capabilities for XDP but
allocates one page per packet, unless the driver splits and manages
the allocated page.
This is a preliminary patch to add XDP support to mvneta driver

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 10:36:07 -07:00
Lorenzo Bianconi ff519e2acd net: mvneta: introduce mvneta_update_stats routine
Introduce mvneta_update_stats routine to collect {rx/tx} statistics
(packets and bytes). This is a preliminary patch to add XDP support to
mvneta driver

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 10:36:07 -07:00
Sasha Neftin 70332577e4 igc: Clean up unused shadow_vfta pointer
The VLAN filter table array is not implemented yet and the shadow_vfta
pointer is not used. Clean up the code and remove the unused
shadow_vfta pointer.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-10-21 10:27:01 -07:00
Sasha Neftin 3bdd7086f7 igc: Add Rx checksum support
Extend the socket buffer field process and add Rx checksum functionality
Minor: fix indentation with tab instead of spaces.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-10-21 10:26:39 -07:00
Sasha Neftin 7f839684c5 igc: Add set_rx_mode support
Add multicast addresses list to the MTA table.
Implement basic Rx mode support.
Add option for IPv6 address settings.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-10-21 10:22:13 -07:00
Sasha Neftin f15bb6dde7 e1000e: Add support for S0ix
Implement flow for S0ix support. Modern SoCs support S0ix low power
states during idle periods, which are sub-states of ACPI S0 that increase
power saving while supporting an instant-on experience for providing
lower latency than ACPI S0. The S0ix states shut off parts of the SoC
when they are not in use, while still maintaining optimal performance.
This patch adds support for S0ix, starting with the Ice Lake platform.

Suggested-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@linux.intel.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-10-21 10:22:13 -07:00
Sasha Neftin 0ac960a8e1 igc: Add SCTP CRC checksumming functionality
Add stream control transmission protocol CRC checksum.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-10-21 10:22:13 -07:00
David S. Miller 13faf77185 Merge branch 'hns3-next'
Huazhong Tan says:

====================
net: hns3: add some cleanups & optimizations

This patchset includes some cleanups and optimizations for the HNS3
ethernet driver.

[patch 1/8] removes unused and unnecessary structures.

[patch 2/8] uses a ETH_ALEN u8 array to replace two mac_addr_*
field in struct hclge_mac_mgr_tbl_entry_cmd.

[patch 3/8] optimizes the barrier used in the IO path.

[patch 4/8] introduces macro ring_to_netdev() to get netdevice
from struct hns3_enet_ring variable.

[patch 5/8] makes struct hns3_enet_ring cacheline aligned.

[patch 6/8] adds a minor cleanup for hns3_handle_rx_bd().

[patch 7/8] removes linear data allocating for fraglist SKB.

[patch 8/8] clears hardware error when resetting.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 09:22:10 -07:00
Jian Shen 4fdd0bca61 net: hns3: log and clear hardware error after reset complete
When the device is resetting, the CMDQ service may be stopped until
the reset completes. If a new RAS error occurs at this moment, it
will not be able to clear the RAS source. This patch fixes it
by clearing the RAS source after the reset completes.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 09:22:10 -07:00
Yunsheng Lin 7fda3a930d net: hns3: do not allocate linear data for fraglist skb
Currently, napi_alloc_skb() is used to allocate skb for fraglist
when the head skb is not enough to hold the remaining data, and
the remaining data is added to the frags part of the fraglist skb,
leaving the linear part unused.

So this patch passes length of 0 to allocate fraglist skb with
zero size of linear data.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 09:22:10 -07:00
Yunsheng Lin d35bced88f net: hns3: minor cleanup for hns3_handle_rx_bd()
Since commit e559709505 ("net: hns3: Add handling of GRO Pkts
not fully RX'ed in NAPI poll"), ring->skb is used to record the
current SKB when processing the RX BD in hns3_handle_rx_bd(),
so the parameter out_skb is unnecessary.

This patch also adjusts the err checking to reduce duplication
in hns3_handle_rx_bd(), and "err == -ENXIO" is a rare case, so put
it in the unlikely annotation.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 09:22:10 -07:00
Yunsheng Lin 76643555a1 net: hns3: make struct hns3_enet_ring cacheline aligned
Since struct hns3_enet_ring is frequently used in the critical data
path, make it cacheline aligned like struct hns3_enet_tqp_vector.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 09:22:09 -07:00
Yunsheng Lin c871195601 net: hns3: introduce ring_to_netdev() in enet module
There are a few places that need to access the netdev of a ring
through ring->tqp->handle->kinfo.netdev, and ring->tqp is a struct
that is used in both the enet and hclge modules. It is better to use
a struct that is only used in the enet module.

This patch adds the ring_to_netdev() to access the netdev of ring
through ring->tqp_vector->napi.dev.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 09:22:09 -07:00
Yunsheng Lin 88b7c58c19 net: hns3: minor optimization for barrier in IO path
Currently, the TX and RX rings in a queue are bound to the
same IRQ, so there may be an unnecessary barrier op when only one of
the rings needs to be processed.

This patch adjusts the location of rmb() in hns3_clean_tx_ring()
and adds a check in hns3_clean_rx_ring() to avoid an unnecessary
barrier op when there is nothing to do for the ring.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 09:22:09 -07:00
Guojia Liao 0e02a53d64 net: hns3: optimized MAC address in management table.
mac_addr_hi32 and mac_addr_lo16 are used to store the MAC address
for the management table. Using an array mac_addr[ETH_ALEN] is
more general and removes the need to care about the big-endian mode
of the CPU.
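
Why a byte array avoids the endianness concern can be shown with a
short Python sketch. The example MAC address and the helper name are
made up for illustration; the sketch only models how a 32-bit plus
16-bit integer pair is laid out in memory, not the actual command
structure.

```python
import struct

MAC = bytes.fromhex("001122334455")   # example address 00:11:22:33:44:55


def layout_as_integers(mac: bytes, byte_order: str) -> bytes:
    """Store the MAC as a 32-bit and a 16-bit integer, return the memory image.

    byte_order is ">" for big-endian or "<" for little-endian storage.
    """
    hi32 = int.from_bytes(mac[:4], "big")
    lo16 = int.from_bytes(mac[4:], "big")
    return struct.pack(byte_order + "IH", hi32, lo16)


print(layout_as_integers(MAC, ">").hex())   # 001122334455
print(layout_as_integers(MAC, "<").hex())   # 332211005544
print(bytes(MAC).hex())                     # 001122334455
```

The integer pair yields a different byte sequence depending on the
storage order, so code using it has to convert explicitly, while the
byte array is the same in both cases.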

Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 09:22:09 -07:00
Yunsheng Lin 5f06b903cb net: hns3: remove struct hns3_nic_ring_data in hns3_enet module
Only the queue_index field in struct hns3_nic_ring_data is
used; the other fields are unused and unnecessary for the hns3 driver,
so this patch removes the struct and moves the queue_index field to
hns3_enet_ring.

This patch also removes an unused struct hns_queue declaration.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-21 09:22:09 -07:00
David S. Miller 2f184393e0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Several cases of overlapping changes which were for the most
part trivially resolvable.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-20 10:43:00 -07:00
Linus Torvalds 531e93d114 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
 "I was battling a cold after some recent trips, so quite a bit piled up
  meanwhile, sorry about that.

  Highlights:

   1) Fix fd leak in various bpf selftests, from Brian Vazquez.

   2) Fix crash in xsk when device doesn't support some methods, from
      Magnus Karlsson.

   3) Fix various leaks and use-after-free in rxrpc, from David Howells.

   4) Fix several SKB leaks due to confusion of who owns an SKB and who
      should release it in the llc code. From Eric Biggers.

   5) Kill a bunch of KCSAN warnings in TCP, from Eric Dumazet.

   6) Jumbo packets don't work after resume on r8169, as the BIOS resets
      the chip into non-jumbo mode during suspend. From Heiner Kallweit.

   7) Corrupt L2 header during MPLS push, from Davide Caratti.

   8) Prevent possible infinite loop in tc_ctl_action, from Eric
      Dumazet.

   9) Get register bits right in bcmgenet driver, based upon chip
      version. From Florian Fainelli.

  10) Fix mutex problems in microchip DSA driver, from Marek Vasut.

  11) Cure race between route lookup and invalidation in ipv4, from Wei
      Wang.

  12) Fix performance regression due to false sharing in 'net'
      structure, from Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (145 commits)
  net: reorder 'struct net' fields to avoid false sharing
  net: dsa: fix switch tree list
  net: ethernet: dwmac-sun8i: show message only when switching to promisc
  net: aquantia: add an error handling in aq_nic_set_multicast_list
  net: netem: correct the parent's backlog when corrupted packet was dropped
  net: netem: fix error path for corrupted GSO frames
  macb: propagate errors when getting optional clocks
  xen/netback: fix error path of xenvif_connect_data()
  net: hns3: fix mis-counting IRQ vector numbers issue
  net: usb: lan78xx: Connect PHY before registering MAC
  vsock/virtio: discard packets if credit is not respected
  vsock/virtio: send a credit update when buffer size is changed
  mlxsw: spectrum_trap: Push Ethernet header before reporting trap
  net: ensure correct skb->tstamp in various fragmenters
  net: bcmgenet: reset 40nm EPHY on energy detect
  net: bcmgenet: soft reset 40nm EPHYs before MAC init
  net: phy: bcm7xxx: define soft_reset for 40nm EPHY
  net: bcmgenet: don't set phydev->link from MAC
  net: Update address for MediaTek ethernet driver in MAINTAINERS
  ipv4: fix race condition between route lookup and invalidation
  ...
2019-10-19 17:09:11 -04:00
Eric Dumazet 2a06b8982f net: reorder 'struct net' fields to avoid false sharing
Intel test robot reported a ~7% regression on TCP_CRR tests
that they bisected to the cited commit.

Indeed, every time a new TCP socket is created or deleted,
the atomic counter net->count is touched (via get_net(net)
and put_net(net) calls).

So CPUs might have to reload a contended cache line in
net_hash_mix(net) calls.

We need to reorder 'struct net' fields to move @hash_mix
into a read-mostly cache line.

We move fields that can be dirtied often into the first
cache line.

We probably will have to address in a followup patch
the __randomize_layout that was added in linux-4.13,
since this might break our placement choices.

Fixes: 355b985537 ("netns: provide pure entropy for net_hash_mix()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19 12:21:53 -07:00
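The layout problem described in commit 2a06b8982f above can be illustrated with a small user-space sketch. The structs below are hypothetical two-field stand-ins, not the real 'struct net'; the 64-byte cache line size is an assumption, and C11 `_Alignas` is used here in place of whatever annotation the kernel patch applies.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE 64	/* assumed cache line size */

/* Hot counter and read-mostly field packed next to each other. */
struct demo_net_shared {
	int count;		/* dirtied on every socket create/delete */
	uint32_t hash_mix;	/* only read after initialization */
};

/* Read-mostly field pushed onto its own cache line. */
struct demo_net_split {
	int count;
	_Alignas(DEMO_CACHE_LINE) uint32_t hash_mix;
};

/* True if two offsets (relative to a line-aligned struct) share a cache line. */
static int demo_same_line(size_t a, size_t b)
{
	return a / DEMO_CACHE_LINE == b / DEMO_CACHE_LINE;
}
```

In the first layout every write to `count` invalidates the line that readers of `hash_mix` need; in the second the two fields land on different lines, at the cost of padding.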
Vivien Didelot 50c7d2ba9d net: dsa: fix switch tree list
If there are multiple switch trees on the device, only the last one
will be listed, because the arguments of list_add_tail are swapped.

Fixes: 83c0afaec7 ("net: dsa: Add new binding implementation")
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19 12:19:41 -07:00
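To see why swapping the arguments in commit 50c7d2ba9d above leaves only the last tree listed, here is a self-contained user-space sketch. `demo_list_add_tail` mirrors the kernel's list_add_tail(new, head) argument order, but all names here are hypothetical and this is not the DSA code itself.

```c
#include <stddef.h>

struct demo_list_head {
	struct demo_list_head *next, *prev;
};

static void demo_init_list(struct demo_list_head *l)
{
	l->next = l;
	l->prev = l;
}

/* Insert new_entry just before head, i.e. at the tail of head's list. */
static void demo_list_add_tail(struct demo_list_head *new_entry,
			       struct demo_list_head *head)
{
	struct demo_list_head *prev = head->prev;

	head->prev = new_entry;
	new_entry->next = head;
	new_entry->prev = prev;
	prev->next = new_entry;
}

static int demo_list_len(const struct demo_list_head *head)
{
	int n = 0;

	for (const struct demo_list_head *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/* Add two entries to a list and count what is reachable from its head. */
static int demo_count_trees(int swapped)
{
	struct demo_list_head tree_list, a, b;

	demo_init_list(&tree_list);
	demo_init_list(&a);
	demo_init_list(&b);

	if (swapped) {
		/* Bug: the list head is "added" to each entry, dropping earlier ones. */
		demo_list_add_tail(&tree_list, &a);
		demo_list_add_tail(&tree_list, &b);
	} else {
		demo_list_add_tail(&a, &tree_list);
		demo_list_add_tail(&b, &tree_list);
	}
	return demo_list_len(&tree_list);
}
```

With swapped arguments, each call relinks the global head into the new entry's own one-element list, so walking from the head only ever reaches the most recent entry.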
Mans Rullgard 05908d72cc net: ethernet: dwmac-sun8i: show message only when switching to promisc
Printing the info message every time more than the max number of mac
addresses are requested generates unnecessary log spam.  Showing it only
when the hw is not already in promiscuous mode is equally informative
without being annoying.

Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19 12:18:10 -07:00