Commit graph

636344 commits

Author SHA1 Message Date
Ido Schimmel f414b48e92 mlxsw: resources: Add maximum buffer size
We need to be able to limit the size of shared buffer pools, so query
the maximum size from the device during init.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29 20:48:51 -05:00
Arnd Bergmann 67ea7ef19b mlxsw: switchib: add MLXSW_PCI dependency
The newly added switchib driver fails to link if MLXSW_PCI=m:

drivers/net/ethernet/mellanox/mlxsw/mlxsw_switchib.o: In function^Cmlxsw_sib_module_exit':
switchib.c:(.exit.text+0x8): undefined reference to `mlxsw_pci_driver_unregister'
switchib.c:(.exit.text+0x10): undefined reference to `mlxsw_pci_driver_unregister'
drivers/net/ethernet/mellanox/mlxsw/mlxsw_switchib.o: In function `mlxsw_sib_module_init':
switchib.c:(.init.text+0x28): undefined reference to `mlxsw_pci_driver_register'
switchib.c:(.init.text+0x38): undefined reference to `mlxsw_pci_driver_register'
switchib.c:(.init.text+0x48): undefined reference to `mlxsw_pci_driver_unregister'

The other two such sub-drivers have a dependency, so add the same one
here. In theory we could allow this driver if MLXSW_PCI is disabled,
but it's probably not worth it.

Fixes: d1ba526384 ("mlxsw: switchib: Introduce SwitchIB and SwitchIB silicon driver")
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29 20:36:25 -05:00
Geert Uytterhoeven 0075bd692d net: phy: Fix use after free in phy_detach()
If device_release_driver(&phydev->mdio.dev) is called, it releases all
resources belonging to the PHY device. Hence the subsequent call to
phy_led_triggers_unregister() will access already freed memory when
unregistering the LEDs.

Move the call to phy_led_triggers_unregister() before the possible call
to device_release_driver() to fix this.

Fixes: 2e0bc452f4 ("net: phy: leds: add support for led triggers on phy link state change")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Zach Brown <zach.brown@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29 20:35:24 -05:00
Pavel Machek 22d3efe5f6 stmmac: fix comments, make debug output consistent
Fix comments, add some new, and make debugfs output consistent.

Signed-off-by: Pavel Machek <pavel@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29 19:53:22 -05:00
Daniel Mack 01ae87eab5 bpf: cgroup: fix documentation of __cgroup_bpf_update()
There's a 'not' missing in one paragraph. Add it.

Fixes: 3007098494 ("cgroup: add support for eBPF programs")
Signed-off-by: Daniel Mack <daniel@zonque.org>
Reported-by: Rami Rosen <roszenrami@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29 19:50:59 -05:00
David S. Miller a152c91889 Merge branch 'eee-broken-modes'
Jerome Brunet says:

====================
Fix OdroidC2 Gigabit Tx link issue

This patchset fixes an issue with the OdroidC2 board (DWMAC + RTL8211F).
The platform seems to enter LPI on the Rx path too often while performing
relatively high TX transfer. This eventually break the link (both Tx and
Rx), and require to bring the interface down and up again to get the Rx
path working again.

The root cause of this issue is not fully understood yet but disabling EEE
advertisement on the PHY prevent this feature to be negotiated.
With this change, the link is stable and reliable, with the expected
throughput performance.

The patchset adds options in the generic phy driver to disable EEE
advertisement, through device tree. The way it is done is very similar
to the handling of the max-speed property.

Changes since V2: [2]
 - Rename "eee-advert-disable" to "eee-broken-modes" to make the intended
   purpose of this option clear (flag broken configuration, not a
   configuration option)
 - Add DT bindings constants so the DT configuration is more user friendly
 - Submit to net-next instead of net.

Changes since V1: [1]
 - Disable the advertisement of EEE in the generic code instead of the
   realtek driver.

[1] : http://lkml.kernel.org/r/1479220154-25851-1-git-send-email-jbrunet@baylibre.com
[2] : http://lkml.kernel.org/r/1479742524-30222-1-git-send-email-jbrunet@baylibre.com
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29 19:38:31 -05:00
jbrunet c44a3bc142 dt: bindings: add ethernet phy eee-broken-modes option documentation
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29 19:38:31 -05:00
jbrunet 1fc31357ad dt-bindings: net: add EEE capability constants
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Yegor Yefremov <yegorslists@googlemail.com>
Tested-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29 19:38:31 -05:00
jbrunet d853d145ea net: phy: add an option to disable EEE advertisement
This patch adds an option to disable EEE advertisement in the generic PHY
by providing a mask of prohibited modes corresponding to the value found in
the MDIO_AN_EEE_ADV register.

On some platforms, PHY Low power idle seems to be causing issues, even
breaking the link some cases. The patch provides a convenient way for these
platforms to disable EEE advertisement and work around the issue.

Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Yegor Yefremov <yegorslists@googlemail.com>
Tested-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29 19:38:31 -05:00
Niklas Cassel 436feafe95 net: stmmac: enable tx queue 0 for gmac4 IPs synthesized with multiple TX queues
The dwmac4 IP can synthesized with 1-8 number of tx queues.
On an IP synthesized with DWC_EQOS_NUM_TXQ > 1, all txqueues are disabled
by default. For these IPs, the bitfield TXQEN is R/W.

Always enable tx queue 0. The write will have no effect on IPs synthesized
with DWC_EQOS_NUM_TXQ == 1.

The driver does still not utilize more than one tx queue in the IP.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29 19:10:47 -05:00
Peter Robinson 530742e707 net: arc_emac: add dependencies on associated arches and compile test
Add dependencies on the architectures that support these devices and
add compile test to ensure ongoing code build coverage.

Signed-off-by: Peter Robinson <pbrobinson@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29 18:57:36 -05:00
Andreas Färber b26bff6e52 MAINTAINERS: Add device tree bindings to mv88e6xx section
Also include the netdev list for convenience, as done elsewhere.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Cc: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29 18:55:52 -05:00
Eric Dumazet 40931b8511 mlx4: give precise rx/tx bytes/packets counters
mlx4 stats are chaotic because a deferred work queue is responsible
to update them every 250 ms.

Even sampling stats every one second with "sar -n DEV 1" gives
variations like the following :

lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
07:39:22         eth0 146877.00 3265554.00   9467.15 4828168.50
07:39:23         eth0 146587.00 3260329.00   9448.15 4820445.98
07:39:24         eth0 146894.00 3259989.00   9468.55 4819943.26
07:39:25         eth0 110368.00 2454497.00   7113.95 3629012.17  <<>>
07:39:26         eth0 146563.00 3257502.00   9447.25 4816266.23
07:39:27         eth0 145678.00 3258292.00   9389.79 4817414.39
07:39:28         eth0 145268.00 3253171.00   9363.85 4809852.46
07:39:29         eth0 146439.00 3262185.00   9438.97 4823172.48
07:39:30         eth0 146758.00 3264175.00   9459.94 4826124.13
07:39:31         eth0 146843.00 3256903.00   9465.44 4815381.97
Average:         eth0 142827.50 3179259.70   9206.30 4700578.16

This patch allows rx/tx bytes/packets counters being folded at the
time we need stats.

We now can fetch stats every 1 ms if we want to check NIC behavior
on a small time window. It is also easier to detect anomalies.

lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
07:42:50         eth0 142915.00 3177696.00   9212.06 4698270.42
07:42:51         eth0 143741.00 3200232.00   9265.15 4731593.02
07:42:52         eth0 142781.00 3171600.00   9202.92 4689260.16
07:42:53         eth0 143835.00 3192932.00   9271.80 4720761.39
07:42:54         eth0 141922.00 3165174.00   9147.64 4679759.21
07:42:55         eth0 142993.00 3207038.00   9216.78 4741653.05
07:42:56         eth0 141394.06 3154335.64   9113.85 4663731.73
07:42:57         eth0 141850.00 3161202.00   9144.48 4673866.07
07:42:58         eth0 143439.00 3180736.00   9246.05 4702755.35
07:42:59         eth0 143501.00 3210992.00   9249.99 4747501.84
Average:         eth0 142835.66 3182165.93   9206.98 4704874.08

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29 13:36:34 -05:00
Haishuang Yan 31ac1c1945 geneve: fix ip_hdr_len reserved for geneve6 tunnel.
It shold reserved sizeof(ipv6hdr) for geneve in ipv6 tunnel.

Fixes: c3ef5aa5e5 ('geneve: Merge ipv4 and ipv6 geneve_build_skb()')
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 16:14:49 -05:00
David S. Miller 742f74b504 Merge branch 'phy-doc-improvements'
Florian Fainelli says:

====================
Documentation: net: phy: Improve documentation

This patch series addresses discussions and feedback that was recently received
on the mailing-list in the area of: flow control/pause frames, interpretation of
phy_interface_t and finally add some links to useful standards documents.

Changes in v3:

- add Timur's feedback into patch 3

Changes in v2:

- clarify a few things in the RGMII section, add a paragraph about common issues
  with RGMII delay mismatches
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 16:07:43 -05:00
Florian Fainelli cc9111162b Documentation: net: phy: Add links to several standards documents
Add links to the IEEE 802.3-2008 document, and the RGMII v1.3 and v2.0
revisions of the standard.

Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 16:07:43 -05:00
Florian Fainelli bf8f6952a2 Documentation: net: phy: Add blurb about RGMII
RGMII is a recurring source of pain for people with Gigabit Ethernet
hardware since it may require PHY driver and MAC driver level
configuration hints. Document what are the expectations from PHYLIB and
what options exist.

Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 16:07:42 -05:00
Florian Fainelli 2fa3e25b45 Documentation: net: phy: Add a paragraph about pause frames/flow control
Describe that the Ethernet MAC controller is ultimately responsible for
dealing with proper pause frames/flow control advertisement and
enabling, and that it is therefore allowed to have it change
phydev->supported/advertising with SUPPORTED_Pause and
SUPPORTED_AsymPause.

Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 16:07:42 -05:00
Florian Fainelli 527fd70e26 Documentation: net: phy: remove description of function pointers
Remove the function pointers documentation which duplicates information
found in include/linux/phy.h. Maintaining documentation about two
different locations just does not work, but the code is less likely to
be outdated.

Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 16:07:42 -05:00
Andreas Färber 5edef2f288 net: dsa: mv88e6xxx: Fix mv88e6xxx_g1_irq_free() interrupt count
mv88e6xxx_g1_irq_setup() sets up chip->g1_irq.nirqs interrupt mappings,
so free the same amount. This will be 8 or 9 in practice, less than 16.

Fixes: dc30c35be7 ("net: dsa: mv88e6xxx: Implement interrupt support.")
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 15:59:40 -05:00
David S. Miller 33362fc2da Merge branch 'mlx5-DCBX-and-ethtool-updates'
Saeed Mahameed says:

====================
Mellanox 100G mlx5 DCBX and ethtool updates

This series provides the following mlx5 updates:

From Huy:
DCBX CEE API and DCBX firmware/host modes support.
    - 1st patch ensures the dcbnl_rtnl_ops is published only when the qos
      capability bits is on.
    - 2nd patch adds the support for CEE interfaces into mlx5 dcbnl_rtnl_ops
    - 3rd patch refactors ETS query to read ETS configuration directly from
      firmware rather than having a software shadow to it. The existing IEEE
      interfaces stays the same.
    - 4th patch adds the support for MLX5_REG_DCBX_PARAM and MLX5_REG_DCBX_APP
      firmware commands to manipulate mlx5 DCBX mode.
    - 5th patch adds the driver support for the new DCBX firmware.  This ensures
      the backward compatibility versus the old and new firmware. With the new DCBX
      firmware, qos settings can be controlled by either firmware or software
      depending on the DCBX mode.

From Kamal and Saeed:
    - mlx5 self-test support.

From Shaker:
    - Private flag to give the user the ability to enable/disable mlx5 CQE
      compression.

V1->V2:
    - Check ETS capability where needed in:
	("net/mlx5e: Read ETS settings directly from firmware")
    - Fix return value of mlx5e_dcbnl_switch_to_host_mode in:
	("net/mlx5e: ConnectX-4 firmware support for DCBX")
    - Update commit message of:
	("net/mlx5e: ConnectX-4 firmware support for DCBX")
    - Fix two sparse static check warnings in en_selftest.c

This series was generated against commit:
e5f12b3f5e ("Merge branch 'mlxsw-trap-groups-and-policers'")
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 15:09:37 -05:00
Shaker Daibes 9bcc86064b net/mlx5e: Add CQE compression user control
The user can now override the automatic driver decision using the
rx_cqe_compress flag, which is the preference for CQE compression.
The flag is initialized with the automatic driver decision.

Signed-off-by: Shaker Daibes <shakerd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 15:09:36 -05:00
Shaker Daibes 59ece1c969 net/mlx5e: Moves pflags to priv->params
pflags is a configuration parameter for the netdev, naturally it belongs
to priv->params.
Also introduce MLX5E_GET_PFLAG

Signed-off-by: Shaker Daibes <shakerd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 15:09:36 -05:00
Saeed Mahameed 0952da791c net/mlx5e: Add support for loopback selftest
Extend the self diagnostic tests to support loopback test.

The loopback test doesn't require the offline flag, it will use the
generic dev_queue_xmit and a dedicated packet_type to capture and verify
mlx5e selftest loopback packets.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 15:09:36 -05:00
Kamal Heib d605d6686d net/mlx5e: Add support for ethtool self diagnostics test
The self diagnostics test implementaion include the following features:
1. Link Test: Check that link is in up state.
2. Speed Test: Check that link was negotiated correctly.
3. Health Test: Check the device health.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 15:09:35 -05:00
Huy Nguyen 0eca995f3e net/mlx5e: Add DCBX control interface
Use setdcbx interface to set the DCBX mode to firmware or os.
If setdcbx is called with mode value of zero, the DCBX mode
is set to firmware.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 15:09:35 -05:00
Huy Nguyen e207b7e991 net/mlx5e: ConnectX-4 firmware support for DCBX
DBCX by default is controlled by firmware where dcbx capability bit
is set. In this mode, firmware is responsible for reading/sending the
TLV packets from/to the remote partner.

This patch sets up the infrastructure to move between HOST/FW DCBX
control mode.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 15:09:35 -05:00
Huy Nguyen 341c5ee2fb net/mlx5: Add DCBX firmware commands support
Add set/query commands for DCBX_PARAM register

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 15:09:35 -05:00
Huy Nguyen 820c2c5e77 net/mlx5e: Read ETS settings directly from firmware
Issue description:
Current implementation saves the ETS settings from user in
a temporal soft copy and returns this settings when user
queries the ETS settings.

With the new DCBX firmware, the ETS settings can be changed
by firmware when the DCBX is in firmware controlled mode. Therefore,
user will obtain wrong values from the temporal soft copy.

Solution:
1. Read the ETS settings directly from firmware.
2. For tc_tsa:
   a. Initialize tc_tsa to vendor IEEE_8021QAZ_TSA_VENDOR at netdev
      creation.
   b. When reading ETS setting from FW, if the traffic class bandwidth
      is less than 100, set tc_tsa to IEEE_8021QAZ_TSA_ETS. This
      implementation solves the scenarios when the DCBX is in FW control
      and willing bit is on which means the ETS setting is dictated
      by remote switch.

Also check ETS capability where needed.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 15:09:34 -05:00
Huy Nguyen 3a6a931dfb net/mlx5e: Support DCBX CEE API
Add DCBX CEE API interface for ConnectX-4. Configurations are stored in
a temporary structure and are applied to the card's firmware when
the CEE's setall callback function is called.

Note:
  priority group in CEE is equivalent to traffic class in ConnectX-4
  hardware spec.

  bw allocation per priority in CEE is not supported because ConnectX-4
  only supports bw allocation per traffic class.

  user priority in CEE does not have an equivalent term in ConnectX-4.
  Therefore, user priority to priority mapping in CEE is not supported.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 15:09:34 -05:00
Huy Nguyen 80653f73c5 net/mlx5e: Add qos capability check
Make sure firmware supports qos before exposing the DCB API.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 15:09:34 -05:00
Jason Wang 4490001029 virtio-net: enable multiqueue by default
We use single queue even if multiqueue is enabled and let admin to
enable it through ethtool later. This is used to avoid possible
regression (small packet TCP stream transmission). But looks like an
overkill since:

- single queue user can disable multiqueue when launching qemu
- brings extra troubles for the management since it needs extra admin
  tool in guest to enable multiqueue
- multiqueue performs much better than single queue in most of the
  cases

So this patch enables multiqueue by default: if #queues is less than or
equal to #vcpu, enable as much as queue pairs; if #queues is greater
than #vcpu, enable #vcpu queue pairs.

Cc: Hannes Frederic Sowa <hannes@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>
Cc: Jeremy Eder <jeder@redhat.com>
Cc: Marko Myllynen <myllynen@redhat.com>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 13:17:40 -05:00
David S. Miller 06b3ccfbb7 Merge branch 'MV88E6097-fixes'
Stefan Eichenberger says:

====================
Fix support for the MV88E6097

This patchset fixes the following two issues for the MV88E6097:
- Add missing definition of g1_irqs
- Add missing comment
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 11:58:57 -05:00
Stefan Eichenberger 15da3cc890 net: dsa: mv88e6xxx: add missing comment for MV88E6097
Add a missing comment for the MV88E6097 because of unification.

Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 11:58:57 -05:00
Stefan Eichenberger c534178bdd net: dsa: mv88e6xxx: add g1_irqs definition for MV88E6097
Add the missing definition of g1_irqs for MV88E6097.

Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 11:58:57 -05:00
David S. Miller 53c4ce0214 Merge branch 'bpf-misc-next'
Daniel Borkmann says:

====================
BPF cleanups and misc updates

This patch set adds couple of cleanups in first few patches,
exposes owner_prog_type for array maps as well as mlocked mem
for maps in fdinfo, allows for mount permissions in fs and
fixes various outstanding issues in selftests and samples.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-27 20:38:49 -05:00
Daniel Borkmann e00c7b216f bpf: fix multiple issues in selftest suite and samples
1) The test_lru_map and test_lru_dist fails building on my machine since
   the sys/resource.h header is not included.

2) test_verifier fails in one test case where we try to call an invalid
   function, since the verifier log output changed wrt printing function
   names.

3) Current selftest suite code relies on sysconf(_SC_NPROCESSORS_CONF) for
   retrieving the number of possible CPUs. This is broken at least in our
   scenario and really just doesn't work.

   glibc tries a number of things for retrieving _SC_NPROCESSORS_CONF.
   First it tries equivalent of /sys/devices/system/cpu/cpu[0-9]* | wc -l,
   if that fails, depending on the config, it either tries to count CPUs
   in /proc/cpuinfo, or returns the _SC_NPROCESSORS_ONLN value instead.
   If /proc/cpuinfo has some issue, it returns just 1 worst case. This
   oddity is nothing new [1], but semantics/behaviour seems to be settled.
   _SC_NPROCESSORS_ONLN will parse /sys/devices/system/cpu/online, if
   that fails it looks into /proc/stat for cpuX entries, and if also that
   fails for some reason, /proc/cpuinfo is consulted (and returning 1 if
   unlikely all breaks down).

   While that might match num_possible_cpus() from the kernel in some
   cases, it's really not guaranteed with CPU hotplugging, and can result
   in a buffer overflow since the array in user space could have too few
   number of slots, and on perpcu map lookup, the kernel will write beyond
   that memory of the value buffer.

   William Tu reported such mismatches:

     [...] The fact that sysconf(_SC_NPROCESSORS_CONF) != num_possible_cpu()
     happens when CPU hotadd is enabled. For example, in Fusion when
     setting vcpu.hotadd = "TRUE" or in KVM, setting ./qemu-system-x86_64
     -smp 2, maxcpus=4 ... the num_possible_cpu() will be 4 and sysconf()
     will be 2 [2]. [...]

   Documentation/cputopology.txt says /sys/devices/system/cpu/possible
   outputs cpu_possible_mask. That is the same as in num_possible_cpus(),
   so first step would be to fix the _SC_NPROCESSORS_CONF calls with our
   own implementation. Later, we could add support to bpf(2) for passing
   a mask via CPU_SET(3), for example, to just select a subset of CPUs.

   BPF samples code needs this fix as well (at least so that people stop
   copying this). Thus, define bpf_num_possible_cpus() once in selftests
   and import it from there for the sample code to avoid duplicating it.
   The remaining sysconf(_SC_NPROCESSORS_CONF) in samples are unrelated.

After all three issues are fixed, the test suite runs fine again:

  # make run_tests | grep self
  selftests: test_verifier [PASS]
  selftests: test_maps [PASS]
  selftests: test_lru_map [PASS]
  selftests: test_kmod.sh [PASS]

  [1] https://www.sourceware.org/ml/libc-alpha/2011-06/msg00079.html
  [2] https://www.mail-archive.com/netdev@vger.kernel.org/msg121183.html

Fixes: 3059303f59 ("samples/bpf: update tracex[23] examples to use per-cpu maps")
Fixes: 86af8b4191 ("Add sample for adding simple drop program to link")
Fixes: df570f5772 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY")
Fixes: e155967179 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_HASH")
Fixes: ebb676daa1 ("bpf: Print function name in addition to function id")
Fixes: 5db58faf98 ("bpf: Add tests for the LRU bpf_htab")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: William Tu <u9012063@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-27 20:38:47 -05:00
Daniel Borkmann a3af5f8001 bpf: allow for mount options to specify permissions
Since we recently converted the BPF filesystem over to use mount_nodev(),
we now have the possibility to also hold mount options in sb's s_fs_info.
This work implements mount options support for specifying permissions on
the sb's inode, which will be used by tc when it manually needs to mount
the fs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-27 20:38:47 -05:00
Daniel Borkmann 21116b7068 bpf: add owner_prog_type and accounted mem to array map's fdinfo
Allow for checking the owner_prog_type of a program array map. In some
cases bpf(2) can return -EINVAL /after/ the verifier passed and did all
the rewrites of the bpf program.

The reason that lets us fail at this late stage is that program array
maps are incompatible. Allow users to inspect this earlier after they
got the map fd through BPF_OBJ_GET command. tc will get support for this.

Also, display how much we charged the map with regards to RLIMIT_MEMLOCK.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-27 20:38:47 -05:00
Daniel Borkmann c491680f8f bpf: reuse dev_is_mac_header_xmit for redirect
Commit dcf800344a ("net/sched: act_mirred: Refactor detection whether
dev needs xmit at mac header") added dev_is_mac_header_xmit(); since it's
also useful elsewhere, move it to if_arp.h and reuse it for BPF.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-27 20:38:47 -05:00
Daniel Borkmann 55556dd59d bpf: drop useless bpf_fd member from cls/act
After setup we don't need to keep user space fd number around anymore, as
it also has no useful meaning for anyone, just remove it.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-27 20:38:47 -05:00
Daniel Borkmann 88575199cc bpf: drop unnecessary context cast from BPF_PROG_RUN
Since long already bpf_func is not only about struct sk_buff * as
input anymore. Make it generic as void *, so that callers don't
need to cast for it each time they call BPF_PROG_RUN().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-27 20:38:47 -05:00
Dan Carpenter e373909927 sfc: remove unneeded variable
We don't use ->heap_buf after commit 46d1efd852 ("sfc: remove Software
TSO") so let's remove the last traces.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-27 20:30:13 -05:00
David S. Miller 33f8a0458b wireless-drivers-next patches for 4.10
Major changes:
 
 iwlwifi
 
 * finalize and enable dynamic queue allocation
 * use dev_coredumpmsg() to prevent locking the driver
 * small fix to pass the AID to the FW
 * use FW PS decisions with multi-queue
 
 ath9k
 
 * add device tree bindings
 * switch to use mac80211 intermediate software queues to reduce
   latency and fix bufferbloat
 
 wl18xx
 
 * allow scanning in AP mode
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQEcBAABAgAGBQJYOAC4AAoJEG4XJFUm622bUYkH/3SSYp6moSdKpVnVPx7ST7yK
 t9WHR9IMZFIhD6vq8AK6+8OQr1TgGjHfPu+WZj7CIl8nu53kcgPRi51gg1mndbNg
 9N3RbVp06nGbM2VnW8ZIpg3OLIXatZ4c9g3LFvvtyobYvWGJ6W4D79JdlmTG1ELr
 XAjInbxFsgon+CwqCMOaAJx8xYp42rBnPRZZvhOq9O33kRw8Umo9UQw0s1U2Vfgx
 prxQ6d0GxNAPEe8QiDw/vtBcXWFMOhQeDl8sK70ZcojSn1FY730NsIh/Y86PcQTK
 6TsvOL5gg+rd0ln8TZRAslnDrZBAhTEDqUzLQMRJ9VjEj5RFd8eLCSIzHfaroI8=
 =4qCH
 -----END PGP SIGNATURE-----

Merge tag 'wireless-drivers-next-for-davem-2016-11-25' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.10

Major changes:

iwlwifi

* finalize and enable dynamic queue allocation
* use dev_coredumpmsg() to prevent locking the driver
* small fix to pass the AID to the FW
* use FW PS decisions with multi-queue

ath9k

* add device tree bindings
* switch to use mac80211 intermediate software queues to reduce
  latency and fix bufferbloat

wl18xx

* allow scanning in AP mode
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-27 20:26:59 -05:00
Michael S. Tsirkin 5a717f4f8f netdevice: fix sparse warning for HARD_TX_LOCK
sparse warns about context imbalance in any code
that uses HARD_TX_LOCK/UNLOCK - this is because it's
unable to determine that flags don't change so
lock and unlock are paired.

Seems easy enough to fix by adding __acquire/__release
calls.

With this patch af_packet.c is now sparse-clean,

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-27 15:28:35 -05:00
Ulrik De Bie 428951161b ptp: gianfar: Use high resolution frequency method.
This patch depends on commit d8d2635419 ("ptp: Introduce a high
resolution frequency adjustment method.")

The gianfar devices offer a frequency resolution of about 0.46 ppb
(depends on actual value of tmr_add, for the calculation assumed
0x80000000). This patch lets users of the device benefit from the increased
frequency resolution when tuning the clock. Thanks to the rounding the
maximum error between the requested frequency and the applied frequency
will then be about 0.23 ppb.

Tested on a v3.3.8 kernel on a real gianfar device. Verified compilation
on net-next (currently at v4.9-rc5).

Signed-off-by: Ulrik De Bie <ulrik.debie-os@e2big.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-27 15:26:15 -05:00
Eric Dumazet b9972d2205 mlx4: do not use priv->stats_lock in mlx4_en_auto_moderation()
Per RX ring packets/bytes counters are not protected by global
priv->stats_lock.

Better not confuse the reader, and use READ_ONCE() to show we read
these counters without surrounding synchronization.

Interrupt moderation is best effort, and we do not really care of
ultra precise counters.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-27 15:26:15 -05:00
David S. Miller 0b42f25d2f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
udplite conflict is resolved by taking what 'net-next' did
which removed the backlog receive method assignment, since
it is no longer necessary.

Two entries were added to the non-priv ethtool operations
switch statement, one in 'net' and one in 'net-next, so
simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-26 23:42:21 -05:00
Linus Torvalds d8e435f3ab Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs splice fix from Al Viro.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix default_file_splice_read()
2016-11-26 17:21:13 -08:00
Al Viro 8e54cadab4 fix default_file_splice_read()
Botched calculation of number of pages.  As the result,
we were dropping pieces when doing splice to pipe from
e.g. 9p.

Reported-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-11-26 20:05:42 -05:00