Commit graph

710384 commits

Author SHA1 Message Date
Andrew Lunn 743fcc283e net: dsa: mv88e6xxx: Print offending port when vlan check fails
When testing if a VLAN is one more than one bridge, we print an error
message that the VLAN is already in use somewhere else. Print both the
new port which would like the VLAN, and the port which already has it,
to aid debugging.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 19:33:11 +09:00
Andrew Lunn cd88646994 net: dsa: mv88e6xxx: Fixed port netdev check for VLANs
Having the same VLAN on multiple bridges is currently unsupported as
an offload. mv88e6xxx_port_check_hw_vlan() is used to ensure that a
VLAN is not on multiple bridges when adding a VLAN range to a port. It
loops the ports and checks to see if there are ports in a different
bridge with the same VLAN.

While walking all switch ports, the code was checking if the new port
has a netdev slave attached to it. If not, skip checking the port
being walked. This seems like a typ0. If the new port does not have a
slave, how has a VLAN been added to it in the first place, requiring
this check be performed at all? More likely, we should be checking if
the port being walked has a slave. Without the port having a slave, it
cannot have a VLAN on it, so there is no need to check further for
that particular port.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 19:33:11 +09:00
Andrew Lunn 13edbdb6ed net: dsa: {e}dsa: set offload_fwd_mark on received packets
The software bridge needs to know if a packet has already been bridged
by hardware offload to ports in the same hardware offload, in order
that it does not re-flood them, causing duplicates. This is
particularly true for broadcast and multicast traffic which the host
has requested.

By setting offload_fwd_mark in the skb the bridge will only flood to
ports in other offloads and other netifs. Set this flag in the DSA and
EDSA tag driver.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 19:33:11 +09:00
Andrew Lunn a42c8e33f2 net: dsa: Fix SWITCHDEV_ATTR_ID_PORT_PARENT_ID
SWITCHDEV_ATTR_ID_PORT_PARENT_ID is used by the software bridge when
determining which ports to flood a packet out. If the packet
originated from a switch, it assumes the switch has already flooded
the packet out the switches ports, so the bridge should not flood the
packet itself out switch ports. Ports on the same switch are expected
to return the same parent ID when SWITCHDEV_ATTR_ID_PORT_PARENT_ID is
called.

DSA gets this wrong with clusters of switches. As far as the software
bridge is concerned, the cluster is all one switch. A packet from any
switch in the cluster can be assumed to have been flooded as needed
out of all ports of the cluster, not just the switch it originated
from. Hence all ports of a cluster should return the same parent. The
old implementation did not, each switch in the cluster had its own ID.

Also wrong was that the ID was not unique if multiple DSA instances
are in operation.

Use the tree ID as the parent ID, which is the same for all switches
in a cluster and unique across switch clusters.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 19:33:10 +09:00
David S. Miller 8c5f9a8c13 Merge branch 'ieee802154-for-davem-2017-11-09' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next
Stefan Schmidt says:

====================
pull-request: net-next: ieee802154 2017-11-09

A small update on ieee802154 patches for net-next. Nothing dramatic, but simply
housekeeping this time around.
A fix for the correct mask to be applied in the mrf24j40 driver by Gustavo A. R. Silva
Removal of a non existing email user for the ca8210 driver by Harry Morris
A bunch of checkpatch cleanups across the subsystem from myself
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 19:30:16 +09:00
Niklas Cassel 3ec26c7944 bindings: net: stmmac: correctify note about LPI interrupt
There are two different combined signal for various interrupt events:
In EQOS-CORE and EQOS-MTL configurations, mci_intr_o is the interrupt
signal.
In EQOS-DMA, EQOS-AHB and EQOS-AXI configurations, these interrupt events
are combined with the events in the DMA on the sbd_intr_o signal.

Depending on configuration, the device tree irq "macirq" will refer to
either mci_intr_o or sbd_intr_o.

The databook states:
"The MAC generates the LPI interrupt when the Tx or Rx side enters or exits
the LPI state. The interrupt mci_intr_o (sbd_intr_o in certain
configurations) is asserted when the LPI interrupt status is set.

When the MAC exits the Rx LPI state, then in addition to the mci_intr_o
(sbd_intr_o in certain configurations), the sideband signal lpi_intr_o is
asserted.

If you do not want to gate-off the application clock during the Rx LPI
state, you can leave the lpi_intr_o signal unconnected and use the
mci_intr_o (sbd_intr_o in certain configurations) signal to detect Rx LPI
exit."

Since the "macirq" is always raised when Tx or Rx enters/exits the LPI
state, "eth_lpi" must therefore refer to lpi_intr_o, which is only raised
when Rx exits the LPI state. Update the DT binding description to reflect
reality.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Acked-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 19:29:08 +09:00
Keefe Liu ca29fd7cce ipvlan: fix ipv6 outbound device
When process the outbound packet of ipv6, we should assign the master
device to output device other than input device.

Signed-off-by: Keefe Liu <liuqifa@huawei.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 19:27:05 +09:00
Aleksey Makarov 3d67a50752 net: thunderx: fix double free error
This patch fixes an error in memory allocation/freeing in
ThunderX PF driver.

I moved the allocation to the probe() function and made it managed.

>From the Colin's email:

While running static analysis on linux-next with CoverityScan I found 3
double free errors in the Cavium thunder driver.

The issue occurs on the err_disable_device: label of function nic_probe
when nic_free_lmacmem(nic) is called and a double free occurs on
nic->duplex, nic->link and nic->speed.  This occurs when nic_init_hw()
fails:

        /* Initialize hardware */
        err = nic_init_hw(nic);
        if (err)
                goto err_release_regions;

nic_init_hw() calls nic_get_hw_info() and this calls nic_free_lmacmem()
if any of the allocations fail. This free'ing occurs again by the call
to nic_free_lmacmem() on the err_release_regions exit path in nic_probe().

Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 19:22:34 +09:00
Mika Westerberg 86dabda426 net: thunderbolt: Clear finished Tx frame bus address in tbnet_tx_callback()
When Thunderbolt network interface is disabled or when the cable is
unplugged the driver releases all allocated buffers by calling
tbnet_free_buffers() for each ring. This function then calls
dma_unmap_page() for each buffer it finds where bus address is non-zero.
Now, we only clear this bus address when the Tx buffer is sent to the
hardware so it is possible that the function finds an entry that has
already been unmapped.

Enabling DMA-API debugging catches this as well:

  thunderbolt 0000:06:00.0: DMA-API: device driver tries to free DMA
    memory it has not allocated [device address=0x0000000068321000] [size=4096 bytes]

Fix this by clearing the bus address of a Tx frame right after we have
unmapped the buffer.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 19:20:45 +09:00
Tonghao Zhang 5290ada4a2 sock: Remove the global prot_inuse counter.
The per-cpu counter for init_net is prepared in core_initcall.
The patch 7d720c3e ("percpu: add __percpu sparse annotations to net")
and d6d9ca0fe ("net: this_cpu_xxx conversions") optimize the
routines. Then remove the old counter.

Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 19:17:33 +09:00
Colin Ian King 492d070f24 net: sfc: remove redundant variable start
Variable start is assigned but never read hence it is redundant
and can be removed. Cleans up clang warning:

drivers/net/ethernet/sfc/ptp.c:655:2: warning: Value stored to 'start'
is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 19:14:14 +09:00
Colin Ian King 98b07e3ed0 qlge: remove duplicated assignment to mbcp
The assignment to mbcp is identical to the initiatialized value assigned
to mbcp at declaration time a few lines earlier, hence we can remove the
second redundant assignment.  Cleans up clang warning:

drivers/net/ethernet/qlogic/qlge/qlge_mpi.c:209:22: warning:
Value stored to 'mbcp' during its initialization is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 19:13:39 +09:00
Gustavo A. R. Silva a3e2ecbae0 net: wan: x25_asy: mark expected switch fall-through
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 114928
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 19:10:06 +09:00
Gustavo A. R. Silva 0aa3b413f6 net: 3com: 3c574_cs: mark expected switch fall-through
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 114888
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 19:10:06 +09:00
Gustavo A. R. Silva 75d28f461e net: 8390: pcnet_cs: mark expected switch fall-through
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 114891
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 19:10:06 +09:00
Gustavo A. R. Silva 2d91914968 net: decnet: dn_table: mark expected switch fall-through
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 115106
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 19:10:06 +09:00
Guillaume Nault 765924e362 l2tp: don't close sessions in l2tp_tunnel_destruct()
Sessions are already removed by the proto ->destroy() handlers, and
since commit f3c66d4e14 ("l2tp: prevent creation of sessions on terminated tunnels"),
we're guaranteed that no new session can be created afterwards.

Furthermore, l2tp_tunnel_closeall() can sleep when there are sessions
left to close. So we really shouldn't call it in a ->sk_destruct()
handler, as it can be used from atomic context.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 19:05:17 +09:00
David S. Miller f31f54db94 Merge branch 'FACK-loss-recovery-remove'
Yuchung Cheng says:

====================
remove FACK loss recovery

This patch set removes the forward-acknowledgment (FACK)
packet-based loss and reordering detection. This simplifies TCP
loss recovery since the SACK scoreboard no longer needs to track
the number of pending packets under highest SACKed sequence. FACK
is subsumed by the time-based RACK loss detection which is more
robust under reordering and second order losses.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 18:53:17 +09:00
Yuchung Cheng 737ff31456 tcp: use sequence distance to detect reordering
Replace the reordering distance measurement in packet unit with
sequence based approach. Previously it trackes the number of "packets"
toward the forward ACK (i.e.  highest sacked sequence)in a state
variable "fackets_out".

Precisely measuring reordering degree on packet distance has not much
benefit, as the degree constantly changes by factors like path, load,
and congestion window. It is also complicated and prone to arcane bugs.
This patch replaces with sequence-based approach that's much simpler.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 18:53:16 +09:00
Yuchung Cheng 713bafea92 tcp: retire FACK loss detection
FACK loss detection has been disabled by default and the
successor RACK subsumed FACK and can handle reordering better.
This patch removes FACK to simplify TCP loss recovery.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 18:53:16 +09:00
Gustavo A. R. Silva e4ec138413 fsl/fman_port: mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 1397960
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 18:50:33 +09:00
Gustavo A. R. Silva d9b9c0e027 net: ethernet: bgmac: mark expected switch fall-through
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 1397972
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 18:49:26 +09:00
Nathan Fontenot 37798d0211 ibmvnic: Add vnic client data to login buffer
Update the login buffer to include client data for the vnic driver,
this includes the OS name, LPAR name, and device name. This update
allows this information to be available in the VIOS.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 18:46:11 +09:00
David S. Miller f3edacbd69 bpf: Revert bpf_overrid_function() helper changes.
NACK'd by x86 maintainer.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 18:24:55 +09:00
David S. Miller bee955cd3a Merge branch 'bpf-Fix-bugs-in-sock_ops-samples'
Lawrence Brakmo says:

====================
bpf: Fix bugs in sock_ops samples

The programs were returning -1 in some cases when they should
only return 0 or 1. Changes in the verifier now catch this
issue and the programs fail to load. This is now fixed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:52:41 +09:00
Lawrence Brakmo 03e982eed4 bpf: Fix tcp_clamp_kern.c sample program
The program was returning -1 in some cases which is not allowed
by the verifier any longer.

Fixes: 390ee7e29f ("bpf: enforce return code for cgroup-bpf programs")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:52:41 +09:00
Lawrence Brakmo e1853319fc bpf: Fix tcp_iw_kern.c sample program
The program was returning -1 in some cases which is not allowed
by the verifier any longer.

Fixes: 390ee7e29f ("bpf: enforce return code for cgroup-bpf programs")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:52:41 +09:00
Lawrence Brakmo 2ff969fbe2 bpf: Fix tcp_cong_kern.c sample program
The program was returning -1 in some cases which is not allowed
by the verifier any longer.

Fixes: 390ee7e29f ("bpf: enforce return code for cgroup-bpf programs")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:52:41 +09:00
Lawrence Brakmo a4174f0560 bpf: Fix tcp_bufs_kern.c sample program
The program was returning -1 in some cases which is not allowed
by the verifier any longer.

Fixes: 390ee7e29f ("bpf: enforce return code for cgroup-bpf programs")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:52:41 +09:00
Lawrence Brakmo 016e661bb0 bpf: Fix tcp_rwnd_kern.c sample program
The program was returning -1 in some cases which is not allowed
by the verifier any longer.

Fixes: 390ee7e29f ("bpf: enforce return code for cgroup-bpf programs")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:52:41 +09:00
Lawrence Brakmo 7863f46bac bpf: Fix tcp_synrto_kern.c sample program
The program was returning -1 in some cases which is not allowed
by the verifier any longer.

Fixes: 390ee7e29f ("bpf: enforce return code for cgroup-bpf programs")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:52:41 +09:00
Jon Maloy 8d6e79d3ce tipc: improve link resiliency when rps is activated
Currently, the TIPC RPS dissector is based only on the incoming packets'
source node address, hence steering all traffic from a node to the same
core. We have seen that this makes the links vulnerable to starvation
and unnecessary resets when we turn down the link tolerance to very low
values.

To reduce the risk of this happening, we exempt probe and probe replies
packets from the convergence to one core per source node. Instead, we do
the opposite, - we try to diverge those packets across as many cores as
possible, by randomizing the flow selector key.

To make such packets identifiable to the dissector, we add a new
'is_keepalive' bit to word 0 of the LINK_PROTOCOL header. This bit is
set both for PROBE and PROBE_REPLY messages, and only for those.

It should be noted that these packets are not part of any flow anyway,
and only constitute a minuscule fraction of all packets sent across a
link. Hence, there is no risk that this will affect overall performance.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:36:05 +09:00
David S. Miller 141f575f76 Merge branch 'macb-next'
Michael Grzeschik says:

====================
net: macb: add error handling on probe and

This series adds more error handling to the macb driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:27:45 +09:00
Michael Grzeschik 66ee6a06e6 net: macb: add of_node_put to error paths
We add the call of_node_put(bp->phy_node) to all associated error
paths for memory clean up.

Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:27:44 +09:00
Michael Grzeschik 9ce981401c net: macb: add of_phy_deregister_fixed_link to error paths
We add the call of_phy_deregister_fixed_link to all associated
error paths for memory clean up.

Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:27:44 +09:00
Miquel Raynal e5c500eb29 net: mvpp2: fix GOP statistics loop start and stop conditions
GOP statistics from all ports of one instance of the driver are gathered
with one work recalled in loop in a workqueue. The loop is started when
a port is up, and stopped when a port is down. This last condition is
obviously wrong.

Fix this by having a work per port. This way, starting and stoping it
when the port is up or down will be fine, while minimizing unnecessary
CPU usage.

Fixes: 118d6298f6 ("net: mvpp2: add ethtool GOP statistics")
Reported-by: Stefan Chulski <stefanc@marvell.com>
Signed-off-by: Miquel Raynal <miquel.raynal@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:21:30 +09:00
David S. Miller fc98135928 Merge branch 'hns3-bug-fixes'
Lipeng says:

====================
net: hns3: Bug fixes & Code improvements in HNS3 driver

This patch-set introduces some bug fixes and code improvements.
As [patch 1/2] depends on the patch {5392902 net: hns3: Consistently using
GENMASK in hns3 driver}, which exists in net-next, not exists in net, so
push this serise to nex-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:17:56 +09:00
Fuyun Liang c040366bc4 net: hns3: cleanup mac auto-negotiation state query in hclge_update_speed_duplex
When checking whether auto-negotiation is on, driver only needs to
check the value of mac.autoneg(SW) directly, and does not need to
query it from hardware. Because this value is always synchronized
with the auto-negotiation state of hardware.

This patch removes mac auto-negotiation state query in
hclge_update_speed_duplex().

Fixes: 46a3df9f97 (net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support)
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:17:56 +09:00
Fuyun Liang 39e2151f10 net: hns3: fix a bug when getting phy address from NCL_config file
Driver gets phy address from NCL_config file and uses the phy address
to initialize phydev. There are 5 bits for phy address. And C22 phy
address has 5 bits. So 0-31 are all valid address for phy. If there
is no phy, it will crash. Because driver always get a valid phy address.

This patch fixes the phy address to 8 bits, and use 0xff to indicate
invalid phy address.

Fixes: 46a3df9f97 (net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support)
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:17:56 +09:00
David Ahern 28033ae4e0 net: netlink: Update attr validation to require exact length for some types
Attributes using NLA_U* and NLA_S* (where * is 8, 16,32 and 64) are
expected to be an exact length. Split these data types from
nla_attr_minlen into nla_attr_len and update validate_nla to require
the attribute to have exact length for them.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:14:38 +09:00
Maciej Żenczykowski 2210d6b2f2 net: ipv6: sysctl to specify IPv6 ND traffic class
Add a per-device sysctl to specify the default traffic class to use for
kernel originated IPv6 Neighbour Discovery packets.

Currently this includes:

  - Router Solicitation (ICMPv6 type 133)
    ndisc_send_rs() -> ndisc_send_skb() -> ip6_nd_hdr()

  - Neighbour Solicitation (ICMPv6 type 135)
    ndisc_send_ns() -> ndisc_send_skb() -> ip6_nd_hdr()

  - Neighbour Advertisement (ICMPv6 type 136)
    ndisc_send_na() -> ndisc_send_skb() -> ip6_nd_hdr()

  - Redirect (ICMPv6 type 137)
    ndisc_send_redirect() -> ndisc_send_skb() -> ip6_nd_hdr()

and if the kernel ever gets around to generating RA's,
it would presumably also include:

  - Router Advertisement (ICMPv6 type 134)
    (radvd daemon could pick up on the kernel setting and use it)

Interface drivers may examine the Traffic Class value and translate
the DiffServ Code Point into a link-layer appropriate traffic
prioritization scheme.  An example of mapping IETF DSCP values to
IEEE 802.11 User Priority values can be found here:

    https://tools.ietf.org/html/draft-ietf-tsvwg-ieee-802-11

The expected primary use case is to properly prioritize ND over wifi.

Testing:
  jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  0
  jzem22:~# echo -1 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  -bash: echo: write error: Invalid argument
  jzem22:~# echo 256 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  -bash: echo: write error: Invalid argument
  jzem22:~# echo 0 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  jzem22:~# echo 255 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  255
  jzem22:~# echo 34 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  34

  jzem22:~# echo $[0xDC] > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  jzem22:~# tcpdump -v -i eth0 icmp6 and src host jzem22.pgc and dst host fe80::1
  tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
  IP6 (class 0xdc, hlim 255, next-header ICMPv6 (58) payload length: 24)
  jzem22.pgc > fe80::1: [icmp6 sum ok] ICMP6, neighbor advertisement,
  length 24, tgt is jzem22.pgc, Flags [solicited]

(based on original change written by Erik Kline, with minor changes)

v2: fix 'suspicious rcu_dereference_check() usage'
    by explicitly grabbing the rcu_read_lock.

Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Erik Kline <ek@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:13:02 +09:00
Samuel Mendoza-Jonas 04bad8bda9 net/ncsi: Don't return error on normal response
Several response handlers return EBUSY if the data corresponding to the
command/response pair is already set. There is no reason to return an
error here; the channel is advertising something as enabled because we
told it to enable it, and it's possible that the feature has been
enabled previously.

Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:09:44 +09:00
Samuel Mendoza-Jonas 9ef8690be1 net/ncsi: Improve general state logging
The NCSI driver is mostly silent which becomes a headache when trying to
determine what has occurred on the NCSI connection. This adds additional
logging in a few key areas such as state transitions and calling out
certain errors more visibly.

Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:09:44 +09:00
David S. Miller a8a6f1e4ea Merge branch 'bpftool-show-filenames-of-pinned-objects'
Prashant Bhole says:

====================
tools: bpftool: show filenames of pinned objects

This patchset adds support to show pinned objects in object details.

Patch1 adds a funtionality to open a path in bpf-fs regardless of its object
type.

Patch2 adds actual functionality by scanning the bpf-fs once and adding
object information in hash table, with object id as a key. One object may be
associated with multiple paths because an object can be pinned multiple times

Patch3 adds command line option to enable this functionality. Making it optional
because scanning bpf-fs can be costly.
====================

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2017-11-11 12:35:49 +09:00
Prashant Bhole c541b73466 tools: bpftool: optionally show filenames of pinned objects
Making it optional to show file names of pinned objects because
it scans complete bpf-fs filesystem which is costly.
Added option -f|--bpffs. Documentation updated.

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 12:35:41 +09:00
Prashant Bhole 4990f1f461 tools: bpftool: show filenames of pinned objects
Added support to show filenames of pinned objects.

For example:

root@test# ./bpftool prog
3: tracepoint  name tracepoint__irq  tag f677a7dd722299a3
    loaded_at Oct 26/11:39  uid 0
    xlated 160B  not jited  memlock 4096B  map_ids 4
    pinned /sys/fs/bpf/softirq_prog

4: tracepoint  name tracepoint__irq  tag ea5dc530d00b92b6
    loaded_at Oct 26/11:39  uid 0
    xlated 392B  not jited  memlock 4096B  map_ids 4,6

root@test# ./bpftool --json --pretty prog
[{
        "id": 3,
        "type": "tracepoint",
        "name": "tracepoint__irq",
        "tag": "f677a7dd722299a3",
        "loaded_at": "Oct 26/11:39",
        "uid": 0,
        "bytes_xlated": 160,
        "jited": false,
        "bytes_memlock": 4096,
        "map_ids": [4
        ],
        "pinned": ["/sys/fs/bpf/softirq_prog"
        ]
    },{
        "id": 4,
        "type": "tracepoint",
        "name": "tracepoint__irq",
        "tag": "ea5dc530d00b92b6",
        "loaded_at": "Oct 26/11:39",
        "uid": 0,
        "bytes_xlated": 392,
        "jited": false,
        "bytes_memlock": 4096,
        "map_ids": [4,6
        ],
        "pinned": []
    }
]

root@test# ./bpftool map
4: hash  name start  flags 0x0
    key 4B  value 16B  max_entries 10240  memlock 1003520B
    pinned /sys/fs/bpf/softirq_map1
5: hash  name iptr  flags 0x0
    key 4B  value 8B  max_entries 10240  memlock 921600B

root@test# ./bpftool --json --pretty map
[{
        "id": 4,
        "type": "hash",
        "name": "start",
        "flags": 0,
        "bytes_key": 4,
        "bytes_value": 16,
        "max_entries": 10240,
        "bytes_memlock": 1003520,
        "pinned": ["/sys/fs/bpf/softirq_map1"
        ]
    },{
        "id": 5,
        "type": "hash",
        "name": "iptr",
        "flags": 0,
        "bytes_key": 4,
        "bytes_value": 8,
        "max_entries": 10240,
        "bytes_memlock": 921600,
        "pinned": []
    }
]

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 12:35:41 +09:00
Prashant Bhole 1852719658 tools: bpftool: open pinned object without type check
This was needed for opening any file in bpf-fs without knowing
its object type

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 12:34:42 +09:00
David S. Miller 329fca60a9 Merge branch 'BPF-directed-error-injection'
Josef Bacik says:

====================
Add the ability to do BPF directed error injection

I'm sending this through Dave since it'll conflict with other BPF changes in his
tree, but since it touches tracing as well Dave would like a review from
somebody on the tracing side.

v4->v5:
- disallow kprobe_override programs from being put in the prog map array so we
  don't tail call into something we didn't check.  This allows us to make the
  normal path still fast without a bunch of percpu operations.

v3->v4:
- fix a build error found by kbuild test bot (I didn't wait long enough
  apparently.)
- Added a warning message as per Daniels suggestion.

v2->v3:
- added a ->kprobe_override flag to bpf_prog.
- added some sanity checks to disallow attaching bpf progs that have
  ->kprobe_override set that aren't for ftrace kprobes.
- added the trace_kprobe_ftrace helper to check if the trace_event_call is a
  ftrace kprobe.
- renamed bpf_kprobe_state to bpf_kprobe_override, fixed it so we only read this
  value in the kprobe path, and thus only write to it if we're overriding or
  clearing the override.

v1->v2:
- moved things around to make sure that bpf_override_return could really only be
  used for an ftrace kprobe.
- killed the special return values from trace_call_bpf.
- renamed pc_modified to bpf_kprobe_state so bpf_override_return could tell if
  it was being called from an ftrace kprobe context.
- reworked the logic in kprobe_perf_func to take advantage of bpf_kprobe_state.
- updated the test as per Alexei's review.

- Original message -

A lot of our error paths are not well tested because we have no good way of
injecting errors generically.  Some subystems (block, memory) have ways to
inject errors, but they are random so it's hard to get reproduceable results.

With BPF we can add determinism to our error injection.  We can use kprobes and
other things to verify we are injecting errors at the exact case we are trying
to test.  This patch gives us the tool to actual do the error injection part.
It is very simple, we just set the return value of the pt_regs we're given to
whatever we provide, and then override the PC with a dummy function that simply
returns.

Right now this only works on x86, but it would be simple enough to expand to
other architectures.  Thanks,
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 12:18:06 +09:00
Josef Bacik eafb3401fa samples/bpf: add a test for bpf_override_return
This adds a basic test for bpf_override_return to verify it works.  We
override the main function for mounting a btrfs fs so it'll return
-ENOMEM and then make sure that trying to mount a btrfs fs will fail.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 12:18:06 +09:00
Josef Bacik dd0bb688ea bpf: add a bpf_override_function helper
Error injection is sloppy and very ad-hoc.  BPF could fill this niche
perfectly with it's kprobe functionality.  We could make sure errors are
only triggered in specific call chains that we care about with very
specific situations.  Accomplish this with the bpf_override_funciton
helper.  This will modify the probe'd callers return value to the
specified value and set the PC to an override function that simply
returns, bypassing the originally probed function.  This gives us a nice
clean way to implement systematic error injection for all of our code
paths.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 12:18:05 +09:00