Now sctp holds read_lock when foreach sctp_ep_hashtable without disabling
BH. If CPU schedules to another thread A at this moment, the thread A may
be trying to hold the write_lock with disabling BH.
As BH is disabled and CPU cannot schedule back to the thread holding the
read_lock, while the thread A keeps waiting for the read_lock. A dead
lock would be triggered by this.
This patch is to fix this dead lock by calling read_lock_bh instead to
disable BH when holding the read_lock in sctp_for_each_endpoint.
Fixes: 626d16f50f ("sctp: export some apis or variables for sctp_diag and reuse some for proc")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 2b30842b23 ("net: fec: Clear and enable MIB counters on imx51")
introduced fec_enet_clear_ethtool_stats(), but missed to add a stub
for the CONFIG_M5272=y case, causing build failure for the
m5272c3_defconfig.
Add the missing empty stub to fix the build failure.
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes a counter problem on 32bit systems:
When the rx_bytes counter reached 2 GiB, it jumpd to (2^64 Bytes - 2GiB) Bytes.
rtnl_link_stats64 has __u64 type and atomic_long_read returns
atomic_long_t which is signed. Due to the conversation
we get an incorrect value on 32bit systems if the MSB of
the atomic_long_t value is set.
CC: Tom Parkin <tparkin@katalix.com>
Fixes: 7b7c0719cd ("l2tp: avoid deadlock in l2tp stats update")
Signed-off-by: Dominik Heidler <dheidler@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function is not defined, so no need to declare it.
As I don't have the hardware, I'd be very pleased if
someone may test this patch.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz says:
====================
bnx2x: Fix malicious VFs indication
It was discovered that for a VF there's a simple [yet uncommon] scenario
which would cause device firmware to declare that VF as malicious -
Add a vlan interface on top of a VF and disable txvlan offloading for
that VF [causing VF to transmit packets where vlan is on payload].
Patch #1 corrects driver transmission to prevent this issue.
Patch #2 is a by-product correcting PF behavior once a VF is declared
malicious.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Once firmware indicates that a given VF is malicious and until
that VF passes an FLR all bets are off - PF can't know anything
is happening to the VF [since VF can't communicate anything to its PF].
But PF is currently still periodically asking device to collect
statistics for the VF which might in turn fill logs by IOMMU blocking
memory access done by the VF's PCI function [in the case VF has unmapped
its buffers].
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VF clients are configured as enforced, meaning firmware is validating
the correctness of their ethertype/vid during transmission.
Once txvlan is disabled, VF would start getting SKBs for transmission
here vlan is on the payload - but it'll pass the packet's ethertype
instead of the vid, leading to firmware declaring it as malicious.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEE4bay/IylYqM/npjQHv7KIOw4HPYFAlk6mLwTHG1rbEBwZW5n
dXRyb25peC5kZQAKCRAe/sog7Dgc9vx5CADTRLfMxgmmGsLcM8nO0sKn/7X/o4Hy
qprZy1Nu4J49kG/93pumIvWkcG9fzZ4hbDy+4DiEmL/fIumbvg56aaW90fyvW9Cl
HzL3S7Id8Db6Qwfr4Wd7r1IRYn08+om4ukO09GdskmMpS4wOr68+3WhutGGu1mEX
ZS2sTE/dKnozZqUw6agcwXJmAJWFbYBaVcF+x2GT07ako0jYXF29zqe/GlpmzVBJ
Wmib9DO2BeDm/QpNIot/I3Se86xS0AIH6ds++YRJX6k9dlFb4T2Liw6UysPSVCz6
QkAT3SiTu3TKoT1TXEyMQEQSpnrFtpMowdTk/s04bKibVIpTTskZm44c
=DNKW
-----END PGP SIGNATURE-----
Merge tag 'linux-can-fixes-for-4.12-20170609' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2017-06-09
this is a pull request of 6 patches for net/master.
There's a patch by Stephane Grosjean that fixes an uninitialized symbol warning
in the peak_canfd driver. A patch by Johan Hovold to fix the product-id
endianness in an error message in the the peak_usb driver. A patch by Oliver
Hartkopp to enable CAN FD for virtual CAN devices by default. Three patches by
me, one makes the helper function can_change_state() robust to be called with
cf == NULL. The next patch fixes a memory leak in the gs_usb driver. And the
last one fixes a lockdep splat by properly initialize the per-net
can_rcvlists_lock spin_lock.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The change to remove free_netdev() from ieee80211_if_free()
erroneously didn't add the necessary free_netdev() for when
ieee80211_if_free() is called directly in one place, rather
than as the priv_destructor. Add the missing call.
Fixes: cf124db566 ("net: Fix inconsistent teardown and release of private netdev state.")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPI's from the victim cpu are not handled in dev_cpu_callback.
So these pending IPI's would be sent to the remote cpu only when
NET_RX is scheduled on the victim cpu and since this trigger is
unpredictable it would result in packet latencies on the remote cpu.
This patch add support to send the pending ipi's of victim cpu.
Signed-off-by: Ashwanth Goli <ashwanth@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
1.) Bugfix of function stmmac_get_tx_hwtstamp.
Corrected the tx timestamp available check (same as 4.8 and older)
Change printout from info syslevel to debug.
2.) Bugfix of function stmmac_get_rx_hwtstamp.
Corrected the rx timestamp available check (same as 4.8 and older)
Change printout from info syslevel to debug.
Fixes: ba1ffd74df ("stmmac: fix PTP support for GMAC4")
Signed-off-by: Mario Molitor <mario_molitor@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
According the CYCLON V documention only the bit 16 of snaptypesel should
set.
(more information see Table 17-20 (cv_5v4.pdf) :
Timestamp Snapshot Dependency on Register Bits)
Fixes: d2042052a0 ("stmmac: update the PTP header file")
Signed-off-by: Mario Molitor <mario_molitor@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
It looks like this:
Message from syslogd@flamingo at Apr 26 00:45:00 ...
kernel:unregister_netdevice: waiting for lo to become free. Usage count = 4
They seem to coincide with net namespace teardown.
The message is emitted by netdev_wait_allrefs().
Forced a kdump in netdev_run_todo, but found that the refcount on the lo
device was already 0 at the time we got to the panic.
Used bcc to check the blocking in netdev_run_todo. The only places
where we're off cpu there are in the rcu_barrier() and msleep() calls.
That behavior is expected. The msleep time coincides with the amount of
time we spend waiting for the refcount to reach zero; the rcu_barrier()
wait times are not excessive.
After looking through the list of callbacks that the netdevice notifiers
invoke in this path, it appears that the dst_dev_event is the most
interesting. The dst_ifdown path places a hold on the loopback_dev as
part of releasing the dev associated with the original dst cache entry.
Most of our notifier callbacks are straight-forward, but this one a)
looks complex, and b) places a hold on the network interface in
question.
I constructed a new bcc script that watches various events in the
liftime of a dst cache entry. Note that dst_ifdown will take a hold on
the loopback device until the invalidated dst entry gets freed.
[ __dst_free] on DST: ffff883ccabb7900 IF tap1008300eth0 invoked at 1282115677036183
__dst_free
rcu_nocb_kthread
kthread
ret_from_fork
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Verify that the caller-provided sockaddr structure is large enough to
contain the sa_family field, before accessing it in bind() and connect()
handlers of the AF_UNIX socket. Since neither syscall enforces a minimum
size of the corresponding memory region, very short sockaddrs (zero or
one byte long) result in operating on uninitialized memory while
referencing .sa_family.
Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 0d7e2d2166 ("IB/ipoib: add get_link_ksettings in ethtool")
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CAN FD capable CAN interfaces can handle (classic) CAN 2.0 frames too.
New users usually fail at their first attempt to explore CAN FD on
virtual CAN interfaces due to the current CAN_MTU default.
Set the MTU to CANFD_MTU by default to reduce this confusion.
If someone *really* needs a 'classic CAN'-only device this can be set
with the 'ip' tool with e.g. 'ip link set vcan0 mtu 16' as before.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch adds the missing kfree() in gs_cmd_reset() to free the
memory that is not used anymore after usb_control_msg().
Cc: linux-stable <stable@vger.kernel.org>
Cc: Maximilian Schneider <max@schneidersoft.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Make sure to use the USB device product-id stored in host-byte order in
a probe error message.
Also remove a redundant reassignment of the local usb_dev variable which
had already been used to retrieve the product id.
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch fixes two uninitialized symbol warnings in the new code adding
support of the PEAK-System PCAN-PCI Express FD boards, in the socket-CAN
network protocol family.
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
In OOM situations where no skb can be allocated, can_change_state() may
be called with cf == NULL. As this function updates the state and error
statistics it's not an option to skip the call to can_change_state() in
OOM situations.
This patch makes can_change_state() robust, so that it can be called
with cf == NULL.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Commit 1aa6c4f6b8 ("net: vrf: Add l3mdev rules on first device create")
adds the l3mdev FIB rule the first time a VRF device is created. However,
it only creates the rule once and only in the namespace the first device
is created - which may not be init_net. Fix by using the net_generic
capability to make the add_fib_rules flag per network namespace.
Fixes: 1aa6c4f6b8 ("net: vrf: Add l3mdev rules on first device create")
Reported-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I noticed that test_l4lb was failing in selftests:
# ./test_progs
test_pkt_access:PASS:ipv4 77 nsec
test_pkt_access:PASS:ipv6 44 nsec
test_xdp:PASS:ipv4 2933 nsec
test_xdp:PASS:ipv6 1500 nsec
test_l4lb:PASS:ipv4 377 nsec
test_l4lb:PASS:ipv6 544 nsec
test_l4lb:FAIL:stats 6297600000 200000
test_tcp_estats:PASS: 0 nsec
Summary: 7 PASSED, 1 FAILED
Tracking down the issue actually revealed that endianness selection
in bpf_endian.h is broken when compiled with clang with bpf target.
test_pkt_access.c, test_l4lb.c is compiled with __BYTE_ORDER as
__BIG_ENDIAN, test_xdp.c as __LITTLE_ENDIAN! test_l4lb noticeably
fails, because the test accounts bytes via bpf_ntohs(ip6h->payload_len)
and bpf_ntohs(iph->tot_len), and compares them against a defined
value and given a wrong endianness, the test outcome is different,
of course.
Turns out that there are actually two bugs: i) when we do __BYTE_ORDER
comparison with __LITTLE_ENDIAN/__BIG_ENDIAN, then depending on the
include order we see different outcomes. Reason is that __BYTE_ORDER
is undefined due to missing endian.h include. Before we include the
asm/byteorder.h (e.g. through linux/in.h), then __BYTE_ORDER equals
__LITTLE_ENDIAN since both are undefined, after the include which
correctly pulls in linux/byteorder/little_endian.h, __LITTLE_ENDIAN
is defined, but given __BYTE_ORDER is still undefined, we match on
__BYTE_ORDER equals to __BIG_ENDIAN since __BIG_ENDIAN is also
undefined at that point, sigh. ii) But even that would be wrong,
since when compiling the test cases with clang, one can select between
bpfeb and bpfel targets for cross compilation. Hence, we can also not
rely on what the system's endian.h provides, but we need to look at
the compiler's defined endianness. The compiler defines __BYTE_ORDER__,
and we can match __ORDER_LITTLE_ENDIAN__ and __ORDER_BIG_ENDIAN__,
which also reflects targets bpf (native), bpfel, bpfeb correctly,
thus really only rely on that. After patch:
# ./test_progs
test_pkt_access:PASS:ipv4 74 nsec
test_pkt_access:PASS:ipv6 42 nsec
test_xdp:PASS:ipv4 2340 nsec
test_xdp:PASS:ipv6 1461 nsec
test_l4lb:PASS:ipv4 400 nsec
test_l4lb:PASS:ipv6 530 nsec
test_tcp_estats:PASS: 0 nsec
Summary: 7 PASSED, 0 FAILED
Fixes: 43bcf707cc ("bpf: fix _htons occurences in test_progs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each time a new speed is added, the bonding 802.3ad isn't updated. Add a
comment to remind the developer to update this driver.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds 14 Gbps enum definition, and fixes
aggregated bandwidth calculation based on above slave links.
Fixes: 0d7e2d2166 ("IB/ipoib: add get_link_ksettings in ethtool")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds [5|50] Gbps enum definition, and fixes
aggregated bandwidth calculation based on above slave links.
Fixes: c9a70d4346 ("net-next: ethtool: Added port speed macros.")
Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The first netlink attribute (value 0) must always be defined
as none/unspec.
Because we cannot change an existing UAPI, I add a comment to point the
mistake and avoid to propagate it in a new ovs API in the future.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While discussing the possible merits of clang warning about unused initialized
functions, I found one function that was clearly meant to be called but
never actually is.
__ila_hash_secret_init() initializes the hash value for the ila locator,
apparently this is intended to prevent hash collision attacks, but this ends
up being a read-only zero constant since there is no caller. I could find
no indication of why it was never called, the earliest patch submission
for the module already was like this. If my interpretation is right, we
certainly want to backport the patch to stable kernels as well.
I considered adding it to the ila_xlat_init callback, but for best effect
the random data is read as late as possible, just before it is first used.
The underlying net_get_random_once() is already highly optimized to avoid
overhead when called frequently.
Fixes: 7f00feaf10 ("ila: Add generic ILA translation facility")
Cc: stable@vger.kernel.org
Link: https://www.spinics.net/lists/kernel/msg2527243.html
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/staging/rtl8723bs/os_dep/ioctl_cfg80211.c: In function ‘rtw_cfg80211_add_monitor_if’:
drivers/staging/rtl8723bs/os_dep/ioctl_cfg80211.c:2670:10: error: ‘struct net_device’ has no member named ‘destructor’
mon_ndev->destructor = rtw_ndev_destructor;
^
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger says:
====================
netvsc: bug fixes
These are bugfixes for netvsc driver in 4.12.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The work queue and handling of network filter parameters should
be in rndis_device. This gets rid of warning from RCU checks,
eliminates a race and cleans up code.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ndo_poll_controller function needs to schedule NAPI to pick
up arriving packets and send completions. Otherwise no data
will ever be received. For simple case of netconsole, it also
will allow send completions to happen. Without this netpoll
will eventually get stuck.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethtool info command calls the netvsc get_sset_count with RTNL
but not with RCU. Which causes warning:
drivers/net/hyperv/netvsc_drv.c:1010 suspicious rcu_dereference_check() usage!
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa reported attempts to delete a bond device that is referenced in a
multipath route is hanging:
$ ifdown bond2 # ifupdown2 command that deletes virtual devices
unregister_netdevice: waiting for bond2 to become free. Usage count = 2
Steps to reproduce:
echo 1 > /proc/sys/net/ipv6/conf/all/ignore_routes_with_linkdown
ip link add dev bond12 type bond
ip link add dev bond13 type bond
ip addr add 2001:db8:2::0/64 dev bond12
ip addr add 2001:db8:3::0/64 dev bond13
ip route add 2001:db8:33::0/64 nexthop via 2001:db8:2::2 nexthop via 2001:db8:3::2
ip link del dev bond12
ip link del dev bond13
The root cause is the recent change to keep routes on a linkdown. Update
the check to detect when the device is unregistering and release the
route for that case.
Fixes: a1a22c1206 ("net: ipv6: Keep nexthop of multipath route on admin down")
Reported-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the structure's fields are not initialized by the
rtnetlink. If driver doesn't set those in ndo_get_vf_config(),
they'd leak memory to user.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
CC: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Verify that the length of the socket buffer is sufficient to cover the
nlmsghdr structure before accessing the nlh->nlmsg_len field for further
input sanitization. If the client only supplies 1-3 bytes of data in
sk_buff, then nlh->nlmsg_len remains partially uninitialized and
contains leftover memory from the corresponding kernel allocation.
Operating on such data may result in indeterminate evaluation of the
nlmsg_len < sizeof(*nlh) expression.
The bug was discovered by a runtime instrumentation designed to detect
use of uninitialized memory in the kernel. The patch prevents this and
other similar tools (e.g. KMSAN) from flagging this behavior in the future.
Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 85eac2ba35.
There is an updated version of this fix which we should
use instead.
Signed-off-by: David S. Miller <davem@davemloft.net>
emac_mdio_read_link() was not copying the requested phy settings
back into the emac driver's own phy api. This has caused a link
speed mismatch issue for the AR8035 as the emac driver kept
trying to connect with 10/100MBps on a 1GBit/s link.
This patch also unifies shared code between emac_setup_aneg()
and emac_mdio_setup_forced(). And furthermore it removes
a chunk of emac_mdio_init_phy(), that was copying the same
data into itself.
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a problem where the AR8035 PHY can't be
detected on an Cisco Meraki MR24, if the ethernet cable is
not connected on boot.
Russell Senior provided steps to reproduce the issue:
|Disconnect ethernet cable, apply power, wait until device has booted,
|plug in ethernet, check for interfaces, no eth0 is listed.
|
|This appears to be a problem during probing of the AR8035 Phy chip.
|When ethernet has no link, the phy detection fails, and eth0 is not
|created. Plugging ethernet later has no effect, because there is no
|interface as far as the kernel is concerned. The relevant part of
|the boot log looks like this:
|this is the failing case:
|
|[ 0.876611] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
|[ 0.882532] /plb/opb/ethernet@ef600c00: reset timeout
|[ 0.888546] /plb/opb/ethernet@ef600c00: can't find PHY!
|and the succeeding case:
|
|[ 0.876672] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
|[ 0.883952] eth0: EMAC-0 /plb/opb/ethernet@ef600c00, MAC 00:01:..
|[ 0.890822] eth0: found Atheros 8035 Gigabit Ethernet PHY (0x01)
Based on the comment and the commit message of
commit 23fbb5a87c ("emac: Fix EMAC soft reset on 460EX/GT").
This is because the AR8035 PHY doesn't provide the TX Clock,
if the ethernet cable is not attached. This causes the reset
to timeout and the PHY detection code in emac_init_phy() is
unable to detect the AR8035 PHY. As a result, the emac driver
bails out early and the user left with no ethernet.
In order to stay compatible with existing configurations, the driver
tries the current reset approach at first. Only if the first attempt
timed out, it does perform one more retry with the clock temporarily
switched to the internal source for just the duration of the reset.
LEDE-Bug: #687 <https://bugs.lede-project.org/index.php?do=details&task_id=687>
Cc: Chris Blake <chrisrblake93@gmail.com>
Reported-by: Russell Senior <russell@personaltelco.net>
Fixes: 23fbb5a87c ("emac: Fix EMAC soft reset on 460EX/GT")
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Verify that the length of the socket buffer is sufficient to cover the
entire nlh->nlmsg_len field before accessing that field for further
input sanitization. If the client only supplies 1-3 bytes of data in
sk_buff, then nlh->nlmsg_len remains partially uninitialized and
contains leftover memory from the corresponding kernel allocation.
Operating on such data may result in indeterminate evaluation of the
nlmsg_len < sizeof(*nlh) expression.
The bug was discovered by a runtime instrumentation designed to detect
use of uninitialized memory in the kernel. The patch prevents this and
other similar tools (e.g. KMSAN) from flagging this behavior in the future.
Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
> ../drivers/hsi/clients/ssi_protocol.c:1069:5: error: 'struct net_device' has no member named 'destructor'
Reported-by: Mark Brown <broonie@kernel.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Network devices can allocate reasources and private memory using
netdev_ops->ndo_init(). However, the release of these resources
can occur in one of two different places.
Either netdev_ops->ndo_uninit() or netdev->destructor().
The decision of which operation frees the resources depends upon
whether it is necessary for all netdev refs to be released before it
is safe to perform the freeing.
netdev_ops->ndo_uninit() presumably can occur right after the
NETDEV_UNREGISTER notifier completes and the unicast and multicast
address lists are flushed.
netdev->destructor(), on the other hand, does not run until the
netdev references all go away.
Further complicating the situation is that netdev->destructor()
almost universally does also a free_netdev().
This creates a problem for the logic in register_netdevice().
Because all callers of register_netdevice() manage the freeing
of the netdev, and invoke free_netdev(dev) if register_netdevice()
fails.
If netdev_ops->ndo_init() succeeds, but something else fails inside
of register_netdevice(), it does call ndo_ops->ndo_uninit(). But
it is not able to invoke netdev->destructor().
This is because netdev->destructor() will do a free_netdev() and
then the caller of register_netdevice() will do the same.
However, this means that the resources that would normally be released
by netdev->destructor() will not be.
Over the years drivers have added local hacks to deal with this, by
invoking their destructor parts by hand when register_netdevice()
fails.
Many drivers do not try to deal with this, and instead we have leaks.
Let's close this hole by formalizing the distinction between what
private things need to be freed up by netdev->destructor() and whether
the driver needs unregister_netdevice() to perform the free_netdev().
netdev->priv_destructor() performs all actions to free up the private
resources that used to be freed by netdev->destructor(), except for
free_netdev().
netdev->needs_free_netdev is a boolean that indicates whether
free_netdev() should be done at the end of unregister_netdevice().
Now, register_netdevice() can sanely release all resources after
ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
and netdev->priv_destructor().
And at the end of unregister_netdevice(), we invoke
netdev->priv_destructor() and optionally call free_netdev().
Signed-off-by: David S. Miller <davem@davemloft.net>
Will reported that in BPF_XADD we must use a different register in stxr
instruction for the status flag due to otherwise CONSTRAINED UNPREDICTABLE
behavior per architecture. Reference manual says [1]:
If s == t, then one of the following behaviors must occur:
* The instruction is UNDEFINED.
* The instruction executes as a NOP.
* The instruction performs the store to the specified address, but
the value stored is UNKNOWN.
Thus, use a different temporary register for the status flag to fix it.
Disassembly extract from test 226/STX_XADD_DW from test_bpf.ko:
[...]
0000003c: c85f7d4b ldxr x11, [x10]
00000040: 8b07016b add x11, x11, x7
00000044: c80c7d4b stxr w12, x11, [x10]
00000048: 35ffffac cbnz w12, 0x0000003c
[...]
[1] https://static.docs.arm.com/ddi0487/b/DDI0487B_a_armv8_arm.pdf, p.6132
Fixes: 85f68fe898 ("bpf, arm64: implement jiting of BPF_XADD")
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mvpp22_port_mii_set() function was added by 2697582144, but the
function directly returns without doing anything. This return was used
when debugging and wasn't removed before sending the patch. Fix this.
Fixes: 2697582144 ("net: mvpp2: handle misc PPv2.1/PPv2.2 differences")
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changing the mtu is currently not supported in the ibmvnic driver.
Implement .ndo_change_mtu in the driver so that attempting to use ifconfig
to change the mtu will fail and present the user with an error message.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 61b905da33 ("net: Rename skb->rxhash to skb->hash")
didn't update the documentation, fix this up.
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When freeing VF's DMA mappings, an already NULLed pointer was checked
again due to an apparent copy&paste error. Consequently, the pf2vf
bulletin DMA mapping was not freed.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
KMSAN reported a use of uninitialized memory in dev_set_alias(),
which was caused by calling strlcpy() (which in turn called strlen())
on the user-supplied non-terminated string.
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Made TCP congestion control documentation match current reality,
from Anmol Sarma.
2) Various build warning and failure fixes from Arnd Bergmann.
3) Fix SKB list leak in ipv6_gso_segment().
4) Use after free in ravb driver, from Eugeniu Rosca.
5) Don't use udp_poll() in ping protocol driver, from Eric Dumazet.
6) Don't crash in PCI error recovery of cxgb4 driver, from Guilherme
Piccoli.
7) _SRC_NAT_DONE_BIT needs to be cleared using atomics, from Liping
Zhang.
8) Use after free in vxlan deletion, from Mark Bloch.
9) Fix ordering of NAPI poll enabled in ethoc driver, from Max
Filippov.
10) Fix stmmac hangs with TSO, from Niklas Cassel.
11) Fix crash in CALIPSO ipv6, from Richard Haines.
12) Clear nh_flags properly on mpls link up. From Roopa Prabhu.
13) Fix regression in sk_err socket error queue handling, noticed by
ping applications. From Soheil Hassas Yeganeh.
14) Update mlx4/mlx5 MAINTAINERS information.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (78 commits)
net: stmmac: fix a broken u32 less than zero check
net: stmmac: fix completely hung TX when using TSO
net: ethoc: enable NAPI before poll may be scheduled
net: bridge: fix a null pointer dereference in br_afspec
ravb: Fix use-after-free on `ifconfig eth0 down`
net/ipv6: Fix CALIPSO causing GPF with datagram support
net: stmmac: ensure jumbo_frm error return is correctly checked for -ve value
Revert "sit: reload iphdr in ipip6_rcv"
i40e/i40evf: proper update of the page_offset field
i40e: Fix state flags for bit set and clean operations of PF
iwlwifi: fix host command memory leaks
iwlwifi: fix min API version for 7265D, 3168, 8000 and 8265
iwlwifi: mvm: clear new beacon command template struct
iwlwifi: mvm: don't fail when removing a key from an inexisting sta
iwlwifi: pcie: only use d0i3 in suspend/resume if system_pm is set to d0i3
iwlwifi: mvm: fix firmware debug restart recording
iwlwifi: tt: move ucode_loaded check under mutex
iwlwifi: mvm: support ibss in dqa mode
iwlwifi: mvm: Fix command queue number on d0i3 flow
iwlwifi: mvm: rs: start using LQ command color
...