linux/net
Daniel Borkmann 628185cfdd net, sched: fix soft lockup in tc_classify
Shahar reported a soft lockup in tc_classify(), where we run into an
endless loop when walking the classifier chain due to tp->next == tp
which is a state we should never run into. The issue only seems to
trigger under load in the tc control path.

What happens is that in tc_ctl_tfilter(), thread A allocates a new
tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
with it. In that classifier callback we had to unlock/lock the rtnl
mutex and returned with -EAGAIN. One reason why we need to drop there
is, for example, that we need to request an action module to be loaded.

This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
after we loaded and found the requested action, we need to redo the
whole request so we don't race against others. While we had to unlock
rtnl in that time, thread B's request was processed next on that CPU.
Thread B added a new tp instance successfully to the classifier chain.
When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
and destroying its tp instance which never got linked, we goto replay
and redo A's request.

This time when walking the classifier chain in tc_ctl_tfilter() for
checking for existing tp instances we had a priority match and found
the tp instance that was created and linked by thread B. Now calling
again into tp->ops->change() with that tp was successful and returned
without error.

tp_created was never cleared in the second round, thus kernel thinks
that we need to link it into the classifier chain (once again). tp and
*back point to the same object due to the match we had earlier on. Thus
for thread B's already public tp, we reset tp->next to tp itself and
link it into the chain, which eventually causes the mentioned endless
loop in tc_classify() once a packet hits the data path.

Fix is to clear tp_created at the beginning of each request, also when
we replay it. On the paths that can cause -EAGAIN we already destroy
the original tp instance we had and on replay we really need to start
from scratch. It seems that this issue was first introduced in commit
12186be7d2 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
and avoid kernel panic when we use cls_cgroup").

Fixes: 12186be7d2 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
Reported-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Tested-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-26 11:24:10 -05:00
..
6lowpan
9p IB/core: add support to create a unsafe global rkey to ib_create_pd 2016-09-23 13:47:44 -04:00
802 net: use core MTU range checking in misc drivers 2016-10-20 14:51:10 -04:00
8021q netns: make struct pernet_operations::id unsigned int 2016-11-18 10:59:15 -05:00
appletalk
atm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-12-17 20:17:04 -08:00
ax25
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-12-06 21:33:19 -05:00
bluetooth Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-12-16 10:24:44 -08:00
bridge net: bridge: shorten ageing time on topology change 2016-12-10 21:28:28 -05:00
caif Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-12-06 21:33:19 -05:00
can can: raw: raw_setsockopt: limit number of can_filter that can be set 2016-12-07 10:45:57 +01:00
ceph libceph: remove now unused finish_request() wrapper 2016-12-14 22:39:08 +01:00
core neigh: Send netevent after marking neigh as dead 2016-12-23 12:31:18 -05:00
dcb net: dcb: set error code on failures 2016-12-03 23:54:25 -05:00
dccp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-12-03 12:29:53 -05:00
decnet net: use designated initializers 2016-12-17 11:56:57 -05:00
dns_resolver
dsa Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-12-03 12:29:53 -05:00
ethernet net: make default TX queue length a defined constant 2016-11-07 20:15:55 -05:00
hsr Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-10-30 12:42:58 -04:00
ieee802154 Makefile: drop -D__CHECK_ENDIAN__ from cflags 2016-12-16 00:13:43 +02:00
ipv4 inet: fix IP(V6)_RECVORIGDSTADDR for udp sockets 2016-12-23 12:19:18 -05:00
ipv6 ipv6: handle -EFAULT from skb_copy_bits 2016-12-23 12:20:39 -05:00
ipx
irda irda: irnet: add member name to the miscdevice declaration 2016-12-17 11:29:31 -05:00
iucv Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-12 19:25:04 -08:00
kcm Merge branch 'work.splice_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-07 15:36:58 -07:00
key netns: make struct pernet_operations::id unsigned int 2016-11-18 10:59:15 -05:00
l2tp net: l2tp: ppp: change PPPOL2TP_MSG_* => L2TP_MSG_* 2016-12-10 23:29:11 -05:00
l3mdev
lapb
llc net: fix sleeping for sk_wait_event() 2016-11-14 13:17:21 -05:00
mac80211 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-12-17 20:17:04 -08:00
mac802154 Makefile: drop -D__CHECK_ENDIAN__ from cflags 2016-12-16 00:13:43 +02:00
mpls Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-12-06 21:33:19 -05:00
ncsi net/ncsi: Improve HNCDSC AEN handler 2016-10-20 11:23:08 -04:00
netfilter netfilter: nft_counter: rework atomic dump and reset 2016-12-11 10:01:05 -05:00
netlabel genetlink: mark families as __ro_after_init 2016-10-27 16:16:09 -04:00
netlink netlink: use blocking notifier 2016-12-10 17:25:58 -05:00
netrom
nfc genetlink: mark families as __ro_after_init 2016-10-27 16:16:09 -04:00
openvswitch openvswitch: Add a missing break statement. 2016-12-20 14:07:41 -05:00
packet Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-12-16 10:24:44 -08:00
phonet netns: make struct pernet_operations::id unsigned int 2016-11-18 10:59:15 -05:00
qrtr
rds RDS: use rb_entry() 2016-12-20 14:22:49 -05:00
rfkill
rose
rxrpc rxrpc: abstract away knowledge of IDR internals 2016-12-14 16:04:10 -08:00
sched net, sched: fix soft lockup in tc_classify 2016-12-26 11:24:10 -05:00
sctp sctp: fix recovering from 0 win with small data chunks 2016-12-23 14:01:35 -05:00
strparser strparser: Propagate correct error code in strp_recv() 2016-10-12 01:51:49 -04:00
sunrpc The one new feature is support for a new NFSv4.2 mode_umask attribute 2016-12-16 10:48:28 -08:00
switchdev Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-10-30 12:42:58 -04:00
tipc tipc: don't send FIN message from connectionless socket 2016-12-23 17:53:47 -05:00
unix Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs 2016-12-16 10:58:12 -08:00
vmw_vsock Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-12-17 20:17:04 -08:00
wimax genetlink: mark families as __ro_after_init 2016-10-27 16:16:09 -04:00
wireless Makefile: drop -D__CHECK_ENDIAN__ from cflags 2016-12-16 00:13:43 +02:00
x25 net/x25: use designated initializers 2016-12-17 11:56:57 -05:00
xfrm Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-12 19:25:04 -08:00
compat.c
Kconfig bpf: BPF for lightweight tunnel infrastructure 2016-12-02 10:51:49 -05:00
Makefile
socket.c net: socket: removed an unnecessary newline 2016-12-10 17:27:07 -05:00
sysctl_net.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2016-10-06 09:52:23 -07:00