Find a file
Daniel Borkmann 710ad98c36 veth: Do not record rx queue hint in veth_xmit
Laurent reported that they have seen a significant amount of TCP retransmissions
at high throughput from applications residing in network namespaces talking to
the outside world via veths. The drops were seen on the qdisc layer (fq_codel,
as per systemd default) of the phys device such as ena or virtio_net due to all
traffic hitting a _single_ TX queue _despite_ multi-queue device. (Note that the
setup was _not_ using XDP on veths as the issue is generic.)

More specifically, after edbea92202 ("veth: Store queue_mapping independently
of XDP prog presence") which made it all the way back to v4.19.184+,
skb_record_rx_queue() would set skb->queue_mapping to 1 (given 1 RX and 1 TX
queue by default for veths) instead of leaving at 0.

This is eventually retained and callbacks like ena_select_queue() will also pick
single queue via netdev_core_pick_tx()'s ndo_select_queue() once all the traffic
is forwarded to that device via upper stack or other means. Similarly, for others
not implementing ndo_select_queue() if XPS is disabled, netdev_pick_tx() might
call into the skb_tx_hash() and check for prior skb_rx_queue_recorded() as well.

In general, it is a _bad_ idea for virtual devices like veth to mess around with
queue selection [by default]. Given dev->real_num_tx_queues is by default 1,
the skb->queue_mapping was left untouched, and so prior to edbea92202 the
netdev_core_pick_tx() could do its job upon __dev_queue_xmit() on the phys device.

Unbreak this and restore prior behavior by removing the skb_record_rx_queue()
from veth_xmit() altogether.

If the veth peer has an XDP program attached, then it would return the first RX
queue index in xdp_md->rx_queue_index (unless configured in non-default manner).
However, this is still better than breaking the generic case.

Fixes: edbea92202 ("veth: Store queue_mapping independently of XDP prog presence")
Fixes: 638264dc90 ("veth: Support per queue XDP ring")
Reported-by: Laurent Bernaille <laurent.bernaille@datadoghq.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Cc: Toshiaki Makita <toshiaki.makita1@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-06 13:49:54 +00:00
arch MIPS: lantiq: dma: increase descritor count 2022-01-05 17:18:03 -08:00
block block-5.16-2021-12-19 2021-12-19 12:38:53 -08:00
certs certs: Add support for using elliptic curve keys for signing modules 2021-08-23 19:55:42 +03:00
crypto Update to zstd-1.4.10 2021-11-13 15:32:30 -08:00
Documentation Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-01-05 14:36:10 -08:00
drivers veth: Do not record rx queue hint in veth_xmit 2022-01-06 13:49:54 +00:00
fs Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-01-05 14:36:10 -08:00
include gro: add ability to control gro max packet size 2022-01-06 12:27:05 +00:00
init kbuild: Fix -Wimplicit-fallthrough=5 error for GCC 5.x and 6.x 2021-11-14 18:59:49 -08:00
ipc shm: extend forced shm destroy to support objects from several IPC nses 2021-11-20 10:35:54 -08:00
kernel Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2021-12-31 14:35:40 +00:00
lib lib: objagg: Use the bitmap API when applicable 2021-12-24 14:54:29 -08:00
LICENSES LICENSES/dual/CC-BY-4.0: Git rid of "smart quotes" 2021-07-15 06:31:24 -06:00
mm mm: vmscan: reduce throttling due to a failure to make progress -fix 2021-12-31 13:12:55 -08:00
net ethtool: use phydev variable 2022-01-06 12:33:35 +00:00
samples Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2021-12-31 14:35:40 +00:00
scripts recordmcount.pl: fix typo in s390 mcount regex 2021-12-24 10:20:12 +01:00
security Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2021-12-31 14:35:40 +00:00
sound sound fixes for 5.16-rc7 2021-12-23 09:55:58 -08:00
tools gro: add ability to control gro max packet size 2022-01-06 12:27:05 +00:00
usr initramfs: Check timestamp to prevent broken cpio archive 2021-10-24 13:48:40 +09:00
virt KVM: downgrade two BUG_ONs to WARN_ON_ONCE 2021-11-26 06:43:28 -05:00
.clang-format clang-format: Update with the latest for_each macro list 2021-05-12 23:32:39 +02:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: ignore only top-level modules.builtin 2021-05-02 00:43:35 +09:00
.mailmap mailmap: update email address for Guo Ren 2021-12-10 17:10:55 -08:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Move Daniel Drake to credits 2021-09-21 08:34:58 +03:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-01-05 14:36:10 -08:00
Makefile Linux 5.16-rc8 2022-01-02 14:23:25 -08:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.