Find a file
David S. Miller 2af48d430a Merge branch 'fib6-rcu'
Wei Wang says:

====================
ipv6: replace rwlock with rcu and spinlock in fib6 table

Currently, fib6 table is protected by rwlock. During route lookup,
reader lock is taken and during route insertion, deletion or
modification, writer lock is taken. This is a very inefficient
implementation because the fastpath always has to do the operation
to grab the reader lock.
According to my latest syn flood test on an iota ivybridage machine
with 2 10G mlx nics bonded together, each with 8 rx queues on 2 NUMA
nodes, and with the upstream net-next kernel:
ipv4 stack can handle around 4.2Mpps
ipv6 stack can handle around 1.3Mpps

In order to close the gap of the performance number between ipv4
and ipv6 stack, this patch series tries to get rid of the usage of
the rwlock and replace it with rcu and spinlock protection. This will
greatly speed up the fastpath performance as it only needs to hold
rcu which is much less expensive than grabbing the reader lock. It
also makes ipv6 fib implementation more consistent with ipv4.

In order to be able to replace the current rwlock with rcu and
spinlock, some preparation work is needed:
Patch 1-8 introduces a per-route hash table (protected by rcu and a
different spinlock) to store all cached routes created by pmtu and ip
redirect under its main route. This makes the main fib6 tree only
contain static routes.
Patch 9-14 prepares all the reader path to be ready to tolerate
concurrent writer.
Patch 15 finally does the rwlock to rcu and spinlock conversion.
Patch 16 takes care of rt6_stats.

After this patch series, in the same syn flood test,
ipv6 stack can now handle around 3.5Mpps compared to previous 1.3Mpps
in my test setup.

After this patch series, there are still some improvements that should
be done in ipv6 stack:
1. During route lookup, dst_use() is called everytime on the selected
route to update dst->__use and dst->lastuse. This dirties the cacheline
and causes extra cacheline miss and should be avoided.
2. when no route is found in the current table, net->ip6.ipv6_null_entry
is used and refcnt is taken on it. As there is no pcpu cache for this
specific route, frequent change on the refcnt for this route causes
quite some cacheline misses.
And to make things worse, if CONFIG_IPV6_MULTIPLE_TABLES is defined,
output path route lookup always starts with local table first and
guarantees to hit net->ipv6.ip6_null_entry before continuing to do
lookup in the main table.
These operations on net->ipv6.ip6_null_entry could potentially be
avoided.
3. ipv6 input path route lookup grabs refcnt on dst. This is different
from ipv4. We could potentially change this behavior to let ipv6 input
path route lookup not to grab refcnt on dst. However, it does not give
us much performance boost as we currently have pcpu route cache for
input path as well in ipv6. But this work probably is still worth doing
to unify ipv6 and ipv4 route lookup behavior.

The above issues will be addressed separately after this patch series
has been accepted.

This is a joint work with Martin KaFai Lau and Eric Dumazet. And many
many thanks to them for their inspiring ideas and big big code review
efforts.
====================

Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-07 21:22:59 +01:00
arch Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-10-05 18:19:22 -07:00
block block: fix a crash caused by wrong API 2017-09-25 08:56:05 -06:00
certs modsign: add markers to endif-statements in certs/Makefile 2017-07-14 11:01:37 +10:00
crypto crypto: af_alg - update correct dst SGL entry 2017-09-20 17:42:42 +08:00
Documentation Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-10-05 18:19:22 -07:00
drivers bnxt_en: don't consider building bnxt_tc.o if option not enabled 2017-10-07 03:00:26 +01:00
firmware firmware: Restore support for built-in firmware 2017-09-16 10:58:48 -07:00
fs Merge branch 'akpm' (patches from Andrew) 2017-10-04 09:30:50 -07:00
include ipv6: take care of rt6_stats 2017-10-07 21:22:58 +01:00
init Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-09-14 18:54:01 -07:00
ipc fix a typo in put_compat_shm_info() 2017-09-25 20:41:46 -04:00
kernel Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-10-05 18:19:22 -07:00
lib Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-10-05 18:19:22 -07:00
mm mm/memory_hotplug: define find_{smallest|biggest}_section_pfn as unsigned long 2017-10-03 17:54:26 -07:00
net ipv6: take care of rt6_stats 2017-10-07 21:22:58 +01:00
samples samples/bpf: xdp_monitor increase memory rlimit 2017-10-06 10:04:36 -07:00
scripts Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-10-05 18:19:22 -07:00
security lsm: fix smack_inode_removexattr and xattr_getsecurity memleak 2017-10-04 18:03:15 +11:00
sound sound fixes for 4.14-rc4 2017-10-05 10:39:29 -07:00
tools libbpf: use map_flags when creating maps 2017-10-05 21:42:28 -07:00
usr ramfs: clarify help text that compression applies to ramfs as well as legacy ramdisk. 2017-07-06 16:24:30 -07:00
virt Revert "KVM: Don't accept obviously wrong gsi values via KVM_IRQFD" 2017-09-19 08:37:17 +02:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Add hch to .get_maintainer.ignore 2015-08-21 14:30:10 -07:00
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore kbuild: Add support to generate LLVM assembly files 2017-04-25 08:13:52 +09:00
.mailmap Update James Hogan's email address 2017-10-04 17:11:53 -07:00
COPYING
CREDITS selinux/stable-4.14 PR 20170831 2017-09-12 13:21:00 -07:00
Kbuild kbuild: Consolidate header generation from ASM offset information 2017-04-13 05:43:37 +09:00
Kconfig
MAINTAINERS VSOCK: add tools/testing/vsock/vsock_diag_test 2017-10-05 18:44:17 -07:00
Makefile Linux 4.14-rc3 2017-10-01 14:54:54 -07:00
README README: add a new README file, pointing to the Documentation/ 2016-10-24 08:12:35 -02:00

Linux kernel
============

This file was moved to Documentation/admin-guide/README.rst

Please notice that there are several guides for kernel developers and users.
These guides can be rendered in a number of formats, like HTML and PDF.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.