linux/arch/arm
Clement Courbet 0ade34c370 lib: optimize cpumask_next_and()
We've measured that we spend ~0.6% of sys cpu time in cpumask_next_and().
It's essentially a joined iteration in search for a non-zero bit, which is
currently implemented as a lookup join (find a nonzero bit on the lhs,
lookup the rhs to see if it's set there).

Implement a direct join (find a nonzero bit on the incrementally built
join).  Also add generic bitmap benchmarks in the new `test_find_bit`
module for new function (see `find_next_and_bit` in [2] and [3] below).

For cpumask_next_and, direct benchmarking shows that it's 1.17x to 14x
faster with a geometric mean of 2.1 on 32 CPUs [1].  No impact on memory
usage.  Note that on Arm, the new pure-C implementation still outperforms
the old one that uses a mix of C and asm (`find_next_bit`) [3].

[1] Approximate benchmark code:

```
  unsigned long src1p[nr_cpumask_longs] = {pattern1};
  unsigned long src2p[nr_cpumask_longs] = {pattern2};
  for (/*a bunch of repetitions*/) {
    for (int n = -1; n <= nr_cpu_ids; ++n) {
      asm volatile("" : "+rm"(src1p)); // prevent any optimization
      asm volatile("" : "+rm"(src2p));
      unsigned long result = cpumask_next_and(n, src1p, src2p);
      asm volatile("" : "+rm"(result));
    }
  }
```

Results:
pattern1    pattern2     time_before/time_after
0x0000ffff  0x0000ffff   1.65
0x0000ffff  0x00005555   2.24
0x0000ffff  0x00001111   2.94
0x0000ffff  0x00000000   14.0
0x00005555  0x0000ffff   1.67
0x00005555  0x00005555   1.71
0x00005555  0x00001111   1.90
0x00005555  0x00000000   6.58
0x00001111  0x0000ffff   1.46
0x00001111  0x00005555   1.49
0x00001111  0x00001111   1.45
0x00001111  0x00000000   3.10
0x00000000  0x0000ffff   1.18
0x00000000  0x00005555   1.18
0x00000000  0x00001111   1.17
0x00000000  0x00000000   1.25
-----------------------------
               geo.mean  2.06

[2] test_find_next_bit, X86 (skylake)

 [ 3913.477422] Start testing find_bit() with random-filled bitmap
 [ 3913.477847] find_next_bit: 160868 cycles, 16484 iterations
 [ 3913.477933] find_next_zero_bit: 169542 cycles, 16285 iterations
 [ 3913.478036] find_last_bit: 201638 cycles, 16483 iterations
 [ 3913.480214] find_first_bit: 4353244 cycles, 16484 iterations
 [ 3913.480216] Start testing find_next_and_bit() with random-filled
 bitmap
 [ 3913.481074] find_next_and_bit: 89604 cycles, 8216 iterations
 [ 3913.481075] Start testing find_bit() with sparse bitmap
 [ 3913.481078] find_next_bit: 2536 cycles, 66 iterations
 [ 3913.481252] find_next_zero_bit: 344404 cycles, 32703 iterations
 [ 3913.481255] find_last_bit: 2006 cycles, 66 iterations
 [ 3913.481265] find_first_bit: 17488 cycles, 66 iterations
 [ 3913.481266] Start testing find_next_and_bit() with sparse bitmap
 [ 3913.481272] find_next_and_bit: 764 cycles, 1 iterations

[3] test_find_next_bit, arm (v7 odroid XU3).

[  267.206928] Start testing find_bit() with random-filled bitmap
[  267.214752] find_next_bit: 4474 cycles, 16419 iterations
[  267.221850] find_next_zero_bit: 5976 cycles, 16350 iterations
[  267.229294] find_last_bit: 4209 cycles, 16419 iterations
[  267.279131] find_first_bit: 1032991 cycles, 16420 iterations
[  267.286265] Start testing find_next_and_bit() with random-filled
bitmap
[  267.302386] find_next_and_bit: 2290 cycles, 8140 iterations
[  267.309422] Start testing find_bit() with sparse bitmap
[  267.316054] find_next_bit: 191 cycles, 66 iterations
[  267.322726] find_next_zero_bit: 8758 cycles, 32703 iterations
[  267.329803] find_last_bit: 84 cycles, 66 iterations
[  267.336169] find_first_bit: 4118 cycles, 66 iterations
[  267.342627] Start testing find_next_and_bit() with sparse bitmap
[  267.356919] find_next_and_bit: 91 cycles, 1 iterations

[courbet@google.com: v6]
  Link: http://lkml.kernel.org/r/20171129095715.23430-1-courbet@google.com
[geert@linux-m68k.org: m68k/bitops: always include <asm-generic/bitops/find.h>]
  Link: http://lkml.kernel.org/r/1512556816-28627-1-git-send-email-geert@linux-m68k.org
Link: http://lkml.kernel.org/r/20171128131334.23491-1-courbet@google.com
Signed-off-by: Clement Courbet <courbet@google.com>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yury Norov <ynorov@caviumnetworks.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:44 -08:00
..
boot Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm 2018-02-02 09:50:51 -08:00
common Merge branches 'fixes', 'misc', 'sa1111' and 'sa1100-for-next' into for-next 2018-01-21 15:38:10 +00:00
configs ARM: SoC platform updates for 4.16 2018-02-01 16:17:40 -08:00
crypto crypto: hash - annotate algorithms taking optional key 2018-01-12 23:03:35 +11:00
firmware
include lib: optimize cpumask_next_and() 2018-02-06 18:32:44 -08:00
kernel Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm 2018-02-02 09:50:51 -08:00
kvm GICv4 Support for KVM/ARM for v4.15 2017-11-17 13:20:01 +01:00
lib Merge branches 'fixes', 'misc', 'sa1111' and 'sa1100-for-next' into for-next 2018-01-21 15:38:10 +00:00
mach-actions ARM: SoC platform updates for 4.15 2017-11-16 14:05:12 -08:00
mach-alpine License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-artpec
mach-asm9260
mach-aspeed
mach-at91 License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-axxia License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-bcm soc: brcmstb: biuctrl: Move to early_initcall 2017-12-20 17:37:44 -08:00
mach-berlin
mach-clps711x License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-cns3xxx License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-davinci Merge branch 'i2c/for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2018-02-04 10:57:43 -08:00
mach-digicolor License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-dove License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-ebsa110 License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-efm32 License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-ep93xx ARM: ep93xx: ts72xx: Add support for BK3 board - ts72xx derivative 2017-12-13 22:26:10 +01:00
mach-exynos ARM: EXYNOS: Add SPDX license identifiers 2018-01-03 18:36:22 +01:00
mach-footbridge Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-11-13 17:56:58 -08:00
mach-gemini License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-highbank License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-hisi License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-imx ARM: imx: remove unused imx3 pm definitions 2017-12-26 16:30:20 +08:00
mach-integrator ARM: SoC platform updates for 4.15 2017-11-16 14:05:12 -08:00
mach-iop13xx License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-iop32x treewide: setup_timer() -> timer_setup() 2017-11-21 15:57:07 -08:00
mach-iop33x License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-ixp4xx w1: w1-gpio: Convert to use GPIO descriptors 2017-12-08 15:32:53 +01:00
mach-keystone License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-ks8695 Merge branch 'i2c/for-4.15' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2017-11-14 17:52:21 -08:00
mach-lpc18xx
mach-lpc32xx
mach-mediatek ARM: mediatek: use more generic prompts for SoCs with ARMv7 2017-12-20 15:48:18 +01:00
mach-meson Amlogic 32-bit DT changes for v4.16 2017-12-21 16:37:34 +01:00
mach-mmp ARM: pxa: move header file out of I2C realm 2017-11-28 22:49:30 +01:00
mach-moxart
mach-mv78xx0 License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-mvebu License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-mxs ARM: mxs: constify platform_suspend_ops 2017-09-20 22:21:08 +08:00
mach-netx
mach-nomadik
mach-nspire
mach-omap1 ARM: SoC platform updates for 4.15 2017-11-16 14:05:12 -08:00
mach-omap2 ARM: SoC driver updates for 4.16 2018-02-01 16:35:31 -08:00
mach-orion5x treewide: setup_timer() -> timer_setup() 2017-11-21 15:57:07 -08:00
mach-oxnas
mach-picoxcell
mach-prima2 License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-pxa ARM: SoC platform updates for 4.16 2018-02-01 16:17:40 -08:00
mach-qcom
mach-realview
mach-rockchip Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm 2017-09-12 06:10:44 -07:00
mach-rpc License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-s3c24xx ARM: S3C24XX: Add SPDX license identifiers 2018-01-03 18:36:43 +01:00
mach-s3c64xx ARM: S3C64XX: Add SPDX license identifiers 2018-01-03 18:42:53 +01:00
mach-s5pv210 ARM: S5PV210: Add SPDX license identifiers 2018-01-03 18:43:04 +01:00
mach-sa1100 ARM: sa1100/neponset: add GPIO drivers for control and modem registers 2018-01-01 00:50:05 +00:00
mach-shmobile ARM: SoC platform updates for 4.15 2017-11-16 14:05:12 -08:00
mach-socfpga License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-spear License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-sti
mach-stm32
mach-sunxi ARM: sunxi: add support for R40 SoC 2017-09-22 21:57:09 +02:00
mach-tango License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-tegra Merge branch 'linus' into locking/core, to resolve conflicts 2017-11-07 10:32:44 +01:00
mach-u300 License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-uniphier kbuild: remove all dummy assignments to obj- 2017-11-18 11:46:06 +09:00
mach-ux500 License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-versatile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-vexpress ARM: SoC platform updates for 4.15 2017-11-16 14:05:12 -08:00
mach-vt8500 License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-w90x900 License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-zx License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mach-zynq License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mm Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm 2018-02-02 09:50:51 -08:00
net bpf, arm: remove obsolete exception handling from div/mod 2018-01-26 16:42:07 -08:00
nwfpe License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
oprofile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
plat-iop License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
plat-omap ARM: SoC platform updates for 4.15 2017-11-16 14:05:12 -08:00
plat-orion
plat-pxa
plat-samsung ARM: SAMSUNG: Add SPDX license identifiers 2018-01-03 18:43:13 +01:00
plat-versatile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
probes ARM: probes: avoid adding kprobes to sensitive kernel-entry/exit code 2017-12-17 22:14:21 +00:00
tools ARM: ep93xx: ts72xx: Add support for BK3 board - ts72xx derivative 2017-12-13 22:26:10 +01:00
vdso Merge branch 'linus' into locking/core, to resolve conflicts 2017-11-07 10:32:44 +01:00
vfp signal/arm: Document conflicts with SI_USER and SIGFPE 2018-01-12 14:21:05 -06:00
xen xen: re-introduce support for grant v2 interface 2017-11-06 15:50:17 -05:00
Kconfig Currently, hardened usercopy performs dynamic bounds checking on slab 2018-02-03 16:25:42 -08:00
Kconfig-nommu Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm 2017-11-16 12:50:35 -08:00
Kconfig.debug ARM: 8737/1: mm: dump: add checking for writable and executable 2018-01-21 15:32:20 +00:00
Makefile ARM: 8723/2: always assume the "unified" syntax for assembly code 2017-12-17 22:14:21 +00:00