linux/lib
Linus Torvalds 596ff4a09b cpumask: re-introduce constant-sized cpumask optimizations
Commit aa47a7c215 ("lib/cpumask: deprecate nr_cpumask_bits") resulted
in the cpumask operations potentially becoming hugely less efficient,
because suddenly the cpumask was always considered to be variable-sized.

The optimization was then later added back in a limited form by commit
6f9c07be9d ("lib/cpumask: add FORCE_NR_CPUS config option"), but that
FORCE_NR_CPUS option is not useful in a generic kernel and more of a
special case for embedded situations with fixed hardware.

Instead, just re-introduce the optimization, with some changes.

Instead of depending on CPUMASK_OFFSTACK being false, and then always
using the full constant cpumask width, this introduces three different
cpumask "sizes":

 - the exact size (nr_cpumask_bits) remains identical to nr_cpu_ids.

   This is used for situations where we should use the exact size.

 - the "small" size (small_cpumask_bits) is the NR_CPUS constant if it
   fits in a single word and the bitmap operations thus end up able
   to trigger the "small_const_nbits()" optimizations.

   This is used for the operations that have optimized single-word
   cases that get inlined, notably the bit find and scanning functions.

 - the "large" size (large_cpumask_bits) is the NR_CPUS constant if it
   is an sufficiently small constant that makes simple "copy" and
   "clear" operations more efficient.

   This is arbitrarily set at four words or less.

As a an example of this situation, without this fixed size optimization,
cpumask_clear() will generate code like

        movl    nr_cpu_ids(%rip), %edx
        addq    $63, %rdx
        shrq    $3, %rdx
        andl    $-8, %edx
        callq   memset@PLT

on x86-64, because it would calculate the "exact" number of longwords
that need to be cleared.

In contrast, with this patch, using a MAX_CPU of 64 (which is quite a
reasonable value to use), the above becomes a single

	movq $0,cpumask

instruction instead, because instead of caring to figure out exactly how
many CPU's the system has, it just knows that the cpumask will be a
single word and can just clear it all.

Note that this does end up tightening the rules a bit from the original
version in another way: operations that set bits in the cpumask are now
limited to the actual nr_cpu_ids limit, whereas we used to do the
nr_cpumask_bits thing almost everywhere in the cpumask code.

But if you just clear bits, or scan for bits, we can use the simpler
compile-time constants.

In the process, remove 'cpumask_complement()' and 'for_each_cpu_not()'
which were not useful, and which fundamentally have to be limited to
'nr_cpu_ids'.  Better remove them now than have somebody introduce use
of them later.

Of course, on x86-64 with MAXSMP there is no sane small compile-time
constant for the cpumask sizes, and we end up using the actual CPU bits,
and will generate the above kind of horrors regardless.  Please don't
use MAXSMP unless you really expect to have machines with thousands of
cores.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-05 14:30:34 -08:00
..
842
crypto
dim
fonts
kunit kunit: Fix 'hooks.o' build by recursing into kunit 2023-02-27 14:49:03 -08:00
livepatch
lz4
lzo
math
mpi
pldmfw
raid6
reed_solomon
test_fortify
vdso
xz
zlib_deflate lib/zlib: DFLTCC deflate does not write all available bits for Z_NO_FLUSH 2023-02-27 17:00:14 -08:00
zlib_dfltcc lib/zlib: remove redundation assignement of avail_in dfltcc_gdht() 2023-02-02 22:50:10 -08:00
zlib_inflate lib/zlib: Split deflate and inflate states for DFLTCC 2023-02-02 22:50:09 -08:00
zstd
.gitignore
argv_split.c
ashldi3.c
ashrdi3.c
asn1_decoder.c
asn1_encoder.c
assoc_array.c
atomic64_test.c
atomic64.c
audit.c
base64.c
bcd.c
bch.c
bitfield_kunit.c
bitmap.c
bitrev.c
bootconfig-data.S
bootconfig.c
bsearch.c
btree.c
bucket_locks.c
bug.c
build_OID_registry
buildid.c
bust_spinlocks.c
check_signature.c
checksum.c
clz_ctz.c
clz_tab.c
cmdline_kunit.c
cmdline.c
cmpdi2.c
compat_audit.c
cpu_rmap.c
cpumask_kunit.c cpumask: re-introduce constant-sized cpumask optimizations 2023-03-05 14:30:34 -08:00
cpumask.c lib/cpumask: update comment for cpumask_local_spread() 2023-02-07 18:20:00 -08:00
crc-ccitt.c
crc-itu-t.c
crc-t10dif.c
crc4.c
crc7.c
crc8.c
crc16.c
crc32.c
crc32defs.h
crc32test.c
crc64-rocksoft.c
crc64.c
ctype.c
debug_info.c
debug_locks.c
debugobjects.c
dec_and_lock.c
decompress_bunzip2.c
decompress_inflate.c
decompress_unlz4.c
decompress_unlzma.c
decompress_unlzo.c
decompress_unxz.c
decompress_unzstd.c
decompress.c
devmem_is_allowed.c
devres.c
dhry_1.c
dhry_2.c
dhry_run.c
dhry.h
digsig.c
dump_stack.c
dynamic_debug.c
dynamic_queue_limits.c
earlycpio.c
errname.c printf: fix errname.c list 2023-02-15 15:44:43 +01:00
error-inject.c
errseq.c
extable.c
fault-inject-usercopy.c
fault-inject.c
fdt_addresses.c
fdt_empty_tree.c
fdt_ro.c
fdt_rw.c
fdt_strerror.c
fdt_sw.c
fdt_wip.c
fdt.c
find_bit_benchmark.c
find_bit.c lib/find: introduce find_nth_and_andnot_bit 2023-02-07 18:20:00 -08:00
flex_proportions.c
fortify_kunit.c
gen_crc32table.c
gen_crc64table.c
genalloc.c
generic-radix-tree.c
glob.c
globtest.c
group_cpus.c
hashtable_test.c lib/hashtable_test.c: add test for the hashtable structure 2023-02-08 14:28:17 -07:00
hexdump.c
hweight.c
idr.c
inflate.c
interval_tree_test.c
interval_tree.c
iomap_copy.c
iomap.c
iommu-helper.c
iov_iter.c 46 fs/cifs (smb3 client) changesets, 37 in fs/cifs and 9 for related helper functions and cleanup outside from Dave Howells and Willy 2023-02-22 17:12:44 -08:00
irq_poll.c
irq_regs.c
is_signed_type_kunit.c
is_single_threaded.c
kasprintf.c
Kconfig
Kconfig.debug There is no particular theme here - mainly quick hits all over the tree. 2023-02-23 17:55:40 -08:00
Kconfig.kasan kasan: treat meminstrinsic as builtins in uninstrumented files 2023-03-02 21:54:22 -08:00
Kconfig.kcsan Kernel concurrency sanitizer (KCSAN) updates for v6.3 2023-02-25 13:02:20 -08:00
Kconfig.kfence
Kconfig.kgdb
Kconfig.kmsan
Kconfig.ubsan
kfifo.c
klist.c
kobject_uevent.c
kobject.c kobject: make dynamic_kobj_ktype and kset_ktype const 2023-02-08 13:34:20 +01:00
kstrtox.c
kstrtox.h
libcrc32c.c
linear_ranges.c
list_debug.c
list_sort.c
list-test.c
llist.c
locking-selftest-hardirq.h
locking-selftest-mutex.h
locking-selftest-rlock-hardirq.h
locking-selftest-rlock-softirq.h
locking-selftest-rlock.h
locking-selftest-rsem.h
locking-selftest-rtmutex.h
locking-selftest-softirq.h
locking-selftest-spin-hardirq.h
locking-selftest-spin-softirq.h
locking-selftest-spin.h
locking-selftest-wlock-hardirq.h
locking-selftest-wlock-softirq.h
locking-selftest-wlock.h
locking-selftest-wsem.h
locking-selftest.c
lockref.c
logic_iomem.c
logic_pio.c
lru_cache.c
lshrdi3.c
Makefile kunit: Fix 'hooks.o' build by recursing into kunit 2023-02-27 14:49:03 -08:00
maple_tree.c maple_tree: reduce stack usage with gcc-9 and earlier 2023-02-16 20:43:55 -08:00
memcat_p.c
memcpy_kunit.c
memory-notifier-error-inject.c
memregion.c
memweight.c
muldi3.c
net_utils.c
netdev-notifier-error-inject.c
nlattr.c
nmi_backtrace.c
notifier-error-inject.c
notifier-error-inject.h
objagg.c
of-reconfig-notifier-error-inject.c
oid_registry.c
once.c
overflow_kunit.c
packing.c
parman.c
parser.c lib: parser: update documentation for match_NUMBER functions 2023-03-02 21:54:22 -08:00
pci_iomap.c
percpu_counter.c
percpu_test.c
percpu-refcount.c
plist.c
pm-notifier-error-inject.c
polynomial.c
radix-tree.c
random32.c
ratelimit.c
rbtree_test.c
rbtree.c
ref_tracker.c
refcount.c
rhashtable.c
sbitmap.c
scatterlist.c
seq_buf.c
sg_pool.c
sg_split.c
show_mem.c
siphash_kunit.c
siphash.c
slub_kunit.c
smp_processor_id.c
sort.c
stackdepot.c lib/stackdepot: move documentation comments to stackdepot.h 2023-02-16 20:43:52 -08:00
stackinit_kunit.c kernel/range: Uplevel the cxl subsystem's range_contains() helper 2023-02-10 17:32:37 -08:00
stmp_device.c
string_helpers.c
string.c
strncpy_from_user.c
strnlen_user.c
strscpy_kunit.c
syscall.c
test_bitmap.c
test_bitops.c
test_bits.c
test_blackhole_dev.c
test_bpf.c
test_debug_virtual.c
test_dynamic_debug.c
test_firmware.c
test_fprobe.c
test_fpu.c
test_free_pages.c
test_hash.c
test_hexdump.c
test_hmm_uapi.h
test_hmm.c
test_ida.c
test_kmod.c
test_kprobes.c test_kprobes: Add recursed kprobe test case 2023-02-21 08:52:42 +09:00
test_linear_ranges.c
test_list_sort.c
test_lockup.c
test_maple_tree.c test_maple_tree: test modifications while iterating 2023-02-09 16:51:31 -08:00
test_memcat_p.c
test_meminit.c
test_min_heap.c
test_module.c
test_objagg.c
test_parman.c
test_printf.c
test_ref_tracker.c
test_rhashtable.c
test_scanf.c
test_sort.c
test_static_key_base.c
test_static_keys.c
test_string.c
test_sysctl.c
test_ubsan.c
test_user_copy.c
test_uuid.c
test_vmalloc.c
test_xarray.c
test-kstrtox.c
test-string_helpers.c
textsearch.c
timerqueue.c
trace_readwrite.c
ts_bm.c
ts_fsm.c
ts_kmp.c
ubsan.c hardening updates for v6.3-rc1 2023-02-21 11:07:23 -08:00
ubsan.h arm64: Support Clang UBSAN trap codes for better reporting 2023-02-08 15:26:58 -08:00
ucmpdi2.c
ucs2_string.c
usercopy.c uaccess: Add speculation barrier to copy_from_user() 2023-02-21 14:45:22 -08:00
uuid.c
vsprintf.c
win_minmax.c
xarray.c
xxhash.c