linux/lib
Linus Torvalds 596ff4a09b cpumask: re-introduce constant-sized cpumask optimizations
Commit aa47a7c215 ("lib/cpumask: deprecate nr_cpumask_bits") resulted
in the cpumask operations potentially becoming hugely less efficient,
because suddenly the cpumask was always considered to be variable-sized.

The optimization was then later added back in a limited form by commit
6f9c07be9d ("lib/cpumask: add FORCE_NR_CPUS config option"), but that
FORCE_NR_CPUS option is not useful in a generic kernel and more of a
special case for embedded situations with fixed hardware.

Instead, just re-introduce the optimization, with some changes.

Instead of depending on CPUMASK_OFFSTACK being false, and then always
using the full constant cpumask width, this introduces three different
cpumask "sizes":

 - the exact size (nr_cpumask_bits) remains identical to nr_cpu_ids.

   This is used for situations where we should use the exact size.

 - the "small" size (small_cpumask_bits) is the NR_CPUS constant if it
   fits in a single word and the bitmap operations thus end up able
   to trigger the "small_const_nbits()" optimizations.

   This is used for the operations that have optimized single-word
   cases that get inlined, notably the bit find and scanning functions.

 - the "large" size (large_cpumask_bits) is the NR_CPUS constant if it
   is an sufficiently small constant that makes simple "copy" and
   "clear" operations more efficient.

   This is arbitrarily set at four words or less.

As a an example of this situation, without this fixed size optimization,
cpumask_clear() will generate code like

        movl    nr_cpu_ids(%rip), %edx
        addq    $63, %rdx
        shrq    $3, %rdx
        andl    $-8, %edx
        callq   memset@PLT

on x86-64, because it would calculate the "exact" number of longwords
that need to be cleared.

In contrast, with this patch, using a MAX_CPU of 64 (which is quite a
reasonable value to use), the above becomes a single

	movq $0,cpumask

instruction instead, because instead of caring to figure out exactly how
many CPU's the system has, it just knows that the cpumask will be a
single word and can just clear it all.

Note that this does end up tightening the rules a bit from the original
version in another way: operations that set bits in the cpumask are now
limited to the actual nr_cpu_ids limit, whereas we used to do the
nr_cpumask_bits thing almost everywhere in the cpumask code.

But if you just clear bits, or scan for bits, we can use the simpler
compile-time constants.

In the process, remove 'cpumask_complement()' and 'for_each_cpu_not()'
which were not useful, and which fundamentally have to be limited to
'nr_cpu_ids'.  Better remove them now than have somebody introduce use
of them later.

Of course, on x86-64 with MAXSMP there is no sane small compile-time
constant for the cpumask sizes, and we end up using the actual CPU bits,
and will generate the above kind of horrors regardless.  Please don't
use MAXSMP unless you really expect to have machines with thousands of
cores.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-05 14:30:34 -08:00
..
842
crypto crypto: lib/blake2s - Split up test function to halve stack usage 2022-12-30 22:56:27 +08:00
dim
fonts lib/fonts: fix undefined behavior in bit shift for get_default_font 2022-11-18 13:55:09 -08:00
kunit kunit: Fix 'hooks.o' build by recursing into kunit 2023-02-27 14:49:03 -08:00
livepatch
lz4
lzo
math math64: favor kernel-doc from header files 2022-11-21 14:30:53 -07:00
mpi lib/mpi: Fix buffer overrun when SG is too long 2023-01-06 17:15:46 +08:00
pldmfw
raid6 for-6.2/block-2022-12-08 2022-12-13 10:43:59 -08:00
reed_solomon treewide: use get_random_u32_below() instead of deprecated function 2022-11-18 02:15:15 +01:00
test_fortify
vdso lib/vdso: use "grep -E" instead of "egrep" 2022-11-23 19:50:15 +01:00
xz
zlib_deflate lib/zlib: DFLTCC deflate does not write all available bits for Z_NO_FLUSH 2023-02-27 17:00:14 -08:00
zlib_dfltcc lib/zlib: remove redundation assignement of avail_in dfltcc_gdht() 2023-02-02 22:50:10 -08:00
zlib_inflate lib/zlib: Split deflate and inflate states for DFLTCC 2023-02-02 22:50:09 -08:00
zstd zstd: import usptream v1.5.2 2022-10-24 12:12:32 -07:00
.gitignore
argv_split.c
ashldi3.c
ashrdi3.c
asn1_decoder.c
asn1_encoder.c
assoc_array.c
atomic64.c
atomic64_test.c
audit.c
base64.c
bcd.c
bch.c
bitfield_kunit.c
bitmap.c lib/bitmap: remove bitmap_ord_to_pos 2022-09-26 12:19:12 -07:00
bitrev.c
bootconfig-data.S
bootconfig.c
bsearch.c
btree.c
bucket_locks.c
bug.c cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG 2023-01-31 15:01:45 +01:00
build_OID_registry
buildid.c
bust_spinlocks.c
check_signature.c
checksum.c
clz_ctz.c
clz_tab.c
cmdline.c
cmdline_kunit.c treewide: use get_random_{u8,u16}() when possible, part 1 2022-10-11 17:42:58 -06:00
cmpdi2.c
compat_audit.c
cpu_rmap.c
cpumask.c lib/cpumask: update comment for cpumask_local_spread() 2023-02-07 18:20:00 -08:00
cpumask_kunit.c cpumask: re-introduce constant-sized cpumask optimizations 2023-03-05 14:30:34 -08:00
crc-ccitt.c
crc-itu-t.c
crc-t10dif.c
crc4.c
crc7.c
crc8.c
crc16.c
crc32.c
crc32defs.h
crc32test.c
crc64-rocksoft.c
crc64.c
ctype.c
debug_info.c
debug_locks.c
debugobjects.c Non-MM patches for 6.2-rc1. 2022-12-12 17:28:58 -08:00
dec_and_lock.c perf: Fix perf_event_pmu_context serialization 2023-01-31 20:37:18 +01:00
decompress.c
decompress_bunzip2.c
decompress_inflate.c
decompress_unlz4.c
decompress_unlzma.c
decompress_unlzo.c
decompress_unxz.c
decompress_unzstd.c
devmem_is_allowed.c
devres.c
dhry.h lib: add Dhrystone benchmark test 2023-02-02 22:50:01 -08:00
dhry_1.c lib: add Dhrystone benchmark test 2023-02-02 22:50:01 -08:00
dhry_2.c lib: add Dhrystone benchmark test 2023-02-02 22:50:01 -08:00
dhry_run.c lib: add Dhrystone benchmark test 2023-02-02 22:50:01 -08:00
digsig.c
dump_stack.c
dynamic_debug.c
dynamic_queue_limits.c
earlycpio.c
errname.c printf: fix errname.c list 2023-02-15 15:44:43 +01:00
error-inject.c error-injection: remove EI_ETYPE_NONE 2023-02-02 22:50:00 -08:00
errseq.c
extable.c
fault-inject-usercopy.c
fault-inject.c fault-injection: make stacktrace filter works as expected 2022-12-15 16:40:44 -08:00
fdt.c
fdt_addresses.c
fdt_empty_tree.c
fdt_ro.c
fdt_rw.c
fdt_strerror.c
fdt_sw.c
fdt_wip.c
find_bit.c lib/find: introduce find_nth_and_andnot_bit 2023-02-07 18:20:00 -08:00
find_bit_benchmark.c treewide: use get_random_u32_below() instead of deprecated function 2022-11-18 02:15:15 +01:00
flex_proportions.c
fortify_kunit.c kunit/fortify: Validate __alloc_size attribute results 2022-11-22 21:08:28 -08:00
gen_crc32table.c
gen_crc64table.c
genalloc.c lib/genalloc: use try_cmpxchg in {set,clear}_bits_ll 2023-02-02 22:50:05 -08:00
generic-radix-tree.c
glob.c
globtest.c
group_cpus.c genirq/affinity: Only build SMP-only helper functions on SMP kernels 2023-01-18 12:16:47 +01:00
hashtable_test.c lib/hashtable_test.c: add test for the hashtable structure 2023-02-08 14:28:17 -07:00
hexdump.c
hweight.c
idr.c
inflate.c
interval_tree.c interval-tree: Add a utility to iterate over spans in an interval tree 2022-11-29 16:34:15 -04:00
interval_tree_test.c
iomap.c kmsan: add iomap support 2022-10-03 14:03:21 -07:00
iomap_copy.c
iommu-helper.c
iov_iter.c 46 fs/cifs (smb3 client) changesets, 37 in fs/cifs and 9 for related helper functions and cleanup outside from Dave Howells and Willy 2023-02-22 17:12:44 -08:00
irq_poll.c
irq_regs.c
is_signed_type_kunit.c lib: assume char is unsigned 2022-11-19 00:56:15 +01:00
is_single_threaded.c
kasprintf.c
Kconfig iommufd for 6.2 2022-12-14 09:15:43 -08:00
Kconfig.debug There is no particular theme here - mainly quick hits all over the tree. 2023-02-23 17:55:40 -08:00
Kconfig.kasan kasan: treat meminstrinsic as builtins in uninstrumented files 2023-03-02 21:54:22 -08:00
Kconfig.kcsan Kernel concurrency sanitizer (KCSAN) updates for v6.3 2023-02-25 13:02:20 -08:00
Kconfig.kfence
Kconfig.kgdb parisc: Convert PDC console to an early console 2022-10-11 12:01:24 +02:00
Kconfig.kmsan kmsan: make sure PREEMPT_RT is off 2022-11-08 15:57:24 -08:00
Kconfig.ubsan
kfifo.c
klist.c
kobject.c kobject: make dynamic_kobj_ktype and kset_ktype const 2023-02-08 13:34:20 +01:00
kobject_uevent.c
kstrtox.c
kstrtox.h
libcrc32c.c
linear_ranges.c
list-test.c
list_debug.c
list_sort.c
llist.c llist: avoid extra memory read in llist_add_batch 2022-11-18 13:55:06 -08:00
locking-selftest-hardirq.h
locking-selftest-mutex.h
locking-selftest-rlock-hardirq.h
locking-selftest-rlock-softirq.h
locking-selftest-rlock.h
locking-selftest-rsem.h
locking-selftest-rtmutex.h
locking-selftest-softirq.h
locking-selftest-spin-hardirq.h
locking-selftest-spin-softirq.h
locking-selftest-spin.h
locking-selftest-wlock-hardirq.h
locking-selftest-wlock-softirq.h
locking-selftest-wlock.h
locking-selftest-wsem.h
locking-selftest.c
lockref.c lockref: stop doing cpu_relax in the cmpxchg loop 2023-01-13 14:35:38 -06:00
logic_iomem.c
logic_pio.c
lru_cache.c lru_cache: remove unused lc_private, lc_set, lc_index_of 2022-11-22 19:38:39 -07:00
lshrdi3.c
Makefile kunit: Fix 'hooks.o' build by recursing into kunit 2023-02-27 14:49:03 -08:00
maple_tree.c maple_tree: reduce stack usage with gcc-9 and earlier 2023-02-16 20:43:55 -08:00
memcat_p.c
memcpy_kunit.c kunit: memcpy: Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TEST 2023-01-25 12:24:40 -08:00
memory-notifier-error-inject.c
memregion.c
memweight.c
muldi3.c
net_utils.c mac_pton: Don't access memory over expected length 2022-11-09 19:28:02 -08:00
netdev-notifier-error-inject.c
nlattr.c netlink: prevent potential spectre v1 gadgets 2023-01-20 17:52:32 -08:00
nmi_backtrace.c x86/nmi: Print reasons why backtrace NMIs are ignored 2023-01-19 15:55:12 -08:00
notifier-error-inject.c lib/notifier-error-inject: fix error when writing -errno to debugfs file 2022-11-30 16:13:16 -08:00
notifier-error-inject.h
objagg.c
of-reconfig-notifier-error-inject.c
oid_registry.c lib/oid_registry.c: remove redundant assignment to variable num 2022-11-18 13:55:06 -08:00
once.c once: rename _SLOW to _SLEEPABLE 2022-10-03 17:34:32 -07:00
overflow_kunit.c overflow: Introduce overflows_type() and castable_to_type() 2022-11-02 12:39:27 -07:00
packing.c lib: packing: replace bit_reverse() with bitrev8() 2022-12-12 15:06:30 -08:00
parman.c
parser.c lib: parser: update documentation for match_NUMBER functions 2023-03-02 21:54:22 -08:00
pci_iomap.c
percpu-refcount.c percpu-refcount: Use call_rcu_hurry() for atomic switch 2022-11-30 13:16:40 -08:00
percpu_counter.c lib/percpu_counter: percpu_counter_add_batch() overflow/underflow 2023-02-02 22:50:01 -08:00
percpu_test.c
plist.c
pm-notifier-error-inject.c
polynomial.c
radix-tree.c lib/radix-tree.c: fix uninitialized variable compilation warning 2022-11-30 16:13:17 -08:00
random32.c treewide: use get_random_bytes() when possible 2022-10-11 17:42:58 -06:00
ratelimit.c
rbtree.c
rbtree_test.c
ref_tracker.c
refcount.c
rhashtable.c rhashtable: Allow rhashtable to be used from irq-safe contexts 2022-12-09 10:42:56 +00:00
sbitmap.c sbitmap: correct wake_batch recalculation to avoid potential IO hung 2023-01-29 20:03:01 -07:00
scatterlist.c lib/scatterlist: Fix to calculate the last_pg properly 2023-01-16 12:08:31 -04:00
seq_buf.c
sg_pool.c
sg_split.c
show_mem.c mm: reduce noise in show_mem for lowmem allocations 2022-09-26 19:46:29 -07:00
siphash.c
siphash_kunit.c siphash: Convert selftest to KUnit 2022-11-01 10:04:52 -07:00
slub_kunit.c linux-kselftest-kunit-next-6.2-rc1 2022-12-12 16:42:57 -08:00
smp_processor_id.c
sort.c
stackdepot.c lib/stackdepot: move documentation comments to stackdepot.h 2023-02-16 20:43:52 -08:00
stackinit_kunit.c kernel/range: Uplevel the cxl subsystem's range_contains() helper 2023-02-10 17:32:37 -08:00
stmp_device.c
string.c lib/string: Use strchr() in strpbrk() 2023-01-27 11:42:57 -08:00
string_helpers.c
strncpy_from_user.c
strnlen_user.c
strscpy_kunit.c fortify: Short-circuit known-safe calls to strscpy() 2022-11-01 10:04:52 -07:00
syscall.c
test-kstrtox.c
test-string_helpers.c treewide: use get_random_u32_below() instead of deprecated function 2022-11-18 02:15:15 +01:00
test_bitmap.c lib/bitmap: add tests for for_each() loops 2022-10-01 10:22:58 -07:00
test_bitops.c
test_bits.c
test_blackhole_dev.c
test_bpf.c net: remove skb->vlan_present 2022-11-11 18:18:05 -08:00
test_debug_virtual.c
test_dynamic_debug.c
test_firmware.c test_firmware: Use kstrtobool() instead of strtobool() 2023-01-20 14:09:47 +01:00
test_fprobe.c treewide: use get_random_u32_{above,below}() instead of manual loop 2022-11-18 02:15:22 +01:00
test_fpu.c
test_free_pages.c
test_hash.c
test_hexdump.c treewide: use get_random_u32_inclusive() when possible 2022-11-18 02:18:02 +01:00
test_hmm.c hmm-tests: add test for migrate_device_range() 2022-10-12 18:51:50 -07:00
test_hmm_uapi.h hmm-tests: add test for migrate_device_range() 2022-10-12 18:51:50 -07:00
test_ida.c
test_kmod.c test_kmod: stop kernel-doc warnings 2023-01-25 14:07:21 -08:00
test_kprobes.c test_kprobes: Add recursed kprobe test case 2023-02-21 08:52:42 +09:00
test_linear_ranges.c lib/test_linear_ranges: Use LINEAR_RANGE() 2022-11-16 13:32:32 +00:00
test_list_sort.c treewide: use get_random_u32_below() instead of deprecated function 2022-11-18 02:15:15 +01:00
test_lockup.c
test_maple_tree.c test_maple_tree: test modifications while iterating 2023-02-09 16:51:31 -08:00
test_memcat_p.c
test_meminit.c lib/test_meminit: add checks for the allocation functions 2022-10-12 18:51:49 -07:00
test_min_heap.c treewide: use get_random_u32() when possible 2022-10-11 17:42:58 -06:00
test_module.c
test_objagg.c treewide: use get_random_bytes() when possible 2022-10-11 17:42:58 -06:00
test_parman.c
test_printf.c mm: discard __GFP_ATOMIC 2023-02-02 22:33:13 -08:00
test_ref_tracker.c
test_rhashtable.c Networking changes for 6.2. 2022-12-13 15:47:48 -08:00
test_scanf.c
test_sort.c
test_static_key_base.c
test_static_keys.c
test_string.c
test_sysctl.c testing: use the copyleft-next-0.3.1 SPDX tag 2022-11-08 15:44:02 +01:00
test_ubsan.c
test_user_copy.c
test_uuid.c
test_vmalloc.c lib/test_vmalloc.c: add parameter use_huge for fix_size_alloc_test 2023-01-18 17:12:41 -08:00
test_xarray.c
textsearch.c
timerqueue.c
trace_readwrite.c asm-generic/io: Add _RET_IP_ to MMIO trace for more accurate debug info 2022-11-21 22:02:10 +01:00
ts_bm.c
ts_fsm.c
ts_kmp.c
ubsan.c hardening updates for v6.3-rc1 2023-02-21 11:07:23 -08:00
ubsan.h arm64: Support Clang UBSAN trap codes for better reporting 2023-02-08 15:26:58 -08:00
ucmpdi2.c
ucs2_string.c
usercopy.c uaccess: Add speculation barrier to copy_from_user() 2023-02-21 14:45:22 -08:00
uuid.c treewide: use get_random_bytes() when possible 2022-10-11 17:42:58 -06:00
vsprintf.c Random number generator updates for Linux 6.2-rc1. 2022-12-12 16:22:22 -08:00
win_minmax.c lib/win_minmax: use /* notation for regular comments 2023-01-11 16:14:21 -08:00
xarray.c
xxhash.c