Commit graph

1279567 commits

Author SHA1 Message Date
Andrey Konovalov 2e577732e8 kasan, fortify: properly rename memintrinsics
After commit 69d4c0d321 ("entry, kasan, x86: Disallow overriding mem*()
functions") and the follow-up fixes, with CONFIG_FORTIFY_SOURCE enabled,
even though the compiler instruments memintrinsics by generating calls to
__asan_/__hwasan_-prefixed functions, FORTIFY_SOURCE still uses
uninstrumented memset/memmove/memcpy as the underlying functions.

As a result, KASAN cannot detect bad accesses in memset/memmove/memcpy. 
This also makes KASAN tests corrupt kernel memory and cause crashes.

To fix this, use __asan_/__hwasan_memset/memmove/memcpy as the underlying
functions whenever appropriate.  Do this only for the instrumented code
(as indicated by __SANITIZE_ADDRESS__).

Link: https://lkml.kernel.org/r/20240517130118.759301-1-andrey.konovalov@linux.dev
Fixes: 69d4c0d321 ("entry, kasan, x86: Disallow overriding mem*() functions")
Fixes: 51287dcb00 ("kasan: emit different calls for instrumentable memintrinsics")
Fixes: 36be5cba99 ("kasan: treat meminstrinsic as builtins in uninstrumented files")
Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com>
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Reported-by: Nico Pache <npache@redhat.com>
Closes: https://lore.kernel.org/all/20240501144156.17e65021@outsider.home/
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Nico Pache <npache@redhat.com>
Acked-by: Nico Pache <npache@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24 11:55:05 -07:00
Suren Baghdasaryan a38568a0b4 lib: add version into /proc/allocinfo output
Add version string and a header at the beginning of /proc/allocinfo to
allow later format changes.  Example output:

> head /proc/allocinfo
allocinfo - version: 1.0
#     <size>  <calls> <tag info>
           0        0 init/main.c:1314 func:do_initcalls
           0        0 init/do_mounts.c:353 func:mount_nodev_root
           0        0 init/do_mounts.c:187 func:mount_root_generic
           0        0 init/do_mounts.c:158 func:do_mount_root
           0        0 init/initramfs.c:493 func:unpack_to_rootfs
           0        0 init/initramfs.c:492 func:unpack_to_rootfs
           0        0 init/initramfs.c:491 func:unpack_to_rootfs
         512        1 arch/x86/events/rapl.c:681 func:init_rapl_pmus
         128        1 arch/x86/events/rapl.c:571 func:rapl_cpu_online
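
For tools that read this file, the version line gives them something to check before parsing the rows. The following is a minimal Python sketch, assuming the layout is exactly as in the sample above (a version line, a `#` column header, then `<size> <calls> <tag info>` rows). The name `parse_allocinfo`, the `SUPPORTED_MAJOR` constant, and the policy of rejecting unknown major versions are illustrative assumptions, not part of the patch.

```python
from typing import List, Tuple

SUPPORTED_MAJOR = 1  # assumption: this reader was written against version 1.x


def parse_allocinfo(text: str) -> Tuple[str, List[Tuple[int, int, str]]]:
    """Parse allocinfo-style text into (version, [(size, calls, tag), ...])."""
    lines = text.splitlines()
    prefix = "allocinfo - version: "
    if not lines or not lines[0].startswith(prefix):
        raise ValueError("missing allocinfo version line")
    version = lines[0][len(prefix):].strip()
    if int(version.split(".")[0]) != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported allocinfo version {version}")

    rows = []
    for line in lines[1:]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and the column header
        # Split into at most three fields so the tag keeps its inner spaces.
        size, calls, tag = line.split(None, 2)
        rows.append((int(size), int(calls), tag.strip()))
    return version, rows


# Two rows copied from the sample output above.
SAMPLE = """allocinfo - version: 1.0
#     <size>  <calls> <tag info>
           0        0 init/main.c:1314 func:do_initcalls
         512        1 arch/x86/events/rapl.c:681 func:init_rapl_pmus
"""

version, rows = parse_allocinfo(SAMPLE)
```

On a real system the text would come from reading /proc/allocinfo. Failing loudly on an unexpected version is one possible policy; a tolerant reader could instead fall back to best-effort parsing.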

[akpm@linux-foundation.org: remove stray newline from struct allocinfo_private]
Link: https://lkml.kernel.org/r/20240514163128.3662251-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24 11:55:05 -07:00
Hailong.Liu 8e0545c83d mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
commit a421ef3030 ("mm: allow !GFP_KERNEL allocations for kvmalloc")
includes support for __GFP_NOFAIL, but it presents a conflict with commit
dd544141b9 ("vmalloc: back off when the current task is OOM-killed").  A
possible scenario is as follows:

process-a
__vmalloc_node_range(GFP_KERNEL | __GFP_NOFAIL)
    __vmalloc_area_node()
        vm_area_alloc_pages()
		--> oom-killer sends SIGKILL to process-a
        if (fatal_signal_pending(current)) break;
--> return NULL;

To fix this, do not check fatal_signal_pending() in vm_area_alloc_pages()
if __GFP_NOFAIL is set.
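
The control flow of the fix can be illustrated with a toy model. This is a Python sketch of the logic only, not the kernel code: `alloc_pages_model`, `vmalloc_model`, and the `signal_after` parameter are hypothetical names, and the real vm_area_alloc_pages() also handles page orders, bulk allocation, and NUMA nodes, which are ignored here.

```python
def alloc_pages_model(nr_pages, nofail, signal_after=None):
    """Toy model of the page-allocation loop.

    signal_after: number of pages allocated before a fatal signal
    (e.g. SIGKILL from the OOM killer) becomes pending; None means never.
    Returns the number of pages actually allocated.
    """
    allocated = 0
    while allocated < nr_pages:
        fatal_signal_pending = (
            signal_after is not None and allocated >= signal_after
        )
        # The fix: back off on a fatal signal only when the caller
        # did NOT pass __GFP_NOFAIL.
        if not nofail and fatal_signal_pending:
            break
        allocated += 1
    return allocated


def vmalloc_model(nr_pages, nofail, signal_after=None):
    """Return a fake buffer, or None when the allocation came up short."""
    got = alloc_pages_model(nr_pages, nofail, signal_after)
    return bytearray(got) if got == nr_pages else None


# Without __GFP_NOFAIL the OOM-killed task may see a failed allocation.
killed_plain = vmalloc_model(8, nofail=False, signal_after=3)
# With __GFP_NOFAIL the loop keeps going, so the caller never sees None.
killed_nofail = vmalloc_model(8, nofail=True, signal_after=3)
```

In this model, `killed_plain` is None while `killed_nofail` is a full-size buffer, which is the guarantee callers of __GFP_NOFAIL rely on.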

This issue occurred during OPLUS KASAN TEST. Below is part of the log:
-> oom-killer sends signal to process
[65731.222840] [ T1308] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/apps/uid_10198,task=gs.intelligence,pid=32454,uid=10198

[65731.259685] [T32454] Call trace:
[65731.259698] [T32454]  dump_backtrace+0xf4/0x118
[65731.259734] [T32454]  show_stack+0x18/0x24
[65731.259756] [T32454]  dump_stack_lvl+0x60/0x7c
[65731.259781] [T32454]  dump_stack+0x18/0x38
[65731.259800] [T32454]  mrdump_common_die+0x250/0x39c [mrdump]
[65731.259936] [T32454]  ipanic_die+0x20/0x34 [mrdump]
[65731.260019] [T32454]  atomic_notifier_call_chain+0xb4/0xfc
[65731.260047] [T32454]  notify_die+0x114/0x198
[65731.260073] [T32454]  die+0xf4/0x5b4
[65731.260098] [T32454]  die_kernel_fault+0x80/0x98
[65731.260124] [T32454]  __do_kernel_fault+0x160/0x2a8
[65731.260146] [T32454]  do_bad_area+0x68/0x148
[65731.260174] [T32454]  do_mem_abort+0x151c/0x1b34
[65731.260204] [T32454]  el1_abort+0x3c/0x5c
[65731.260227] [T32454]  el1h_64_sync_handler+0x54/0x90
[65731.260248] [T32454]  el1h_64_sync+0x68/0x6c

[65731.260269] [T32454]  z_erofs_decompress_queue+0x7f0/0x2258
--> be->decompressed_pages = kvcalloc(be->nr_pages, sizeof(struct page *), GFP_KERNEL | __GFP_NOFAIL);
	Kernel panic caused by a NULL pointer dereference.
	erofs assumes kvmalloc with __GFP_NOFAIL never returns NULL.
[65731.260293] [T32454]  z_erofs_runqueue+0xf30/0x104c
[65731.260314] [T32454]  z_erofs_readahead+0x4f0/0x968
[65731.260339] [T32454]  read_pages+0x170/0xadc
[65731.260364] [T32454]  page_cache_ra_unbounded+0x874/0xf30
[65731.260388] [T32454]  page_cache_ra_order+0x24c/0x714
[65731.260411] [T32454]  filemap_fault+0xbf0/0x1a74
[65731.260437] [T32454]  __do_fault+0xd0/0x33c
[65731.260462] [T32454]  handle_mm_fault+0xf74/0x3fe0
[65731.260486] [T32454]  do_mem_abort+0x54c/0x1b34
[65731.260509] [T32454]  el0_da+0x44/0x94
[65731.260531] [T32454]  el0t_64_sync_handler+0x98/0xb4
[65731.260553] [T32454]  el0t_64_sync+0x198/0x19c

Link: https://lkml.kernel.org/r/20240510100131.1865-1-hailong.liu@oppo.com
Fixes: 9376130c39 ("mm/vmalloc: add support for __GFP_NOFAIL")
Signed-off-by: Hailong.Liu <hailong.liu@oppo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Barry Song <21cnbao@gmail.com>
Reported-by: Oven <liyangouwen1@oppo.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24 11:55:04 -07:00
Linus Torvalds f1f9984fdc RISC-V Patches for the 6.10 Merge Window, Part 2
* The compression format used for boot images is now configurable at
   build time, and these formats are shown in `make help`.
 * access_ok() has been optimized.
 * A pair of performance bugs have been fixed in the uaccess handlers.
 * Various fixes and cleanups, including one for the IMSIC build failure
   and one for the early-boot ftrace illegal NOPs bug.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmZQtRwTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiWPBD/0UitwMg88m6urvMd0Pfvwwbu/OnGqW
 TZT8C55iJi/e5f9K4mBrSyjATI8z/MblD+Zz0adX8ygavS4JuQ7DoWwb1yTT3pww
 +z74FkWeJuiar+HfbhQ602CfMrnzvWjnyJ3URemqy5pIBKyvD9gGkDJDZwf8hJTk
 Vqh5qVtnBqFBO9kWpIx+/pLCfpyHVNkhWr1AzKfoqQ1WPIpZ/o0IGdvS88rL+EBR
 QOXiwVhEsRfC+LT6Jhn8l2bGp7PaSRVOid19OxNsJKpAhpL6AOscaafclVrLBuTd
 gkys0rT2dHdoWTAkPHQpvlOI6OmGTgopxo5pUKJHS8J9VRoBun25zC1FGBF8uyVd
 05CabWPnh7olNsRge9XiNj3x8PXjGVi7X7wUbRgOBG5aDc6TbKdxu37J0tXe0M7a
 Q74ctQvk8Nk6bQWirgTNlfJJHzL5pJbKc9VwY5uGX4qTmH+yEvCIt45ZXgXOuS/F
 eqijStkkdXUDnkMdcpaZJvXP80rHcgfP8bqevvPymRli8ER9zj9aXJQ3rmCUcPz+
 EtbyS+vOEN31wNTA1EQlfIRxfvr22x7r70DDdRwmhuD1W1tgfblm+R0Cq76I5rnJ
 VSgXKq1b4mY0eautqXEnPGyqb7H8iJIq7AoyfbzzWN+4u6yVEUvpDKueeksy+fFt
 sGNtjWqGhWyKXg==
 =/Qtt
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.10-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull more RISC-V updates from Palmer Dabbelt:

 - The compression format used for boot images is now configurable at
   build time, and these formats are shown in `make help`

 - access_ok() has been optimized

 - A pair of performance bugs have been fixed in the uaccess handlers

 - Various fixes and cleanups, including one for the IMSIC build failure
   and one for the early-boot ftrace illegal NOPs bug

* tag 'riscv-for-linus-6.10-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Fix early ftrace nop patching
  irqchip: riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict
  riscv: selftests: Add signal handling vector tests
  riscv: mm: accelerate pagefault when badaccess
  riscv: uaccess: Relax the threshold for fast path
  riscv: uaccess: Allow the last potential unrolled copy
  riscv: typo in comment for get_f64_reg
  Use bool value in set_cpu_online()
  riscv: selftests: Add hwprobe binaries to .gitignore
  riscv: stacktrace: fixed walk_stackframe()
  ftrace: riscv: move from REGS to ARGS
  riscv: do not select MODULE_SECTIONS by default
  riscv: show help string for riscv-specific targets
  riscv: make image compression configurable
  riscv: cpufeature: Fix extension subset checking
  riscv: cpufeature: Fix thead vector hwcap removal
  riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context
  riscv: force PAGE_SIZE linear mapping if debug_pagealloc is enabled
  riscv: Define TASK_SIZE_MAX for __access_ok()
  riscv: Remove PGDIR_SIZE_L3 and TASK_SIZE_MIN
2024-05-24 10:46:35 -07:00
Linus Torvalds 9351f138d1 xen: branch for v6.10-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZlCW5wAKCRCAXGG7T9hj
 vmgfAPwMj6Pf6faPJ8Db4cUkeJqxT60RCjOoCLoiJ5MYtrxIBgEAqFv3JOHaoDCH
 nogrS10fldxUTtxtx8DciFtzZ59jJws=
 =LXuw
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.10a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:

 - a small cleanup in the drivers/xen/xenbus Makefile

 - a fix of the Xen xenstore driver to improve connecting to a late
   started Xenstore

 - an enhancement for better support of ballooning in PVH guests

 - a cleanup using try_cmpxchg() instead of open coding it

* tag 'for-linus-6.10a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  drivers/xen: Improve the late XenStore init protocol
  xen/xenbus: Use *-y instead of *-objs in Makefile
  xen/x86: add extra pages to unpopulated-alloc if available
  locking/x86/xen: Use try_cmpxchg() in xen_alloc_p2m_entry()
2024-05-24 10:24:49 -07:00
Linus Torvalds 02c438bbff for-6.10-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmZQjzoACgkQxWXV+ddt
 WDsFaw/+O6lH+rPLhvUoqtnrydC6QLnEW5Qj5EURDt3HkROOsXHszdNGKdsETZ2i
 /s4dDiCRwLv7PP/bWlFfQbHzckoBHI9I/1GxHKQM3OM27BpXvILacXSMJ13zw4vq
 DRQIUdTwfUkegEytZb0ddv6+++R1YyU6nE6LfiF2Pf4XJMQ2WXPRNu6bAa27xUia
 4ITHB6m92zynhATJk0/RpfCU64HWwj919WnJDmoVOJ7Nr8Pslz4jKm7HS1qiehNd
 EbhduQPhj7UvWiL4C9/iFFndgzm1tX1WNlJDu5c0KqwYIHq2+BmDv3Cqhkazkdvu
 veU0wO62bZzV42vmTvQXzyXeXjNXRyLOvK6uHXv0VCO8VVsl2/WnYTRWmH44ECar
 z4tByfBKA7nIL2e23ztkyqnhygDf8Y1/Dy+GfprR6JPhyYGHJDqLcB3Gyw9y/AXO
 b/2MoAEgET9QPM/0HLqdonDJ75D2PF0qmwp1ys79w/BGH0BUoxZs/POL2UT87EJO
 rO5kW0/nZy99sbWFfZRwDUxTj1IlDqdudaHPOdJs/tUb3wPseLm5abQEyk+Dns6K
 3y7OviNVQy0x325JY9RmdfnJv60KHvv5pqws1Nkuhqk1LH8csL6MsYlcybhR+vOk
 G9qkNxg35aNqjNlBi7RacMT8OgwVbVhik8jVr+MfXk30grIevzU=
 =XR4r
 -----END PGP SIGNATURE-----

Merge tag 'for-6.10-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull more btrfs updates from David Sterba:
 "A few more updates, mostly stability fixes or user visible changes:

   - fix race in zoned mode during device replace that can lead to
     use-after-free

   - update return codes and lower message levels for quota rescan where
     it's causing false alerts

   - fix unexpected qgroup id reuse under some conditions

   - fix condition when looking up extent refs

   - add option norecovery (removed in 6.8), the intended replacements
     haven't been used and some applications still rely on the old one

   - build warning fixes"

* tag 'for-6.10-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: re-introduce 'norecovery' mount option
  btrfs: fix end of tree detection when searching for data extent ref
  btrfs: scrub: initialize ret in scrub_simple_mirror() to fix compilation warning
  btrfs: zoned: fix use-after-free due to race with dev replace
  btrfs: qgroup: fix qgroup id collision across mounts
  btrfs: qgroup: update rescan message levels and error codes
2024-05-24 09:40:31 -07:00
Linus Torvalds dcb9f48667 Changes since last update:
- Convert metadata APIs to byte offsets;
 
  - Avoid allocating DEFLATE streams unnecessarily;
 
  - Some erofs_show_options() cleanup.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmZQmHARHHhpYW5nQGtl
 cm5lbC5vcmcACgkQUXZn5Zlu5qrGnhAAnvOifMYekIgY/W0PSGSe85XtXps5vBjo
 rixZ/vNAl8NrLgzHY5lX+4dbENywEULzdxYAgF4VN9eKNGyuZ4oCBmYStoGueQ41
 N1oq36O/CVJDCOLkFUwjD6GpHngjJR3xiU8DRrhKdPZJeYXVEJwZB4KOOymorkO0
 Xn9SPrF/GC4YDWJL901RKT8p6gyRNWiWJ/+hwDAxfmCSuzW2uRNnBLeXNvjqj4Z3
 u5WEaFSlNRlLWnZPcHy8O3t/XAPkhvTN+C5+YeaePWyHc5WYOM9mWt8VLOFQb60K
 l+q/cnWXw+8NNbxnuccWVJfEb6zUJmZ5/yTm+Ndutrpk5dFSPb6DjZo5/K36dGls
 r02XysW+Jl24wBIFkYRHild2WT+gSqo/zyIDsSt/DF+DhpqmnIqAASx4yJenw7ib
 BNV4m4gQflLrORKpVmsKyHrm5GuHsTWsGc51iX1uqsdfDgN79mFgR1taBAZw162P
 pPeWuD6XYE+eT+t5nggnXqmZ5qatEhTFkYDjUzSq4ZQfyZnRG8Tl6zbBuyVhaxsO
 zH1rAmwtI6x+ehHI46Kurh8HT6UrB0CNM6RokYKr6JWVzIdFPPMVKkxcq2KozTPf
 CBu+Whh/WGFROM8JT2KGCnuz2ZBUZXDtNBJmW+ZnA+z9b7xZ1f31nio4vKKdZU+R
 swpnV+0q9cs=
 =qDDl
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull more erofs updates from Gao Xiang:
 "The main ones are metadata API conversion to byte offsets by Al Viro.

  Another patch gets rid of unnecessary memory allocation out of DEFLATE
  decompressor. The remaining one is a trivial cleanup.

   - Convert metadata APIs to byte offsets

   - Avoid allocating DEFLATE streams unnecessarily

   - Some erofs_show_options() cleanup"

* tag 'erofs-for-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: avoid allocating DEFLATE streams before mounting
  z_erofs_pcluster_begin(): don't bother with rounding position down
  erofs: don't round offset down for erofs_read_metabuf()
  erofs: don't align offset for erofs_read_metabuf() (simple cases)
  erofs: mechanically convert erofs_read_metabuf() to offsets
  erofs: clean up erofs_show_options()
2024-05-24 09:31:50 -07:00
Linus Torvalds c40b1994b9 bcachefs fixes for 6.10-rc1
Just a few syzbot fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmZQk0cACgkQE6szbY3K
 bna7gA/+MSY3I95CwaJ4bBq5SCxOaRcrX099LFh8Zrj+OF+DWE2PtVo1LhhgnYrQ
 KpZrS2Q9Qgb2yVqYzOY6LBfH4il1O/WwvloMG0MbuYiQFu9/JL/6CEK9uFyiGmaC
 fdiFEN3u+8AK6phTFaqUU2ncG0XFQ1Ple5zmFXo4Y3ZJeNaubJeEDac+kbRvOwYh
 rQ6Iy0FNoQymv0BzmuM7g2NsbhdAgHTv7rhGbfpNBZv3lu0yDXsfZZgWTr2oXMSP
 FMhm4bcTGAFp5hbwq9k56ND8oSFpamsH7SwS4bDlEe1CNOfMI1JjnrvSEuDrocAE
 1Jn2J2Gv9NXnEHKamVzzpUILG67buEtYzJyDQk51N4kulgThdpRzjm+11ylD5U0U
 wzIK1HXsKHtRdUiIhQGLCLW61FXM+0QBIk2eXhPq88jsM2zTL7iMbXR3P/nvgUDy
 8ia8g5Q+nKxcb223M8WmK0rBwlaNasE/hXiFT54ntt8bK5nmVJjPMxVXUmYth3hw
 7STkuT0k5jVsMG1NqLkg+wSupj1AuWbD2hIcas7GkxarEYAULbQcClHYGpMll3Tw
 +pJfLjAtBOkcE4TwWDLflVBhwWtdmPNhk51Q3iLVRp0Gm7t0rhE2vE6TjpsIFnrg
 rUAgaqQqQ2WXfsRaGa2wx0tRKoW+8Iigq13ndn1AZIrfEtQkYUs=
 =vuNC
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2024-05-24' of https://evilpiepirate.org/git/bcachefs

Pull bcachefs fixes from Kent Overstreet:
 "Nothing exciting, just syzbot fixes (except for the one
  FMODE_CAN_ODIRECT patch).

  Looks like syzbot reports have slowed down; this is all catch up from
  two weeks of conferences.

  Next hardening project is using Thomas's error injection tooling to
  torture test repair"

* tag 'bcachefs-2024-05-24' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: Fix race path in bch2_inode_insert()
  bcachefs: Ensure we're RW before journalling
  bcachefs: Fix shutdown ordering
  bcachefs: Fix unsafety in bch2_dirent_name_bytes()
  bcachefs: Fix stack oob in __bch2_encrypt_bio()
  bcachefs: Fix btree_trans leak in bch2_readahead()
  bcachefs: Fix bogus verify_replicas_entry() assert
  bcachefs: Check for subvolues with bogus snapshot/inode fields
  bcachefs: bch2_checksum() returns 0 for unknown checksum type
  bcachefs: Fix bch2_alloc_ciphers()
  bcachefs: Add missing guard in bch2_snapshot_has_children()
  bcachefs: Fix missing parens in drop_locks_do()
  bcachefs: Improve bch2_assert_pos_locked()
  bcachefs: Fix shift overflows in replicas.c
  bcachefs: Fix shift overflow in btree_lost_data()
  bcachefs: Fix ref in trans_mark_dev_sbs() error path
  bcachefs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method
  bcachefs: Fix rcu splat in check_fix_ptrs()
2024-05-24 09:07:22 -07:00
Linus Torvalds 9ea370f341 Input updates for v6.10-rc0
- a change to the input core to trim the amount of key data in the modalias
   string when a device declares too many keys to fit in the uevent buffer,
   instead of reporting an error, which results in the uevent not being
   generated at all
 
 - support for Machenike G5 Pro Controller added to xpad driver
 
 - support for FocalTech FT5452 and FT8719 added to edt-ft5x06
 
 - support for new SPMI vibrator added to pm8xxx-vibrator driver
 
 - missing locking added to cyapa touchpad driver
 
 - removal of unused fields in various driver structures
 
 - explicit initialization of i2c_device_id::driver_data to 0 dropped
   from input drivers
 
 - other assorted fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQST2eWILY88ieB2DOtAj56VGEWXnAUCZk/rJQAKCRBAj56VGEWX
 nOFVAQD8lfavuaJwEc0k/P39hZGOnTh423Um5gqIj8FOMw/V3AEA3D9IdTFC32DA
 JphZ5YvneDAfqu76ZRnjQi2oyOikygo=
 =8zDF
 -----END PGP SIGNATURE-----

Merge tag 'input-for-v6.10-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input updates from Dmitry Torokhov:

 - a change to the input core to trim the amount of key data in the modalias
   string when a device declares too many keys to fit in the uevent buffer,
   instead of reporting an error, which results in the uevent not being
   generated at all

 - support for Machenike G5 Pro Controller added to xpad driver

 - support for FocalTech FT5452 and FT8719 added to edt-ft5x06

 - support for new SPMI vibrator added to pm8xxx-vibrator driver

 - missing locking added to cyapa touchpad driver

 - removal of unused fields in various driver structures

 - explicit initialization of i2c_device_id::driver_data to 0 dropped
   from input drivers

 - other assorted fixes and cleanups.

* tag 'input-for-v6.10-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (24 commits)
  Input: edt-ft5x06 - add support for FocalTech FT5452 and FT8719
  dt-bindings: input: touchscreen: edt-ft5x06: Document FT5452 and FT8719 support
  Input: xpad - add support for Machenike G5 Pro Controller
  Input: try trimming too long modalias strings
  Input: drop explicit initialization of struct i2c_device_id::driver_data to 0
  Input: zet6223 - remove an unused field in struct zet6223_ts
  Input: chipone_icn8505 - remove an unused field in struct icn8505_data
  Input: cros_ec_keyb - remove an unused field in struct cros_ec_keyb
  Input: lpc32xx-keys - remove an unused field in struct lpc32xx_kscan_drv
  Input: matrix_keypad - remove an unused field in struct matrix_keypad
  Input: tca6416-keypad - remove unused struct tca6416_drv_data
  Input: tca6416-keypad - remove an unused field in struct tca6416_keypad_chip
  Input: da7280 - remove an unused field in struct da7280_haptic
  Input: ff-core - prefer struct_size over open coded arithmetic
  Input: cyapa - add missing input core locking to suspend/resume functions
  input: pm8xxx-vibrator: add new SPMI vibrator support
  dt-bindings: input: qcom,pm8xxx-vib: add new SPMI vibrator module
  input: pm8xxx-vibrator: refactor to support new SPMI vibrator
  Input: pm8xxx-vibrator - correct VIB_MAX_LEVELS calculation
  Input: sur40 - convert le16 to cpu before use
  ...
2024-05-24 09:01:21 -07:00
Linus Torvalds 041c9f71a4 sound fixes for 6.10-rc1
A collection of small fixes for 6.10-rc1.  Most of the changes are
 various device-specific fixes and quirks, while there are a few small
 changes in ALSA core timer and module / built-in fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmZPUagOHHRpd2FpQHN1
 c2UuZGUACgkQLtJE4w1nLE9LYg//QurZR7KBAvim5LcsVDLE5pFUjW0v3fz0+vKB
 /UpoK1EVxc9pqNXzKi8YDiRoKZY8J8krGHd5FV44qhZl2nVJ87hXbHU5b/i29QUu
 4xKC1pMmF0ncJ8qMGhzTynyxw0Hr7soCcxz+4ApDzN/pyzc7QTPEaUB1ND7jTB2z
 bcYgXyFprJQ1RmsV9u2mGhNEv3tYRaQO1GNxr9ktO/I13CCKd7LkGUSxo5UfOFwC
 bIrpqG35MDzeVrxEfB1UHlyKhULf9fmpUW0OYYS/DMQFptRa+PXEOgzN81wrhNwL
 sp2k41x4uRtKrB1DFCeweis4m0OHbV0yakkV/3PTdONzJk4PxWoPuGP4uZyoNz3B
 FwexeSpZICpgGHeS4WuS0RW3SbQ9n/3d33nzpCYrojyxqCuc6UXGPyiq6QHUVtXZ
 LnOPyeJRIhS52wpELByJmcnf9ev4ImLSnGWUzz/Mf5dFZCVSXKWVvgQ+UcWbZZnj
 vTp0mTMQUjuVhE0KuRawMx2YSUU7nuRBukFBihjIRSYJYvZETN7WNjMUA3UnpG1d
 uKXJaTEm2UqlZtsnKkXrWNIpj4EQjoZo0qgx4ZwSYicLgXDJ/WlHvltdo9fJpRh3
 u23957ye7wJ4JMikqyhd0Wzh/1UwOs4GTMWTcim6pKXwlkn8TwCB1F/OT/6xqlYZ
 gScnqBQ=
 =VeU/
 -----END PGP SIGNATURE-----

Merge tag 'sound-fix-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A collection of small fixes for 6.10-rc1. Most of the changes are various
  device-specific fixes and quirks, while there are a few small changes
  in ALSA core timer and module / built-in fixes"

* tag 'sound-fix-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek: fix mute/micmute LEDs don't work for ProBook 440/460 G11.
  ALSA: core: Enable proc module when CONFIG_MODULES=y
  ALSA: core: Fix NULL module pointer assignment at card init
  ALSA: hda/realtek: Enable headset mic of JP-IK LEAP W502 with ALC897
  ASoC: dt-bindings: stm32: Ensure compatible pattern matches whole string
  ASoC: tas2781: Fix wrong loading calibrated data sequence
  ASoC: tas2552: Add TX path for capturing AUDIO-OUT data
  ALSA: usb-audio: Fix for sampling rates support for Mbox3
  Documentation: sound: Fix trailing whitespaces
  ALSA: timer: Set lower bound of start tick time
  ASoC: codecs: ES8326: solve hp and button detect issue
  ASoC: rt5645: mic-in detection threshold modification
  ASoC: Intel: sof_sdw_rt_sdca_jack_common: Use name_prefix for `-sdca` detection
2024-05-24 08:48:51 -07:00
Linus Torvalds e292ead0c9 Char/Misc bugfix for 6.10-rc1
Here is one remaining bugfix for 6.10-rc1 that missed the 6.9-final
 merge window, and has been sitting in my tree and linux-next for quite a
 while now, but wasn't sent to you (my fault, travels...)
 
 It is a bugfix to resolve an error in the speakup code that could
 overflow a buffer.
 
 It has been in linux-next for a while with no reported problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZlA+4A8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylbCQCg1yrG9xtO9Gg5CLBcV9gRmAEjKGIAn16Y8Hmm
 k9R7qGfSOhqPq/qt6Nxx
 =s69+
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-6.10-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc fix from Greg KH:
 "Here is one remaining bugfix for 6.10-rc1 that missed the 6.9-final
  merge window, and has been sitting in my tree and linux-next for quite
  a while now, but wasn't sent to you (my fault, travels...)

  It is a bugfix to resolve an error in the speakup code that could
  overflow a buffer.

  It has been in linux-next for a while with no reported problems"

* tag 'char-misc-6.10-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  speakup: Fix sizeof() vs ARRAY_SIZE() bug
2024-05-24 08:43:25 -07:00
Linus Torvalds f6d199c774 TTY/Serial fixes for 6.10-rc1
Here are some small TTY and Serial driver fixes that missed the
 6.9-final merge window, but have been in my tree for weeks (my fault,
 travel caused me to miss this.)
 
 These fixes include:
   - more n_gsm fixes for reported problems
   - 8250_mtk driver fix
   - 8250_bcm7271 driver fix
   - sc16is7xx driver fix
 
 All of these have been in linux-next for weeks without any reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZlBGKQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylOkwCfUOa00YQt3jJwBEC9bQUprW1z95MAoKW00V5g
 UJgQ7+1d+o4bT/ib5xpj
 =/O0m
 -----END PGP SIGNATURE-----

Merge tag 'tty-6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
 "Here are some small TTY and Serial driver fixes that missed the
  6.9-final merge window, but have been in my tree for weeks (my fault,
  travel caused me to miss this)

  These fixes include:

   - more n_gsm fixes for reported problems

   - 8250_mtk driver fix

   - 8250_bcm7271 driver fix

   - sc16is7xx driver fix

  All of these have been in linux-next for weeks without any reported
  problems"

* tag 'tty-6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: sc16is7xx: fix bug in sc16is7xx_set_baud() when using prescaler
  serial: 8250_bcm7271: use default_mux_rate if possible
  serial: 8520_mtk: Set RTS on shutdown for Rx in-band wakeup
  tty: n_gsm: fix missing receive state reset after mode switch
  tty: n_gsm: fix possible out-of-bounds in gsm0_receive()
2024-05-24 08:38:28 -07:00
Linus Torvalds b0a9ba13ff hardening fixes for v6.10-rc1
- loadpin: Prevent SECURITY_LOADPIN_ENFORCE=y without module decompression
   (Stephen Boyd)
 
 - ubsan: Restore dependency on ARCH_HAS_UBSAN
 
 - kunit/fortify: Fix memcmp() test to be amplitude agnostic
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmZP0w0WHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJqYDEACWaY0Xjig6Izo+B+85IozTLf2R
 Wv3zlOjUhjbRn7enzhVBRRfU216nl/wp8s7pKhNYCEZ7gJ+04hYtZoLY6YV7jtZ0
 RAvpwc1dmUm7RZIBxjnzqiNTdttNBniPDE47goV0Yi9JVSDFY1Y/P5GwiAr0PO6W
 kt1+WBr2zADNpTZziH8MZou7jfK+y1bOZw8rUUFMODrMc0buuLGO2h+lZqASJXNs
 5NHPUOoJsZHvQxN/YSyE555VycpoyWiwMvA1XOz1NVKdr1eFP1heu88AnIRKOD7o
 cMz6W/yUZ+4dYr2yydDGNX+QvFmZuvPz0oXAlI7BAblpT0UU7xv0jaioAhIam87U
 WxVQSOgkLQBw6Ym79W66HplizCVfEl9aUAYDSK5UJlwdpNE/j16XLYDLKxDi0wUZ
 pjUy5CF0X7FFNyY7Kp5flqzKrQG31vfqZf/yWhtWu258x604LR6CTkO06IJDINx0
 UUrbehie3bGnbu5FS0oVKGH37Mq0aRn4Xk2aUZaFf1Vz/YtU4Wo3FbtyOyFZsdpl
 aCNyYzmNmfVijDQlLshy6HBACeLPV2DjIJ8pcC74abUV1FX6VOvIDsTy4ELkm9BF
 WZ8LNryo79lFsFMThhwfCDHubhXoaLjkl4rpOB5x+Ld0q+GgfIb5jMfF507YxrRj
 3KxJJKXzUKNf+JFnjg==
 =VTTF
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening fixes from Kees Cook:

 - loadpin: Prevent SECURITY_LOADPIN_ENFORCE=y without module
   decompression (Stephen Boyd)

 - ubsan: Restore dependency on ARCH_HAS_UBSAN

 - kunit/fortify: Fix memcmp() test to be amplitude agnostic

* tag 'hardening-v6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  kunit/fortify: Fix memcmp() test to be amplitude agnostic
  ubsan: Restore dependency on ARCH_HAS_UBSAN
  loadpin: Prevent SECURITY_LOADPIN_ENFORCE=y without module decompression
2024-05-24 08:33:44 -07:00
Linus Torvalds 0eb03c7e8e tracefs/eventfs fixes and updates for v6.10:
Bug fixes:
 
 - The eventfs directories need to have unique inode numbers. Make sure that
   they do not get the default file inode number.
 
 - Update the inode uid and gid fields on remount.
   When a remount happens where a uid and/or gid is specified, all the tracefs
   files and directories should get the specified uid and/or gid. But this
   can be sporadic when some uids were assigned already. There's already
   a list of inodes that are allocated. Just update their uid and gid fields
   at the time of remount.
 
 - Update the eventfs_inodes on remount from the top level "events" descriptor.
   There was a bug where not all the eventfs files or directories were
   getting updated on remount. One fix was to clear the SAVED_UID/GID
   flags from the inode list during the iteration of the inodes during
   the remount. But because the eventfs inodes can be freed when the last
   reference is released, not all the eventfs_inodes were being updated.
   This led to the ownership selftest failing if it was run a second
   time (the first time would leave eventfs_inodes with no corresponding
   tracefs_inode).
 
   Instead, for eventfs_inodes, only process the "events" eventfs_inode
   from the list iteration, as it is guaranteed to have a tracefs_inode
   (it's never freed while the "events" directory exists). As it has
   a list of its children, and the children have a list of their children,
   just iterate all the eventfs_inodes from the "events" descriptor and
   it is guaranteed to get all of them.
 
 - Clear the EVENT_INODE flag from the tracefs_drop_inode() callback.
   Currently the EVENTFS_INODE FLAG is cleared in the tracefs_d_iput()
   callback. But this is the wrong location. The iput() callback is
   called when the last reference to the dentry inode is hit. There could
   be a case where two dentries have the same inode, and the flag will
   be cleared prematurely. The flag needs to be cleared when the last
   reference of the inode is dropped and that happens in the inode's
   drop_inode() callback handler.
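
The remount fix described above (walking every eventfs_inode from the top-level "events" descriptor instead of relying on the list of live inodes) amounts to a plain tree traversal. The following is a Python sketch of that idea only. The `EventfsInode` class, its fields, and `update_on_remount` are hypothetical stand-ins for illustration and do not mirror the kernel's structures or locking.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EventfsInode:
    """Hypothetical stand-in for an eventfs descriptor with child links."""
    name: str
    uid: int = 0
    gid: int = 0
    children: List["EventfsInode"] = field(default_factory=list)


def update_on_remount(events_root: EventfsInode,
                      uid: Optional[int] = None,
                      gid: Optional[int] = None) -> int:
    """Apply the remount uid/gid to every descriptor under the root.

    Returns the number of descriptors visited.
    """
    visited = 0
    stack = [events_root]
    while stack:
        ei = stack.pop()
        if uid is not None:
            ei.uid = uid
        if gid is not None:
            ei.gid = gid
        visited += 1
        # Children are reached through the parent, so a descriptor is
        # updated even if it currently has no inode of its own.
        stack.extend(ei.children)
    return visited


events = EventfsInode("events", children=[
    EventfsInode("sched", children=[EventfsInode("sched_switch")]),
    EventfsInode("irq"),
])
count = update_on_remount(events, uid=1000)
```

Because the walk starts at the root and follows child links, it does not depend on which descriptors happen to have a live inode at remount time.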
 
 Clean ups:
 
 - Consolidate the creation of a tracefs_inode for an eventfs_inode
   A tracefs_inode is created for both files and directories of the
   eventfs system. It is open coded. Instead, consolidate it into a
   single eventfs_get_inode() function call.
 
 - Remove the eventfs getattr and permission callbacks.
   The permissions for the eventfs files and directories are updated
   when the inodes are created, on remount, and when the user sets
   them (via setattr). The inodes hold the current permissions so
   there is no need to have custom getattr or permissions callbacks
   as they will more likely cause them to be incorrect. The inode's
   permissions are updated when they should be updated. Remove the
   getattr and permissions inode callbacks.
 
 - Do not update eventfs_inode attributes on creation of inodes.
   The eventfs_inodes attribute field is used to store the permissions
   of the directories and files for when their corresponding inodes
   are freed and are created again. But when the creation of the inodes
   happens, the eventfs_inode attributes are recalculated. The
   recalculation should only happen when the permissions change for
   a given file or directory. Currently, the attribute changes are
   just being set to their current files so this is not a bug, but
   it's unnecessary and error prone. Stop doing that.
 
 - The events directory inode is created once when the events directory
   is created and deleted when it is deleted. It is now updated on
   remount and when the user changes the permissions. There's no need
   to use the eventfs_inode of the events directory to store the
   events directory permissions. But using it to store the default
   permissions for the files within the directory that have not been
   updated by the user can simplify the code.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZk+0ohQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qtWOAQCSdEsWYiNcFBqvKp1kSI+dH1sKfur3
 CAoe1trzDEdv/gEAsFkophR9OBzO193in4ZQYNKdEDfeaicEaDctzLxlkwY=
 =9zqq
 -----END PGP SIGNATURE-----

Merge tag 'trace-tracefs-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracefs/eventfs updates from Steven Rostedt:
 "Bug fixes:

   - The eventfs directories need to have unique inode numbers. Make
     sure that they do not get the default file inode number.

   - Update the inode uid and gid fields on remount.

     When a remount happens where a uid and/or gid is specified, all the
     tracefs files and directories should get the specified uid and/or
     gid. But this can be sporadic when some uids were assigned already.
     There's already a list of inodes that are allocated. Just update
     their uid and gid fields at the time of remount.

   - Update the eventfs_inodes on remount from the top level "events"
     descriptor.

     There was a bug where not all the eventfs files or directories
     were getting updated on remount. One fix was to clear the
     SAVED_UID/GID flags from the inode list during the iteration of the
     inodes during the remount. But because the eventfs inodes can be
     freed when the last reference is released, not all the
     eventfs_inodes were being updated. This led the ownership
     selftest to fail if it was run a second time (the first time would
     leave eventfs_inodes with no corresponding tracefs_inode).

     Instead, for eventfs_inodes, only process the "events"
     eventfs_inode from the list iteration, as it is guaranteed to have
     a tracefs_inode (it's never freed while the "events" directory
     exists). As it has a list of its children, and the children have a
     list of their children, just iterate all the eventfs_inodes from
     the "events" descriptor and it is guaranteed to get all of them.

   - Clear the EVENT_INODE flag from the tracefs_drop_inode() callback.

     Currently the EVENT_INODE flag is cleared in the tracefs_d_iput()
     callback. But this is the wrong location. The iput() callback is
     called when the last reference to the dentry inode is hit. There
     could be a case where two dentries have the same inode, and the
     flag will be cleared prematurely. The flag needs to be cleared when
     the last reference of the inode is dropped and that happens in the
     inode's drop_inode() callback handler.

  Cleanups:

   - Consolidate the creation of a tracefs_inode for an eventfs_inode

     A tracefs_inode is created for both files and directories of the
     eventfs system. It is open coded. Instead, consolidate it into a
     single eventfs_get_inode() function call.

   - Remove the eventfs getattr and permission callbacks.

     The permissions for the eventfs files and directories are updated
     when the inodes are created, on remount, and when the user sets
     them (via setattr). The inodes hold the current permissions so
     there is no need to have custom getattr or permissions callbacks as
     they will more likely cause them to be incorrect. The inode's
     permissions are updated when they should be updated. Remove the
     getattr and permissions inode callbacks.

   - Do not update eventfs_inode attributes on creation of inodes.

     The eventfs_inodes attribute field is used to store the permissions
     of the directories and files for when their corresponding inodes
     are freed and are created again. But when the creation of the
     inodes happens, the eventfs_inode attributes are recalculated. The
     recalculation should only happen when the permissions change for a
     given file or directory. Currently, the attribute changes are just
     being set to their current files so this is not a bug, but it's
     unnecessary and error prone. Stop doing that.

   - The events directory inode is created once when the events
     directory is created and deleted when it is deleted. It is now
     updated on remount and when the user changes the permissions.
     There's no need to use the eventfs_inode of the events directory to
     store the events directory permissions. But using it to store the
     default permissions for the files within the directory that have
     not been updated by the user can simplify the code"

* tag 'trace-tracefs-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  eventfs: Do not use attributes for events directory
  eventfs: Cleanup permissions in creation of inodes
  eventfs: Remove getattr and permission callbacks
  eventfs: Consolidate the eventfs_inode update in eventfs_get_inode()
  tracefs: Clear EVENT_INODE flag in tracefs_drop_inode()
  eventfs: Update all the eventfs_inodes from the events descriptor
  tracefs: Update inode permissions on remount
  eventfs: Keep the directories from having the same inode number as files
2024-05-24 08:27:34 -07:00
Friedrich Vock 44382b3ed6 bpf: Fix potential integer overflow in resolve_btfids
err is a 32-bit integer, but elf_update returns an off_t, which is 64-bit
at least on 64-bit platforms. If symbols_patch is called on a binary between
2-4GB in size, the result will be negative when cast to a 32-bit integer,
which the code assumes means an error occurred. This can wrongly trigger
build failures when building very large kernel images.
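
The truncation described above can be reproduced in isolation. The following is a minimal sketch, not the resolve_btfids code: both helpers are hypothetical, and a plain int64_t stands in for the off_t returned by elf_update(). It assumes the usual two's-complement wraparound when narrowing to 32 bits, which is implementation-defined in C but is what common compilers do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: a successful call returned `size` bytes as a 64-bit
 * value, but the caller stored it in a 32-bit int before checking the sign.
 */
bool size_misread_as_error(int64_t size)
{
    assert(size >= 0);            /* only successful returns are modelled */
    int32_t err = (int32_t)size;  /* sizes in [2 GiB, 4 GiB) become negative */

    return err < 0;
}

/* Hypothetical helper: the same check with the full 64-bit width kept. */
bool size_misread_with_64bit(int64_t size)
{
    assert(size >= 0);
    int64_t err = size;           /* no narrowing, the sign is preserved */

    return err < 0;
}
```

With a 3 GiB size, the first helper reports a spurious error while the second does not; a 1 GiB size passes both checks.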

Fixes: fbbb68de80 ("bpf: Add resolve_btfids tool to resolve BTF IDs in ELF object")
Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240514070931.199694-1-friedrich.vock@gmx.de
2024-05-24 17:12:12 +02:00
David S. Miller 0b4f5add9f Merge branch 'mlx5-fixes'
Tariq Toukan says:

====================
mlx5 fixes 24-05-22

This patchset provides bug fixes to mlx5 core and Eth drivers.

Series generated against:
commit 9c91c7fadb ("net: mana: Fix the extra HZ in mana_hwc_send_request")
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:08 +01:00
Gal Pressman 83fea49f27 net/mlx5e: Fix UDP GSO for encapsulated packets
When the skb is encapsulated, adjust the inner UDP header instead of the
outer one, and account for UDP header (instead of TCP) in the inline
header size calculation.

Fixes: 689adf0d48 ("net/mlx5e: Add UDP GSO support")
Reported-by: Jason Baron <jbaron@akamai.com>
Closes: https://lore.kernel.org/netdev/c42961cb-50b9-4a9a-bd43-87fe48d88d29@akamai.com/
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Boris Pismenny <borisp@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:08 +01:00
Carolina Jubran 5c74195d5d net/mlx5e: Use rx_missed_errors instead of rx_dropped for reporting buffer exhaustion
Previously, the driver incorrectly used rx_dropped to report device
buffer exhaustion.

According to the documentation, rx_dropped should not be used to count
packets dropped due to buffer exhaustion, which is the purpose of
rx_missed_errors.

Use rx_missed_errors as intended for counting packets dropped due to
buffer exhaustion.

Fixes: 269e6b3af3 ("net/mlx5e: Report additional error statistics in get stats ndo")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:08 +01:00
Rahul Rameshbabu f55cd31287 net/mlx5e: Do not use ptp structure for tx ts stats when not initialized
The ptp channel instance is only initialized when ptp traffic is first
processed by the driver. This means that there is a window in between when
port timestamping is enabled and ptp traffic is sent where the ptp channel
instance is not initialized. Accessing statistics during this window will
lead to an access violation (NULL + member offset). Check the validity of
the instance before attempting to query statistics.
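
The shape of such a validity check can be sketched as follows. This is not the mlx5e code: the structures and the function are hypothetical, heavily simplified stand-ins, and returning zero when the instance is missing is an assumption made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver structures. */
struct demo_ptp_stats { uint64_t pkts; };
struct demo_ptp       { struct demo_ptp_stats stats; };
struct demo_priv      { struct demo_ptp *ptp; /* NULL until first PTP traffic */ };

/* Report a timestamping counter, or zero while the channel is not set up. */
uint64_t demo_get_ts_pkts(const struct demo_priv *priv)
{
    assert(priv != NULL);

    if (!priv->ptp)     /* instance not initialized yet: do not dereference */
        return 0;

    return priv->ptp->stats.pkts;
}
```

Without the NULL test, the read of stats.pkts would happen at a small offset from address zero, which matches the kind of fault shown in the trace below.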

  BUG: unable to handle page fault for address: 0000000000003524
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 109dfc067 P4D 109dfc067 PUD 1064ef067 PMD 0
  Oops: 0000 [#1] SMP
  CPU: 0 PID: 420 Comm: ethtool Not tainted 6.9.0-rc2-rrameshbabu+ #245
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.3-1-1 04/01/204
  RIP: 0010:mlx5e_stats_ts_get+0x4c/0x130
  <snip>
  Call Trace:
   <TASK>
   ? show_regs+0x60/0x70
   ? __die+0x24/0x70
   ? page_fault_oops+0x15f/0x430
   ? do_user_addr_fault+0x2c9/0x5c0
   ? exc_page_fault+0x63/0x110
   ? asm_exc_page_fault+0x27/0x30
   ? mlx5e_stats_ts_get+0x4c/0x130
   ? mlx5e_stats_ts_get+0x20/0x130
   mlx5e_get_ts_stats+0x15/0x20
  <snip>

Fixes: 3579032c08 ("net/mlx5e: Implement ethtool hardware timestamping statistics")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:08 +01:00
Rahul Rameshbabu 9a52f6d44f net/mlx5e: Fix IPsec tunnel mode offload feature check
Remove faulty check disabling checksum offload and GSO for offload of
simple IPsec tunnel L4 traffic. Comment previously describing the deleted
code incorrectly claimed the check prevented double tunnel (or three layers
of ip headers).

Fixes: f1267798c9 ("net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:08 +01:00
Rahul Rameshbabu 16d66a4fa8 net/mlx5: Use mlx5_ipsec_rx_status_destroy to correctly delete status rules
rx_create no longer allocates a modify_hdr instance that needs to be
cleaned up. The mlx5_modify_header_dealloc call will lead to a NULL pointer
dereference. A leak in the rules also previously occurred since there are
now two rules populated related to status.

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 109907067 P4D 109907067 PUD 116890067 PMD 0
  Oops: 0000 [#1] SMP
  CPU: 1 PID: 484 Comm: ip Not tainted 6.9.0-rc2-rrameshbabu+ #254
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.3-1-1 04/01/2014
  RIP: 0010:mlx5_modify_header_dealloc+0xd/0x70
  <snip>
  Call Trace:
   <TASK>
   ? show_regs+0x60/0x70
   ? __die+0x24/0x70
   ? page_fault_oops+0x15f/0x430
   ? free_to_partial_list.constprop.0+0x79/0x150
   ? do_user_addr_fault+0x2c9/0x5c0
   ? exc_page_fault+0x63/0x110
   ? asm_exc_page_fault+0x27/0x30
   ? mlx5_modify_header_dealloc+0xd/0x70
   rx_create+0x374/0x590
   rx_add_rule+0x3ad/0x500
   ? rx_add_rule+0x3ad/0x500
   ? mlx5_cmd_exec+0x2c/0x40
   ? mlx5_create_ipsec_obj+0xd6/0x200
   mlx5e_accel_ipsec_fs_add_rule+0x31/0xf0
   mlx5e_xfrm_add_state+0x426/0xc00
  <snip>

Fixes: 94af50c0a9 ("net/mlx5e: Unify esw and normal IPsec status table creation/destruction")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:07 +01:00
Gal Pressman 1b9f86c6d5 net/mlx5: Fix MTMP register capability offset in MCAM register
The MTMP register (0x900a) capability offset is off-by-one, move it to
the right place.

Fixes: 1f507e80c7 ("net/mlx5: Expose NIC temperature via hardware monitoring kernel API")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:07 +01:00
Tariq Toukan fca3b47918 net/mlx5: Do not query MPIR on embedded CPU function
A proper query to MPIR needs to set the correct value in the depth field.
On embedded CPU this value is not necessarily zero. As there is no real
use case for multi-PF netdev on the embedded CPU of the smart NIC, block
this option.

This fixes the following failure:
ACCESS_REG(0x805) op_mod(0x1) failed, status bad system state(0x4), syndrome (0x685f19), err(-5)

Fixes: 678eb44805 ("net/mlx5: SD, Implement basic query and instantiation")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:07 +01:00
Maher Sanalla 51ef9305b8 net/mlx5: Lag, do bond only if slaves agree on roce state
Currently, the driver does not enforce that lag bond slaves must have
matching roce capabilities. Yet, in mlx5_do_bond(), the driver attempts
to enable roce on all vports of the bond slaves, causing the following
syndrome when one slave has no roce fw support:

mlx5_cmd_out_err:809:(pid 25427): MODIFY_NIC_VPORT_CONTEXT(0x755) op_mod(0x0)
failed, status bad parameter(0x3), syndrome (0xc1f678), err(-22)

Thus, create HW lag only if bond's slaves agree on roce state,
either all slaves have roce support resulting in a roce lag bond,
or none do, resulting in a raw eth bond.
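
The agreement rule can be expressed as a small predicate. This is a sketch under assumptions, not the driver's implementation: the function name and the representation of each slave's RoCE support as a bool array are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true when every slave reports the same RoCE support state, that is,
 * either all of them support RoCE or none of them do.
 */
bool demo_roce_states_agree(const bool *roce_supported, size_t n)
{
    assert(roce_supported != NULL || n == 0);

    for (size_t i = 1; i < n; i++) {
        if (roce_supported[i] != roce_supported[0])
            return false;   /* mixed state: do not create the HW lag */
    }

    return true;
}
```

In this sketch a caller would create the bond only when the predicate holds, and would pick a RoCE lag or a raw Ethernet lag based on the shared state.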

Fixes: 7907f23adc ("net/mlx5: Implement RoCE LAG feature")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:07 +01:00
Christian Brauner 712182b67e swap: yield device immediately
Otherwise we can cause spurious EBUSY issues when trying to mount the
rootfs later on.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=218845
Reported-by: Petri Kaukasoina <petri.kaukasoina@tuni.fi>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:08 +02:00
David Howells c596bea145 netfs: Fix setting of BDP_ASYNC from iocb flags
Fix netfs_perform_write() to set BDP_ASYNC if IOCB_NOWAIT is set rather
than if IOCB_SYNC is not set.  It reflects asynchronicity in the sense of
not waiting rather than synchronicity in the sense of not returning until
the op is complete.
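
The difference between the two conditions can be shown with a pair of small functions. The flag values below are hypothetical placeholders, not the kernel's definitions, and the functions only model the decision described above rather than netfs_perform_write() itself.

```c
#include <assert.h>

/* Hypothetical flag values; the kernel's real definitions differ. */
#define DEMO_IOCB_SYNC   0x1u
#define DEMO_IOCB_NOWAIT 0x2u
#define DEMO_BDP_ASYNC   0x1u

/* Before the fix: "not synchronous" was treated as "do not wait". */
unsigned int demo_bdp_flags_old(unsigned int iocb_flags)
{
    /* this sketch only models the two flags defined above */
    assert((iocb_flags & ~(DEMO_IOCB_SYNC | DEMO_IOCB_NOWAIT)) == 0);

    return (iocb_flags & DEMO_IOCB_SYNC) ? 0 : DEMO_BDP_ASYNC;
}

/* After the fix: only an explicit no-wait request sets the flag. */
unsigned int demo_bdp_flags_new(unsigned int iocb_flags)
{
    assert((iocb_flags & ~(DEMO_IOCB_SYNC | DEMO_IOCB_NOWAIT)) == 0);

    return (iocb_flags & DEMO_IOCB_NOWAIT) ? DEMO_BDP_ASYNC : 0;
}
```

For a plain buffered write with neither flag set, the old condition sets the async flag and the new one does not.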

Without this, generic/590 fails on cifs in strict caching mode with a
complaint that one of the writes fails with EAGAIN.  The test can be
distilled down to:

        mount -t cifs /my/share /mnt -ostuff
        xfs_io -i -c 'falloc 0 8191M' -c fsync -f /mnt/file
        xfs_io -i -c 'pwrite -b 1M -W 0 8191M' /mnt/file

Fixes: c38f4e96e6 ("netfs: Provide func to copy data to pagecache for buffered write")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/316306.1716306586@warthog.procyon.org.uk
Reviewed-by: Jens Axboe <axboe@kernel.dk>
cc: Jeff Layton <jlayton@kernel.org>
cc: Enzo Matsumiya <ematsumiya@suse.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netfs@lists.linux.dev
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:07 +02:00
Fedor Pchelkin 65bea99537 signalfd: drop an obsolete comment
Commit fbe38120eb ("signalfd: convert to ->read_iter()") removed the
call to anon_inode_getfd() by splitting fd setup into two parts. Drop the
comment referencing the internal details of that function.

Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Link: https://lore.kernel.org/r/20240520090819.76342-2-pchelkin@ispras.ru
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:07 +02:00
Fedor Pchelkin f826bc9d6f signalfd: fix error return code
If anon_inode_getfile() fails, return appropriate error code. This looks
like a single typo: the similar code changes in timerfd and userfaultfd
are okay.

Found by Linux Verification Center (linuxtesting.org).

Fixes: fbe38120eb ("signalfd: convert to ->read_iter()")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Link: https://lore.kernel.org/r/20240520090819.76342-1-pchelkin@ispras.ru
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:07 +02:00
Xu Yang 4e527d5841 iomap: fault in smaller chunks for non-large folio mappings
Since commit 5d8edfb900 ("iomap: Copy larger chunks from userspace"),
iomap will try to copy in larger chunks than PAGE_SIZE. However, if the
mapping doesn't support large folios, only one page of at most 4KB will
be created and 4KB of data will be written to the pagecache each time.
Then, the next 4KB will be handled in the next iteration. This can cause
a write performance problem.

If the chunk is 2MB, a total of 512 pages need to be handled. During this
period, fault_in_iov_iter_readable() is called to check that the iov_iter
is readable. Since only 4KB will be handled each time, the address ranges
below will be checked over and over again:

start           end
----------      -------
buf             buf+2MB
buf+4KB         buf+2MB
buf+8KB         buf+2MB
...
buf+2044KB      buf+2MB

The checked size is clearly wrong, since only 4KB will be handled each
time. So compute the correct chunk size to let iomap work well in the
non-large folio case.
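
The cost of the repeated checks can be estimated with a short calculation. The sketch below is a hypothetical model, not iomap code: it assumes each iteration advances by `step` bytes and checks min(chunk, remaining) bytes of the user buffer.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Total number of bytes checked while writing `total` bytes in `step`-sized
 * iterations, when each iteration checks up to `chunk` bytes ahead.
 */
uint64_t demo_bytes_checked(uint64_t total, uint64_t step, uint64_t chunk)
{
    assert(step > 0);

    uint64_t checked = 0;

    for (uint64_t pos = 0; pos < total; pos += step) {
        uint64_t left = total - pos;

        checked += (chunk < left) ? chunk : left;
    }

    return checked;
}
```

Under this model, a 2MB write handled 4KB at a time with a 2MB chunk checks 537919488 bytes (513 MiB) in total, while a 4KB chunk checks exactly the 2097152 bytes that are written.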

With this change, the write speed will be stable. Tested on ARM64 device.

Before:

 - dd if=/dev/zero of=/dev/sda bs=400K  count=10485  (334 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=800K  count=5242   (278 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=1600K count=2621   (204 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=2200K count=1906   (170 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=3000K count=1398   (150 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=4500K count=932    (139 MB/s)

After:

 - dd if=/dev/zero of=/dev/sda bs=400K  count=10485  (339 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=800K  count=5242   (330 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=1600K count=2621   (332 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=2200K count=1906   (333 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=3000K count=1398   (333 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=4500K count=932    (333 MB/s)

Fixes: 5d8edfb900 ("iomap: Copy larger chunks from userspace")
Cc: stable@vger.kernel.org
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240521114939.2541461-2-xu.yang_2@nxp.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:07 +02:00
Xu Yang 79c1374548 filemap: add helper mapping_max_folio_size()
Add mapping_max_folio_size() to get the maximum folio size for this
pagecache mapping.

Fixes: 5d8edfb900 ("iomap: Copy larger chunks from userspace")
Cc: stable@vger.kernel.org
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240521114939.2541461-1-xu.yang_2@nxp.com
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:06 +02:00
David Howells 2c6b531020 netfs: Fix AIO error handling when doing write-through
If an error occurs whilst we're doing an AIO write in write-through mode,
we may end up calling ->ki_complete() *and* returning an error from
->write_iter().  This can result in either a UAF (the ->ki_complete() func
pointer may get overwritten, for example) or a refcount underflow in
io_submit() as ->ki_complete is called twice.

Fix this by making netfs_end_writethrough() - and thus
netfs_perform_write() - unconditionally return -EIOCBQUEUED if we're doing
an AIO write and wait for completion if we're not.

Fixes: 288ace2f57 ("netfs: New writeback implementation")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/295052.1716298587@warthog.procyon.org.uk
cc: Jeff Layton <jlayton@kernel.org>
cc: Enzo Matsumiya <ematsumiya@suse.de>
cc: netfs@lists.linux.dev
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:06 +02:00
David Howells 9b038d004c netfs: Fix io_uring based write-through
This can be triggered by mounting a cifs filesystem with a cache=strict
mount option and then, using the fsx program from xfstests, doing:

        ltp/fsx -A -d -N 1000 -S 11463 -P /tmp /cifs-mount/foo \
          --replay-ops=gen112-fsxops

Where gen112-fsxops holds:

        fallocate 0x6be7 0x8fc5 0x377d3
        copy_range 0x9c71 0x77e8 0x2edaf 0x377d3
        write 0x2776d 0x8f65 0x377d3

The problem is that netfs_io_request::len is being used for two purposes
and ends up getting set to the amount of data we transferred, not the
amount of data the caller asked to be transferred (for various reasons,
such as mmap'd writes, we might end up rounding out the data written to the
server to include the entire folio at each end).

Fix this by keeping the amount we were asked to write in ->len and using
->submitted to track what we issued ops for.  Then, when we come to calling
->ki_complete(), ->len is the right size.

This also required netfs_cleanup_dio_write() to change since we're no
longer advancing wreq->len.  Use wreq->transferred instead as we might have
done a short read.

With this, the generic/112 xfstest passes if cifs is forced to put all
non-DIO opens into write-through mode.

Fixes: 288ace2f57 ("netfs: New writeback implementation")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/295086.1716298663@warthog.procyon.org.uk
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <stfrench@microsoft.com>
cc: Enzo Matsumiya <ematsumiya@suse.de>
cc: netfs@lists.linux.dev
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:06 +02:00
Mathieu Othacehe 128d54fbcb net: phy: micrel: set soft_reset callback to genphy_soft_reset for KSZ8061
This follows similar reinstatements for the KSZ8081 and KSZ9031.

Older kernels would use the genphy_soft_reset if the PHY did not implement
a .soft_reset.

The KSZ8061 errata described here:
https://ww1.microchip.com/downloads/en/DeviceDoc/KSZ8061-Errata-DS80000688B.pdf
and worked around with 232ba3a51c ("net: phy: Micrel KSZ8061: link failure after cable connect")
is back again without this soft reset.

Fixes: 6e2d85ec05 ("net: phy: Stop with excessive soft reset")
Tested-by: Karim Ben Houcine <karim.benhoucine@landisgyr.com>
Signed-off-by: Mathieu Othacehe <othacehe@gnu.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 12:30:38 +01:00
dicken.ding b84a8aba80 genirq/irqdesc: Prevent use-after-free in irq_find_at_or_after()
irq_find_at_or_after() dereferences the interrupt descriptor which is
returned by mt_find() while neither holding sparse_irq_lock nor RCU read
lock, which means the descriptor can be freed between mt_find() and the
dereference:

    CPU0                            CPU1
    desc = mt_find()
                                    delayed_free_desc(desc)
    irq_desc_get_irq(desc)

The use-after-free is reported by KASAN:

    Call trace:
     irq_get_next_irq+0x58/0x84
     show_stat+0x638/0x824
     seq_read_iter+0x158/0x4ec
     proc_reg_read_iter+0x94/0x12c
     vfs_read+0x1e0/0x2c8

    Freed by task 4471:
     slab_free_freelist_hook+0x174/0x1e0
     __kmem_cache_free+0xa4/0x1dc
     kfree+0x64/0x128
     irq_kobj_release+0x28/0x3c
     kobject_put+0xcc/0x1e0
     delayed_free_desc+0x14/0x2c
     rcu_do_batch+0x214/0x720

Guard the access with a RCU read lock section.

Fixes: 721255b982 ("genirq: Use a maple tree for interrupt descriptor management")
Signed-off-by: dicken.ding <dicken.ding@mediatek.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240524091739.31611-1-dicken.ding@mediatek.com
2024-05-24 12:49:35 +02:00
Konstantin Komarov 302e9dca84 fs/ntfs3: Break dir enumeration if directory contents error
If we somehow attempt to read beyond the directory size, an error
is supposed to be returned.

However, in some cases, read requests do not stop and instead enter
into a loop.

To avoid this, we set the position in the directory to the end.

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: stable@vger.kernel.org
2024-05-24 12:50:12 +03:00
Konstantin Komarov 05afeeebca fs/ntfs3: Fix case when index is reused during tree transformation
In most cases when adding a cluster to the directory index,
it is placed at the end, and in the bitmap, this cluster corresponds
to the last bit. The new directory size is calculated as follows:

	data_size = (u64)(bit + 1) << indx->index_bits;

In the case of reusing a non-final cluster from the index,
data_size is calculated incorrectly, resulting in the directory size
differing from the actual size.

A check for cluster reuse has been added, and the size update is skipped.
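
A worked version of the size calculation, with one possible form of the reuse check, is sketched below. The function and parameter names are hypothetical, and treating "the cluster ends inside the current size" as the reuse condition is an assumption made for illustration; the actual check in fs/ntfs3 may differ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the directory size after index cluster `bit` has been allocated.
 * cur_size is the size before the allocation, index_bits the log2 of the
 * index cluster size.
 */
uint64_t demo_index_data_size(uint64_t cur_size, unsigned int bit,
                              unsigned int index_bits)
{
    assert(index_bits < 64);

    /* Same formula as in the message above. */
    uint64_t end = ((uint64_t)bit + 1) << index_bits;

    /* Assumed reuse check: a cluster inside the current size is not new. */
    if (end <= cur_size)
        return cur_size;

    return end;
}
```

With 4096-byte index clusters and three clusters in use, appending cluster 3 grows the size to 16384, whereas reusing cluster 1 keeps it at 12288 instead of shrinking it to 8192.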

Fixes: 82cae269cf ("fs/ntfs3: Add initialization of super block")
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: stable@vger.kernel.org
2024-05-24 12:50:12 +03:00
Matt Jan 06e785aeb9 connector: Fix invalid conversion in cn_proc.h
The implicit conversion from unsigned int to enum
proc_cn_event is invalid, so explicitly cast it
for compilation in a C++ compiler.
/usr/include/linux/cn_proc.h: In function 'proc_cn_event valid_event(proc_cn_event)':
/usr/include/linux/cn_proc.h:72:17: error: invalid conversion from 'unsigned int' to 'proc_cn_event' [-fpermissive]
   72 |         ev_type &= PROC_EVENT_ALL;
      |                 ^
      |                 |
      |                 unsigned int
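
The explicit-cast pattern can be sketched as follows. The enum below is a hypothetical stand-in for enum proc_cn_event, with made-up values; the point is only that the result of the bitwise AND is an integer and is converted back to the enum type with a cast, which is accepted by both C and C++ compilers.

```c
#include <assert.h>

/* Hypothetical event mask standing in for enum proc_cn_event. */
enum demo_event {
    DEMO_EVENT_NONE = 0x0,
    DEMO_EVENT_FORK = 0x1,
    DEMO_EVENT_EXEC = 0x2,
    DEMO_EVENT_ALL  = DEMO_EVENT_FORK | DEMO_EVENT_EXEC
};

/* Strip unknown bits and return the result as the enum type. */
enum demo_event demo_valid_event(enum demo_event ev_type)
{
    /* The AND yields an integer; the cast makes the conversion explicit. */
    enum demo_event masked = (enum demo_event)(ev_type & DEMO_EVENT_ALL);

    /* Post-condition: only known bits remain. */
    assert(((unsigned int)masked & ~(unsigned int)DEMO_EVENT_ALL) == 0);

    return masked;
}
```

The compound assignment form (ev_type &= ...) performs the same integer-to-enum conversion implicitly, which is what a C++ compiler rejects.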

Signed-off-by: Matt Jan <zoo868e@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 10:36:55 +01:00
Jeff Xu a52b4f11a2 selftest mm/mseal read-only elf memory segment
Seal the read-only ELF mapping so that it cannot be changed by mprotect().

[jeffxu@chromium.org: style change]
  Link: https://lkml.kernel.org/r/20240416220944.2481203-2-jeffxu@chromium.org
[amer.shanawany@gmail.com: fix linker error for inline function]
  Link: https://lkml.kernel.org/r/20240420202346.546444-1-amer.shanawany@gmail.com
[jeffxu@chromium.org: fix compile warning]
  Link: https://lkml.kernel.org/r/20240420003515.345982-2-jeffxu@chromium.org
[jeffxu@chromium.org: fix arm build]
  Link: https://lkml.kernel.org/r/20240502225331.3806279-2-jeffxu@chromium.org
Link: https://lkml.kernel.org/r/20240415163527.626541-6-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Signed-off-by: Amer Al Shanawany <amer.shanawany@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-23 19:40:27 -07:00
Jeff Xu c010d09900 mseal: add documentation
Add documentation for mseal().

Link: https://lkml.kernel.org/r/20240415163527.626541-5-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-23 19:40:26 -07:00
Jeff Xu 4926c7a52d selftest mm/mseal memory sealing
selftest for memory sealing change in mmap() and mseal().

Link: https://lkml.kernel.org/r/20240415163527.626541-4-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-23 19:40:26 -07:00
Jeff Xu 8be7258aad mseal: add mseal syscall
The new mseal() is a syscall on 64-bit CPUs, with the following signature:

int mseal(void *addr, size_t len, unsigned long flags)
addr/len: memory range.
flags: reserved.

mseal() blocks the following operations for the given memory range.

1> Unmapping, moving to another location, and shrinking the size,
   via munmap() and mremap(). These can leave an empty space that can
   then be replaced with a VMA with a new set of attributes.

2> Moving or expanding a different VMA into the current location,
   via mremap().

3> Modifying a VMA via mmap(MAP_FIXED).

4> Size expansion, via mremap(), does not appear to pose any specific
   risks to sealed VMAs. It is included anyway because the use case is
   unclear. In any case, users can rely on merging to expand a sealed VMA.

5> mprotect() and pkey_mprotect().

6> Some destructive madvise() behaviors (e.g. MADV_DONTNEED) for anonymous
   memory, when users don't have write permission to the memory. Those
   behaviors can alter region contents by discarding pages, effectively a
   memset(0) for anonymous memory.

The following input received during the RFC is incorporated into this patch:

Jann Horn: raising awareness and providing valuable insights on the
destructive madvise operations.
Linus Torvalds: assisting in defining system call signature and scope.
Liam R. Howlett: perf optimization.
Theo de Raadt: sharing the experiences and insight gained from
  implementing mimmutable() in OpenBSD.

Finally, the idea that inspired this patch comes from Stephen Röttger's
work in Chrome V8 CFI.

[jeffxu@chromium.org: add branch prediction hint, per Pedro]
  Link: https://lkml.kernel.org/r/20240423192825.1273679-2-jeffxu@chromium.org
Link: https://lkml.kernel.org/r/20240415163527.626541-3-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-23 19:40:26 -07:00
Jeff Xu ff388fe5c4 mseal: wire up mseal syscall
Patch series "Introduce mseal", v10.

This patchset proposes a new mseal() syscall for the Linux kernel.

In a nutshell, mseal() protects the VMAs of a given virtual memory range
against modifications, such as changes to their permission bits.

Modern CPUs support memory permissions, such as the read/write (RW) and
no-execute (NX) bits.  Linux has supported NX since the release of kernel
version 2.6.8 in August 2004 [1].  The memory permission feature improves
the security stance on memory corruption bugs, as an attacker cannot
simply write to arbitrary memory and point the code to it.  The memory
must be marked with the X bit, or else an exception will occur. 
Internally, the kernel maintains the memory permissions in a data
structure called VMA (vm_area_struct).  mseal() additionally protects the
VMA itself against modifications of the selected seal type.

Memory sealing is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system.  For example,
such an attacker primitive can break control-flow integrity guarantees
since read-only memory that is supposed to be trusted can become writable
or .text pages can get remapped.  Memory sealing can automatically be
applied by the runtime loader to seal .text and .rodata pages and
applications can additionally seal security critical data at runtime.  A
similar feature already exists in the XNU kernel with the
VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the mimmutable syscall
[4].  Also, Chrome wants to adopt this feature for their CFI work [2] and
this patchset has been designed to be compatible with the Chrome use case.

Two system calls are involved in sealing the map:  mmap() and mseal().

The new mseal() is a syscall on 64-bit CPUs, with the following signature:

int mseal(void *addr, size_t len, unsigned long flags)
addr/len: memory range.
flags: reserved.

mseal() blocks the following operations for the given memory range.

1> Unmapping, moving to another location, and shrinking the size,
   via munmap() and mremap(). These operations can leave an empty space
   that can then be replaced with a VMA with a new set of attributes.

2> Moving or expanding a different VMA into the current location,
   via mremap().

3> Modifying a VMA via mmap(MAP_FIXED).

4> Size expansion, via mremap(), does not appear to pose any specific
   risks to sealed VMAs. It is included anyway because the use case is
   unclear. In any case, users can rely on merging to expand a sealed VMA.

5> mprotect() and pkey_mprotect().

6> Some destructive madvise() behaviors (e.g. MADV_DONTNEED) for anonymous
   memory, when users don't have write permission to the memory. Those
   behaviors can alter region contents by discarding pages, effectively a
   memset(0) for anonymous memory.
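
As a rough illustration of the list above, the following sketch seals one
anonymous page and then checks that mprotect() is refused.  It assumes a
kernel that already carries this series.  The helper names mseal_raw() and
seal_then_mprotect() are hypothetical, no libc wrapper is assumed, and the
fallback syscall number 462 is an assumption used only when the installed
headers do not define SYS_mseal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_mseal
#define SYS_mseal 462	/* assumption: only used if the headers lack it */
#endif

/* Hypothetical raw wrapper; goes straight through syscall(2). */
static int mseal_raw(void *addr, size_t len, unsigned long flags)
{
	return (int)syscall(SYS_mseal, addr, len, flags);
}

/*
 * Returns 1 if the page was sealed and mprotect() then failed with EPERM,
 * 0 if mseal() is not usable here (old kernel, 32-bit, filtered syscall),
 * and -1 on any unexpected result.
 */
int seal_then_mprotect(void)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;
	int rc, saved_errno;

	assert(page > 0);
	p = mmap(NULL, (size_t)page, PROT_READ,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	if (mseal_raw(p, (size_t)page, 0) != 0) {
		munmap(p, (size_t)page);	/* not sealed, so this works */
		return 0;
	}

	/* Item 5 in the list: changing permissions on a sealed VMA. */
	rc = mprotect(p, (size_t)page, PROT_READ | PROT_WRITE);
	saved_errno = errno;

	/* The sealed page cannot be unmapped; it stays until exit. */
	return (rc == -1 && saved_errno == EPERM) ? 1 : -1;
}
```

A caller would treat a return value of 0 as "feature not available" rather
than as a failure, since the sketch cannot tell an old kernel from a
filtered syscall.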

The idea that inspired this patch comes from Stephen Röttger’s work in
V8 CFI [5].  Chrome browser in ChromeOS will be the first user of this
API.

Indeed, the Chrome browser has very specific requirements for sealing,
which are distinct from those of most applications.  For example, in the
case of libc, sealing is only applied to read-only (RO) or read-execute
(RX) memory segments (such as .text and .RELRO) to prevent them from
becoming writable; the lifetime of those mappings is tied to the lifetime
of the process.

Chrome wants to seal two large address space reservations that are managed
by different allocators.  The memory is mapped RW- and RWX respectively
but write access to it is restricted using pkeys (or in the future ARM
permission overlay extensions).  The lifetime of those mappings is not
tied to the lifetime of the process; therefore, while the memory is
sealed, the allocators still need to free or discard the unused memory,
for example with madvise(DONTNEED).

However, always allowing madvise(DONTNEED) on this range poses a security
risk.  For example, if a jump instruction crosses a page boundary and the
second page gets discarded, it will overwrite the target bytes with zeros
and change the control flow.  Checking write-permission before the discard
operation allows us to control when the operation is valid.  In this case,
the madvise will only succeed if the executing thread has PKEY write
permissions and PKRU changes are protected in software by control-flow
integrity.
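
The "overwrite with zeros" effect described above can be observed from
userspace without any sealing support.  The sketch below is only an
illustration of existing madvise() behavior on private anonymous memory;
the function name dontneed_zeroes_page() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a private anonymous page reads back as all zeros after
 * madvise(MADV_DONTNEED), 0 if the old contents survived, -1 on setup
 * failure.
 */
int dontneed_zeroes_page(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	int zeroed = 1;
	long i;

	assert(page > 0);
	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	memset(p, 0xAB, (size_t)page);		/* fill with a known pattern */

	if (madvise(p, (size_t)page, MADV_DONTNEED) != 0) {
		munmap(p, (size_t)page);
		return -1;
	}

	/* The next access faults in a fresh zero page. */
	for (i = 0; i < page; i++) {
		if (p[i] != 0) {
			zeroed = 0;
			break;
		}
	}

	munmap(p, (size_t)page);
	return zeroed;
}
```

This is why the series treats the discard as equivalent to a write: the
caller never stores to the page, yet its contents change.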

Although the initial version of this patch series is targeting the Chrome
browser as its first user, it became evident during upstream discussions
that we would also want to ensure that the patch set eventually is a
complete solution for memory sealing and compatible with other use cases. 
The specific scenario currently in mind is glibc's use case of loading and
sealing ELF executables.  To this end, Stephen is working on a change to
glibc to add sealing support to the dynamic linker, which will seal all
non-writable segments at startup.  Once this work is completed, all
applications will be able to automatically benefit from these new
protections.

In closing, I would like to formally acknowledge the valuable
contributions received during the RFC process, which were instrumental in
shaping this patch:

Jann Horn: raising awareness and providing valuable insights on the
  destructive madvise operations.
Liam R. Howlett: perf optimization.
Linus Torvalds: assisting in defining system call signature and scope.
Theo de Raadt: sharing the experiences and insight gained from
  implementing mimmutable() in OpenBSD.

MM perf benchmarks
==================
This patch adds a loop in the mprotect/munmap/madvise(DONTNEED) to
check the VMAs’ sealing flag, so that no partial update can be made,
when any segment within the given memory range is sealed.

To measure the performance impact of this loop, two tests were developed
[8].

The first is measuring the time taken for a particular system call,
by using clock_gettime(CLOCK_MONOTONIC). The second is using
PERF_COUNT_HW_REF_CPU_CYCLES (exclude user space). Both tests have
similar results.

The tests roughly follow this sequence:
for (i = 0; i < 1000; i++)
    create 1000 mappings (1 page per VMA)
    start the sampling
    for (j = 0; j < 1000; j++)
        mprotect one mapping
    stop and save the sample
    delete 1000 mappings
aggregate all samples.
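
A reduced sketch of the first (wall-clock) test is shown below.  It is not
the benchmark from [8]: the function name time_mprotect_ns() is
hypothetical, it takes a single sample instead of 1000, and the way the
one-page VMAs are created (alternating protections inside one reservation
so that neighbours cannot merge) is an assumption of this sketch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

static int64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Time nmaps mprotect() calls, each on its own one-page VMA.
 * Returns the total elapsed time in nanoseconds, or -1 on failure.
 */
int64_t time_mprotect_ns(int nmaps)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t len;
	char *base;
	int64_t start, elapsed;
	int j;

	assert(nmaps > 0);
	len = 2 * (size_t)nmaps * page;

	/* Reserve 2 * nmaps pages with no access at all. */
	base = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return -1;

	/* Make every other page readable so each one is a separate VMA. */
	for (j = 0; j < nmaps; j++) {
		if (mprotect(base + 2 * (size_t)j * page, page, PROT_READ)) {
			munmap(base, len);
			return -1;
		}
	}

	start = now_ns();				/* start the sampling */
	for (j = 0; j < nmaps; j++)
		mprotect(base + 2 * (size_t)j * page, page,
			 PROT_READ | PROT_WRITE);	/* mprotect one mapping */
	elapsed = now_ns() - start;			/* stop the sample */

	munmap(base, len);				/* delete the mappings */
	return elapsed;
}
```

Dividing the result by nmaps gives an average cost per call; running it on
kernels with and without the series would give the two columns compared in
the tables.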

The tests below were performed on an Intel(R) Pentium(R) Gold 7505 @
2.00GHz Chromebook with 4G of memory.

Based on the latest upstream code:
The first test (measuring time)
syscall__	vmas	t	t_mseal	delta_ns	per_vma	%
munmap__  	1	909	944	35	35	104%
munmap__  	2	1398	1502	104	52	107%
munmap__  	4	2444	2594	149	37	106%
munmap__  	8	4029	4323	293	37	107%
munmap__  	16	6647	6935	288	18	104%
munmap__  	32	11811	12398	587	18	105%
mprotect	1	439	465	26	26	106%
mprotect	2	1659	1745	86	43	105%
mprotect	4	3747	3889	142	36	104%
mprotect	8	6755	6969	215	27	103%
mprotect	16	13748	14144	396	25	103%
mprotect	32	27827	28969	1142	36	104%
madvise_	1	240	262	22	22	109%
madvise_	2	366	442	76	38	121%
madvise_	4	623	751	128	32	121%
madvise_	8	1110	1324	215	27	119%
madvise_	16	2127	2451	324	20	115%
madvise_	32	4109	4642	534	17	113%

The second test (measuring cpu cycle)
syscall__	vmas	cpu	cmseal	delta_cpu	per_vma	%
munmap__	1	1790	1890	100	100	106%
munmap__	2	2819	3033	214	107	108%
munmap__	4	4959	5271	312	78	106%
munmap__	8	8262	8745	483	60	106%
munmap__	16	13099	14116	1017	64	108%
munmap__	32	23221	24785	1565	49	107%
mprotect	1	906	967	62	62	107%
mprotect	2	3019	3203	184	92	106%
mprotect	4	6149	6569	420	105	107%
mprotect	8	9978	10524	545	68	105%
mprotect	16	20448	21427	979	61	105%
mprotect	32	40972	42935	1963	61	105%
madvise_	1	434	497	63	63	115%
madvise_	2	752	899	147	74	120%
madvise_	4	1313	1513	200	50	115%
madvise_	8	2271	2627	356	44	116%
madvise_	16	4312	4883	571	36	113%
madvise_	32	8376	9319	943	29	111%

Based on these results, for the 6.8 kernel the sealing check adds
20-40 nanoseconds, or around 50-100 CPU cycles, per VMA.
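
The derived columns in the tables above appear to follow from the two
measured columns as delta = sealed - base, per_vma = delta / vmas, and
% = sealed / base.  The sketch below works that arithmetic; the struct and
function names are hypothetical, and the rounding rules are an assumption.
Some table rows differ by one from what this produces, presumably because
the table was computed from unrounded samples.

```c
#include <assert.h>

struct overhead {
	long delta;	/* extra cost of the sealing check */
	long per_vma;	/* delta divided by the number of VMAs */
	long percent;	/* sealed cost as a percentage of the base cost */
};

/* base and sealed are in the same unit (ns or CPU cycles). */
struct overhead mseal_overhead(long base, long sealed, long vmas)
{
	struct overhead o;

	assert(base > 0 && vmas > 0);
	o.delta = sealed - base;
	o.per_vma = o.delta / vmas;			/* truncating */
	o.percent = (sealed * 100 + base / 2) / base;	/* round to nearest */
	return o;
}
```

For the first munmap row (909 ns without the check, 944 ns with it, one
VMA) this gives a delta of 35, 35 per VMA, and 104%, matching the table.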

In addition, I applied the sealing to the 5.10 kernel:
The first test (measuring time)
syscall__	vmas	t	t_mseal	delta_ns	per_vma	%
munmap__	1	357	390	33	33	109%
munmap__	2	442	463	21	11	105%
munmap__	4	614	634	20	5	103%
munmap__	8	1017	1137	120	15	112%
munmap__	16	1889	2153	263	16	114%
munmap__	32	4109	4088	-21	-1	99%
mprotect	1	235	227	-7	-7	97%
mprotect	2	495	464	-30	-15	94%
mprotect	4	741	764	24	6	103%
mprotect	8	1434	1437	2	0	100%
mprotect	16	2958	2991	33	2	101%
mprotect	32	6431	6608	177	6	103%
madvise_	1	191	208	16	16	109%
madvise_	2	300	324	24	12	108%
madvise_	4	450	473	23	6	105%
madvise_	8	753	806	53	7	107%
madvise_	16	1467	1592	125	8	108%
madvise_	32	2795	3405	610	19	122%
					
The second test (measuring cpu cycle)
syscall__	vmas	cpu	cmseal	delta_cpu	per_vma	%
munmap__	1	684	715	31	31	105%
munmap__	2	861	898	38	19	104%
munmap__	4	1183	1235	51	13	104%
munmap__	8	1999	2045	46	6	102%
munmap__	16	3839	3816	-23	-1	99%
munmap__	32	7672	7887	216	7	103%
mprotect	1	397	443	46	46	112%
mprotect	2	738	788	50	25	107%
mprotect	4	1221	1256	35	9	103%
mprotect	8	2356	2429	72	9	103%
mprotect	16	4961	4935	-26	-2	99%
mprotect	32	9882	10172	291	9	103%
madvise_	1	351	380	29	29	108%
madvise_	2	565	615	49	25	109%
madvise_	4	872	933	61	15	107%
madvise_	8	1508	1640	132	16	109%
madvise_	16	3078	3323	245	15	108%
madvise_	32	5893	6704	811	25	114%

For the 5.10 kernel, the sealing check adds 0-15 ns in time, or 10-30
CPU cycles; in some cases there is even a decrease.

It might be interesting to compare the 5.10 and 6.8 kernels:
The first test (measuring time)
syscall__	vmas	t_5_10	t_6_8	delta_ns	per_vma	%
munmap__	1	357	909	552	552	254%
munmap__	2	442	1398	956	478	316%
munmap__	4	614	2444	1830	458	398%
munmap__	8	1017	4029	3012	377	396%
munmap__	16	1889	6647	4758	297	352%
munmap__	32	4109	11811	7702	241	287%
mprotect	1	235	439	204	204	187%
mprotect	2	495	1659	1164	582	335%
mprotect	4	741	3747	3006	752	506%
mprotect	8	1434	6755	5320	665	471%
mprotect	16	2958	13748	10790	674	465%
mprotect	32	6431	27827	21397	669	433%
madvise_	1	191	240	49	49	125%
madvise_	2	300	366	67	33	122%
madvise_	4	450	623	173	43	138%
madvise_	8	753	1110	357	45	147%
madvise_	16	1467	2127	660	41	145%
madvise_	32	2795	4109	1314	41	147%

The second test (measuring cpu cycle)
syscall__	vmas	cpu_5_10	c_6_8	delta_cpu	per_vma	%
munmap__	1	684	1790	1106	1106	262%
munmap__	2	861	2819	1958	979	327%
munmap__	4	1183	4959	3776	944	419%
munmap__	8	1999	8262	6263	783	413%
munmap__	16	3839	13099	9260	579	341%
munmap__	32	7672	23221	15549	486	303%
mprotect	1	397	906	509	509	228%
mprotect	2	738	3019	2281	1140	409%
mprotect	4	1221	6149	4929	1232	504%
mprotect	8	2356	9978	7622	953	423%
mprotect	16	4961	20448	15487	968	412%
mprotect	32	9882	40972	31091	972	415%
madvise_	1	351	434	82	82	123%
madvise_	2	565	752	186	93	133%
madvise_	4	872	1313	442	110	151%
madvise_	8	1508	2271	763	95	151%
madvise_	16	3078	4312	1234	77	140%
madvise_	32	5893	8376	2483	78	142%

From 5.10 to 6.8:
munmap: added 250-550 ns in time, or 500-1100 CPU cycles, per VMA.
mprotect: added 200-750 ns in time, or 500-1200 CPU cycles, per VMA.
madvise: added 33-50 ns in time, or 70-110 CPU cycles, per VMA.

In comparison to mseal, which adds 20-40 ns or 50-100 CPU cycles, the
increase from 5.10 to 6.8 is significantly larger, approximately ten times
greater for munmap and mprotect.

When I discussed the mm performance with Brian Makin, an engineer who
worked on performance, it was brought to my attention that such performance
benchmarks, which measure millions of mm syscalls in a tight loop, may
not accurately reflect real-world scenarios, such as that of a database
service.  Also, this was tested on a single hardware platform running
ChromeOS; the data from other hardware or distributions might be
different.  It might be best to take this data with a grain of salt.


This patch (of 5):

Wire up mseal syscall for all architectures.

Link: https://lkml.kernel.org/r/20240415163527.626541-1-jeffxu@chromium.org
Link: https://lkml.kernel.org/r/20240415163527.626541-2-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com> [Bug #2]
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-23 19:40:26 -07:00
Andrii Nakryiko 699646734a uprobes: prevent mutex_lock() under rcu_read_lock()
Recent changes made uprobe_cpu_buffer preparation lazy, and moved it
deeper into __uprobe_trace_func(). This is problematic because
__uprobe_trace_func() is called inside rcu_read_lock()/rcu_read_unlock()
block, which then calls prepare_uprobe_buffer() -> uprobe_buffer_get() ->
mutex_lock(&ucb->mutex), leading to a splat about using mutex under
non-sleepable RCU:

  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585
   in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 98231, name: stress-ng-sigq
   preempt_count: 0, expected: 0
   RCU nest depth: 1, expected: 0
   ...
   Call Trace:
    <TASK>
    dump_stack_lvl+0x3d/0xe0
    __might_resched+0x24c/0x270
    ? prepare_uprobe_buffer+0xd5/0x1d0
    __mutex_lock+0x41/0x820
    ? ___perf_sw_event+0x206/0x290
    ? __perf_event_task_sched_in+0x54/0x660
    ? __perf_event_task_sched_in+0x54/0x660
    prepare_uprobe_buffer+0xd5/0x1d0
    __uprobe_trace_func+0x4a/0x140
    uprobe_dispatcher+0x135/0x280
    ? uprobe_dispatcher+0x94/0x280
    uprobe_notify_resume+0x650/0xec0
    ? atomic_notifier_call_chain+0x21/0x110
    ? atomic_notifier_call_chain+0xf8/0x110
    irqentry_exit_to_user_mode+0xe2/0x1e0
    asm_exc_int3+0x35/0x40
   RIP: 0033:0x7f7e1d4da390
   Code: 33 04 00 0f 1f 80 00 00 00 00 f3 0f 1e fa b9 01 00 00 00 e9 b2 fc ff ff 66 90 f3 0f 1e fa 31 c9 e9 a5 fc ff ff 0f 1f 44 00 00 <cc> 0f 1e fa b8 27 00 00 00 0f 05 c3 0f 1f 40 00 f3 0f 1e fa b8 6e
   RSP: 002b:00007ffd2abc3608 EFLAGS: 00000246
   RAX: 0000000000000000 RBX: 0000000076d325f1 RCX: 0000000000000000
   RDX: 0000000076d325f1 RSI: 000000000000000a RDI: 00007ffd2abc3690
   RBP: 000000000000000a R08: 00017fb700000000 R09: 00017fb700000000
   R10: 00017fb700000000 R11: 0000000000000246 R12: 0000000000017ff2
   R13: 00007ffd2abc3610 R14: 0000000000000000 R15: 00007ffd2abc3780
    </TASK>

Luckily, it's easy to fix by moving prepare_uprobe_buffer() to be called
slightly earlier: into uprobe_trace_func() and uretprobe_trace_func(), outside
of RCU locked section. This still keeps this buffer preparation lazy and helps
avoid the overhead when it's not needed. E.g., if there is only a BPF uprobe
handler installed on a given uprobe, the buffer won't be initialized.

Note, the other user of prepare_uprobe_buffer(), __uprobe_perf_func(), is not
affected, as it doesn't prepare buffer under RCU read lock.
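
The ordering change can be modelled in plain userspace C.  The sketch
below is a toy model, not kernel code: every name in it is hypothetical, a
counter stands in for the RCU nest depth, and "may sleep" is reduced to
counting calls made while that depth is non-zero.

```c
#include <assert.h>
#include <stddef.h>

static int rcu_depth;	/* models "RCU nest depth" from the splat */
static int violations;	/* sleeping calls made inside a read-side section */
static char buffer_storage[64];

static void model_rcu_read_lock(void)   { rcu_depth++; }
static void model_rcu_read_unlock(void) { rcu_depth--; }

/* Stand-in for a lazy preparation step that takes a sleeping lock. */
static char *model_prepare_buffer(char **bufp)
{
	if (*bufp == NULL) {		/* lazy: only prepare once */
		if (rcu_depth > 0)
			violations++;	/* the kernel would print the BUG */
		*bufp = buffer_storage;
	}
	return *bufp;
}

/* Old ordering: preparation happens inside the read-side section. */
int run_old_ordering(void)
{
	char *buf = NULL;

	violations = 0;
	model_rcu_read_lock();
	model_prepare_buffer(&buf);
	model_rcu_read_unlock();
	assert(rcu_depth == 0);
	return violations;
}

/* New ordering: prepare first, then enter the read-side section. */
int run_new_ordering(void)
{
	char *buf = NULL;

	violations = 0;
	model_prepare_buffer(&buf);
	model_rcu_read_lock();
	model_prepare_buffer(&buf);	/* already prepared, nothing to do */
	model_rcu_read_unlock();
	assert(rcu_depth == 0);
	return violations;
}
```

In this model the old ordering records one violation and the new ordering
records none, while the buffer is still only prepared when a caller
actually asks for it.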

Link: https://lore.kernel.org/all/20240521053017.3708530-1-andrii@kernel.org/

Fixes: 1b8f85defb ("uprobes: prepare uprobe args buffer lazily")
Reported-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-05-24 07:44:44 +09:00
Linus Torvalds 6d69b6c12f NFS client updates for Linux 6.10
Highlights include:
 
 Stable fixes:
 - nfs: fix undefined behavior in nfs_block_bits()
 - NFSv4.2: Fix READ_PLUS when server doesn't support OP_READ_PLUS
 
 Bugfixes:
 - Fix mixing of the lock/nolock and local_lock mount options
 - NFSv4: Fixup smatch warning for ambiguous return
 - NFSv3: Fix remount when using the legacy binary mount api
 - SUNRPC: Fix the handling of expired RPCSEC_GSS contexts
 - SUNRPC: fix the NFSACL RPC retries when soft mounts are enabled
 - rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL
 
 Features and cleanups:
 - NFSv3: Use the atomic_open API to fix open(O_CREAT|O_TRUNC)
 - pNFS/filelayout: Specify the layout segment range in LAYOUTGET
 - pNFS: rework pnfs_generic_pg_check_layout to check IO range
 - NFSv2: Turn off enabling of NFS v2 by default

Merge tag 'nfs-for-6.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Stable fixes:
   - nfs: fix undefined behavior in nfs_block_bits()
   - NFSv4.2: Fix READ_PLUS when server doesn't support OP_READ_PLUS

  Bugfixes:
   - Fix mixing of the lock/nolock and local_lock mount options
   - NFSv4: Fixup smatch warning for ambiguous return
   - NFSv3: Fix remount when using the legacy binary mount api
   - SUNRPC: Fix the handling of expired RPCSEC_GSS contexts
   - SUNRPC: fix the NFSACL RPC retries when soft mounts are enabled
   - rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL

  Features and cleanups:
   - NFSv3: Use the atomic_open API to fix open(O_CREAT|O_TRUNC)
   - pNFS/filelayout: Specify the layout segment range in LAYOUTGET
   - pNFS: rework pnfs_generic_pg_check_layout to check IO range
   - NFSv2: Turn off enabling of NFS v2 by default"

* tag 'nfs-for-6.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  nfs: fix undefined behavior in nfs_block_bits()
  pNFS: rework pnfs_generic_pg_check_layout to check IO range
  pNFS/filelayout: check layout segment range
  pNFS/filelayout: fixup pNfs allocation modes
  rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL
  NFS: Don't enable NFS v2 by default
  NFS: Fix READ_PLUS when server doesn't support OP_READ_PLUS
  sunrpc: fix NFSACL RPC retry on soft mount
  SUNRPC: fix handling expired GSS context
  nfs: keep server info for remounts
  NFSv4: Fixup smatch warning for ambiguous return
  NFS: make sure lock/nolock overriding local_lock mount option
  NFS: add atomic_open for NFSv3 to handle O_TRUNC correctly.
  pNFS/filelayout: Specify the layout segment range in LAYOUTGET
  pNFS/filelayout: Remove the whole file layout requirement
2024-05-23 13:51:09 -07:00
Linus Torvalds b4d88a60fe block-6.10-20240523

Merge tag 'block-6.10-20240523' of git://git.kernel.dk/linux

Pull more block updates from Jens Axboe:
 "Followup block updates, mostly due to NVMe being a bit late to the
  party. But nothing major in there, so not a big deal.

  In detail, this contains:

   - NVMe pull request via Keith:
       - Fabrics connection retries (Daniel, Hannes)
       - Fabrics logging enhancements (Tokunori)
       - RDMA delete optimization (Sagi)

   - ublk DMA alignment fix (me)

   - null_blk sparse warning fixes (Bart)

   - Discard support for brd (Keith)

   - blk-cgroup list corruption fixes (Ming)

   - blk-cgroup stat propagation fix (Waiman)

   - Regression fix for plugging stall with md (Yu)

   - Misc fixes or cleanups (David, Jeff, Justin)"

* tag 'block-6.10-20240523' of git://git.kernel.dk/linux: (24 commits)
  null_blk: fix null-ptr-dereference while configuring 'power' and 'submit_queues'
  blk-throttle: remove unused struct 'avg_latency_bucket'
  block: fix lost bio for plug enabled bio based device
  block: t10-pi: add MODULE_DESCRIPTION()
  blk-mq: add helper for checking if one CPU is mapped to specified hctx
  blk-cgroup: Properly propagate the iostat update up the hierarchy
  blk-cgroup: fix list corruption from reorder of WRITE ->lqueued
  blk-cgroup: fix list corruption from resetting io stat
  cdrom: rearrange last_media_change check to avoid unintentional overflow
  nbd: Fix signal handling
  nbd: Remove a local variable from nbd_send_cmd()
  nbd: Improve the documentation of the locking assumptions
  nbd: Remove superfluous casts
  nbd: Use NULL to represent a pointer
  brd: implement discard support
  null_blk: Fix two sparse warnings
  ublk_drv: set DMA alignment mask to 3
  nvme-rdma, nvme-tcp: include max reconnects for reconnect logging
  nvmet-rdma: Avoid o(n^2) loop in delete_ctrl
  nvme: do not retry authentication failures
  ...
2024-05-23 13:44:47 -07:00
Linus Torvalds 483a351ed4 io_uring-6.10-20240523

Merge tag 'io_uring-6.10-20240523' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:
 "Single fix here for a regression in 6.9, and then a simple cleanup
  removing some dead code"

* tag 'io_uring-6.10-20240523' of git://git.kernel.dk/linux:
  io_uring: remove checks for NULL 'sq_offset'
  io_uring/sqpoll: ensure that normal task_work is also run timely
2024-05-23 13:41:49 -07:00
Linus Torvalds c2c80ecdb4 regulator: Fixes for v6.10
A bunch of fixes that came in during the merge window, Matti found
 several issues with some of the more complexly configured Rohm
 regulators and the helpers they use and there were some errors in the
 specification of tps6594 when regulators are grouped together.

Merge tag 'regulator-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
 "A bunch of fixes that came in during the merge window.

  Matti found several issues with some of the more complexly configured
  Rohm regulators and the helpers they use and there were some errors in
  the specification of tps6594 when regulators are grouped together"

* tag 'regulator-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: tps6594-regulator: Correct multi-phase configuration
  regulator: tps6287x: Force writing VSEL bit
  regulator: pickable ranges: don't always cache vsel
  regulator: rohm-regulator: warn if unsupported voltage is set
  regulator: bd71828: Don't overwrite runtime voltages
2024-05-23 13:39:42 -07:00
Linus Torvalds 09f8f2c4ca regmap: Fix for v6.10
Guenter ran with memory sanitisers and found an issue in the new KUnit
 tests that Richard added where an assumption in older test code was
 exposed, this was fixed quickly by Richard.

Merge tag 'regmap-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap

Pull regmap fix from Mark Brown:
 "Guenter ran with memory sanitisers and found an issue in the new KUnit
  tests that Richard added where an assumption in older test code was
  exposed, this was fixed quickly by Richard"

* tag 'regmap-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: kunit: Fix array overflow in stride() test
2024-05-23 13:38:31 -07:00
Dongli Zhang a6c11c0a52 genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline
The absence of IRQD_MOVE_PCNTXT prevents immediate effectiveness of
interrupt affinity reconfiguration via procfs. Instead, the change is
deferred until the next instance of the interrupt being triggered on the
original CPU.

When the interrupt next triggers on the original CPU, the new affinity is
enforced within __irq_move_irq(). A vector is allocated from the new CPU,
but the old vector on the original CPU remains and is not immediately
reclaimed. Instead, apicd->move_in_progress is flagged, and the reclaiming
process is delayed until the next trigger of the interrupt on the new CPU.

Upon the subsequent triggering of the interrupt on the new CPU,
irq_complete_move() adds a task to the old CPU's vector_cleanup list if it
remains online. Subsequently, the timer on the old CPU iterates over its
vector_cleanup list, reclaiming old vectors.

However, a rare scenario arises if the old CPU is outgoing before the
interrupt triggers again on the new CPU.

In that case irq_force_complete_move() is not invoked on the outgoing CPU
to reclaim the old apicd->prev_vector because the interrupt isn't currently
affine to the outgoing CPU, and irq_needs_fixup() returns false. Even
though __vector_schedule_cleanup() is later called on the new CPU, it
doesn't reclaim apicd->prev_vector; instead, it simply resets both
apicd->move_in_progress and apicd->prev_vector to 0.

As a result, the vector remains unreclaimed in vector_matrix, leading to a
CPU vector leak.

To address this issue, move the invocation of irq_force_complete_move()
before the irq_needs_fixup() call to reclaim apicd->prev_vector, if the
interrupt is currently or used to be affine to the outgoing CPU.

Additionally, reclaim the vector in __vector_schedule_cleanup() as well,
following a warning message, although theoretically it should never see
apicd->move_in_progress with apicd->prev_cpu pointing to an offline CPU.

Fixes: f0383c24b4 ("genirq/cpuhotplug: Add support for cleaning up move in progress")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240522220218.162423-1-dongli.zhang@oracle.com
2024-05-23 21:51:50 +02:00
Linus Torvalds 66ad4829dd Quite smaller than usual. Notably it includes the fix for the unix
regression you have been notified of in the past weeks.
 The TCP window fix will require some follow-up, already queued.
 
 Current release - regressions:
 
   - af_unix: fix garbage collection of embryos
 
 Previous releases - regressions:
 
   - af_unix: fix race between GC and receive path
 
   - ipv6: sr: fix missing sk_buff release in seg6_input_core
 
   - tcp: remove 64 KByte limit for initial tp->rcv_wnd value
 
   - eth: r8169: fix rx hangup
 
   - eth: lan966x: remove ptp traps in case the ptp is not enabled.
 
   - eth: ixgbe: fix link breakage vs cisco switches.
 
   - eth: ice: prevent ethtool from corrupting the channels.
 
 Previous releases - always broken:
 
   - openvswitch: set the skbuff pkt_type for proper pmtud support.
 
   - tcp: Fix shift-out-of-bounds in dctcp_update_alpha().
 
 Misc:
 
   - a bunch of selftests stabilization patches.
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'net-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Quite smaller than usual. Notably it includes the fix for the unix
  regression from the past weeks. The TCP window fix will require some
  follow-up, already queued.

  Current release - regressions:

   - af_unix: fix garbage collection of embryos

  Previous releases - regressions:

   - af_unix: fix race between GC and receive path

   - ipv6: sr: fix missing sk_buff release in seg6_input_core

   - tcp: remove 64 KByte limit for initial tp->rcv_wnd value

   - eth: r8169: fix rx hangup

   - eth: lan966x: remove ptp traps in case the ptp is not enabled

   - eth: ixgbe: fix link breakage vs cisco switches

   - eth: ice: prevent ethtool from corrupting the channels

  Previous releases - always broken:

   - openvswitch: set the skbuff pkt_type for proper pmtud support

   - tcp: Fix shift-out-of-bounds in dctcp_update_alpha()

  Misc:

   - a bunch of selftests stabilization patches"

* tag 'net-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (25 commits)
  r8169: Fix possible ring buffer corruption on fragmented Tx packets.
  idpf: Interpret .set_channels() input differently
  ice: Interpret .set_channels() input differently
  nfc: nci: Fix handling of zero-length payload packets in nci_rx_work()
  net: relax socket state check at accept time.
  tcp: remove 64 KByte limit for initial tp->rcv_wnd value
  net: ti: icssg_prueth: Fix NULL pointer dereference in prueth_probe()
  tls: fix missing memory barrier in tls_init
  net: fec: avoid lock evasion when reading pps_enable
  Revert "ixgbe: Manual AN-37 for troublesome link partners for X550 SFI"
  testing: net-drv: use stats64 for testing
  net: mana: Fix the extra HZ in mana_hwc_send_request
  net: lan966x: Remove ptp traps in case the ptp is not enabled.
  openvswitch: Set the skbuff pkt_type for proper pmtud support.
  selftest: af_unix: Make SCM_RIGHTS into OOB data.
  af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS
  tcp: Fix shift-out-of-bounds in dctcp_update_alpha().
  selftests/net: use tc rule to filter the na packet
  ipv6: sr: fix memleak in seg6_hmac_init_algo
  af_unix: Update unix_sk(sk)->oob_skb under sk_receive_queue lock.
  ...
2024-05-23 12:49:37 -07:00