Commit graph

1279576 commits

Author SHA1 Message Date
Ryusuke Konishi 936184eadd nilfs2: fix unexpected freezing of nilfs_segctor_sync()
A potential and reproducible race issue has been identified where
nilfs_segctor_sync() would block even after the log writer thread writes a
checkpoint, unless there is an interrupt or other trigger to resume log
writing.

This turned out to be because, depending on the execution timing of the
log writer thread running in parallel, the log writer thread may skip
responding to nilfs_segctor_sync(), which causes a call to schedule()
waiting for completion within nilfs_segctor_sync() to lose the opportunity
to wake up.

The reason why waking up the task waiting in nilfs_segctor_sync() may be
skipped is that updating the request generation issued using a shared
sequence counter and adding an wait queue entry to the request wait queue
to the log writer, are not done atomically.  There is a possibility that
log writing and request completion notification by nilfs_segctor_wakeup()
may occur between the two operations, and in that case, the wait queue
entry is not yet visible to nilfs_segctor_wakeup() and the wake-up of
nilfs_segctor_sync() will be carried over until the next request occurs.

Fix this issue by performing these two operations simultaneously within
the lock section of sc_state_lock.  Also, following the memory barrier
guidelines for event waiting loops, move the call to set_current_state()
in the same location into the event waiting loop to ensure that a memory
barrier is inserted just before the event condition determination.

Link: https://lkml.kernel.org/r/20240520132621.4054-3-konishi.ryusuke@gmail.com
Fixes: 9ff05123e3 ("nilfs2: segment constructor")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: "Bai, Shuangpeng" <sjb7183@psu.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24 11:55:07 -07:00
Ryusuke Konishi f5d4e04634 nilfs2: fix use-after-free of timer for log writer thread
Patch series "nilfs2: fix log writer related issues".

This bug fix series covers three nilfs2 log writer-related issues,
including a timer use-after-free issue and potential deadlock issue on
unmount, and a potential freeze issue in event synchronization found
during their analysis.  Details are described in each commit log.


This patch (of 3):

A use-after-free issue has been reported regarding the timer sc_timer on
the nilfs_sc_info structure.

The problem is that even though it is used to wake up a sleeping log
writer thread, sc_timer is not shut down until the nilfs_sc_info structure
is about to be freed, and is used regardless of the thread's lifetime.

Fix this issue by limiting the use of sc_timer only while the log writer
thread is alive.

Link: https://lkml.kernel.org/r/20240520132621.4054-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20240520132621.4054-2-konishi.ryusuke@gmail.com
Fixes: fdce895ea5 ("nilfs2: change sc_timer from a pointer to an embedded one in struct nilfs_sc_info")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: "Bai, Shuangpeng" <sjb7183@psu.edu>
Closes: https://groups.google.com/g/syzkaller/c/MK_LYqtt8ko/m/8rgdWeseAwAJ
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24 11:55:07 -07:00
Michael Ellerman 1901472fa8 selftests/mm: fix build warnings on ppc64
Fix warnings like:

  In file included from uffd-unit-tests.c:8:
  uffd-unit-tests.c: In function `uffd_poison_handle_fault':
  uffd-common.h:45:33: warning: format `%llu' expects argument of type
  `long long unsigned int', but argument 3 has type `__u64' {aka `long
  unsigned int'} [-Wformat=]

By switching to unsigned long long for u64 for ppc64 builds.

Link: https://lkml.kernel.org/r/20240521030219.57439-1-mpe@ellerman.id.au
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24 11:55:06 -07:00
Will Deacon b1480ed230 arm64: patching: fix handling of execmem addresses
Klara Modin reported warnings for a kernel configured with BPF_JIT but
without MODULES:

[   44.131296] Trying to vfree() bad address (000000004a17c299)
[   44.138024] WARNING: CPU: 1 PID: 193 at mm/vmalloc.c:3189 remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
[   44.146675] CPU: 1 PID: 193 Comm: kworker/1:2 Tainted: G      D W          6.9.0-01786-g2c9e5d4a0082 #25
[   44.158229] Hardware name: Raspberry Pi 3 Model B (DT)
[   44.164433] Workqueue: events bpf_prog_free_deferred
[   44.170492] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   44.178601] pc : remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
[   44.183705] lr : remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
[   44.188772] sp : ffff800082a13c70
[   44.193112] x29: ffff800082a13c70 x28: 0000000000000000 x27: 0000000000000000
[   44.201384] x26: 0000000000000000 x25: ffff00003a44efa0 x24: 00000000d4202000
[   44.209658] x23: ffff800081223dd0 x22: ffff00003a198a40 x21: ffff8000814dd880
[   44.217924] x20: 00000000d4202000 x19: ffff8000814dd880 x18: 0000000000000006
[   44.226206] x17: 0000000000000000 x16: 0000000000000020 x15: 0000000000000002
[   44.234460] x14: ffff8000811a6370 x13: 0000000020000000 x12: 0000000000000000
[   44.242710] x11: ffff8000811a6370 x10: 0000000000000144 x9 : ffff8000811fe370
[   44.250959] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : ffff8000811fe370
[   44.259206] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[   44.267457] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000002203240
[   44.275703] Call trace:
[   44.279158] remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
[   44.283858] vfree (mm/vmalloc.c:3322)
[   44.287835] execmem_free (mm/execmem.c:70)
[   44.292347] bpf_jit_free_exec+0x10/0x1c
[   44.297283] bpf_prog_pack_free (kernel/bpf/core.c:1006)
[   44.302457] bpf_jit_binary_pack_free (kernel/bpf/core.c:1195)
[   44.307951] bpf_jit_free (include/linux/filter.h:1083 arch/arm64/net/bpf_jit_comp.c:2474)
[   44.312342] bpf_prog_free_deferred (kernel/bpf/core.c:2785)
[   44.317785] process_one_work (kernel/workqueue.c:3273)
[   44.322684] worker_thread (kernel/workqueue.c:3342 (discriminator 2) kernel/workqueue.c:3429 (discriminator 2))
[   44.327292] kthread (kernel/kthread.c:388)
[   44.331342] ret_from_fork (arch/arm64/kernel/entry.S:861)

The problem is because bpf_arch_text_copy() silently fails to write to the
read-only area as a result of patch_map() faulting and the resulting
-EFAULT being chucked away.

Update patch_map() to use CONFIG_EXECMEM instead of
CONFIG_STRICT_MODULE_RWX to check for vmalloc addresses.

Link: https://lkml.kernel.org/r/20240521213813.703309-1-rppt@kernel.org
Fixes: 2c9e5d4a00 ("bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of")
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reported-by: Klara Modin <klarasmodin@gmail.com>
Closes: https://lore.kernel.org/all/7983fbbf-0127-457c-9394-8d6e4299c685@gmail.com
Tested-by: Klara Modin <klarasmodin@gmail.com>
Cc: Björn Töpel <bjorn@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24 11:55:06 -07:00
Dev Jain fb9293b6b0 selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation
Reset nr_hugepages to zero before the start of the test.

If a non-zero number of hugepages is already set before the start of the
test, the following problems arise:

 - The probability of the test getting OOM-killed increases.  Proof:
   The test wants to run on 80% of available memory to prevent OOM-killing
   (see original code comments).  Let the value of mem_free at the start
   of the test, when nr_hugepages = 0, be x.  In the other case, when
   nr_hugepages > 0, let the memory consumed by hugepages be y.  In the
   former case, the test operates on 0.8 * x of memory.  In the latter,
   the test operates on 0.8 * (x - y) of memory, with y already filled,
   hence, memory consumed is y + 0.8 * (x - y) = 0.8 * x + 0.2 * y > 0.8 *
   x.  Q.E.D

 - The probability of a bogus test success increases.  Proof: Let the
   memory consumed by hugepages be greater than 25% of x, with x and y
   defined as above.  The definition of compaction_index is c_index = (x -
   y)/z where z is the memory consumed by hugepages after trying to
   increase them again.  In check_compaction(), we set the number of
   hugepages to zero, and then increase them back; the probability that
   they will be set back to consume at least y amount of memory again is
   very high (since there is not much delay between the two attempts of
   changing nr_hugepages).  Hence, z >= y > (x/4) (by the 25% assumption).
   Therefore, c_index = (x - y)/z <= (x - y)/y = x/y - 1 < 4 - 1 = 3
   hence, c_index can always be forced to be less than 3, thereby the test
   succeeding always.  Q.E.D

Link: https://lkml.kernel.org/r/20240521074358.675031-4-dev.jain@arm.com
Fixes: bd67d5c15c ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain@arm.com>
Cc: <stable@vger.kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Sri Jayaramappa <sjayaram@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24 11:55:06 -07:00
Dev Jain 9ad665ef55 selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages
Currently, the test tries to set nr_hugepages to zero, but that is not
actually done because the file offset is not reset after read().  Fix that
using lseek().

Link: https://lkml.kernel.org/r/20240521074358.675031-3-dev.jain@arm.com
Fixes: bd67d5c15c ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain@arm.com>
Cc: <stable@vger.kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Sri Jayaramappa <sjayaram@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24 11:55:06 -07:00
Dev Jain d4202e66a4 selftests/mm: compaction_test: fix bogus test success on Aarch64
Patch series "Fixes for compaction_test", v2.

The compaction_test memory selftest introduces fragmentation in memory
and then tries to allocate as many hugepages as possible. This series
addresses some problems.

On Aarch64, if nr_hugepages == 0, then the test trivially succeeds since
compaction_index becomes 0, which is less than 3, due to no division by
zero exception being raised. We fix that by checking for division by
zero.

Secondly, correctly set the number of hugepages to zero before trying
to set a large number of them.

Now, consider a situation in which, at the start of the test, a non-zero
number of hugepages have been already set (while running the entire
selftests/mm suite, or manually by the admin). The test operates on 80%
of memory to avoid OOM-killer invocation, and because some memory is
already blocked by hugepages, it would increase the chance of OOM-killing.
Also, since mem_free used in check_compaction() is the value before we
set nr_hugepages to zero, the chance that the compaction_index will
be small is very high if the preset nr_hugepages was high, leading to a
bogus test success.


This patch (of 3):

Currently, if at runtime we are not able to allocate a huge page, the test
will trivially pass on Aarch64 due to no exception being raised on
division by zero while computing compaction_index.  Fix that by checking
for nr_hugepages == 0.  Anyways, in general, avoid a division by zero by
exiting the program beforehand.  While at it, fix a typo, and handle the
case where the number of hugepages may overflow an integer.

Link: https://lkml.kernel.org/r/20240521074358.675031-1-dev.jain@arm.com
Link: https://lkml.kernel.org/r/20240521074358.675031-2-dev.jain@arm.com
Fixes: bd67d5c15c ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Sri Jayaramappa <sjayaram@akamai.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24 11:55:05 -07:00
Satya Priya Kakitapalli c17d39f565 mailmap: update email address for Satya Priya
Update mailmap with my latest email ID, quic_c_skakit@quicinc.com
is no longer active.

Link: https://lkml.kernel.org/r/20240515-mailmap-update-v1-1-df4853f757a3@quicinc.com
Signed-off-by: Satya Priya Kakitapalli <quic_skakitap@quicinc.com>
Cc: Ajit Pandey <quic_ajipan@quicinc.com>
Cc: Bjorn Andersson <andersson@kernel.org>
Cc: Imran Shaik <quic_imrashai@quicinc.com>
Cc: Jagadeesh Kona <quic_jkona@quicinc.com>
Cc: Konrad Dybcio <konrad.dybcio@linaro.org>
Cc: Taniya Das <quic_tdas@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24 11:55:05 -07:00
Miaohe Lin fe6f86f4b4 mm/huge_memory: don't unpoison huge_zero_folio
When I did memory failure tests recently, below panic occurs:

 kernel BUG at include/linux/mm.h:1135!
 invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
 CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
 RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
 RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
 RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
 RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
 R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
 FS:  0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
 Call Trace:
  <TASK>
  do_shrink_slab+0x14f/0x6a0
  shrink_slab+0xca/0x8c0
  shrink_node+0x2d0/0x7d0
  balance_pgdat+0x33a/0x720
  kswapd+0x1f3/0x410
  kthread+0xd5/0x100
  ret_from_fork+0x2f/0x50
  ret_from_fork_asm+0x1a/0x30
  </TASK>
 Modules linked in: mce_inject hwpoison_inject
 ---[ end trace 0000000000000000 ]---
 RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
 RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
 RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
 RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
 R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
 FS:  0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0

The root cause is that HWPoison flag will be set for huge_zero_folio
without increasing the folio refcnt.  But then unpoison_memory() will
decrease the folio refcnt unexpectedly as it appears like a successfully
hwpoisoned folio leading to VM_BUG_ON_PAGE(page_ref_count(page) == 0) when
releasing huge_zero_folio.

Skip unpoisoning huge_zero_folio in unpoison_memory() to fix this issue. 
We're not prepared to unpoison huge_zero_folio yet.

Link: https://lkml.kernel.org/r/20240516122608.22610-1-linmiaohe@huawei.com
Fixes: 478d134e95 ("mm/huge_memory: do not overkill when splitting huge_zero_page")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Xu Yu <xuyu@linux.alibaba.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24 11:55:05 -07:00
Andrey Konovalov 2e577732e8 kasan, fortify: properly rename memintrinsics
After commit 69d4c0d321 ("entry, kasan, x86: Disallow overriding mem*()
functions") and the follow-up fixes, with CONFIG_FORTIFY_SOURCE enabled,
even though the compiler instruments meminstrinsics by generating calls to
__asan/__hwasan_ prefixed functions, FORTIFY_SOURCE still uses
uninstrumented memset/memmove/memcpy as the underlying functions.

As a result, KASAN cannot detect bad accesses in memset/memmove/memcpy. 
This also makes KASAN tests corrupt kernel memory and cause crashes.

To fix this, use __asan_/__hwasan_memset/memmove/memcpy as the underlying
functions whenever appropriate.  Do this only for the instrumented code
(as indicated by __SANITIZE_ADDRESS__).

Link: https://lkml.kernel.org/r/20240517130118.759301-1-andrey.konovalov@linux.dev
Fixes: 69d4c0d321 ("entry, kasan, x86: Disallow overriding mem*() functions")
Fixes: 51287dcb00 ("kasan: emit different calls for instrumentable memintrinsics")
Fixes: 36be5cba99 ("kasan: treat meminstrinsic as builtins in uninstrumented files")
Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com>
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Reported-by: Nico Pache <npache@redhat.com>
Closes: https://lore.kernel.org/all/20240501144156.17e65021@outsider.home/
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Nico Pache <npache@redhat.com>
Acked-by: Nico Pache <npache@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24 11:55:05 -07:00
Suren Baghdasaryan a38568a0b4 lib: add version into /proc/allocinfo output
Add version string and a header at the beginning of /proc/allocinfo to
allow later format changes.  Example output:

> head /proc/allocinfo
allocinfo - version: 1.0
#     <size>  <calls> <tag info>
           0        0 init/main.c:1314 func:do_initcalls
           0        0 init/do_mounts.c:353 func:mount_nodev_root
           0        0 init/do_mounts.c:187 func:mount_root_generic
           0        0 init/do_mounts.c:158 func:do_mount_root
           0        0 init/initramfs.c:493 func:unpack_to_rootfs
           0        0 init/initramfs.c:492 func:unpack_to_rootfs
           0        0 init/initramfs.c:491 func:unpack_to_rootfs
         512        1 arch/x86/events/rapl.c:681 func:init_rapl_pmus
         128        1 arch/x86/events/rapl.c:571 func:rapl_cpu_online

[akpm@linux-foundation.org: remove stray newline from struct allocinfo_private]
Link: https://lkml.kernel.org/r/20240514163128.3662251-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24 11:55:05 -07:00
Hailong.Liu 8e0545c83d mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
commit a421ef3030 ("mm: allow !GFP_KERNEL allocations for kvmalloc")
includes support for __GFP_NOFAIL, but it presents a conflict with commit
dd544141b9 ("vmalloc: back off when the current task is OOM-killed").  A
possible scenario is as follows:

process-a
__vmalloc_node_range(GFP_KERNEL | __GFP_NOFAIL)
    __vmalloc_area_node()
        vm_area_alloc_pages()
		--> oom-killer send SIGKILL to process-a
        if (fatal_signal_pending(current)) break;
--> return NULL;

To fix this, do not check fatal_signal_pending() in vm_area_alloc_pages()
if __GFP_NOFAIL set.

This issue occurred during OPLUS KASAN TEST. Below is part of the log
-> oom-killer sends signal to process
[65731.222840] [ T1308] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/apps/uid_10198,task=gs.intelligence,pid=32454,uid=10198

[65731.259685] [T32454] Call trace:
[65731.259698] [T32454]  dump_backtrace+0xf4/0x118
[65731.259734] [T32454]  show_stack+0x18/0x24
[65731.259756] [T32454]  dump_stack_lvl+0x60/0x7c
[65731.259781] [T32454]  dump_stack+0x18/0x38
[65731.259800] [T32454]  mrdump_common_die+0x250/0x39c [mrdump]
[65731.259936] [T32454]  ipanic_die+0x20/0x34 [mrdump]
[65731.260019] [T32454]  atomic_notifier_call_chain+0xb4/0xfc
[65731.260047] [T32454]  notify_die+0x114/0x198
[65731.260073] [T32454]  die+0xf4/0x5b4
[65731.260098] [T32454]  die_kernel_fault+0x80/0x98
[65731.260124] [T32454]  __do_kernel_fault+0x160/0x2a8
[65731.260146] [T32454]  do_bad_area+0x68/0x148
[65731.260174] [T32454]  do_mem_abort+0x151c/0x1b34
[65731.260204] [T32454]  el1_abort+0x3c/0x5c
[65731.260227] [T32454]  el1h_64_sync_handler+0x54/0x90
[65731.260248] [T32454]  el1h_64_sync+0x68/0x6c

[65731.260269] [T32454]  z_erofs_decompress_queue+0x7f0/0x2258
--> be->decompressed_pages = kvcalloc(be->nr_pages, sizeof(struct page *), GFP_KERNEL | __GFP_NOFAIL);
	kernel panic by NULL pointer dereference.
	erofs assume kvmalloc with __GFP_NOFAIL never return NULL.
[65731.260293] [T32454]  z_erofs_runqueue+0xf30/0x104c
[65731.260314] [T32454]  z_erofs_readahead+0x4f0/0x968
[65731.260339] [T32454]  read_pages+0x170/0xadc
[65731.260364] [T32454]  page_cache_ra_unbounded+0x874/0xf30
[65731.260388] [T32454]  page_cache_ra_order+0x24c/0x714
[65731.260411] [T32454]  filemap_fault+0xbf0/0x1a74
[65731.260437] [T32454]  __do_fault+0xd0/0x33c
[65731.260462] [T32454]  handle_mm_fault+0xf74/0x3fe0
[65731.260486] [T32454]  do_mem_abort+0x54c/0x1b34
[65731.260509] [T32454]  el0_da+0x44/0x94
[65731.260531] [T32454]  el0t_64_sync_handler+0x98/0xb4
[65731.260553] [T32454]  el0t_64_sync+0x198/0x19c

Link: https://lkml.kernel.org/r/20240510100131.1865-1-hailong.liu@oppo.com
Fixes: 9376130c39 ("mm/vmalloc: add support for __GFP_NOFAIL")
Signed-off-by: Hailong.Liu <hailong.liu@oppo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Barry Song <21cnbao@gmail.com>
Reported-by: Oven <liyangouwen1@oppo.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24 11:55:04 -07:00
Linus Torvalds f1f9984fdc RISC-V Patches for the 6.10 Merge Window, Part 2
* The compression format used for boot images is now configurable at
   build time, and these formats are shown in `make help`.
 * access_ok() has been optimized.
 * A pair of performance bugs have been fixed in the uaccess handlers.
 * Various fixes and cleanups, including one for the IMSIC build failure
   and one for the early-boot ftrace illegal NOPs bug.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmZQtRwTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiWPBD/0UitwMg88m6urvMd0Pfvwwbu/OnGqW
 TZT8C55iJi/e5f9K4mBrSyjATI8z/MblD+Zz0adX8ygavS4JuQ7DoWwb1yTT3pww
 +z74FkWeJuiar+HfbhQ602CfMrnzvWjnyJ3URemqy5pIBKyvD9gGkDJDZwf8hJTk
 Vqh5qVtnBqFBO9kWpIx+/pLCfpyHVNkhWr1AzKfoqQ1WPIpZ/o0IGdvS88rL+EBR
 QOXiwVhEsRfC+LT6Jhn8l2bGp7PaSRVOid19OxNsJKpAhpL6AOscaafclVrLBuTd
 gkys0rT2dHdoWTAkPHQpvlOI6OmGTgopxo5pUKJHS8J9VRoBun25zC1FGBF8uyVd
 05CabWPnh7olNsRge9XiNj3x8PXjGVi7X7wUbRgOBG5aDc6TbKdxu37J0tXe0M7a
 Q74ctQvk8Nk6bQWirgTNlfJJHzL5pJbKc9VwY5uGX4qTmH+yEvCIt45ZXgXOuS/F
 eqijStkkdXUDnkMdcpaZJvXP80rHcgfP8bqevvPymRli8ER9zj9aXJQ3rmCUcPz+
 EtbyS+vOEN31wNTA1EQlfIRxfvr22x7r70DDdRwmhuD1W1tgfblm+R0Cq76I5rnJ
 VSgXKq1b4mY0eautqXEnPGyqb7H8iJIq7AoyfbzzWN+4u6yVEUvpDKueeksy+fFt
 sGNtjWqGhWyKXg==
 =/Qtt
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.10-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull more RISC-V updates from Palmer Dabbelt:

 - The compression format used for boot images is now configurable at
   build time, and these formats are shown in `make help`

 - access_ok() has been optimized

 - A pair of performance bugs have been fixed in the uaccess handlers

 - Various fixes and cleanups, including one for the IMSIC build failure
   and one for the early-boot ftrace illegal NOPs bug

* tag 'riscv-for-linus-6.10-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Fix early ftrace nop patching
  irqchip: riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict
  riscv: selftests: Add signal handling vector tests
  riscv: mm: accelerate pagefault when badaccess
  riscv: uaccess: Relax the threshold for fast path
  riscv: uaccess: Allow the last potential unrolled copy
  riscv: typo in comment for get_f64_reg
  Use bool value in set_cpu_online()
  riscv: selftests: Add hwprobe binaries to .gitignore
  riscv: stacktrace: fixed walk_stackframe()
  ftrace: riscv: move from REGS to ARGS
  riscv: do not select MODULE_SECTIONS by default
  riscv: show help string for riscv-specific targets
  riscv: make image compression configurable
  riscv: cpufeature: Fix extension subset checking
  riscv: cpufeature: Fix thead vector hwcap removal
  riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context
  riscv: force PAGE_SIZE linear mapping if debug_pagealloc is enabled
  riscv: Define TASK_SIZE_MAX for __access_ok()
  riscv: Remove PGDIR_SIZE_L3 and TASK_SIZE_MIN
2024-05-24 10:46:35 -07:00
Linus Torvalds 9351f138d1 xen: branch for v6.10-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZlCW5wAKCRCAXGG7T9hj
 vmgfAPwMj6Pf6faPJ8Db4cUkeJqxT60RCjOoCLoiJ5MYtrxIBgEAqFv3JOHaoDCH
 nogrS10fldxUTtxtx8DciFtzZ59jJws=
 =LXuw
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.10a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:

 - a small cleanup in the drivers/xen/xenbus Makefile

 - a fix of the Xen xenstore driver to improve connecting to a late
   started Xenstore

 - an enhancement for better support of ballooning in PVH guests

 - a cleanup using try_cmpxchg() instead of open coding it

* tag 'for-linus-6.10a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  drivers/xen: Improve the late XenStore init protocol
  xen/xenbus: Use *-y instead of *-objs in Makefile
  xen/x86: add extra pages to unpopulated-alloc if available
  locking/x86/xen: Use try_cmpxchg() in xen_alloc_p2m_entry()
2024-05-24 10:24:49 -07:00
Linus Torvalds 02c438bbff for-6.10-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmZQjzoACgkQxWXV+ddt
 WDsFaw/+O6lH+rPLhvUoqtnrydC6QLnEW5Qj5EURDt3HkROOsXHszdNGKdsETZ2i
 /s4dDiCRwLv7PP/bWlFfQbHzckoBHI9I/1GxHKQM3OM27BpXvILacXSMJ13zw4vq
 DRQIUdTwfUkegEytZb0ddv6+++R1YyU6nE6LfiF2Pf4XJMQ2WXPRNu6bAa27xUia
 4ITHB6m92zynhATJk0/RpfCU64HWwj919WnJDmoVOJ7Nr8Pslz4jKm7HS1qiehNd
 EbhduQPhj7UvWiL4C9/iFFndgzm1tX1WNlJDu5c0KqwYIHq2+BmDv3Cqhkazkdvu
 veU0wO62bZzV42vmTvQXzyXeXjNXRyLOvK6uHXv0VCO8VVsl2/WnYTRWmH44ECar
 z4tByfBKA7nIL2e23ztkyqnhygDf8Y1/Dy+GfprR6JPhyYGHJDqLcB3Gyw9y/AXO
 b/2MoAEgET9QPM/0HLqdonDJ75D2PF0qmwp1ys79w/BGH0BUoxZs/POL2UT87EJO
 rO5kW0/nZy99sbWFfZRwDUxTj1IlDqdudaHPOdJs/tUb3wPseLm5abQEyk+Dns6K
 3y7OviNVQy0x325JY9RmdfnJv60KHvv5pqws1Nkuhqk1LH8csL6MsYlcybhR+vOk
 G9qkNxg35aNqjNlBi7RacMT8OgwVbVhik8jVr+MfXk30grIevzU=
 =XR4r
 -----END PGP SIGNATURE-----

Merge tag 'for-6.10-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull more btrfs updates from David Sterba:
 "A few more updates, mostly stability fixes or user visible changes:

   - fix race in zoned mode during device replace that can lead to
     use-after-free

   - update return codes and lower message levels for quota rescan where
     it's causing false alerts

   - fix unexpected qgroup id reuse under some conditions

   - fix condition when looking up extent refs

   - add option norecovery (removed in 6.8), the intended replacements
     haven't been used and some aplications still rely on the old one

   - build warning fixes"

* tag 'for-6.10-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: re-introduce 'norecovery' mount option
  btrfs: fix end of tree detection when searching for data extent ref
  btrfs: scrub: initialize ret in scrub_simple_mirror() to fix compilation warning
  btrfs: zoned: fix use-after-free due to race with dev replace
  btrfs: qgroup: fix qgroup id collision across mounts
  btrfs: qgroup: update rescan message levels and error codes
2024-05-24 09:40:31 -07:00
Linus Torvalds dcb9f48667 Changes since last update:
- Convert metadata APIs to byte offsets;
 
  - Avoid allocating DEFLATE streams unnecessarily;
 
  - Some erofs_show_options() cleanup.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmZQmHARHHhpYW5nQGtl
 cm5lbC5vcmcACgkQUXZn5Zlu5qrGnhAAnvOifMYekIgY/W0PSGSe85XtXps5vBjo
 rixZ/vNAl8NrLgzHY5lX+4dbENywEULzdxYAgF4VN9eKNGyuZ4oCBmYStoGueQ41
 N1oq36O/CVJDCOLkFUwjD6GpHngjJR3xiU8DRrhKdPZJeYXVEJwZB4KOOymorkO0
 Xn9SPrF/GC4YDWJL901RKT8p6gyRNWiWJ/+hwDAxfmCSuzW2uRNnBLeXNvjqj4Z3
 u5WEaFSlNRlLWnZPcHy8O3t/XAPkhvTN+C5+YeaePWyHc5WYOM9mWt8VLOFQb60K
 l+q/cnWXw+8NNbxnuccWVJfEb6zUJmZ5/yTm+Ndutrpk5dFSPb6DjZo5/K36dGls
 r02XysW+Jl24wBIFkYRHild2WT+gSqo/zyIDsSt/DF+DhpqmnIqAASx4yJenw7ib
 BNV4m4gQflLrORKpVmsKyHrm5GuHsTWsGc51iX1uqsdfDgN79mFgR1taBAZw162P
 pPeWuD6XYE+eT+t5nggnXqmZ5qatEhTFkYDjUzSq4ZQfyZnRG8Tl6zbBuyVhaxsO
 zH1rAmwtI6x+ehHI46Kurh8HT6UrB0CNM6RokYKr6JWVzIdFPPMVKkxcq2KozTPf
 CBu+Whh/WGFROM8JT2KGCnuz2ZBUZXDtNBJmW+ZnA+z9b7xZ1f31nio4vKKdZU+R
 swpnV+0q9cs=
 =qDDl
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull more erofs updates from Gao Xiang:
 "The main ones are metadata API conversion to byte offsets by Al Viro.

  Another patch gets rid of unnecessary memory allocation out of DEFLATE
  decompressor. The remaining one is a trivial cleanup.

   - Convert metadata APIs to byte offsets

   - Avoid allocating DEFLATE streams unnecessarily

   - Some erofs_show_options() cleanup"

* tag 'erofs-for-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: avoid allocating DEFLATE streams before mounting
  z_erofs_pcluster_begin(): don't bother with rounding position down
  erofs: don't round offset down for erofs_read_metabuf()
  erofs: don't align offset for erofs_read_metabuf() (simple cases)
  erofs: mechanically convert erofs_read_metabuf() to offsets
  erofs: clean up erofs_show_options()
2024-05-24 09:31:50 -07:00
Linus Torvalds c40b1994b9 bcachefs fixes for 6.10-rc1
Just a few syzbot fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmZQk0cACgkQE6szbY3K
 bna7gA/+MSY3I95CwaJ4bBq5SCxOaRcrX099LFh8Zrj+OF+DWE2PtVo1LhhgnYrQ
 KpZrS2Q9Qgb2yVqYzOY6LBfH4il1O/WwvloMG0MbuYiQFu9/JL/6CEK9uFyiGmaC
 fdiFEN3u+8AK6phTFaqUU2ncG0XFQ1Ple5zmFXo4Y3ZJeNaubJeEDac+kbRvOwYh
 rQ6Iy0FNoQymv0BzmuM7g2NsbhdAgHTv7rhGbfpNBZv3lu0yDXsfZZgWTr2oXMSP
 FMhm4bcTGAFp5hbwq9k56ND8oSFpamsH7SwS4bDlEe1CNOfMI1JjnrvSEuDrocAE
 1Jn2J2Gv9NXnEHKamVzzpUILG67buEtYzJyDQk51N4kulgThdpRzjm+11ylD5U0U
 wzIK1HXsKHtRdUiIhQGLCLW61FXM+0QBIk2eXhPq88jsM2zTL7iMbXR3P/nvgUDy
 8ia8g5Q+nKxcb223M8WmK0rBwlaNasE/hXiFT54ntt8bK5nmVJjPMxVXUmYth3hw
 7STkuT0k5jVsMG1NqLkg+wSupj1AuWbD2hIcas7GkxarEYAULbQcClHYGpMll3Tw
 +pJfLjAtBOkcE4TwWDLflVBhwWtdmPNhk51Q3iLVRp0Gm7t0rhE2vE6TjpsIFnrg
 rUAgaqQqQ2WXfsRaGa2wx0tRKoW+8Iigq13ndn1AZIrfEtQkYUs=
 =vuNC
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2024-05-24' of https://evilpiepirate.org/git/bcachefs

Pull bcachefs fixes from Kent Overstreet:
 "Nothing exciting, just syzbot fixes (except for the one
  FMODE_CAN_ODIRECT patch).

  Looks like syzbot reports have slowed down; this is all catch up from
  two weeks of conferences.

  Next hardening project is using Thomas's error injection tooling to
  torture test repair"

* tag 'bcachefs-2024-05-24' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: Fix race path in bch2_inode_insert()
  bcachefs: Ensure we're RW before journalling
  bcachefs: Fix shutdown ordering
  bcachefs: Fix unsafety in bch2_dirent_name_bytes()
  bcachefs: Fix stack oob in __bch2_encrypt_bio()
  bcachefs: Fix btree_trans leak in bch2_readahead()
  bcachefs: Fix bogus verify_replicas_entry() assert
  bcachefs: Check for subvolues with bogus snapshot/inode fields
  bcachefs: bch2_checksum() returns 0 for unknown checksum type
  bcachefs: Fix bch2_alloc_ciphers()
  bcachefs: Add missing guard in bch2_snapshot_has_children()
  bcachefs: Fix missing parens in drop_locks_do()
  bcachefs: Improve bch2_assert_pos_locked()
  bcachefs: Fix shift overflows in replicas.c
  bcachefs: Fix shift overflow in btree_lost_data()
  bcachefs: Fix ref in trans_mark_dev_sbs() error path
  bcachefs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method
  bcachefs: Fix rcu splat in check_fix_ptrs()
2024-05-24 09:07:22 -07:00
Linus Torvalds 9ea370f341 Input updates for v6.10-rc0
- a change to input core to trim amount of keys data in modalias string
   in case when a device declares too many keys and they do not fit in
   uevent buffer instead of reporting an error which results in uevent
   not being generated at all
 
 - support for Machenike G5 Pro Controller added to xpad driver
 
 - support for FocalTech FT5452 and FT8719 added to edt-ft5x06
 
 - support for new SPMI vibrator added to pm8xxx-vibrator driver
 
 - missing locking added to cyapa touchpad driver
 
 - removal of unused fields in various driver structures
 
 - explicit initialization of i2c_device_id::driver_data to 0 dropped
   from input drivers
 
 - other assorted fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQST2eWILY88ieB2DOtAj56VGEWXnAUCZk/rJQAKCRBAj56VGEWX
 nOFVAQD8lfavuaJwEc0k/P39hZGOnTh423Um5gqIj8FOMw/V3AEA3D9IdTFC32DA
 JphZ5YvneDAfqu76ZRnjQi2oyOikygo=
 =8zDF
 -----END PGP SIGNATURE-----

Merge tag 'input-for-v6.10-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input updates from Dmitry Torokhov:

 - a change to input core to trim amount of keys data in modalias string
   in case when a device declares too many keys and they do not fit in
   uevent buffer instead of reporting an error which results in uevent
   not being generated at all

 - support for Machenike G5 Pro Controller added to xpad driver

 - support for FocalTech FT5452 and FT8719 added to edt-ft5x06

 - support for new SPMI vibrator added to pm8xxx-vibrator driver

 - missing locking added to cyapa touchpad driver

 - removal of unused fields in various driver structures

 - explicit initialization of i2c_device_id::driver_data to 0 dropped
   from input drivers

 - other assorted fixes and cleanups.

* tag 'input-for-v6.10-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (24 commits)
  Input: edt-ft5x06 - add support for FocalTech FT5452 and FT8719
  dt-bindings: input: touchscreen: edt-ft5x06: Document FT5452 and FT8719 support
  Input: xpad - add support for Machenike G5 Pro Controller
  Input: try trimming too long modalias strings
  Input: drop explicit initialization of struct i2c_device_id::driver_data to 0
  Input: zet6223 - remove an unused field in struct zet6223_ts
  Input: chipone_icn8505 - remove an unused field in struct icn8505_data
  Input: cros_ec_keyb - remove an unused field in struct cros_ec_keyb
  Input: lpc32xx-keys - remove an unused field in struct lpc32xx_kscan_drv
  Input: matrix_keypad - remove an unused field in struct matrix_keypad
  Input: tca6416-keypad - remove unused struct tca6416_drv_data
  Input: tca6416-keypad - remove an unused field in struct tca6416_keypad_chip
  Input: da7280 - remove an unused field in struct da7280_haptic
  Input: ff-core - prefer struct_size over open coded arithmetic
  Input: cyapa - add missing input core locking to suspend/resume functions
  input: pm8xxx-vibrator: add new SPMI vibrator support
  dt-bindings: input: qcom,pm8xxx-vib: add new SPMI vibrator module
  input: pm8xxx-vibrator: refactor to support new SPMI vibrator
  Input: pm8xxx-vibrator - correct VIB_MAX_LEVELS calculation
  Input: sur40 - convert le16 to cpu before use
  ...
2024-05-24 09:01:21 -07:00
Linus Torvalds 041c9f71a4 sound fixes for 6.10-rc1
A collection of small fixes for 6.10-rc1.  Most of changes are
 various device-specific fixes and quirks, while there are a few small
 changes in ALSA core timer and module / built-in fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmZPUagOHHRpd2FpQHN1
 c2UuZGUACgkQLtJE4w1nLE9LYg//QurZR7KBAvim5LcsVDLE5pFUjW0v3fz0+vKB
 /UpoK1EVxc9pqNXzKi8YDiRoKZY8J8krGHd5FV44qhZl2nVJ87hXbHU5b/i29QUu
 4xKC1pMmF0ncJ8qMGhzTynyxw0Hr7soCcxz+4ApDzN/pyzc7QTPEaUB1ND7jTB2z
 bcYgXyFprJQ1RmsV9u2mGhNEv3tYRaQO1GNxr9ktO/I13CCKd7LkGUSxo5UfOFwC
 bIrpqG35MDzeVrxEfB1UHlyKhULf9fmpUW0OYYS/DMQFptRa+PXEOgzN81wrhNwL
 sp2k41x4uRtKrB1DFCeweis4m0OHbV0yakkV/3PTdONzJk4PxWoPuGP4uZyoNz3B
 FwexeSpZICpgGHeS4WuS0RW3SbQ9n/3d33nzpCYrojyxqCuc6UXGPyiq6QHUVtXZ
 LnOPyeJRIhS52wpELByJmcnf9ev4ImLSnGWUzz/Mf5dFZCVSXKWVvgQ+UcWbZZnj
 vTp0mTMQUjuVhE0KuRawMx2YSUU7nuRBukFBihjIRSYJYvZETN7WNjMUA3UnpG1d
 uKXJaTEm2UqlZtsnKkXrWNIpj4EQjoZo0qgx4ZwSYicLgXDJ/WlHvltdo9fJpRh3
 u23957ye7wJ4JMikqyhd0Wzh/1UwOs4GTMWTcim6pKXwlkn8TwCB1F/OT/6xqlYZ
 gScnqBQ=
 =VeU/
 -----END PGP SIGNATURE-----

Merge tag 'sound-fix-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A collection of small fixes for 6.10-rc1. Most of changes are various
  device-specific fixes and quirks, while there are a few small changes
  in ALSA core timer and module / built-in fixes"

* tag 'sound-fix-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek: fix mute/micmute LEDs don't work for ProBook 440/460 G11.
  ALSA: core: Enable proc module when CONFIG_MODULES=y
  ALSA: core: Fix NULL module pointer assignment at card init
  ALSA: hda/realtek: Enable headset mic of JP-IK LEAP W502 with ALC897
  ASoC: dt-bindings: stm32: Ensure compatible pattern matches whole string
  ASoC: tas2781: Fix wrong loading calibrated data sequence
  ASoC: tas2552: Add TX path for capturing AUDIO-OUT data
  ALSA: usb-audio: Fix for sampling rates support for Mbox3
  Documentation: sound: Fix trailing whitespaces
  ALSA: timer: Set lower bound of start tick time
  ASoC: codecs: ES8326: solve hp and button detect issue
  ASoC: rt5645: mic-in detection threshold modification
  ASoC: Intel: sof_sdw_rt_sdca_jack_common: Use name_prefix for `-sdca` detection
2024-05-24 08:48:51 -07:00
Linus Torvalds e292ead0c9 Char/Misc bugfix for 6.10-rc1
Here is one remaining bugfix for 6.10-rc1 that missed the 6.9-final
 merge window, and has been sitting in my tree and linux-next for quite a
 while now, but wasn't sent to you (my fault, travels...)
 
 It is a bugfix to resolve an error in the speakup code that could
 overflow a buffer.
 
 It has been in linux-next for a while with no reported problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZlA+4A8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylbCQCg1yrG9xtO9Gg5CLBcV9gRmAEjKGIAn16Y8Hmm
 k9R7qGfSOhqPq/qt6Nxx
 =s69+
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-6.10-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc fix from Greg KH:
 "Here is one remaining bugfix for 6.10-rc1 that missed the 6.9-final
  merge window, and has been sitting in my tree and linux-next for quite
  a while now, but wasn't sent to you (my fault, travels...)

  It is a bugfix to resolve an error in the speakup code that could
  overflow a buffer.

  It has been in linux-next for a while with no reported problems"

* tag 'char-misc-6.10-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  speakup: Fix sizeof() vs ARRAY_SIZE() bug
2024-05-24 08:43:25 -07:00
Linus Torvalds f6d199c774 TTY/Serial fixes for 6.10-rc1
Here are some small TTY and Serial driver fixes that missed the
 6.9-final merge window, but have been in my tree for weeks (my fault,
 travel caused me to miss this.)
 
 These fixes include:
   - more n_gsm fixes for reported problems
   - 8520_mtk driver fix
   - 8250_bcm7271 driver fix
   - sc16is7xx driver fix
 
 All of these have been in linux-next for weeks without any reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZlBGKQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylOkwCfUOa00YQt3jJwBEC9bQUprW1z95MAoKW00V5g
 UJgQ7+1d+o4bT/ib5xpj
 =/O0m
 -----END PGP SIGNATURE-----

Merge tag 'tty-6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
 "Here are some small TTY and Serial driver fixes that missed the
  6.9-final merge window, but have been in my tree for weeks (my fault,
  travel caused me to miss this)

  These fixes include:

   - more n_gsm fixes for reported problems

   - 8520_mtk driver fix

   - 8250_bcm7271 driver fix

   - sc16is7xx driver fix

  All of these have been in linux-next for weeks without any reported
  problems"

* tag 'tty-6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: sc16is7xx: fix bug in sc16is7xx_set_baud() when using prescaler
  serial: 8250_bcm7271: use default_mux_rate if possible
  serial: 8520_mtk: Set RTS on shutdown for Rx in-band wakeup
  tty: n_gsm: fix missing receive state reset after mode switch
  tty: n_gsm: fix possible out-of-bounds in gsm0_receive()
2024-05-24 08:38:28 -07:00
Linus Torvalds b0a9ba13ff hardening fixes for v6.10-rc1
- loadpin: Prevent SECURITY_LOADPIN_ENFORCE=y without module decompression
   (Stephen Boyd)
 
 - ubsan: Restore dependency on ARCH_HAS_UBSAN
 
 - kunit/fortify: Fix memcmp() test to be amplitude agnostic
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmZP0w0WHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJqYDEACWaY0Xjig6Izo+B+85IozTLf2R
 Wv3zlOjUhjbRn7enzhVBRRfU216nl/wp8s7pKhNYCEZ7gJ+04hYtZoLY6YV7jtZ0
 RAvpwc1dmUm7RZIBxjnzqiNTdttNBniPDE47goV0Yi9JVSDFY1Y/P5GwiAr0PO6W
 kt1+WBr2zADNpTZziH8MZou7jfK+y1bOZw8rUUFMODrMc0buuLGO2h+lZqASJXNs
 5NHPUOoJsZHvQxN/YSyE555VycpoyWiwMvA1XOz1NVKdr1eFP1heu88AnIRKOD7o
 cMz6W/yUZ+4dYr2yydDGNX+QvFmZuvPz0oXAlI7BAblpT0UU7xv0jaioAhIam87U
 WxVQSOgkLQBw6Ym79W66HplizCVfEl9aUAYDSK5UJlwdpNE/j16XLYDLKxDi0wUZ
 pjUy5CF0X7FFNyY7Kp5flqzKrQG31vfqZf/yWhtWu258x604LR6CTkO06IJDINx0
 UUrbehie3bGnbu5FS0oVKGH37Mq0aRn4Xk2aUZaFf1Vz/YtU4Wo3FbtyOyFZsdpl
 aCNyYzmNmfVijDQlLshy6HBACeLPV2DjIJ8pcC74abUV1FX6VOvIDsTy4ELkm9BF
 WZ8LNryo79lFsFMThhwfCDHubhXoaLjkl4rpOB5x+Ld0q+GgfIb5jMfF507YxrRj
 3KxJJKXzUKNf+JFnjg==
 =VTTF
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening fixes from Kees Cook:

 - loadpin: Prevent SECURITY_LOADPIN_ENFORCE=y without module
   decompression (Stephen Boyd)

 - ubsan: Restore dependency on ARCH_HAS_UBSAN

 - kunit/fortify: Fix memcmp() test to be amplitude agnostic

* tag 'hardening-v6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  kunit/fortify: Fix memcmp() test to be amplitude agnostic
  ubsan: Restore dependency on ARCH_HAS_UBSAN
  loadpin: Prevent SECURITY_LOADPIN_ENFORCE=y without module decompression
2024-05-24 08:33:44 -07:00
Linus Torvalds 0eb03c7e8e tracefs/eventfs fixes and updates for v6.10:
Bug fixes:
 
 - The eventfs directories need to have unique inode numbers. Make sure that
   they do not get the default file inode number.
 
 - Update the inode uid and gid fields on remount.
   When a remount happens where a uid and/or gid is specified, all the tracefs
   files and directories should get the specified uid and/or gid. But this
   can be sporadic when some uids were assigned already. There's already
   a list of inodes that are allocated. Just update their uid and gid fields
   at the time of remount.
 
 - Update the eventfs_inodes on remount from the top level "events" descriptor.
   There was a bug where not all the eventfs files or directories where
   getting updated on remount. One fix was to clear the SAVED_UID/GID
   flags from the inode list during the iteration of the inodes during
   the remount. But because the eventfs inodes can be freed when the last
   referenced is released, not all the eventfs_inodes were being updated.
   This lead to the ownership selftest to fail if it was run a second
   time (the first time would leave eventfs_inodes with no corresponding
   tracefs_inode).
 
   Instead, for eventfs_inodes, only process the "events" eventfs_inode
   from the list iteration, as it is guaranteed to have a tracefs_inode
   (it's never freed while the "events" directory exists). As it has
   a list of its children, and the children have a list of their children,
   just iterate all the eventfs_inodes from the "events" descriptor and
   it is guaranteed to get all of them.
 
 - Clear the EVENT_INODE flag from the tracefs_drop_inode() callback.
   Currently the EVENTFS_INODE FLAG is cleared in the tracefs_d_iput()
   callback. But this is the wrong location. The iput() callback is
   called when the last reference to the dentry inode is hit. There could
   be a case where two dentry's have the same inode, and the flag will
   be cleared prematurely. The flag needs to be cleared when the last
   reference of the inode is dropped and that happens in the inode's
   drop_inode() callback handler.
 
 Clean ups:
 
 - Consolidate the creation of a tracefs_inode for an eventfs_inode
   A tracefs_inode is created for both files and directories of the
   eventfs system. It is open coded. Instead, consolidate it into a
   single eventfs_get_inode() function call.
 
 - Remove the eventfs getattr and permission callbacks.
   The permissions for the eventfs files and directories are updated
   when the inodes are created, on remount, and when the user sets
   them (via setattr). The inodes hold the current permissions so
   there is no need to have custom getattr or permissions callbacks
   as they will more likely cause them to be incorrect. The inode's
   permissions are updated when they should be updated. Remove the
   getattr and permissions inode callbacks.
 
 - Do not update eventfs_inode attributes on creation of inodes.
   The eventfs_inodes attribute field is used to store the permissions
   of the directories and files for when their corresponding inodes
   are freed and are created again. But when the creation of the inodes
   happen, the eventfs_inode attributes are recalculated. The
   recalculation should only happen when the permissions change for
   a given file or directory. Currently, the attribute changes are
   just being set to their current files so this is not a bug, but
   it's unnecessary and error prone. Stop doing that.
 
 - The events directory inode is created once when the events directory
   is created and deleted when it is deleted. It is now updated on
   remount and when the user changes the permissions. There's no need
   to use the eventfs_inode of the events directory to store the
   events directory permissions. But using it to store the default
   permissions for the files within the directory that have not been
   updated by the user can simplify the code.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZk+0ohQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qtWOAQCSdEsWYiNcFBqvKp1kSI+dH1sKfur3
 CAoe1trzDEdv/gEAsFkophR9OBzO193in4ZQYNKdEDfeaicEaDctzLxlkwY=
 =9zqq
 -----END PGP SIGNATURE-----

Merge tag 'trace-tracefs-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracefs/eventfs updates from Steven Rostedt:
 "Bug fixes:

   - The eventfs directories need to have unique inode numbers. Make
     sure that they do not get the default file inode number.

   - Update the inode uid and gid fields on remount.

     When a remount happens where a uid and/or gid is specified, all the
     tracefs files and directories should get the specified uid and/or
     gid. But this can be sporadic when some uids were assigned already.
     There's already a list of inodes that are allocated. Just update
     their uid and gid fields at the time of remount.

   - Update the eventfs_inodes on remount from the top level "events"
     descriptor.

     There was a bug where not all the eventfs files or directories
     where getting updated on remount. One fix was to clear the
     SAVED_UID/GID flags from the inode list during the iteration of the
     inodes during the remount. But because the eventfs inodes can be
     freed when the last referenced is released, not all the
     eventfs_inodes were being updated. This lead to the ownership
     selftest to fail if it was run a second time (the first time would
     leave eventfs_inodes with no corresponding tracefs_inode).

     Instead, for eventfs_inodes, only process the "events"
     eventfs_inode from the list iteration, as it is guaranteed to have
     a tracefs_inode (it's never freed while the "events" directory
     exists). As it has a list of its children, and the children have a
     list of their children, just iterate all the eventfs_inodes from
     the "events" descriptor and it is guaranteed to get all of them.

   - Clear the EVENT_INODE flag from the tracefs_drop_inode() callback.

     Currently the EVENTFS_INODE FLAG is cleared in the tracefs_d_iput()
     callback. But this is the wrong location. The iput() callback is
     called when the last reference to the dentry inode is hit. There
     could be a case where two dentry's have the same inode, and the
     flag will be cleared prematurely. The flag needs to be cleared when
     the last reference of the inode is dropped and that happens in the
     inode's drop_inode() callback handler.

  Cleanups:

   - Consolidate the creation of a tracefs_inode for an eventfs_inode

     A tracefs_inode is created for both files and directories of the
     eventfs system. It is open coded. Instead, consolidate it into a
     single eventfs_get_inode() function call.

   - Remove the eventfs getattr and permission callbacks.

     The permissions for the eventfs files and directories are updated
     when the inodes are created, on remount, and when the user sets
     them (via setattr). The inodes hold the current permissions so
     there is no need to have custom getattr or permissions callbacks as
     they will more likely cause them to be incorrect. The inode's
     permissions are updated when they should be updated. Remove the
     getattr and permissions inode callbacks.

   - Do not update eventfs_inode attributes on creation of inodes.

     The eventfs_inodes attribute field is used to store the permissions
     of the directories and files for when their corresponding inodes
     are freed and are created again. But when the creation of the
     inodes happen, the eventfs_inode attributes are recalculated. The
     recalculation should only happen when the permissions change for a
     given file or directory. Currently, the attribute changes are just
     being set to their current files so this is not a bug, but it's
     unnecessary and error prone. Stop doing that.

   - The events directory inode is created once when the events
     directory is created and deleted when it is deleted. It is now
     updated on remount and when the user changes the permissions.
     There's no need to use the eventfs_inode of the events directory to
     store the events directory permissions. But using it to store the
     default permissions for the files within the directory that have
     not been updated by the user can simplify the code"

* tag 'trace-tracefs-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  eventfs: Do not use attributes for events directory
  eventfs: Cleanup permissions in creation of inodes
  eventfs: Remove getattr and permission callbacks
  eventfs: Consolidate the eventfs_inode update in eventfs_get_inode()
  tracefs: Clear EVENT_INODE flag in tracefs_drop_inode()
  eventfs: Update all the eventfs_inodes from the events descriptor
  tracefs: Update inode permissions on remount
  eventfs: Keep the directories from having the same inode number as files
2024-05-24 08:27:34 -07:00
Friedrich Vock 44382b3ed6 bpf: Fix potential integer overflow in resolve_btfids
err is a 32-bit integer, but elf_update returns an off_t, which is 64-bit
at least on 64-bit platforms. If symbols_patch is called on a binary between
2-4GB in size, the result will be negative when cast to a 32-bit integer,
which the code assumes means an error occurred. This can wrongly trigger
build failures when building very large kernel images.

Fixes: fbbb68de80 ("bpf: Add resolve_btfids tool to resolve BTF IDs in ELF object")
Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240514070931.199694-1-friedrich.vock@gmx.de
2024-05-24 17:12:12 +02:00
David S. Miller 0b4f5add9f Merge branch 'mlx5-fixes'
Tariq Toukan says:

====================
mlx5 fixes 24-05-22

This patchset provides bug fixes to mlx5 core and Eth drivers.

Series generated against:
commit 9c91c7fadb ("net: mana: Fix the extra HZ in mana_hwc_send_request")
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:08 +01:00
Gal Pressman 83fea49f27 net/mlx5e: Fix UDP GSO for encapsulated packets
When the skb is encapsulated, adjust the inner UDP header instead of the
outer one, and account for UDP header (instead of TCP) in the inline
header size calculation.

Fixes: 689adf0d48 ("net/mlx5e: Add UDP GSO support")
Reported-by: Jason Baron <jbaron@akamai.com>
Closes: https://lore.kernel.org/netdev/c42961cb-50b9-4a9a-bd43-87fe48d88d29@akamai.com/
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Boris Pismenny <borisp@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:08 +01:00
Carolina Jubran 5c74195d5d net/mlx5e: Use rx_missed_errors instead of rx_dropped for reporting buffer exhaustion
Previously, the driver incorrectly used rx_dropped to report device
buffer exhaustion.

According to the documentation, rx_dropped should not be used to count
packets dropped due to buffer exhaustion, which is the purpose of
rx_missed_errors.

Use rx_missed_errors as intended for counting packets dropped due to
buffer exhaustion.

Fixes: 269e6b3af3 ("net/mlx5e: Report additional error statistics in get stats ndo")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:08 +01:00
Rahul Rameshbabu f55cd31287 net/mlx5e: Do not use ptp structure for tx ts stats when not initialized
The ptp channel instance is only initialized when ptp traffic is first
processed by the driver. This means that there is a window in between when
port timestamping is enabled and ptp traffic is sent where the ptp channel
instance is not initialized. Accessing statistics during this window will
lead to an access violation (NULL + member offset). Check the validity of
the instance before attempting to query statistics.

  BUG: unable to handle page fault for address: 0000000000003524
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 109dfc067 P4D 109dfc067 PUD 1064ef067 PMD 0
  Oops: 0000 [#1] SMP
  CPU: 0 PID: 420 Comm: ethtool Not tainted 6.9.0-rc2-rrameshbabu+ #245
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.3-1-1 04/01/204
  RIP: 0010:mlx5e_stats_ts_get+0x4c/0x130
  <snip>
  Call Trace:
   <TASK>
   ? show_regs+0x60/0x70
   ? __die+0x24/0x70
   ? page_fault_oops+0x15f/0x430
   ? do_user_addr_fault+0x2c9/0x5c0
   ? exc_page_fault+0x63/0x110
   ? asm_exc_page_fault+0x27/0x30
   ? mlx5e_stats_ts_get+0x4c/0x130
   ? mlx5e_stats_ts_get+0x20/0x130
   mlx5e_get_ts_stats+0x15/0x20
  <snip>

Fixes: 3579032c08 ("net/mlx5e: Implement ethtool hardware timestamping statistics")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:08 +01:00
Rahul Rameshbabu 9a52f6d44f net/mlx5e: Fix IPsec tunnel mode offload feature check
Remove faulty check disabling checksum offload and GSO for offload of
simple IPsec tunnel L4 traffic. Comment previously describing the deleted
code incorrectly claimed the check prevented double tunnel (or three layers
of ip headers).

Fixes: f1267798c9 ("net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:08 +01:00
Rahul Rameshbabu 16d66a4fa8 net/mlx5: Use mlx5_ipsec_rx_status_destroy to correctly delete status rules
rx_create no longer allocates a modify_hdr instance that needs to be
cleaned up. The mlx5_modify_header_dealloc call will lead to a NULL pointer
dereference. A leak in the rules also previously occurred since there are
now two rules populated related to status.

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 109907067 P4D 109907067 PUD 116890067 PMD 0
  Oops: 0000 [#1] SMP
  CPU: 1 PID: 484 Comm: ip Not tainted 6.9.0-rc2-rrameshbabu+ #254
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.3-1-1 04/01/2014
  RIP: 0010:mlx5_modify_header_dealloc+0xd/0x70
  <snip>
  Call Trace:
   <TASK>
   ? show_regs+0x60/0x70
   ? __die+0x24/0x70
   ? page_fault_oops+0x15f/0x430
   ? free_to_partial_list.constprop.0+0x79/0x150
   ? do_user_addr_fault+0x2c9/0x5c0
   ? exc_page_fault+0x63/0x110
   ? asm_exc_page_fault+0x27/0x30
   ? mlx5_modify_header_dealloc+0xd/0x70
   rx_create+0x374/0x590
   rx_add_rule+0x3ad/0x500
   ? rx_add_rule+0x3ad/0x500
   ? mlx5_cmd_exec+0x2c/0x40
   ? mlx5_create_ipsec_obj+0xd6/0x200
   mlx5e_accel_ipsec_fs_add_rule+0x31/0xf0
   mlx5e_xfrm_add_state+0x426/0xc00
  <snip>

Fixes: 94af50c0a9 ("net/mlx5e: Unify esw and normal IPsec status table creation/destruction")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:07 +01:00
Gal Pressman 1b9f86c6d5 net/mlx5: Fix MTMP register capability offset in MCAM register
The MTMP register (0x900a) capability offset is off-by-one, move it to
the right place.

Fixes: 1f507e80c7 ("net/mlx5: Expose NIC temperature via hardware monitoring kernel API")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:07 +01:00
Tariq Toukan fca3b47918 net/mlx5: Do not query MPIR on embedded CPU function
A proper query to MPIR needs to set the correct value in the depth field.
On embedded CPU this value is not necessarily zero. As there is no real
use case for multi-PF netdev on the embedded CPU of the smart NIC, block
this option.

This fixes the following failure:
ACCESS_REG(0x805) op_mod(0x1) failed, status bad system state(0x4), syndrome (0x685f19), err(-5)

Fixes: 678eb44805 ("net/mlx5: SD, Implement basic query and instantiation")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:07 +01:00
Maher Sanalla 51ef9305b8 net/mlx5: Lag, do bond only if slaves agree on roce state
Currently, the driver does not enforce that lag bond slaves must have
matching roce capabilities. Yet, in mlx5_do_bond(), the driver attempts
to enable roce on all vports of the bond slaves, causing the following
syndrome when one slave has no roce fw support:

mlx5_cmd_out_err:809:(pid 25427): MODIFY_NIC_VPORT_CONTEXT(0×755) op_mod(0×0)
failed, status bad parameter(0×3), syndrome (0xc1f678), err(-22)

Thus, create HW lag only if bond's slaves agree on roce state,
either all slaves have roce support resulting in a roce lag bond,
or none do, resulting in a raw eth bond.

Fixes: 7907f23adc ("net/mlx5: Implement RoCE LAG feature")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:07 +01:00
Christian Brauner 712182b67e swap: yield device immediately
Otherwise we can cause spurious EBUSY issues when trying to mount the
rootfs later on.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=218845
Reported-by: Petri Kaukasoina <petri.kaukasoina@tuni.fi>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:08 +02:00
David Howells c596bea145 netfs: Fix setting of BDP_ASYNC from iocb flags
Fix netfs_perform_write() to set BDP_ASYNC if IOCB_NOWAIT is set rather
than if IOCB_SYNC is not set.  It reflects asynchronicity in the sense of
not waiting rather than synchronicity in the sense of not returning until
the op is complete.

Without this, generic/590 fails on cifs in strict caching mode with a
complaint that one of the writes fails with EAGAIN.  The test can be
distilled down to:

        mount -t cifs /my/share /mnt -ostuff
        xfs_io -i -c 'falloc 0 8191M -c fsync -f /mnt/file
        xfs_io -i -c 'pwrite -b 1M -W 0 8191M' /mnt/file

Fixes: c38f4e96e6 ("netfs: Provide func to copy data to pagecache for buffered write")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/316306.1716306586@warthog.procyon.org.uk
Reviewed-by: Jens Axboe <axboe@kernel.dk>
cc: Jeff Layton <jlayton@kernel.org>
cc: Enzo Matsumiya <ematsumiya@suse.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netfs@lists.linux.dev
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:07 +02:00
Fedor Pchelkin 65bea99537 signalfd: drop an obsolete comment
Commit fbe38120eb ("signalfd: convert to ->read_iter()") removed the
call to anon_inode_getfd() by splitting fd setup into two parts. Drop the
comment referencing the internal details of that function.

Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Link: https://lore.kernel.org/r/20240520090819.76342-2-pchelkin@ispras.ru
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:07 +02:00
Fedor Pchelkin f826bc9d6f signalfd: fix error return code
If anon_inode_getfile() fails, return appropriate error code. This looks
like a single typo: the similar code changes in timerfd and userfaultfd
are okay.

Found by Linux Verification Center (linuxtesting.org).

Fixes: fbe38120eb ("signalfd: convert to ->read_iter()")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Link: https://lore.kernel.org/r/20240520090819.76342-1-pchelkin@ispras.ru
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:07 +02:00
Xu Yang 4e527d5841 iomap: fault in smaller chunks for non-large folio mappings
Since commit (5d8edfb900 "iomap: Copy larger chunks from userspace"),
iomap will try to copy in larger chunks than PAGE_SIZE. However, if the
mapping doesn't support large folio, only one page of maximum 4KB will
be created and 4KB data will be writen to pagecache each time. Then,
next 4KB will be handled in next iteration. This will cause potential
write performance problem.

If chunk is 2MB, total 512 pages need to be handled finally. During this
period, fault_in_iov_iter_readable() is called to check iov_iter readable
validity. Since only 4KB will be handled each time, below address space
will be checked over and over again:

start         	end
-
buf,    	buf+2MB
buf+4KB, 	buf+2MB
buf+8KB, 	buf+2MB
...
buf+2044KB 	buf+2MB

Obviously the checking size is wrong since only 4KB will be handled each
time. So this will get a correct chunk to let iomap work well in non-large
folio case.

With this change, the write speed will be stable. Tested on ARM64 device.

Before:

 - dd if=/dev/zero of=/dev/sda bs=400K  count=10485  (334 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=800K  count=5242   (278 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=1600K count=2621   (204 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=2200K count=1906   (170 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=3000K count=1398   (150 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=4500K count=932    (139 MB/s)

After:

 - dd if=/dev/zero of=/dev/sda bs=400K  count=10485  (339 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=800K  count=5242   (330 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=1600K count=2621   (332 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=2200K count=1906   (333 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=3000K count=1398   (333 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=4500K count=932    (333 MB/s)

Fixes: 5d8edfb900 ("iomap: Copy larger chunks from userspace")
Cc: stable@vger.kernel.org
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240521114939.2541461-2-xu.yang_2@nxp.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:07 +02:00
Xu Yang 79c1374548 filemap: add helper mapping_max_folio_size()
Add mapping_max_folio_size() to get the maximum folio size for this
pagecache mapping.

Fixes: 5d8edfb900 ("iomap: Copy larger chunks from userspace")
Cc: stable@vger.kernel.org
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240521114939.2541461-1-xu.yang_2@nxp.com
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:06 +02:00
David Howells 2c6b531020 netfs: Fix AIO error handling when doing write-through
If an error occurs whilst we're doing an AIO write in write-through mode,
we may end up calling ->ki_complete() *and* returning an error from
->write_iter().  This can result in either a UAF (the ->ki_complete() func
pointer may get overwritten, for example) or a refcount underflow in
io_submit() as ->ki_complete is called twice.

Fix this by making netfs_end_writethrough() - and thus
netfs_perform_write() - unconditionally return -EIOCBQUEUED if we're doing
an AIO write and wait for completion if we're not.

Fixes: 288ace2f57 ("netfs: New writeback implementation")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/295052.1716298587@warthog.procyon.org.uk
cc: Jeff Layton <jlayton@kernel.org>
cc: Enzo Matsumiya <ematsumiya@suse.de>
cc: netfs@lists.linux.dev
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:06 +02:00
David Howells 9b038d004c netfs: Fix io_uring based write-through
This can be triggered by mounting a cifs filesystem with a cache=strict
mount option and then, using the fsx program from xfstests, doing:

        ltp/fsx -A -d -N 1000 -S 11463 -P /tmp /cifs-mount/foo \
          --replay-ops=gen112-fsxops

Where gen112-fsxops holds:

        fallocate 0x6be7 0x8fc5 0x377d3
        copy_range 0x9c71 0x77e8 0x2edaf 0x377d3
        write 0x2776d 0x8f65 0x377d3

The problem is that netfs_io_request::len is being used for two purposes
and ends up getting set to the amount of data we transferred, not the
amount of data the caller asked to be transferred (for various reasons,
such as mmap'd writes, we might end up rounding out the data written to the
server to include the entire folio at each end).

Fix this by keeping the amount we were asked to write in ->len and using
->submitted to track what we issued ops for.  Then, when we come to calling
->ki_complete(), ->len is the right size.

This also required netfs_cleanup_dio_write() to change since we're no
longer advancing wreq->len.  Use wreq->transferred instead as we might have
done a short read.

With this, the generic/112 xfstest passes if cifs is forced to put all
non-DIO opens into write-through mode.

Fixes: 288ace2f57 ("netfs: New writeback implementation")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/295086.1716298663@warthog.procyon.org.uk
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <stfrench@microsoft.com>
cc: Enzo Matsumiya <ematsumiya@suse.de>
cc: netfs@lists.linux.dev
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:06 +02:00
Mathieu Othacehe 128d54fbcb net: phy: micrel: set soft_reset callback to genphy_soft_reset for KSZ8061
Following a similar reinstate for the KSZ8081 and KSZ9031.

Older kernels would use the genphy_soft_reset if the PHY did not implement
a .soft_reset.

The KSZ8061 errata described here:
https://ww1.microchip.com/downloads/en/DeviceDoc/KSZ8061-Errata-DS80000688B.pdf
and worked around with 232ba3a51c ("net: phy: Micrel KSZ8061: link failure after cable connect")
is back again without this soft reset.

Fixes: 6e2d85ec05 ("net: phy: Stop with excessive soft reset")
Tested-by: Karim Ben Houcine <karim.benhoucine@landisgyr.com>
Signed-off-by: Mathieu Othacehe <othacehe@gnu.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 12:30:38 +01:00
dicken.ding b84a8aba80 genirq/irqdesc: Prevent use-after-free in irq_find_at_or_after()
irq_find_at_or_after() dereferences the interrupt descriptor which is
returned by mt_find() while neither holding sparse_irq_lock nor RCU read
lock, which means the descriptor can be freed between mt_find() and the
dereference:

    CPU0                            CPU1
    desc = mt_find()
                                    delayed_free_desc(desc)
    irq_desc_get_irq(desc)

The use-after-free is reported by KASAN:

    Call trace:
     irq_get_next_irq+0x58/0x84
     show_stat+0x638/0x824
     seq_read_iter+0x158/0x4ec
     proc_reg_read_iter+0x94/0x12c
     vfs_read+0x1e0/0x2c8

    Freed by task 4471:
     slab_free_freelist_hook+0x174/0x1e0
     __kmem_cache_free+0xa4/0x1dc
     kfree+0x64/0x128
     irq_kobj_release+0x28/0x3c
     kobject_put+0xcc/0x1e0
     delayed_free_desc+0x14/0x2c
     rcu_do_batch+0x214/0x720

Guard the access with a RCU read lock section.

Fixes: 721255b982 ("genirq: Use a maple tree for interrupt descriptor management")
Signed-off-by: dicken.ding <dicken.ding@mediatek.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240524091739.31611-1-dicken.ding@mediatek.com
2024-05-24 12:49:35 +02:00
Konstantin Komarov 302e9dca84
fs/ntfs3: Break dir enumeration if directory contents error
If we somehow attempt to read beyond the directory size, an error
is supposed to be returned.

However, in some cases, read requests do not stop and instead enter
into a loop.

To avoid this, we set the position in the directory to the end.

Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: stable@vger.kernel.org
2024-05-24 12:50:12 +03:00
Konstantin Komarov 05afeeebca
fs/ntfs3: Fix case when index is reused during tree transformation
In most cases when adding a cluster to the directory index,
they are placed at the end, and in the bitmap, this cluster corresponds
to the last bit. The new directory size is calculated as follows:

	data_size = (u64)(bit + 1) << indx->index_bits;

In the case of reusing a non-final cluster from the index,
data_size is calculated incorrectly, resulting in the directory size
differing from the actual size.

A check for cluster reuse has been added, and the size update is skipped.

Fixes: 82cae269cf ("fs/ntfs3: Add initialization of super block")
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: stable@vger.kernel.org
2024-05-24 12:50:12 +03:00
Matt Jan 06e785aeb9 connector: Fix invalid conversion in cn_proc.h
The implicit conversion from unsigned int to enum
proc_cn_event is invalid, so explicitly cast it
for compilation in a C++ compiler.
/usr/include/linux/cn_proc.h: In function 'proc_cn_event valid_event(proc_cn_event)':
/usr/include/linux/cn_proc.h:72:17: error: invalid conversion from 'unsigned int' to 'proc_cn_event' [-fpermissive]
   72 |         ev_type &= PROC_EVENT_ALL;
      |                 ^
      |                 |
      |                 unsigned int

Signed-off-by: Matt Jan <zoo868e@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 10:36:55 +01:00
Jeff Xu a52b4f11a2 selftest mm/mseal read-only elf memory segment
Sealing read-only of elf mapping so it can't be changed by mprotect.

[jeffxu@chromium.org: style change]
  Link: https://lkml.kernel.org/r/20240416220944.2481203-2-jeffxu@chromium.org
[amer.shanawany@gmail.com: fix linker error for inline function]
  Link: https://lkml.kernel.org/r/20240420202346.546444-1-amer.shanawany@gmail.com
[jeffxu@chromium.org: fix compile warning]
  Link: https://lkml.kernel.org/r/20240420003515.345982-2-jeffxu@chromium.org
[jeffxu@chromium.org: fix arm build]
  Link: https://lkml.kernel.org/r/20240502225331.3806279-2-jeffxu@chromium.org
Link: https://lkml.kernel.org/r/20240415163527.626541-6-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Signed-off-by: Amer Al Shanawany <amer.shanawany@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-23 19:40:27 -07:00
Jeff Xu c010d09900 mseal: add documentation
Add documentation for mseal().

Link: https://lkml.kernel.org/r/20240415163527.626541-5-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-23 19:40:26 -07:00
Jeff Xu 4926c7a52d selftest mm/mseal memory sealing
selftest for memory sealing change in mmap() and mseal().

Link: https://lkml.kernel.org/r/20240415163527.626541-4-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-23 19:40:26 -07:00
Jeff Xu 8be7258aad mseal: add mseal syscall
The new mseal() is an syscall on 64 bit CPU, and with following signature:

int mseal(void addr, size_t len, unsigned long flags)
addr/len: memory range.
flags: reserved.

mseal() blocks following operations for the given memory range.

1> Unmapping, moving to another location, and shrinking the size,
   via munmap() and mremap(), can leave an empty space, therefore can
   be replaced with a VMA with a new set of attributes.

2> Moving or expanding a different VMA into the current location,
   via mremap().

3> Modifying a VMA via mmap(MAP_FIXED).

4> Size expansion, via mremap(), does not appear to pose any specific
   risks to sealed VMAs. It is included anyway because the use case is
   unclear. In any case, users can rely on merging to expand a sealed VMA.

5> mprotect() and pkey_mprotect().

6> Some destructive madvice() behaviors (e.g. MADV_DONTNEED) for anonymous
   memory, when users don't have write permission to the memory. Those
   behaviors can alter region contents by discarding pages, effectively a
   memset(0) for anonymous memory.

Following input during RFC are incooperated into this patch:

Jann Horn: raising awareness and providing valuable insights on the
destructive madvise operations.
Linus Torvalds: assisting in defining system call signature and scope.
Liam R. Howlett: perf optimization.
Theo de Raadt: sharing the experiences and insight gained from
  implementing mimmutable() in OpenBSD.

Finally, the idea that inspired this patch comes from Stephen Röttger's
work in Chrome V8 CFI.

[jeffxu@chromium.org: add branch prediction hint, per Pedro]
  Link: https://lkml.kernel.org/r/20240423192825.1273679-2-jeffxu@chromium.org
Link: https://lkml.kernel.org/r/20240415163527.626541-3-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-23 19:40:26 -07:00