Commit graph

1280008 commits

Author SHA1 Message Date
Sasha Neftin 7d67d11fbe igc: Fix Energy Efficient Ethernet support declaration
The commit 01cf893bf0 ("net: intel: i40e/igc: Remove setting Autoneg in
EEE capabilities") removed SUPPORTED_Autoneg field but left inappropriate
ethtool_keee structure initialization. When "ethtool --show <device>"
(get_eee) invoke, the 'ethtool_keee' structure was accidentally overridden.
Remove the 'ethtool_keee' overriding and add EEE declaration as per IEEE
specification that allows reporting Energy Efficient Ethernet capabilities.

Examples:
Before fix:
ethtool --show-eee enp174s0
EEE settings for enp174s0:
	EEE status: not supported

After fix:
EEE settings for enp174s0:
	EEE status: disabled
	Tx LPI: disabled
	Supported EEE link modes:  100baseT/Full
	                           1000baseT/Full
	                           2500baseT/Full

Fixes: 01cf893bf0 ("net: intel: i40e/igc: Remove setting Autoneg in EEE capabilities")
Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-6-e3563aa89b0c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 19:27:56 -07:00
Larysa Zaremba f3df404425 ice: map XDP queues to vectors in ice_vsi_map_rings_to_vectors()
ice_pf_dcb_recfg() re-maps queues to vectors with
ice_vsi_map_rings_to_vectors(), which does not restore the previous
state for XDP queues. This leads to no AF_XDP traffic after rebuild.

Map XDP queues to vectors in ice_vsi_map_rings_to_vectors().
Also, move the code around, so XDP queues are mapped independently only
through .ndo_bpf().

Fixes: 6624e780a5 ("ice: split ice_vsi_setup into smaller functions")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-5-e3563aa89b0c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 19:27:56 -07:00
Larysa Zaremba 744d197162 ice: add flag to distinguish reset from .ndo_bpf in XDP rings config
Commit 6624e780a5 ("ice: split ice_vsi_setup into smaller functions")
has placed ice_vsi_free_q_vectors() after ice_destroy_xdp_rings() in
the rebuild process. The behaviour of the XDP rings config functions is
context-dependent, so the change of order has led to
ice_destroy_xdp_rings() doing additional work and removing XDP prog, when
it was supposed to be preserved.

Also, dependency on the PF state reset flags creates an additional,
fortunately less common problem:

* PFR is requested e.g. by tx_timeout handler
* .ndo_bpf() is asked to delete the program, calls ice_destroy_xdp_rings(),
  but reset flag is set, so rings are destroyed without deleting the
  program
* ice_vsi_rebuild tries to delete non-existent XDP rings, because the
  program is still on the VSI
* system crashes

With a similar race, when requested to attach a program,
ice_prepare_xdp_rings() can actually skip setting the program in the VSI
and nevertheless report success.

Instead of reverting to the old order of function calls, add an enum
argument to both ice_prepare_xdp_rings() and ice_destroy_xdp_rings() in
order to distinguish between calls from rebuild and .ndo_bpf().

Fixes: efc2214b60 ("ice: Add support for XDP")
Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-4-e3563aa89b0c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 19:27:56 -07:00
Larysa Zaremba adbf5a4234 ice: remove af_xdp_zc_qps bitmap
Referenced commit has introduced a bitmap to distinguish between ZC and
copy-mode AF_XDP queues, because xsk_get_pool_from_qid() does not do this
for us.

The bitmap would be especially useful when restoring previous state after
rebuild, if only it was not reallocated in the process. This leads to e.g.
xdpsock dying after changing number of queues.

Instead of preserving the bitmap during the rebuild, remove it completely
and distinguish between ZC and copy-mode queues based on the presence of
a device associated with the pool.

Fixes: e102db780e ("ice: track AF_XDP ZC enabled queues in bitmap")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-3-e3563aa89b0c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 19:27:56 -07:00
Jacob Keller cfa747a66e ice: fix reads from NVM Shadow RAM on E830 and E825-C devices
The ice driver reads data from the Shadow RAM portion of the NVM during
initialization, including data used to identify the NVM image and device,
such as the ETRACK ID used to populate devlink dev info fw.bundle.

Currently it is using a fixed offset defined by ICE_CSS_HEADER_LENGTH to
compute the appropriate offset. This worked fine for E810 and E822 devices
which both have CSS header length of 330 words.

Other devices, including both E825-C and E830 devices have different sizes
for their CSS header. The use of a hard coded value results in the driver
reading from the wrong block in the NVM when attempting to access the
Shadow RAM copy. This results in the driver reporting the fw.bundle as 0x0
in both the devlink dev info and ethtool -i output.

The first E830 support was introduced by commit ba20ecb1d1 ("ice: Hook up
4 E830 devices by adding their IDs") and the first E825-C support was
introducted by commit f64e189442 ("ice: introduce new E825C devices
family")

The NVM actually contains the CSS header length embedded in it. Remove the
hard coded value and replace it with logic to read the length from the NVM
directly. This is more resilient against all existing and future hardware,
vs looking up the expected values from a table. It ensures the driver will
read from the appropriate place when determining the ETRACK ID value used
for populating the fw.bundle_id and for reporting in ethtool -i.

The CSS header length for both the active and inactive flash bank is stored
in the ice_bank_info structure to avoid unnecessary duplicate work when
accessing multiple words of the Shadow RAM. Both banks are read in the
unlikely event that the header length is different for the NVM in the
inactive bank, rather than being different only by the overall device
family.

Fixes: ba20ecb1d1 ("ice: Hook up 4 E830 devices by adding their IDs")
Co-developed-by: Paul Greenwalt <paul.greenwalt@intel.com>
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-2-e3563aa89b0c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 19:27:55 -07:00
Jacob Keller 03e4a092be ice: fix iteration of TLVs in Preserved Fields Area
The ice_get_pfa_module_tlv() function iterates over the Type-Length-Value
structures in the Preserved Fields Area (PFA) of the NVM. This is used by
the driver to access data such as the Part Board Assembly identifier.

The function uses simple logic to iterate over the PFA. First, the pointer
to the PFA in the NVM is read. Then the total length of the PFA is read
from the first word.

A pointer to the first TLV is initialized, and a simple loop iterates over
each TLV. The pointer is moved forward through the NVM until it exceeds the
PFA area.

The logic seems sound, but it is missing a key detail. The Preserved
Fields Area length includes one additional final word. This is documented
in the device data sheet as a dummy word which contains 0xFFFF. All NVMs
have this extra word.

If the driver tries to scan for a TLV that is not in the PFA, it will read
past the size of the PFA. It reads and interprets the last dummy word of
the PFA as a TLV with type 0xFFFF. It then reads the word following the PFA
as a length.

The PFA resides within the Shadow RAM portion of the NVM, which is
relatively small. All of its offsets are within a 16-bit size. The PFA
pointer and TLV pointer are stored by the driver as 16-bit values.

In almost all cases, the word following the PFA will be such that
interpreting it as a length will result in 16-bit arithmetic overflow. Once
overflowed, the new next_tlv value is now below the maximum offset of the
PFA. Thus, the driver will continue to iterate the data as TLVs. In the
worst case, the driver hits on a sequence of reads which loop back to
reading the same offsets in an endless loop.

To fix this, we need to correct the loop iteration check to account for
this extra word at the end of the PFA. This alone is sufficient to resolve
the known cases of this issue in the field. However, it is plausible that
an NVM could be misconfigured or have corrupt data which results in the
same kind of overflow. Protect against this by using check_add_overflow
when calculating both the maximum offset of the TLVs, and when calculating
the next_tlv offset at the end of each loop iteration. This ensures that
the driver will not get stuck in an infinite loop when scanning the PFA.

Fixes: e961b679fb ("ice: add board identifier info to devlink .info_get")
Co-developed-by: Paul Greenwalt <paul.greenwalt@intel.com>
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-1-e3563aa89b0c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 19:27:55 -07:00
Dr. David Alan Gilbert b91e05f1fc drm/vmwgfx: remove unused struct 'vmw_stdu_dma'
'vmw_stdu_dma' is unused since
commit 39985eea5a ("drm/vmwgfx: Abstract placement selection")
Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240517232858.230860-1-linux@treblig.org
2024-06-05 22:20:25 -04:00
Ryusuke Konishi 7373a51e79 nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors
The error handling in nilfs_empty_dir() when a directory folio/page read
fails is incorrect, as in the old ext2 implementation, and if the
folio/page cannot be read or nilfs_check_folio() fails, it will falsely
determine the directory as empty and corrupt the file system.

In addition, since nilfs_empty_dir() does not immediately return on a
failed folio/page read, but continues to loop, this can cause a long loop
with I/O if i_size of the directory's inode is also corrupted, causing the
log writer thread to wait and hang, as reported by syzbot.

Fix these issues by making nilfs_empty_dir() immediately return a false
value (0) if it fails to get a directory folio/page.

Link: https://lkml.kernel.org/r/20240604134255.7165-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+c8166c541d3971bf6c87@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c8166c541d3971bf6c87
Fixes: 2ba466d74e ("nilfs2: directory entry operations")
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:27 -07:00
Suren Baghdasaryan 9415983599 mm: fix xyz_noprof functions calling profiled functions
Grepping /proc/allocinfo for "noprof" reveals several xyz_noprof
functions, which means internally they are calling profiled functions. 
This should never happen as such calls move allocation charge from a
higher level location where it should be accounted for into these lower
level helpers.  Fix this by replacing profiled function calls with noprof
ones.

Link: https://lkml.kernel.org/r/20240531205350.3973009-1-surenb@google.com
Fixes: b951aaff50 ("mm: enable page allocation tagging")
Fixes: e26d8769da ("mempool: hook up to memory allocation profiling")
Fixes: 88ae5fb755 ("mm: vmalloc: enable memory allocation profiling")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Reviewed-by: Kees Cook <kees@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:26 -07:00
Thadeu Lima de Souza Cascardo 3f0c44c8c2 codetag: avoid race at alloc_slab_obj_exts
When CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled, the following warning
may be noticed:

[   48.299584] ------------[ cut here ]------------
[   48.300092] alloc_tag was not set
[   48.300528] WARNING: CPU: 2 PID: 1361 at include/linux/alloc_tag.h:130 alloc_tagging_slab_free_hook+0x84/0xc7
[   48.301305] Modules linked in:
[   48.301553] CPU: 2 PID: 1361 Comm: systemd-udevd Not tainted 6.10.0-rc1-00003-gac8755535862 #176
[   48.302196] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[   48.302752] RIP: 0010:alloc_tagging_slab_free_hook+0x84/0xc7
[   48.303169] Code: 8d 1c c4 48 85 db 74 4d 48 83 3b 00 75 1e 80 3d 65 02 86 04 00 75 15 48 c7 c7 11 48 1d 85 c6 05 55 02 86 04 01 e8 64 44 a5 ff <0f> 0b 48 8b 03 48 85 c0 74 21 48 83 f8 01 74 14 48 8b 50 20 48 f7
[   48.304411] RSP: 0018:ffff8880111b7d40 EFLAGS: 00010282
[   48.304916] RAX: 0000000000000000 RBX: ffff88800fcc9008 RCX: 0000000000000000
[   48.305455] RDX: 0000000080000000 RSI: ffff888014060000 RDI: ffffed1002236f97
[   48.305979] RBP: 0000000000001100 R08: fffffbfff0aa73a1 R09: 0000000000000000
[   48.306473] R10: ffffffff814515e5 R11: 0000000000000003 R12: ffff88800fcc9000
[   48.306943] R13: ffff88800b2e5cc0 R14: ffff8880111b7d90 R15: 0000000000000000
[   48.307529] FS:  00007faf5d1908c0(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000
[   48.308223] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   48.308710] CR2: 000058fb220c9118 CR3: 00000000110cc000 CR4: 0000000000750ef0
[   48.309274] PKRU: 55555554
[   48.309804] Call Trace:
[   48.310029]  <TASK>
[   48.310290]  ? show_regs+0x84/0x8d
[   48.310722]  ? alloc_tagging_slab_free_hook+0x84/0xc7
[   48.311298]  ? __warn+0x13b/0x2ff
[   48.311580]  ? alloc_tagging_slab_free_hook+0x84/0xc7
[   48.311987]  ? report_bug+0x2ce/0x3ab
[   48.312292]  ? handle_bug+0x8c/0x107
[   48.312563]  ? exc_invalid_op+0x34/0x6f
[   48.312842]  ? asm_exc_invalid_op+0x1a/0x20
[   48.313173]  ? this_cpu_in_panic+0x1c/0x72
[   48.313503]  ? alloc_tagging_slab_free_hook+0x84/0xc7
[   48.313880]  ? putname+0x143/0x14e
[   48.314152]  kmem_cache_free+0xe9/0x214
[   48.314454]  putname+0x143/0x14e
[   48.314712]  do_unlinkat+0x413/0x45e
[   48.315001]  ? __pfx_do_unlinkat+0x10/0x10
[   48.315388]  ? __check_object_size+0x4d7/0x525
[   48.315744]  ? __sanitizer_cov_trace_pc+0x20/0x4a
[   48.316167]  ? __sanitizer_cov_trace_pc+0x20/0x4a
[   48.316757]  ? getname_flags+0x4ed/0x500
[   48.317261]  __x64_sys_unlink+0x42/0x4a
[   48.317741]  do_syscall_64+0xe2/0x149
[   48.318171]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   48.318602] RIP: 0033:0x7faf5d8850ab
[   48.318891] Code: fd ff ff e8 27 dd 01 00 0f 1f 80 00 00 00 00 f3 0f 1e fa b8 5f 00 00 00 0f 05 c3 0f 1f 40 00 f3 0f 1e fa b8 57 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 41 2d 0e 00 f7 d8
[   48.320649] RSP: 002b:00007ffc44982b38 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
[   48.321182] RAX: ffffffffffffffda RBX: 00005ba344a44680 RCX: 00007faf5d8850ab
[   48.321667] RDX: 0000000000000000 RSI: 00005ba344a44430 RDI: 00007ffc44982b40
[   48.322139] RBP: 00007ffc44982c00 R08: 0000000000000000 R09: 0000000000000007
[   48.322598] R10: 00005ba344a44430 R11: 0000000000000246 R12: 0000000000000000
[   48.323071] R13: 00007ffc44982b40 R14: 0000000000000000 R15: 0000000000000000
[   48.323596]  </TASK>

This is due to a race when two objects are allocated from the same slab,
which did not have an obj_exts allocated for.

In such a case, the two threads will notice the NULL obj_exts and after
one assigns slab->obj_exts, the second one will happily do the exchange if
it reads this new assigned value.

In order to avoid that, verify that the read obj_exts does not point to an
allocated obj_exts before doing the exchange.

Link: https://lkml.kernel.org/r/20240527183007.1595037-1-cascardo@igalia.com
Fixes: 09c46563ff ("codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:26 -07:00
Oscar Salvador 8daf9c702e mm/hugetlb: do not call vma_add_reservation upon ENOMEM
sysbot reported a splat [1] on __unmap_hugepage_range().  This is because
vma_needs_reservation() can return -ENOMEM if
allocate_file_region_entries() fails to allocate the file_region struct
for the reservation.

Check for that and do not call vma_add_reservation() if that is the case,
otherwise region_abort() and region_del() will see that we do not have any
file_regions.

If we detect that vma_needs_reservation() returned -ENOMEM, we clear the
hugetlb_restore_reserve flag as if this reservation was still consumed, so
free_huge_folio() will not increment the resv count.

[1] https://lore.kernel.org/linux-mm/0000000000004096100617c58d54@google.com/T/#ma5983bc1ab18a54910da83416b3f89f3c7ee43aa

Link: https://lkml.kernel.org/r/20240528205323.20439-1-osalvador@suse.de
Fixes: df7a6d1f64 ("mm/hugetlb: restore the reservation if needed")
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reported-and-tested-by: syzbot+d3fe2dc5ffe9380b714b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-mm/0000000000004096100617c58d54@google.com/
Cc: Breno Leitao <leitao@debian.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:26 -07:00
Chengming Zhou c2dc78b86e mm/ksm: fix ksm_zero_pages accounting
We normally ksm_zero_pages++ in ksmd when page is merged with zero page,
but ksm_zero_pages-- is done from page tables side, where there is no any
accessing protection of ksm_zero_pages.

So we can read very exceptional value of ksm_zero_pages in rare cases,
such as -1, which is very confusing to users.

Fix it by changing to use atomic_long_t, and the same case with the
mm->ksm_zero_pages.

Link: https://lkml.kernel.org/r/20240528-b4-ksm-counters-v3-2-34bb358fdc13@linux.dev
Fixes: e2942062e0 ("ksm: count all zero pages placed by KSM")
Fixes: 6080d19f07 ("ksm: add ksm zero pages for each process")
Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Cc: Stefan Roesch <shr@devkernel.io>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:26 -07:00
Chengming Zhou 730cdc2c72 mm/ksm: fix ksm_pages_scanned accounting
Patch series "mm/ksm: fix some accounting problems", v3.

We encountered some abnormal ksm_pages_scanned and ksm_zero_pages during
some random tests.

1. ksm_pages_scanned unchanged even ksmd scanning has progress.
2. ksm_zero_pages maybe -1 in some rare cases.


This patch (of 2):

During testing, I found ksm_pages_scanned is unchanged although the
scan_get_next_rmap_item() did return valid rmap_item that is not NULL.

The reason is the scan_get_next_rmap_item() will return NULL after a full
scan, so ksm_do_scan() just return without accounting of the
ksm_pages_scanned.

Fix it by just putting ksm_pages_scanned accounting in that loop, and it
will be accounted more timely if that loop would last for a long time.

Link: https://lkml.kernel.org/r/20240528-b4-ksm-counters-v3-0-34bb358fdc13@linux.dev
Link: https://lkml.kernel.org/r/20240528-b4-ksm-counters-v3-1-34bb358fdc13@linux.dev
Fixes: b348b5fe2b ("mm/ksm: add pages scanned metric")
Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: xu xin <xu.xin16@zte.com.cn>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Cc: Stefan Roesch <shr@devkernel.io>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:25 -07:00
Alexander Potapenko 2ef3cec44c kmsan: do not wipe out origin when doing partial unpoisoning
As noticed by Brian, KMSAN should not be zeroing the origin when
unpoisoning parts of a four-byte uninitialized value, e.g.:

    char a[4];
    kmsan_unpoison_memory(a, 1);

This led to false negatives, as certain poisoned values could receive zero
origins, preventing those values from being reported.

To fix the problem, check that kmsan_internal_set_shadow_origin() writes
zero origins only to slots which have zero shadow.

Link: https://lkml.kernel.org/r/20240528104807.738758-1-glider@google.com
Fixes: f80be4571b ("kmsan: add KMSAN runtime core")
Signed-off-by: Alexander Potapenko <glider@google.com>
Reported-by: Brian Johannesmeyer <bjohannesmeyer@gmail.com>
  Link: https://lore.kernel.org/lkml/20240524232804.1984355-1-bjohannesmeyer@gmail.com/T/
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Brian Johannesmeyer <bjohannesmeyer@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:25 -07:00
Cong Wang 0105eaabb2 vmalloc: check CONFIG_EXECMEM in is_vmalloc_or_module_addr()
After commit 2c9e5d4a00 ("bpf: remove CONFIG_BPF_JIT dependency on
CONFIG_MODULES of") CONFIG_BPF_JIT does not depend on CONFIG_MODULES any
more and bpf jit also uses the [MODULES_VADDR, MODULES_END] memory region.
But is_vmalloc_or_module_addr() still checks CONFIG_MODULES, which then
returns false for a bpf jit memory region when CONFIG_MODULES is not
defined.  It leads to the following kernel BUG:

[    1.567023] ------------[ cut here ]------------
[    1.567883] kernel BUG at mm/vmalloc.c:745!
[    1.568477] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
[    1.569367] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.9.0+ #448
[    1.570247] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
[    1.570786] RIP: 0010:vmalloc_to_page+0x48/0x1ec
[    1.570786] Code: 0f 00 00 e8 eb 1a 05 00 b8 37 00 00 00 48 ba fe ff ff ff ff 1f 00 00 4c 03 25 76 49 c6 02 48 c1 e0 28 48 01 e8 48 39 d0 76 02 <0f> 0b 4c 89 e7 e8 bf 1a 05 00 49 8b 04 24 48 a9 9f ff ff ff 0f 84
[    1.570786] RSP: 0018:ffff888007787960 EFLAGS: 00010212
[    1.570786] RAX: 000036ffa0000000 RBX: 0000000000000640 RCX: ffffffff8147e93c
[    1.570786] RDX: 00001ffffffffffe RSI: dffffc0000000000 RDI: ffffffff840e32c8
[    1.570786] RBP: ffffffffa0000000 R08: 0000000000000000 R09: 0000000000000000
[    1.570786] R10: ffff888007787a88 R11: ffffffff8475d8e7 R12: ffffffff83e80ff8
[    1.570786] R13: 0000000000000640 R14: 0000000000000640 R15: 0000000000000640
[    1.570786] FS:  0000000000000000(0000) GS:ffff88806cc00000(0000) knlGS:0000000000000000
[    1.570786] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.570786] CR2: ffff888006a01000 CR3: 0000000003e80000 CR4: 0000000000350ef0
[    1.570786] Call Trace:
[    1.570786]  <TASK>
[    1.570786]  ? __die_body+0x1b/0x58
[    1.570786]  ? die+0x31/0x4b
[    1.570786]  ? do_trap+0x9d/0x138
[    1.570786]  ? vmalloc_to_page+0x48/0x1ec
[    1.570786]  ? do_error_trap+0xcd/0x102
[    1.570786]  ? vmalloc_to_page+0x48/0x1ec
[    1.570786]  ? vmalloc_to_page+0x48/0x1ec
[    1.570786]  ? handle_invalid_op+0x2f/0x38
[    1.570786]  ? vmalloc_to_page+0x48/0x1ec
[    1.570786]  ? exc_invalid_op+0x2b/0x41
[    1.570786]  ? asm_exc_invalid_op+0x16/0x20
[    1.570786]  ? vmalloc_to_page+0x26/0x1ec
[    1.570786]  ? vmalloc_to_page+0x48/0x1ec
[    1.570786]  __text_poke+0xb6/0x458
[    1.570786]  ? __pfx_text_poke_memcpy+0x10/0x10
[    1.570786]  ? __pfx___mutex_lock+0x10/0x10
[    1.570786]  ? __pfx___text_poke+0x10/0x10
[    1.570786]  ? __pfx_get_random_u32+0x10/0x10
[    1.570786]  ? srso_return_thunk+0x5/0x5f
[    1.570786]  text_poke_copy_locked+0x70/0x84
[    1.570786]  text_poke_copy+0x32/0x4f
[    1.570786]  bpf_arch_text_copy+0xf/0x27
[    1.570786]  bpf_jit_binary_pack_finalize+0x26/0x5a
[    1.570786]  bpf_int_jit_compile+0x576/0x8ad
[    1.570786]  ? __pfx_bpf_int_jit_compile+0x10/0x10
[    1.570786]  ? srso_return_thunk+0x5/0x5f
[    1.570786]  ? __kmalloc_node_track_caller+0x2b5/0x2e0
[    1.570786]  bpf_prog_select_runtime+0x7c/0x199
[    1.570786]  bpf_prepare_filter+0x1e9/0x25b
[    1.570786]  ? __pfx_bpf_prepare_filter+0x10/0x10
[    1.570786]  ? srso_return_thunk+0x5/0x5f
[    1.570786]  ? _find_next_bit+0x29/0x7e
[    1.570786]  bpf_prog_create+0xb8/0xe0
[    1.570786]  ptp_classifier_init+0x75/0xa1
[    1.570786]  ? __pfx_ptp_classifier_init+0x10/0x10
[    1.570786]  ? srso_return_thunk+0x5/0x5f
[    1.570786]  ? register_pernet_subsys+0x36/0x42
[    1.570786]  ? srso_return_thunk+0x5/0x5f
[    1.570786]  sock_init+0x99/0xa3
[    1.570786]  ? __pfx_sock_init+0x10/0x10
[    1.570786]  do_one_initcall+0x104/0x2c4
[    1.570786]  ? __pfx_do_one_initcall+0x10/0x10
[    1.570786]  ? parameq+0x25/0x2d
[    1.570786]  ? rcu_is_watching+0x1c/0x3c
[    1.570786]  ? trace_kmalloc+0x81/0xb2
[    1.570786]  ? srso_return_thunk+0x5/0x5f
[    1.570786]  ? __kmalloc+0x29c/0x2c7
[    1.570786]  ? srso_return_thunk+0x5/0x5f
[    1.570786]  do_initcalls+0xf9/0x123
[    1.570786]  kernel_init_freeable+0x24f/0x289
[    1.570786]  ? __pfx_kernel_init+0x10/0x10
[    1.570786]  kernel_init+0x19/0x13a
[    1.570786]  ret_from_fork+0x24/0x41
[    1.570786]  ? __pfx_kernel_init+0x10/0x10
[    1.570786]  ret_from_fork_asm+0x1a/0x30
[    1.570786]  </TASK>
[    1.570819] ---[ end trace 0000000000000000 ]---
[    1.571463] RIP: 0010:vmalloc_to_page+0x48/0x1ec
[    1.572111] Code: 0f 00 00 e8 eb 1a 05 00 b8 37 00 00 00 48 ba fe ff ff ff ff 1f 00 00 4c 03 25 76 49 c6 02 48 c1 e0 28 48 01 e8 48 39 d0 76 02 <0f> 0b 4c 89 e7 e8 bf 1a 05 00 49 8b 04 24 48 a9 9f ff ff ff 0f 84
[    1.574632] RSP: 0018:ffff888007787960 EFLAGS: 00010212
[    1.575129] RAX: 000036ffa0000000 RBX: 0000000000000640 RCX: ffffffff8147e93c
[    1.576097] RDX: 00001ffffffffffe RSI: dffffc0000000000 RDI: ffffffff840e32c8
[    1.577084] RBP: ffffffffa0000000 R08: 0000000000000000 R09: 0000000000000000
[    1.578077] R10: ffff888007787a88 R11: ffffffff8475d8e7 R12: ffffffff83e80ff8
[    1.578810] R13: 0000000000000640 R14: 0000000000000640 R15: 0000000000000640
[    1.579823] FS:  0000000000000000(0000) GS:ffff88806cc00000(0000) knlGS:0000000000000000
[    1.580992] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.581869] CR2: ffff888006a01000 CR3: 0000000003e80000 CR4: 0000000000350ef0
[    1.582800] Kernel panic - not syncing: Fatal exception
[    1.583765] ---[ end Kernel panic - not syncing: Fatal exception ]---

Fix this by checking CONFIG_EXECMEM instead.

Link: https://lkml.kernel.org/r/20240528160838.102223-1-xiyou.wangcong@gmail.com
Fixes: 2c9e5d4a00 ("bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of")
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:25 -07:00
Johannes Weiner 7cc5a5d650 mm: page_alloc: fix highatomic typing in multi-block buddies
Christoph reports a page allocator splat triggered by xfstests:

generic/176 214s ... [ 1204.507931] run fstests generic/176 at 2024-05-27 12:52:30
XFS (nvme0n1): Mounting V5 Filesystem cd936307-415f-48a3-b99d-a2d52ae1f273
XFS (nvme0n1): Ending clean mount
XFS (nvme1n1): Mounting V5 Filesystem ab3ee1a4-af62-4934-9a6a-6c2fde321850
XFS (nvme1n1): Ending clean mount
XFS (nvme1n1): Unmounting Filesystem ab3ee1a4-af62-4934-9a6a-6c2fde321850
XFS (nvme1n1): Mounting V5 Filesystem 7099b02d-9c58-4d1d-be1d-2cc472d12cd9
XFS (nvme1n1): Ending clean mount
------------[ cut here ]------------
page type is 3, passed migratetype is 1 (nr=512)
WARNING: CPU: 0 PID: 509870 at mm/page_alloc.c:645 expand+0x1c5/0x1f0
Modules linked in: i2c_i801 crc32_pclmul i2c_smbus [last unloaded: scsi_debug]
CPU: 0 PID: 509870 Comm: xfs_io Not tainted 6.10.0-rc1+ #2437
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:expand+0x1c5/0x1f0
Code: 05 16 70 bf 02 01 e8 ca fc ff ff 8b 54 24 34 44 89 e1 48 c7 c7 80 a2 28 83 48 89 c6 b8 01 00 3
RSP: 0018:ffffc90003b2b968 EFLAGS: 00010082
RAX: 0000000000000000 RBX: ffffffff83fa9480 RCX: 0000000000000000
RDX: 0000000000000005 RSI: 0000000000000027 RDI: 00000000ffffffff
RBP: 00000000001f2600 R08: 00000000fffeffff R09: 0000000000000001
R10: 0000000000000000 R11: ffffffff83676200 R12: 0000000000000009
R13: 0000000000000200 R14: 0000000000000001 R15: ffffea0007c98000
FS:  00007f72ca3d5780(0000) GS:ffff8881f9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f72ca1fff38 CR3: 00000001aa0c6002 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 ? __warn+0x7b/0x120
 ? expand+0x1c5/0x1f0
 ? report_bug+0x191/0x1c0
 ? handle_bug+0x3c/0x80
 ? exc_invalid_op+0x17/0x70
 ? asm_exc_invalid_op+0x1a/0x20
 ? expand+0x1c5/0x1f0
 ? expand+0x1c5/0x1f0
 __rmqueue_pcplist+0x3a9/0x730
 get_page_from_freelist+0x7a0/0xf00
 __alloc_pages_noprof+0x153/0x2e0
 __folio_alloc_noprof+0x10/0xa0
 __filemap_get_folio+0x16b/0x370
 iomap_write_begin+0x496/0x680

While trying to service a movable allocation (page type 1), the page
allocator runs into a two-pageblock buddy on the movable freelist whose
second block is typed as highatomic (page type 3).

This inconsistency is caused by the highatomic reservation system
operating on single pageblocks, while MAX_ORDER can be bigger than that -
in this configuration, pageblock_order is 9 while MAX_PAGE_ORDER is 10. 
The test case is observed to make several adjacent order-3 requests with
__GFP_DIRECT_RECLAIM cleared, which marks the surrounding block as
highatomic.  Upon freeing, the blocks merge into an order-10 buddy.  When
the highatomic pool is drained later on, this order-10 buddy gets moved
back to the movable list, but only the first pageblock is marked movable
again.  A subsequent expand() of this buddy warns about the tail being of
a different type.

This is a long-standing bug that's surfaced by the recent block type
warnings added to the allocator.  The consequences seem mostly benign, it
just results in odd behavior: the highatomic tail blocks are not properly
drained, instead they end up on the movable list first, then go back to
the highatomic list after an alloc-free cycle.

To fix this, make the highatomic reservation code aware that
allocations/buddies can be larger than a pageblock.

While it's an old quirk, the recently added type consistency warnings seem
to be the most prominent consequence of it.  Set the Fixes: tag
accordingly to highlight this backporting dependency.

Link: https://lkml.kernel.org/r/20240530114203.GA1222079@cmpxchg.org
Fixes: e0932b6c1f ("mm: page_alloc: consolidate free page accounting")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Tested-by: Christoph Hellwig <hch@lst.de>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:25 -07:00
Ryusuke Konishi a4ca369ca2 nilfs2: fix potential kernel bug due to lack of writeback flag waiting
Destructive writes to a block device on which nilfs2 is mounted can cause
a kernel bug in the folio/page writeback start routine or writeback end
routine (__folio_start_writeback in the log below):

 kernel BUG at mm/page-writeback.c:3070!
 Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
 ...
 RIP: 0010:__folio_start_writeback+0xbaa/0x10e0
 Code: 25 ff 0f 00 00 0f 84 18 01 00 00 e8 40 ca c6 ff e9 17 f6 ff ff
  e8 36 ca c6 ff 4c 89 f7 48 c7 c6 80 c0 12 84 e8 e7 b3 0f 00 90 <0f>
  0b e8 1f ca c6 ff 4c 89 f7 48 c7 c6 a0 c6 12 84 e8 d0 b3 0f 00
 ...
 Call Trace:
  <TASK>
  nilfs_segctor_do_construct+0x4654/0x69d0 [nilfs2]
  nilfs_segctor_construct+0x181/0x6b0 [nilfs2]
  nilfs_segctor_thread+0x548/0x11c0 [nilfs2]
  kthread+0x2f0/0x390
  ret_from_fork+0x4b/0x80
  ret_from_fork_asm+0x1a/0x30
  </TASK>

This is because when the log writer starts a writeback for segment summary
blocks or a super root block that use the backing device's page cache, it
does not wait for the ongoing folio/page writeback, resulting in an
inconsistent writeback state.

Fix this issue by waiting for ongoing writebacks when putting
folios/pages on the backing device into writeback state.

Link: https://lkml.kernel.org/r/20240530141556.4411-1-konishi.ryusuke@gmail.com
Fixes: 9ff05123e3 ("nilfs2: segment constructor")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:24 -07:00
Sebastian Andrzej Siewior 36eef400c2 memcg: remove the lockdep assert from __mod_objcg_mlstate()
The assert was introduced in the commit cited below as an insurance that
the semantic is the same after the local_irq_save() has been removed and
the function has been made static.

The original requirement to disable interrupt was due the modification
of per-CPU counters which require interrupts to be disabled because the
counter update operation is not atomic and some of the counters are
updated from interrupt context.

All callers of __mod_objcg_mlstate() acquire a lock
(memcg_stock.stock_lock) which disables interrupts on !PREEMPT_RT and
the lockdep assert is satisfied. On PREEMPT_RT the interrupts are not
disabled and the assert triggers.

The safety of the counter update is already ensured by
VM_WARN_ON_IRQS_ENABLED() which is part of __mod_memcg_lruvec_state() and
does not require yet another check.

Remove the lockdep assert from __mod_objcg_mlstate().

Link: https://lkml.kernel.org/r/20240528141341.rz_rytN_@linutronix.de
Fixes: 91882c1617 ("memcg: simple cleanup of stats update functions")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:24 -07:00
Barry Song 6434e69814 mm: arm64: fix the out-of-bounds issue in contpte_clear_young_dirty_ptes
We are passing a huge nr to __clear_young_dirty_ptes() right now.  While
we should pass the number of pages, we are actually passing CONT_PTE_SIZE.
This is causing lots of crashes of MADV_FREE, panic oops could vary
everytime.

Link: https://lkml.kernel.org/r/20240524005444.135417-1-21cnbao@gmail.com
Fixes: 89e86854fb ("mm/arm64: override clear_young_dirty_ptes() batch helper")
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Lance Yang <ioworker0@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Chris Li <chrisl@kernel.org>
Cc: Barry Song <21cnbao@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Jeff Xie <xiehuan09@gmail.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yin Fengwei <fengwei.yin@intel.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:24 -07:00
Barry Song 94d46bf179 mm: huge_mm: fix undefined reference to `mthp_stats' for CONFIG_SYSFS=n
if CONFIG_SYSFS is not enabled in config, we get the below error,

All errors (new ones prefixed by >>):

   s390-linux-ld: mm/memory.o: in function `count_mthp_stat':
>> include/linux/huge_mm.h:285:(.text+0x191c): undefined reference to `mthp_stats'
   s390-linux-ld: mm/huge_memory.o:(.rodata+0x10): undefined reference to `mthp_stats'

vim +285 include/linux/huge_mm.h

   279
   280  static inline void count_mthp_stat(int order, enum mthp_stat_item item)
   281  {
   282          if (order <= 0 || order > PMD_ORDER)
   283                  return;
   284
 > 285          this_cpu_inc(mthp_stats.stats[order][item]);
   286  }
   287

Link: https://lkml.kernel.org/r/20240523210045.40444-1-21cnbao@gmail.com
Fixes: ec33687c67 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback counters")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405231728.tCAogiSI-lkp@intel.com/
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Tested-by: Yujie Liu <yujie.liu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:24 -07:00
Baolin Wang 0d648dd5c8 mm: drop the 'anon_' prefix for swap-out mTHP counters
The mTHP swap related counters: 'anon_swpout' and 'anon_swpout_fallback'
are confusing with an 'anon_' prefix, since the shmem can swap out
non-anonymous pages.  So drop the 'anon_' prefix to keep consistent with
the old swap counter names.

This is needed in 6.10-rcX to avoid having an inconsistent ABI out in the
field.

Link: https://lkml.kernel.org/r/7a8989c13299920d7589007a30065c3e2c19f0e0.1716431702.git.baolin.wang@linux.alibaba.com
Fixes: d0f048ac39 ("mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters")
Fixes: 42248b9d34 ("mm: add docs for per-order mTHP counters and transhuge_page ABI")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Suggested-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05 19:19:23 -07:00
Ian Forbes 7ef91dcba1 drm/vmwgfx: Don't destroy Screen Target when CRTC is enabled but inactive
drm_crtc_helper_funcs::atomic_disable can be called even when the CRTC is
still enabled. This can occur when the mode changes or the CRTC is set as
inactive.

In the case where the CRTC is being set as inactive we only want to
blank the screen. The Screen Target should remain intact as long as the
mode has not changed and CRTC is enabled.

This fixes a bug with GDM where locking the screen results in a permanent
black screen because the Screen Target is no longer defined.

Fixes: 7b0062036c ("drm/vmwgfx: Implement virtual crc generation")
Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Reviewed-by: Martin Krastev <martin.krastev@broadcom.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240531203358.26677-1-ian.forbes@broadcom.com
2024-06-05 22:11:42 -04:00
Ian Forbes a54a200f3d drm/vmwgfx: Standardize use of kibibytes when logging
Use the same standard abbreviation KiB instead of incorrect variants.

Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521184720.767-5-ian.forbes@broadcom.com
2024-06-05 22:10:49 -04:00
Ian Forbes dde1de06bd drm/vmwgfx: Remove STDU logic from generic mode_valid function
STDU has its own mode_valid function now so this logic can be removed from
the generic version.

Fixes: 935f795045 ("drm/vmwgfx: Refactor drm connector probing for display modes")

Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521184720.767-4-ian.forbes@broadcom.com
2024-06-05 22:10:48 -04:00
Ian Forbes fb5e19d2dd drm/vmwgfx: 3D disabled should not effect STDU memory limits
This limit became a hard cap starting with the change referenced below.
Surface creation on the device will fail if the requested size is larger
than this limit so altering the value arbitrarily will expose modes that
are too large for the device's hard limits.

Fixes: 7ebb47c9f9 ("drm/vmwgfx: Read new register for GB memory when available")

Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521184720.767-3-ian.forbes@broadcom.com
2024-06-05 22:10:47 -04:00
Ian Forbes 4268269331 drm/vmwgfx: Filter modes which exceed graphics memory
SVGA requires individual surfaces to fit within graphics memory
(max_mob_pages) which means that modes with a final buffer size that would
exceed graphics memory must be pruned otherwise creation will fail.

Additionally llvmpipe requires its buffer height and width to be a multiple
of its tile size which is 64. As a result we have to anticipate that
llvmpipe will round up the mode size passed to it by the compositor when
it creates buffers and filter modes where this rounding exceeds graphics
memory.

This fixes an issue where VMs with low graphics memory (< 64MiB) configured
with high resolution mode boot to a black screen because surface creation
fails.

Fixes: d947d1b71d ("drm/vmwgfx: Add and connect connector helper function")
Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521184720.767-2-ian.forbes@broadcom.com
2024-06-05 22:10:46 -04:00
Jakub Kicinski 886bf9172d bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZmAYPgAKCRDbK58LschI
 g2XdAP9M8zYLRw4IG8DUFug7F+oqRPqgbs+Gvsf9YNl5/PSiTQEA6WKa/ObaG/W9
 vre9VxhMWKgcMfzqZyztNHAiDm8R+QI=
 =l7gV
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2024-06-05

We've added 8 non-merge commits during the last 6 day(s) which contain
a total of 9 files changed, 34 insertions(+), 35 deletions(-).

The main changes are:

1) Fix a potential use-after-free in bpf_link_free when the link uses
   dealloc_deferred to free the link object but later still tests for
   presence of link->ops->dealloc, from Cong Wang.

2) Fix BPF test infra to set the run context for rawtp test_run callback
   where syzbot reported a crash, from Jiri Olsa.

3) Fix bpf_session_cookie BTF_ID in the special_kfunc_set list to exclude
   it for the case of !CONFIG_FPROBE, also from Jiri Olsa.

4) Fix a Coverity static analysis report to not close() a link_fd of -1
   in the multi-uprobe feature detector, from Andrii Nakryiko.

5) Revert support for redirect to any xsk socket bound to the same umem
   as it can result in corrupted ring state which can lead to a crash when
   flushing rings. A different approach will be pursued for bpf-next to
   address it safely, from Magnus Karlsson.

6) Fix inet_csk_accept prototype in test_sk_storage_tracing.c which caused
   BPF CI failure after the last tree fast forwarding, from Andrii Nakryiko.

7) Fix a coccicheck warning in BPF devmap that iterator variable cannot
   be NULL, from Thorsten Blum.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  Revert "xsk: Document ability to redirect to any socket bound to the same umem"
  Revert "xsk: Support redirect to any socket bound to the same umem"
  bpf: Set run context for rawtp test_run callback
  bpf: Fix a potential use-after-free in bpf_link_free()
  bpf, devmap: Remove unnecessary if check in for loop
  libbpf: don't close(-1) in multi-uprobe feature detector
  bpf: Fix bpf_session_cookie BTF_ID in special_kfunc_set list
  selftests/bpf: fix inet_csk_accept prototype in test_sk_storage_tracing.c
====================

Link: https://lore.kernel.org/r/20240605091525.22628-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 19:03:08 -07:00
Dave Airlie 1cfa043fc0 Driver Changes:
- drm/xe/pf: Update the LMTT when freeing VF GT config
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRskUM7w1oG5rx2IZO4FpNVCsYGvwUCZl8tgAAKCRC4FpNVCsYG
 v4byAP4/+xeuOE9lc6J/PxGwbb3gji6v3bBirvXU330MLvXG8AEAunrSb3b5O6on
 YgmSTCPgyR8/GXrvlCtGkA1ecZqxHg0=
 =1FmW
 -----END PGP SIGNATURE-----

Merge tag 'drm-xe-fixes-2024-06-04' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
- drm/xe/pf: Update the LMTT when freeing VF GT config

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Zl8uFrQp0YjTtX4p@fedora
2024-06-06 11:38:51 +10:00
Karol Kolacinski 323a359f9b ptp: Fix error message on failed pin verification
On failed verification of PTP clock pin, error message prints channel
number instead of pin index after "pin", which is incorrect.

Fix error message by adding channel number to the message and printing
pin number instead of channel number.

Fixes: 6092315dfd ("ptp: introduce programmable pins.")
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20240604120555.16643-1-karol.kolacinski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 16:01:16 -07:00
Eric Dumazet f921a58ae2 net/sched: taprio: always validate TCA_TAPRIO_ATTR_PRIOMAP
If one TCA_TAPRIO_ATTR_PRIOMAP attribute has been provided,
taprio_parse_mqprio_opt() must validate it, or userspace
can inject arbitrary data to the kernel, the second time
taprio_change() is called.

First call (with valid attributes) sets dev->num_tc
to a non zero value.

Second call (with arbitrary mqprio attributes)
returns early from taprio_parse_mqprio_opt()
and bad things can happen.

Fixes: a3d43c0d56 ("taprio: Add support adding an admin schedule")
Reported-by: Noam Rathaus <noamr@ssd-disclosure.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20240604181511.769870-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 15:54:51 -07:00
Linus Torvalds 2df0193e62 Thermal control fixes for 6.10-rc3
Fix issues related to the handling of invalid trip points in the thermal
 core and in the thermal debug code that have been overlooked by some
 recent thermal control core changes.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmZgpJcSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxn2EP/0YxHS9yw7XC50hwDQBmIQjFsAPIaz29
 uLGokZkn4LU7aTvza3Vo7HDd7TVTDucVxtRD1R0+YPVFW/zzuj/BqiHntdPfBd/y
 krOQIHREY6vF/9wJPXlwHgGrai8PSAar90gZ13YuCruI6VtKmQ3dLuVDRFbnccqX
 GazHnQG3YL2mwjKVS9mnYwYlNjtctPbOjjTQ1+Zbr4/rnpJYYfJDwZ/AudcH1gSr
 28LkOh3PvXxb126j4slBjJwlwPgpQKQW3ozEbvH+RgJQiYA5pVOpaM+o+9R2zd8J
 AcmPvuP6Qtgyzs+Su7N+0NpQJ/vw0OCMEJ6BRUhw/NoeYOdKRvQZYrXeBjbiw+Yt
 b+7l1sHhtoIUnZJK8JpDo6+4nLzF9sV3aPousMEEzc+NsPwFJdCClU/6HN4Gr1bN
 H4GahJ0NWf/JzYrTQsoIte2JfdueXanhZ/oxoKHXBamo3t5eIKp3v1l05H+Upe4y
 rtB5DrG/fLJXlO4H2orc4vuLcurxGOpMXhEC4dAIu1y6dndxPJC+fGhSNp6+4eTv
 R46iJBKNyZ1YILVH3iNW7k+dQZKCxKgN/MKQFPBGh28KsZ+9zpjFn7XX3OTzFYHR
 gKyvhYOWr2kcKlSLh1LPKg0zv6cu2b62qttrKHPAcPVZvOW9nNhbCZGrC4Wnh3xM
 PIOXPftKHelM
 =uz/z
 -----END PGP SIGNATURE-----

Merge tag 'thermal-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control fixes from Rafael Wysocki:
 "Fix issues related to the handling of invalid trip points in the
  thermal core and in the thermal debug code that have been overlooked
  by some recent thermal control core changes"

* tag 'thermal-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: trip: Trigger trip down notifications when trips involved in mitigation become invalid
  thermal: core: Introduce thermal_trip_crossed()
  thermal/debugfs: Allow tze_seq_show() to print statistics for invalid trips
  thermal/debugfs: Print initial trip temperature and hysteresis in tze_seq_show()
2024-06-05 15:28:20 -07:00
Linus Torvalds 553352597d ACPI fixes for 6.10-rc3
- Fix error handling during EC operation region accesses in the ACPI EC
    driver (Armin Wolf).
 
  - Fix a memory leak in the APEI error injection driver introduced
    during its converion to a platform driver (Dan Williams).
 
  - Fix build failures related to the dev_is_pnp() macro by redefining it
    as a proper function and exporting it to modules as appropriate and
    unexport pnp_bus_type which need not be exported any more (Andy
    Shevchenko).
 
  - Update the ACPI AC driver to use power_supply_changed() to let the
    power supply core handle configuration changes properly (Thomas
    Weißschuh).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmZgpBASHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxBwkP/13LL7HqBcftRLFAYhszJPUhxeCimBrE
 Mw4qIjtY7BjOJKCIeh8v/SAqcBS3k2ZatOCMhF0vgpiLzWSd1avF4eHBYfQVLpIO
 h6SuUWcDC2nZEt4LVEJOYrWrzBh/50wz64xDiIjc+y23T/i7M77HvmyXiPn4xmjQ
 ERyaleptFKmIAJdwmxSUFOI+O+18zaz2zvITU3eWAPb3XBxwir2WrLqm7HJMfDtz
 55u1uyoYBTxGD5Dd1YaqS6oDD30k6tx8ury6yY2uCA8MLhVNFKMRO6NWY2pnv5NL
 82SX/dKuxaikE3y2Nq3hdPn9di0g5rh6t4qge3mM02NHZygvnJuCNBTlwCQHu6Mu
 V3DtyPWvG+DHYDtyGNuOVztPPQDd/wO4rZxmLBhNEYoLERm5h1zPnjuvoCwv+JUm
 e1npSwvn4aarHWnhLUbya1YY644NQldxK79ytHAgEh2kZk9FfL5yTJy31DyQj5Yz
 Act98yK8hYIXLmLzcjkoASIz0TcEgzfmzi83NlsF+/Jl31FJbUeQxtQhL6o9ePnH
 bvIBn71hzxVSHfkHpYOn0NAm6XAE8/l6WobVUJxbtJ/vIet3RM1rcZomtgwyzv7W
 oPpivrWoEB4SL9aHddg11I4jsCJfiOfXcMl44PUt4Dm3HfJCn6dJYPQBt/1tRgog
 no5v+ALFhCKD
 =j7lP
 -----END PGP SIGNATURE-----

Merge tag 'acpi-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These fix the ACPI EC and AC drivers, the ACPI APEI error injection
  driver and build issues related to the dev_is_pnp() macro referring to
  pnp_bus_type that is not exported to modules.

  Specifics:

   - Fix error handling during EC operation region accesses in the ACPI
     EC driver (Armin Wolf)

   - Fix a memory leak in the APEI error injection driver introduced
     during its converion to a platform driver (Dan Williams)

   - Fix build failures related to the dev_is_pnp() macro by redefining
     it as a proper function and exporting it to modules as appropriate
     and unexport pnp_bus_type which need not be exported any more (Andy
     Shevchenko)

   - Update the ACPI AC driver to use power_supply_changed() to let the
     power supply core handle configuration changes properly (Thomas
     Weißschuh)"

* tag 'acpi-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: AC: Properly notify powermanagement core about changes
  PNP: Hide pnp_bus_type from the non-PNP code
  PNP: Make dev_is_pnp() to be a function and export it for modules
  ACPI: EC: Avoid returning AE_OK on errors in address space handler
  ACPI: EC: Abort address space access upon error
  ACPI: APEI: EINJ: Fix einj_dev release leak
2024-06-05 15:19:15 -07:00
Linus Torvalds 64c6a36d79 Power management fixes for 6.10-rc3
- Fix a recently introduced unchecked HWP MSR access in the
    intel_pstate driver (Srinivas Pandruvada).
 
  - Add missing conversion from MHz to KHz to amd_pstate_set_boost()
    to address sysfs inteface inconsistency and fix P-state frequency
    reporting on AMD Family 1Ah CPUs in the cpupower utility (Dhananjay
    Ugwekar).
 
  - Get rid of an excess global header file used by the amd-pstate
    cpufreq driver (Arnd Bergmann).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmZgo68SHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxRkIP+wXPamRibHU8VMFDewo0xYDNEcjTXBm9
 Ng4D21CNteci5GB96SmEWPViEmBTZ0HiaPVZX0YViL5HT1pSALTWYuQq7iKh9E9m
 fqpq4iITc6u0M5wQTTHeG+S5tAVM7z32ZXosHI/sr9j7V7kfa3mmfQn1jy4Jso7S
 B3NkkpaRKngpo+EBZqJ4prGKB6I3Wp1WVIHo4BSqLEA5JInwd1fJP1J6ugl2vnUe
 V1IFRtYrzYLbzesx+OA6mRppc6Dgva+Nw+8O1zPsLuUmPJ1NVybe465uyi/u9cip
 PDNpycuimgI3ScllwjLeafTAzHqDxkBWIie56HpGZYbddkM7Xgg9JSu8iJ2Rcv0G
 bP5YIC2Vwo3nhvNfA8uVCvIAw4zNoxR9Po7SIC5zW8C51OVX7qig3BHh9Uts8Pqz
 uSXjLRl3aWeFekRgiI9q0Lmtw3y8UBQqEBXSKa0vXxFHlji/17v9iBAiHuP64wut
 u5NE+8iLmhA358uBELG32L7sR/CG0asyRsm5VFnpYVf9UDPmcpmj/WwzyLQaI0cc
 a5rcU5C+YEH3GWMTnyvKHEbWi+3gMaR81fVsnDdMM739HNSUltdgqSc7rjeUf3B1
 FAfVVfV76dqVruVSFuu5oaZDE4BvdxrefKGur/AaKenDIauTpf5YQWiI6qkL/ZKF
 zkSbVmLWKOQw
 =lugn
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix the intel_pstate and amd-pstate cpufreq drivers and the
  cpupower utility.

  Specifics:

   - Fix a recently introduced unchecked HWP MSR access in the
     intel_pstate driver (Srinivas Pandruvada)

   - Add missing conversion from MHz to KHz to amd_pstate_set_boost() to
     address sysfs inteface inconsistency and fix P-state frequency
     reporting on AMD Family 1Ah CPUs in the cpupower utility (Dhananjay
     Ugwekar)

   - Get rid of an excess global header file used by the amd-pstate
     cpufreq driver (Arnd Bergmann)"

* tag 'pm-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: intel_pstate: Fix unchecked HWP MSR access
  cpufreq: amd-pstate: Fix the inconsistency in max frequency units
  cpufreq: amd-pstate: remove global header file
  tools/power/cpupower: Fix Pstate frequency reporting on AMD Family 1Ah CPUs
2024-06-05 15:12:35 -07:00
Aleksandr Mishin 229bedbf62 net/mlx5: Fix tainted pointer delete is case of flow rules creation fail
In case of flow rule creation fail in mlx5_lag_create_port_sel_table(),
instead of previously created rules, the tainted pointer is deleted
deveral times.
Fix this bug by using correct flow rules pointers.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 352899f384 ("net/mlx5: Lag, use buckets in hash mode")
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240604100552.25201-1-amishin@t-argos.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 12:51:25 -07:00
Linus Torvalds 19ca0d8a43 for-6.10-rc2-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmZggXMACgkQxWXV+ddt
 WDupkA/9Foo2OsWR6wIQyBqzmHnhgzBwJ67q0F6MO2/iFfMRW/YIJH3Fk+0+PP40
 BDK4xiz1DIl/qJvoSv4bpPNvy/lAovtVB/AV8rH+JaJNHP/fTjkqA3Ad6ZtZN45J
 KoHE4SoX4NT1v+zwJ2irrH1W2mPh8tNTYvZINPcLC/nX2UzYoNjiIFLRCMSe003M
 ybNjvv6VUHPk+9JAWsVt5pjDLu5E1EmXakXv5mvGaIVr0ljNUPCwhFip20YMpVfo
 17t6MezmeqwGbrJgMpJyPOSsghaA68lzuzVVyAFFoxqlGLZ5rgtXTmK4O4NsyZfr
 EMkwNR1IDt7fVXUkHy4X/8f9V8Wwmmwp8bSY4rTTgA4hg3w0w4FCX+uNOWHagkaS
 8vWWTJBSvJKJwLUfWhKVHIaiUEkFEhmnUQPjqlfSxc+mQgxJcK1djgdVkVxSudrp
 l0xdDG0WTWiO0zniIXbIlZ7tCeUgL1kcovZmDIA6em+HSipryvSFdYT+h7VKgzzv
 XTJvdXKMSiqMvXoT2BRYkmWVeuUBhJ1EptkGidZBgTZ7EFfuGnhBCRgq9YSaWnak
 2SBvgjxKQzyxVpqWllOsksRg2/fSl9vdlGK3KjyGW1pAwrZD/zbmG/ZqH2MVOfjt
 LdswuwKd25pYpamYZqrCyJtIZlTSUrWpasaX1P28gs0uRCuFaiY=
 =q3Ic
 -----END PGP SIGNATURE-----

Merge tag 'for-6.10-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fix from David Sterba:
 "A fix for fast fsync that needs to handle errors during writes after
  some COW failure so it does not lead to an inconsistent state"

* tag 'for-6.10-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: ensure fast fsync waits for ordered extents after a write failure
2024-06-05 11:28:25 -07:00
Linus Torvalds e20b269d73 bcachefs fixes for 6.10-rc3
Just a few small fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmZgfSQACgkQE6szbY3K
 bnaNwg/+ItbzvZqECfZqmSnPbcOXMKnuyAqnARRTn4GiAH3sJ8PEZPyPqeEyKGt+
 xWRG9ROIiwyVkNA3k/AuORKcwdzkEZyASQX8wPnHZdUXaGDmeRte/bhSgiqZJOi6
 0i98eISFGIjyVcAdYQoFN/eGO+k4ATgU5P0Kn1YfXwgGUbwHbXyPCHcm44G6wLP7
 FXsHEVBlexvPDVTle2i9LIwP9oNLBwycQalyg83zyNnSHe3zhP8ddUSCBGJZVxum
 hyXfUch6KjkDKFiFvL/kRpZUgsG0d5/ouoBV8LDfz7iFDEBnsscAeUcUvL00gKzN
 A4rErDL80tarbx0D5Wdp5JsxDWyuklwbXc6bIY3rhhIFn/JZgyqQWGqpdwQy+T4B
 4qeyDCLcITiosMT5K9fSPLc+DwkOqSOwASitmAQuHPDRzXk466bjJmUZBy3fvJv0
 w9px2adgP+3/wlYJMEIcQx3vh/m2t3QDs8Ut3xZm10g6Fu0x92/QiOVGB6yxQHlg
 txYRTpLvb5DZ9dZUlyYmA8d7HG4828DrFNdmtkFx+1STPpH1ZYkoN6IYwQQAl+kQ
 po63h55xS8CLUrTplcwBAiS+fDWMO2hZ1TTXjAGeNF19A1PygFQARGROcHGYi9Jp
 5L2xdYWDdf9YYBxMDyDgXkRGtYEv8b7sJTHEGb7SQaaUMHU7Mo4=
 =ODoL
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2024-06-05' of https://evilpiepirate.org/git/bcachefs

Pull bcachefs fixes from Kent Overstreet:
 "Just a few small fixes"

* tag 'bcachefs-2024-06-05' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: Fix trans->locked assert
  bcachefs: Rereplicate now moves data off of durability=0 devices
  bcachefs: Fix GFP_KERNEL allocation in break_cycle()
2024-06-05 11:25:41 -07:00
Jens Axboe 27d024235b nvme fixes Linux 6.10
- Use reserved tags for special fabrics operations (Chunguang)
  - Persistent Reservation status masking fix (Weiwen)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE3Fbyvv+648XNRdHTPe3zGtjzRgkFAmZgpvoACgkQPe3zGtjz
 RgnwNw//ZFcPzB9nLJT3r4qpvg5MfAJgQ8T/jE21kkHZWO+IZe1PozkNk8pKA3VX
 oGviG7K5FayNb2wtrNNdGuJB+ODZFrcxOGpEvihWKUkZguHfKvBTYfbbtiiari4o
 pVdm/u9cTI2uM2K1ah+y7LnAe4NAbbCFJzNGIpsSN6JIrUvftkJ4ZDwgW2gpSzxs
 Dbgpr1y5vO8mKIWAWoslRYBCLP2GK5MAnomdLSqlQfQPI8ELZcGpscXFUgVDjiBP
 QiOKbvXxUehYVjErmHPaTunbORyYSXxpf1NYKeat9Sq71L/hU8KJyObrzcYiul/j
 U33hHtagcKgcbMKPW7gwl9LvBxkgSPN5StcU/DZ6dNucXAP/jtTtgvDM56Q4Hpkb
 9kP5FdPanVtmUH3nOkptK9LAdlxCZ49+AVpBhTD9BehyXt56AR62sq80+CEYPefN
 jfdAE8+rg4r8ML88hN3QaNtKeswXaZSrGB9A8nGE+jMpdZCqxp/h0LftYYKOcHyz
 SZRIB0Yf4/pvJMuV3g1hqXFJMbKQA6sgBDfC9qyPhF/FWufLa3DFP5CIdN12pa3d
 7BhcvhmbxO6aOxpHqsfEzfUduoWyRfswHlSfM24JG8F7HuGllJhaWeaTzK+TS2OD
 oAgkQO0c297BEZIkmtIJ9ySlUoyjD8zBA/Msr31JcNSfl3xWm3g=
 =YlZu
 -----END PGP SIGNATURE-----

Merge tag 'nvme-6.10-2024-06-05' of git://git.infradead.org/nvme into block-6.10

Pull NVMe fixes from Keith:

"nvme fixes Linux 6.10

 - Use reserved tags for special fabrics operations (Chunguang)
 - Persistent Reservation status masking fix (Weiwen)"

* tag 'nvme-6.10-2024-06-05' of git://git.infradead.org/nvme:
  nvme: fix nvme_pr_* status code parsing
  nvme-fabrics: use reserved tag for reg read/write command
2024-06-05 12:13:00 -06:00
Andreas Hindborg c462ecd659 null_blk: fix validation of block size
Block size should be between 512 and PAGE_SIZE and be a power of 2. The current
check does not validate this, so update the check.

Without this patch, null_blk would Oops due to a null pointer deref when
loaded with bs=1536 [1].

Link: https://lore.kernel.org/all/87wmn8mocd.fsf@metaspace.dk/

Signed-off-by: Andreas Hindborg <a.hindborg@samsung.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20240603192645.977968-1-nmi@metaspace.dk
[axboe: remove unnecessary braces and != 0 check]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-05 12:12:54 -06:00
Tasos Sahanidis c6c4dd5401 drm/amdgpu/pptable: Fix UBSAN array-index-out-of-bounds
Flexible arrays used [1] instead of []. Replace the former with the latter
to resolve multiple UBSAN warnings observed on boot with a BONAIRE card.

In addition, use the __counted_by attribute where possible to hint the
length of the arrays to the compiler and any sanitizers.

Signed-off-by: Tasos Sahanidis <tasos@tasossah.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 13:43:34 -04:00
Mario Limonciello 267cace556 drm/amd: Fix shutdown (again) on some SMU v13.0.4/11 platforms
commit cd94d1b182 ("dm/amd/pm: Fix problems with reboot/shutdown for
some SMU 13.0.4/13.0.11 users") attempted to fix shutdown issues
that were reported since commit 31729e8c21 ("drm/amd/pm: fixes a
random hang in S4 for SMU v13.0.4/11") but caused issues for some
people.

Adjust the workaround flow to properly only apply in the S4 case:
-> For shutdown go through SMU_MSG_PrepareMp1ForUnload
-> For S4 go through SMU_MSG_GfxDeviceDriverReset and
   SMU_MSG_PrepareMp1ForUnload

Reported-and-tested-by: lectrode <electrodexsnet@gmail.com>
Closes: https://github.com/void-linux/void-packages/issues/50417
Cc: stable@vger.kernel.org
Fixes: cd94d1b182 ("dm/amd/pm: Fix problems with reboot/shutdown for some SMU 13.0.4/13.0.11 users")
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 13:41:56 -04:00
Linus Torvalds 558dc49aac This should have been my second pull request during the merge window but
one dependency in the drm subsystem fell through the cracks and was only
 applied for rc2. Now, we can finally remove I2C_CLASS_SPD.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEOZGx6rniZ1Gk92RdFA3kzBSgKbYFAmZgEAUACgkQFA3kzBSg
 KbYYuRAAqM7zFUhT/GADWy3+atgsoTCaYfVuvsbihr9xiwBgsJy5x0phZDjfOksK
 utr2neui7SYZKuSYhXKq9jJqwxfDRvfjfz1gbzyVEUpugXNbKs49R2eeHgzfqVQf
 S+SiQsea4TQjl6fD6+Evacyf/OhkVQTnZ9exK+JPl18pUWXLduq0b/E2/VhhgaCn
 TyUXZqx31BW0dIvtR0EnB4IxVReYa3n/rSWn05UGOCK51hWZ0jRWUWD71y0808wl
 gdR0gP5XWKRPxNHt8YwhzfgE117fWQ2j/zaAmoYnXScHyDQMaJ/JvMczyl0Sip9Y
 y2qzUgy6XXc6V70s7Lfxc+XFmjSmlDAEs8aPlaGzjiW4rjDQ6ahFjtv0VNPR+zFM
 uLUlI8XO7DjHuWLGcnzLKPVeVFrTgoi3kOflDuLkpihNrjPudKsl2rEeTN7XIQkD
 liWYQJBcoOiyDXshcdLvXbad2YlHl3kOb9l0qCg+4DJ7Lf1TuFCcitf/cDZl2oAR
 WtqK2WAIN0UDArN6oGawTUE99Rop2pLAv27wtVvHKCvq8nL3LnDkG7/IYdxoDfDF
 77VP+OKQ1UIbe0CLSU2CcTMBK6r2a9fI0+eGLA9/8w+WEpJXRIuekqd/9ysEtkYR
 qWjFXMYDqWT/pcos2hqZIBgMK78w+/excq2dMWPXpInsUG7mBvM=
 =yBu4
 -----END PGP SIGNATURE-----

Merge tag 'i2c-for-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "This should have been my second pull request during the merge window
  but one dependency in the drm subsystem fell through the cracks and
  was only applied for rc2.

  Now we can finally remove I2C_CLASS_SPD"

* tag 'i2c-for-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: Remove I2C_CLASS_SPD
  i2c: synquacer: Remove a clk reference from struct synquacer_i2c
2024-06-05 10:32:20 -07:00
Linus Torvalds 208d9b65c0 Hi,
The bug fix for tpm_tis_core_init() is not that critical but still makes
 sense to get into release for the sake of better quality. I included the
 Intel CPU model define change mainly to help Tony just a bit, as for this
 subsystem it cannot realistically speaking cause any possible harm.
 
 BR, Jarkko
 -----BEGIN PGP SIGNATURE-----
 
 iJYEABYKAD4WIQRE6pSOnaBC00OEHEIaerohdGur0gUCZl/I3iAcamFya2tvLnNh
 a2tpbmVuQGxpbnV4LmludGVsLmNvbQAKCRAaerohdGur0u0gAPoCkr/3zGmVd7pB
 SabhfpG6jPWDaIH0UZiuwjMwBw1ddwD/TjeAD/zwfGhEXwaezCkyyv/lVBVVDNAA
 9OsxqWYkIQE=
 =Cryk
 -----END PGP SIGNATURE-----

Merge tag 'tpmdd-next-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

Pull tpm fixes from Jarkko Sakkinen:
 "The bug fix for tpm_tis_core_init() is not that critical but still
  makes sense to get into release for the sake of better quality.

  I included the Intel CPU model define change mainly to help Tony just
  a bit, as for this subsystem it cannot realistically speaking cause
  any possible harm"

* tag 'tpmdd-next-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  tpm: Switch to new Intel CPU model defines
  tpm_tis: Do *not* flush uninitialized work
2024-06-05 10:29:13 -07:00
Filipe Manana fb33eb2ef0 btrfs: fix leak of qgroup extent records after transaction abort
Qgroup extent records are created when delayed ref heads are created and
then released after accounting extents at btrfs_qgroup_account_extents(),
called during the transaction commit path.

If a transaction is aborted we free the qgroup records by calling
btrfs_qgroup_destroy_extent_records() at btrfs_destroy_delayed_refs(),
unless we don't have delayed references. We are incorrectly assuming
that no delayed references means we don't have qgroup extents records.

We can currently have no delayed references because we ran them all
during a transaction commit and the transaction was aborted after that
due to some error in the commit path.

So fix this by ensuring we btrfs_qgroup_destroy_extent_records() at
btrfs_destroy_delayed_refs() even if we don't have any delayed references.

Reported-by: syzbot+0fecc032fa134afd49df@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/0000000000004e7f980619f91835@google.com/
Fixes: 81f7eb00ff ("btrfs: destroy qgroup extent records on transaction abort")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-06-05 18:06:54 +02:00
Omar Sandoval 9d274c19a7 btrfs: fix crash on racing fsync and size-extending write into prealloc
We have been seeing crashes on duplicate keys in
btrfs_set_item_key_safe():

  BTRFS critical (device vdb): slot 4 key (450 108 8192) new key (450 108 8192)
  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/ctree.c:2620!
  invalid opcode: 0000 [#1] PREEMPT SMP PTI
  CPU: 0 PID: 3139 Comm: xfs_io Kdump: loaded Not tainted 6.9.0 #6
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
  RIP: 0010:btrfs_set_item_key_safe+0x11f/0x290 [btrfs]

With the following stack trace:

  #0  btrfs_set_item_key_safe (fs/btrfs/ctree.c:2620:4)
  #1  btrfs_drop_extents (fs/btrfs/file.c:411:4)
  #2  log_one_extent (fs/btrfs/tree-log.c:4732:9)
  #3  btrfs_log_changed_extents (fs/btrfs/tree-log.c:4955:9)
  #4  btrfs_log_inode (fs/btrfs/tree-log.c:6626:9)
  #5  btrfs_log_inode_parent (fs/btrfs/tree-log.c:7070:8)
  #6  btrfs_log_dentry_safe (fs/btrfs/tree-log.c:7171:8)
  #7  btrfs_sync_file (fs/btrfs/file.c:1933:8)
  #8  vfs_fsync_range (fs/sync.c:188:9)
  #9  vfs_fsync (fs/sync.c:202:9)
  #10 do_fsync (fs/sync.c:212:9)
  #11 __do_sys_fdatasync (fs/sync.c:225:9)
  #12 __se_sys_fdatasync (fs/sync.c:223:1)
  #13 __x64_sys_fdatasync (fs/sync.c:223:1)
  #14 do_syscall_x64 (arch/x86/entry/common.c:52:14)
  #15 do_syscall_64 (arch/x86/entry/common.c:83:7)
  #16 entry_SYSCALL_64+0xaf/0x14c (arch/x86/entry/entry_64.S:121)

So we're logging a changed extent from fsync, which is splitting an
extent in the log tree. But this split part already exists in the tree,
triggering the BUG().

This is the state of the log tree at the time of the crash, dumped with
drgn (https://github.com/osandov/drgn/blob/main/contrib/btrfs_tree.py)
to get more details than btrfs_print_leaf() gives us:

  >>> print_extent_buffer(prog.crashed_thread().stack_trace()[0]["eb"])
  leaf 33439744 level 0 items 72 generation 9 owner 18446744073709551610
  leaf 33439744 flags 0x100000000000000
  fs uuid e5bd3946-400c-4223-8923-190ef1f18677
  chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da
          item 0 key (450 INODE_ITEM 0) itemoff 16123 itemsize 160
                  generation 7 transid 9 size 8192 nbytes 8473563889606862198
                  block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
                  sequence 204 flags 0x10(PREALLOC)
                  atime 1716417703.220000000 (2024-05-22 15:41:43)
                  ctime 1716417704.983333333 (2024-05-22 15:41:44)
                  mtime 1716417704.983333333 (2024-05-22 15:41:44)
                  otime 17592186044416.000000000 (559444-03-08 01:40:16)
          item 1 key (450 INODE_REF 256) itemoff 16110 itemsize 13
                  index 195 namelen 3 name: 193
          item 2 key (450 XATTR_ITEM 1640047104) itemoff 16073 itemsize 37
                  location key (0 UNKNOWN.0 0) type XATTR
                  transid 7 data_len 1 name_len 6
                  name: user.a
                  data a
          item 3 key (450 EXTENT_DATA 0) itemoff 16020 itemsize 53
                  generation 9 type 1 (regular)
                  extent data disk byte 303144960 nr 12288
                  extent data offset 0 nr 4096 ram 12288
                  extent compression 0 (none)
          item 4 key (450 EXTENT_DATA 4096) itemoff 15967 itemsize 53
                  generation 9 type 2 (prealloc)
                  prealloc data disk byte 303144960 nr 12288
                  prealloc data offset 4096 nr 8192
          item 5 key (450 EXTENT_DATA 8192) itemoff 15914 itemsize 53
                  generation 9 type 2 (prealloc)
                  prealloc data disk byte 303144960 nr 12288
                  prealloc data offset 8192 nr 4096
  ...

So the real problem happened earlier: notice that items 4 (4k-12k) and 5
(8k-12k) overlap. Both are prealloc extents. Item 4 straddles i_size and
item 5 starts at i_size.

Here is the state of the filesystem tree at the time of the crash:

  >>> root = prog.crashed_thread().stack_trace()[2]["inode"].root
  >>> ret, nodes, slots = btrfs_search_slot(root, BtrfsKey(450, 0, 0))
  >>> print_extent_buffer(nodes[0])
  leaf 30425088 level 0 items 184 generation 9 owner 5
  leaf 30425088 flags 0x100000000000000
  fs uuid e5bd3946-400c-4223-8923-190ef1f18677
  chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da
  	...
          item 179 key (450 INODE_ITEM 0) itemoff 4907 itemsize 160
                  generation 7 transid 7 size 4096 nbytes 12288
                  block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
                  sequence 6 flags 0x10(PREALLOC)
                  atime 1716417703.220000000 (2024-05-22 15:41:43)
                  ctime 1716417703.220000000 (2024-05-22 15:41:43)
                  mtime 1716417703.220000000 (2024-05-22 15:41:43)
                  otime 1716417703.220000000 (2024-05-22 15:41:43)
          item 180 key (450 INODE_REF 256) itemoff 4894 itemsize 13
                  index 195 namelen 3 name: 193
          item 181 key (450 XATTR_ITEM 1640047104) itemoff 4857 itemsize 37
                  location key (0 UNKNOWN.0 0) type XATTR
                  transid 7 data_len 1 name_len 6
                  name: user.a
                  data a
          item 182 key (450 EXTENT_DATA 0) itemoff 4804 itemsize 53
                  generation 9 type 1 (regular)
                  extent data disk byte 303144960 nr 12288
                  extent data offset 0 nr 8192 ram 12288
                  extent compression 0 (none)
          item 183 key (450 EXTENT_DATA 8192) itemoff 4751 itemsize 53
                  generation 9 type 2 (prealloc)
                  prealloc data disk byte 303144960 nr 12288
                  prealloc data offset 8192 nr 4096

Item 5 in the log tree corresponds to item 183 in the filesystem tree,
but nothing matches item 4. Furthermore, item 183 is the last item in
the leaf.

btrfs_log_prealloc_extents() is responsible for logging prealloc extents
beyond i_size. It first truncates any previously logged prealloc extents
that start beyond i_size. Then, it walks the filesystem tree and copies
the prealloc extent items to the log tree.

If it hits the end of a leaf, then it calls btrfs_next_leaf(), which
unlocks the tree and does another search. However, while the filesystem
tree is unlocked, an ordered extent completion may modify the tree. In
particular, it may insert an extent item that overlaps with an extent
item that was already copied to the log tree.

This may manifest in several ways depending on the exact scenario,
including an EEXIST error that is silently translated to a full sync,
overlapping items in the log tree, or this crash. This particular crash
is triggered by the following sequence of events:

- Initially, the file has i_size=4k, a regular extent from 0-4k, and a
  prealloc extent beyond i_size from 4k-12k. The prealloc extent item is
  the last item in its B-tree leaf.
- The file is fsync'd, which copies its inode item and both extent items
  to the log tree.
- An xattr is set on the file, which sets the
  BTRFS_INODE_COPY_EVERYTHING flag.
- The range 4k-8k in the file is written using direct I/O. i_size is
  extended to 8k, but the ordered extent is still in flight.
- The file is fsync'd. Since BTRFS_INODE_COPY_EVERYTHING is set, this
  calls copy_inode_items_to_log(), which calls
  btrfs_log_prealloc_extents().
- btrfs_log_prealloc_extents() finds the 4k-12k prealloc extent in the
  filesystem tree. Since it starts before i_size, it skips it. Since it
  is the last item in its B-tree leaf, it calls btrfs_next_leaf().
- btrfs_next_leaf() unlocks the path.
- The ordered extent completion runs, which converts the 4k-8k part of
  the prealloc extent to written and inserts the remaining prealloc part
  from 8k-12k.
- btrfs_next_leaf() does a search and finds the new prealloc extent
  8k-12k.
- btrfs_log_prealloc_extents() copies the 8k-12k prealloc extent into
  the log tree. Note that it overlaps with the 4k-12k prealloc extent
  that was copied to the log tree by the first fsync.
- fsync calls btrfs_log_changed_extents(), which tries to log the 4k-8k
  extent that was written.
- This tries to drop the range 4k-8k in the log tree, which requires
  adjusting the start of the 4k-12k prealloc extent in the log tree to
  8k.
- btrfs_set_item_key_safe() sees that there is already an extent
  starting at 8k in the log tree and calls BUG().

Fix this by detecting when we're about to insert an overlapping file
extent item in the log tree and truncating the part that would overlap.

CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-06-05 18:06:30 +02:00
Linus Torvalds 71d7b52cc3 ARM:
* Large set of FP/SVE fixes for pKVM, addressing the fallout
   from the per-CPU data rework and making sure that the host
   is not involved in the FP/SVE switching any more
 
 * Allow FEAT_BTI to be enabled with NV now that FEAT_PAUTH
   is completely supported
 
 * Fix for the respective priorities of Failed PAC, Illegal
   Execution state and Instruction Abort exceptions
 
 * Fix the handling of AArch32 instruction traps failing their
   condition code, which was broken by the introduction of
   ESR_EL2.ISS2
 
 * Allow vcpus running in AArch32 state to be restored in
   System mode
 
 * Fix AArch32 GPR restore that would lose the 64 bit state
   under some conditions
 
 RISC-V:
 
 * No need to use mask when hart-index-bits is 0
 
 * Fix incorrect reg_subtype labels in kvm_riscv_vcpu_set_reg_isa_ext()
 
 x86:
 
 * Fixes and debugging help for the #VE sanity check.  Also disable
   it by default, even for CONFIG_DEBUG_KERNEL, because it was found
   to trigger spuriously (most likely a processor erratum as the
   exact symptoms vary by generation).
 
 * Avoid WARN() when two NMIs arrive simultaneously during an NMI-disabled
   situation (GIF=0 or interrupt shadow) when the processor supports
   virtual NMI.  While generally KVM will not request an NMI window
   when virtual NMIs are supported, in this case it *does* have to
   single-step over the interrupt shadow or enable the STGI intercept,
   in order to deliver the latched second NMI.
 
 * Drop support for hand tuning APIC timer advancement from userspace.
   Since we have adaptive tuning, and it has proved to work well,
   drop the module parameter for manual configuration and with it a
   few stupid bugs that it had.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmZgQcYUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPvuAf/XbP9hcOeCv1lm6Mo+FjQbDAZKqqa
 RYVMd7FBAF7Y8LqWyp/QqhMz3bYX3et5y9hsdN2647FNyKORTlcqSDk4p6SlcYhG
 g2VIcJ7yaxSKBRz7UujhNsVXt9JcNafAJEGzLz3lFpwW6F/QKepkRTmnOSXOW/k+
 QPxuTRdRGzHSWAkmN+VFpxscqUEXV/+DbXaNbrSJsTfJXNDnzR5ESXHn3GjOooeC
 DtoUrPjfbPpT/+YpRxOij2Y9NZPgAnTFv/ji6UuC05SEIqFytLT7cTKVuwDRjLTg
 xP1fe3U5P8YuI7dYfcmVdLneovjl1mJfbnfrbkHNRsv4JIriCZK5mKdODg==
 =v//J
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "This is dominated by a couple large series for ARM and x86
  respectively, but apart from that things are calm.

  ARM:

   - Large set of FP/SVE fixes for pKVM, addressing the fallout from the
     per-CPU data rework and making sure that the host is not involved
     in the FP/SVE switching any more

   - Allow FEAT_BTI to be enabled with NV now that FEAT_PAUTH is
     completely supported

   - Fix for the respective priorities of Failed PAC, Illegal Execution
     state and Instruction Abort exceptions

   - Fix the handling of AArch32 instruction traps failing their
     condition code, which was broken by the introduction of
     ESR_EL2.ISS2

   - Allow vcpus running in AArch32 state to be restored in System mode

   - Fix AArch32 GPR restore that would lose the 64 bit state under some
     conditions

  RISC-V:

   - No need to use mask when hart-index-bits is 0

   - Fix incorrect reg_subtype labels in
     kvm_riscv_vcpu_set_reg_isa_ext()

  x86:

   - Fixes and debugging help for the #VE sanity check.

     Also disable it by default, even for CONFIG_DEBUG_KERNEL, because
     it was found to trigger spuriously (most likely a processor erratum
     as the exact symptoms vary by generation).

   - Avoid WARN() when two NMIs arrive simultaneously during an
     NMI-disabled situation (GIF=0 or interrupt shadow) when the
     processor supports virtual NMI.

     While generally KVM will not request an NMI window when virtual
     NMIs are supported, in this case it *does* have to single-step over
     the interrupt shadow or enable the STGI intercept, in order to
     deliver the latched second NMI.

   - Drop support for hand tuning APIC timer advancement from userspace.

     Since we have adaptive tuning, and it has proved to work well, drop
     the module parameter for manual configuration and with it a few
     stupid bugs that it had"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (32 commits)
  KVM: x86/mmu: Don't save mmu_invalidate_seq after checking private attr
  KVM: arm64: Ensure that SME controls are disabled in protected mode
  KVM: arm64: Refactor CPACR trap bit setting/clearing to use ELx format
  KVM: arm64: Consolidate initializing the host data's fpsimd_state/sve in pKVM
  KVM: arm64: Eagerly restore host fpsimd/sve state in pKVM
  KVM: arm64: Allocate memory mapped at hyp for host sve state in pKVM
  KVM: arm64: Specialize handling of host fpsimd state on trap
  KVM: arm64: Abstract set/clear of CPTR_EL2 bits behind helper
  KVM: arm64: Fix prototype for __sve_save_state/__sve_restore_state
  KVM: arm64: Reintroduce __sve_save_state
  KVM: x86: Drop support for hand tuning APIC timer advancement from userspace
  KVM: SEV-ES: Delegate LBR virtualization to the processor
  KVM: SEV-ES: Disallow SEV-ES guests when X86_FEATURE_LBRV is absent
  KVM: SEV-ES: Prevent MSR access post VMSA encryption
  RISC-V: KVM: Fix incorrect reg_subtype labels in kvm_riscv_vcpu_set_reg_isa_ext function
  RISC-V: KVM: No need to use mask when hart-index-bit is 0
  KVM: arm64: nv: Expose BTI and CSV_frac to a guest hypervisor
  KVM: arm64: nv: Fix relative priorities of exceptions generated by ERETAx
  KVM: arm64: AArch32: Fix spurious trapping of conditional instructions
  KVM: arm64: Allow AArch32 PSTATE.M to be restored as System mode
  ...
2024-06-05 08:43:41 -07:00
Rafael J. Wysocki 9b7e7ff0fe Merge branch 'pm-cpufreq'
Merge cpufreq fixes for 6.10-rc3:

 - Fix a recently introduced unchecked HWP MSR access in the
   intel_pstate driver (Srinivas Pandruvada).

 - Add missing conversion from MHz to KHz to amd_pstate_set_boost()
   to address sysfs inteface inconsistency (Dhananjay Ugwekar).

 - Get rid of an excess global header file used by the amd-pstate
   cpufreq driver (Arnd Bergmann).

* pm-cpufreq:
  cpufreq: intel_pstate: Fix unchecked HWP MSR access
  cpufreq: amd-pstate: Fix the inconsistency in max frequency units
  cpufreq: amd-pstate: remove global header file
2024-06-05 17:11:47 +02:00
David Hildenbrand 01c51a32dc KVM: s390x: selftests: Add shared zeropage test
Let's test that we can have shared zeropages in our process as long as
storage keys are not getting used, that shared zeropages are properly
unshared (replaced by anonymous pages) once storage keys are enabled,
and that no new shared zeropages are populated after storage keys
were enabled.

We require the new pagemap interface to detect the shared zeropage.

On an old kernel (zeropages always disabled):
	# ./s390x/shared_zeropage_test
	TAP version 13
	1..3
	not ok 1 Shared zeropages should be enabled
	ok 2 Shared zeropage should be gone
	ok 3 Shared zeropages should be disabled
	# Totals: pass:2 fail:1 xfail:0 xpass:0 skip:0 error:0

On a fixed kernel:
	# ./s390x/shared_zeropage_test
	TAP version 13
	1..3
	ok 1 Shared zeropages should be enabled
	ok 2 Shared zeropage should be gone
	ok 3 Shared zeropages should be disabled
	# Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0

Testing of UFFDIO_ZEROPAGE can be added later.

[ agordeev: Fixed checkpatch complaint, added ucall_common.h include ]

Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Acked-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Tested-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20240412084329.30315-1-david@redhat.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-06-05 17:03:24 +02:00
Alexander Gordeev d38e48563c s390/crash: Do not use VM info if os_info does not have it
The virtual memory information stored in os_info area is
required for creation of the kernel image PT_LOAD program
header for kernels since commit a2ec5bec56dd ("s390/mm:
uncouple physical vs virtual address spaces").

By contrast, if such information in os_info is absent the
PT_LOAD program header should not be created.

Currently the proper PT_LOAD program header is created for
kernels that contain the virtual memory information, but
for kernels without one an invalid header of zero size is
created. That in turn leads to stand-alone dump failures.

Use OS_INFO_KASLR_OFFSET variable to check whether os_info
is present or not (same as crash and makedumpfile tools do)
and based on that create or do not create the kernel image
PT_LOAD program header.

Fixes: f4cac27dc0 ("s390/crash: Use old os_info to create PT_LOAD headers")
Tested-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Acked-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-06-05 17:03:24 +02:00
Rafael J. Wysocki 1bfc0835d4 Merge branches 'acpi-ec', 'acpi-apei' and 'pnp'
Merge ACPI EC driver fixes, an ACPI APEI fix and PNP fixes for
6.10-rc3:

 - Fix error handling during EC operation region accesses in the ACPI EC
   driver (Armin Wolf).

 - Fix a memory leak in the APEI error injection driver introduced
   during its converion to a platform driver (Dan Williams).

 - Fix build failures related to the dev_is_pnp() macro by redefining it
   as a proper function and exporting it to modules as appropriate and
   unexport pnp_bus_type which need not be exported any more (Andy
   Shevchenko).

* acpi-ec:
  ACPI: EC: Avoid returning AE_OK on errors in address space handler
  ACPI: EC: Abort address space access upon error

* acpi-apei:
  ACPI: APEI: EINJ: Fix einj_dev release leak

* pnp:
  PNP: Hide pnp_bus_type from the non-PNP code
  PNP: Make dev_is_pnp() to be a function and export it for modules
2024-06-05 16:58:09 +02:00
Kent Overstreet 319fef29e9 bcachefs: Fix trans->locked assert
in bch2_move_data_btree, we might start with the trans unlocked from a
previous loop iteration - we need a trans_begin() before iter_init().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-05 10:44:08 -04:00