Commit graph

1090323 commits

Author SHA1 Message Date
Filipe Manana 750ee45490 btrfs: fix assertion failure when logging directory key range item
When inserting a key range item (BTRFS_DIR_LOG_INDEX_KEY) while logging
a directory, we don't expect the insertion to fail with -EEXIST, because
we are holding the directory's log_mutex and we have dropped all existing
BTRFS_DIR_LOG_INDEX_KEY keys from the log tree before we started to log
the directory. However it's possible that during the logging we attempt
to insert the same BTRFS_DIR_LOG_INDEX_KEY key twice, but for this to
happen we need to race with insertions of items from other inodes in the
subvolume's tree while we are logging a directory. Here's how this can
happen:

1) We are logging a directory with inode number 1000 that has its items
   spread across 3 leaves in the subvolume's tree:

   leaf A - has index keys from the range 2 to 20 for example. The last
   item in the leaf corresponds to a dir item for index number 20. All
   these dir items were created in a past transaction.

   leaf B - has index keys from the range 22 to 100 for example. It has
   no keys from other inodes, all its keys are dir index keys for our
   directory inode number 1000. Its first key is for the dir item with
   a sequence number of 22. All these dir items were also created in a
   past transaction.

   leaf C - has index keys for our directory for the range 101 to 120 for
   example. This leaf also has items from other inodes, and its first
   item corresponds to the dir item for index number 101 for our directory
   with inode number 1000;

2) When we finish processing the items from leaf A at log_dir_items(),
   we log a BTRFS_DIR_LOG_INDEX_KEY key with an offset of 21 and a last
   offset of 21, meaning the log is authoritative for the index range
   from 21 to 21 (a single sequence number). At this point leaf B was
   not yet modified in the current transaction;

3) When we return from log_dir_items() we have released our read lock on
   leaf B, and have set *last_offset_ret to 21 (index number of the first
   item on leaf B minus 1);

4) Some other task inserts an item for other inode (inode number 1001 for
   example) into leaf C. That resulted in pushing some items from leaf C
   into leaf B, in order to make room for the new item, so now leaf B
   has dir index keys for the sequence number range from 22 to 102 and
   leaf C has the dir items for the sequence number range 103 to 120;

5) At log_directory_changes() we call log_dir_items() again, passing it
   a 'min_offset' / 'min_key' value of 22 (*last_offset_ret from step 3
   plus 1, so 21 + 1). Then btrfs_search_forward() leaves us at slot 0
   of leaf B, since leaf B was modified in the current transaction.

   We have also initialized 'last_old_dentry_offset' to 20 after calling
   btrfs_previous_item() at log_dir_items(), as it left us at the last
   item of leaf A, which refers to the dir item with sequence number 20;

6) We then call process_dir_items_leaf() to process the dir items of
   leaf B, and when we process the first item, corresponding to slot 0,
   sequence number 22, we notice the dir item was created in a past
   transaction and its sequence number is greater than the value of
   *last_old_dentry_offset + 1 (20 + 1), so we decide to log again a
   BTRFS_DIR_LOG_INDEX_KEY key with an offset of 21 and an end range
   of 21 (key.offset - 1 == 22 - 1 == 21), which results in an -EEXIST
   error from insert_dir_log_key(), as we have already inserted that
   key at step 2, triggering the assertion at process_dir_items_leaf().

The trace produced in dmesg is like the following:

assertion failed: ret != -EEXIST, in fs/btrfs/tree-log.c:3857
[198255.980839][ T7460] ------------[ cut here ]------------
[198255.981666][ T7460] kernel BUG at fs/btrfs/ctree.h:3617!
[198255.983141][ T7460] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[198255.984080][ T7460] CPU: 0 PID: 7460 Comm: repro-ghost-dir Not tainted 5.18.0-5314c78ac373-misc-next+
[198255.986027][ T7460] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[198255.988600][ T7460] RIP: 0010:assertfail.constprop.0+0x1c/0x1e
[198255.989465][ T7460] Code: 8b 4c 89 (...)
[198255.992599][ T7460] RSP: 0018:ffffc90007387188 EFLAGS: 00010282
[198255.993414][ T7460] RAX: 000000000000003d RBX: 0000000000000065 RCX: 0000000000000000
[198255.996056][ T7460] RDX: 0000000000000001 RSI: ffffffff8b62b180 RDI: fffff52000e70e24
[198255.997668][ T7460] RBP: ffffc90007387188 R08: 000000000000003d R09: ffff8881f0e16507
[198255.999199][ T7460] R10: ffffed103e1c2ca0 R11: 0000000000000001 R12: 00000000ffffffef
[198256.000683][ T7460] R13: ffff88813befc630 R14: ffff888116c16e70 R15: ffffc90007387358
[198256.007082][ T7460] FS:  00007fc7f7c24640(0000) GS:ffff8881f0c00000(0000) knlGS:0000000000000000
[198256.009939][ T7460] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[198256.014133][ T7460] CR2: 0000560bb16d0b78 CR3: 0000000140b34005 CR4: 0000000000170ef0
[198256.015239][ T7460] Call Trace:
[198256.015674][ T7460]  <TASK>
[198256.016313][ T7460]  log_dir_items.cold+0x16/0x2c
[198256.018858][ T7460]  ? replay_one_extent+0xbf0/0xbf0
[198256.025932][ T7460]  ? release_extent_buffer+0x1d2/0x270
[198256.029658][ T7460]  ? rcu_read_lock_sched_held+0x16/0x80
[198256.031114][ T7460]  ? lock_acquired+0xbe/0x660
[198256.032633][ T7460]  ? rcu_read_lock_sched_held+0x16/0x80
[198256.034386][ T7460]  ? lock_release+0xcf/0x8a0
[198256.036152][ T7460]  log_directory_changes+0xf9/0x170
[198256.036993][ T7460]  ? log_dir_items+0xba0/0xba0
[198256.037661][ T7460]  ? do_raw_write_unlock+0x7d/0xe0
[198256.038680][ T7460]  btrfs_log_inode+0x233b/0x26d0
[198256.041294][ T7460]  ? log_directory_changes+0x170/0x170
[198256.042864][ T7460]  ? btrfs_attach_transaction_barrier+0x60/0x60
[198256.045130][ T7460]  ? rcu_read_lock_sched_held+0x16/0x80
[198256.046568][ T7460]  ? lock_release+0xcf/0x8a0
[198256.047504][ T7460]  ? lock_downgrade+0x420/0x420
[198256.048712][ T7460]  ? ilookup5_nowait+0x81/0xa0
[198256.049747][ T7460]  ? lock_downgrade+0x420/0x420
[198256.050652][ T7460]  ? do_raw_spin_unlock+0xa9/0x100
[198256.051618][ T7460]  ? __might_resched+0x128/0x1c0
[198256.052511][ T7460]  ? __might_sleep+0x66/0xc0
[198256.053442][ T7460]  ? __kasan_check_read+0x11/0x20
[198256.054251][ T7460]  ? iget5_locked+0xbd/0x150
[198256.054986][ T7460]  ? run_delayed_iput_locked+0x110/0x110
[198256.055929][ T7460]  ? btrfs_iget+0xc7/0x150
[198256.056630][ T7460]  ? btrfs_orphan_cleanup+0x4a0/0x4a0
[198256.057502][ T7460]  ? free_extent_buffer+0x13/0x20
[198256.058322][ T7460]  btrfs_log_inode+0x2654/0x26d0
[198256.059137][ T7460]  ? log_directory_changes+0x170/0x170
[198256.060020][ T7460]  ? rcu_read_lock_sched_held+0x16/0x80
[198256.060930][ T7460]  ? rcu_read_lock_sched_held+0x16/0x80
[198256.061905][ T7460]  ? lock_contended+0x770/0x770
[198256.062682][ T7460]  ? btrfs_log_inode_parent+0xd04/0x1750
[198256.063582][ T7460]  ? lock_downgrade+0x420/0x420
[198256.064432][ T7460]  ? preempt_count_sub+0x18/0xc0
[198256.065550][ T7460]  ? __mutex_lock+0x580/0xdc0
[198256.066654][ T7460]  ? stack_trace_save+0x94/0xc0
[198256.068008][ T7460]  ? __kasan_check_write+0x14/0x20
[198256.072149][ T7460]  ? __mutex_unlock_slowpath+0x12a/0x430
[198256.073145][ T7460]  ? mutex_lock_io_nested+0xcd0/0xcd0
[198256.074341][ T7460]  ? wait_for_completion_io_timeout+0x20/0x20
[198256.075345][ T7460]  ? lock_downgrade+0x420/0x420
[198256.076142][ T7460]  ? lock_contended+0x770/0x770
[198256.076939][ T7460]  ? do_raw_spin_lock+0x1c0/0x1c0
[198256.078401][ T7460]  ? btrfs_sync_file+0x5e6/0xa40
[198256.080598][ T7460]  btrfs_log_inode_parent+0x523/0x1750
[198256.081991][ T7460]  ? wait_current_trans+0xc8/0x240
[198256.083320][ T7460]  ? lock_downgrade+0x420/0x420
[198256.085450][ T7460]  ? btrfs_end_log_trans+0x70/0x70
[198256.086362][ T7460]  ? rcu_read_lock_sched_held+0x16/0x80
[198256.087544][ T7460]  ? lock_release+0xcf/0x8a0
[198256.088305][ T7460]  ? lock_downgrade+0x420/0x420
[198256.090375][ T7460]  ? dget_parent+0x8e/0x300
[198256.093538][ T7460]  ? do_raw_spin_lock+0x1c0/0x1c0
[198256.094918][ T7460]  ? lock_downgrade+0x420/0x420
[198256.097815][ T7460]  ? do_raw_spin_unlock+0xa9/0x100
[198256.101822][ T7460]  ? dget_parent+0xb7/0x300
[198256.103345][ T7460]  btrfs_log_dentry_safe+0x48/0x60
[198256.105052][ T7460]  btrfs_sync_file+0x629/0xa40
[198256.106829][ T7460]  ? start_ordered_ops.constprop.0+0x120/0x120
[198256.109655][ T7460]  ? __fget_files+0x161/0x230
[198256.110760][ T7460]  vfs_fsync_range+0x6d/0x110
[198256.111923][ T7460]  ? start_ordered_ops.constprop.0+0x120/0x120
[198256.113556][ T7460]  __x64_sys_fsync+0x45/0x70
[198256.114323][ T7460]  do_syscall_64+0x5c/0xc0
[198256.115084][ T7460]  ? syscall_exit_to_user_mode+0x3b/0x50
[198256.116030][ T7460]  ? do_syscall_64+0x69/0xc0
[198256.116768][ T7460]  ? do_syscall_64+0x69/0xc0
[198256.117555][ T7460]  ? do_syscall_64+0x69/0xc0
[198256.118324][ T7460]  ? sysvec_call_function_single+0x57/0xc0
[198256.119308][ T7460]  ? asm_sysvec_call_function_single+0xa/0x20
[198256.120363][ T7460]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[198256.121334][ T7460] RIP: 0033:0x7fc7fe97b6ab
[198256.122067][ T7460] Code: 0f 05 48 (...)
[198256.125198][ T7460] RSP: 002b:00007fc7f7c23950 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[198256.126568][ T7460] RAX: ffffffffffffffda RBX: 00007fc7f7c239f0 RCX: 00007fc7fe97b6ab
[198256.127942][ T7460] RDX: 0000000000000002 RSI: 000056167536bcf0 RDI: 0000000000000004
[198256.129302][ T7460] RBP: 0000000000000004 R08: 0000000000000000 R09: 000000007ffffeb8
[198256.130670][ T7460] R10: 00000000000001ff R11: 0000000000000293 R12: 0000000000000001
[198256.132046][ T7460] R13: 0000561674ca8140 R14: 00007fc7f7c239d0 R15: 000056167536dab8
[198256.133403][ T7460]  </TASK>

Fix this by treating -EEXIST as expected at insert_dir_log_key() and have
it update the item with an end offset corresponding to the maximum between
the previously logged end offset and the new requested end offset. The end
offsets may be different due to dir index key deletions that happened as
part of unlink operations while we are logging a directory (triggered when
fsyncing some other inode parented by the directory) or during renames
which always attempt to log a single dir index deletion.

Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Link: https://lore.kernel.org/linux-btrfs/YmyefE9mc2xl5ZMz@hungrycats.org/
Fixes: 732d591a5d ("btrfs: stop copying old dir items when logging a directory")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-05 21:05:56 +02:00
Naohiro Aota ceb4f60830 btrfs: zoned: activate block group properly on unlimited active zone device
btrfs_zone_activate() checks if it activated all the underlying zones in
the loop. However, that check never hit on an unlimited activate zone
device (max_active_zones == 0).

Fortunately, it still works without ENOSPC because btrfs_zone_activate()
returns true in the end, even if block_group->zone_is_active == 0. But, it
is confusing to have non zone_is_active block group still usable for
allocation. Also, we are wasting CPU time to iterate the loop every time
btrfs_zone_activate() is called for the blog groups.

Since error case in the loop is handled by out_unlock, we can just set
zone_is_active and do the list stuff after the loop.

Fixes: f9a912a3c4 ("btrfs: zoned: make zone activation multi stripe capable")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-05 21:05:56 +02:00
Naohiro Aota 549577127a btrfs: zoned: move non-changing condition check out of the loop
btrfs_zone_activate() checks if block_group->alloc_offset ==
block_group->zone_capacity every time it iterates the loop. But, it is
not depending on the index. Move out the check and do it only once.

Fixes: f9a912a3c4 ("btrfs: zoned: make zone activation multi stripe capable")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-05 21:05:56 +02:00
Qu Wenruo 9f73f1aef9 btrfs: force v2 space cache usage for subpage mount
[BUG]
For a 4K sector sized btrfs with v1 cache enabled and only mounted on
systems with 4K page size, if it's mounted on subpage (64K page size)
systems, it can cause the following warning on v1 space cache:

 BTRFS error (device dm-1): csum mismatch on free space cache
 BTRFS warning (device dm-1): failed to load free space cache for block group 84082688, rebuilding it now

Although not a big deal, as kernel can rebuild it without problem, such
warning will bother end users, especially if they want to switch the
same btrfs seamlessly between different page sized systems.

[CAUSE]
V1 free space cache is still using fixed PAGE_SIZE for various bitmap,
like BITS_PER_BITMAP.

Such hard-coded PAGE_SIZE usage will cause various mismatch, from v1
cache size to checksum.

Thus kernel will always reject v1 cache with a different PAGE_SIZE with
csum mismatch.

[FIX]
Although we should fix v1 cache, it's already going to be marked
deprecated soon.

And we have v2 cache based on metadata (which is already fully subpage
compatible), and it has almost everything superior than v1 cache.

So just force subpage mount to use v2 cache on mount.

Reported-by: Matt Corallo <blnxfsl@bluematt.me>
CC: stable@vger.kernel.org # 5.15+
Link: https://lore.kernel.org/linux-btrfs/61aa27d1-30fc-c1a9-f0f4-9df544395ec3@bluematt.me/
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-05 21:05:56 +02:00
Linus Torvalds 0f5d752b13 s390 updates for 5.18-rc6
- Disable -Warray-bounds warning for gcc12, since there known way to
   workaround false positive warnings on lowcore accesses would result
   in worse code on fast paths.
 
 - Avoid lockdep_assert_held() warning in kvm vm memop code.
 
 - Reduce overhead within gmap_rmap code to get rid of long latencies
   when e.g. shutting down 2nd level guests.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmJzvigACgkQIg7DeRsp
 bsLM7A/7BFCekdwNQSQMHPzEo4auzowIRDVbj+EE5MyoSoi+t5tQ67fgpZPYLqAZ
 W3PFxrX0jxIFBpy2NuCzwPjEwDnApqKlmwiwHJ1euXxRDSoOsX4sl/l9sore5eaK
 NCo5VC9d1y0sSCS6hQ+SWLU9jDImFdBSdPMpNQw2SWwN1MCWHKoE996XJ5VwkDau
 5s21nM3uO43yZdLaGlfTAdoBvIDHC0UX5NAto0W988s/eReBnoKudL6ZRflbbW5H
 /dao1oyN90adTnSrj1BMl182Cx8OyQzeMtud0vud7hYzmxO/SxWc5doLKv/cuXkx
 fsPnJ7smnQw2By5xeEt4xj10amLU6c+tXI+PS0YdxBP7odz3dcXBI1Bt6yDO0NH/
 LotbA9v/D4VhTXOysK9fnlKI/7cKDNt/kE5kBTyGtSb4AfL2LRnYwRvoCnfjJ57j
 gBNa48bTLfX5Nz6BFLDzOAsLPoGaKT6Eun7l3iaK864pGBCimvpdM1gNghzfIJSY
 2C6cJxqoCDXYWFt4TWaZaGPs1J2DI6AtucIA/FlMmV7YqYyOIJxUh/j3fh8ln8+/
 eCg1CQwj3IIsnkA6lQVc7Ne01ita9m8kTd1Ep6o5xqQXg46FcGOuJjBYkKbCzVxX
 kG9pjg4ATU8kgGH3hvRa9Wy3s/w+AyKfrbht/M/GFk289ski0yM=
 =KjLw
 -----END PGP SIGNATURE-----

Merge tag 's390-5.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Heiko Carstens:

 - Disable -Warray-bounds warning for gcc12, since the only known way to
   workaround false positive warnings on lowcore accesses would result
   in worse code on fast paths.

 - Avoid lockdep_assert_held() warning in kvm vm memop code.

 - Reduce overhead within gmap_rmap code to get rid of long latencies
   when e.g. shutting down 2nd level guests.

* tag 's390-5.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  KVM: s390: vsie/gmap: reduce gmap_rmap overhead
  KVM: s390: Fix lockdep issue in vm memop
  s390: disable -Warray-bounds
2022-05-05 10:38:11 -07:00
Linus Torvalds 905a6537e7 Extend R4000/R4400 CPU erratum workaround to all revisions
-----BEGIN PGP SIGNATURE-----
 
 iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAmJzuqcaHHRzYm9nZW5k
 QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHAkhBAAlg6wkfx1OdtCxC3iERqF
 pNh4OWpvGc/N3aTj8cnEl5XnyZo9cIKdt+HU1ayr46dFbCJQW5t/hEYjre5Dm9xO
 9+9mRpoB/y4xpTAJCT+ffNPwtp3wK/w3HCG+DAKFAzI5S0Bz3LOg8x098IfgqrcO
 xVt/Kc15V+dxGstUSvvJjUmlxzIbN0BceGk//MRIm/dTo2Uu/Np3NjII5CUkNHvw
 jM2sXV146lpfM1LqEYkwnplgWWSIs/BKsG1Lzztb6WSanT3w2Dg+v7KdDVksysRb
 zKuRFx/xMkGmIgblM9fIc+E3ya2MmbMP1a8jLNOlqyQQ450Z7IVGcltlxzGk9XDO
 n9HhlWuY7quK1Bl/pQIHfdx5zUk0wad+KiiI83kahupD3MaiCcubV8gnHbF1rqT8
 r5qzWkj2h7JQMVFg26r8dsExbOQs6/h7ddPEiUs+AQmwrcaAkleJwaqQguUQmDq/
 GJT1wuTpHJrNcLJA/HB1vXwlhDbNQlUe5s5BX5fG1XUxOborEupirNqbdvFAEnm4
 0/wEUgVhCaap8yqskcuLtgbNk3iheCDqU6X2GHEhctVFQ11zG2/Oz8STkC2FFm+e
 GFFVisz8Kx+uVzjs6NGd/0Pugm6uXfCQAxhenuFCE20AefTFvAIsesfexezxRkVF
 55DoLKGkrbVQzq/uKpgoGLU=
 =vhRB
 -----END PGP SIGNATURE-----

Merge tag 'mips-fixes_5.18_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fix from Thomas Bogendoerfer:
 "Extend R4000/R4400 CPU erratum workaround to all revisions"

* tag 'mips-fixes_5.18_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: Fix CP0 counter erratum detection for R4k CPUs
2022-05-05 10:27:30 -07:00
Linus Torvalds 68533eb1fb Networking fixes for 5.18-rc6, including fixes from can, rxrpc and
wireguard
 
 Previous releases - regressions:
   - igmp: respect RCU rules in ip_mc_source() and ip_mc_msfilter()
 
   - mld: respect RCU rules in ip6_mc_source() and ip6_mc_msfilter()
 
   - rds: acquire netns refcount on TCP sockets
 
   - rxrpc: enable IPv6 checksums on transport socket
 
   - nic: hinic: fix bug of wq out of bound access
 
   - nic: thunder: don't use pci_irq_vector() in atomic context
 
   - nic: bnxt_en: fix possible bnxt_open() failure caused by wrong RFS flag
 
   - nic: mlx5e:
     - lag, fix use-after-free in fib event handler
     - fix deadlock in sync reset flow
 
 Previous releases - always broken:
   - tcp: fix insufficient TCP source port randomness
 
   - can: grcan: grcan_close(): fix deadlock
 
   - nfc: reorder destructive operations in to avoid bugs
 
 Misc:
   - wireguard: improve selftests reliability
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmJznX8SHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkDrgP/R9tErvWO/uvXpNgDr6Qh8osYt5Z297l
 EWyhz7cUm4LKi6MYWrRKR4uRK9n43DK+OVws5LXrYL0tIdJH3uYBE0RS67W9WmjA
 kE2Srq1A6wUi4koiYKeYDXtodCJLC93n+QnLBfih44Pc+xmk8t+G6qZ1n45qjRss
 gzV75AlIfErmjqyYi81DaZ6Z0TV4H5qPM4ZXRViIzH+Ccyx6rk/KNqU4wepoqRSi
 lCckTvMt9V7OiYHzM5Pu1kTUV07Jtiy7xkIQMdKYXCZpyqkmqyPFMM+0B7fDOEeP
 WZnkdUwi69WMVmeefcpEn7XsoNbVadGkTQM2EcUWvrxuCeawmGxYoORvvFs0IpAX
 YkYXk1US0Sd1L2XlMaus+HLsmmx4fWnb/hWqGL/D+arZOvTCOhBQItSRmKA6d+kM
 OLfj/gh0YLBsHVrCiHUN06oopvhWuBEBAJbVFkbJCvXoFGqHigijBCVjFBVH1p4o
 L5bWVEAQ8tkFdofXw0nOe6vRCD5BGN34N5DkqC5E8mj/uLP0FVEWOISV3TzKKF5B
 mEDGZAGN5bTf/ScvbF8XEaqtdk/cxv2ohWNn9wtgoaNBorgKtpTf99pXJtxV2+fs
 3RiPM0My9uz8/wMveSfKShQntMSdnmQPMpJ4Vm0e4bOS1K0LRGUgZxOpX2/BTokq
 Iv5msx85X5/S
 =XuN7
 -----END PGP SIGNATURE-----

Merge tag 'net-5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from can, rxrpc and wireguard.

  Previous releases - regressions:

   - igmp: respect RCU rules in ip_mc_source() and ip_mc_msfilter()

   - mld: respect RCU rules in ip6_mc_source() and ip6_mc_msfilter()

   - rds: acquire netns refcount on TCP sockets

   - rxrpc: enable IPv6 checksums on transport socket

   - nic: hinic: fix bug of wq out of bound access

   - nic: thunder: don't use pci_irq_vector() in atomic context

   - nic: bnxt_en: fix possible bnxt_open() failure caused by wrong RFS
     flag

   - nic: mlx5e:
      - lag, fix use-after-free in fib event handler
      - fix deadlock in sync reset flow

  Previous releases - always broken:

   - tcp: fix insufficient TCP source port randomness

   - can: grcan: grcan_close(): fix deadlock

   - nfc: reorder destructive operations in to avoid bugs

  Misc:

   - wireguard: improve selftests reliability"

* tag 'net-5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits)
  NFC: netlink: fix sleep in atomic bug when firmware download timeout
  selftests: ocelot: tc_flower_chains: specify conform-exceed action for policer
  tcp: drop the hash_32() part from the index calculation
  tcp: increase source port perturb table to 2^16
  tcp: dynamically allocate the perturb table used by source ports
  tcp: add small random increments to the source port
  tcp: resalt the secret every 10 seconds
  tcp: use different parts of the port_offset for index and offset
  secure_seq: use the 64 bits of the siphash for port offset calculation
  wireguard: selftests: set panic_on_warn=1 from cmdline
  wireguard: selftests: bump package deps
  wireguard: selftests: restore support for ccache
  wireguard: selftests: use newer toolchains to fill out architectures
  wireguard: selftests: limit parallelism to $(nproc) tests at once
  wireguard: selftests: make routing loop test non-fatal
  net/mlx5: Fix matching on inner TTC
  net/mlx5: Avoid double clear or set of sync reset requested
  net/mlx5: Fix deadlock in sync reset flow
  net/mlx5e: Fix trust state reset in reload
  net/mlx5e: Avoid checking offload capability in post_parse action
  ...
2022-05-05 09:45:12 -07:00
Nobuhiro Iwamatsu 171865dab0 gpio: visconti: Fix fwnode of GPIO IRQ
The fwnode of GPIO IRQ must be set to its own fwnode, not the fwnode of the
parent IRQ. Therefore, this sets own fwnode instead of the parent IRQ fwnode to
GPIO IRQ's.

Fixes: 2ad74f40da ("gpio: visconti: Add Toshiba Visconti GPIO support")
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2022-05-05 14:39:02 +02:00
Thomas Pfaff 8707898e22 genirq: Synchronize interrupt thread startup
A kernel hang can be observed when running setserial in a loop on a kernel
with force threaded interrupts. The sequence of events is:

   setserial
     open("/dev/ttyXXX")
       request_irq()
     do_stuff()
      -> serial interrupt
         -> wake(irq_thread)
	      desc->threads_active++;
     close()
       free_irq()
         kthread_stop(irq_thread)
     synchronize_irq() <- hangs because desc->threads_active != 0

The thread is created in request_irq() and woken up, but does not get on a
CPU to reach the actual thread function, which would handle the pending
wake-up. kthread_stop() sets the should stop condition which makes the
thread immediately exit, which in turn leaves the stale threads_active
count around.

This problem was introduced with commit 519cc8652b, which addressed a
interrupt sharing issue in the PCIe code.

Before that commit free_irq() invoked synchronize_irq(), which waits for
the hard interrupt handler and also for associated threads to complete.

To address the PCIe issue synchronize_irq() was replaced with
__synchronize_hardirq(), which only waits for the hard interrupt handler to
complete, but not for threaded handlers.

This was done under the assumption, that the interrupt thread already
reached the thread function and waits for a wake-up, which is guaranteed to
be handled before acting on the stop condition. The problematic case, that
the thread would not reach the thread function, was obviously overlooked.

Make sure that the interrupt thread is really started and reaches
thread_fn() before returning from __setup_irq().

This utilizes the existing wait queue in the interrupt descriptor. The
wait queue is unused for non-shared interrupts. For shared interrupts the
usage might cause a spurious wake-up of a waiter in synchronize_irq() or the
completion of a threaded handler might cause a spurious wake-up of the
waiter for the ready flag. Both are harmless and have no functional impact.

[ tglx: Amended changelog ]

Fixes: 519cc8652b ("genirq: Synchronize only with single thread on free_irq()")
Signed-off-by: Thomas Pfaff <tpfaff@pcs.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/552fe7b4-9224-b183-bb87-a8f36d335690@pcs.com
2022-05-05 11:54:05 +02:00
Bartosz Golaszewski 2d3535ed2c MAINTAINERS: update the GPIO git tree entry
My git tree has become the de facto main GPIO tree. Update the
MAINTAINERS file to reflect that.

Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
Reported-by: Baruch Siach <baruch@tkos.co.il>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
2022-05-05 10:21:25 +02:00
Duoming Zhou 4071bf121d NFC: netlink: fix sleep in atomic bug when firmware download timeout
There are sleep in atomic bug that could cause kernel panic during
firmware download process. The root cause is that nlmsg_new with
GFP_KERNEL parameter is called in fw_dnld_timeout which is a timer
handler. The call trace is shown below:

BUG: sleeping function called from invalid context at include/linux/sched/mm.h:265
Call Trace:
kmem_cache_alloc_node
__alloc_skb
nfc_genl_fw_download_done
call_timer_fn
__run_timers.part.0
run_timer_softirq
__do_softirq
...

The nlmsg_new with GFP_KERNEL parameter may sleep during memory
allocation process, and the timer handler is run as the result of
a "software interrupt" that should not call any other function
that could sleep.

This patch changes allocation mode of netlink message from GFP_KERNEL
to GFP_ATOMIC in order to prevent sleep in atomic bug. The GFP_ATOMIC
flag makes memory allocation operation could be used in atomic context.

Fixes: 9674da8759 ("NFC: Add firmware upload netlink command")
Fixes: 9ea7187c53 ("NFC: netlink: Rename CMD_FW_UPLOAD to CMD_FW_DOWNLOAD")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220504055847.38026-1-duoming@zju.edu.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-05 10:18:15 +02:00
Matthew Wilcox (Oracle) b9ff43dd27 mm/readahead: Fix readahead with large folios
Reading 100KB chunks from a big file (eg dd bs=100K) leads to poor
readahead behaviour.  Studying the traces in detail, I noticed two
problems.

The first is that we were setting the readahead flag on the folio which
contains the last byte read from the block.  This is wrong because we
will trigger readahead at the end of the read without waiting to see
if a subsequent read is going to use the pages we just read.  Instead,
we need to set the readahead flag on the first folio _after_ the one
which contains the last byte that we're reading.

The second is that we were looking for the index of the folio with the
readahead flag set to exactly match the start + size - async_size.
If we've rounded this, either down (as previously) or up (as now),
we'll think we hit a folio marked as readahead by a different read,
and try to read the wrong pages.  So round the expected index to the
order of the folio we hit.

Reported-by: Guo Xuenan <guoxuenan@huawei.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-05 00:47:29 -04:00
Matthew Wilcox (Oracle) 170f37d6aa block: Do not call folio_next() on an unreferenced folio
It is unsafe to call folio_next() on a folio unless you hold a reference
on it that prevents it from being split or freed.  After returning
from the iterator, iomap calls folio_end_writeback() which may drop
the last reference to the page, or allow the page to be split.  If that
happens, the iterator will not advance far enough through the bio_vec,
leading to assertion failures like the BUG() in folio_end_writeback()
that checks we're not trying to end writeback on a page not currently
under writeback.  Other assertion failures were also seen, but they're
all explained by this one bug.

Fix the bug by remembering where the next folio starts before returning
from the iterator.  There are other ways of fixing this bug, but this
seems the simplest.

Reported-by: Darrick J. Wong <djwong@kernel.org>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Reported-by: Brian Foster <bfoster@redhat.com>
Tested-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-05 00:47:29 -04:00
Vladimir Oltean 5a7c5f70c7 selftests: ocelot: tc_flower_chains: specify conform-exceed action for policer
As discussed here with Ido Schimmel:
https://patchwork.kernel.org/project/netdevbpf/patch/20220224102908.5255-2-jianbol@nvidia.com/

the default conform-exceed action is "reclassify", for a reason we don't
really understand.

The point is that hardware can't offload that police action, so not
specifying "conform-exceed" was always wrong, even though the command
used to work in hardware (but not in software) until the kernel started
adding validation for it.

Fix the command used by the selftest by making the policer drop on
exceed, and pass the packet to the next action (goto) on conform.

Fixes: 8cd6b020b6 ("selftests: ocelot: add some example VCAP IS1, IS2 and ES0 tc offloads")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220503121428.842906-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-04 19:40:19 -07:00
Jakub Kicinski ef56248981 Merge branch 'insufficient-tcp-source-port-randomness'
Willy Tarreau says:

====================
insufficient TCP source port randomness

In a not-yet published paper, Moshe Kol, Amit Klein, and Yossi Gilad
report being able to accurately identify a client by forcing it to emit
only 40 times more connections than the number of entries in the
table_perturb[] table, which is indexed by hashing the connection tuple.
The current 2^8 setting allows them to perform that attack with only 10k
connections, which is not hard to achieve in a few seconds.

Eric, Amit and I have been working on this for a few weeks now imagining,
testing and eliminating a number of approaches that Amit and his team were
still able to break or that were found to be too risky or too expensive,
and ended up with the simple improvements in this series that resists to
the attack, doesn't degrade the performance, and preserves a reliable port
selection algorithm to avoid connection failures, including the odd/even
port selection preference that allows bind() to always find a port quickly
even under strong connect() stress.

The approach relies on several factors:
  - resalting the hash secret that's used to choose the table_perturb[]
    entry every 10 seconds to eliminate slow attacks and force the
    attacker to forget everything that was learned after this delay.
    This already eliminates most of the problem because if a client
    stays silent for more than 10 seconds there's no link between the
    previous and the next patterns, and 10s isn't yet frequent enough
    to cause too frequent repetition of a same port that may induce a
    connection failure ;

  - adding small random increments to the source port. Previously, a
    random 0 or 1 was added every 16 ports. Now a random 0 to 7 is
    added after each port. This means that with the default 32768-60999
    range, a worst case rollover happens after 1764 connections, and
    an average of 3137. This doesn't stop statistical attacks but
    requires significantly more iterations of the same attack to
    confirm a guess.

  - increasing the table_perturb[] size from 2^8 to 2^16, which Amit
    says will require 2.6 million connections to be attacked with the
    changes above, making it pointless to get a fingerprint that will
    only last 10 seconds. Due to the size, the table was made dynamic.

  - a few minor improvements on the bits used from the hash, to eliminate
    some unfortunate correlations that may possibly have been exploited
    to design future attack models.

These changes were tested under the most extreme conditions, up to
1.1 million connections per second to one and a few targets, showing no
performance regression, and only 2 connection failures within 13 billion,
which is less than 2^-32 and perfectly within usual values.

The series is split into small reviewable changes and was already reviewed
by Amit and Eric.
====================

Link: https://lore.kernel.org/r/20220502084614.24123-1-w@1wt.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-04 19:22:35 -07:00
Willy Tarreau e8161345dd tcp: drop the hash_32() part from the index calculation
In commit 190cc82489 ("tcp: change source port randomizarion at
connect() time"), the table_perturb[] array was introduced and an
index was taken from the port_offset via hash_32(). But it turns
out that hash_32() performs a multiplication while the input here
comes from the output of SipHash in secure_seq, that is well
distributed enough to avoid the need for yet another hash.

Suggested-by: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-04 19:22:33 -07:00
Willy Tarreau 4c2c8f03a5 tcp: increase source port perturb table to 2^16
Moshe Kol, Amit Klein, and Yossi Gilad reported being able to accurately
identify a client by forcing it to emit only 40 times more connections
than there are entries in the table_perturb[] table. The previous two
improvements consisting in resalting the secret every 10s and adding
randomness to each port selection only slightly improved the situation,
and the current value of 2^8 was too small as it's not very difficult
to make a client emit 10k connections in less than 10 seconds.

Thus we're increasing the perturb table from 2^8 to 2^16 so that the
same precision now requires 2.6M connections, which is more difficult in
this time frame and harder to hide as a background activity. The impact
is that the table now uses 256 kB instead of 1 kB, which could mostly
affect devices making frequent outgoing connections. However such
components usually target a small set of destinations (load balancers,
database clients, perf assessment tools), and in practice only a few
entries will be visited, like before.

A live test at 1 million connections per second showed no performance
difference from the previous value.

Reported-by: Moshe Kol <moshe.kol@mail.huji.ac.il>
Reported-by: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-04 19:22:28 -07:00
Willy Tarreau e926147618 tcp: dynamically allocate the perturb table used by source ports
We'll need to further increase the size of this table and it's likely
that at some point its size will not be suitable anymore for a static
table. Let's allocate it on boot from inet_hashinfo2_init(), which is
called from tcp_init().

Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-04 19:22:21 -07:00
Willy Tarreau ca7af04025 tcp: add small random increments to the source port
Here we're randomly adding between 0 and 7 random increments to the
selected source port in order to add some noise in the source port
selection that will make the next port less predictable.

With the default port range of 32768-60999 this means a worst case
reuse scenario of 14116/8=1764 connections between two consecutive
uses of the same port, with an average of 14116/4.5=3137. This code
was stressed at more than 800000 connections per second to a fixed
target with all connections closed by the client using RSTs (worst
condition) and only 2 connections failed among 13 billion, despite
the hash being reseeded every 10 seconds, indicating a perfectly
safe situation.

Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-04 19:22:21 -07:00
Eric Dumazet 4dfa9b438e tcp: resalt the secret every 10 seconds
In order to limit the ability for an observer to recognize the source
ports sequence used to contact a set of destinations, we should
periodically shuffle the secret. 10 seconds looks effective enough
without causing particular issues.

Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-04 19:22:21 -07:00
Willy Tarreau 9e9b70ae92 tcp: use different parts of the port_offset for index and offset
Amit Klein suggests that we use different parts of port_offset for the
table's index and the port offset so that there is no direct relation
between them.

Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-04 19:22:20 -07:00
Willy Tarreau b2d057560b secure_seq: use the 64 bits of the siphash for port offset calculation
SipHash replaced MD5 in secure_ipv{4,6}_port_ephemeral() via commit
7cd23e5300 ("secure_seq: use SipHash in place of MD5"), but the output
remained truncated to 32-bit only. In order to exploit more bits from the
hash, let's make the functions return the full 64-bit of siphash_3u32().
We also make sure the port offset calculation in __inet_hash_connect()
remains done on 32-bit to avoid the need for div_u64_rem() and an extra
cost on 32-bit systems.

Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-04 19:22:20 -07:00
Jakub Kicinski 205557ba99 Merge branch 'wireguard-patches-for-5-18-rc6'
Jason A. Donenfeld says:

====================
wireguard patches for 5.18-rc6

In working on some other problems, I wound up leaning on the WireGuard
CI more than usual and uncovered a few small issues with reliability.
These are fairly low key changes, since they don't impact kernel code
itself.

One change does stick out in particular, though, which is the "make
routing loop test non-fatal" commit. I'm not thrilled about doing this,
but currently [1] remains unsolved, and I'm still working on a real
solution to that (hopefully for 5.19 or 5.20 if I can come up with a
good idea...), so for now that test just prints a big red warning
instead.

[1] https://lore.kernel.org/netdev/YmszSXueTxYOC41G@zx2c4.com/
====================

Link: https://lore.kernel.org/r/20220504202920.72908-1-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-04 17:50:00 -07:00
Jason A. Donenfeld 3fc1b11e5d wireguard: selftests: set panic_on_warn=1 from cmdline
Rather than setting this once init is running, set panic_on_warn from
the kernel command line, so that it catches splats from WireGuard
initialization code and the various crypto selftests.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-04 17:49:57 -07:00
Jason A. Donenfeld a6b8ea9144 wireguard: selftests: bump package deps
Use newer, more reliable package dependencies. These should hopefully
reduce flakes. However, we keep the old iputils package, as it
accumulated bugs after resulting in flakes on slow machines.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-04 17:49:57 -07:00
Jason A. Donenfeld d261ba6aa4 wireguard: selftests: restore support for ccache
When moving to non-system toolchains, we inadvertantly killed the
ability to use ccache. So instead, build ccache support into the test
harness directly.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-04 17:49:56 -07:00
Jason A. Donenfeld d5d9b29bc9 wireguard: selftests: use newer toolchains to fill out architectures
Rather than relying on the system to have cross toolchains available,
simply download musl.cc's ones and use that libc.so, and then we use it
to fill in a few missing platforms, such as riscv64, riscv64, powerpc64,
and s390x.

Since riscv doesn't have a second serial port in its device description,
we have to use virtio's vport. This is actually the same situation on
ARM, but we were previously hacking QEMU up to work around this, which
required a custom QEMU. Instead just do the vport trick on ARM too.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-04 17:49:56 -07:00
Jason A. Donenfeld 39f02bf1e5 wireguard: selftests: limit parallelism to $(nproc) tests at once
The parallel tests were added to catch queueing issues from multiple
cores. But what happens in reality when testing tons of processes is
that these separate threads wind up fighting with the scheduler, and we
wind up with contention in places we don't care about that decrease the
chances of hitting a bug. So just do a test with the number of CPU
cores, rather than trying to scale up arbitrarily.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-04 17:49:56 -07:00
Jason A. Donenfeld ae2de669c1 wireguard: selftests: make routing loop test non-fatal
I hate to do this, but I still do not have a good solution to actually
fix this bug across architectures. So just disable it for now, so that
the CI can still deliver actionable results. This commit adds a large
red warning, so that at least the failure isn't lost forever, and
hopefully this can be revisited down the line.

Link: https://lore.kernel.org/netdev/CAHmME9pv1x6C4TNdL6648HydD8r+txpV4hTUXOBVkrapBXH4QQ@mail.gmail.com/
Link: https://lore.kernel.org/netdev/YmszSXueTxYOC41G@zx2c4.com/
Link: https://lore.kernel.org/wireguard/CAHmME9rNnBiNvBstb7MPwK-7AmAN0sOfnhdR=eeLrowWcKxaaQ@mail.gmail.com/
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-04 17:49:56 -07:00
Thomas Gleixner 59f5ede3bc x86/fpu: Prevent FPU state corruption
The FPU usage related to task FPU management is either protected by
disabling interrupts (switch_to, return to user) or via fpregs_lock() which
is a wrapper around local_bh_disable(). When kernel code wants to use the
FPU then it has to check whether it is possible by calling irq_fpu_usable().

But the condition in irq_fpu_usable() is wrong. It allows FPU to be used
when:

   !in_interrupt() || interrupted_user_mode() || interrupted_kernel_fpu_idle()

The latter is checking whether some other context already uses FPU in the
kernel, but if that's not the case then it allows FPU to be used
unconditionally even if the calling context interrupted a fpregs_lock()
critical region. If that happens then the FPU state of the interrupted
context becomes corrupted.

Allow in kernel FPU usage only when no other context has in kernel FPU
usage and either the calling context is not hard interrupt context or the
hard interrupt did not interrupt a local bottomhalf disabled region.

It's hard to find a proper Fixes tag as the condition was broken in one way
or the other for a very long time and the eager/lazy FPU changes caused a
lot of churn. Picked something remotely connected from the history.

This survived undetected for quite some time as FPU usage in interrupt
context is rare, but the recent changes to the random code unearthed it at
least on a kernel which had FPU debugging enabled. There is probably a
higher rate of silent corruption as not all issues can be detected by the
FPU debugging code. This will be addressed in a subsequent change.

Fixes: 5d2bd7009f ("x86, fpu: decouple non-lazy/eager fpu restore from xsave")
Reported-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220501193102.588689270@linutronix.de
2022-05-05 02:40:19 +02:00
Bob Pearson bfdc0edd11 RDMA/rxe: Change mcg_lock to a _bh lock
rxe_mcast.c currently uses _irqsave spinlocks for rxe->mcg_lock while
rxe_recv.c uses _bh spinlocks for the same lock.

As there is no case where the mcg_lock can be taken from an IRQ, change
these all to bh locks so we don't have confusing mismatched lock types on
the same spinlock.

Fixes: 6090a0c4c7 ("RDMA/rxe: Cleanup rxe_mcast.c")
Link: https://lore.kernel.org/r/20220504202817.98247-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-04 21:29:25 -03:00
Bob Pearson a926a903b7 RDMA/rxe: Do not call dev_mc_add/del() under a spinlock
These routines were not intended to be called under a spinlock and will
throw debugging warnings:

   raw_local_irq_restore() called with IRQs enabled
   WARNING: CPU: 13 PID: 3107 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x2f/0x50
   CPU: 13 PID: 3107 Comm: python3 Tainted: G            E     5.18.0-rc1+ #7
   Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
   RIP: 0010:warn_bogus_irq_restore+0x2f/0x50
   Call Trace:
    <TASK>
    _raw_spin_unlock_irqrestore+0x75/0x80
    rxe_attach_mcast+0x304/0x480 [rdma_rxe]
    ib_attach_mcast+0x88/0xa0 [ib_core]
    ib_uverbs_attach_mcast+0x186/0x1e0 [ib_uverbs]
    ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xcd/0x140 [ib_uverbs]
    ib_uverbs_cmd_verbs+0xdb0/0xea0 [ib_uverbs]
    ib_uverbs_ioctl+0xd2/0x160 [ib_uverbs]
    do_syscall_64+0x5c/0x80
    entry_SYSCALL_64_after_hwframe+0x44/0xae

Move them out of the spinlock, it is OK if there is some races setting up
the MC reception at the ethernet layer with rbtree lookups.

Fixes: 6090a0c4c7 ("RDMA/rxe: Cleanup rxe_mcast.c")
Link: https://lore.kernel.org/r/20220504202817.98247-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-04 21:16:57 -03:00
Cheng Xu ef91271c65 RDMA/siw: Fix a condition race issue in MPA request processing
The calling of siw_cm_upcall and detaching new_cep with its listen_cep
should be atomistic semantics. Otherwise siw_reject may be called in a
temporary state, e,g, siw_cm_upcall is called but the new_cep->listen_cep
has not being cleared.

This fixes a WARN:

  WARNING: CPU: 7 PID: 201 at drivers/infiniband/sw/siw/siw_cm.c:255 siw_cep_put+0x125/0x130 [siw]
  CPU: 2 PID: 201 Comm: kworker/u16:22 Kdump: loaded Tainted: G            E     5.17.0-rc7 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
  Workqueue: iw_cm_wq cm_work_handler [iw_cm]
  RIP: 0010:siw_cep_put+0x125/0x130 [siw]
  Call Trace:
   <TASK>
   siw_reject+0xac/0x180 [siw]
   iw_cm_reject+0x68/0xc0 [iw_cm]
   cm_work_handler+0x59d/0xe20 [iw_cm]
   process_one_work+0x1e2/0x3b0
   worker_thread+0x50/0x3a0
   ? rescuer_thread+0x390/0x390
   kthread+0xe5/0x110
   ? kthread_complete_and_exit+0x20/0x20
   ret_from_fork+0x1f/0x30
   </TASK>

Fixes: 6c52fdc244 ("rdma/siw: connection management")
Link: https://lore.kernel.org/r/d528d83466c44687f3872eadcb8c184528b2e2d4.1650526554.git.chengyou@linux.alibaba.com
Reported-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-04 20:59:54 -03:00
Hector Martin 5dc4630426 dt-bindings: pci: apple,pcie: Drop max-link-speed from example
We no longer use these since 111659c2a5 (and they never worked
anyway); drop them from the example to avoid confusion.

Fixes: 111659c2a5 ("arm64: dts: apple: t8103: Remove PCIe max-link-speed properties")
Signed-off-by: Hector Martin <marcan@marcan.st>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220502091308.28233-1-marcan@marcan.st
2022-05-04 17:18:27 -05:00
Rob Herring caf83e494d dt-bindings: Drop redundant 'maxItems/minItems' in if/then schemas
Another round of removing redundant minItems/maxItems when 'items' list is
specified. This time it is in if/then schemas as the meta-schema was
failing to check this case.

If a property has an 'items' list, then a 'minItems' or 'maxItems' with the
same size as the list is redundant and can be dropped. Note that is DT
schema specific behavior and not standard json-schema behavior. The tooling
will fixup the final schema adding any unspecified minItems/maxItems.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-By: Vinod Koul <vkoul@kernel.org>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> #for IIO
Link: https://lore.kernel.org/r/20220503162738.3827041-1-robh@kernel.org
2022-05-04 16:19:03 -05:00
Rob Herring b2b701b31e dt-bindings: pinctrl: Allow values for drive-push-pull and drive-open-drain
A few platforms, at91 and tegra, use drive-push-pull and
drive-open-drain with a 0 or 1 value. There's not really a need for values
as '1' should be equivalent to no value (it wasn't treated that way) and
drive-push-pull disabled is equivalent to drive-open-drain. So dropping the
value can't be done without breaking existing OSs. As we don't want new
cases, mark the case with values as deprecated.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Jonathan Hunter <jonathanh@nvidia.com>
Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
Cc: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220429194610.2741437-1-robh@kernel.org
2022-05-04 16:18:26 -05:00
Josh Poimboeuf 770fb0942c MAINTAINERS: Update Josh Poimboeuf's email address
Change to my kernel.org email address.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/1abc3de4b00dc6f915ac975a2ec29ed545d96dc4.1651687652.git.jpoimboe@redhat.com
2022-05-04 20:43:08 +02:00
Linus Torvalds a7391ad357 IOMMU Fixes for Linux v5.18-rc5
Including:
 
 	- Fix for a regression in IOMMU core code which could cause NULL-ptr
 	  dereferences
 
 	- Arm SMMU fixes for 5.18
 	  - Fix off-by-one in SMMUv3 SVA TLB invalidation
 	  - Disable large mappings to workaround nvidia erratum
 
 	- Intel VT-d fixes
 	  - Handle PCI stop marker messages in IOMMU driver to meet the
 	    requirement of I/O page fault handling framework.
 	  - Calculate a feasible mask for non-aligned page-selective IOTLB
 	    invalidation.
 
 	- Two fixes for Apple DART IOMMU driver
 	  - Fix potential NULL-ptr dereference
 	  - Set module owner
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmJyoYIACgkQK/BELZcB
 GuMLOhAAu4ZTxHmBbS/JDt0+DUKODB2z3vRDBoZxI+c6QCRvu5PhrCnR1nTmoNGg
 Falu/QlzcVmTupUOT0OLrTd4H8KKzlXJuFcVJAQh98ReSQ5zIkBX+b7ruL3+KKAU
 Y/zDy/nD5G9G1vM9/PjZsSvr2U9Vq1p3qNArEZDnd8UYaVokkC0nNsxQhAoVA43U
 WjLR9Y7wCnxLFweaa3ujzSzmAOgVcLeMBV/siaRcpu4ohwHS6kKOikVAJWlgp/pc
 bCHpfxJbBLSG7lIAagWHbC4EF8GxTl/+u1+8pw1fVXpBG4Qaj60VSiITznLqSBi4
 avTTlwwPsLQOMshmARllIDWJny7TCEJZhqxSO22ki9Em32+zMA7e2ozKp2cSwyn2
 qXFnF4o3lSpDE99qnQPHWqWXQVVabkuLgwyqqDHMg0BSKx3MPDhagXdmYh/eI9hF
 j0BeY1hkMJU+kt+bNTSKSwwsooanG8hZvsTepct9+oeIZndEbyG0oKi7nW+xK2zD
 bKDyrY8cr2MBIBcV+yiDPQoCmbegDBouxCZtkLDo4cug30IVqDWvu5efld1WkhBX
 BLJC5LI+EibhzEKBiHk6qb3Gz4nIvvbAs+oGVdPsjI2LMrICq4eJ4BFaZT8I8PfY
 QGg9iKW3SCtfwYlfm6/a2Y23k72T86zZFlgMW7fEN4pyR5SEkcc=
 =jB96
 -----END PGP SIGNATURE-----

Merge tag 'iomm-fixes-v5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:
 "IOMMU core:

   - Fix for a regression which could cause NULL-ptr dereferences

  Arm SMMU:

   - Fix off-by-one in SMMUv3 SVA TLB invalidation

   - Disable large mappings to workaround nvidia erratum

  Intel VT-d:

   - Handle PCI stop marker messages in IOMMU driver to meet the
     requirement of I/O page fault handling framework.

   - Calculate a feasible mask for non-aligned page-selective IOTLB
     invalidation.

  Apple DART IOMMU:

   - Fix potential NULL-ptr dereference

   - Set module owner"

* tag 'iomm-fixes-v5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu: Make sysfs robust for non-API groups
  iommu/dart: Add missing module owner to ops structure
  iommu/dart: check return value after calling platform_get_resource()
  iommu/vt-d: Drop stop marker messages
  iommu/vt-d: Calculate mask for non-aligned flushes
  iommu: arm-smmu: disable large page mappings for Nvidia arm-smmu
  iommu/arm-smmu-v3: Fix size calculation in arm_smmu_mm_invalidate_range()
2022-05-04 11:04:52 -07:00
Linus Torvalds 3118d7ab3f Fix some issues that were reported
This has been in for-next for a bit (longer than the times would
 indicate, I had to rebase to add some text to the headers) and these are
 fixes that need to go in.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE/Q1c5nzg9ZpmiCaGYfOMkJGb/4EFAmJyaB4ACgkQYfOMkJGb
 /4HOwBAAqdLDIo7WSYjWqmK4hUzLS5iBvgLiH41QKt/AVGkwX20YYANmlQFRShT/
 Ue08cjbCcYKVyQCLAnV7nG4RU26U+eAlLL/owH8CVLGXEND9IGahzii1Up350fhX
 EjZgoHoEiU+xTGiRs/Tsag0v5WL++78qiXi7tQX2NsWwBT+vCDMEo/+Xv9h+TKzg
 okVi78/4LATI5jZJpCkz7uRjrEIPoUs5rTkfe5p4EhBfis+pd0CIs0gIsrsxB12p
 cnD55NOJoxAS0EQnRiLakjy320GrvlHUy0VfFByEkMWKWhl4c6aE60mWI85CcGMo
 qrovIeAiGZ8r3gqwP986WJBNbSve3f9b2gNeLlai+k4loeyANUOoTx7rgGZzzqNo
 KKJd9v/wjDXW/cGF41sTYoFHpJXF/CZfm6rusYcQrC/L052sEwG3clasWsmVsmml
 suRbEYzDoDPfyOqEcLztA/OJr6bkA+/eu458W4Q75WWNfF86mM1KapZjoKNXokq+
 baE855AP4bdyoG+1lKpORk+3oOlqMWYyP5Bkmkz/JIyQAmzB8DhJZeD4SMHkMgp2
 Bb/rjm/AZRxjEX/bvQyO31mpSaqM39Yxua4qRZcmxAW+QXTYBgtE9psNxZ7BGpRJ
 dfl/siFeVMUBUnFTlB5pf49Mt9QZomcQ2wBZ029gHncWvOjCSXs=
 =f3CV
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.17-2' of https://github.com/cminyard/linux-ipmi

Pull IPMI fixes from Corey Minyard:
 "Fix some issues that were reported.

  This has been in for-next for a bit (longer than the times would
  indicate, I had to rebase to add some text to the headers) and these
  are fixes that need to go in"

* tag 'for-linus-5.17-2' of https://github.com/cminyard/linux-ipmi:
  ipmi:ipmi_ipmb: Fix null-ptr-deref in ipmi_unregister_smi()
  ipmi: When handling send message responses, don't process the message
2022-05-04 11:01:31 -07:00
Harry Wentland 3dfe85fa87 drm/amd/display: Avoid reading audio pattern past AUDIO_CHANNELS_COUNT
A faulty receiver might report an erroneous channel count. We
should guard against reading beyond AUDIO_CHANNELS_COUNT as
that would overflow the dpcd_pattern_period array.

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-05-04 12:21:41 -04:00
Marek Marczykowski-Górecki 19965d8259 drm/amdgpu: do not use passthrough mode in Xen dom0
While technically Xen dom0 is a virtual machine too, it does have
access to most of the hardware so it doesn't need to be considered a
"passthrough". Commit b818a5d374 ("drm/amdgpu/gmc: use PCI BARs for
APUs in passthrough") changed how FB is accessed based on passthrough
mode. This breaks amdgpu in Xen dom0 with message like this:

    [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3

While the reason for this failure is unclear, the passthrough mode is
not really necessary in Xen dom0 anyway. So, to unbreak booting affected
kernels, disable passthrough mode in this case.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1985
Fixes: b818a5d374 ("drm/amdgpu/gmc: use PCI BARs for APUs in passthrough")
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-05-04 12:20:14 -04:00
Robin Murphy 392bf51946 iommu: Make sysfs robust for non-API groups
Groups created by VFIO backends outside the core IOMMU API should never
be passed directly into the API itself, however they still expose their
standard sysfs attributes, so we can still stumble across them that way.
Take care to consider those cases before jumping into our normal
assumptions of a fully-initialised core API group.

Fixes: 3f6634d997 ("iommu: Use right way to retrieve iommu_ops")
Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/86ada41986988511a8424e84746dfe9ba7f87573.1651667683.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-04 15:13:39 +02:00
Michael Ellerman 6d65028eb6 powerpc/vdso: Fix incorrect CFI in gettimeofday.S
As reported by Alan, the CFI (Call Frame Information) in the VDSO time
routines is incorrect since commit ce7d8056e3 ("powerpc/vdso: Prepare
for switching VDSO to generic C implementation.").

DWARF has a concept called the CFA (Canonical Frame Address), which on
powerpc is calculated as an offset from the stack pointer (r1). That
means when the stack pointer is changed there must be a corresponding
CFI directive to update the calculation of the CFA.

The current code is missing those directives for the changes to r1,
which prevents gdb from being able to generate a backtrace from inside
VDSO functions, eg:

  Breakpoint 1, 0x00007ffff7f804dc in __kernel_clock_gettime ()
  (gdb) bt
  #0  0x00007ffff7f804dc in __kernel_clock_gettime ()
  #1  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  #2  0x00007fffffffd960 in ?? ()
  #3  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  Backtrace stopped: frame did not save the PC

Alan helpfully describes some rules for correctly maintaining the CFI information:

  1) Every adjustment to the current frame address reg (ie. r1) must be
     described, and exactly at the instruction where r1 changes. Why?
     Because stack unwinding might want to access previous frames.

  2) If a function changes LR or any non-volatile register, the save
     location for those regs must be given. The CFI can be at any
     instruction after the saves up to the point that the reg is
     changed.
     (Exception: LR save should be described before a bl. not after)

  3) If asychronous unwind info is needed then restores of LR and
     non-volatile regs must also be described. The CFI can be at any
     instruction after the reg is restored up to the point where the
     save location is (potentially) trashed.

Fix the inability to backtrace by adding CFI directives describing the
changes to r1, ie. satisfying rule 1.

Also change the information for LR to point to the copy saved on the
stack, not the value in r0 that will be overwritten by the function
call.

Finally, add CFI directives describing the save/restore of r2.

With the fix gdb can correctly back trace and navigate up and down the stack:

  Breakpoint 1, 0x00007ffff7f804dc in __kernel_clock_gettime ()
  (gdb) bt
  #0  0x00007ffff7f804dc in __kernel_clock_gettime ()
  #1  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  #2  0x0000000100015b60 in gettime ()
  #3  0x000000010000c8bc in print_long_format ()
  #4  0x000000010000d180 in print_current_files ()
  #5  0x00000001000054ac in main ()
  (gdb) up
  #1  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  (gdb)
  #2  0x0000000100015b60 in gettime ()
  (gdb)
  #3  0x000000010000c8bc in print_long_format ()
  (gdb)
  #4  0x000000010000d180 in print_current_files ()
  (gdb)
  #5  0x00000001000054ac in main ()
  (gdb)
  Initial frame selected; you cannot go up.
  (gdb) down
  #4  0x000000010000d180 in print_current_files ()
  (gdb)
  #3  0x000000010000c8bc in print_long_format ()
  (gdb)
  #2  0x0000000100015b60 in gettime ()
  (gdb)
  #1  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  (gdb)
  #0  0x00007ffff7f804dc in __kernel_clock_gettime ()
  (gdb)

Fixes: ce7d8056e3 ("powerpc/vdso: Prepare for switching VDSO to generic C implementation.")
Cc: stable@vger.kernel.org # v5.11+
Reported-by: Alan Modra <amodra@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/r/20220502125010.1319370-1-mpe@ellerman.id.au
2022-05-04 22:12:10 +10:00
Haren Myneni 57831bfb5e powerpc/pseries/vas: Use QoS credits from the userspace
The user can change the QoS credits dynamically with the
management console interface which notifies OS with sysfs. After
returning from the OS interface successfully, the management
console updates the hypervisor. Since the VAS capabilities in
the hypervisor is not updated when the OS gets the update,
the kernel is using the old total credits value from the
hypervisor. Fix this issue by using the new QoS credits
from the userspace instead of depending on VAS capabilities
from the hypervisor.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/76d156f8af1e03cc09369d68e0bfad0c40031bcc.camel@linux.ibm.com
2022-05-04 22:00:47 +10:00
Shaik Sajida Bhanu 3e5a8e8494 mmc: sdhci-msm: Reset GCC_SDCC_BCR register for SDHC
Reset GCC_SDCC_BCR register before every fresh initilazation. This will
reset whole SDHC-msm controller, clears the previous power control
states and avoids, software reset timeout issues as below.

[ 5.458061][ T262] mmc1: Reset 0x1 never completed.
[ 5.462454][ T262] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 5.469065][ T262] mmc1: sdhci: Sys addr: 0x00000000 | Version: 0x00007202
[ 5.475688][ T262] mmc1: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000
[ 5.482315][ T262] mmc1: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000
[ 5.488927][ T262] mmc1: sdhci: Present: 0x01f800f0 | Host ctl: 0x00000000
[ 5.495539][ T262] mmc1: sdhci: Power: 0x00000000 | Blk gap: 0x00000000
[ 5.502162][ T262] mmc1: sdhci: Wake-up: 0x00000000 | Clock: 0x00000003
[ 5.508768][ T262] mmc1: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000
[ 5.515381][ T262] mmc1: sdhci: Int enab: 0x00000000 | Sig enab: 0x00000000
[ 5.521996][ T262] mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[ 5.528607][ T262] mmc1: sdhci: Caps: 0x362dc8b2 | Caps_1: 0x0000808f
[ 5.535227][ T262] mmc1: sdhci: Cmd: 0x00000000 | Max curr: 0x00000000
[ 5.541841][ T262] mmc1: sdhci: Resp[0]: 0x00000000 | Resp[1]: 0x00000000
[ 5.548454][ T262] mmc1: sdhci: Resp[2]: 0x00000000 | Resp[3]: 0x00000000
[ 5.555079][ T262] mmc1: sdhci: Host ctl2: 0x00000000
[ 5.559651][ T262] mmc1: sdhci_msm: ----------- VENDOR REGISTER DUMP-----------
[ 5.566621][ T262] mmc1: sdhci_msm: DLL sts: 0x00000000 | DLL cfg: 0x6000642c | DLL cfg2: 0x0020a000
[ 5.575465][ T262] mmc1: sdhci_msm: DLL cfg3: 0x00000000 | DLL usr ctl: 0x00010800 | DDR cfg: 0x80040873
[ 5.584658][ T262] mmc1: sdhci_msm: Vndr func: 0x00018a9c | Vndr func2 : 0xf88218a8 Vndr func3: 0x02626040

Fixes: 0eb0d9f4de ("mmc: sdhci-msm: Initial support for Qualcomm chipsets")
Signed-off-by: Shaik Sajida Bhanu <quic_c_sbhanu@quicinc.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Tested-by: Konrad Dybcio <konrad.dybcio@somainline.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1650816153-23797-1-git-send-email-quic_c_sbhanu@quicinc.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2022-05-04 12:31:55 +02:00
David S. Miller ad0724b90a mlx5-fixes-2022-05-03
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmJyJHcACgkQSD+KveBX
 +j60dQf/UFKW7VzUfUyT4+pgMZHdCQLsEkf7lgHzFjcWD3XaiBr7him3RIFl0ZrI
 0mD2AHA7YT6ePtub8Pjyqbfu4hC7rYWMDW6J2iqV7V05co/AVdWLFgxex1zEMhK8
 nD2f4nFkgs5rOeSn6tOj0ttSlXQOPmPPh5FfIGy2oiGTeNWOYG5URw1crOoNe+HG
 0Z4uAmQDXETFLgim+A8DDTrbjSY60EDT/qnpOhPISpE/EtiG8OCaPDtcsSEmF9re
 0LebIKUvx4X9fsHPvxyC0TfKvYpxSHkbz+VAx//O8+aGmHKgco07/3aeWM5Z9D/7
 ulbQJQTIwcsM3GwlAFygCjFWFqiLvQ==
 =gXIJ
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2022-05-03' of git://git.kernel.org/pub/scm/linux/kernel/g
it/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2022-05-03

This series provides bug fixes to mlx5 driver.
Please pull and let me know if there is any problem.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-04 10:51:05 +01:00
Samuel Holland e9f3fb523d mmc: sunxi-mmc: Fix DMA descriptors allocated above 32 bits
Newer variants of the MMC controller support a 34-bit physical address
space by using word addresses instead of byte addresses. However, the
code truncates the DMA descriptor address to 32 bits before applying the
shift. This breaks DMA for descriptors allocated above the 32-bit limit.

Fixes: 3536b82e58 ("mmc: sunxi: add support for A100 mmc controller")
Signed-off-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220424231751.32053-1-samuel@sholland.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2022-05-04 11:50:12 +02:00
Hector Martin 2ac2fab529 iommu/dart: Add missing module owner to ops structure
This is required to make loading this as a module work.

Signed-off-by: Hector Martin <marcan@marcan.st>
Fixes: 46d1fb072e ("iommu/dart: Add DART iommu driver")
Reviewed-by: Sven Peter <sven@svenpeter.dev>
Link: https://lore.kernel.org/r/20220502092238.30486-1-marcan@marcan.st
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-04 10:36:25 +02:00
Fabien Parent 841e512ffb drm/bridge: ite-it6505: add missing Kconfig option select
The IT6505 is using functions provided by the DRM_DP_HELPER driver.
In order to avoid having the bridge enabled but the helper disabled,
let's add a select in order to be sure that the DP helper functions are
always available.

Fixes: b5c84a9edc ("drm/bridge: add it6505 driver")
Signed-off-by: Fabien Parent <fparent@baylibre.com>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220426141536.274727-1-fparent@baylibre.com
2022-05-04 10:14:16 +02:00
Mark Bloch a042d7f5bb net/mlx5: Fix matching on inner TTC
The cited commits didn't use proper matching on inner TTC
as a result distribution of encapsulated packets wasn't symmetric
between the physical ports.

Fixes: 4c71ce50d2 ("net/mlx5: Support partial TTC rules")
Fixes: 8e25a2bc66 ("net/mlx5: Lag, add support to create TTC tables for LAG port selection")
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-04 00:00:07 -07:00