Commit graph

1308810 commits

Author SHA1 Message Date
Linus Torvalds f04ff5a02b five smb3 client fixes, and an immportant netfs fix for cifs write regression
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmb3FfgACgkQiiy9cAdy
 T1HLMQv/ZXkiNPvMmCHJE3rWdSBZQdGLDV1qOhxuW9y4CvenIhukwDmwOq7wjWOn
 3dQHDaqFkVTVurosozFOJK9Aw94iz2Ad9dlryMcNN+Gb4vY3d9l3AvsbqmgbZSsg
 DmdqOg1SA9NgDaHl6RFsFQQY9O5BsjkEeHBX71gZZUYMw1d6CTpFUT+wTD43L0LQ
 g8r0Ksil7edw9f5WGvu8YzB4rclR45QiTVG1OMgXmr43cvoJz3GrIPXeHNm7j5D7
 hzx6ELNviY77DPKnxSd55UGPngVms6c1qqWCOJMefsJRY5bhh3lEc+TQyX11HGft
 Kta1TQI2gI1xgueqpR2Dh/bUuprWcc6vCbNjxezXpFOiSMt6qNfGzQDE9gZAALAj
 568lRpwcMPgS9laqK4Sh9v5+Vw6E8T+FUAJKvtLRidv5iBoqI0+50mhHTBlBgiy6
 XdyiAdUGSzejcu/OiOGc9C4PvVRgf0qw9+hqqZeZAsRydKgw3QwPtJ6yW+RYu86p
 xQ5CE6zs
 =mpUs
 -----END PGP SIGNATURE-----

Merge tag '6.12rc-more-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull xmb client fixes from Steve French:

 - Noisy log message cleanup

 - Important netfs fix for cifs crash in generic/074

 - Three minor improvements to use of hashing (multichannel and mount
   improvements)

 - Fix decryption crash for large read with small esize

* tag '6.12rc-more-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: make SHA-512 TFM ephemeral
  smb: client: make HMAC-MD5 TFM ephemeral
  smb: client: stop flooding dmesg in smb2_calc_signature()
  smb: client: allocate crypto only for primary server
  smb: client: fix UAF in async decryption
  netfs: Fix write oops in generic/346 (9p) and generic/074 (cifs)
2024-09-28 08:30:27 -07:00
Kent Overstreet 3a5895e3ac bcachefs: check_subvol_path() now prints subvol root inode
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 22:32:23 -04:00
Kent Overstreet 0b0f0ad93c bcachefs: remove_backpointer() now checks if dirent points to inode
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 22:32:23 -04:00
Kent Overstreet a6508079b1 bcachefs: dirent_points_to_inode() now warns on mismatch
if an inode backpointer points to a dirent that doesn't point back,
that's an error we should warn about.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 22:32:23 -04:00
Alan Huang e057a290ef bcachefs: Fix lost wake up
If the reader acquires the read lock and then the writer enters the slow
path, while the reader proceeds to the unlock path, the following scenario
can occur without the change:

writer: pcpu_read_count(lock) return 1 (so __do_six_trylock will return 0)
reader: this_cpu_dec(*lock->readers)
reader: smp_mb()
reader: state = atomic_read(&lock->state) (there is no waiting flag set)
writer: six_set_bitmask()

then the writer will sleep forever.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 22:32:23 -04:00
Kent Overstreet d50d7a5fa4 bcachefs: Check for logged ops when clean
If we shut down successfully, there shouldn't be any logged ops to
resume.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 22:32:22 -04:00
Kent Overstreet 1c0ee43b2c bcachefs: BCH_FS_clean_recovery
Add a filesystem flag to indicate whether we did a clean recovery -
using c->sb.clean after we've got rw is incorrect, since c->sb is
updated whenever we write the superblock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 22:32:22 -04:00
Kent Overstreet 9773547b16 bcachefs: Convert disk accounting BUG_ON() to WARN_ON()
We had a bug where disk accounting keys didn't always have their version
field set in journal replay; change the BUG_ON() to a WARN(), and
exclude this case since it's now checked for elsewhere (in the bkey
validate function).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 22:32:22 -04:00
Kent Overstreet a3581ca35d bcachefs: Fix BCH_TRANS_COMMIT_skip_accounting_apply
This was added to avoid double-counting accounting keys in journal
replay. But applied incorrectly (easily done since it applies to the
transaction commit, not a particular update), it leads to skipping
in-mem accounting for real accounting updates, and failure to give them
a version number - which leads to journal replay becoming very confused
the next time around.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 22:32:20 -04:00
Kent Overstreet f8911ad88d bcachefs: Check for accounting keys with bversion=0
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:35 -04:00
Kent Overstreet cf49f8a8c2 bcachefs: rename version -> bversion
give bversions a more distinct name, to aid in grepping

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:35 -04:00
Kent Overstreet fd65378db9 bcachefs: Don't delete unlinked inodes before logged op resume
Previously, check_inode() would delete unlinked inodes if they weren't
on the deleted list - this code dating from before there was a deleted
list.

But, if we crash during a logged op (truncate or finsert/fcollapse) of
an unlinked file, logged op resume will get confused if the inode has
already been deleted - instead, just add it to the deleted list if it
needs to be there; delete_dead_inodes runs after logged op resume.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:35 -04:00
Kent Overstreet 8d65b15f8d bcachefs: Fix BCH_SB_ERRS() so we can reorder
BCH_SB_ERRS() has a field for the actual enum val so that we can reorder
to reorganize, but the way BCH_SB_ERR_MAX was defined didn't allow for
this.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:35 -04:00
Kent Overstreet 5612daafb7 bcachefs: Fix fsck warnings from bkey validation
__bch2_fsck_err() warns if the current task has a btree_trans object and
it wasn't passed in, because if it has to prompt for user input it has
to be able to unlock it.

But plumbing the btree_trans through bkey_validate(), as well as
transaction restarts, is problematic - so instead make bkey fsck errors
FSCK_AUTOFIX, which doesn't need to warn.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:35 -04:00
Kent Overstreet 7c980a43e9 bcachefs: Move transaction commit path validation to as late as possible
In order to check for accounting keys with version=0, we need to run
validation after they've been assigned version numbers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:35 -04:00
Kent Overstreet 431312b59c bcachefs: Fix disk accounting attempting to mark invalid replicas entry
This fixes the following bug, where a disk accounting key has an invalid
replicas entry, and we attempt to add it to the superblock:

bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): starting version 1.12: rebalance_work_acct_fix opts=metadata_replicas=2,data_replicas=2,foreground_target=ssd,background_target=hdd,nopromote_whole_extents,verbose,fsck,fix_errors=yes
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): recovering from clean shutdown, journal seq 15211644
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): accounting_read...
accounting not marked in superblock replicas
  replicas cached: 1/1 [0], fixing
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): sb invalid before write: Invalid superblock section replicas_v0: invalid device 0 in entry cached: 1/1 [0]
replicas_v0 (size 88):
user: 2 [3 5] user: 2 [1 4] cached: 1 [2] btree: 2 [1 2] user: 2 [2 5] cached: 1 [0] cached: 1 [4] journal: 2 [1 5] user: 2 [1 2] user: 2 [2 3] user: 2 [3 4] user: 2 [4 5] cached: 1 [1] cached: 1 [3] cached: 1 [5] journal: 2 [1 2] journal: 2 [2 5] btree: 2 [2 5] user: 2 [1 3] user: 2 [1 5] user: 2 [2 4]

bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): inconsistency detected - emergency read only at journal seq 15211644
accounting not marked in superblock replicas
  replicas user: 1/1 [3], fixing
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): sb invalid before write: Invalid superblock section replicas_v0: invalid device 0 in entry cached: 1/1 [0]
replicas_v0 (size 96):
user: 2 [3 5] user: 2 [1 3] cached: 1 [2] btree: 2 [1 2] user: 2 [2 4] cached: 1 [0] cached: 1 [4] journal: 2 [1 5] user: 1 [3] user: 2 [1 5] user: 2 [3 4] user: 2 [4 5] cached: 1 [1] cached: 1 [3] cached: 1 [5] journal: 2 [1 2] journal: 2 [2 5] btree: 2 [2 5] user: 2 [1 2] user: 2 [1 4] user: 2 [2 3] user: 2 [2 5]

accounting not marked in superblock replicas
  replicas user: 1/2 [3 7], fixing
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): sb invalid before write: Invalid superblock section replicas_v0: invalid device 7 in entry user: 1/2 [3 7]
replicas_v0 (size 96):
user: 2 [3 7] user: 2 [1 3] cached: 1 [2] btree: 2 [1 2] user: 2 [2 4] cached: 1 [0] cached: 1 [4] journal: 2 [1 5] user: 1 [3] user: 2 [1 5] user: 2 [3 4] user: 2 [4 5] cached: 1 [1] cached: 1 [3] cached: 1 [5] journal: 2 [1 2] journal: 2 [2 5] btree: 2 [2 5] user: 2 [1 2] user: 2 [1 4] user: 2 [2 3] user: 2 [2 5] user: 2 [3 5]

 done
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): alloc_read... done
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): stripes_read... done
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): snapshots_read... done

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:35 -04:00
Kent Overstreet 49fd90b2cc bcachefs: Fix unlocked access to c->disk_sb.sb in bch2_replicas_entry_validate()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:35 -04:00
Kent Overstreet 9104fc1928 bcachefs: Fix accounting read + device removal
accounting read was checking if accounting replicas entries were marked
in the superblock prior to applying accounting from the journal,
which meant that a recently removed device could spuriously trigger a
"not marked in superblocked" error (when journal entries zero out the
offending counter).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:35 -04:00
Kent Overstreet 1e0272ef47 bcachefs: bch_accounting_mode
Minor refactoring - replace multiple bool arguments with an enum; prep
work for fixing a bug in accounting read.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:35 -04:00
Kent Overstreet 3672bda8f5 bcachefs: fix transaction restart handling in check_extents(), check_dirents()
Dealing with outside state within a btree transaction is always tricky.

check_extents() and check_dirents() have to accumulate counters for
i_sectors and i_nlink (for subdirectories). There were two bugs:

- transaction commit may return a restart; therefore we have to commit
  before accumulating to those counters
- get_inode_all_snapshots() may return a transaction restart, before
  updating w->last_pos; then, on the restart,
  check_i_sectors()/check_subdir_count() would see inodes that were not
  for w->last_pos

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:35 -04:00
Kent Overstreet 22a507d68e bcachefs: kill inode_walker_entry.seen_this_pos
dead code

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:34 -04:00
Kent Overstreet b29c30ab48 bcachefs: Fix incorrect IS_ERR_OR_NULL usage
Returning a positive integer instead of an error code causes error paths
to become very confused.

Closes: syzbot+c0360e8367d6d8d04a66@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:34 -04:00
Hongbo Li dc5bfdf8ea bcachefs: fix the memory leak in exception case
The pointer clean points the memory allocated by kmemdup, when the
return value of bch2_sb_clean_validate_late is not zero. The memory
pointed by clean is leaked. So we should free it in this case.

Fixes: a37ad1a3ab ("bcachefs: sb-clean.c")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:34 -04:00
Hongbo Li 3125c95ea6 bcachefs: fast exit when darray_make_room failed
In downgrade_table_extra, the return value is needed. When it
return failed, we should exit immediately.

Fixes: 7773df19c3 ("bcachefs: metadata version bucket_stripe_sectors")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:34 -04:00
Kent Overstreet 951dd86e7c bcachefs: Fix iterator leak in check_subvol()
A couple small error handling fixes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:34 -04:00
Kent Overstreet 2a1df87346 bcachefs: Add snapshot to bch_inode_unpacked
this allows for various cleanups in fsck

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:34 -04:00
Diogo Jahchan Koike 40d40c6bea bcachefs: assign return error when iterating through layout
syzbot reported a null ptr deref in __copy_user [0]

In __bch2_read_super, when a corrupt backup superblock matches the
default opts offset, no error is assigned to ret and the freed superblock
gets through, possibly being assigned as the best sb in bch2_fs_open and
being later dereferenced, causing a fault. Assign EINVALID to ret when
iterating through layout.

[0]: https://syzkaller.appspot.com/bug?extid=18a5c5e8a9c856944876

Reported-by: syzbot+18a5c5e8a9c856944876@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=18a5c5e8a9c856944876
Signed-off-by: Diogo Jahchan Koike <djahchankoike@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:34 -04:00
Kent Overstreet c6040447c5 bcachefs: Fix srcu warning in check_topology
check_topology doesn't need the srcu lock and doesn't use normal btree
transactions - we can just drop the srcu lock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:34 -04:00
Kent Overstreet 18c520f408 bcachefs: Fix error path in check_dirent_inode_dirent()
fsck_err() jumps to the fsck_err label when bailing out; need to make
sure bp_iter was initialized...

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:34 -04:00
Piotr Zalewski 0696a18a8c bcachefs: memset bounce buffer portion to 0 after key_sort_fix_overlapping
Zero-initialize part of allocated bounce buffer which wasn't touched by
subsequent bch2_key_sort_fix_overlapping to mitigate later uinit-value
use KMSAN bug[1].

After applying the patch reproducer still triggers stack overflow[2] but
it seems unrelated to the uninit-value use warning. After further
investigation it was found that stack overflow occurs because KMSAN adds
too many function calls[3]. Backtrace of where the stack magic number gets
smashed was added as a reply to syzkaller thread[3].

It was confirmed that task's stack magic number gets smashed after the code
path where KSMAN detects uninit-value use is executed, so it can be assumed
that it doesn't contribute in any way to uninit-value use detection.

[1] https://syzkaller.appspot.com/bug?extid=6f655a60d3244d0c6718
[2] https://lore.kernel.org/lkml/66e57e46.050a0220.115905.0002.GAE@google.com
[3] https://lore.kernel.org/all/rVaWgPULej8K7HqMPNIu8kVNyXNjjCiTB-QBtItLFBmk0alH6fV2tk4joVPk97Evnuv4ZRDd8HB5uDCkiFG6u81xKdzDj-KrtIMJSlF6Kt8=@proton.me

Reported-by: syzbot+6f655a60d3244d0c6718@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6f655a60d3244d0c6718
Fixes: ec4edd7b9d ("bcachefs: Prep work for variable size btree node buffers")
Suggested-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Piotr Zalewski <pZ010001011111@proton.me>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:34 -04:00
Kent Overstreet 51b7cc7c0f bcachefs: Improve bch2_is_inode_open() warning message
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:34 -04:00
Kent Overstreet 4a8f8fafbd bcachefs: Add extra padding in bkey_make_mut_noupdate()
This fixes a kasan splat in propagate_key_to_snapshot_leaves() -
varint_decode_fast() does reads (that it never uses) up to 7 bytes past
the end of the integer.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:34 -04:00
Kent Overstreet f890c8513f bcachefs: Mark inode errors as autofix
Most or all errors will be autofix in the future, we're currently just
doing the ones that we know are well tested.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:34 -04:00
Dave Airlie e7268dd9bb amd-drm-fixes-6.12-2024-09-27:
amdgpu:
 - MES 12 fix
 - KFD fence sync fix
 - SR-IOV fixes
 - VCN 4.0.6 fix
 - SDMA 7.x fix
 - Bump driver version to note cleared VRAM support
 - SWSMU fix
 
 amdgpu:
 - CU occupancy logic fix
 - SDMA queue fix
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZvcUAQAKCRC93/aFa7yZ
 2M1SAQDloEHjf+AxBWT5GhrjPCAgmHPhwTCHF4ILZaS2UuIZ+QD/TU2p8cnzn7ZZ
 7eDeMYcZCeY7xI96SmxyAwPBbM70EAY=
 =DAqO
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-fixes-6.12-2024-09-27' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-fixes-6.12-2024-09-27:

amdgpu:
- MES 12 fix
- KFD fence sync fix
- SR-IOV fixes
- VCN 4.0.6 fix
- SDMA 7.x fix
- Bump driver version to note cleared VRAM support
- SWSMU fix

amdgpu:
- CU occupancy logic fix
- SDMA queue fix

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240927202819.2978109-1-alexander.deucher@amd.com
2024-09-28 08:42:53 +10:00
Andrey Shumilin 9cf14f5a27 fbdev: sisfb: Fix strbuf array overflow
The values of the variables xres and yres are placed in strbuf.
These variables are obtained from strbuf1.
The strbuf1 array contains digit characters
and a space if the array contains non-digit characters.
Then, when executing sprintf(strbuf, "%ux%ux8", xres, yres);
more than 16 bytes will be written to strbuf.
It is suggested to increase the size of the strbuf array to 24.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Andrey Shumilin <shum.sdl@nppct.ru>
Signed-off-by: Helge Deller <deller@gmx.de>
2024-09-28 00:42:11 +02:00
Linus Torvalds ad46e8f95e Power management fix for 6.12-rc1
Fix idle states enumeration in the intel_idle driver on platforms
 supporting multiple flavors of the C6 idle state (Artem Bityutskiy).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmb3EYUSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxGMYP/in7Rn12lx5fafFwpX2lWeRPeql3dMHL
 gRuj0vR5Ao5gXTCGVAfC707x92nXdXx8PK9TyWxYbX+ZmY/sjOZPp33Wd9R/sVeN
 rs6gsAis3symLBlSdBrDpB19lR/4frug0JglruC+G8Pf5OMdAiNcl5eEK0R/rCEv
 pKg8OYbEp460aoO0OD/BIBBY4K1jdFN7IOVbDI6TH3B1mupe8kg17sYjLXyqbddE
 eC3BKQAdmJMc5ibVLAWrwlyRgSgIhiwrYf9pehX4UHdmWcRX6EG2QdGK16+42KR3
 HzEFo6qRFs4LAAOFomnUOFBMZiQXzXbLeYVt1qNYnuJauk0V5BaypWwIjBgqopJ4
 3ysGc3A6Jksj0mvtPPqqKTn70GKv5xNmlVF2csO+a4kh5ycz7tdM5RgJ/wAOT0St
 Wx1yhxTzbVCRh5cGK2QkgB1M9LCtxIS6aNSpyCbE/s0u0D25ms70Zm229XElypP9
 R7umyUw9Biv1JspQld6+KXJh/LAnkFw+5oc4Y51P+fO2sEJ8DRIsaZ92UTb/F3Zg
 GT7mf8oryKZD0V6+/4LiIto6tzbOwde9TSpvF7R3u2uvgrSkCQuHdYSJp7qSzQ+K
 d1o8/4ARuaUXxF16fN5NXRjxrZA7cwLq83zIoSRSMU/wdpKb8GT+8ZNIxS+XuwOk
 MfXxFMytjm0C
 =jiG2
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.12-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
 "Fix idle states enumeration in the intel_idle driver on platforms
  supporting multiple flavors of the C6 idle state (Artem Bityutskiy)"

* tag 'pm-6.12-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  intel_idle: fix ACPI _CST matching for newer Xeon platforms
2024-09-27 13:30:07 -07:00
Zhang Qiao 95b873693a sched_ext: Remove redundant p->nr_cpus_allowed checker
select_rq_task() already checked that 'p->nr_cpus_allowed > 1',
'p->nr_cpus_allowed == 1' checker in scx_select_cpu_dfl() is redundant.

Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-27 10:23:45 -10:00
Tejun Heo efe231d9de sched_ext: Decouple locks in scx_ops_enable()
The enable path uses three big locks - scx_fork_rwsem, scx_cgroup_rwsem and
cpus_read_lock. Currently, the locks are grabbed together which is prone to
locking order problems.

For example, currently, there is a possible deadlock involving
scx_fork_rwsem and cpus_read_lock. cpus_read_lock has to nest inside
scx_fork_rwsem due to locking order existing in other subsystems. However,
there exists a dependency in the other direction during hotplug if hotplug
needs to fork a new task, which happens in some cases. This leads to the
following deadlock:

       scx_ops_enable()                               hotplug

                                          percpu_down_write(&cpu_hotplug_lock)
   percpu_down_write(&scx_fork_rwsem)
   block on cpu_hotplug_lock
                                          kthread_create() waits for kthreadd
					  kthreadd blocks on scx_fork_rwsem

Note that this doesn't trigger lockdep because the hotplug side dependency
bounces through kthreadd.

With the preceding scx_cgroup_enabled change, this can be solved by
decoupling cpus_read_lock, which is needed for static_key manipulations,
from the other two locks.

- Move the first block of static_key manipulations outside of scx_fork_rwsem
  and scx_cgroup_rwsem. This is now safe with the preceding
  scx_cgroup_enabled change.

- Drop scx_cgroup_rwsem and scx_fork_rwsem between the two task iteration
  blocks so that __scx_ops_enabled static_key enabling is outside the two
  rwsems.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Link: http://lkml.kernel.org/r/8cd0ec0c4c7c1bc0119e61fbef0bee9d5e24022d.camel@linux.ibm.com
2024-09-27 10:02:40 -10:00
Tejun Heo 160216568c sched_ext: Decouple locks in scx_ops_disable_workfn()
The disable path uses three big locks - scx_fork_rwsem, scx_cgroup_rwsem and
cpus_read_lock. Currently, the locks are grabbed together which is prone to
locking order problems. With the preceding scx_cgroup_enabled change, we can
decouple them:

- As cgroup disabling no longer requires modifying a static_key which
  requires cpus_read_lock(), no need to grab cpus_read_lock() before
  grabbing scx_cgroup_rwsem.

- cgroup can now be independently disabled before tasks are moved back to
  the fair class.

Relocate scx_cgroup_exit() invocation before scx_fork_rwsem is grabbed, drop
now unnecessary cpus_read_lock() and move static_key operations out of
scx_fork_rwsem. This decouples all three locks in the disable path.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Link: http://lkml.kernel.org/r/8cd0ec0c4c7c1bc0119e61fbef0bee9d5e24022d.camel@linux.ibm.com
2024-09-27 10:02:40 -10:00
Tejun Heo 568894edbe sched_ext: Add scx_cgroup_enabled to gate cgroup operations and fix scx_tg_online()
If the BPF scheduler does not implement ops.cgroup_init(), scx_tg_online()
didn't set SCX_TG_INITED which meant that ops.cgroup_exit(), even if
implemented, won't be called from scx_tg_offline(). This is because
SCX_HAS_OP(cgroupt_init) is used to test both whether SCX cgroup operations
are enabled and ops.cgroup_init() exists.

Fix it by introducing a separate bool scx_cgroup_enabled to gate cgroup
operations and use SCX_HAS_OP(cgroup_init) only to test whether
ops.cgroup_init() exists. Make all cgroup operations consistently use
scx_cgroup_enabled to test whether cgroup operations are enabled.
scx_cgroup_enabled is added instead of using scx_enabled() to ease planned
locking updates.

Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-27 10:02:40 -10:00
Tejun Heo 4269c603cc sched_ext: Enable scx_ops_init_task() separately
scx_ops_init_task() and the follow-up scx_ops_enable_task() in the fork path
were gated by scx_enabled() test and thus __scx_ops_enabled had to be turned
on before the first scx_ops_init_task() loop in scx_ops_enable(). However,
if an external entity causes sched_class switch before the loop is complete,
tasks which are not initialized could be switched to SCX.

The following can be reproduced by running a program which keeps toggling a
process between SCHED_OTHER and SCHED_EXT using sched_setscheduler(2).

  sched_ext: Invalid task state transition 0 -> 3 for fish[1623]
  WARNING: CPU: 1 PID: 1650 at kernel/sched/ext.c:3392 scx_ops_enable_task+0x1a1/0x200
  ...
  Sched_ext: simple (enabling)
  RIP: 0010:scx_ops_enable_task+0x1a1/0x200
  ...
   switching_to_scx+0x13/0xa0
   __sched_setscheduler+0x850/0xa50
   do_sched_setscheduler+0x104/0x1c0
   __x64_sys_sched_setscheduler+0x18/0x30
   do_syscall_64+0x7b/0x140
   entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fix it by gating scx_ops_init_task() separately using
scx_ops_init_task_enabled. __scx_ops_enabled is now set after all tasks are
finished with scx_ops_init_task().

Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-27 10:02:40 -10:00
Tejun Heo 9753358a6a sched_ext: Fix SCX_TASK_INIT -> SCX_TASK_READY transitions in scx_ops_enable()
scx_ops_enable() has two task iteration loops. The first one calls
scx_ops_init_task() on every task and the latter switches the eligible ones
into SCX. The first loop left the tasks in SCX_TASK_INIT state and then the
second loop switched it into READY before switching the task into SCX.

The distinction between INIT and READY is only meaningful in the fork path
where it's used to tell whether the task finished forking so that we can
tell ops.exit_task() accordingly. Leaving task in INIT state between the two
loops is incosistent with the fork path and incorrect. The following can be
triggered by running a program which keeps toggling a task between
SCHED_OTHER and SCHED_SCX while enabling a task:

  sched_ext: Invalid task state transition 1 -> 3 for fish[1526]
  WARNING: CPU: 2 PID: 1615 at kernel/sched/ext.c:3393 scx_ops_enable_task+0x1a1/0x200
  ...
  Sched_ext: qmap (enabling+all)
  RIP: 0010:scx_ops_enable_task+0x1a1/0x200
  ...
   switching_to_scx+0x13/0xa0
   __sched_setscheduler+0x850/0xa50
   do_sched_setscheduler+0x104/0x1c0
   __x64_sys_sched_setscheduler+0x18/0x30
   do_syscall_64+0x7b/0x140
   entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fix it by transitioning to READY in the first loop right after
scx_ops_init_task() succeeds.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>
2024-09-27 10:02:40 -10:00
Tejun Heo 8c2090c504 sched_ext: Initialize in bypass mode
scx_ops_enable() used preempt_disable() around the task iteration loop to
switch tasks into SCX to guarantee forward progress of the task which is
running scx_ops_enable(). However, in the gap between setting
__scx_ops_enabled and preeempt_disable(), an external entity can put tasks
including the enabling one into SCX prematurely, which can lead to
malfunctions including stalls.

The bypass mode can wrap the entire enabling operation and guarantee forward
progress no matter what the BPF scheduler does. Use the bypass mode instead
to guarantee forward progress while enabling.

While at it, release and regrab scx_tasks_lock between the two task
iteration locks in scx_ops_enable() for clarity as there is no reason to
keep holding the lock between them.

Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-27 10:02:40 -10:00
Tejun Heo fc1fcebead sched_ext: Remove SCX_OPS_PREPPING
The distinction between SCX_OPS_PREPPING and SCX_OPS_ENABLING is not used
anywhere and only adds confusion. Drop SCX_OPS_PREPPING.

Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-27 10:02:39 -10:00
Tejun Heo 1bbcfe620e sched_ext: Relocate check_hotplug_seq() call in scx_ops_enable()
check_hotplug_seq() is used to detect CPU hotplug event which occurred while
the BPF scheduler is being loaded so that initialization can be retried if
CPU hotplug events take place before the CPU hotplug callbacks are online.

As such, the best place to call it is in the same cpu_read_lock() section
that enables the CPU hotplug ops. Currently, it is called in the next
cpus_read_lock() block in scx_ops_enable(). The side effect of this
placement is a small window in which hotplug sequence detection can trigger
unnecessarily, which isn't critical.

Move check_hotplug_seq() invocation to the same cpus_read_lock() block as
the hotplug operation enablement to close the window and get the invocation
out of the way for planned locking updates.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>
2024-09-27 10:02:39 -10:00
Linus Torvalds 12cc5240f4 This pull request contains the following changes for UML:
- Removal of dead code (TT mode leftovers, etc.)
 - Fixes for the network vector driver
 - Fixes for time-travel mode
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmb3Bf8WHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wZLED/9xfTCB2l8gbiw+vslfXNRKdt6A
 jcqC4MfZ/dxkt5X2iy9pGROBl0dRe2WAbdSJIBIQikVDaTWg4Vix7WcnMs+FTXxk
 gGlJU8xx69mh7MWH0DopJrCHYX1SxA4bxve4J7iljTzuMleYhUUJ6bdqmG9XBRBN
 M1hYWpAodR/BPaDRZtLpzdYuuo3cZPM3ESpke5GupsFEWxqRZqvdlgwdsb9RfrOe
 3HvWZUMrw+hWJg1NkQ7+ljqPjoaafGu8/ic89r44bNtNQqN+o5b1v1E792e9qCJD
 1jRXTQMtTDH+ETgYskzWEIFyMQ4WlRy/N+mKKUHUJrTwAm76zdSpxmvU2Fh8cvLy
 ofWdbtqR127WnKii7UpZLf6kXzmC6pcmLbHU78PohoOnEjk4TeMjEKw6FrcSZY51
 wGgz29mLJOZ33mh7So37bRU/x5OKkq0u+BHyrhZYiHXdcBN8R5KBSJlWvl+A+A7y
 F5VpUvqAazc6H0HazZDWtoPDJ4HpbSEbH/8G4rR3IlZ4DmqyRuOr6f4AeiEPFz2n
 VNQVivgFL59zPflo8eWsLQvK8ZaZSop05RDYRk53uMooUZUNKvhTFmIRCb6bNFT/
 c4Ycoi3qa+YQQxSUEKmrE71dOYxdT1nHvl7YgR0BGABWYt2G1j/UyTjJMJ7L49Ws
 HGpAnI4FdPELu3qIVA==
 =DHl2
 -----END PGP SIGNATURE-----

Merge tag 'uml-for-linus-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux

Pull UML updates from Richard Weinberger:

 - Removal of dead code (TT mode leftovers, etc)

 - Fixes for the network vector driver

 - Fixes for time-travel mode

* tag 'uml-for-linus-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
  um: fix time-travel syscall scheduling hack
  um: Remove outdated asm/sysrq.h header
  um: Remove the declaration of user_thread function
  um: Remove the call to SUBARCH_EXECVE1 macro
  um: Remove unused mm_fd field from mm_id
  um: Remove unused fields from thread_struct
  um: Remove the redundant newpage check in update_pte_range
  um: Remove unused kpte_clear_flush macro
  um: Remove obsoleted declaration for execute_syscall_skas
  user_mode_linux_howto_v2: add VDE vector support in doc
  vector_user: add VDE support
  um: remove ARCH_NO_PREEMPT_DYNAMIC
  um: vector: Fix NAPI budget handling
  um: vector: Replace locks guarding queue depth with atomics
  um: remove variable stack array in os_rcv_fd_msg()
2024-09-27 12:48:48 -07:00
Amir Goldstein 0c33037c82 ovl: fix file leak in ovl_real_fdget_meta()
ovl_open_realfile() is wrongly called twice after conversion to
new struct fd.

Fixes: 88a2f6468d ("struct fd: representation change")
Reported-by: syzbot+d9efec94dcbfa0de1c07@syzkaller.appspotmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-27 12:38:47 -07:00
Linus Torvalds 34e1a5d43c Random number generator fixes for Linux 6.12-rc1.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmb26H8ACgkQSfxwEqXe
 A64wsRAAitdh0yY2dQh+jPa1gmZRS95V2J5CBlxtUHfVzjTYHlommqM8bd/Y1v2e
 B9Pr2q+b6AyNIBSKsHcNlL+rr7QgjCSC0yP4JIv7ulwpR4ds21eAVLLNCyAJxjb7
 4nl+7rEsP5QDs6qokSPgFkFr/5uaKzHJqhKixvLC0B0UkeQzc8moPXKto7aZL1JU
 h3iafedKxT23qkm8JmRuuhxPV832phnWEFCkXW5L+jlklET3k5hpvXonKp2yMpp5
 DHMe5Ml7hX+nQrXXVeZGb65mMn7DQQCz+i8OdT79YsBZlzb8wAYl9XXILPOOp1ni
 8R+MunaGV4djwMmQtfipQENClYR1MwENwxcUsI/zSmYNfy0wekvWrCZvcOj5rl5G
 UNJzGuCPeb+vkmBIPqe1WZs0BRyeV/zXXka94JkFGxoErymBOuFB+5+S7E8v5CCw
 xycWLSJdFbLSHOAMh+o6hmnprtsdbJz99lJu9VtfvwvHvuXdNji2w9rZz/PLJtT5
 bNkeqBCL049+NV2kY74E2upzfIJMkYLeuA3fS0iL4VFMCXoWx/MahMgQGHxF/Fhi
 OsAPfjROXijQl65lkzHMz6vAsOzZ4mMY0t6si3lMf8NCw+aVWH/rEfx8kwGITXIP
 7TMUOfETiPwhQw7YEFn9yVMxGDKXZjPgyRd+fRowiGX5kl7UVdw=
 =lhy+
 -----END PGP SIGNATURE-----

Merge tag 'random-6.12-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random

Pull more random number generator updates from Jason Donenfeld:

 - Christophe realized that the LoongArch64 instructions could be
   scheduled more similar to how GCC generates code, which Ruoyao
   implemented, for a 5% speedup from basically some rearrangements

 - An update to MAINTAINERS to match the right files

* tag 'random-6.12-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  LoongArch: vDSO: Tune chacha implementation
  MAINTAINERS: make vDSO getrandom matches more generic
2024-09-27 12:32:06 -07:00
Linus Torvalds 9c44575c78 bitmap-for-6.12
- switch all bitmamp APIs from inline to __always_inline from Brian Norris;
  - introduce GENMASK_U128() macro from Anshuman Khandual;
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmb22isACgkQsUSA/Tof
 vsie2gwAl3l5vye90xnD6N8wFmKBKAWXMn8Iby7JyM9gAn6j1QuE5AppS+3JtIpZ
 rPRSgFZIVPOgBtiKjb6zAWj7KbtCmaSW+L5ZVaLQ+vtwBVNpWIWHsHKu0uIpuugT
 3wp/IeaE92bc/mioqb27pj2Gnv+lzYBmbK7Mu08a3q1Adwv0I7BJ4GvqxN1lLAEW
 xrFB86xztqdV7QC45J7Q5nIyUw7UBYK078elQ8iKSj5BR8MeaEJiavETwx9DHgAO
 Z8cG94ek3IpvLpiexNcgG+FTezZj9PnTVHxry9o7CIctafiqjYqXAJ9gks1Q4QUu
 q1IjPAdueLTAMPkpK67sI3fwC6zPyX5d8DVDUTuA6qhCsMyHW687gTRy4LPR14LL
 gd1Tzg+J9DQ5KBoG4TYN/g5VoP1hkKQqpetaJhdPqmYocfmqZuzyItb+gBjhyvSp
 3YOgLg/4lULy3sZ6Qd/q8CWglWlaNYXXzf13H8f2qUpVx4NLTDOwjj/CVjZR/D0C
 wje/8XU3
 =8jNc
 -----END PGP SIGNATURE-----

Merge tag 'bitmap-for-6.12' of https://github.com/norov/linux

Pull bitmap updates from Yury Norov:

 - switch all bitmamp APIs from inline to __always_inline (Brian Norris)

   The __always_inline series improves on code generation, and now with
   the latest compiler versions is required to avoid compilation
   warnings. It spent enough in my backlog, and I'm thankful to Brian
   Norris for taking over and moving it forward.

 - introduce GENMASK_U128() macro (Anshuman Khandual)

   GENMASK_U128() is a prerequisite needed for arm64 development

* tag 'bitmap-for-6.12' of https://github.com/norov/linux:
  lib/test_bits.c: Add tests for GENMASK_U128()
  uapi: Define GENMASK_U128
  nodemask: Switch from inline to __always_inline
  cpumask: Switch from inline to __always_inline
  bitmap: Switch from inline to __always_inline
  find: Switch from inline to __always_inline
2024-09-27 12:10:45 -07:00
Linus Torvalds ba33a49fcd One bugfix patch, one preparation patch, and one conversion patch.
TOMOYO is useful as an analysis tool for learning how a Linux system works.
 My boss was hoping that SELinux's policy is generated from what TOMOYO has
 observed. A translated paper describing it is available at
 https://master.dl.sourceforge.net/project/tomoyo/docs/nsf2003-en.pdf/nsf2003-en.pdf?viasf=1 .
 Although that attempt failed due to mapping problem between inode and pathname,
 TOMOYO remains as an access restriction tool due to ability to write custom
 policy by individuals.
 
 I was delivering pure LKM version of TOMOYO (named AKARI) to users who
 cannot afford rebuilding their distro kernels with TOMOYO enabled. But
 since the LSM framework was converted to static calls, it became more
 difficult to deliver AKARI to such users. Therefore, I decided to update
 TOMOYO so that people can use mostly LKM version of TOMOYO with minimal
 burden for both distributors and users.
 
 Tetsuo Handa (3):
   tomoyo: preparation step for building as a loadable LSM module
   tomoyo: allow building as a loadable LSM module
   tomoyo: fallback to realpath if symlink's pathname does not exist
 
  security/tomoyo/Kconfig         |   15 ++++++++
  security/tomoyo/Makefile        |   10 ++++-
  security/tomoyo/common.c        |   14 ++++++-
  security/tomoyo/common.h        |   72 ++++++++++++++++++++++++++++++++++++++
  security/tomoyo/domain.c        |    9 +++-
  security/tomoyo/gc.c            |    3 +
  security/tomoyo/hooks.h         |  110 -----------------------------------------------------------
  security/tomoyo/init.c          |  366 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  security/tomoyo/load_policy.c   |   12 ++++++
  security/tomoyo/proxy.c         |   82 ++++++++++++++++++++++++++++++++++++++++++++
  security/tomoyo/securityfs_if.c |   12 ++++--
  security/tomoyo/util.c          |    3 -
  12 files changed, 585 insertions(+), 123 deletions(-)
 -----BEGIN PGP SIGNATURE-----
 
 iQJXBAABCABBFiEEQ8gzaWI9etOpbC/HQl8SjQxk9SoFAmb2iQUjHHBlbmd1aW4t
 a2VybmVsQGktbG92ZS5zYWt1cmEubmUuanAACgkQQl8SjQxk9Soxaw/+N3CEsVbs
 npXTNsj515QWSd66Eg8UkAeE2YmwTnxQ5ZRR83SCHf65MVhhkwIKZvFy8VcnJnBu
 Xsl9p2Vme8klopTUXmjABzba2OyfU4qwHCBv6dJehTgKNUW1NVCAgliS0C20k//j
 63iGAWJ1cjQl6fbMXFS/JW/KR49EJLMKtybVdH0ldwo+fk3YivQCalxzB9FlnqKA
 LrhLI2BPq3CMDlC64572XiGQrLY31PR7WDwAUux4CYjE/cMubLpRzL/NjgGO2DJs
 V6fXwqRqdywZhM2ltzbWiSz5US0hvWO0E5Ql3dKX180LLlVGlhs0eCiuZGwPUY+j
 vjZEB5ygjOI4SO4AfWxJgqprjDAEPzfyo7+txENwW8q4c/li9woY9B8ma7S1VMGQ
 6vKx2N8kw0bCZi6FkBu6ymc4sQJcG1acWJT/1IEbcfHOHDAEbmPeHAtvJsn/JPLN
 yiKCfCXM7ecNuCyRlMvDssIK3qRh+E2cFC0WVUQc9MEEkCYneGqU1R2lUKZHqEsN
 94kdhjiUroQZjcQVrtJwDv5Baf2OgMw7DGuhUi0CwKm8Hs7NH5lZ+i+xtvnZCdG7
 GoF62GKGEc91WN9OzPmvWRAKZkr+KrfNvVm/GJE2P6l0MaCgyDztrn7xduuSD/Rr
 sazREZUXqmqltyzgAisth4/BPkqvhCSPMEM=
 =7347
 -----END PGP SIGNATURE-----

Merge tag 'tomoyo-pr-20240927' of git://git.code.sf.net/p/tomoyo/tomoyo

Pull tomoyo updates from Tetsuo Handa:
 "One bugfix patch, one preparation patch, and one conversion patch.

  TOMOYO is useful as an analysis tool for learning how a Linux system
  works. My boss was hoping that SELinux's policy is generated from what
  TOMOYO has observed. A translated paper describing it is available at

    https://master.dl.sourceforge.net/project/tomoyo/docs/nsf2003-en.pdf/nsf2003-en.pdf?viasf=1

  Although that attempt failed due to mapping problem between inode and
  pathname, TOMOYO remains as an access restriction tool due to ability
  to write custom policy by individuals.

  I was delivering pure LKM version of TOMOYO (named AKARI) to users who
  cannot afford rebuilding their distro kernels with TOMOYO enabled. But
  since the LSM framework was converted to static calls, it became more
  difficult to deliver AKARI to such users. Therefore, I decided to
  update TOMOYO so that people can use mostly LKM version of TOMOYO with
  minimal burden for both distributors and users"

* tag 'tomoyo-pr-20240927' of git://git.code.sf.net/p/tomoyo/tomoyo:
  tomoyo: fallback to realpath if symlink's pathname does not exist
  tomoyo: allow building as a loadable LSM module
  tomoyo: preparation step for building as a loadable LSM module
2024-09-27 12:03:48 -07:00