Commit graph

1248717 commits

Author SHA1 Message Date
Valentin Obst 6269fadf35 rust: kernel: add blank lines in front of code blocks
Throughout the code base, blank lines are used before starting a code
block. Adapt outliers to improve consistency within the kernel crate.

Signed-off-by: Valentin Obst <kernel@valentinobst.de>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20240131-doc-fixes-v3-v3-9-0c8af94ed7de@valentinobst.de
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-02-18 21:22:27 +01:00
Valentin Obst af8b18d740 rust: kernel: mark code fragments in docs with backticks
Fix places where comments include code fragments that are not enclosed
in backticks.

Signed-off-by: Valentin Obst <kernel@valentinobst.de>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20240131-doc-fixes-v3-v3-8-0c8af94ed7de@valentinobst.de
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-02-18 21:22:27 +01:00
Valentin Obst ebf2b8a75a rust: kernel: unify spelling of refcount in docs
Replace instances of 'ref-count[ed]' with 'refcount[ed]' to increase
consistency within the Rust documentation. The latter form is used more
widely in the rest of the kernel:

```console
$ rg '(\*|//).*?\srefcount(|ed)[\s,.]' | wc -l
1605
$ rg '(\*|//).*?\sref-count(|ed)[\s,.]' | wc -l
43
```

(numbers are for commit 052d534373 ("Merge tag 'exfat-for-6.8-rc1'
of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat"))

Signed-off-by: Valentin Obst <kernel@valentinobst.de>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20240131-doc-fixes-v3-v3-7-0c8af94ed7de@valentinobst.de
[ Reworded to use the kernel's commit description style. ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-02-18 21:22:27 +01:00
Valentin Obst 4c62348d5b rust: str: move SAFETY comment in front of unsafe block
SAFETY comments should immediately precede the unsafe block they
justify. Move assignment to `bar` past comment as it is safe.

Signed-off-by: Valentin Obst <kernel@valentinobst.de>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20240131-doc-fixes-v3-v3-6-0c8af94ed7de@valentinobst.de
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-02-18 21:22:27 +01:00
Valentin Obst 8cfce47d75 rust: str: use NUL instead of 0 in doc comments
Throughout the module, bytes with the value zero are referred to as
`NUL` bytes. Adapt the only two outliers.

Signed-off-by: Valentin Obst <kernel@valentinobst.de>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20240131-doc-fixes-v3-v3-5-0c8af94ed7de@valentinobst.de
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-02-18 21:22:27 +01:00
Valentin Obst ed8596532a rust: kernel: add srctree-relative doclinks
Convert existing references to C header files to make use of
Commit bc2e7d5c29 ("rust: support `srctree`-relative links").

Signed-off-by: Valentin Obst <kernel@valentinobst.de>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20240131-doc-fixes-v3-v3-4-0c8af94ed7de@valentinobst.de
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-02-18 21:22:27 +01:00
Valentin Obst 16dca5d12e rust: ioctl: end top-level module docs with full stop
Every other module ends its first line of documentation with a full
stop. Adapt the only outlier.

Signed-off-by: Valentin Obst <kernel@valentinobst.de>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20240131-doc-fixes-v3-v3-3-0c8af94ed7de@valentinobst.de
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-02-18 21:22:27 +01:00
Valentin Obst 69d5fbb015 rust: error: improve unsafe code in example
The `from_err_ptr` function is safe. There is no need for the call to it
to be inside the unsafe block.

Reword the SAFETY comment to provide a better justification of why the
FFI call is safe.

Signed-off-by: Valentin Obst <kernel@valentinobst.de>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Link: https://lore.kernel.org/r/20240131-doc-fixes-v3-v3-2-0c8af94ed7de@valentinobst.de
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-02-18 21:22:27 +01:00
Valentin Obst b6cda913bb rust: kernel: fix multiple typos in documentation
Fixes multiple trivial typos in documentation and comments of the
kernel crate.

allocator:
- Fix a trivial list item alignment issue in the last SAFETY comment of
  `krealloc_aligned`.

init:
- Replace 'type' with 'trait' in the doc comments of the `PinInit` and
  `Init` traits.
- Add colons before starting lists.
- Add spaces between the type and equal sign to respect the code
  formatting rules in example code.
- End a sentence with a full stop instead of a colon.

ioctl:
- Replace 'an' with 'a' where appropriate.

str:
- Replace 'Return' with 'Returns' in the doc comment of `bytes_written`
  as the text describes what the function does.

sync/lock:
- Fix a trivial list item alignment issue in the Safety section of the
  `Backend` trait's description.

sync/lock/spinlock:
- The code in this module operates on spinlocks, not mutexes. Thus,
  replace 'mutex' with 'spinlock' in the SAFETY comment of `unlock`.

workqueue:
- Replace "wont" with "won't" in the doc comment of `__enqueue`.

Signed-off-by: Valentin Obst <kernel@valentinobst.de>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20240131-doc-fixes-v3-v3-1-0c8af94ed7de@valentinobst.de
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-02-18 21:22:27 +01:00
Mika Westerberg 789809a3d5 rust: bindings: Order headers alphabetically
As the comment on top of the file suggests, sort the headers
alphabetically.

No functional changes.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://github.com/Rust-for-Linux/linux/issues/1002
Link: https://lore.kernel.org/r/20240216152723.993445-1-mika.westerberg@linux.intel.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-02-18 21:21:48 +01:00
Alice Ryhl f090f0d0ee rust: sync: update integer types in CondVar
Reduce the chances of compilation failures due to integer type
mismatches in `CondVar`.

When an integer is defined using a #define in C, bindgen doesn't know
which integer type it is supposed to be, so it will just use `u32` by
default (if it fits in an u32). Whenever the right type is something
else, we insert a cast in Rust. However, this means that the code has a
lot of extra casts, and sometimes the code will be missing casts if u32
happens to be correct on the developer's machine, even though the type
might be something else on a different platform.

This patch updates all uses of such constants in
`rust/kernel/sync/condvar.rs` to use constants defined with the right
type. This allows us to remove various unnecessary casts, while also
future-proofing for the case where `unsigned int != u32` (even though
that is unlikely to ever happen in the kernel).

I wrote this patch at the suggestion of Benno in [1].

Link: https://lore.kernel.org/all/nAEg-6vbtX72ZY3oirDhrSEf06TBWmMiTt73EklMzEAzN4FD4mF3TPEyAOxBZgZtjzoiaBYtYr3s8sa9wp1uYH9vEWRf2M-Lf4I0BY9rAgk=@proton.me/ [1]
Suggested-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Tiago Lam <tiagolam@gmail.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20240108-rb-new-condvar-methods-v4-4-88e0c871cc05@google.com
[ Added note on the unlikeliness of `sizeof(int)` changing. ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-01-28 20:55:28 +01:00
Alice Ryhl e7b9b1ff1d rust: sync: add CondVar::wait_timeout
Sleep on a condition variable with a timeout.

This is used by Rust Binder for process freezing. There, we want to
sleep until the freeze operation completes, but we want to be able to
abort the process freezing if it doesn't complete within some timeout.

Note that it is not enough to avoid jiffies by introducing a variant of
`CondVar::wait_timeout` that takes the timeout in msecs because we need
to be able to restart the sleep with the remaining sleep duration if it
is interrupted, and if the API takes msecs rather than jiffies, then
that would require a conversion roundtrip jiffies->msecs->jiffies that
is best avoided.

Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Tiago Lam <tiagolam@gmail.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Benno Lossin <benno.lossin@proton.me>
Link: https://lore.kernel.org/r/20240108-rb-new-condvar-methods-v4-3-88e0c871cc05@google.com
[ Added `CondVarTimeoutResult` re-export and fixed typo. ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-01-28 20:54:35 +01:00
Alice Ryhl 82e1708748 rust: time: add msecs to jiffies conversion
Defines type aliases and conversions for msecs and jiffies.

This is used by Rust Binder for process freezing. There, we want to
sleep until the freeze operation completes, but we want to be able to
abort the process freezing if it doesn't complete within some timeout.
The freeze timeout is supplied in msecs.

Note that we need to convert to jiffies in Binder. It is not enough to
introduce a variant of `CondVar::wait_timeout` that takes the timeout in
msecs because we need to be able to restart the sleep with the remaining
sleep duration if it is interrupted, and if the API takes msecs rather
than jiffies, then that would require a conversion roundtrip jiffies->
msecs->jiffies that is best avoided.

Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Tiago Lam <tiagolam@gmail.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20240108-rb-new-condvar-methods-v4-2-88e0c871cc05@google.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-01-28 19:50:48 +01:00
Alice Ryhl 3e6454177f rust: sync: add CondVar::notify_sync
Wake up another thread synchronously.

This method behaves like `notify_one`, except that it hints to the
scheduler that the current thread is about to go to sleep, so it should
schedule the target thread on the same CPU.

This is used by Rust Binder as a performance optimization. When sending
a transaction to a different process, we usually know which thread will
handle it, so we can schedule that thread for execution next on this
CPU for better cache locality.

Reviewed-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Tiago Lam <tiagolam@gmail.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20240108-rb-new-condvar-methods-v4-1-88e0c871cc05@google.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-01-28 19:50:35 +01:00
Charalampos Mitrodimas 6b1b2326b2 rust: sync: CondVar rename "wait_list" to "wait_queue_head"
Fields named "wait_list" usually are of type "struct list_head". To
avoid confusion and because it is of type
"Opaque<bindings::wait_queue_head>" we are renaming "wait_list" to
"wait_queue_head".

Signed-off-by: Charalampos Mitrodimas <charmitro@posteo.net>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Benno Lossin <benno.lossin@proton.me>
Link: https://lore.kernel.org/r/20240105012930.1426214-1-charmitro@posteo.net
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-01-22 15:22:55 +01:00
Miguel Ojeda c5fed8ce65 rust: upgrade to Rust 1.75.0
This is the next upgrade to the Rust toolchain, from 1.74.1 to 1.75.0
(i.e. the latest) [1].

See the upgrade policy [2] and the comments on the first upgrade in
commit 3ed03f4da0 ("rust: upgrade to Rust 1.68.2").

# Unstable features

The `const_maybe_uninit_zeroed` unstable feature [3] was stabilized in
Rust 1.75.0, which we were using in the PHYLIB abstractions.

The only unstable features allowed to be used outside the `kernel` crate
are still `new_uninit,offset_of`, though other code to be upstreamed
may increase the list.

Please see [4] for details.

# Other improvements

Rust 1.75.0 stabilized `pointer_byte_offsets` [5] which we could
potentially use as an alternative for `ptr_metadata` in the future.

# Required changes

For this upgrade, no changes were required (i.e. on our side).

# `alloc` upgrade and reviewing

The vast majority of changes are due to our `alloc` fork being upgraded
at once.

There are two kinds of changes to be aware of: the ones coming from
upstream, which we should follow as closely as possible, and the updates
needed in our added fallible APIs to keep them matching the newer
infallible APIs coming from upstream.

Instead of taking a look at the diff of this patch, an alternative
approach is reviewing a diff of the changes between upstream `alloc` and
the kernel's. This allows to easily inspect the kernel additions only,
especially to check if the fallible methods we already have still match
the infallible ones in the new version coming from upstream.

Another approach is reviewing the changes introduced in the additions in
the kernel fork between the two versions. This is useful to spot
potentially unintended changes to our additions.

To apply these approaches, one may follow steps similar to the following
to generate a pair of patches that show the differences between upstream
Rust and the kernel (for the subset of `alloc` we use) before and after
applying this patch:

    # Get the difference with respect to the old version.
    git -C rust checkout $(linux/scripts/min-tool-version.sh rustc)
    git -C linux ls-tree -r --name-only HEAD -- rust/alloc |
        cut -d/ -f3- |
        grep -Fv README.md |
        xargs -IPATH cp rust/library/alloc/src/PATH linux/rust/alloc/PATH
    git -C linux diff --patch-with-stat --summary -R > old.patch
    git -C linux restore rust/alloc

    # Apply this patch.
    git -C linux am rust-upgrade.patch

    # Get the difference with respect to the new version.
    git -C rust checkout $(linux/scripts/min-tool-version.sh rustc)
    git -C linux ls-tree -r --name-only HEAD -- rust/alloc |
        cut -d/ -f3- |
        grep -Fv README.md |
        xargs -IPATH cp rust/library/alloc/src/PATH linux/rust/alloc/PATH
    git -C linux diff --patch-with-stat --summary -R > new.patch
    git -C linux restore rust/alloc

Now one may check the `new.patch` to take a look at the additions (first
approach) or at the difference between those two patches (second
approach). For the latter, a side-by-side tool is recommended.

Link: https://github.com/rust-lang/rust/blob/stable/RELEASES.md#version-1750-2023-12-28 [1]
Link: https://rust-for-linux.com/rust-version-policy [2]
Link: https://github.com/rust-lang/rust/issues/91850 [3]
Link: https://github.com/Rust-for-Linux/linux/issues/2 [4]
Link: https://github.com/rust-lang/rust/issues/96283 [5]
Reviewed-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Tested-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20231224172128.271447-1-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-01-22 15:18:05 +01:00
Linus Torvalds 6613476e22 Linux 6.8-rc1 2024-01-21 14:11:32 -08:00
Linus Torvalds 35a4474b5c More bcachefs updates for 6.7-rc1
- assorted prep work for disk space accounting rewrite
  - BTREE_TRIGGER_ATOMIC: after combining our trigger callbacks, this
    makes our trigger context more explicit
  - A few fixes to avoid excessive transaction restarts on multithreaded
    workloads: fstests (in addition to ktest tests) are now checking
    slowpath counters, and that's shaking out a few bugs
  - Assorted tracepoint improvements
  - Starting to break up bcachefs_format.h and move on disk types so
    they're with the code they belong to; this will make room to start
    documenting the on disk format better.
  - A few minor fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmWtjOsACgkQE6szbY3K
 bnbyXRAAsx+yM81TFqsLzRRqf8oocRwf2dj5XzExz9Ig/lYQS5LIVROS2OxwDsAc
 DeaYQSTcph9dkOswCrNR96bBnEgmmZ1ClfVI6WRXvm6vs4rjhSMNbNaVyySrMUVn
 5p/Lsn1/RKl0lWMYlHrdryo+106zRcr6z1Hiv9QCXkXhzdkV8wFYDkfbMveShUsu
 KobC29wvd2EfZr04nqsIXS/y/iRIXhtZqJmFCiAguN70UWrwUwArpELHI5Ve+WPZ
 9VjgFXW6Ka3QxJs/20tX+t24DrC+eDXR44DzQmxwG5mPBBpXkcSk5UgRw/EUag5U
 5+mDZQ5Ei3gvZvUwrilMosVy3pIw0IuvqeqwDGFoFXs1cce01QCMN+NG/dBTQw9i
 KGGxJw5sOrZ8fIiFnypk1M+r9NVtA8MjriLNR5bJjCWPSpWqzkT2HzxFXc6HmTZu
 vsE/AxwC1RLA6B2HZlDEqLOdHE3cofkDiIzWM5ABvb4p118iyk9hE6HhAufk5UdE
 HaG646kGB8pUY/sCxBIOD6K2pgthDFv+fftTM7X+uIazD3bovvPQCEInu48/KAHn
 /KmslSPO0txyjnRFMbXFJvd4Fgfo44GcBCeqGpy3B79aEJ3nroyRZ0qNnnsqj0Gl
 picUWjTn4W561Q1zBXuE/6cLWEp+sfaqYQcM8L3CCitRTVDPaCQ=
 =yd+F
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs

Pull more bcachefs updates from Kent Overstreet:
 "Some fixes, Some refactoring, some minor features:

   - Assorted prep work for disk space accounting rewrite

   - BTREE_TRIGGER_ATOMIC: after combining our trigger callbacks, this
     makes our trigger context more explicit

   - A few fixes to avoid excessive transaction restarts on
     multithreaded workloads: fstests (in addition to ktest tests) are
     now checking slowpath counters, and that's shaking out a few bugs

   - Assorted tracepoint improvements

   - Starting to break up bcachefs_format.h and move on disk types so
     they're with the code they belong to; this will make room to start
     documenting the on disk format better.

   - A few minor fixes"

* tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs: (46 commits)
  bcachefs: Improve inode_to_text()
  bcachefs: logged_ops_format.h
  bcachefs: reflink_format.h
  bcachefs; extents_format.h
  bcachefs: ec_format.h
  bcachefs: subvolume_format.h
  bcachefs: snapshot_format.h
  bcachefs: alloc_background_format.h
  bcachefs: xattr_format.h
  bcachefs: dirent_format.h
  bcachefs: inode_format.h
  bcachefs; quota_format.h
  bcachefs: sb-counters_format.h
  bcachefs: counters.c -> sb-counters.c
  bcachefs: comment bch_subvolume
  bcachefs: bch_snapshot::btime
  bcachefs: add missing __GFP_NOWARN
  bcachefs: opts->compression can now also be applied in the background
  bcachefs: Prep work for variable size btree node buffers
  bcachefs: grab s_umount only if snapshotting
  ...
2024-01-21 14:01:12 -08:00
Linus Torvalds 4fbbed7872 Updates for time and clocksources:
- A fix for the idle and iowait time accounting vs. CPU hotplug.
     The time is reset on CPU hotplug which makes the accumulated
     systemwide time jump backwards.
 
  - Assorted fixes and improvements for clocksource/event drivers
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmWtTLgTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoUXiD/4uN4Ntps8TwxSdg1X11M6++rizg9q9
 EmIfwWcfQQJDM5Ss5FE88ye55NxIOwJ1brYo08+yTAXjnnZ/yNP1BBegHbMNiGil
 NCHye7tYKZle25+hErdgfBB9n6brPz7dPOvV04/wRRWW+9p2ejt/5nEvojkyco9Y
 S9KgBCxkvUqScMbdKKFW1UsThWh2euxwQXRGiWhTPPkbKcVynPvQJjvVyRxn01NS
 eEhTn8YUNcAPT+1YApouGXrSCxo/IzBJ36CxOoCoUfaXcJ6FG1LLeAjNxKZ26Dfs
 Ah0e3Hhyv6KOsBvBNwwabXDwryd6L8rZd8yL2KakI1vIC51uS2wneFy8GCieDVGh
 xmy3U/tfkS0L7pmN+dQW2l4k9PHRNrwvbISKhs0UAHSOgGIMHZcjE6aFbYKru5i4
 1W+dEjiktlceZ94mrEHbLpKmxWH2z5P8m0BzUs4kt3nkaOf6CTUKqa/qdAiU5dv+
 lovKT26L8HBrMXf48I70UpgW/bYzOUGk55sR6hiLTXAelz1z02D1uYHFkshc0NCO
 /O4wvHcgvMM46CtWVbim42AlRcyyWCr+FrY+jvfiG2icOcHPLqc81iHL8EKj7pJl
 IxLgyPHVckgnE5gx+GQ8aDkg/qwCZnj4rFWgub8QMYtjI+pO+9T9kPAYPCxFhP7J
 gmcJxZAB2RnKXA==
 =RD6E
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2024-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
 "Updates for time and clocksources:

   - A fix for the idle and iowait time accounting vs CPU hotplug.

     The time is reset on CPU hotplug which makes the accumulated
     systemwide time jump backwards.

   - Assorted fixes and improvements for clocksource/event drivers"

* tag 'timers-core-2024-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick-sched: Fix idle and iowait sleeptime accounting vs CPU hotplug
  clocksource/drivers/ep93xx: Fix error handling during probe
  clocksource/drivers/cadence-ttc: Fix some kernel-doc warnings
  clocksource/drivers/timer-ti-dm: Fix make W=n kerneldoc warnings
  clocksource/timer-riscv: Add riscv_clock_shutdown callback
  dt-bindings: timer: Add StarFive JH8100 clint
  dt-bindings: timer: thead,c900-aclint-mtimer: separate mtime and mtimecmp regs
2024-01-21 11:14:40 -08:00
Linus Torvalds 7b297a5cc9 powerpc fixes for 6.8 #2
- 18f14afe28 powerpc/64s: Increase default stack size to 32KB BY: Michael Ellerman
 
 Thanks to:
 Michael Ellerman
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTYs9CDOrDQRwKRmtrJvCLnGrjHVgUCZayxkgAKCRDJvCLnGrjH
 Vv2hAQDwvyYydFw64D7bnaFJDLvOwi3SL02OBaFYV1JTr8rf/QEA8NcTuqXis5o5
 NedFYVE5PhYGWfyPD63aL+JpUKxsXwc=
 =Ud9v
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Aneesh Kumar:

 - Increase default stack size to 32KB for Book3S

Thanks to Michael Ellerman.

* tag 'powerpc-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: Increase default stack size to 32KB
2024-01-21 11:04:29 -08:00
Kent Overstreet 249f441f83 bcachefs: Improve inode_to_text()
Add line breaks - inode_to_text() is now much easier to read.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:11 -05:00
Kent Overstreet d826cc57c5 bcachefs: logged_ops_format.h
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:11 -05:00
Kent Overstreet 8d52ba60c4 bcachefs: reflink_format.h
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:11 -05:00
Kent Overstreet b2fa1b633b bcachefs; extents_format.h
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:11 -05:00
Kent Overstreet 0560eb9abf bcachefs: ec_format.h
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:11 -05:00
Kent Overstreet c6c4ff6507 bcachefs: subvolume_format.h
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:11 -05:00
Kent Overstreet 8fed323b14 bcachefs: snapshot_format.h
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Kent Overstreet d455179fce bcachefs: alloc_background_format.h
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Kent Overstreet 72e0801049 bcachefs: xattr_format.h
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Kent Overstreet 7ffc4daa5f bcachefs: dirent_format.h
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Kent Overstreet b36425da71 bcachefs: inode_format.h
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Kent Overstreet 82de6207fb bcachefs; quota_format.h
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Kent Overstreet 43314801a4 bcachefs: sb-counters_format.h
bcachefs_format.h has gotten too big; let's do some organizing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Kent Overstreet 3a58dfbc46 bcachefs: counters.c -> sb-counters.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Kent Overstreet 12207f49ef bcachefs: comment bch_subvolume
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Kent Overstreet d32088f2f2 bcachefs: bch_snapshot::btime
Add a field to bch_snapshot for creation time; this will be important
when we start exposing the snapshot tree to userspace.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Kent Overstreet 7be0208fc9 bcachefs: add missing __GFP_NOWARN
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Kent Overstreet d7e77f53e9 bcachefs: opts->compression can now also be applied in the background
The "apply this compression method in the background" paths now use the
compression option if background_compression is not set; this means that
setting or changing the compression option will cause existing data to
be compressed accordingly in the background.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Kent Overstreet ec4edd7b9d bcachefs: Prep work for variable size btree node buffers
bcachefs btree nodes are big - typically 256k - and btree roots are
pinned in memory. As we're now up to 18 btrees, we now have significant
memory overhead in mostly empty btree roots.

And in the future we're going to start enforcing that certain btree node
boundaries exist, to solve lock contention issues - analagous to XFS's
AGIs.

Thus, we need to start allocating smaller btree node buffers when we
can. This patch changes code that refers to the filesystem constant
c->opts.btree_node_size to refer to the btree node buffer size -
btree_buf_bytes() - where appropriate.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Su Yue 2acc59dd88 bcachefs: grab s_umount only if snapshotting
When I was testing mongodb over bcachefs with compression,
there is a lockdep warning when snapshotting mongodb data volume.

$ cat test.sh
prog=bcachefs

$prog subvolume create /mnt/data
$prog subvolume create /mnt/data/snapshots

while true;do
    $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s)
    sleep 1s
done

$ cat /etc/mongodb.conf
systemLog:
  destination: file
  logAppend: true
  path: /mnt/data/mongod.log

storage:
  dbPath: /mnt/data/

lockdep reports:
[ 3437.452330] ======================================================
[ 3437.452750] WARNING: possible circular locking dependency detected
[ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G            E
[ 3437.453562] ------------------------------------------------------
[ 3437.453981] bcachefs/35533 is trying to acquire lock:
[ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190
[ 3437.454875]
               but task is already holding lock:
[ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.456009]
               which lock already depends on the new lock.

[ 3437.456553]
               the existing dependency chain (in reverse order) is:
[ 3437.457054]
               -> #3 (&type->s_umount_key#48){.+.+}-{3:3}:
[ 3437.457507]        down_read+0x3e/0x170
[ 3437.457772]        bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.458206]        __x64_sys_ioctl+0x93/0xd0
[ 3437.458498]        do_syscall_64+0x42/0xf0
[ 3437.458779]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.459155]
               -> #2 (&c->snapshot_create_lock){++++}-{3:3}:
[ 3437.459615]        down_read+0x3e/0x170
[ 3437.459878]        bch2_truncate+0x82/0x110 [bcachefs]
[ 3437.460276]        bchfs_truncate+0x254/0x3c0 [bcachefs]
[ 3437.460686]        notify_change+0x1f1/0x4a0
[ 3437.461283]        do_truncate+0x7f/0xd0
[ 3437.461555]        path_openat+0xa57/0xce0
[ 3437.461836]        do_filp_open+0xb4/0x160
[ 3437.462116]        do_sys_openat2+0x91/0xc0
[ 3437.462402]        __x64_sys_openat+0x53/0xa0
[ 3437.462701]        do_syscall_64+0x42/0xf0
[ 3437.462982]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.463359]
               -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}:
[ 3437.463843]        down_write+0x3b/0xc0
[ 3437.464223]        bch2_write_iter+0x5b/0xcc0 [bcachefs]
[ 3437.464493]        vfs_write+0x21b/0x4c0
[ 3437.464653]        ksys_write+0x69/0xf0
[ 3437.464839]        do_syscall_64+0x42/0xf0
[ 3437.465009]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.465231]
               -> #0 (sb_writers#10){.+.+}-{0:0}:
[ 3437.465471]        __lock_acquire+0x1455/0x21b0
[ 3437.465656]        lock_acquire+0xc6/0x2b0
[ 3437.465822]        mnt_want_write+0x46/0x1a0
[ 3437.465996]        filename_create+0x62/0x190
[ 3437.466175]        user_path_create+0x2d/0x50
[ 3437.466352]        bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
[ 3437.466617]        __x64_sys_ioctl+0x93/0xd0
[ 3437.466791]        do_syscall_64+0x42/0xf0
[ 3437.466957]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.467180]
               other info that might help us debug this:

[ 3437.469670] 2 locks held by bcachefs/35533:
               other info that might help us debug this:

[ 3437.467507] Chain exists of:
                 sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48

[ 3437.467979]  Possible unsafe locking scenario:

[ 3437.468223]        CPU0                    CPU1
[ 3437.468405]        ----                    ----
[ 3437.468585]   rlock(&type->s_umount_key#48);
[ 3437.468758]                                lock(&c->snapshot_create_lock);
[ 3437.469030]                                lock(&type->s_umount_key#48);
[ 3437.469291]   rlock(sb_writers#10);
[ 3437.469434]
                *** DEADLOCK ***

[ 3437.469670] 2 locks held by bcachefs/35533:
[ 3437.469838]  #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs]
[ 3437.470294]  #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.470744]
               stack backtrace:
[ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G            E      6.7.0-rc7-custom+ #85
[ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 3437.471694] Call Trace:
[ 3437.471795]  <TASK>
[ 3437.471884]  dump_stack_lvl+0x57/0x90
[ 3437.472035]  check_noncircular+0x132/0x150
[ 3437.472202]  __lock_acquire+0x1455/0x21b0
[ 3437.472369]  lock_acquire+0xc6/0x2b0
[ 3437.472518]  ? filename_create+0x62/0x190
[ 3437.472683]  ? lock_is_held_type+0x97/0x110
[ 3437.472856]  mnt_want_write+0x46/0x1a0
[ 3437.473025]  ? filename_create+0x62/0x190
[ 3437.473204]  filename_create+0x62/0x190
[ 3437.473380]  user_path_create+0x2d/0x50
[ 3437.473555]  bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
[ 3437.473819]  ? lock_acquire+0xc6/0x2b0
[ 3437.474002]  ? __fget_files+0x2a/0x190
[ 3437.474195]  ? __fget_files+0xbc/0x190
[ 3437.474380]  ? lock_release+0xc5/0x270
[ 3437.474567]  ? __x64_sys_ioctl+0x93/0xd0
[ 3437.474764]  ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs]
[ 3437.475090]  __x64_sys_ioctl+0x93/0xd0
[ 3437.475277]  do_syscall_64+0x42/0xf0
[ 3437.475454]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.475691] RIP: 0033:0x7f2743c313af
======================================================

In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally
and unlock it at the end of the function. There is a comment
"why do we need this lock?" about the lock coming from
commit 42d237320e ("bcachefs: Snapshot creation, deletion")
The reason is that __bch2_ioctl_subvolume_create() calls
sync_inodes_sb() which enforce locked s_umount to writeback all dirty
nodes before doing snapshot works.

Fix it by read locking s_umount for snapshotting only and unlocking
s_umount after sync_inodes_sb().

Signed-off-by: Su Yue <glass.su@suse.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Su Yue 369acf97d6 bcachefs: kvfree bch_fs::snapshots in bch2_fs_snapshots_exit
bch_fs::snapshots is allocated by kvzalloc in __snapshot_t_mut.
It should be freed by kvfree not kfree.
Or umount will triger:

[  406.829178 ] BUG: unable to handle page fault for address: ffffe7b487148008
[  406.830676 ] #PF: supervisor read access in kernel mode
[  406.831643 ] #PF: error_code(0x0000) - not-present page
[  406.832487 ] PGD 0 P4D 0
[  406.832898 ] Oops: 0000 [#1] PREEMPT SMP PTI
[  406.833512 ] CPU: 2 PID: 1754 Comm: umount Kdump: loaded Tainted: G           OE      6.7.0-rc7-custom+ #90
[  406.834746 ] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[  406.835796 ] RIP: 0010:kfree+0x62/0x140
[  406.836197 ] Code: 80 48 01 d8 0f 82 e9 00 00 00 48 c7 c2 00 00 00 80 48 2b 15 78 9f 1f 01 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 56 9f 1f 01 <48> 8b 50 08 48 89 c7 f6 c2 01 0f 85 b0 00 00 00 66 90 48 8b 07 f6
[  406.837810 ] RSP: 0018:ffffb9d641607e48 EFLAGS: 00010286
[  406.838213 ] RAX: ffffe7b487148000 RBX: ffffb9d645200000 RCX: ffffb9d641607dc4
[  406.838738 ] RDX: 000065bb00000000 RSI: ffffffffc0d88b84 RDI: ffffb9d645200000
[  406.839217 ] RBP: ffff9a4625d00068 R08: 0000000000000001 R09: 0000000000000001
[  406.839650 ] R10: 0000000000000001 R11: 000000000000001f R12: ffff9a4625d4da80
[  406.840055 ] R13: ffff9a4625d00000 R14: ffffffffc0e2eb20 R15: 0000000000000000
[  406.840451 ] FS:  00007f0a264ffb80(0000) GS:ffff9a4e2d500000(0000) knlGS:0000000000000000
[  406.840851 ] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  406.841125 ] CR2: ffffe7b487148008 CR3: 000000018c4d2000 CR4: 00000000000006f0
[  406.841464 ] Call Trace:
[  406.841583 ]  <TASK>
[  406.841682 ]  ? __die+0x1f/0x70
[  406.841828 ]  ? page_fault_oops+0x159/0x470
[  406.842014 ]  ? fixup_exception+0x22/0x310
[  406.842198 ]  ? exc_page_fault+0x1ed/0x200
[  406.842382 ]  ? asm_exc_page_fault+0x22/0x30
[  406.842574 ]  ? bch2_fs_release+0x54/0x280 [bcachefs]
[  406.842842 ]  ? kfree+0x62/0x140
[  406.842988 ]  ? kfree+0x104/0x140
[  406.843138 ]  bch2_fs_release+0x54/0x280 [bcachefs]
[  406.843390 ]  kobject_put+0xb7/0x170
[  406.843552 ]  deactivate_locked_super+0x2f/0xa0
[  406.843756 ]  cleanup_mnt+0xba/0x150
[  406.843917 ]  task_work_run+0x59/0xa0
[  406.844083 ]  exit_to_user_mode_prepare+0x197/0x1a0
[  406.844302 ]  syscall_exit_to_user_mode+0x16/0x40
[  406.844510 ]  do_syscall_64+0x4e/0xf0
[  406.844675 ]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[  406.844907 ] RIP: 0033:0x7f0a2664e4fb

Signed-off-by: Su Yue <glass.su@suse.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Kent Overstreet 00fff4dd58 bcachefs: bios must be 512 byte algined
Fixes: 023f9ac9f7 bcachefs: Delete dio read alignment check
Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Colin Ian King aead3428e8 bcachefs: remove redundant variable tmp
The variable tmp is being assigned a value but it isn't being
read afterwards. The assignment is redundant and so tmp can be
removed.

Cleans up clang scan build warning:
warning: Although the value stored to 'ret' is used in the enclosing
expression, the value is never actually read from 'ret'
[deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Kent Overstreet b97de45365 bcachefs: Improve trace_trans_restart_relock
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Kent Overstreet 46bf2e9cc7 bcachefs: Fix excess transaction restarts in __bchfs_fallocate()
drop_locks_do() should not be used in a fastpath without first trying
the do in nonblocking mode - the unlock and relock will cause excessive
transaction restarts and potentially livelocking with other threads that
are contending for the same locks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Kent Overstreet 1a5039041b bcachefs: extents_to_bp_state
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Kent Overstreet ba96d36ca5 bcachefs: bkey_and_val_eq()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:09 -05:00
Kent Overstreet e6a2566f7a bcachefs: Better journal tracepoints
Factor out bch2_journal_bufs_to_text(), and use it in the
journal_entry_full() tracepoint; when we can't get a journal reservation
we need to know the outstanding journal entry sizes to know if the
problem is due to excessive flushing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:09 -05:00
Kent Overstreet 4ae016607b bcachefs: Print size of superblock with space allocated
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:09 -05:00
Kent Overstreet a6548c8b5e bcachefs: Avoid flushing the journal in the discard path
When issuing discards, we may need to flush the journal if there's too
many buckets that can't be discarded until a journal flush.

But the heuristic was bad; we should be comparing the number of buckets
that need to flushes against the number of free buckets, not the number
of buckets we saw.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:09 -05:00