Commit graph

1217334 commits

Author SHA1 Message Date
Kent Overstreet d1b2c864e0 bcachefs: Defer full journal entry validation
On journal read, previously we would do full journal entry validation
immediately after reading a journal entry.

However, this would lead to errors for journal entries we weren't
actually going to use, either because they were too old or too new
(newer than the most recent flush).

We've observed write tearing on journal entries newer than the newest
flush - which makes sense, prior to a flush there's no guarantees about
write persistence.

This patch defers full journal entry validation until the end of the
journal read path, when we know which journal entries we'll want to use.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:44 -04:00
Kent Overstreet 17fe3b6452 bcachefs: Improve journal_entry_add()
Prep work for the next patch, to defer journal entry validation: we now
track for each replica whether we had a good checksum.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Daniel Hill bf8f8b20a1 bcachefs: time stats now uses the mean_and_variance module.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Daniel Hill 92095781e0 bcachefs: Mean and variance
This module provides a fast 64bit implementation of basic statistics
functions, including mean, variance and standard deviation in both
weighted and unweighted variants, the unweighted variant has a 32bit
limitation per sample to prevent overflow when squaring.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet 07bfcc0b4c bcachefs: Fix for not dropping privs in fallocate
When modifying a file, we may be required to drop the suid/sgid bits -
we were missing a file_modified() call to do this.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet 3a4d3656e5 bcachefs: Fix bch2_write_begin()
An error case was jumping to the wrong label, creating an infinite loop
- oops.

This fixes fstests generic/648.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet 40405557b9 fixup bcachefs: Deadlock cycle detector
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet 80df5b8cac fixup bcachefs: Deadlock cycle detector
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet 896f1b316f bcachefs: Fix lock_graph_remove_non_waiters()
We were removing 1 more entry than we were supposed to - oops.

Also some other simplifications and cleanups, and bring back the abort
preference code in a better fashion.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet 65ff2d3a7a bcachefs: Support FS_XFLAG_PROJINHERIT
We already have support for the flag's semantics: inode options are
inherited by children if they were explicitly set on the parent. This
patch just maps the FS_XFLAG_PROJINHERIT flag to the "this option was
epxlicitly set" bit.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet bf9cb250ed bcachefs: Don't allow hardlinks when inherited attrs would change
This is the right thing to do, and conforms with our own behaviour on
rename and xfs's behaviour on hardlink.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet f866870f5d bcachefs: Initialize sb_quota with default 1 week timer
For compliance with other quota implementations, we should be
initializing quota information with a default 1 week timelimit: this
fixes fstests generic/235.

Also, this adds to_text() functions for some quota structs - useful
debugging aids.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet de107dc800 bcachefs: Call bch2_btree_update_add_new_node() before dropping write lock
btree nodes can be written by other threads (shrinker, journal reclaim)
with only a read lock, but brand new nodes should only be written by the
thread doing the split/interior update. bch2_btree_update_add_new_node()
sets btree node flags to indicate that this is a new node and should not
be written out by other threads, thus we need to call it before dropping
our write lock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet e8540e5681 bcachefs: Reflink now respects quotas
This adds a new helper, quota_reserve_range(), which takes a quota
reservation for unallocated blocks in a given file range, and uses it in
bch2_remap_file_range().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet f42238b5cd bcachefs: Fix a rare path in bch2_btree_path_peek_slot()
In the drop_alloc tests, we may end up calling
bch2_btree_iter_peek_slot() on an interior level that doesn't exist.
Previously, this would hit the path->uptodate assertion in
bch2_btree_path_peek_slot(); this path first checks a NULL btree node,
which is how we know we're at the end of the btree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet 7dcbdbd85c bcachefs: bch2_path_put_nokeep()
The btree iterator code may allocate extra btree paths, temporarily,
that do not refer to keys being returned: we don't need to wait until
transaction restart to drop these, when they're not referenced they
should be deleted right away.

This fixes a transaction path overflow bug.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet 5b3243cb52 bcachefs: Fix cached data accounting
Negating without casting to a signed integer means the value wasn't
getting sign extended properly - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet 1f0f731ffe bcachefs: Btree splits now only take the locks they need
Previously, bch2_btree_update_start() would always take all intent
locks, all the way up to the root.

We've finally got data from users where this became a scalability issue
- so, this patch fixes bch2_btree_update_start() to only take the locks
we need.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet 969576ecae bcachefs: bch2_btree_iter_peek() now works with interior nodes
Needed by the next patch, which will be iterating over keys in nodes at
level 1.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet 1ff7849f3b bcachefs: bch2_btree_insert_node() no longer uses lock_write_nofail
Now that we have an error path plumbed through, there's no need to be
using bch2_btree_node_lock_write_nofail().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet a8eefbd324 bcachefs: Add error path to btree_split()
The next patch in the series is (finally!) going to change btree splits
(and interior updates in general) to not take intent locks all the way
up to the root - instead only locking the nodes they'll need to modify.

However, this will be introducing a race since if we're not holding a
write lock on a btree node it can be written out by another thread, and
then we might not have enough space for a new bset entry.

We can handle this by retrying - we just need to introduce a new error
path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet 8cbb000250 bcachefs: Write new btree nodes after parent update
In order to avoid locking all btree nodes up to the root for btree node
splits, we're going to have to introduce a new error path into
bch2_btree_insert_node(); this mean we can't have done any writes or
modified global state before that point.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet fe2de9a8dc bcachefs: Simplify break_cycle()
We'd like to prioritize aborting transactions that have done less work -
however, it appears breaking cycles by telling other threads to abort
may still be buggy, so disable that for now.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet 1148a97f1f bcachefs: Print cycle on unrecoverable deadlock
Some lock operations can't fail; a cycle of nofail locks is impossible
to recover from. So we want to get rid of these nofail locking
operations, but as this is tricky it'll be done incrementally.

If such a cycle happens, this patch prints out which codepaths are
involved so we know what to work on next.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet 1be887979b bcachefs: Handle dropping pointers in data_update path
Cached pointers are generally dropped, not moved: this led to an
assertion firing in the data update path when there were no new replicas
being written.

This path adds a data_options field for pointers to be dropped, and
tweaks move_extent() to check if we're only dropping pointers, not
writing new ones, before kicking off a data update operation.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet 160dff6dad bcachefs: Ratelimit ec error message
We should fix this, but for now this makes this more usable.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet 2da671dc4a bcachefs: Use btree_type_has_ptrs() more consistently
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet 6c22eb7085 bcachefs: Fix "multiple types of data in same bucket" with ec
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet 22f5162133 bcachefs: Ensure fsck error is printed before panic
When errors=panic, we want to make sure we print the error before
calling bch2_inconsistent_error().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet 8aaee94d46 bcachefs: Fix a deadlock in btree_update_nodes_written()
btree_node_lock_nopath() is something we'd like to get rid of, it's
always prone to deadlocks if we accidentally are holding other locks,
because it doesn't mark the lock it's taking in a path: we'll want to
get rid of it in the future, but for now this patch works it by calling
bch2_trans_unlock().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet 13bc41a715 bcachefs: bch2_trans_locked()
Useful debugging function.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet 40a44873a5 bcachefs: Improve btree_deadlock debugfs output
This changes bch2_check_for_deadlock() to print the longest chains it
finds - when we have a deadlock because the cycle detector isn't finding
something, this will let us see what it's missing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet 943f9946a6 bcachefs: Don't quash error in bch2_bucket_alloc_set_trans()
We were incorrectly returning -BCH_ERR_insufficient_devices when we'd
received a different error from bch2_bucket_alloc_trans(), which
(erronously) turns into -EROFS further up the call chain.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet 685e0f0c47 bcachefs: Fix a trans path overflow in bch2_btree_delete_range_trans()
bch2_btree_delete_range_trans() was using btree_trans_too_many_iters()
to avoid path overflow, but this was buggy here (and also
btree_trans_too_many_iters() is suspect in general).

btree_trans_too_many_iters() only returns true when we're close to the
maximum number of paths - within 8 - but extent insert/delete assumes
that it can use more paths than that.

Instead, we need to call bch2_trans_begin() on every loop iteration.
Since we don't want to call bch2_trans_begin() (restarting the outer
transaction) if the call was a no-op - if we had no work to do - we have
to structure things a bit oddly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet ae10fe017b bcachefs: bucket_alloc_state
This refactoring puts our various allocation path counters into a
dedicated struct - the upcoming nocow patch is going to add another
counter.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet 29cea6f483 bcachefs: Fix bch2_btree_path_up_until_good_node()
There was a rare bug when path->locks_want was nonzero, but not
BTREE_MAX_DEPTH, where we'd return on a valid node that wasn't locked -
oops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet e0eaf86259 bcachefs: Factor out bch2_write_drop_io_error_ptrs()
Move slowpath code to a separate, non-inline function.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet 99e2146bea bcachefs: Break out bch2_btree_path_traverse_cached_slowpath()
Prep work for further refactoring.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet 2d848dacb2 bcachefs: Kill io_in_flight semaphore
This used to be needed more for buffered IO, but now the block layer has
writeback throttling - we can delete this now.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet 68b6cd194a bcachefs: Improve bucket_alloc tracepoint
It now includes more info - whether the bucket was for metadata or data
- and also call it in the same place as the bucket_alloc_fail
tracepoint.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet c298fd7d34 bcachefs; Mark __bch2_trans_iter_init as inline
This function is fairly small and only used in two places: one very hot,
the other cold, so it should definitely be inlined.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet 25b4b3308e bcachefs: Inline fast path of check_pos_snapshot_overwritten()
This moves the slowpath of check_pos_snapshot_overwritten() to a
separate function, and inlines the fast path - helping performance on
btrees that don't use snapshot and for users that aren't using
snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet c23a9e0882 bcachefs: Improve jset_validate()
Previously, jset_validate() was formatting the initial part of an error
string for every entry it validating - expensive.

This moves that code to journal_entry_err_msg(), which is now only
called if there's an actual error.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet 3f3bc66ef0 bcachefs: Optimize btree_path_alloc()
- move slowpath code to a separate function, btree_path_overflow()
 - no need to use hweight64
 - copy nr_max_paths from btree_transaction_stats to btree_trans,
   avoiding a data dependency in the fast path

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet 14d8f26ad0 bcachefs: Inline bch2_trans_kmalloc() fast path
Small performance optimization.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet f3b8403ee7 bcachefs: Run bch2_fs_counters_init() earlier
We need counters to be initialized before initializing shrinkers - the
shrinker callbacks will update those counters. This fixes a segfault in
userspace.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet d704d62355 bcachefs: btree_err() now uses bch2_print_string_as_lines()
We've seen long error messages get truncated here, so convert to the new
bch2_print_string_as_lines().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet dbb9936b0d bcachefs: Improve bch2_fsck_err()
- factor out fsck_err_get()
 - if the "bcachefs (%s):" prefix has already been applied, don't
   duplicate it
 - convert to printbufs instead of static char arrays
 - tidy up control flow a bit
 - use bch2_print_string_as_lines(), to avoid messages getting truncated

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet a8f3542843 bcachefs: bch2_print_string_as_lines()
This adds a helper for printing a large buffer one line at a time, to
avoid the 1k printk limit.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet e9174370d0 bcachefs: bch2_btree_node_relock_notrace()
Most of the node_relock_fail trace events are generated from
bch2_btree_path_verify_level(), when debugcheck_iterators is enabled -
but we're not interested in these trace events, they don't indicate that
we're in a slowpath.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00