Commit graph

1217038 commits

Author SHA1 Message Date
Kent Overstreet 7f5c5d20f0 bcachefs: Redo data_update interface
This patch significantly cleans up and simplifies the data_update
interface. Instead of only being able to specify a single pointer by
device to rewrite, we're now able to specify any or all of the pointers
in the original extent to be rewrited, as a bitmask.

data_cmd is no more: the various pred functions now just return true if
the extent should be moved/updated. All the data_update path does is
rewrite existing replicas, or add new ones.

This fixes a bug where with background compression on replicated
filesystems, where rebalance -> data_update would incorrectly drop the
wrong old replica, and keep trying to recompress an extent pointer and
each time failing to drop the right replica. Oops.

Now, the data update path doesn't look at the io options to decide which
pointers to keep and which to drop - it only goes off of the
data_update_options passed to it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet 47ab0c5f6a bcachefs: Fix bch2_check_alloc_key()
bch2_check_alloc_key() was failing to check buckets that didn't have
alloc keys yet (because they'd never been used) - they still need to be
added to the freespace btree.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet e34da43e33 bcachefs: Improve bch2_check_alloc_info
- In check_alloc_key(), previously we were re-initializing iterators
   for the need_discard and freespace btrees for every alloc key we
   checked. But this was causing us to redo lookups into the journal
   keys every time, since those lookups are cached in struct btree_iter.
   This initializes the iterators in bch2_check_alloc_info and passes
   them into check_alloc_key().

 - Make the looping more consistent/efficient in bch2_check_alloc_info()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet 22add2ec67 bcachefs: Use BTREE_INSERT_LAZY_RW in bch2_check_alloc_info()
This runs before we go rw for journal replay, but after we're allowed to
go rw. It might be time to consider killing BTREE_INSERT_LAZY_RW,
though.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet 3858536744 bcachefs: Bucket invalidate path improvements
- invalidate_one_bucket() now returns 1 when we don't have any buckets
   on this device to invalidate, ensuring we don't spin
 - the tracepoint invocation is moved to after the transaction commit,
   and we now include the number of cached sectors in the tracepoint

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet 962ad1a766 bcachefs: Don't BUG_ON() inode link count underflow
This switches that assertion to a bch2_trans_inconsistent() call, as it
should be.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet 7a47d0993b bcachefs: Always descend to leaf nodes it btree_gc
If a btree node is unreadable, it's the topology repair that fixes that
and it's kicked off by btree_gc, so btree_gc needs to touch every node
and very that they can be read.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Daniel Hill 58aaa0836b bcachefs: fix __dev_available().
__dev_available() now calculates available buckets correctly. Previously
it would almost always return 0 when we have cached data.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:34 -04:00
Kent Overstreet 2817d45381 bcachefs: Fix assertion in topology repair
If we were at the end of the node, when breaking out of the loop we'd
pop the assertion on line 446 when cur wasn't NULL.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet 5a3c24714c bcachefs: Make verbose option settable at runtime
-o verbose is very useful, and we're starting to use it more for runtime
debug statements - making it possible to enable at runtime is a no
brainer.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet 54feff0a7a bcachefs: Improve "copygc requested to run" error message
This improves the "copygc requested to run but no buckets found" to show
the device that requires copygc to be run on - we'll definitely need to
improve this more.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet c501fef6de bcachefs: Pull out data_update.c
This is the start of reorganizing the data IO paths. The plan is to also
break apart io.c into data_read.c and data_write.c, and migrate_write
will be renamed to the data_update path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:34 -04:00
Kent Overstreet 30f0349d62 bcachefs: Split out dev_buckets_free()
Previously, dev_buckets_available() only counted buckets that are
eligible to be allocated right now - i.e. buckets that don't have cached
data, or need discard, or need gc gens, etc.

But most users of this function want to know how many buckets are
eligible to be allocated from without moving data around - copygc,
allocator striping, which means we should be including cached data
buckets etc.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet 8f7f566f57 bcachefs: btree key cache pcpu freedlist
Originally, the btree key cache code would always allocate new entries
by reusing from the recently-freed list, if that list wasn't empty. But
that behaviour was dropped, for lock contention reasons.

But it seems that entries stranded on the freed list have been
contributing to some of our oom issues, because long running btree
transactions will prevent them from being freed.

This patch re-adds allocating from the freed list, but it also adds
percpu buffers to solve the lock contention issues - and the new percpu
freed lists will improve the evict paths, too.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet 7bb61e8c0e bcachefs: Make IO in flight by copygc/rebalance configurable
This adds a new option, move_bytes_in_flight, for configuring the amount
of IO in flight by copygc/rebalance - users with many devices in their
filesystem will want to increase this.

In the future we should be smarter about this, but this is an easy
improvement.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet b5f73fd79f bcachefs: Check for extents with too many ptrs
We have a hardcoded maximum on number of pointers in an extent that's
used by some other data structures - notably bch_devs_list - but we
weren't actually checking for it. Oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet 1c6ff39445 bcachefs: Fix refcount leak in bch2_do_invalidates()
If we fail to queue the work item because it's already in process, we
need to drop the ref we just took.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet a3d7afa5c1 bcachefs: Always use percpu_ref_tryget_live() on c->writes
If we're trying to get a ref and the refcount has been killed, it means
we're doing an emergency shutdown - we always want tryget_live().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet 23189da9eb bcachefs: Improve checksum error messages
We're seeing checksum errors in the bch2_rechecksum_bio() path - give it
a better error message to help track this down.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet 50b13beef0 bcachefs: Improve an error message
When inserting a key type that's not valid for a given btree, we should
print out which btree we were inserting into.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:34 -04:00
Kent Overstreet 2ed6248ab3 bcachefs: Fix assertion in bch2_dev_list_add_dev()
We were only allowing 4 devices in a dev_list, not 16.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet b7c1104612 bcachefs: Increase max size for btree_trans bump allocator
With backpointers, alloc keys have gotten bigger, so we're needing more
memory here.

We're probably going to need to go with something more sophisticated
than a bump allocator, but - let's see if we can avoid doing that just
yet.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet 6f44a9940c bcachefs: Add a persistent counter for bucket discards
Like the previous patch for bucket invalidates, add another counter for
a core allocator path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet c9bd67321e bcachefs: Fix btree node read retries
b->written wasn't being reset to 0 in the btree node read retry path,
causing decrypting & validation of previously read bsets to not be
re-run - ouch.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet 440c15cc91 bcachefs: Add a persistent counter for bucket invalidation
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet 0e96f5dcd7 bcachefs: Call bch2_do_invalidates() when going read write
Like bch2_do_discards(), we should check if this needs to be done when
going rw.

Also, add some sysfs code for debugging bucket invalidation.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet a5d18f9ec0 bcachefs: Improved human readable integer parsing
Printbufs recently switched to using string_get_size() for printing
integers in human readable units. This updates __bch2_strtoh() to parse
numbers printed by string_get_size() - we now have to handle floating
point numbers, and new unit suffixes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet df8c2ccb93 bcachefs: Fix freespace initialization
bch2_dev_freespace_init() was using __bch2_trans_do() incorrectly, and
calling bch2_bucket_do_index() with a stale alloc key.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet 401ec4db63 bcachefs: Printbuf rework
This converts bcachefs to the modern printbuf interface/implementation,
synced with the version to be submitted upstream.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:33 -04:00
Kent Overstreet 652018d661 bcachefs: Fix btree node read error path
We were forgetting to clear the read_in_flight flag - oops. This also
fixes it to not call bch2_fatal_error() before topology repair has had a
chance to do its thing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet 576179021c bcachefs: Fix btree_and_journal_iter
We had a bug where btree_and_journal_iter would return the same key
twice - after deleting it (perhaps because it was present in both the
btree and the journal?)

This reworks btree_and_journal_iter to track the current position, much
like btree_paths, which makes the logic considerably simpler and more
robust.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet f2aa026575 bcachefs: Fix for cmd_list_journal
cmd_list_journal wasn't correctly listing the most recent journal
entries as blacklisted - because in the recovery path when just reading
the journal, we were failing to add those to the blacklist table.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet cb685ce72c bcachefs: Also log overwrites in journal
Lately we've been doing a lot of debugging by looking at the journal to
see what was changed, and by what code path. This patch adds a new
journal entry type for recording overwrites, so that we don't have to
search backwards through the journal to see what was being overwritten
in order to work out what the triggers were supposed to be doing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet 43ddf44834 bcachefs: Refactor journal entry adding
This takes copying the payload out of bch2_journal_add_entry(), which
means we can use it for journal_transaction_name() - also prep work for
journalling overwrites.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet 4a7a7ea1f5 bcachefs: Add some missing error messages
bch2_opt_parse() was failing to generate error messages in error path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet 636d4eef1e bcachefs: Fix memory corruption in encryption path
When do_encrypt() was passed a vmalloc address and the buffer spanned
more than a single page, we were encrypting/decrypting completely
different pages than the ones intended.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet 0fbf71f80d bcachefs: bch2_trans_reset_updates()
Factor out a new helper.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet 9b688da350 bcachefs: Fix error checking in bch2_fs_alloc()
One of the init calls had a ; instead of a ?:, and errors after that got
dropped - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet c737267821 bcachefs: Print message on btree node read retry success
Right now, we print an error message on btree node read error, and we
print that we're retrying, but we don't explicitly say if the retry
succeeded - this makes things a little clearer.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet 30525f6863 bcachefs: Fix journal_keys_search() overhead
Previously, on every btree_iter_peek() operation we were searching the
journal keys, doing a full binary search - which was slow.

This patch fixes that by saving our position in the journal keys, so
that we only do a full binary search when moving our position backwards
or a large jump forwards.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet 11f5e595bf bcachefs: Always print when doing journal replay in fsck
This logging improvement helps see when the previous fsck pass has
completed.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Daniel Hill a8dea22703 bcachefs: Rename group to label for remaining strings.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:33 -04:00
Kent Overstreet c346def9af bcachefs: Fix encryption path on arm
flush_dcache_page() is not a noop on arm, but we were using
virt_to_page() instead of vmalloc_to_page() for an address on the kernel
stack - vmalloc memory, leading to an oops in flush_dcache_page().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet 232697ab9d bcachefs: Switch to key_type_user, not logon
The only difference key_type_logon and key_type_user is that
key_type_logon keys can't be read by userspace.

However, userspace has actually been adding keys to both the logon and
user keychains, because userspace fsck requires the keychain interface -
so we might as well just use user and drop the logon keychain.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:32 -04:00
Kent Overstreet d8a161ad54 bcachefs: LRU repair tweaks
- Drop old unneeded parameter for whether we're in initial GC - which
   was from when btree updates had to be done differently before we
   went RW.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:32 -04:00
Kent Overstreet facc81479c bcachefs: Delete bch_writepage
Per Dave Chinner and the xfs folks, .writepage is no longer needed, and
it's better not to define it if .writepages is the intended path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:32 -04:00
Brett Holman 372c11125a bcachefs: Make bch_option compatible with Rust ffi
Rust FFI lacks support for unnamed structs and unions. The space
saved in bch_option is not enough to be significant.

Signed-off-by: Brett Holman <bholman.devel@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:32 -04:00
Kent Overstreet ee4d17d032 bcachefs: Put btree_trans_verify_sorted() behind debug_check_iterators
This is pretty expensive, and we've tested sufficiently with it now that
it doesn't need to be on by default.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:32 -04:00
Kent Overstreet e320b42dfe bcachefs: Fix extent merging
When merging extents, we have to check that we won't overflow size
fields in any CRC entries - but the check for this was wrong, because in
the loop it was in we weren't keeping a pointer to the (packed, encoded)
CRC field.

Fix this by moving it to its own loop.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:32 -04:00
Kent Overstreet ae21f74e31 bcachefs: Improve invalid bkey error message
Bkeys have gotten a lot bigger since this code was written and now are
often formatted across multiple lines - while the reason a bkey is
invalid will still be short and fit on a single line. This patch prints
the error bfore the bkey, making it a bit more readable.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:32 -04:00