Commit graph

1215829 commits

Author SHA1 Message Date
Kent Overstreet 780c4e43f8 bcachefs: Drop a faulty assertion
This assertion was wrong for interior nodes (and wasn't terribly useful
to begin with)

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:33 -04:00
Kent Overstreet 309c54c3f4 bcachefs: Redo copygc throttling
The code that checked the current free space and waited if it was too
big was causing issues - btree node allocations do not increment the
write IO clock (perhaps they should); but more broadly the check
wouldn't run copygc at all until the device was mostly full, at which
point it might have to do a bunch of work.

This redoes that logic so that copygc starts to run earlier, smoothly
running more and more often as the device becomes closer to full.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:33 -04:00
Kent Overstreet 5873efbfd9 bcachefs: Make io timers less buggy
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:33 -04:00
Kent Overstreet 187c71f6ab bcachefs: Fix a memory splat
In __bch2_sb_field_resize, when a field's old a new size was 0, we were
doing an invalid write just past the end of the superblock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:33 -04:00
Kent Overstreet 22502ac23a bcachefs: Redo filesystem usage ioctls
When disk space accounting was changed to be tracked by replicas entry,
the ioctl interface was never update: this patch finally does that.

Aditionally, the BCH_IOCTL_USAGE ioctl is now broken out into separate
ioctls for filesystem and device usage.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:33 -04:00
Justin Husted 184b1dc1a6 bcachefs: Update directory timestamps during link
Timestamp updates on the directory during a link operation were cached.
This is inconsistent with other metadata operations such as rename, as
well as being less efficient.

Signed-off-by: Justin Husted <sigstop@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:33 -04:00
Kent Overstreet c45d473df7 bcachefs: Fix for an assertion on filesystem error
Normally the in memory i_size is always greater than or equal to i_size
on disk; this doesn't hold on filesystem error.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:33 -04:00
Kent Overstreet b5a5c4c103 bcachefs: Fix a null ptr deref in btree_iter_traverse_one()
When traversing nodes and we've reached the end of the btree, the
current btree node will be NULL.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:33 -04:00
Kent Overstreet ae2f17d5ad bcachefs: Kill btree_node_iter_large
Long overdue cleanup - this converts btree_node_iter_large uses to
sort_iter.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:32 -04:00
Kent Overstreet 8f82280ea3 bcachefs: Use one buffer for sorting whiteouts
We're not really supposed to allocate from the same mempool more than
once.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:32 -04:00
Kent Overstreet c297a763e2 bcachefs: Refactor whiteouts compaction
The whiteout compaction path - as opposed to just dropping whiteouts -
is now only needed for extents, and soon will only be needed for extent
btree nodes in the old format.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:32 -04:00
Kent Overstreet c9bebae65e bcachefs: Whiteout changes
More prep work for snapshots: extents will soon be using
KEY_TYPE_deleted for whiteouts, with 0 size. But we wen't be able to
keep these whiteouts with the rest of the extents in the btree node, due
to sorting invariants breaking.

We can deal with this by immediately moving the new whiteouts to the
unwritten whiteouts area - this just means those whiteouts won't be
sorted, so we need new code to sort them prior to merging them with the
rest of the keys to be written.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:32 -04:00
Kent Overstreet 183797e31d bcachefs: Always emit new extents on partial overwrite
This is prep work for snapshots: the algorithm in
bch2_extent_sort_fix_overlapping() will break when we have multiple
overlapping extents in unrelated snapshots - but, we'll be able to make
extents work like regular keys and use bch2_key_sort_fix_overlapping()
for extent btree nodes if we make a couple changes - the main one being
to always emit new extents when we partially overwrite an existing
(written) extent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:32 -04:00
Kent Overstreet c201e2d976 bcachefs: Fix bch2_verify_insert_pos()
We were calling __btree_node_key_to_offset() on a key that wasn't in the
btree node.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:32 -04:00
Kent Overstreet 07358a82bb bcachefs: Put inline data behind a mount option for now
Inline data extents + reflink is still broken

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:32 -04:00
Kent Overstreet ba239c954e bcachefs: bch2_check_set_feature()
New helper function for setting incompatible feature bits

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:32 -04:00
Kent Overstreet 9ba68f6cdc bcachefs: Switch to macro for bkey_ops
Older versions of gcc refuse to compile it the other way

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:32 -04:00
Kent Overstreet 5934a0caf2 bcachefs: bkey_on_stack_reassemble()
Small helper function.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:32 -04:00
Kent Overstreet bd7e82ee2a bcachefs: kill ca->freelist_lock
All uses were supposed to be switched over to c->freelist_lock

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:32 -04:00
Kent Overstreet 4de774952b bcachefs: Reorganize extents.c
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:32 -04:00
Kent Overstreet 4be1a412ea bcachefs: Inline data extents
This implements extents that have their data inline, in the value,
instead of the bkey value being pointers to the data - and the read and
write paths are updated to read from these new extent types and write
them out, when the write size is small enough.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:32 -04:00
Kent Overstreet 08c07fea7b bcachefs: Split out extent_update.c
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:32 -04:00
Kent Overstreet 085ab69357 bcachefs: Rework of cut_front & cut_back
This changes bch2_cut_front and bch2_cut_back so that they're able to
shorten the size of the value, and it also changes the extent update
path to update the accounting in the btree node when this happens.

When the size of the value is shortened, they zero out the space that's
no longer used, so it's interpreted as noops (as implemented in the last
patch).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:32 -04:00
Kent Overstreet ad44bdc351 bcachefs: bkey noops
For upcoming inline data extents, we're going to need to be able to
shorten the value of existing bkeys in the btree - and to make that work
we're going to be able to need to pad out the space the value previously
took up with something.

This patch changes the various code that iterates over bkeys to handle
k->u64s == 0 as meaning "skip the next 8 bytes".

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:32 -04:00
Kent Overstreet aef90ce085 bcachefs: kill bch2_extent_has_device()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:32 -04:00
Kent Overstreet 35189e09ab bcachefs: bkey_on_stack
This implements code for storing small bkeys on the stack and allocating
out of a mempool if they're too big.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:32 -04:00
Kent Overstreet 03c8c747a0 bcachefs: Make memcpy_to_bio() param const
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:32 -04:00
Kent Overstreet 50fe5bd69c bcachefs: Use wbc_to_write_flags()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:32 -04:00
Kent Overstreet c32bd3ad1f bcachefs: Fix erorr path in bch2_write()
The error path in bch2_write wasn't updated when the end_io callback was
added to bch_write_op.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:32 -04:00
Justin Husted b627c7d8f4 bcachefs: Set lost+found mode to 0700
For security and conformance with other filesystems, the lost+found
directory should not be world or group accessible.

Signed-off-by: Justin Husted <sigstop@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet 0897705163 bcachefs: Be slightly less tricky with union usage
This is to fix a valgrind complaint - the code was correct, but too
tricky for valgrind to know that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet f7f21ed382 bcachefs: Remove some BKEY_PADDED uses
Prep work for extents with inline data

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet b904a79918 bcachefs: Go back to 16 bit mantissa bkey floats
The previous optimizations means using 32 bit mantissas are now a net
loss - having bkey_float be only 4 bytes is good for prefetching.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet 58404bb236 bcachefs: Fall back to slowpath on exact comparison
This is basically equivalent to the original strategy of falling back to
checking against the original key when the original key and previous key
didn't differ in the required bits - except, now we only fall back when
the search key doesn't differ in the required bits, which ends up being
a bit faster.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet 1bdb67e8cb bcachefs: kill BFLOAT_FAILED_PREV
The assumption underlying BFLOAT_FAILED_PREV was wrong; the comparison
we're doing in bset_search_tree() doesn't have to tell the pivot apart
from the previous key, it just has to tell if search is definitely
greater than or equal to the pivot.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet 70438dc3f0 bcachefs: bch2_read_extent() microoptimizations
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet c45376866a bcachefs: Pipeline binary searches and linear searches
This makes prefetching for the linear search at the end of the lookup
much more effective, and is a couple percent speedup.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet fab4f8c653 bcachefs: Make __bch2_bkey_cmp_packed() smaller
We can probably get rid of the version that dispatches based on type
checking too.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet 6baf2730cc bcachefs: Inline fast path of bch2_increment_clock()
Shaving more cycles.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet f58c22e76f bcachefs: Avoid calling bch2_btree_iter_relock() in bch2_btree_iter_traverse()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet e2ee3eaab7 bcachefs: Add an option for fsck error ratelimiting
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet ef496cd268 bcachefs: Don't BUG_ON() sector count overflow
Return an error instead (still work in progress...)

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet 677fc0562a bcachefs: Some reflink fixes
len might fit into a loff_t when aligned_len does not - make sure we use
a u64 for aligned_len. Also, we weren't always extending the inode
correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet 4a1d8d3efc bcachefs: Fix setting of attributes mask in getattr
Discovered by xfstests generic/553

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet a023127a28 bcachefs: Eliminate function calls in DIO fastpaths
We can assume that usually buffered and O_DIRECT IO won't be mixed, and
the calls to flush the page cache won't be needed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet 54847d253a bcachefs: DIO write path only needs to shoot down pagecache once, not twice
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet 1b783a690d bcachefs: Add pagecache_add lock to buffered IO path, fault path
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Justin Husted 6d01598ecd bcachefs: Fix uninitialized field in hash_check_init()
The chain_end field was not initialized before use in
hash_set_chain_start.

Signed-off-by: Justin Husted <sigstop@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet 7edcfbfefe bcachefs: Don't hold inode lock longer than necessary in dio write path
In theory we should be able to do (non appending/extending) dio writes
without taking the inode lock at all - but this gets us most of the way
there.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00
Kent Overstreet f8f3086338 bcachefs: Avoid atomics in write fast path
This adds some horrible hacks, but the atomic ops for closures were
getting to be a pretty expensive part of the write path. We don't want
to rip out closures entirely from the write path, because they're used
for e.g. waiting on the allocator, or waiting on the journal flush, and
that stuff would get really ugly without closures.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:31 -04:00