linux

mirror of https://github.com/torvalds/linux synced 2024-10-15 15:59:15 +00:00

Author	SHA1	Message	Date
Kent Overstreet	c670509134	bcachefs: Allocator prefers not to expand mi.btree_allocated bitmap We now have a small bitmap in the member info section of the superblock for "regions that have btree nodes", so that if we ever have to scan for btree nodes in repair we don't have to scan the whole device(s). This tweaks the allocator to prefer allocating from regions that are already marked in this bitmap. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:24 -04:00
Kent Overstreet	40574946b8	bcachefs: Better bucket alloc tracepoints Tracepoints are garbage, and perf trace even cuts off some of our fields. Much nicer to just trace a string, and then we can build nicely formatted output with printbufs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:24 -04:00
Kent Overstreet	8783856ab1	bcachefs: ob_dev() Wrapper around bch2_dev_have_ref() for open_buckets; we do guarantee that the device an open_bucket points to exists. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	633cf06944	bcachefs: Kill bch2_dev_bkey_exists() in backpointer code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	4cd91e2f87	bcachefs: Convert to bch2_dev_tryget_noerror() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	f295298b8c	bcachefs: New helpers for device refcounts This will be used in the next patch for adding some new debug mode asserts. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	e98786ea85	bcachefs: bch2_print_allocator_stuck() If we block on the allocator for more than 10 seconds, print out some useful debugging info. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	c749541353	bcachefs: uninline set_btree_iter_dontneed() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:21 -04:00
Kent Overstreet	e7f63c67fc	bcachefs: plumb data_type into bch2_bucket_alloc_trans() prep work for making the allocator try to keep btree nodes within the existing member info btree allocated bitmap Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	103304021e	bcachefs: Move gc of bucket.oldest_gen to workqueue This is a nice cleanup - and we've also been having problems with kthread creation in the mount path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	ca563dccb2	bcachefs: bch2_trans_unlock() must always be followed by relock() or begin() We're about to add new asserts for btree_trans locking consistency, and part of that requires that aren't using the btree_trans while it's unlocked. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:19 -04:00
Kent Overstreet	2f724563fc	bcachefs: member helper cleanups Some renaming for better consistency bch2_member_exists -> bch2_member_alive bch2_dev_exists -> bch2_member_exists bch2_dev_exsits2 -> bch2_dev_exists bch_dev_locked -> bch2_dev_locked bch_dev_bkey_exists -> bch2_dev_bkey_exists new helper - bch2_dev_safe Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:19 -04:00
Kent Overstreet	5dd8c60e1e	bcachefs: iter/update/trigger/str_hash flag cleanup Combine iter/update/trigger/str_hash flags into a single enum, and x-macroize them for a to_text() function later. These flags are all for a specific iter/key/update context, so it makes sense to group them together - iter/update/trigger flags were already given distinct bits, this cleans up and unifies that handling. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	e2a316b3cc	bcachefs: BCH_WATERMARK_interior_updates This adds a new watermark, higher priority than BCH_WATERMARK_reclaim, for interior btree updates. We've seen a deadlock where journal replay triggers a ton of btree node merges, and these use up all available open buckets and then interior updates get stuck. One cause of this is that we're currently lacking btree node merging on write buffer btrees - that needs to be fixed as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-01 21:14:02 -04:00
Kent Overstreet	ec35b30481	bcachefs: Fix lost transaction restart error Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-18 00:24:23 -04:00
Kent Overstreet	f1ca1abfb0	bcachefs: pull out time_stats.[ch] prep work for lifting out of fs/bcachefs/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:30:35 -04:00
Kent Overstreet	e58f963cec	bcachefs: helpers for printing data types We need bounds checking since new versions may introduce new data types. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 06:01:45 -05:00
Kent Overstreet	07f383c71f	bcachefs: btree_iter -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	41b84fb489	bcachefs: for_each_member_device_rcu() now declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	6d5c606c1c	bcachefs: use track_event_change() for allocator blocked stats Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	037a2d9f48	bcachefs: simplify bch_devs_list Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	a0acc24fed	bcachefs: Fix open coded set_btree_iter_dontneed() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Kent Overstreet	48dade8176	bcachefs: ONLY_SPECIFIED_DEVS doesn't mean ignore durability anymore Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Kent Overstreet	247ce5f1bb	bcachefs: Fix bch2_alloc_sectors_start_trans() error handling When we fail to allocate because of insufficient open buckets, we don't want to retry from the full set of devices - we just want to retry in blocking mode. But if the retry in blocking mode fails with a different error code, we end up squashing the -BCH_ERR_open_buckets_empty error with an error that makes us thing we won't be able to allocate (insufficient_devices) - which is incorrect when we didn't try to allocate from the full set of devices, and causes the write to fail. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-12-19 19:01:52 -05:00
Kent Overstreet	0af8a06a4c	bcachefs: deallocate_extra_replicas() When allocating from devices with different durability, we might end up with more replicas than required; this changes bch2_alloc_sectors_start() to check for this, and drop replicas that aren't needed to hit the number of replicas requested. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-24 03:03:47 -05:00
Kent Overstreet	6201d91ee3	bcachefs: Put erasure coding behind an EXPERIMENTAL kconfig option We still have disk space accounting changes coming for erasure coding, and the changes won't be as strictly backwards compatible as they'd ought to be - specifically, we need to start accounting striped data under a separate counter in bch_alloc (which describes buckets). A fsck will suffice for upgrading/downgrading, but since erasure coding is the most incomplete major feature of bcachefs it still makes sense to put behind a separate kconfig option, so that users are fully aware. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-24 00:29:58 -05:00
Brian Foster	e0fb0dccfd	bcachefs: update alloc cursor in early bucket allocator A recent bug report uncovered a scenario where a filesystem never runs with freespace_initialized, and therefore the user observes significantly degraded write performance by virtue of running the early bucket allocator. The associated bug aside, the primary cause of the performance drop in this particular instance is that the early bucket allocator does not update the allocation cursor. This means that every allocation walks the alloc btree from the first bucket of the associated device looking for a bucket marked as free space. Update the early allocator code to set the alloc cursor to the last processed position in the tree, similar to how the freelist allocator behaves. With the alloc_cursor being updated, the retry logic also needs to be updated to restart from the beginning of the device when a free bucket is not available between the cursor and the end of the device. Track the restart position in a first_bucket variable to make the code a bit more easily readable and consistent with the freelist allocator. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 22:19:13 -04:00
Brian Foster	385a82f62a	bcachefs: serialize on cached key in early bucket allocator bcachefs had a transient bug where freespace_initialized was not properly being set, which lead to unexpected use of the early bucket allocator at runtime. This issue has been fixed, but the existence of it uncovered a coherency issue in the early bucket allocation code that is somewhat related to how uncached iterators deal with the key cache. The problem itself manifests as occasional failure of generic/113 due to corruption, often seen as a duplicate backpointer or multiple data types per-bucket error. The immediate cause of the error is a racing bucket allocation along the lines of the following sequence: - Task 1 selects key A in bch2_bucket_alloc_early() and schedules. - Task 2 selects the same key A, but proceeds to complete the allocation and associated I/O, after which it releases the open_bucket. - Task 1 resumes with key A, but does not recognize the bucket is now allocated because the open_bucket has been removed from the hash when it was released in the previous step. This generally shouldn't happen because the allocating task updates the alloc btree key before releasing the bucket. This is not sufficient in this particular instance, however, because an uncached iterator for a cached btree doesn't actually lock the key cache slot when no key exists for a given slot in the cache. Thus the fact that the allocation side updates the cached key means that multiple uncached iters can stumble across the same alloc key and duplicate the bucket allocation as described above. This is something that probably needs a longer term fix in the iterator code. As a short term fix, close the race through explicit use of a cached iterator for likely allocation candidates. We don't want to scan the btree with a cached iterator because that would unnecessarily pollute the cache. This mitigates cache pollution by primarily scanning the tree with an uncached iterator, but closes the race by creating a key cache entry for any prospective slot prior to the bucket allocation attempt (also similar to how _alloc_freelist() works via try_alloc_bucket()). This survives many iterations of generic/113 on a kernel hacked to always use the early bucket allocator. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 22:19:13 -04:00
Kent Overstreet	6bd68ec266	bcachefs: Heap allocate btree_trans We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocatation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Kent Overstreet	96dea3d599	bcachefs: Fix W=12 build errors Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Kent Overstreet	1809b8cba7	bcachefs: Break up io.c More reorganization, this splits up io.c into - io_read.c - io_misc.c - fallocate, fpunch, truncate - io_write.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	73ded163e5	bcachefs: Add a comment for should_drop_open_bucket() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:10 -04:00
Kent Overstreet	e6375481c9	bcachefs: Improve bch2_write_points_to_text() Now we also print the open_buckets owned by each write_point - this is to help with debugging a shutdown hang. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:10 -04:00
Kent Overstreet	bf5a261c7a	bcachefs: Assorted fixes for clang clang had a few more warnings about enum conversion, and also didn't like the opts.c initializer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:09 -04:00
Kent Overstreet	e8ee5cc733	bcachefs: Fix try_decrease_writepoints() We were freeing open buckets on the writepoint list, but forgetting to take them off the writepoint list - whoops Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:06 -04:00
Kent Overstreet	067d228bb0	bcachefs: Enumerate recovery passes Recovery and fsck have many different passes/jobs to do, which always run in the same order - but not all of them run all the time. Some are for fsck, some for unclean shutdown, some for version upgrades. This adds some new structure: a defined list of recovery passes that we can run in a loop, as well as consolidating the log messages. The main benefit is consolidating the "should run this recovery pass" logic, as well as cleaning up the "this recovery pass has finished" state; instead of having a bunch of ad-hoc state bits in c->flags, we've now got c->curr_recovery_pass. By consolidating the "should run this recovery pass" logic, in the future on disk format upgrades will be able to say "upgrading to this version requires x passes to run", instead of forcing all of fsck to run. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:06 -04:00
Kent Overstreet	494036d862	bcachefs: BCH_WATERMARK_reclaim Add another watermark for journal reclaim - this is needed for the next patches, that unify BCH_WATERMARK with JOURNAL_WATERMARK. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:05 -04:00
Kent Overstreet	e53a961c6b	bcachefs: Rename enum alloc_reserve -> bch_watermark This is prep work for consolidating with JOURNAL_WATERMARK. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Kent Overstreet	995f9128e0	bcachefs: Fix try_decrease_writepoints() - We may need to drop btree locks before taking the writepoint_lock, as is done in other places. - We should be using open_bucket_free_unused(), so that we don't waste space. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Mikulas Patocka	954ed17e02	bcachefs: fix NULL pointer dereference in try_alloc_bucket On Mon, 29 May 2023, Mikulas Patocka wrote: > The oops happens in set_btree_iter_dontneed and it is caused by the fact > that iter->path is NULL. The code in try_alloc_bucket is buggy because it > sets "struct btree_iter iter = { NULL };" and then jumps to the "err" > label that tries to dereference values in "iter". Here I'm sending a patch for it. From: Mikulas Patocka <mpatocka@redhat.com> The function try_alloc_bucket sets the variable "iter" to NULL and then (on various error conditions) jumps to the label "err". On the "err" label, it calls "set_btree_iter_dontneed" that tries to dereference "iter->trans" and "iter->path". So, we get an oops on error condition. This patch fixes the crash by testing that iter.trans and iter.path is non-zero before calling set_btree_iter_dontneed. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Brian Foster	a1dd428b8b	bcachefs: push rcu lock down into bch2_target_to_mask() We have one caller that cycles the rcu lock solely for this call (via target_rw_devs()), and we'd like to add another. Simplify things by pushing the rcu lock down into bch2_target_to_mask(), similar to how bch2_dev_in_target() works. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	bcb79a51cb	bcachefs: bch2_bkey_get_iter() helpers Introduce new helpers for a common pattern: bch2_trans_iter_init(); bch2_btree_iter_peek_slot(); - bch2_bkey_get_iter_type() returns -ENOENT if it doesn't find a key of the correct type - bch2_bkey_get_val_typed() copies the val out of the btree to a (typically stack allocated) variable; it handles the case where the value in the btree is smaller than the current version of the type, zeroing out the remainder. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:00 -04:00
Kent Overstreet	62a03559d6	bcachefs: Rip out code for storing backpointers in alloc keys We don't store backpointers in alloc keys anymore, since we gained the btree write buffer. This patch drops support for backpointers in alloc keys, and revs the on disk format version so that we know a fsck is required. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	b40901b0f7	bcachefs: New erasure coding shutdown path This implements a new shutdown path for erasure coding, which is needed for the upcoming BCH_WRITE_WAIT_FOR_EC write path. The process is: - Cancel new stripes being built up - Close out/cancel open buckets on write points or the partial list that are for stripes - Shutdown rebalance/copygc - Then wait for in flight new stripes to finish With BCH_WRITE_WAIT_FOR_EC, move ops will be waiting on stripes to fill up before they complete; the new ec shutdown path is needed for shutting down copygc/rebalance without deadlocking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:57 -04:00
Kent Overstreet	fba053d2aa	bcachefs: Second layer of refcounting for new stripes This will be used for move writes, which will be waiting until the stripe is created to do the index update. They need to prevent the stripe from being reclaimed until their index update is done, so we need another refcount that just keeps the stripe open. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> # Conflicts: # fs/bcachefs/ec.c # fs/bcachefs/io.c	2023-10-22 17:09:56 -04:00
Kent Overstreet	7635e1a6d6	bcachefs: Rework open bucket partial list allocation Now, any open_bucket can go on the partial list: allocating from the partial list has been moved to its own dedicated function, open_bucket_add_bucets() -> bucket_alloc_set_partial(). In particular, this means that erasure coded buckets can safely go on the partial list; the new location works with the "allocate an ec bucket first, then the rest" logic. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:56 -04:00
Kent Overstreet	2a912a9a39	bcachefs: Kill bch2_ec_bucket_written() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:56 -04:00
Kent Overstreet	a1fb08f5df	bcachefs: Plumb alloc_reserve through stripe create path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	2f4e9472fa	bcachefs: bch2_open_bucket_to_text() Factor out a common helper Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	39a1ea129a	bcachefs: Single open_bucket_partial list Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:54 -04:00

1 2 3

127 commits