Find a file
Brian Foster 3c434cdff0 bcachefs: fix accounting corruption race between reclaim and dev add
When a device is removed from a bcachefs volume, the associated
content is removed from the various btrees. The alloc tree uses the
key cache, so when keys are removed the deletes exist in cache for a
period of time until reclaim comes along and flushes outstanding
updates.

When a device is re-added to the bcachefs volume, the add process
re-adds some of these previously deleted keys. When marking device
superblock locations on device add, the keys will likely refer to
some of the same alloc keys that were just removed. The memory
triggers for these key updates are responsible for further updates,
such as bch2_mark_alloc() calling into bch2_dev_usage_update() to
update per-device usage accounting.

When a new key is added to key cache, the trans update path also
flushes the key to the backing btree for coherency reasons for tree
walks.

With all of this context, if a device is removed and re-added
quickly enough such that some key deletes from the remove are still
pending a key cache flush, the trans update path can view this as
addition of a new key because the old key in the insert entry refers
to a deleted key. However the deleted cached key has not been filled
by absence of a btree key, but rather refers to an explicit deletion
of an existing key that occurred during device removal.

The trans update path adds a new update to flush the key and tags
the original (cached) update to skip running the memory triggers.
This results in running triggers on the non-cached update instead,
which in turn will perform accounting updates based on incoherent
values. For example, bch2_dev_usage_update() subtracts the the old
alloc key dirty sector count in the non-cached btree key from the
newly initialized (i.e. zeroed) per device counters, leading to
underflow and accounting corruption.

There are at least a few ways to avoid this problem, the simplest of
which may be to run triggers against the cached update rather than
the non-cached update. If the key only needs to be flushed when the
key is not present in the tree, however, then this still performs an
unnecessary update. We could potentially use the cached key dirty
state to determine whether the delete is a dirty, cached update vs.
a clean cache fill, but this may require transmitting key cache
dirty state across layers, which adds complexity and seems to be of
limited value. Instead, update flush_new_cached_update() to handle
this by simply checking for the key in the btree and only perform
the flush when a backing key is not present.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:00 -04:00
arch Fix preemption delays in the SGX code, remove unnecessarily UAPI-exported code, 2023-09-10 10:39:31 -07:00
block block: fix pin count management when merging same-page segments 2023-09-06 07:32:27 -06:00
certs certs: Reference revocation list for all keyrings 2023-08-17 20:12:41 +00:00
crypto This update includes the following changes: 2023-08-29 11:23:29 -07:00
Documentation drm ci for 6.6-rc1 2023-09-10 11:55:26 -07:00
drivers bcache: move closures to lib/ 2023-10-19 14:47:33 -04:00
fs bcachefs: fix accounting corruption race between reclaim and dev add 2023-10-22 17:10:00 -04:00
include bcachefs: Update export_operations for snapshots 2023-10-22 17:09:17 -04:00
init sched: Add task_struct->faults_disabled_mapping 2023-09-11 23:59:46 -04:00
io_uring Revert "io_uring: fix IO hang in io_wq_put_and_exit from do_exit()" 2023-09-07 09:41:49 -06:00
ipc Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
kernel locking: export contention tracepoints for bcachefs six locks 2023-10-19 14:47:33 -04:00
lib lib/generic-radix-tree.c: Add peek_prev() 2023-10-19 14:47:33 -04:00
LICENSES LICENSES: Add the copyleft-next-0.3.1 license 2022-11-08 15:44:01 +01:00
mm LoongArch changes for v6.6 2023-09-08 12:16:52 -07:00
net Including fixes from netfilter and bpf. 2023-09-07 18:33:07 -07:00
rust Documentation work keeps chugging along; stuff for 6.6 includes: 2023-08-30 20:05:42 -07:00
samples VFIO updates for v6.6-rc1 2023-08-30 20:36:01 -07:00
scripts Fix preemption delays in the SGX code, remove unnecessarily UAPI-exported code, 2023-09-10 10:39:31 -07:00
security Landlock updates for v6.6-rc1 2023-09-08 12:06:51 -07:00
sound sound fixes for 6.6-rc1 2023-09-08 13:07:50 -07:00
tools objtool: Add bcachefs noreturns 2023-10-19 14:58:29 -04:00
usr initramfs: Encode dependency on KBUILD_BUILD_TIMESTAMP 2023-06-06 17:54:49 +09:00
virt ARM: 2023-09-07 13:52:20 -07:00
.clang-format iommu: Add for_each_group_device() 2023-05-23 08:15:51 +02:00
.cocciconfig
.get_maintainer.ignore
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore kbuild: rpm-pkg: rename binkernel.spec to kernel.spec 2023-07-25 00:59:33 +09:00
.mailmap for-linus-2023083101 2023-09-01 12:31:44 -07:00
.rustfmt.toml
COPYING
CREDITS USB: Remove Wireless USB and UWB documentation 2023-08-09 14:17:32 +02:00
Kbuild
Kconfig
MAINTAINERS MAINTAINERS: Add entry for bcachefs 2023-10-22 17:08:07 -04:00
Makefile Linux 6.6-rc1 2023-09-10 16:28:41 -07:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.