git/reftable
Patrick Steinhardt 4f36b8597c reftable/stack: fix race in up-to-date check
In 6fdfaf15a0 (reftable/stack: use stat info to avoid re-reading stack
list, 2024-01-11) we have introduced a new mechanism to avoid re-reading
the table list in case stat(3P) figures out that the stack didn't change
since the last time we read it.

While this change significantly improved performance when writing many
refs, it can unfortunately lead to false negatives in very specific
scenarios. Given two processes A and B, there is a feasible sequence of
events that cause us to accidentally treat the table list as up-to-date
even though it changed:

  1. A reads the reftable stack and caches its stat info.

  2. B updates the stack, appending a new table to "tables.list". This
     will both use a new inode and result in a different file size, thus
     invalidating A's cache in theory.

  3. B decides to auto-compact the stack and merges two tables. The file
     size now matches what A has cached again. Furthermore, the
     filesystem may decide to recycle the inode number of the file we
     have replaced in (2) because it is not in use anymore.

  4. A reloads the reftable stack. Neither the inode number nor the
     file size changed. If the timestamps did not change either then we
     think the cached copy of our stack is up-to-date.

In fact, the commit introduced three related issues:

  - Non-POSIX compliant systems may not report proper `st_dev` and
    `st_ino` values in stat(3P), which made us rely solely on the
    file's potentially coarse-grained mtime and ctime.

  - `stat_validity_check()` and friends may end up not comparing
    `st_dev` and `st_ino` depending on the "core.checkstat" config,
    again reducing the signal to the mtime and ctime.

  - `st_ino` can be recycled, rendering the check moot even on
    POSIX-compliant systems.

Given that POSIX defines that "The st_ino and st_dev fields taken
together uniquely identify the file within the system", these issues led
to the most important signal to establish file identity to be ignored or
become useless in some cases.

Refactor the code to stop using `stat_validity_check()`. Instead, we
manually stat(3P) the file descriptors to make relevant information
available. On Windows and MSYS2 the result will have both `st_dev` and
`st_ino` set to 0, which allows us to address the first issue by not
using the stat-based cache in that case. It also allows us to make sure
that we always compare `st_dev` and `st_ino`, addressing the second
issue.

The third issue of inode recycling can be addressed by keeping the file
descriptor of "files.list" open during the lifetime of the reftable
stack. As the file will still exist on disk even though it has been
unlinked it is impossible for its inode to be recycled as long as the
file descriptor is still open.

This should address the race in a POSIX-compliant way. The only real
downside is that this mechanism cannot be used on non-POSIX-compliant
systems like Windows. But we at least have the second-level caching
mechanism in place that compares contents of "files.list" with the
currently loaded list of tables.

This new mechanism performs roughly the same as the previous one that
relied on `stat_validity_check()`:

  Benchmark 1: update-ref: create many refs (HEAD~)
    Time (mean ± σ):      4.754 s ±  0.026 s    [User: 2.204 s, System: 2.549 s]
    Range (min … max):    4.694 s …  4.802 s    20 runs

  Benchmark 2: update-ref: create many refs (HEAD)
    Time (mean ± σ):      4.721 s ±  0.020 s    [User: 2.194 s, System: 2.527 s]
    Range (min … max):    4.691 s …  4.753 s    20 runs

  Summary
    update-ref: create many refs (HEAD~) ran
      1.01 ± 0.01 times faster than update-ref: create many refs (HEAD)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-18 12:02:09 -08:00
..
basics.c reftable: utility functions 2021-10-08 10:45:48 -07:00
basics.h reftable: utility functions 2021-10-08 10:45:48 -07:00
basics_test.c reftable: utility functions 2021-10-08 10:45:48 -07:00
block.c reftable/block: reuse buffer to compute record keys 2023-12-11 07:23:17 -08:00
block.h reftable/block: reuse buffer to compute record keys 2023-12-11 07:23:17 -08:00
block_test.c reftable/block: introduce macro to initialize struct block_iter 2023-12-11 07:23:17 -08:00
blocksource.c reftable/blocksource: use mmap to read tables 2024-01-11 12:10:59 -08:00
blocksource.h reftable: add blocksource, an abstraction for random access reads 2021-10-08 10:45:48 -07:00
constants.h reftable: (de)serialization for the polymorphic record type. 2021-10-08 10:45:48 -07:00
dump.c hash-ll.h: split out of hash.h to remove dependency on repository.h 2023-04-24 12:47:32 -07:00
error.c reftable: ensure git-compat-util.h is the first (indirect) include 2023-04-24 12:47:33 -07:00
generic.c reftable: make assignments portable to AIX xlc v12.01 2022-03-28 13:58:10 -07:00
generic.h reftable: generic interface to tables 2021-10-08 10:45:48 -07:00
iter.c reftable: make reftable_record a tagged union 2022-01-20 11:31:53 -08:00
iter.h reftable/block: introduce macro to initialize struct block_iter 2023-12-11 07:23:17 -08:00
LICENSE
merged.c reftable/merged: reuse buffer to compute record keys 2023-12-11 07:23:16 -08:00
merged.h reftable/merged: reuse buffer to compute record keys 2023-12-11 07:23:16 -08:00
merged_test.c reftable tests: avoid "int" overflow, use "uint64_t" 2022-01-13 13:39:09 -08:00
pq.c reftable: use a pointer for pq_entry param 2022-09-15 11:32:37 -07:00
pq.h reftable: use a pointer for pq_entry param 2022-09-15 11:32:37 -07:00
pq_test.c reftable: use a pointer for pq_entry param 2022-09-15 11:32:37 -07:00
publicbasics.c reftable: ensure git-compat-util.h is the first (indirect) include 2023-04-24 12:47:33 -07:00
reader.c reftable/block: introduce macro to initialize struct block_iter 2023-12-11 07:23:17 -08:00
reader.h reftable: read reftable files 2021-10-08 10:45:48 -07:00
readwrite_test.c reftable/stack: fix use of unseeded randomness 2023-12-11 07:23:16 -08:00
record.c reftable: add print functions to the record types 2022-01-20 11:31:53 -08:00
record.h reftable: add print functions to the record types 2022-01-20 11:31:53 -08:00
record_test.c reftable: make assignments portable to AIX xlc v12.01 2022-03-28 13:58:10 -07:00
refname.c reftable: implement refname validation 2021-10-08 10:45:48 -07:00
refname.h reftable: implement refname validation 2021-10-08 10:45:48 -07:00
refname_test.c reftable: implement refname validation 2021-10-08 10:45:48 -07:00
reftable-blocksource.h reftable: add blocksource, an abstraction for random access reads 2021-10-08 10:45:48 -07:00
reftable-error.h reftable: signal overflow 2021-12-23 12:28:34 -08:00
reftable-generic.h reftable: generic interface to tables 2021-10-08 10:45:48 -07:00
reftable-iterator.h reftable: generic interface to tables 2021-10-08 10:45:48 -07:00
reftable-malloc.h reftable: utility functions 2021-10-08 10:45:48 -07:00
reftable-merged.h reftable: add merged table view 2021-10-08 10:45:48 -07:00
reftable-reader.h reftable: read reftable files 2021-10-08 10:45:48 -07:00
reftable-record.h reftable: make reftable-record.h function signatures const correct 2022-01-20 11:31:53 -08:00
reftable-stack.h reftable: implement stack, a mutable database of reftable files. 2021-10-08 10:45:48 -07:00
reftable-tests.h reftable: add a heap-based priority queue for reftable records 2021-10-08 10:45:48 -07:00
reftable-writer.h reftable: rename writer_stats to reftable_writer_stats 2022-02-23 13:36:26 -08:00
stack.c reftable/stack: fix race in up-to-date check 2024-01-18 12:02:09 -08:00
stack.h reftable/stack: fix race in up-to-date check 2024-01-18 12:02:09 -08:00
stack_test.c reftable/stack: perform auto-compaction with transactional interface 2023-12-11 07:23:16 -08:00
system.h reftable/stack: fix race in up-to-date check 2024-01-18 12:02:09 -08:00
test_framework.c reftable: utility functions 2021-10-08 10:45:48 -07:00
test_framework.h reftable: wrap EXPECT macros in do/while 2023-12-11 07:23:15 -08:00
tree.c reftable: ensure git-compat-util.h is the first (indirect) include 2023-04-24 12:47:33 -07:00
tree.h reftable: a generic binary tree implementation 2021-10-08 10:45:48 -07:00
tree_test.c reftable: ensure git-compat-util.h is the first (indirect) include 2023-04-24 12:47:33 -07:00
writer.c Merge branch 'ep/maint-equals-null-cocci' 2022-05-20 15:26:59 -07:00
writer.h reftable: write reftable files 2021-10-08 10:45:48 -07:00