git/builtin
Ævar Arnfjörð Bjarmason ae35e16cd4 reflog expire: don't lock reflogs using previously seen OID
During reflog expiry, the cmd_reflog_expire() function first iterates
over all reflogs in logs/*, and then one-by-one acquires the lock for
each one and expires it. This behavior has been with us since this
command was implemented in 4264dc15e1 ("git reflog expire",
2006-12-19).

Change this to stop calling lock_ref_oid_basic() with the OID we saw
when we looped over the logs, instead have it pass the OID it managed
to lock.

This mostly mitigates a race condition where e.g. "git gc" will fail
in a concurrently updated repository because the branch moved since
"git reflog expire --all" was started. I.e. with:

    error: cannot lock ref '<refname>': ref '<refname>' is at <OID-A> but expected <OID-B>

This behavior of passing in an "oid" was needed for an edge-case that
I've untangled in this and preceding commits though, namely that we
needed this OID because we'd:

 1. Lookup the reflog name/OID via dwim_log()
 2. With that OID, lock the reflog
 3. Later in builtin/reflog.c we use the OID we looked as input to
    lookup_commit_reference_gently(), assured that it's equal to the
    OID we got from dwim_log().

We can be sure that this change is safe to make because between
dwim_log (step #1) and lock_ref_oid_basic (step #2) there was no other
logic relevant to the OID or expiry run in the cmd_reflog_expire()
caller.

We can thus treat that code as a black box, before and after this
change it would get an OID that's been locked, the only difference is
that now we mostly won't be failing to get the lock due to the TOCTOU
race[0]. That failure was purely an implementation detail in how the
"current OID" was looked up, it was divorced from the locking
mechanism.

What do we mean with "mostly"? It mostly mitigates it because we'll
still run into cases where the ref is locked and being updated as we
want to expire it, and other git processes wanting to update the refs
will in turn race with us as we expire the reflog.

That remaining race can in turn be mitigated with the
core.filesRefLockTimeout setting, see 4ff0f01cb7 ("refs: retry
acquiring reference locks for 100ms", 2017-08-21). In practice if that
value is high enough we'll probably never have ref updates or reflog
expiry failing, since the clients involved will retry for far longer
than the time any of those operations could take.

See [1] for an initial report of how this impacted "git gc" and a
large discussion about this change in early 2019. In particular patch
looked good to Michael Haggerty, see his[2]. That message seems to not
have made it to the ML archive, its content is quoted in full in my
[3].

I'm leaving behind now-unused code the refs API etc. that takes the
now-NULL "unused_oid" argument, and other code that can be simplified now
that we never have on OID in that context, that'll be cleaned up in
subsequent commits, but for now let's narrowly focus on fixing the
"git gc" issue. As the modified assert() shows we always pass a NULL
oid to reflog_expire() now.

Unfortunately this sort of probabilistic contention is hard to turn
into a test. I've tested this by running the following three subshells
in concurrent terminals:

    (
        rm -rf /tmp/git &&
        git init /tmp/git &&
        while true
        do
            head -c 10 /dev/urandom | hexdump >/tmp/git/out &&
            git -C /tmp/git add out &&
            git -C /tmp/git commit -m"out"
        done
    )

    (
	rm -rf /tmp/git-clone &&
        git clone file:///tmp/git /tmp/git-clone &&
        while git -C /tmp/git-clone pull
        do
            date
        done
    )

    (
        while git -C /tmp/git-clone reflog expire --all
        do
            date
        done
    )

Before this change the "reflog expire" would fail really quickly with
the "but expected" error noted above.

After this change both the "pull" and "reflog expire" will run for a
while, but eventually fail because I get unlucky with
core.filesRefLockTimeout (the "reflog expire" is in a really tight
loop). As noted above that can in turn be mitigated with higher values
of core.filesRefLockTimeout than the 100ms default.

As noted in the commentary added in the preceding commit there's also
the case of branches being racily deleted, that can be tested by
adding this to the above:

    (
        while git -C /tmp/git-clone branch topic master &&
	      git -C /tmp/git-clone branch -D topic
        do
            date
        done
    )

With core.filesRefLockTimeout set to 10 seconds (it can probably be a
lot lower) I managed to run all four of these concurrently for about
an hour, and accumulated ~125k commits, auto-gc's and all, and didn't
have a single failure. The loops visibly stall while waiting for the
lock, but that's expected and desired behavior.

0. https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use
1. https://lore.kernel.org/git/87tvg7brlm.fsf@evledraar.gmail.com/
2. http://lore.kernel.org/git/b870a17d-2103-41b8-3cbc-7389d5fff33a@alum.mit.edu
3. https://lore.kernel.org/git/87pnqkco8v.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25 13:27:37 -07:00
..
add.c dir.[ch]: replace dir_init() with DIR_INIT 2021-07-01 12:32:22 -07:00
am.c am: learn to process quoted lines that ends with CRLF 2021-05-10 15:06:22 +09:00
annotate.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
apply.c
archive.c
bisect--helper.c bisect--helper: use BISECT_TERMS in 'bisect skip' command 2021-04-30 09:56:42 +09:00
blame.c Merge branch 'rs/blame-optim' 2021-02-25 16:43:29 -08:00
branch.c ref-filter: reuse output buffer 2021-04-20 11:09:50 -07:00
bugreport.c builtin/bugreport: don't leak prefixed filename 2021-04-28 09:25:45 +09:00
bundle.c bundle: remove "ref_list" in favor of string-list.c API 2021-07-06 12:10:17 -07:00
cat-file.c cat-file: merge two block into one 2021-06-04 07:50:26 +09:00
check-attr.c
check-ignore.c dir.[ch]: replace dir_init() with DIR_INIT 2021-07-01 12:32:22 -07:00
check-mailmap.c shortlog: remove unused(?) "repo-abbrev" feature 2021-01-12 14:04:42 -08:00
check-ref-format.c
checkout--worker.c builtin/checkout--worker: zero-initialise struct to avoid MSAN complaints 2021-06-15 12:07:56 +09:00
checkout-index.c Merge branch 'mt/parallel-checkout-part-3' 2021-05-16 21:05:23 +09:00
checkout.c Merge branch 'mt/parallel-checkout-part-3' 2021-05-16 21:05:23 +09:00
clean.c dir.[ch]: replace dir_init() with DIR_INIT 2021-07-01 12:32:22 -07:00
clone.c Merge branch 'jk/clone-clean-upon-transport-error' 2021-06-14 13:33:26 +09:00
column.c column, range-diff: downcase option description 2021-03-29 14:06:08 -07:00
commit-graph.c builtin/*: update usage format 2021-01-06 15:10:49 -08:00
commit-tree.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
commit.c Merge branch 'ds/sparse-index-protections' 2021-04-30 13:50:26 +09:00
config.c config: unify code paths to get global config paths 2021-04-19 14:16:59 -07:00
count-objects.c
credential-cache--daemon.c unix-socket: add backlog size option to unix_stream_listen() 2021-03-15 14:32:51 -07:00
credential-cache.c unix-socket: disallow chdir() when creating unix domain sockets 2021-03-15 14:32:51 -07:00
credential-store.c crendential-store: use timeout when locking file 2020-11-25 12:30:18 -08:00
credential.c credential: load default config 2020-10-16 12:30:45 -07:00
describe.c hash: provide per-algorithm null OIDs 2021-04-27 16:31:39 +09:00
diff-files.c Merge branch 'jc/diffcore-rotate' 2021-02-25 16:43:30 -08:00
diff-index.c diff-merges: move specific diff-index "-m" handling to diff-index 2021-05-21 09:24:14 +09:00
diff-tree.c Merge branch 'jc/diffcore-rotate' 2021-02-25 16:43:30 -08:00
diff.c hash: provide per-algorithm null OIDs 2021-04-27 16:31:39 +09:00
difftool.c Merge branch 'ab/cmd-foo-should-return' 2021-07-08 13:15:04 -07:00
env--helper.c assert PARSE_OPT_NONEG in parse-options callbacks 2020-09-30 12:53:47 -07:00
fast-export.c hash: provide per-algorithm null OIDs 2021-04-27 16:31:39 +09:00
fast-import.c Use the final_oid_fn to finalize hashing of object IDs 2021-04-27 16:31:38 +09:00
fetch-pack.c connect, transport: encapsulate arg in struct 2021-02-05 13:49:54 -08:00
fetch.c Merge branch 'ab/fetch-negotiate-segv-fix' 2021-07-16 17:42:48 -07:00
fmt-merge-msg.c Lib-ify fmt-merge-msg 2020-03-24 15:04:43 -07:00
for-each-ref.c Merge branch 'ah/plugleaks' 2021-05-07 12:47:41 +09:00
for-each-repo.c for-each-repo: do nothing on empty config 2021-01-07 19:12:02 -08:00
fsck.c Merge branch 'ab/fsck-api-cleanup' 2021-06-02 07:34:27 +09:00
gc.c maintenance: fix two memory leaks 2021-05-12 07:00:45 +09:00
get-tar-commit-id.c builtin/get-tar-commit-id: make hash size independent 2019-04-01 11:57:39 +09:00
grep.c dir.[ch]: replace dir_init() with DIR_INIT 2021-07-01 12:32:22 -07:00
hash-object.c builtin: consistently pass cmd_* prefix to parse_options 2019-05-13 14:22:53 +09:00
help.c help: convert git_cmd to page in one place 2021-07-06 13:09:20 -07:00
index-pack.c Use the final_oid_fn to finalize hashing of object IDs 2021-04-27 16:31:38 +09:00
init-db.c Merge branch 'mt/init-template-userpath-fix' 2021-05-25 16:21:20 +09:00
interpret-trailers.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
log.c *: fix typos which duplicate a word 2021-06-14 10:16:06 +09:00
ls-files.c dir.[ch]: replace dir_init() with DIR_INIT 2021-07-01 12:32:22 -07:00
ls-remote.c Merge branch 'ah/plugleaks' 2021-04-07 16:54:08 -07:00
ls-tree.c tree.h API: simplify read_tree_recursive() signature 2021-03-20 16:09:26 -07:00
mailinfo.c mailinfo: allow squelching quoted CRLF warning 2021-05-10 15:06:22 +09:00
mailsplit.c
merge-base.c rebase: --fork-point regression fix 2020-02-11 09:59:39 -08:00
merge-file.c
merge-index.c merge-index: ensure full index 2021-04-14 13:47:21 -07:00
merge-ours.c builtins + test helpers: use return instead of exit() in cmd_* 2021-06-09 09:15:58 +09:00
merge-recursive.c Ensure index matches head before invoking merge machinery, round N 2019-08-19 10:08:03 -07:00
merge-tree.c xdiff users: use designated initializers for out_line 2021-05-11 12:47:31 +09:00
merge.c Merge branch 'ah/merge-usage-i18n-fix' 2021-06-10 12:04:23 +09:00
mktag.c fsck.c: add an fsck_set_msg_type() API that takes enums 2021-03-28 19:03:10 -07:00
mktree.c builtins + test helpers: use return instead of exit() in cmd_* 2021-06-09 09:15:58 +09:00
multi-pack-index.c midx: allow marking a pack as preferred 2021-04-01 13:07:37 -07:00
mv.c git mv foo FOO ; git mv foo bar gave an assert 2021-03-03 17:07:12 -08:00
name-rev.c oid_pos(): access table through const pointers 2021-01-28 12:03:26 -08:00
notes.c use CALLOC_ARRAY 2021-03-13 16:00:09 -08:00
pack-objects.c Merge branch 'ab/pack-linkage-fix' 2021-05-27 12:36:58 +09:00
pack-redundant.c builtin/pack-redundant: avoid casting buffers to struct object_id 2021-04-27 16:31:38 +09:00
pack-refs.c Honor core.precomposeUnicode in more places 2019-04-26 10:54:03 +09:00
patch-id.c patch-id: use oid_to_hex() to print multiple object IDs 2019-12-09 12:26:40 -08:00
prune-packed.c Lib-ify prune-packed 2020-03-24 15:04:44 -07:00
prune.c Merge branch 'tb/shallow-cleanup' 2020-05-13 12:19:18 -07:00
pull.c pull: trivial whitespace style fix 2021-06-19 16:36:17 +09:00
push.c push: don't get a full remote object 2021-06-02 10:12:03 +09:00
range-diff.c column, range-diff: downcase option description 2021-03-29 14:06:08 -07:00
read-tree.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
rebase.c Merge branch 'bc/hash-transition-interop-part-1' 2021-05-10 16:59:46 +09:00
receive-pack.c hash: provide per-algorithm null OIDs 2021-04-27 16:31:39 +09:00
reflog.c reflog expire: don't lock reflogs using previously seen OID 2021-08-25 13:27:37 -07:00
remote-ext.c strvec: convert builtin/ callers away from argv_array name 2020-07-28 15:02:18 -07:00
remote-fd.c
remote.c Merge branch 'ah/plugleaks' 2021-04-07 16:54:08 -07:00
repack.c repack: avoid loosening promisor objects in partial clones 2021-04-28 13:36:13 +09:00
replace.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
rerere.c xdiff users: use designated initializers for out_line 2021-05-11 12:47:31 +09:00
reset.c reset: free instead of leaking unneeded ref 2021-03-14 15:57:59 -07:00
rev-list.c rev-list: allow filtering of provided items 2021-04-19 14:09:11 -07:00
rev-parse.c rev-parse: mark die() messages for translation 2021-05-17 18:39:53 +09:00
revert.c sequencer: fix edit handling for cherry-pick and revert messages 2021-03-31 14:10:50 -07:00
rm.c Merge branch 'ah/plugleaks' 2021-05-07 12:47:41 +09:00
send-pack.c push: parse and set flag for "--force-if-includes" 2020-10-03 09:59:19 -07:00
shortlog.c Merge branch 'ab/mailmap' 2021-01-25 14:19:19 -08:00
show-branch.c show-branch: don't <COLOR></RESET> for space characters 2021-06-28 09:33:06 -07:00
show-index.c builtin/show-index: set the algorithm for object IDs 2021-04-27 16:31:39 +09:00
show-ref.c refs: switch peel_ref() to peel_iterated_oid() 2021-01-21 15:51:31 -08:00
sparse-checkout.c Merge branch 'ds/sparse-index-protections' 2021-04-30 13:50:26 +09:00
stash.c Merge branch 'ab/struct-init' 2021-07-16 17:42:53 -07:00
stripspace.c
submodule--helper.c Merge branch 'ar/submodule-helper-include-cleanup' 2021-07-16 17:42:51 -07:00
symbolic-ref.c symbolic-ref: don't leak shortened refname in check_symref() 2021-03-14 15:57:59 -07:00
tag.c ref-filter: reuse output buffer 2021-04-20 11:09:50 -07:00
unpack-file.c
unpack-objects.c hash: provide per-algorithm null OIDs 2021-04-27 16:31:39 +09:00
update-index.c update-index: ensure full index 2021-04-14 13:47:29 -07:00
update-ref.c update-ref: disallow "start" for ongoing transactions 2020-11-16 13:44:01 -08:00
update-server-info.c
upload-archive.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
upload-pack.c builtin: consistently pass cmd_* prefix to parse_options 2019-05-13 14:22:53 +09:00
var.c
verify-commit.c Merge branch 'jk/no-system-includes-in-dot-c' 2019-07-31 14:38:56 -07:00
verify-pack.c Merge branch 'bc/sha-256-part-3' 2020-08-11 18:04:11 -07:00
verify-tag.c verify-tag: drop signal.h include 2019-06-19 08:19:21 -07:00
worktree.c Merge branch 'en/dir-traversal' 2021-05-20 08:54:59 +09:00
write-tree.c cmd_{read,write}_tree: rename "unused" variable that is used 2019-05-13 14:22:53 +09:00