Commit graph

486 commits

Author SHA1 Message Date
Junio C Hamano
31e3912369 Merge branch 'ab/refs-errno-cleanup'
A brown-paper-bag fix on top of a topic that was merged during this
cycle.

* ab/refs-errno-cleanup:
  refs API: use "failure_errno", not "errno"
2022-01-14 15:25:15 -08:00
Ævar Arnfjörð Bjarmason
cac15b3fb4 refs API: use "failure_errno", not "errno"
Fix a logic error in refs_resolve_ref_unsafe() introduced in a recent
series of mine to abstract the refs API away from errno. See
96f6623ada (Merge branch 'ab/refs-errno-cleanup', 2021-11-29)for that
series.

In that series introduction of "failure_errno" to
refs_resolve_ref_unsafe came in ef18119dec (refs API: add a version
of refs_resolve_ref_unsafe() with "errno", 2021-10-16). There we'd set
"errno = 0" immediately before refs_read_raw_ref(), and then set
"failure_errno" to "errno" if errno was non-zero afterwards.

Then in the next commit 8b72fea7e9 (refs API: make
refs_read_raw_ref() not set errno, 2021-10-16) we started expecting
"refs_read_raw_ref()" to set "failure_errno". It would do that if
refs_read_raw_ref() failed, but it wouldn't be the same errno.

So we might set the "errno" here to any arbitrary bad value, and end
up e.g. returning NULL when we meant to return the refname from
refs_resolve_ref_unsafe(), or the other way around. Instrumenting this
code will reveal cases where refs_read_raw_ref() will fail, and
"errno" and "failure_errno" will be set to different values.

In practice I haven't found a case where this scary bug changed
anything in practice. The reason for that is that we'll not care about
the actual value of "errno" here per-se, but only whether:

 1. We have an errno
 2. If it's one of ENOENT, EISDIR or ENOTDIR. See the adjacent code
    added in a1c1d8170d (refs_resolve_ref_unsafe: handle d/f
    conflicts for writes, 2017-10-06)

I.e. if we clobber "failure_errno" with "errno", but it happened to be
one of those three, and we'll clobber it with another one of the three
we were OK.

Perhaps there are cases where the difference ended up mattering, but I
haven't found them. Instrumenting the test suite to fail if "errno"
and "failure_errno" are different shows a lot of failures, checking if
they're different *and* one is but not the other is outside that list
of three "errno" values yields no failures.

But let's fix the obvious bug. We should just stop paying attention to
"errno" in refs_resolve_ref_unsafe(). In addition let's change the
partial resetting of "errno" in files_read_raw_ref() to happen just
before the "return", to ensure that any such bug will be more easily
spotted in the future.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-13 10:53:54 -08:00
Junio C Hamano
626f2cabe6 Merge branch 'ab/reflog-prep'
Code refactoring in the reflog part of refs API.

* ab/reflog-prep:
  reflog + refs-backend: move "verbose" out of the backend
  refs files-backend: assume cb->newlog if !EXPIRE_REFLOGS_DRY_RUN
  reflog: reduce scope of "struct rev_info"
  reflog expire: don't use lookup_commit_reference_gently()
  reflog expire: refactor & use "tip_commit" only for UE_NORMAL
  reflog expire: use "switch" over enum values
  reflog: change one->many worktree->refnames to use a string_list
  reflog expire: narrow scope of "cb" in cmd_reflog_expire()
  reflog delete: narrow scope of "cmd" passed to count_reflog_ent()
2022-01-10 11:52:52 -08:00
Ævar Arnfjörð Bjarmason
fcd2c3d9d8 reflog + refs-backend: move "verbose" out of the backend
Move the handling of the "verbose" flag entirely out of
"refs/files-backend.c" and into "builtin/reflog.c". This allows the
backend to stop knowing about the EXPIRE_REFLOGS_VERBOSE flag.

The expire_reflog_ent() function shouldn't need to deal with the
implementation detail of whether or not we're emitting verbose output,
by doing this the --verbose output becomes backend-agnostic, so
reftable will get the same output.

I think the output is rather bad currently, and should e.g. be
implemented with some better future mode of progress.[ch], but that's
a topic for another improvement.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-22 16:24:14 -08:00
Ævar Arnfjörð Bjarmason
7c28875bcd refs files-backend: assume cb->newlog if !EXPIRE_REFLOGS_DRY_RUN
It's not possible for "cb->newlog" to be NULL if
!EXPIRE_REFLOGS_DRY_RUN, since files_reflog_expire() would have
error()'d and taken the "goto failure" branch if it couldn't open the
file. By not using the "newlog" field private to "file-backend.c"'s
"struct expire_reflog_cb", we can move this verbosity logging to
"builtin/reflog.c" in a subsequent commit.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-22 16:24:14 -08:00
Han-Wen Nienhuys
f9f7fd3b23 refs: centralize initialization of the base ref_store.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-22 13:51:38 -08:00
Han-Wen Nienhuys
a6db572af6 refs: print error message in debug output
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-22 13:51:37 -08:00
Han-Wen Nienhuys
99f0d97b73 refs: pass gitdir to packed_ref_store_create
This is consistent with the calling convention for ref backend creation, and
avoids storing ".git/packed-refs" (the name of a regular file) in a variable called
ref_store::gitdir.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-22 13:51:37 -08:00
Junio C Hamano
b174a3c014 Merge branch 'hn/allow-bogus-oid-in-ref-tests'
The test helper for refs subsystem learned to write bogus and/or
nonexistent object name to refs to simulate error situations we
want to test Git in.

* hn/allow-bogus-oid-in-ref-tests:
  t1430: create valid symrefs using test-helper
  t1430: remove refs using test-tool
  refs: introduce REF_SKIP_REFNAME_VERIFICATION flag
  refs: introduce REF_SKIP_OID_VERIFICATION flag
  refs: update comment.
  test-ref-store: plug memory leak in cmd_delete_refs
  test-ref-store: parse symbolic flag constants
  test-ref-store: remove force-create argument for create-reflog
2021-12-15 09:39:54 -08:00
Junio C Hamano
250ca49b4f Merge branch 'hn/reflog-tests'
Prepare tests on ref API to help testing reftable backends.

* hn/reflog-tests:
  refs/debug: trim trailing LF from reflog message
  test-ref-store: tweaks to for-each-reflog-ent format
  t1405: check for_each_reflog_ent_reverse() more thoroughly
  test-ref-store: don't add newline to reflog message
  show-branch: show reflog message
2021-12-15 09:39:49 -08:00
Junio C Hamano
b8148376a2 Merge branch 'hn/create-reflog-simplify'
A small simplification of API.

* hn/create-reflog-simplify:
  refs: drop force_create argument of create_reflog API
2021-12-10 14:35:13 -08:00
Han-Wen Nienhuys
e9706a188f refs: introduce REF_SKIP_OID_VERIFICATION flag
This lets the ref-store test helper write non-existent or unparsable objects
into the ref storage.

Use this to make t1006 and t3800 independent of the files storage backend.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-07 13:15:19 -08:00
Han-Wen Nienhuys
0464d0a134 refs: update comment.
REF_IS_PRUNING is right below this comment, so it clearly does not belong in
this comment. This was apparently introduced in commit 5ac95fee (Nov 5, 2017
"refs: tidy up and adjust visibility of the `ref_update` flags").

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-07 13:15:19 -08:00
Han-Wen Nienhuys
65279256f3 refs/debug: trim trailing LF from reflog message
On iteration, the reflog message is always terminated by a newline. Trim it to
avoid clobbering the console with is this extra newline.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-02 11:14:08 -08:00
Junio C Hamano
96f6623ada Merge branch 'ab/refs-errno-cleanup'
The "remainder" of hn/refs-errno-cleanup topic.

* ab/refs-errno-cleanup: (21 commits)
  refs API: post-migration API renaming [2/2]
  refs API: post-migration API renaming [1/2]
  refs API: don't expose "errno" in run_transaction_hook()
  refs API: make expand_ref() & repo_dwim_log() not set errno
  refs API: make resolve_ref_unsafe() not set errno
  refs API: make refs_ref_exists() not set errno
  refs API: make refs_resolve_refdup() not set errno
  refs tests: ignore ignore errno in test-ref-store helper
  refs API: ignore errno in worktree.c's find_shared_symref()
  refs API: ignore errno in worktree.c's add_head_info()
  refs API: make files_copy_or_rename_ref() et al not set errno
  refs API: make loose_fill_ref_dir() not set errno
  refs API: make resolve_gitlink_ref() not set errno
  refs API: remove refs_read_ref_full() wrapper
  refs/files: remove "name exist?" check in lock_ref_oid_basic()
  reflog tests: add --updateref tests
  refs API: make refs_rename_ref_available() static
  refs API: make parse_loose_ref_contents() not set errno
  refs API: make refs_read_raw_ref() not set errno
  refs API: add a version of refs_resolve_ref_unsafe() with "errno"
  ...
2021-11-29 15:41:45 -08:00
Han-Wen Nienhuys
7b089120d9 refs: drop force_create argument of create_reflog API
There is only one caller, builtin/checkout.c, and it hardcodes
force_create=1.

This argument was introduced in abd0cd3a30 (refs: new public ref function:
safe_create_reflog, 2015-07-21), which promised to immediately use it in a
follow-on commit, but that never happened.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-22 11:01:25 -08:00
Junio C Hamano
162a13b855 Merge branch 'jt/no-abuse-alternate-odb-for-submodules'
Follow through the work to use the repo interface to access
submodule objects in-process, instead of abusing the alternate
object database interface.

* jt/no-abuse-alternate-odb-for-submodules:
  submodule: trace adding submodule ODB as alternate
  submodule: pass repo to check_has_commit()
  object-file: only register submodule ODB if needed
  merge-{ort,recursive}: remove add_submodule_odb()
  refs: peeling non-the_repository iterators is BUG
  refs: teach arbitrary repo support to iterators
  refs: plumb repo into ref stores
2021-10-25 16:06:56 -07:00
Ævar Arnfjörð Bjarmason
f1da24ca5e refs API: post-migration API renaming [2/2]
Rename the transitory refs_werrres_ref_unsafe() function to
refs_resolve_ref_unsafe(), now that all callers of the old function
have learned to pass in a "failure_errno" parameter.

The coccinelle semantic patch added in the preceding commit works, but
I couldn't figure out how to get spatch(1) to re-flow these argument
lists (and sometimes make lines way too long), so this rename was done
with:

    perl -pi -e 's/refs_werrres_ref_unsafe/refs_resolve_ref_unsafe/g' \
    $(git grep -l refs_werrres_ref_unsafe -- '*.c')

But after that "make contrib/coccinelle/refs.cocci.patch" comes up
empty, so the result would have been the same. Let's remove that
transitory semantic patch file, we won't need to retain it for any
other in-flight changes, refs_werrres_ref_unsafe() only existed within
this patch series.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-16 11:17:04 -07:00
Ævar Arnfjörð Bjarmason
ac0986e302 refs API: make files_copy_or_rename_ref() et al not set errno
None of the callers of rename_ref() and copy_ref() care about errno,
and as seen in the context here we already emit our own non-errno
using error() in the case where we'd use it.

So let's have it explicitly ignore errno, and do the same in
commit_ref_update(), which is only used within other code in
files_copy_or_rename_ref() itself which doesn't care about errno
either.

It might actually be sensible to have the callers use errno if the
failure was filesystem-specific, and with the upcoming reftable
backend we don't want to rely on that sort of thing, so let's keep
ignoring that for now.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-16 11:17:03 -07:00
Ævar Arnfjörð Bjarmason
096a7fbb97 refs API: make loose_fill_ref_dir() not set errno
Change the refs_resolve_ref_unsafe() invoked in loose_fill_ref_dir()
to a form that ignores errno. The only eventual caller of this
function is create_ref_cache(), whose callers in turn don't have their
failure depend on any errno set here.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-16 11:17:03 -07:00
Ævar Arnfjörð Bjarmason
76887df014 refs API: remove refs_read_ref_full() wrapper
Remove the refs_read_ref_full() wrapper in favor of migrating various
refs.c API users to the underlying refs_werrres_ref_unsafe() function.

A careful reading of these callers shows that the callers of this
function did not care about "errno", by moving away from the
refs_resolve_ref_unsafe() wrapper we can be sure that nothing relies
on it anymore.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-16 11:17:03 -07:00
Ævar Arnfjörð Bjarmason
52106430dc refs/files: remove "name exist?" check in lock_ref_oid_basic()
In lock_ref_oid_basic() we'll happily lock a reference that doesn't
exist yet. That's normal, and is how references are initially born,
but we don't need to retain checks here in lock_ref_oid_basic() about
the state of the ref, when what we're checking is either checked
already, or something we're about to discover by trying to lock the
ref with raceproof_create_file().

The one exception is the caller in files_reflog_expire(), who passes
us a "type" to find out if the reference is a symref or not. We can
move the that logic over to that caller, which can now defer its
discovery of whether or not the ref is a symref until it's needed. In
the preceding commit an exhaustive regression test was added for that
case in a new test in "t1417-reflog-updateref.sh".

The improved diagnostics here were added in
5b2d8d6f21 (lock_ref_sha1_basic(): improve diagnostics for ref D/F
conflicts, 2015-05-11), and then much of the surrounding code went
away recently in my 245fbba46d (refs/files: remove unused "errno ==
EISDIR" code, 2021-08-23).

The refs_resolve_ref_unsafe() code being removed here looks like it
should be tasked with doing that, but it's actually redundant to other
code.

The reason for that is as noted in 245fbba46d this once widely used
function now only has a handful of callers left, which all handle this
case themselves.

To the extent that we're racy between their check and ours removing
this check actually improves the situation, as we'll be doing fewer
things between the not-under-lock initial check and acquiring the
lock.

Why this is OK for all the remaining callers of lock_ref_oid_basic()
is noted below. There are only two of those callers:

* "git branch -[cm] <oldbranch> <newbranch>":

  In files_copy_or_rename_ref() we'll call this when we copy or rename
  refs via rename_ref() and copy_ref(). but only after we've checked
  if the refname exists already via its own call to
  refs_resolve_ref_unsafe() and refs_rename_ref_available().

  As the updated comment to the latter here notes neither of those are
  actually needed. If we delete not only this code but also
  refs_rename_ref_available() we'll do just fine, we'll just emit a
  less friendly error message if e.g. "git branch -m A B/C" would have
  a D/F conflict with a "B" file.

  Actually we'd probably die before that in case reflogs for the
  branch existed, i.e. when the try to rename() or copy_file() the
  relevant reflog, since if we've got a D/F conflict with a branch
  name we'll probably also have the same with its reflogs (but not
  necessarily, we might have reflogs, but it might not).

  As some #leftoverbits that code seems buggy to me, i.e. the reflog
  "protocol" should be to get a lock on the main ref, and then perform
  ref and/or reflog operations. That code dates back to
  c976d415e5 (git-branch: add options and tests for branch renaming,
  2006-11-28) and probably pre-dated the solidifying of that
  convention. But in any case, that edge case is not our bug or
  problem right now.

* "git reflog expire <ref>":

  In files_reflog_expire() we'll call this without previous ref
  existence checking in files-backend.c, but that code is in turn
  called by code that's just finished checking if the refname whose
  reflog we're expiring exists.

  See ae35e16cd4 (reflog expire: don't lock reflogs using previously
  seen OID, 2021-08-23) for the current state of that code, and
  5e6f003ca8 (reflog_expire(): ignore --updateref for symbolic
  references, 2015-03-03) for the code we'd break if we only did a
  "update = !!ref" here, which is covered by the aforementioned
  regression test in "t1417-reflog-updateref.sh".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-16 11:17:02 -07:00
Ævar Arnfjörð Bjarmason
c339ff690f refs API: make refs_rename_ref_available() static
Move the refs_rename_ref_available() function into
"refs/files-backend.c". It is file-backend specific.

This function was added in 5fe7d825da (refs.c: pass a list of names
to skip to is_refname_available, 2014-05-01) as rename_ref_available()
and was only ever used in this one file-backend specific codepath. So
let's move it there.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-16 11:17:02 -07:00
Han-Wen Nienhuys
df3458e957 refs API: make parse_loose_ref_contents() not set errno
Change the parse_loose_ref_contents() function to stop setting "errno"
and failure, and to instead pass up a "failure_errno" via a
parameter. This requires changing its callers to do the same.

The EINVAL error from parse_loose_ref_contents is used in files-backend
to create a custom error message.

In untangling this we discovered a tricky edge case. The
refs_read_special_head() function was relying on
parse_loose_ref_contents() setting EINVAL.

By converting it to use "saved_errno" we can migrate away from "errno"
in this part of the code entirely, and do away with an existing
"save_errno" pattern, its only purpose was to not clobber the "errno"
we previously needed at the end of files_read_raw_ref().

Let's assert that we can do that by not having files_read_raw_ref()
itself operate on *failure_errno in addition to passing it on. Instead
we'll assert that if we return non-zero we actually do set errno, thus
assuring ourselves and callers that they can trust the resulting
"failure_errno".

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-16 11:17:02 -07:00
Han-Wen Nienhuys
8b72fea7e9 refs API: make refs_read_raw_ref() not set errno
Add a "failure_errno" to refs_read_raw_ref(), his allows
refs_werrres_ref_unsafe() to pass along its "failure_errno", as a
first step before its own callers are migrated to pass it further up
the chain.

We are leaving out out the refs_read_special_head() in
refs_read_raw_ref() for now, as noted in a subsequent commit moving it
to "failure_errno" will require some special consideration.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-16 11:17:02 -07:00
Junio C Hamano
f6c075ad71 Merge branch 'jk/ref-paranoia'
The ref iteration code used to optionally allow dangling refs to be
shown, which has been tightened up.

* jk/ref-paranoia:
  refs: drop "broken" flag from for_each_fullref_in()
  ref-filter: drop broken-ref code entirely
  ref-filter: stop setting FILTER_REFS_INCLUDE_BROKEN
  repack, prune: drop GIT_REF_PARANOIA settings
  refs: turn on GIT_REF_PARANOIA by default
  refs: omit dangling symrefs when using GIT_REF_PARANOIA
  refs: add DO_FOR_EACH_OMIT_DANGLING_SYMREFS flag
  refs-internal.h: reorganize DO_FOR_EACH_* flag documentation
  refs-internal.h: move DO_FOR_EACH_* flags next to each other
  t5312: be more assertive about command failure
  t5312: test non-destructive repack
  t5312: create bogus ref as necessary
  t5312: drop "verbose" helper
  t5600: provide detached HEAD for corruption failures
  t5516: don't use HEAD ref for invalid ref-deletion tests
  t7900: clean up some more broken refs
2021-10-11 10:21:47 -07:00
Jonathan Tan
8788195c88 refs: peeling non-the_repository iterators is BUG
There is currently no support for peeling the current ref of an iterator
iterating over a non-the_repository ref store, and none is needed. Thus,
for now, BUG() if that happens.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 15:06:06 -07:00
Jonathan Tan
9bc45a2802 refs: teach arbitrary repo support to iterators
Note that should_pack_ref() is called when writing refs, which is only
supported for the_repository, hence the_repository is hardcoded there.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 15:06:05 -07:00
Jonathan Tan
34224e14d6 refs: plumb repo into ref stores
In preparation for the next 2 patches that adds (partial) support for
arbitrary repositories to ref iterators, plumb a repository into all ref
stores. There are no changes to program logic.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 15:06:05 -07:00
Junio C Hamano
870de59af0 Merge branch 'ab/retire-refs-unused-funcs'
Code cleanup.

* ab/retire-refs-unused-funcs:
  refs/ref-cache.[ch]: remove "incomplete" from create_dir_entry()
  refs/ref-cache.c: remove "mkdir" parameter from find_containing_dir()
  refs/ref-cache.[ch]: remove unused add_ref_entry()
  refs/ref-cache.[ch]: remove unused remove_entry_from_dir()
  refs.[ch]: remove unused ref_storage_backend_exists()
2021-10-06 13:40:13 -07:00
Junio C Hamano
92382d14cd Merge branch 'hn/refs-errno-cleanup'
Futz with the way 'errno' is relied on in the refs API to carry the
failure modes up the call chain.

* hn/refs-errno-cleanup:
  refs: make errno output explicit for read_raw_ref_fn
  refs/files-backend: stop setting errno from lock_ref_oid_basic
  refs: remove EINVAL errno output from specification of read_raw_ref_fn
  refs file backend: move raceproof_create_file() here
2021-10-03 21:49:18 -07:00
Junio C Hamano
842d45d293 Merge branch 'ab/refs-files-cleanup'
Continued work on top of the hn/refs-errno-cleanup topic.

* ab/refs-files-cleanup:
  refs/files: remove unused "errno != ENOTDIR" condition
  refs/files: remove unused "errno == EISDIR" code
  refs/files: remove unused "oid" in lock_ref_oid_basic()
  refs API: remove OID argument to reflog_expire()
  reflog expire: don't lock reflogs using previously seen OID
  refs/files: add a comment about refs_reflog_exists() call
  refs: make repo_dwim_log() accept a NULL oid
  refs/debug: re-indent argument list for "prepare"
  refs/files: remove unused "skip" in lock_raw_ref() too
  refs/files: remove unused "extras/skip" in lock_ref_oid_basic()
  refs: drop unused "flags" parameter to lock_ref_oid_basic()
  refs/files: remove unused REF_DELETING in lock_ref_oid_basic()
  refs/packet: add missing BUG() invocations to reflog callbacks
2021-10-03 21:49:18 -07:00
Ævar Arnfjörð Bjarmason
750036c8f7 refs/ref-cache.[ch]: remove "incomplete" from create_dir_entry()
Remove the now-unused "incomplete" parameter from create_dir_entry(),
all its callers specify it as "1", so let's drop the "incomplete=0"
case. The last caller to use it was search_for_subdir(), but that code
was removed in the preceding commit.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-28 15:12:04 -07:00
Ævar Arnfjörð Bjarmason
5e4546d599 refs/ref-cache.c: remove "mkdir" parameter from find_containing_dir()
Remove the "mkdir" parameter from the find_containing_dir() function,
the add_ref_entry() function removed in the preceding commit was its
last user.

Since "mkdir" is always "0" we can also remove the parameter from
search_for_subdir(), which in turn means that we can delete most of
that function.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-28 15:12:04 -07:00
Ævar Arnfjörð Bjarmason
6a99fa2e9e refs/ref-cache.[ch]: remove unused add_ref_entry()
This function has not been used since 9dd389f3d8 (packed_ref_store:
get rid of the `ref_cache` entirely, 2017-09-25).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-28 15:12:04 -07:00
Ævar Arnfjörð Bjarmason
34e8a20d76 refs/ref-cache.[ch]: remove unused remove_entry_from_dir()
This function was missed in 9939b33d6a (packed-backend: rip out some
now-unused code, 2017-09-08), and has been orphaned since then. Let's
delete it.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-28 15:12:04 -07:00
Jeff King
8dccb2244c refs: add DO_FOR_EACH_OMIT_DANGLING_SYMREFS flag
When the DO_FOR_EACH_INCLUDE_BROKEN flag is used, we include both actual
corrupt refs (illegal names, missing objects), but also symrefs that
point to nothing. This latter is not really a corruption, but just
something that may happen normally. For example, the symref at
refs/remotes/origin/HEAD may point to a tracking branch which is later
deleted. (The local HEAD may also be unborn, of course, but we do not
access it through ref iteration).

Most callers of for_each_ref() etc, do not care. They don't pass
INCLUDE_BROKEN, so don't see it at all. But for those which do pass it,
this somewhat-normal state causes extra warnings (e.g., from
for-each-ref) or even aborts operations (destructive repacks with
GIT_REF_PARANOIA set).

This patch just introduces the flag and the mechanism; there are no
callers yet (and hence no tests). Two things to note on the
implementation:

  - we actually skip any symref that does not resolve to a ref. This
    includes ones which point to an invalidly-named ref. You could argue
    this is a more serious breakage than simple dangling. But the
    overall effect is the same (we could not follow the symref), as well
    as the impact on things like REF_PARANOIA (either way, a symref we
    can't follow won't impact reachability, because we'll see the ref
    itself during iteration). The underlying resolution function doesn't
    distinguish these two cases (they both get REF_ISBROKEN).

  - we change the iterator in refs/files-backend.c where we check
    INCLUDE_BROKEN. There's a matching spot in refs/packed-backend.c,
    but we don't know need to do anything there. The packed backend does
    not support symrefs at all.

The resulting set of flags might be a bit easier to follow if we broke
this down into "INCLUDE_CORRUPT_REFS" and "INCLUDE_DANGLING_SYMREFS".
But there are a few reasons not do so:

  - adding a new OMIT_DANGLING_SYMREFS flag lets us leave existing
    callers intact, without changing their behavior (and some of them
    really do want to see the dangling symrefs; e.g., t5505 has a test
    which expects us to report when a symref becomes dangling)

  - they're not actually independent. You cannot say "include dangling
    symrefs" without also including refs whose objects are not
    reachable, because dangling symrefs by definition do not have an
    object. We could tweak the implementation to distinguish this, but
    in practice nobody wants to ask for that. Adding the OMIT flag keeps
    the implementation simple and makes sure we don't regress the
    current behavior.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 12:36:45 -07:00
Jeff King
9aab952e85 refs-internal.h: reorganize DO_FOR_EACH_* flag documentation
The documentation for the DO_FOR_EACH_* flags is sprinkled over the
refs-internal.h file. We define the two flags in one spot, and then
describe them in more detail far away from there, in the definitions of
refs_ref_iterator_begin() and ref_iterator_advance_fn().

Let's try to organize this a bit better:

  - convert the #defines to an enum. This makes it clear that they are
    related, and that the enum shows the complete set of flags.

  - combine all descriptions for each flag in a single spot, next to the
    flag's definition

  - use the enum rather than a bare int for functions which take the
    flags. This helps readers realize which flags can be used.

  - clarify the mention of flags for ref_iterator_advance_fn(). It does
    not take flags itself, but is meant to depend on ones set up
    earlier.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 12:36:45 -07:00
Jeff King
bf708add2e refs-internal.h: move DO_FOR_EACH_* flags next to each other
There are currently two DO_FOR_EACH_* flags, which must not have their
bits overlap. Yet they're defined hundreds of lines apart. Let's move
them next to each other to make it clear that they are related and are a
complete set (which matters if you are adding a new flag and would like
to know what the next available bit is).

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 12:36:45 -07:00
René Scharfe
35cf94eaf6 refs/files-backend: remove unused open mode parameter
We only need to provide a mode if we are willing to let open(2) create
the file, which is not the case here, so drop the unnecessary parameter.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 17:40:28 -07:00
Han-Wen Nienhuys
5b12e16bb1 refs: make errno output explicit for read_raw_ref_fn
This makes it explicit how alternative ref backends should report errors in
read_raw_ref_fn.

read_raw_ref_fn needs to supply a credible errno for a number of cases. These
are primarily:

1) The files backend calls read_raw_ref from lock_raw_ref, and uses the
resulting error codes to create/remove directories as needed.

2) ENOENT should be translated in a zero OID, optionally with REF_ISBROKEN set,
returning the last successfully resolved symref. This is necessary so
read_raw_ref("HEAD") on an empty repo returns refs/heads/main (or the default branch
du-jour), and we know on which branch to create the first commit.

Make this information flow explicit by adding a failure_errno to the signature
of read_raw_ref. All errnos from the files backend are still propagated
unchanged, even though inspection suggests only ENOTDIR, EISDIR and ENOENT are
relevant.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25 13:30:26 -07:00
Han-Wen Nienhuys
1ae6ed230a refs/files-backend: stop setting errno from lock_ref_oid_basic
refs/files-backend.c::lock_ref_oid_basic() tries to signal how it failed
to its callers using errno.

It is safe to stop setting errno here, because the callers of this
file-scope static function are

* files_copy_or_rename_ref()
* files_create_symref()
* files_reflog_expire()

None of them looks at errno after seeing a negative return from
lock_ref_oid_basic() to make any decision, and no caller of these three
functions looks at errno after they signal a failure by returning a
negative value. In particular,

* files_copy_or_rename_ref() - here, calls are followed by error()
(which performs I/O) or write_ref_to_lockfile() (which calls
parse_object() which may perform I/O)

* files_create_symref() - here, calls are followed by error() or
create_symref_locked() (which performs I/O and does not inspect
errno)

* files_reflog_expire() - here, calls are followed by error() or
refs_reflog_exists() (which calls a function in a vtable that is not
documented to use and/or preserve errno)

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25 13:30:26 -07:00
Han-Wen Nienhuys
20d422cfd7 refs: remove EINVAL errno output from specification of read_raw_ref_fn
This commit does not change code; it documents the fact that an alternate ref
backend does not need to return EINVAL from read_raw_ref_fn to function
properly.

This is correct, because refs_read_raw_ref is only called from;

* resolve_ref_unsafe(), which does not care for the EINVAL errno result.

* refs_verify_refname_available(), which does not inspect errno.

* files-backend.c, where errno is overwritten on failure.

* packed-backend.c (is_packed_transaction_needed), which calls it for the
  packed ref backend, which never emits EINVAL.

A grep for EINVAL */*c reveals that no code checks errno against EINVAL after
reading references. In addition, the refs.h file does not mention errno at all.

A grep over resolve_ref_unsafe() turned up the following callers that inspect
errno:

* sequencer.c::print_commit_summary, which uses it for die_errno

* lock_ref_oid_basic(), which only treats EISDIR and ENOTDIR specially.

The files ref backend does use EINVAL. The files backend does not call into
the generic API (refs_read_raw), but into the files-specific function
(files_read_raw_ref), which we are not changing in this commit.

As the errno sideband is unintuitive and error-prone, remove EINVAL
value, as a step towards getting rid of the errno sideband altogether.

Spotted by Ævar Arnfjörð Bjarmason <avarab@gmail.com>.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25 13:30:26 -07:00
Ævar Arnfjörð Bjarmason
3fa2e91d17 refs file backend: move raceproof_create_file() here
Move the raceproof_create_file() API added to cache.h and
object-file.c in 177978f56a (raceproof_create_file(): new function,
2017-01-06) to its only user, refs/files-backend.c.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25 13:30:26 -07:00
Ævar Arnfjörð Bjarmason
48cdcd9ca0 refs/files: remove unused "errno != ENOTDIR" condition
As a follow-up to the preceding commit where we removed the adjacent
"errno == EISDIR" condition in the same function, remove the
"last_errno != ENOTDIR" condition here.

It's not possible for us to hit this condition added in
5b2d8d6f21 (lock_ref_sha1_basic(): improve diagnostics for ref D/F
conflicts, 2015-05-11). Since a1c1d8170d (refs_resolve_ref_unsafe:
handle d/f conflicts for writes, 2017-10-06) we've explicitly caught
these in refs_resolve_ref_unsafe() before returning NULL:

	if (errno != ENOENT &&
	    errno != EISDIR &&
	    errno != ENOTDIR)
		return NULL;

We'd then always return the refname from refs_resolve_ref_unsafe()
even if we were in a broken state as explained in the preceding
commit. The elided context here is a call to refs_resolve_ref_unsafe().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25 13:27:37 -07:00
Ævar Arnfjörð Bjarmason
245fbba46d refs/files: remove unused "errno == EISDIR" code
When we lock a reference like "foo" we need to handle the case where
"foo" exists, but is an empty directory. That's what this code added
in bc7127ef0f (ref locking: allow 'foo' when 'foo/bar' used to exist
but not anymore., 2006-09-30) seems like it should be dealing with.

Except it doesn't, and we never take this branch. The reason is that
when bc7127ef0f was written this looked like:

	ref = resolve_ref([...]);
	if (!ref && errno == EISDIR) {
	[...]

And in resolve_ref() we had this code:

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return NULL;

I.e. we would attempt to read "foo" with open(), which would fail with
EISDIR and we'd return NULL. We'd then take this branch, call
remove_empty_directories() and continue.

Since a1c1d8170d (refs_resolve_ref_unsafe: handle d/f conflicts for
writes, 2017-10-06) we don't. E.g. in the case of
files_copy_or_rename_ref() our callstack will look something like:

	[...] ->
	files_copy_or_rename_ref() ->
	lock_ref_oid_basic() ->
	refs_resolve_ref_unsafe()

At that point the first (now only) refs_resolve_ref_unsafe() call in
lock_ref_oid_basic() would do the equivalent of this in the resulting
call to refs_read_raw_ref() in refs_resolve_ref_unsafe():

	/* Via refs_read_raw_ref() */
	fd = open(path, O_RDONLY);
	if (fd < 0)
		/* get errno == EISDIR */
	/* later, in refs_resolve_ref_unsafe() */
	if ([...] && errno != EISDIR)
		return NULL;
	[...]
	/* returns the refs/heads/foo to the caller, even though it's a directory */
	return refname;

I.e. even though we got an "errno == EISDIR" we won't take this
branch, since in cases of EISDIR "resolved" is always
non-NULL. I.e. we pretend at this point as though everything's OK and
there is no "foo" directory.

We then proceed with the entire ref update and don't call
remove_empty_directories() until we call commit_ref_update(). See
5387c0d883 (commit_ref(): if there is an empty dir in the way, delete
it, 2016-05-05) for the addition of that code, and
a1c1d8170d (refs_resolve_ref_unsafe: handle d/f conflicts for writes,
2017-10-06) for the commit that changed the original codepath added in
bc7127ef0f to use this "EISDIR" handling.

Further historical commentary:

Before the two preceding commits the caller in files_reflog_expire()
was the only one out of our 4 callers that would pass non-NULL as an
oid. We would then set a (now gone) "resolve_flags" to
"RESOLVE_REF_READING" and just before that "errno != EISDIR" check do:

	if (resolve_flags & RESOLVE_REF_READING)
		return NULL;

There may have been some case where this ended up mattering and we
couldn't safely make this change before we removed the "oid"
parameter, but I don't think there was, see [1] for some discussion on
that.

In any case, now that we've removed the "oid" parameter in a preceding
commit we can be sure that this code is redundant, so let's remove it.

1. http://lore.kernel.org/git/871r801yp6.fsf@evledraar.gmail.com

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25 13:27:37 -07:00
Ævar Arnfjörð Bjarmason
ff7a2e4dbb refs/files: remove unused "oid" in lock_ref_oid_basic()
In the preceding commit the last caller that passed a non-NULL OID was
changed to pass NULL to lock_ref_oid_basic(). As noted in preceding
commits use of this API has been going away (we should use ref
transactions, or lock_raw_ref()), so we're unlikely to gain new
callers that want to pass the "oid".

So let's remove it, doing so means we can remove the "mustexist"
condition, and therefore anything except the "flags =
RESOLVE_REF_NO_RECURSE" case.

Furthermore, since the verify_lock() function we called did most of
its work when the "oid" was passed (as "old_oid") we can inline the
trivial part of it that remains in its only remaining caller. Without
a NULL "oid" passed it was equivalent to calling refs_read_ref_full()
followed by oidclr().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25 13:27:37 -07:00
Ævar Arnfjörð Bjarmason
cc40b5ce13 refs API: remove OID argument to reflog_expire()
Since the the preceding commit the "oid" parameter to reflog_expire()
is always NULL, but it was not cleaned up to reduce the size of the
diff. Let's do that subsequent API and documentation cleanup now.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25 13:27:37 -07:00
Ævar Arnfjörð Bjarmason
ae35e16cd4 reflog expire: don't lock reflogs using previously seen OID
During reflog expiry, the cmd_reflog_expire() function first iterates
over all reflogs in logs/*, and then one-by-one acquires the lock for
each one and expires it. This behavior has been with us since this
command was implemented in 4264dc15e1 ("git reflog expire",
2006-12-19).

Change this to stop calling lock_ref_oid_basic() with the OID we saw
when we looped over the logs, instead have it pass the OID it managed
to lock.

This mostly mitigates a race condition where e.g. "git gc" will fail
in a concurrently updated repository because the branch moved since
"git reflog expire --all" was started. I.e. with:

    error: cannot lock ref '<refname>': ref '<refname>' is at <OID-A> but expected <OID-B>

This behavior of passing in an "oid" was needed for an edge-case that
I've untangled in this and preceding commits though, namely that we
needed this OID because we'd:

 1. Lookup the reflog name/OID via dwim_log()
 2. With that OID, lock the reflog
 3. Later in builtin/reflog.c we use the OID we looked as input to
    lookup_commit_reference_gently(), assured that it's equal to the
    OID we got from dwim_log().

We can be sure that this change is safe to make because between
dwim_log (step #1) and lock_ref_oid_basic (step #2) there was no other
logic relevant to the OID or expiry run in the cmd_reflog_expire()
caller.

We can thus treat that code as a black box, before and after this
change it would get an OID that's been locked, the only difference is
that now we mostly won't be failing to get the lock due to the TOCTOU
race[0]. That failure was purely an implementation detail in how the
"current OID" was looked up, it was divorced from the locking
mechanism.

What do we mean with "mostly"? It mostly mitigates it because we'll
still run into cases where the ref is locked and being updated as we
want to expire it, and other git processes wanting to update the refs
will in turn race with us as we expire the reflog.

That remaining race can in turn be mitigated with the
core.filesRefLockTimeout setting, see 4ff0f01cb7 ("refs: retry
acquiring reference locks for 100ms", 2017-08-21). In practice if that
value is high enough we'll probably never have ref updates or reflog
expiry failing, since the clients involved will retry for far longer
than the time any of those operations could take.

See [1] for an initial report of how this impacted "git gc" and a
large discussion about this change in early 2019. In particular patch
looked good to Michael Haggerty, see his[2]. That message seems to not
have made it to the ML archive, its content is quoted in full in my
[3].

I'm leaving behind now-unused code the refs API etc. that takes the
now-NULL "unused_oid" argument, and other code that can be simplified now
that we never have on OID in that context, that'll be cleaned up in
subsequent commits, but for now let's narrowly focus on fixing the
"git gc" issue. As the modified assert() shows we always pass a NULL
oid to reflog_expire() now.

Unfortunately this sort of probabilistic contention is hard to turn
into a test. I've tested this by running the following three subshells
in concurrent terminals:

    (
        rm -rf /tmp/git &&
        git init /tmp/git &&
        while true
        do
            head -c 10 /dev/urandom | hexdump >/tmp/git/out &&
            git -C /tmp/git add out &&
            git -C /tmp/git commit -m"out"
        done
    )

    (
	rm -rf /tmp/git-clone &&
        git clone file:///tmp/git /tmp/git-clone &&
        while git -C /tmp/git-clone pull
        do
            date
        done
    )

    (
        while git -C /tmp/git-clone reflog expire --all
        do
            date
        done
    )

Before this change the "reflog expire" would fail really quickly with
the "but expected" error noted above.

After this change both the "pull" and "reflog expire" will run for a
while, but eventually fail because I get unlucky with
core.filesRefLockTimeout (the "reflog expire" is in a really tight
loop). As noted above that can in turn be mitigated with higher values
of core.filesRefLockTimeout than the 100ms default.

As noted in the commentary added in the preceding commit there's also
the case of branches being racily deleted, that can be tested by
adding this to the above:

    (
        while git -C /tmp/git-clone branch topic master &&
	      git -C /tmp/git-clone branch -D topic
        do
            date
        done
    )

With core.filesRefLockTimeout set to 10 seconds (it can probably be a
lot lower) I managed to run all four of these concurrently for about
an hour, and accumulated ~125k commits, auto-gc's and all, and didn't
have a single failure. The loops visibly stall while waiting for the
lock, but that's expected and desired behavior.

0. https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use
1. https://lore.kernel.org/git/87tvg7brlm.fsf@evledraar.gmail.com/
2. http://lore.kernel.org/git/b870a17d-2103-41b8-3cbc-7389d5fff33a@alum.mit.edu
3. https://lore.kernel.org/git/87pnqkco8v.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25 13:27:37 -07:00
Ævar Arnfjörð Bjarmason
7aa7829f75 refs/files: add a comment about refs_reflog_exists() call
Add a comment about why it is that we need to check for the the
existence of a reflog we're deleting after we've successfully acquired
the lock in files_reflog_expire(). As noted in [1] the lock protocol
for reflogs is somewhat intuitive.

This early exit code the comment applies to dates all the way back to
4264dc15e1 (git reflog expire, 2006-12-19).

1. https://lore.kernel.org/git/54DCDA42.2060800@alum.mit.edu/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25 13:27:37 -07:00